Parsing and Converting Torrent Files

Introduction to Torrent

A BitTorrent seed file (a Torrent file) stores the metadata of a group of files. The format is defined by the BitTorrent protocol, and the file extension is usually ".torrent".

A .torrent seed file contains tracker information and file information. Most of it is readable text, but the piece hashes are stored as raw binary bytes, so it is better described as a bencoded file than as a plain text file. Tracker information mainly refers to the addresses of the tracker servers used during a BT download, together with related settings. File information is calculated from the target files, and the results are encoded with the Bencode rules of the BitTorrent protocol. The main principle is to divide the downloaded content virtually into pieces of equal size (only the last piece may be shorter). The piece size is conventionally a power of two, for example 256 KiB = 2^18 bytes; because the division is virtual, no piece files are created on disk. The SHA-1 hash of each piece is written into the seed file in piece order, which is why the seed file acts as the "index" of the downloaded files.
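To make the piece idea concrete, here is a minimal Java sketch of how the pieces value could be computed. It assumes the whole content is already in memory as one byte array (for a multi-file torrent, the files concatenated in the order of the files list), and the names PieceHasher, pieces and sha1 are hypothetical, not taken from any library.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class PieceHasher {
    // SHA-1 of a byte array; every Java platform is required to provide SHA-1
    static byte[] sha1(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(data);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds the "pieces" value: the 20-byte SHA-1 of every piece, concatenated in piece order.
    // Assumption: for a multi-file torrent, content holds all files joined in "files" order.
    static byte[] pieces(byte[] content, int pieceLength) {
        // Round up, because the last piece may be shorter than pieceLength
        int pieceCount = (content.length + pieceLength - 1) / pieceLength;
        byte[] result = new byte[pieceCount * 20];
        for (int i = 0; i < pieceCount; i++) {
            int from = i * pieceLength;
            int to = Math.min(from + pieceLength, content.length);
            byte[] digest = sha1(Arrays.copyOfRange(content, from, to));
            System.arraycopy(digest, 0, result, i * 20, digest.length);
        }
        return result;
    }
}
```

For example, 5 bytes with a piece length of 2 give three pieces (2 + 2 + 1 bytes) and therefore 60 bytes of hashes. A real tool would stream the files rather than load them into memory.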

Torrent structure

The contents of a Torrent file are stored using Bencoding. At the top level, the whole file is one dictionary, as shown below:

Torrent overall structure

| Key | Type | Required | Meaning |
|---|---|---|---|
| announce | string | required | URL of the tracker |
| info | dictionary | required | Dictionary describing the shared file(s); its keys depend on whether the torrent contains one file or several |
| announce-list | array[] | optional | URLs of backup trackers, stored as a list (of lists) |
| comment | string | optional | Free-form comment |
| created by | string | optional | Information about the creator, usually the program that created the file |

Torrent single-file Info structure

| Key | Type | Required | Meaning |
|---|---|---|---|
| name | string | required | Suggested file name to save as |
| length | long | required | Size of the file in bytes |
| pieces | byte[] | required | Concatenated 20-byte SHA-1 hashes of all pieces |
| piece length | long | required | Number of bytes in each piece |

Torrent multi-file Info structure

| Key | Type | Required | Meaning |
|---|---|---|---|
| name | string | required | Suggested directory name to save into |
| pieces | byte[] | required | Concatenated 20-byte SHA-1 hashes of all pieces |
| piece length | long | required | Number of bytes in each piece |
| files | array[] | required | List of files; each element is a dictionary |

files dictionary structure:

| Key | Type | Required | Meaning |
|---|---|---|---|
| path | array[] | required | List of strings: the subdirectory names, with the actual file name as the last element |
| length | long | required | Size of the file in bytes |
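The main practical difference between the two info layouts is where file sizes live. The following minimal sketch assumes the torrent has already been decoded into java.util.Map and List values with integers as Long (a representation chosen here for illustration), and that, as in the BitTorrent specification, a single-file info dictionary carries a length key while a multi-file one carries files. The class InfoLayout and its methods are hypothetical.

```java
import java.util.List;
import java.util.Map;

final class InfoLayout {
    // Total payload size of a decoded info dictionary
    static long totalLength(Map<String, Object> info) {
        Object files = info.get("files");
        if (files == null) {
            // Single-file mode: "length" is the size of the one file
            return ((Number) info.get("length")).longValue();
        }
        // Multi-file mode: add up the "length" of every entry in "files"
        long total = 0;
        for (Object entry : (List<?>) files) {
            Map<?, ?> file = (Map<?, ?>) entry;
            total += ((Number) file.get("length")).longValue();
        }
        return total;
    }

    // Joins info.name (the directory) and the "path" elements into a relative path
    static String relativePath(String name, List<?> path) {
        StringBuilder sb = new StringBuilder(name);
        for (Object element : path) {
            sb.append('/').append(element);
        }
        return sb.toString();
    }
}
```

With the two files in the preview below (5215702961 and 53226 bytes), totalLength returns 5215756187, which is also why the sizes need a long rather than an int.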

Preview of an actual Torrent structure

After serializing the whole dictionary to JSON, the structure of a multi-file torrent looks roughly as follows. Note that the pieces value is truncated to show only its beginning, and that, because of how my serialization tool is configured, all integers appear as strings.

{
    "creation date": "1604347014",
    "comment": "Torrent downloaded from https://YTS.MX",
    "announce-list": [
        [
            "udp://tracker.coppersurfer.tk:6969/announce"
        ],
        [
            "udp://9.rarbg.com:2710/announce"
        ],
        [
            "udp://p4p.arenabg.com:1337"
        ],
        [
            "udp://tracker.internetwarriors.net:1337"
        ],
        [
            "udp://tracker.opentrackr.org:1337/announce"
        ]
    ],
    "created by": "YTS.AG",
    "announce": "udp://tracker.coppersurfer.tk:6969/announce",
    "info": {
        "pieces": "ᆲimᅬヒ\u000b*゚ᆲト... ...",
        "name": "Love And Monsters (2020) [2160p] [4K] [WEB] [5.1] [YTS.MX]",
        "files": [
            {
                "path": [
                    "Love.And.Monsters.2020.2160p.4K.WEB.x265.10bit.mkv"
                ],
                "length": "5215702961"
            },
            {
                "path": [
                    "www.YTS.MX.jpg"
                ],
                "length": "53226"
            }
        ],
        "piece length": "524288"
    }
}

Torrent file encoding

As mentioned above, Torrent files are stored in Bencoding, so we first need a basic understanding of how Bencoding works.

Bencoding uses four basic data types:

  • strings
  • integers
  • lists
  • dictionaries

String type

A string is encoded as <length>:<content>, where length is the number of bytes in the content, written in decimal. For example: 42:udp://tracker.pirateparty.gr:6969/announce

Integer type

An integer is encoded as i<integer>e, with the value written in decimal. For example, i1234e represents the integer 1234.

List type

A list is encoded as l<elements>e: it starts with the letter l and ends with the letter e. The elements in between are bencoded values written one after another, and each of them can be any of the four types.

Dictionary type

A dictionary is encoded as d<key-value pairs>e: it starts with the letter d and ends with the letter e. In between, keys and values alternate. Each key must be a string, each value can be any of the four types, and keys appear sorted by their raw bytes.
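The four rules can be condensed into a small encoder. This is a sketch rather than a complete library: it assumes Java String values are written as UTF-8, uses byte[] for raw binary strings such as pieces, and the name BencodeEncoder is hypothetical. Note that the string prefix counts bytes, not characters.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class BencodeEncoder {
    // Encodes String/byte[] (strings), Integer/Long (integers), List and Map values
    static byte[] encode(Object value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        write(out, value);
        return out.toByteArray();
    }

    private static void write(ByteArrayOutputStream out, Object value) {
        if (value instanceof String) {
            // Assumption: text values are stored as UTF-8
            writeString(out, ((String) value).getBytes(StandardCharsets.UTF_8));
        } else if (value instanceof byte[]) {
            writeString(out, (byte[]) value);
        } else if (value instanceof Integer || value instanceof Long) {
            writeAscii(out, "i" + value + "e");      // i<decimal>e
        } else if (value instanceof List) {
            out.write('l');                          // l<elements>e
            for (Object element : (List<?>) value) {
                write(out, element);
            }
            out.write('e');
        } else if (value instanceof Map) {
            out.write('d');                          // d<key><value>...e
            List<Map.Entry<?, ?>> entries = new ArrayList<>(((Map<?, ?>) value).entrySet());
            // Keys must be strings, sorted as raw (unsigned) bytes
            entries.sort((a, b) -> Arrays.compareUnsigned(keyBytes(a.getKey()), keyBytes(b.getKey())));
            for (Map.Entry<?, ?> entry : entries) {
                writeString(out, keyBytes(entry.getKey()));
                write(out, entry.getValue());
            }
            out.write('e');
        } else {
            throw new IllegalArgumentException("cannot bencode: " + value);
        }
    }

    private static byte[] keyBytes(Object key) {
        return ((String) key).getBytes(StandardCharsets.UTF_8);
    }

    // <byte length>:<bytes>
    private static void writeString(ByteArrayOutputStream out, byte[] bytes) {
        writeAscii(out, bytes.length + ":");
        out.write(bytes, 0, bytes.length);
    }

    private static void writeAscii(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
    }
}
```

For example, a map with the keys announce and a encodes as d1:ali42e1:xe8:announce9:udp://t:1e when a maps to the list [42, "x"] and announce maps to "udp://t:1": a sorts first because it is a prefix of announce.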

Worked example

Let's apply the rules above to real content, using the following data as an example:

d8:announce49:udp://tracker.leechers-paradise.org:6969/announce13:announce-listll49:udp://tracker.leechers-paradise.org:6969/announceel48:udp://tracker.internetwarriors.net:1337/announceeee

Try to decode this data yourself with the rules above first. Below, the same data is split across lines to make it easier to read. It is a dictionary with two keys, announce and announce-list. The value of announce is the string udp://tracker.leechers-paradise.org:6969/announce, and the value of announce-list is a list whose elements are themselves lists, each holding one tracker URL.

d
  8:announce  
  49:udp://tracker.leechers-paradise.org:6969/announce
  13:announce-list
      l
         l
           49:udp://tracker.leechers-paradise.org:6969/announce
         e
         l
           48:udp://tracker.internetwarriors.net:1337/announce
         e
      e
e

Torrent file parsing

With this understanding of the encoding, parsing a Torrent file in code is straightforward: read the byte stream, determine the type of each value, and decode it accordingly.

The first byte of each value identifies its type: a digit 0-9 starts a string, i an integer, l a list, and d a dictionary.

After decoding a value according to its type, continue with the next byte to find the type of the next value. Following this analysis, the pseudocode is:

Get string value

// Enter this method when the current byte is a digit 0-9
String readString(byte[] info, int offset) {
    // Read the decimal digits before ':', i.e. the length of the string in bytes
    int length = readLength(info, offset);
    // Read exactly `length` bytes after the ':' as the string content
    String data = readData(info, length, offset);
    // offset is a shared cursor: when this returns, it must point just past the string
    // (Java passes int parameters by value, so a real implementation keeps it in a field)
    return data;
}

Get integer type

Note on value range: in Java and similar languages, use a Long (64-bit) rather than an int, because values such as file lengths can exceed the 32-bit range. The 5215702961-byte file in the preview above is one example.

// Enter this method when the current byte is 'i'
Long readInt(byte[] info, int offset) {
    offset++; // skip the leading 'i'
    // readUntil is a hypothetical helper: it returns the text before the next 'e'
    // and moves offset past that 'e'
    String data = readUntil(info, offset, 'e');
    return Long.valueOf(data);
}

Get list type

Because a list can contain values of any of the four types, reading a list needs the two methods above as well as the list and dictionary readers themselves, called recursively.

// Enter this method when the current byte is 'l'
List<Object> readList(byte[] info, int offset) {
    List<Object> list = new ArrayList<>();
    offset++; // skip the leading 'l'
    // Read elements until the 'e' that closes this list
    while (info[offset] != 'e') {
        switch (info[offset]) {
            // A nested list: read it recursively and add it
            case 'l':
                list.add(readList(info, offset));
                break;
            // A dictionary: read it and add it to the list
            case 'd':
                list.add(readDictionary(info, offset));
                break;
            // An integer: read it and add it to the list
            case 'i':
                list.add(readInt(info, offset));
                break;
            // Otherwise the byte is a digit 0-9, which starts a string
            default:
                list.add(readString(info, offset));
        }
    }
    // Move offset past the list's closing 'e'
    offset++;
    return list;
}

Read dictionary type

Reading a dictionary is very similar to reading a list. The only difference is that keys and values alternate: a dictionary key can only be a string, so the first string read is a key, the next value read is its value, and so on.

// Enter this method when the current byte is 'd'
Map<String, Object> readDictionary(byte[] info, int offset) {
    // LinkedHashMap keeps the keys in the order they appear in the file
    Map<String, Object> dic = new LinkedHashMap<>();
    offset++; // skip the leading 'd'
    // When key is null, the next string read is a key; otherwise it is that key's value
    String key = null;
    // Read key/value pairs until the 'e' that closes this dictionary
    while (info[offset] != 'e') {
        switch (info[offset]) {
            // A list can only be a value, so its key was already read: store the pair, reset the key
            case 'l':
                dic.put(key, readList(info, offset));
                key = null;
                break;
            // A dictionary can only be a value: store it under the current key, reset the key
            case 'd':
                dic.put(key, readDictionary(info, offset));
                key = null;
                break;
            // An integer can only be a value: store it under the current key, reset the key
            case 'i':
                dic.put(key, readInt(info, offset));
                key = null;
                break;
            // Otherwise the byte is a digit 0-9, which starts a string
            default: {
                String data = readString(info, offset);
                // When key is null, the string is the key; otherwise it is the value
                if (key == null) {
                    key = data;
                } else {
                    dic.put(key, data);
                    key = null;
                }
            }
        }
    }
    // Move offset past the dictionary's closing 'e'
    offset++;
    return dic;
}
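The reading methods rely on offset being a cursor that every method advances, but Java passes int parameters by value. One way to get that behavior is to keep the cursor in a field. The sketch below is a runnable version under that assumption; BencodeDecoder and pos are illustrative names, and strings are decoded as UTF-8, which suits text values but would corrupt the binary pieces value (a fuller parser would keep strings as byte[]).

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BencodeDecoder {
    private final byte[] data;
    private int pos; // shared cursor: each read method leaves it just past what it consumed

    private BencodeDecoder(byte[] data) {
        this.data = data;
    }

    static Object decode(byte[] data) {
        return new BencodeDecoder(data).readValue();
    }

    // Dispatch on the first byte: i, l, d, or a digit that starts a string
    private Object readValue() {
        switch (data[pos]) {
            case 'i': return readInt();
            case 'l': return readList();
            case 'd': return readDictionary();
            default:  return readString();
        }
    }

    // <length>:<bytes>; assumption: the bytes are UTF-8 text
    private String readString() {
        int colon = pos;
        while (data[colon] != ':') colon++;
        int length = Integer.parseInt(new String(data, pos, colon - pos, StandardCharsets.US_ASCII));
        pos = colon + 1 + length;
        return new String(data, colon + 1, length, StandardCharsets.UTF_8);
    }

    // i<digits>e, read as long so values above 2^31 - 1 fit
    private long readInt() {
        int end = pos + 1;
        while (data[end] != 'e') end++;
        long value = Long.parseLong(new String(data, pos + 1, end - pos - 1, StandardCharsets.US_ASCII));
        pos = end + 1;
        return value;
    }

    private List<Object> readList() {
        List<Object> list = new ArrayList<>();
        pos++; // skip 'l'
        while (data[pos] != 'e') {
            list.add(readValue());
        }
        pos++; // skip the closing 'e'
        return list;
    }

    private Map<String, Object> readDictionary() {
        Map<String, Object> dic = new LinkedHashMap<>(); // keeps keys in file order
        pos++; // skip 'd'
        while (data[pos] != 'e') {
            String key = readString(); // keys are always strings
            dic.put(key, readValue());
        }
        pos++; // skip the closing 'e'
        return dic;
    }
}
```

Decoding the worked example from earlier returns a map whose announce value is the leechers-paradise URL and whose announce-list value is a list of two single-element lists.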

Torrent files and Magnet

Magnet links and Torrent files can be converted into each other. This article only covers converting a Torrent file into a magnet link.

Magnet overview

A magnet link is made up of a set of parameters. Their order does not matter, and the format is the same as the query string at the end of an HTTP URL. The most common parameter is "xt", short for "exact topic". It is usually a URN formed from the content hash of a specific file, for example:

magnet:?xt=urn:btih:YNCKHTQCWBTRNJIV4WNAE52SJUQCZO5C

Note that although this link identifies a specific file, the client application still has to search for where (and whether) it can obtain that file, for example through the DHT network. That search is what turns a magnet link back into torrent metadata, and it is not covered in this article.

Common fields are listed in the table below:

| Field | Meaning |
|---|---|
| magnet | Protocol (URI scheme) name |
| xt | Short for "exact topic": a uniform resource name (URN) containing the file's hash. BTIH (BitTorrent Info Hash) names the hash method; ED2K, AICH, SHA1 and MD5 can also be used here. This value identifies the file and is required. |
| dn | Short for "display name": the file name shown to the user. Optional. |
| tr | Short for "tracker": the address of a tracker server. Optional. |
| btih | BitTorrent info hash: the hash that identifies the torrent. Clients expect 40 hex characters and also accept a 32-character base32 form, as in the example above. |

Convert Torrent to Magnet

  • dn: the file name displayed to the user

This is the value of the name key in the info dictionary of the Torrent file.

  • tr: tracker server address

These are the values of the announce and announce-list keys in the Torrent file.

  • btih: the info hash

This is the SHA-1 hash (in hex) of the bencoded info dictionary, computed over its exact bytes in the Torrent file. In the author's hex view of the example file, the info value follows the key 4:info, and its opening d, at address 0x0001A3, is the first byte of the hashed data.

The last byte of the hashed data is the e that closes the info dictionary, at address 0x030BE7 in the same file.

Putting these together:

magnet = 'magnet:?xt=urn:btih:'+Hex(Sha1(info))+'&dn='+encode(name)+'&tr='+encode(announce)
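A sketch of that formula in Java, assuming the hex info hash, the name and the tracker URLs have already been extracted; MagnetBuilder is a hypothetical name. It uses java.net.URLEncoder (the Charset overload needs Java 10 or later), which writes spaces as +, so the sketch swaps those for %20. Trackers are de-duplicated because the announce URL usually also appears in announce-list.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.List;

final class MagnetBuilder {
    // magnet:?xt=urn:btih:<hex info hash>&dn=<name>&tr=<tracker>&tr=...
    static String build(String infoHashHex, String name, List<String> trackers) {
        StringBuilder uri = new StringBuilder("magnet:?xt=urn:btih:").append(infoHashHex);
        if (name != null) {
            uri.append("&dn=").append(encode(name));
        }
        // Drop duplicate trackers but keep their original order; tr repeats once per tracker
        for (String tracker : new LinkedHashSet<>(trackers)) {
            uri.append("&tr=").append(encode(tracker));
        }
        return uri.toString();
    }

    // Percent-encode a parameter value; URLEncoder emits '+' for spaces, so use %20 instead
    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

For instance, a name of My File and a single tracker udp://t:1/announce produce &dn=My%20File&tr=udp%3A%2F%2Ft%3A1%2Fannounce after the xt parameter.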

Building on the parser from the previous part, we can record startindex and endindex while reading the value of the info key:

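Map<String, Object> readDictionary(byte[] info, int offset) {
    //...
    case 'd':
      // Only the value of the "info" key needs its byte range recorded
      boolean record = "info".equals(key);
      if (record) {
          startindex = offset;      // index of the 'd' that opens info
      }
      dic.put(key, readDictionary(info, offset));
      if (record) {
          endindex = offset - 1;    // offset is now just past info's closing 'e'
      }
      key = null;
      break;
    //...
}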

String getBtih(byte[] torrentBytes, int start, int end) {
    // Copy the bytes from start to end (both inclusive), i.e. the whole bencoded info value
    byte[] infoBytes = Arrays.copyOfRange(torrentBytes, start, end + 1);
    // SHA-1 the bytes and convert the digest to hex (Hex and Sha1 stand for helper utilities)
    return Hex.toHex(Sha1.toSha1(infoBytes));
}
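An alternative to recording indexes inside the full parser is to scan the top-level dictionary, skip over values without building any objects, and hash the raw bytes of the info value. Hashing the original bytes, rather than re-encoding a decoded map, means differences in key order or string handling cannot change the hash. This sketch assumes a well-formed torrent whose top-level value is a dictionary; InfoHash, skip, btih and sha1Hex are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class InfoHash {
    // Returns the index just past the bencoded value that starts at pos
    static int skip(byte[] b, int pos) {
        switch (b[pos]) {
            case 'i': {
                while (b[pos] != 'e') pos++;
                return pos + 1;
            }
            case 'l':
            case 'd': {
                pos++; // skip 'l' or 'd'; dictionary keys are skipped like any other string
                while (b[pos] != 'e') pos = skip(b, pos);
                return pos + 1;
            }
            default: { // <length>:<bytes>
                int colon = pos;
                while (b[colon] != ':') colon++;
                int length = Integer.parseInt(new String(b, pos, colon - pos, StandardCharsets.US_ASCII));
                return colon + 1 + length;
            }
        }
    }

    // Hex SHA-1 of the exact bytes of the top-level "info" value
    static String btih(byte[] torrent) {
        byte[] infoKey = "4:info".getBytes(StandardCharsets.US_ASCII);
        int pos = 1; // skip the top-level 'd'
        while (torrent[pos] != 'e') {
            int keyEnd = skip(torrent, pos);
            int valueEnd = skip(torrent, keyEnd);
            if (Arrays.equals(Arrays.copyOfRange(torrent, pos, keyEnd), infoKey)) {
                // keyEnd is the 'd' that opens info; valueEnd - 1 is its closing 'e'
                return sha1Hex(Arrays.copyOfRange(torrent, keyEnd, valueEnd));
            }
            pos = valueEnd;
        }
        throw new IllegalArgumentException("no info dictionary found");
    }

    static String sha1Hex(byte[] bytes) {
        try {
            StringBuilder hex = new StringBuilder();
            for (byte x : MessageDigest.getInstance("SHA-1").digest(bytes)) {
                hex.append(String.format("%02x", x));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is unavailable", e);
        }
    }
}
```

The slice passed to SHA-1 runs from the d that opens info up to and including its closing e, which matches the start and end addresses described above.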

Concrete implementation

I have implemented part of the logic above in Java (Torrent file parsing and magnet link generation). For reference, the relevant code is available at the following locations:

Tool directory: https://github.com/Rekent/common-utils/tree/master/src/main/java/com/rekent/tools/utils/torrent

Parser class source: https://github.com/Rekent/common-utils/blob/master/src/main/java/com/rekent/tools/utils/torrent/TorrentFileResovler.java

Dependency jar: https://github.com/Rekent/common-utils/releases/tag/v0.0.3

Usage:

public void testResolve() throws Exception {
		String path = "C:\\Users\\Refkent\\Downloads\\Test.torrent";
		TorrentFile torrentFile = TorrentFileUtils.resolve(path);
		System.out.println(torrentFile.print());
		System.out.println(torrentFile.getHash());
		System.out.println(torrentFile.getMagnetUri());
}


Added by jmac2501 on Mon, 28 Feb 2022 12:49:07 +0200