Bencode
Bencode (pronounced "Bee-Encode") is the encoding used by the peer-to-peer file sharing system BitTorrent for storing and transmitting loosely structured data.
It supports four different types of values: byte strings, integers, lists, and dictionaries (associative arrays).
Bencoding is most commonly used in .torrent files. These metadata files are simply bencoded dictionaries.
Although less efficient than a pure binary encoding, bencoding is simple and, because numbers are encoded in decimal notation, unaffected by endianness, which is important for a cross-platform application like BitTorrent. It is also fairly flexible: as long as applications ignore unexpected dictionary keys, new keys can be added without creating incompatibilities.
Encoding algorithm
Bencode uses ASCII characters as delimiters and digits.
- An integer is encoded as i<number in base 10 notation>e. Negative values are written with a leading minus sign. Leading zeros are not allowed, so the number zero is always encoded as "i0e", and negative zero ("i-0e") is not permitted. The number 42 would thus be encoded as "i42e".
- A byte string (a sequence of bytes, not necessarily characters) is encoded as <length>:<contents>. (This is similar to netstrings, but without the final comma.) The length is encoded in base 10, like integers, but must be non-negative (zero is allowed); the contents are just the bytes that make up the string. The string "spam" would be encoded as "4:spam". The specification does not deal with encoding of characters outside the ASCII set; to mitigate this, some BitTorrent applications explicitly communicate the encoding (most commonly UTF-8) in various non-standard ways.
- A list of values is encoded as l<contents>e. The contents consist of the bencoded elements of the list, in order, concatenated. A list consisting of the string "spam" and the number 42 would be encoded as "l4:spami42ee"; note the absence of separators between elements.
- A dictionary is encoded as d<contents>e. The elements of the dictionary are again encoded and concatenated, in such a way that each value immediately follows the key associated with it. All keys must be byte strings and must appear in lexicographical order, compared as raw byte strings rather than as text. A dictionary that associates the values 42 and "spam" with the keys "foo" and "bar", respectively, would be encoded as follows: "d3:bar4:spam3:fooi42ee". (This might be easier to read by inserting some spaces: "d 3:bar 4:spam 3:foo i42e e".)
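The four rules above can be sketched as a small recursive encoder. This is a minimal illustration in Python, not the reference BitTorrent implementation: the function name `bencode` is hypothetical, and accepting `str` values by encoding them as UTF-8 is an assumption made for convenience, since the format itself only knows byte strings.

```python
def bencode(value):
    """Encode an int, bytes, str, list, or dict as bencoded bytes.

    Hypothetical helper for illustration; str input is assumed to be UTF-8.
    """
    # bool is a subclass of int in Python, but bencode has no boolean type.
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        # str() never emits leading zeros and renders zero as "0".
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption, see lead-in
    if isinstance(value, bytes):
        # The length prefix counts bytes, not characters.
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        # Elements are simply concatenated, with no separators.
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            if isinstance(key, str):
                key = key.encode("utf-8")
            if not isinstance(key, bytes):
                raise TypeError("dictionary keys must be byte strings")
            items.append((key, item))
        # Keys are emitted in byte-wise sorted order.
        items.sort(key=lambda pair: pair[0])
        body = b"".join(bencode(k) + bencode(v) for k, v in items)
        return b"d" + body + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


print(bencode(42))                           # b'i42e'
print(bencode([b"spam", 42]))                # b'l4:spami42ee'
print(bencode({"foo": 42, "bar": "spam"}))   # b'd3:bar4:spam3:fooi42ee'
```

The dictionary example is written with "foo" first, yet "bar" comes out first in the output because the keys are sorted before encoding.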
There are no restrictions on what kind of values may be stored in lists and dictionaries; they may (and usually do) contain other lists and dictionaries. This allows for arbitrarily complex data structures to be encoded; it's one of the advantages of using bencoding.
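Because values can nest to any depth, a decoder is most naturally written recursively. The following Python sketch is one possible approach, not the decoder of any particular BitTorrent client. The names `bdecode`, `_decode_at`, and `_check_number` are hypothetical, and the strictness checks (rejecting leading zeros in string lengths, unsorted or duplicate keys, and trailing data) are assumptions about how strict a decoder should be.

```python
def _check_number(digits, allow_negative):
    """Validate the decimal text of an integer or a string length."""
    body = digits
    if allow_negative and digits.startswith(b"-"):
        body = digits[1:]
    if not body.isdigit():
        raise ValueError("expected decimal digits")
    if len(body) > 1 and body.startswith(b"0"):
        raise ValueError("leading zeros are not allowed")
    if body != digits and body == b"0":
        raise ValueError("negative zero is not allowed")


def _decode_at(data, pos):
    """Decode the value starting at data[pos]; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":
        end = data.index(b"e", pos)  # raises ValueError if "e" is missing
        digits = data[pos + 1:end]
        _check_number(digits, allow_negative=True)
        return int(digits), end + 1
    if lead.isdigit():
        colon = data.index(b":", pos)
        digits = data[pos:colon]
        _check_number(digits, allow_negative=False)
        start = colon + 1
        stop = start + int(digits)
        if stop > len(data):
            raise ValueError("string runs past the end of the input")
        return data[start:stop], stop
    if lead == b"l":
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode_at(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":
        pos += 1
        result = {}
        previous_key = None
        while data[pos:pos + 1] != b"e":
            if not data[pos:pos + 1].isdigit():
                raise ValueError("dictionary keys must be byte strings")
            key, pos = _decode_at(data, pos)
            if previous_key is not None and key <= previous_key:
                raise ValueError("dictionary keys must be sorted and unique")
            result[key], pos = _decode_at(data, pos)
            previous_key = key
        return result, pos + 1
    raise ValueError("unexpected byte or end of input at offset %d" % pos)


def bdecode(data):
    """Decode exactly one bencoded value from a bytes object."""
    value, pos = _decode_at(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after the bencoded value")
    return value


print(bdecode(b"d3:bar4:spam3:fooi42ee"))  # {b'bar': b'spam', b'foo': 42}
print(bdecode(b"d4:listl1:ai1eee"))        # {b'list': [b'a', 1]}
```

Strings are returned as `bytes` rather than text, since the format does not say how characters are encoded; a caller that knows the encoding can decode them afterwards.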
Trivia
- For each possible (complex) value, there is only a single valid bencoding; i.e., there is a bijection between values and their encodings. This has the advantage that applications may compare bencoded values by comparing their encoded forms, eliminating the need to decode the values.
- Many bencoded values can be decoded by hand, but because they often contain binary data and may become quite complex, bencoding is generally not considered a human-readable encoding.
- Bencoding serves purposes similar to those of data formats like XML and JSON, allowing complex yet loosely structured data to be stored in a platform-independent way.
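The single-encoding property from the first item above can be illustrated with a short Python sketch. The helper `canonical_bencode` is a hypothetical, deliberately compact encoder that only accepts `int`, `bytes`, `list`, and `dict` with `bytes` keys. Two dictionaries built in different insertion orders produce identical bytes, so a plain byte comparison is enough to decide whether the values are equal.

```python
def canonical_bencode(value):
    """Compact illustrative encoder (hypothetical name, bytes keys only)."""
    if isinstance(value, int) and not isinstance(value, bool):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(map(canonical_bencode, value)) + b"e"
    if isinstance(value, dict):
        if not all(isinstance(key, bytes) for key in value):
            raise TypeError("dictionary keys must be byte strings")
        # Sorting the keys is what makes the output independent of
        # the order in which the dictionary was built.
        parts = (canonical_bencode(key) + canonical_bencode(value[key])
                 for key in sorted(value))
        return b"d" + b"".join(parts) + b"e"
    raise TypeError("cannot bencode " + type(value).__name__)


first = {b"foo": 42, b"bar": b"spam"}
second = {b"bar": b"spam", b"foo": 42}   # same content, different order

encoded_first = canonical_bencode(first)
encoded_second = canonical_bencode(second)

# Equal values give byte-identical encodings, so no decoding is needed.
print(encoded_first == encoded_second)   # True
print(encoded_first)                     # b'd3:bar4:spam3:fooi42ee'
```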