Venti


Venti is a network storage system that permanently stores data blocks. A 160-bit SHA-1 hash of the data (called a score by Venti) acts as the address of the data. This enforces a write-once policy, since no other data block can be found with the same address: multiple writes of the same data produce identical addresses, so duplicate data is easily identified and the data block is stored only once. Data blocks cannot be removed, which makes Venti well suited to permanent or backup storage. Venti is typically used with Fossil to provide a file system with permanent snapshots.
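The addressing scheme described above can be illustrated with a minimal in-memory sketch. It is not Venti's implementation: the class name ScoreStore and its methods are hypothetical, and a Python dictionary stands in for the on-disk storage.

```python
import hashlib


class ScoreStore:
    """Toy write-once block store addressed by SHA-1 score (illustrative only)."""

    def __init__(self):
        # Maps a 20-byte score to the block it addresses.
        self._blocks = {}

    def write(self, data: bytes) -> bytes:
        """Store a block and return its score (the SHA-1 digest of the data)."""
        score = hashlib.sha1(data).digest()
        # Identical data always hashes to the same score, so a repeated
        # write finds the existing entry and stores nothing new.
        self._blocks.setdefault(score, data)
        return score

    def read(self, score: bytes) -> bytes:
        """Return the block for a score, checking it against the score."""
        data = self._blocks[score]
        if hashlib.sha1(data).digest() != score:
            raise ValueError("stored block does not match its score")
        return data

    def __len__(self) -> int:
        return len(self._blocks)


store = ScoreStore()
first = store.write(b"some file contents")
second = store.write(b"some file contents")  # duplicate write
print(first.hex(), first == second, len(store))
```

The sketch deliberately has no delete or overwrite method: the only way to obtain a different address is to write different data.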

History

Venti was designed and implemented by Sean Quinlan and Sean Dorward at Bell Labs. It appeared in the Plan 9 distribution in 2002. Development has been continued by Russ Cox, who has reimplemented most of the server, written a library for creating data structures (files, directories and metadata) to store in Venti, and implemented optimizations. Venti is available both in the Plan 9 distribution and for many UNIX-like operating systems[1] as part of Plan 9 from User Space. Venti is also included in Inferno, with accompanying modules for access. A set of Go programs for building Venti servers also exists; it includes examples that use different kinds of backend storage.

Details

Venti is a user space daemon.[2] Clients connect to Venti over TCP[2] and communicate using a simple RPC protocol. The protocol has no message to delete an address or to modify the data at a given address.

Data blocks stored by Venti must be greater than 512 bytes in length and smaller than 56 kilobytes. A client that wants to store more data than fits in one block therefore has to build a data structure out of several blocks (which can itself be stored in Venti). For example, Fossil uses hash trees to store large files. Venti itself is not concerned with the contents of a data block; it does, however, store the type of a data block.
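How a hash tree lets a client store data larger than one block can be sketched as follows. This is an illustration of the general technique, not Fossil's on-disk format: the function names, the dictionary used as a stand-in for a Venti server, and the block_size and fanout values are all assumptions.

```python
import hashlib

SCORE_SIZE = 20  # length in bytes of a SHA-1 score


def put(store: dict, data: bytes) -> bytes:
    """Write one block to the stand-in store and return its score."""
    score = hashlib.sha1(data).digest()
    store.setdefault(score, data)
    return score


def write_tree(store: dict, data: bytes, block_size: int = 8192, fanout: int = 256):
    """Store data as a hash tree; return (root score, tree depth)."""
    # Leaf level: cut the data into fixed-size blocks and store each one.
    scores = [put(store, data[i:i + block_size])
              for i in range(0, len(data), block_size)]
    if not scores:
        scores = [put(store, b"")]
    depth = 0
    # Pointer levels: pack up to `fanout` child scores into a block,
    # store it, and repeat until a single root score remains.
    while len(scores) > 1:
        scores = [put(store, b"".join(scores[i:i + fanout]))
                  for i in range(0, len(scores), fanout)]
        depth += 1
    return scores[0], depth


def read_tree(store: dict, root: bytes, depth: int) -> bytes:
    """Reassemble the data by walking the tree down from the root score."""
    block = store[root]
    if depth == 0:
        return block
    children = (block[i:i + SCORE_SIZE] for i in range(0, len(block), SCORE_SIZE))
    return b"".join(read_tree(store, child, depth - 1) for child in children)


store = {}
payload = b"".join(i.to_bytes(4, "big") for i in range(6400))  # 25,600 bytes
root, depth = write_tree(store, payload, block_size=1024, fanout=4)
print(root.hex(), depth, read_tree(store, root, depth) == payload)
```

Because the root score is a hash over the scores of its children, it identifies the entire contents of the file, and subtrees shared between two files are stored only once.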

The design of Venti has the following consequences:

The data blocks are stored on hard drives. The disks making up the available storage, typically a RAID, are called the data log. The data log is split into smaller pieces called arenas, which are sized so that they can be written to other media such as CD/DVD or magnetic tape. Another set of hard drives is used for the index, which maps scores to addresses in the data log. The data structure used for the index is a hash table with fixed-size buckets. Venti relies on the scores being randomly distributed so that buckets do not fill up. Since each lookup costs one disk seek, an index usually consists of multiple hard drives with low access time.
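The relationship between the data log and the index can be sketched in memory as below. The names BucketIndex, append_block and read_block, the bucket count and the bucket capacity are hypothetical, and the way a score is mapped to a bucket is an assumption made for illustration, not Venti's actual on-disk layout.

```python
import hashlib


class BucketIndex:
    """Hash table with fixed-size buckets mapping scores to log locations."""

    def __init__(self, n_buckets: int = 1024, bucket_capacity: int = 16):
        self.n_buckets = n_buckets
        self.bucket_capacity = bucket_capacity
        self.buckets = [[] for _ in range(n_buckets)]

    def _bucket(self, score: bytes) -> list:
        # Scores behave like random numbers, so their leading bytes
        # spread entries evenly over the buckets.
        return self.buckets[int.from_bytes(score[:4], "big") % self.n_buckets]

    def lookup(self, score: bytes):
        """Return (offset, length) in the data log, or None if unknown."""
        for stored_score, location in self._bucket(score):
            if stored_score == score:
                return location
        return None

    def insert(self, score: bytes, location) -> None:
        bucket = self._bucket(score)
        if len(bucket) >= self.bucket_capacity:
            raise OverflowError("bucket full")
        bucket.append((score, location))


def append_block(log: bytearray, index: BucketIndex, data: bytes) -> bytes:
    """Append a block to the log unless its score is already indexed."""
    score = hashlib.sha1(data).digest()
    if index.lookup(score) is None:
        index.insert(score, (len(log), len(data)))
        log.extend(data)
    return score


def read_block(log: bytearray, index: BucketIndex, score: bytes) -> bytes:
    """Fetch a block: one index lookup, then one read from the log."""
    offset, length = index.lookup(score)
    return bytes(log[offset:offset + length])


log, index = bytearray(), BucketIndex()
score = append_block(log, index, b"first block")
append_block(log, index, b"first block")  # duplicate: log does not grow
print(len(log), read_block(log, index, score))
```

In this sketch the log is append-only and the index is derived data: it could be rebuilt by rescanning the log and rehashing every block.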

Usage

The Venti server may be used by clients in several ways. The Plan 9 operating system uses Venti for daily archival snapshots of the file system. These copies of the main file system can be mounted as a file tree of full copies organized by date. The utility programs 'vac' and 'unvac' store and retrieve data on a Venti server in the form of individual files or a directory and its contents. 'Vacfs' allows browsing of the data associated with a vac score without retrieving all of the remotely stored data. Data and index scores can be duplicated between Venti servers using 'rdarena' and 'wrarena'. Plan 9 from Bell Labs, Plan 9 from User Space, Inferno and any other clients that implement the Venti protocol can all be used interchangeably to store and retrieve data.[3]

Hash collisions

The pigeonhole principle, a basic result of combinatorics, states that if set A contains more values than set B, then any function that maps A to B must associate at least one member of B with more than one member of A. In the case of Venti, the set of possible SHA-1 hashes (2^160 values) is smaller than the set of all possible blocks that could be stored in the file system, and thus a hash collision is possible.

The risk of accidental hash collision in a 160-bit hash is very small, even for exabytes of data. Historically, however, many hash functions have become increasingly vulnerable to malicious hash collisions as cryptanalysis and computing power advance.[4] Venti does not address the issue of hash collisions. SHA-1 is itself no longer considered collision resistant (a practical collision was publicly demonstrated in 2017), so it may become necessary for Venti to switch to a different hash function.
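The size of the accidental-collision risk can be estimated with the birthday bound: among n distinct blocks hashed to b bits, the probability that any two share a hash is at most n(n-1)/2 divided by 2^b. The sketch below evaluates this bound under two assumptions that are not from the text above: hash outputs are uniformly random, and the store holds one exbibyte (2^60 bytes) of data in 8 KiB blocks.

```python
from fractions import Fraction


def collision_probability_bound(n_blocks: int, hash_bits: int = 160) -> Fraction:
    """Birthday (union) bound on any two of n distinct blocks colliding."""
    pairs = n_blocks * (n_blocks - 1) // 2  # number of distinct block pairs
    return Fraction(pairs, 2 ** hash_bits)


# Assumed workload: 2^60 bytes stored as 8 KiB (2^13 byte) blocks.
n_blocks = 2 ** 60 // 2 ** 13
bound = collision_probability_bound(n_blocks)
print(f"{n_blocks} blocks, collision probability at most {float(bound):.2e}")
```

Under these assumptions the bound is on the order of 10^-20. It covers accidental collisions only and says nothing about collisions constructed deliberately by an attacker.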

References

  1. Such as Linux, FreeBSD, NetBSD, OpenBSD, SunOS or Mac OS X
  2. Lukkien, Mechiel. Venti Analysis and Memventi Implementation. Thesis, University of Twente, 2007. University of Twente Theses Repository. Retrieved 2014-10-13. <http://essay.utwente.nl/694/1/scriptie_Lukkien.pdf>.
  3. "Venti (6) man page in the Plan 9 4th edition manual". Man.cat-v.org. Retrieved 2013-04-21.
  4. "Hash Collision Q&A". Cryptography Research, Rambus. Retrieved 2010-01-12. <https://web.archive.org/web/20100306180648/http://www.cryptography.com/cnews/hash.html>.
