Parchive
| Filename extension | .par, .par2, .par3, .pa3, .p?? |
|---|---|
| Type of format | Erasure code |
Parchive (a portmanteau of parity archive, and formally known as Parity Volume Set Specification[1]) is an erasure code system that produces par files for checksum verification of data integrity, with the capability to perform data recovery operations that can repair or regenerate corrupted or missing data. Parchive was originally written to solve the problem of reliable file sharing on Usenet,[2] but it is now commonly used for protecting any kind of data from data corruption, bit rot, and accidental or malicious damage. Despite the name, Parchive does not rely on simple parity for error detection and correction; it uses more advanced techniques.
The original SourceForge Parchive project has been inactive since November 9, 2010.[3] As of 2014, Par1 is obsolete, Par2 is mature for widespread use, and Par3 is an experimental version being developed by MultiPar author Yutaka Sawada.[4][5][6][7]
History
Parchive was intended to increase the reliability of transferring files via Usenet newsgroups. Usenet was originally designed for informal conversations, and its underlying protocol, NNTP, was not designed to transmit arbitrary binary data. Another limitation, acceptable for conversations but not for files, was that messages were normally fairly short and limited to 7-bit ASCII text.[8]
Various techniques were devised to send files over Usenet, such as uuencoding and Base64. Later Usenet software allowed 8-bit Extended ASCII, which permitted new techniques like yEnc. Large files were broken up to reduce the effect of a corrupted download, but Usenet itself remained unreliable.
With the introduction of Parchive, parity files could be created that were then uploaded along with the original data files. If any of the data files were damaged or lost while being propagated between Usenet servers, users could download parity files and use them to reconstruct the damaged or missing files. Parchive also produced small index files (*.par in version 1 and *.par2 in version 2) that do not contain any recovery data. These indexes contain file hashes that can be used to quickly identify the target files and verify their integrity.
Because the index files were so small, they minimized the amount of extra data that had to be downloaded from Usenet to verify that the data files were all present and undamaged, or to determine how many parity volumes were required to repair any damage or reconstruct any missing files. They were most useful in version 1 where the parity volumes were much larger than the short index files. These larger parity volumes contain the actual recovery data along with a duplicate copy of the information in the index files (which allows them to be used on their own to verify the integrity of the data files if there is no small index file available).
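A minimal sketch of how an index of per-file hashes lets a client classify files as intact, damaged, or missing before downloading any recovery data. The hash (MD5 via Python's hashlib), the in-memory dict of file contents, and the function names are assumptions for illustration, not the actual Par1 or Par2 index layout.

```python
from __future__ import annotations

import hashlib


def build_index(files: dict[str, bytes]) -> dict[str, tuple[int, str]]:
    """Record each file's size and MD5 digest: a toy stand-in for an index file."""
    return {name: (len(data), hashlib.md5(data).hexdigest())
            for name, data in files.items()}


def verify(index: dict[str, tuple[int, str]],
           received: dict[str, bytes]) -> dict[str, str]:
    """Classify every indexed file as 'ok', 'damaged', or 'missing'."""
    status = {}
    for name, (size, digest) in index.items():
        data = received.get(name)
        if data is None:
            status[name] = "missing"
        elif len(data) != size or hashlib.md5(data).hexdigest() != digest:
            # A size mismatch short-circuits before hashing.
            status[name] = "damaged"
        else:
            status[name] = "ok"
    return status
```

A real client would hash files on disk in chunks (for example with hashlib's incremental `update()`) instead of holding whole files in memory.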
In July 2001, Tobias Rieper and Stefan Wehlus proposed the Parity Volume Set specification, and with the assistance of other project members, version 1.0 of the specification was published in October 2001.[9] Par1 used Reed-Solomon error correction to create new recovery files. Any of the recovery files can be used to rebuild a missing file from an incomplete download.
Version 1 became widely used on Usenet, but it did suffer some limitations:
- It was restricted to handle at most 255 files.
- The recovery files had to be the size of the largest input file, so it did not work well when the input files were of various sizes. (This limited its usefulness when not paired with the proprietary RAR compression tool.)
- The recovery algorithm had a bug, due to a flaw[10] in the academic paper[11] on which it was based.
- It was strongly tied to Usenet and it was felt that a more general tool might have a wider audience.
In January 2002, Howard Fukada proposed a new Par2 specification with two significant changes: data verification and repair should operate on blocks of data rather than whole files, and the algorithm should use 16-bit numbers rather than the 8-bit numbers used by Par1. Michael Nahas and Peter Clements took up these ideas in July 2002, with additional input from Paul Nettle and Ryan Gallagher (who both wrote Par1 clients). Version 2.0 of the Parchive specification was published by Michael Nahas in September 2002.[12]
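Working on blocks means a client only has to locate and rebuild the damaged blocks, not whole files. The sketch below splits data into fixed-size blocks and records a CRC-32 (via Python's zlib) for each one; the block size, the zero-padding of the last block, and the use of CRC-32 alone are simplifying assumptions, not the Par2 packet format.

```python
from __future__ import annotations

import zlib


def block_checksums(data: bytes, block_size: int) -> list[int]:
    """CRC-32 of each fixed-size block; the final short block is zero-padded."""
    sums = []
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size].ljust(block_size, b"\0")
        sums.append(zlib.crc32(block))
    return sums


def damaged_blocks(expected: list[int], received: bytes, block_size: int) -> list[int]:
    """Indices of blocks that are missing or whose checksum no longer matches."""
    got = block_checksums(received, block_size)
    return [i for i, s in enumerate(expected) if i >= len(got) or got[i] != s]
```

For example, flipping one byte at offset 300 of a 1,024-byte buffer split into 256-byte blocks flags only block 1, so only one block's worth of recovery data is needed.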
Peter Clements then went on to write the first two Par2 implementations, QuickPar and par2cmdline. After par2cmdline was abandoned in 2004, Paul Houle created phpar2 to supersede it, but as of 2014 phpar2 is also unmaintained. Yutaka Sawada created MultiPar to supersede QuickPar, and maintains par2cmdline as MultiPar's PAR engine backend.
On May 10, 2014, Sawada reported a hash collision security problem in par2cmdline (the backend for MultiPar):[13]
I'm not sure this problem can be used for DoS attack against automated Par2 usage. If someone has a skill to forge CRC-32, it is possible to make a set of source file and Par2 file, which freeze a Par2 client for several hours.
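Part of why forging CRC-32 is practical is that CRC-32 is not a cryptographic hash: it is affine over GF(2), so for inputs of equal length crc(a ⊕ b ⊕ c) = crc(a) ⊕ crc(b) ⊕ crc(c). An attacker can therefore steer a checksum with linear algebra instead of brute force. The Python sketch below only checks that property using zlib.crc32; it does not reproduce the par2cmdline issue itself, and the function names are hypothetical.

```python
import os
import zlib


def xor_bytes(*parts: bytes) -> bytes:
    """Bytewise XOR of equal-length byte strings."""
    out = bytearray(len(parts[0]))
    for p in parts:
        for i, b in enumerate(p):
            out[i] ^= b
    return bytes(out)


def crc32_is_affine(n: int = 64, trials: int = 100) -> bool:
    """Check crc(a^b^c) == crc(a)^crc(b)^crc(c) on random equal-length inputs."""
    for _ in range(trials):
        a, b, c = (os.urandom(n) for _ in range(3))
        lhs = zlib.crc32(xor_bytes(a, b, c))
        rhs = zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
        if lhs != rhs:
            return False
    return True
```

A cryptographic hash such as MD5 or SHA-256 has no such algebraic shortcut, which is why a checksum alone is a weak defense against deliberately crafted input.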
Versions
Versions 1 and 2 of the file format are incompatible. (However, many clients support both.)
Parity Volume Set Specification 1.0
For Par1, given the files f1, f2, ..., fn, the Parchive consists of an index file (f.par), which is a CRC-type file with no recovery blocks, and a number of "parity volumes" (f.p01, f.p02, etc.). If all of the original files except one (for example, f2) are available, the missing f2 can be recreated from the other original files and any one of the parity volumes. Likewise, two missing files can be recreated from any two of the parity volumes, and so forth.[14]
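The "any k parity volumes rebuild any k missing files" property is what a Reed-Solomon erasure code provides. The following Python sketch shows the idea on small in-memory files. It works in GF(2^8), the 8-bit arithmetic Par1 used, but it builds the parity coefficients from a Cauchy matrix rather than the Vandermonde-based construction of the Plank tutorial that Par1 followed; a Cauchy matrix is chosen here because every square submatrix of it is invertible, which avoids the kind of flaw noted above. The field polynomial 0x11D and all function names are assumptions for illustration; nothing here reads or writes real .par or .pNN files.

```python
from __future__ import annotations

# GF(2^8) log/antilog tables; 0x11D is an assumed primitive polynomial.
EXP = [0] * 512
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements (addition in GF(2^8) is plain XOR)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_inv(a: int) -> int:
    """Multiplicative inverse of a nonzero field element."""
    return EXP[255 - LOG[a]]


def cauchy(n: int, j: int, i: int) -> int:
    """Coefficient of data file i in parity volume j: 1 / (x_j + y_i)."""
    return gf_inv((n + j) ^ i)  # x_j = n + j and y_i = i never collide


def mul_add(acc: bytearray, coef: int, src: bytes) -> None:
    """acc += coef * src, byte by byte, in GF(2^8)."""
    if coef:
        for k, s in enumerate(src):
            acc[k] ^= gf_mul(coef, s)


def encode(data: list[bytes], m: int) -> list[bytes]:
    """Build m parity volumes, each as long as the largest input file."""
    n = len(data)
    assert n + m <= 256, "GF(2^8) allows at most 256 distinct x/y values"
    size = max(len(d) for d in data)
    parity = []
    for j in range(m):
        acc = bytearray(size)
        for i, d in enumerate(data):
            mul_add(acc, cauchy(n, j, i), d.ljust(size, b"\0"))
        parity.append(bytes(acc))
    return parity


def recover(data: list[bytes | None], parity: dict[int, bytes]) -> list[bytes]:
    """Rebuild the None entries of `data` from surviving parity volumes.

    Recovered files come back zero-padded to the volume size; a real tool
    would truncate them using the file sizes stored in the index.
    """
    n = len(data)
    missing = [i for i, d in enumerate(data) if d is None]
    if not missing:
        return list(data)
    rows = sorted(parity)[: len(missing)]
    if len(rows) < len(missing):
        raise ValueError("need at least one parity volume per missing file")
    size = len(parity[rows[0]])
    # Subtract (XOR out) what the surviving files contributed to each volume.
    rhs = []
    for j in rows:
        acc = bytearray(parity[j])
        for i, d in enumerate(data):
            if d is not None:
                mul_add(acc, cauchy(n, j, i), d.ljust(size, b"\0"))
        rhs.append(acc)
    # Solve the square Cauchy system by Gauss-Jordan elimination over GF(2^8).
    a = [[cauchy(n, j, i) for i in missing] for j in rows]
    k = len(missing)
    for col in range(k):
        piv = next(r for r in range(col, k) if a[r][col])
        a[col], a[piv] = a[piv], a[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        inv = gf_inv(a[col][col])
        a[col] = [gf_mul(inv, v) for v in a[col]]
        rhs[col] = bytearray(gf_mul(inv, v) for v in rhs[col])
        for r in range(k):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [x ^ gf_mul(f, y) for x, y in zip(a[r], a[col])]
                mul_add(rhs[r], f, rhs[col])
    out = list(data)
    for c, i in enumerate(missing):
        out[i] = bytes(rhs[c])
    return out
```

For example, with four equal-length files and three parity volumes, dropping files 1 and 3 and keeping only volumes 0 and 2 is enough for `recover()` to return the original four files. The zero-padding in `encode()` also mirrors the Par1 rule that every recovery file must be as large as the largest input file.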
Par1 supports up to 256 recovery files. Each recovery file must be the size of the largest input file.
Parity Volume Set Specification 2.0
Par2 files generally use this naming/extension system: filename.vol000+01.PAR2, filename.vol001+02.PAR2, filename.vol003+04.PAR2, filename.vol007+06.PAR2, etc. The +01, +02, etc. in the filename indicate how many blocks the file contains, and the vol000, vol001, vol003, etc. indicate the number of the first recovery block within the PAR2 file. If an index file of a download states that 4 blocks are missing, the easiest way to repair the files is to download filename.vol003+04.PAR2. However, because any recovery blocks can be used, filename.vol007+06.PAR2 (which holds 6 blocks) is also acceptable. There is also an index file, filename.PAR2, which is identical in function to the small index file used in Par1.
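Because the block range is encoded in each file name, a downloader can pick volumes from the names alone. The Python sketch below parses the volNNN+MM pattern and chooses volumes that cover a given number of missing blocks; the regular expression, the greedy selection rule, and the function names are assumptions, not the behavior of any particular client.

```python
from __future__ import annotations

import re

# Matches names such as "filename.vol003+04.PAR2" (case-insensitive).
VOL_RE = re.compile(r"\.vol(\d+)\+(\d+)\.par2$", re.IGNORECASE)


def parse_volume(name: str) -> tuple[int, int] | None:
    """Return (first_block, block_count), or None for the index file."""
    m = VOL_RE.search(name)
    return (int(m.group(1)), int(m.group(2))) if m else None


def choose_volumes(names: list[str], needed: int) -> list[str]:
    """Pick volumes whose block counts add up to at least `needed`.

    Heuristic: use the smallest single volume that suffices on its own;
    otherwise take the largest volumes first until the total is enough.
    """
    counts = []
    for name in names:
        info = parse_volume(name)
        if info is not None:
            counts.append((info[1], name))
    enough = sorted(c for c in counts if c[0] >= needed)
    if enough:
        return [enough[0][1]]
    picked, total = [], 0
    for count, name in sorted(counts, reverse=True):
        if total >= needed:
            break
        picked.append(name)
        total += count
    if total < needed:
        raise ValueError("not enough recovery blocks available")
    return picked
```

With the example names above and 4 missing blocks, this picks filename.vol003+04.PAR2; with 5 missing blocks it falls back to filename.vol007+06.PAR2.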
Par2 supports up to 65536 (2¹⁶) recovery blocks (however, par2cmdline, the official Par2 implementation, is limited to 32767 blocks at once). Input files are split into multiple equal-sized blocks, so recovery files do not need to be the size of the largest input file.
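A rough comparison of how many recovery bytes each scheme needs to rebuild the same missing file. It assumes Par1 needs one full volume (as large as the largest input file) per missing file, and that a Par2-style scheme needs one recovery block per missing block; packet headers and other metadata are ignored, and the function names are hypothetical.

```python
def blocks_for(size: int, block_size: int) -> int:
    """Number of blocks a file occupies; a partial last block counts as a whole one."""
    return (size + block_size - 1) // block_size


def par1_recovery_bytes(sizes: list[int], missing: list[int]) -> int:
    """Par1: one parity volume per missing file, each as large as the largest input."""
    return len(missing) * max(sizes)


def par2_recovery_bytes(sizes: list[int], missing: list[int], block_size: int) -> int:
    """Par2-style: one recovery block per block of each missing file."""
    return block_size * sum(blocks_for(sizes[i], block_size) for i in missing)
```

With a 1,000,000-byte file and two 50,000-byte files, losing one of the small files costs 1,000,000 recovery bytes under Par1 but only 50,000 bytes with 10,000-byte blocks.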
Although Unicode is mentioned in the PAR2 specification as an option, most PAR2 implementations do not support Unicode.[15]
Directory support is included in the PAR2 specification.
Parity Volume Set Specification 3.0
Par3 is a planned improvement over Par2.[16][17][18][19] The authors intend to fix problems related to creating or repairing when the block count or block size is very high. Par3 also adds support for including directories (file folders) in a parchive and Unicode characters in file names. In addition, the authors plan to enable the Par3 algorithm to identify files that have been moved or renamed.[20]
Software
Windows
- MultiPar (freeware) — Builds upon QuickPar's features and GUI, and uses Yutaka Sawada's fork of par2cmdline as the PAR2 backend.[21] It supports Par3, multithreading, multiple processors, and recursion into subfolders. MultiPar can add recovery data to ZIP and 7-Zip[22] files, with a few minor caveats.[23] MultiPar has also been verified to work with Wine under PCBSD, and may work on other operating systems too.[24] Although the Par2 and Par3 components are (or will be) open source, the MultiPar GUI on top of them is currently not open source.[25] See also: MultiPar forum and MultiPar Alternatives. Note: multipar.eu is a spam site that may be offering a malicious version of MultiPar.[26]
- QuickPar (freeware) — unmaintained since 2004, superseded by MultiPar.
- par2+tbb (GPLv2) — a concurrent (multithreaded) version of par2cmdline 0.4 using TBB.
- Par-N-Rar (GPL)
- phpar2 — an enhanced par2cmdline with multithreading and highly optimized assembler code (about 66% faster than QuickPar 0.9.1)
- Rarslave (GPLv2)
- SmartPAR (freeware) — Unmaintained since 2002 and obsolete, as this Microsoft Windows application only works with the original Par1 (PAR) parity files. Superseded by QuickPar. It uses Reed-Solomon error correction to create new recovery files, and can correct errors and recover missing parts of distributed files from PAR files.[27] Last stable release: 0.13d1, dated January 22, 2002.[28]
- Mirror — First PAR implementation, unmaintained since 2001.
- Original par2cmdline — (obsolete).
- par2cmdline by BlackIkeEagle.
Mac OS X
- MacPAR deLuxe 4.2
- UnRarX
- par2+tbb is a concurrent (multithreaded) version of par2cmdline 0.4 using TBB, licensed under GPLv2 or later.
Linux
- The par2 utility, which is a maintained fork of par2cmdline.
- PyPar2 1.4, a frontend for par2.
- GPar2 2.03
- par2+tbb is a concurrent (multithreaded) version of par2cmdline 0.4 using TBB, licensed under GPLv2 or later.
FreeBSD
- par2+tbb is a concurrent (multithreaded) version of par2cmdline 0.4 using TBB, licensed under GPLv2 or later. It is available in the FreeBSD Ports system as par2cmdline-tbb.
- par2cmdline is available in the FreeBSD Ports system as par2cmdline.
POSIX
Software for POSIX conforming operating systems:
See also
- Bit rot
- Disc rot
- Data corruption
- Checksum
- Comparison of file archivers – Some file archivers are capable of integrating parity data into their formats for error detection and correction.
- RAID – RAID levels at and above RAID 5 make use of parity data to detect and repair errors.
- SnapRAID uses drives filled with parity files to pool drives together with redundancy and recovery capabilities. It is similar to Parchive in that it can repair damaged files and restore deleted files, but it is much more automated. In some use cases, it can supplement or replace Parchive, and it compares favorably with other tools like FlexRAID, ZFS, Btrfs, etc.[29]
- ZFS
- ICE ECC[30] — a file verification and repair tool that protects files and sensitive data against digital corruption using Reed-Solomon codes. It does not use Parchive and is incompatible with MultiPar.
References
- [1] Re: Correction to Parchive on Wikipedia, reply #3, by Yutaka Sawada: "Their formal title are "Parity Volume Set Specification 1.0" and "Parity Volume Set Specification 2.0."
- [2] "Parchive: Parity Archive Volume Set". Retrieved 2009-10-29. "The original idea behind this project was to provide a tool to apply the data-recovery capability concepts of RAID-like systems to the posting and recovery of multi-part archives on Usenet."
- [3] "Parchive: Parity Archive Tool". Retrieved 2012-09-02.
- [4] possibility of new PAR3 file
- [5] Question about your usage of PAR3
- [6] Risk of undetectable intended modification
- [7] PAR3 specification proposal not finished as of April 2011
- [8] Kantor, Brian; Lapsley, Phil (February 1986). "Character Codes". Network News Transfer Protocol. IETF. p. 5. sec. 2.2. RFC 977. https://tools.ietf.org/html/rfc977#section-2.2. Retrieved 2009-10-29.
- [9] Nahas, Michael (2001-10-14). "Parchive: Parity Volume Set specification 1.0". Retrieved 2009-04-07.
- [10] Plank, James S.; Ding, Ying (April 2003). "Note: Correction to the 1997 Tutorial on Reed-Solomon Coding". Retrieved 2009-10-29.
- [11] Plank, James S. (September 1997). "A Tutorial on Reed-Solomon Coding for Fault-Tolerance in RAID-like Systems". Retrieved 2009-10-29.
- [12] Nahas, Michael; Clements, Peter; Nettle, Paul; Gallagher, Ryan (2003-05-11). "Parity Volume Set Specification 2.0". Retrieved 2009-10-29.
- [13] v1.2.5.3 is public
- [14] Wang, Wallace (2004-10-25). "Finding movies (or TV shows): Recovering missing RAR files with PAR and PAR2 files". Steal this File Sharing Book (1st ed.). San Francisco, California: No Starch Press. pp. 164–167. ISBN 1-59327-050-X. Retrieved 2009-09-24.
- [15] QuickPar forum posting. http://www.quickpar.co.uk/forum/viewtopic.php?id=1065
- [16] Beta release from MultiPar with PAR3 beta functionality. http://hp.vector.co.jp/authors/VA021385/
- [17] QuickPar forum posting – status PAR3. http://www.quickpar.org.uk/forum/viewtopic.php?id=1264
- [18] QuickPar forum posting – PAR3 specifications. http://www.quickpar.co.uk/forum/viewtopic.php?id=1047
- [19] PAR3 proposal. http://hp.vector.co.jp/authors/VA021385/par3_spec_prop.htm
- [20] PAR3 move/rename brainstorming. http://www.livebusinesschat.com/smf/index.php?topic=4751.0
- [21] v1.2.5.3 is public
- [22] 7-Zip Parchive feature request. http://sourceforge.net/tracker/?func=detail&aid=3141214&group_id=14481&atid=364481
- [23] How to add recovery record to ZIP or 7-Zip archive
- [24] MultiPar works with PCBSD 9.0
- [25] contacted you, asking about sourcecode
- [26] Re: spamsite and spamsite links
- [27] Wang, Wallace (2004-10-25). "Finding movies (or TV shows): Recovering missing RAR files with PAR and PAR2 files". Steal this File Sharing Book (1st ed.). San Francisco, California: No Starch Press. pp. 164–167. ISBN 1-59327-050-X. Retrieved 2009-09-24.
- [28] "Parchive: Parity archive tool". Retrieved 2009-09-26.
- [29] SnapRAID comparison
- [30]
External links
- Parchive project - full specifications and math behind it
- Introduction to PAR and PAR2
- Slyck's Guide To The Usenet Newsgroups: PAR & PAR2 Files
- Another introduction to PAR and PAR2 and more information from the same site
- Guide to repair files using PAR2
- par2+tbb
- Par-N-Rar
- Rarslave