Self-extracting archive
A self-extracting archive is an executable program that combines compressed data in an archive file with machine-executable instructions that extract it on a compatible operating system, without requiring a suitable extractor to be installed on the target computer. The executable part of the file is known as the stub and the non-executable part as the archive.
Overview
When a self-extracting archive is executed under an operating system that supports it, its contents are extracted. Non-self-extracting archives contain only the data files and therefore must be extracted with a compatible program. Self-extracting archives cannot self-extract under a different operating system, but they can still be opened with a suitable extractor, which disregards the executable part of the file and extracts only the archive.
For example, an archive called somefiles.zip can be opened under any operating system by an archive manager that supports both its file format and the compression algorithm used. It could alternatively be converted into somefiles.exe, which will self-extract on a machine running Microsoft Windows without the need for an archive manager. It will not self-extract under Linux, but it can still be opened with a suitable Linux archive manager.
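The stub-plus-archive layout can be sketched in Python. This is a minimal illustration assuming a ZIP archive: ZIP readers locate the central directory from the end of the file, which is why an extractor can skip a prepended stub. The stub bytes (`FAKE_STUB`) are a hypothetical placeholder that only imitates the `MZ` signature of a Windows executable and contain no extraction code; `build_archive` is likewise an illustrative helper.

```python
import io
import zipfile


def build_archive() -> bytes:
    """Create an ordinary ZIP archive in memory (the 'somefiles.zip' of the example)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("readme.txt", "hello from inside the archive\n")
        zf.writestr("data/notes.txt", "some notes\n" * 10)
    return buf.getvalue()


# Hypothetical placeholder for the executable stub. A real stub would be a
# complete Windows or ELF program containing the extraction code; these bytes
# only mimic the "MZ" signature at the start of a Windows executable.
FAKE_STUB = b"MZ" + b"\x00" * 510

archive = build_archive()
self_extracting = FAKE_STUB + archive  # stub first, archive appended

# An extractor that ignores the leading stub can still read the archive.
with zipfile.ZipFile(io.BytesIO(self_extracting)) as zf:
    print(zf.namelist())  # ['readme.txt', 'data/notes.txt']
    print(zf.read("readme.txt").decode(), end="")

# The size overhead is exactly the stub.
print(len(self_extracting) - len(archive))  # 512
```

Other formats locate their data differently, so a stub for a RAR or 7z archive would not necessarily work the same way.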
There are several functionally equivalent but incompatible archive file formats, including ZIP, RAR, 7z and many others. Some programs can manage (create, extract, or modify) only one type of archive, whilst many others can handle multiple formats. There is additionally a distinction between the file format and the compression algorithm used. A single file format, such as 7z, can support multiple compression algorithms, including LZMA, LZMA2, PPMd and BZip2. For a decompression utility to correctly expand an archive, whether self-extracting or standard, it must support both the file format and the algorithm used. The exact executable code placed at the beginning of a self-extracting archive may therefore need to vary depending on the options used to create the archive. For example, the decompression routines for an LZMA 7z archive differ from those for an LZMA2 7z archive.
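The format/algorithm distinction can be seen within a single ZIP file. Python's standard library has no 7z support, so this sketch uses ZIP as a stand-in: like 7z, ZIP records a compression method per entry, and the same container can hold entries compressed with different algorithms. The sample payload is an arbitrary assumption.

```python
import io
import zipfile

payload = b"The quick brown fox jumps over the lazy dog. " * 200

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    # One container format (ZIP), four different compression methods.
    zf.writestr("stored.txt", payload, compress_type=zipfile.ZIP_STORED)
    zf.writestr("deflate.txt", payload, compress_type=zipfile.ZIP_DEFLATED)
    zf.writestr("bzip2.txt", payload, compress_type=zipfile.ZIP_BZIP2)
    zf.writestr("lzma.txt", payload, compress_type=zipfile.ZIP_LZMA)

METHOD_NAMES = {
    zipfile.ZIP_STORED: "stored (no compression)",
    zipfile.ZIP_DEFLATED: "Deflate",
    zipfile.ZIP_BZIP2: "BZip2",
    zipfile.ZIP_LZMA: "LZMA",
}

with zipfile.ZipFile(buf) as zf:
    for info in zf.infolist():
        # The method is stored per entry: a tool that parses the ZIP format
        # but lacks, say, LZMA support still cannot decompress "lzma.txt".
        print(info.filename, METHOD_NAMES[info.compress_type], info.compress_size)
```

A self-extracting stub faces the same constraint: it has to carry the decompression routine for every method its archive actually uses.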
Several programs can create self-extracting archives. For Windows there are WinZip, WinRAR, 7-Zip, WinUHA, KGB Archiver, the built-in IExpress wizard and many others, some experimental. For Macintosh there are StuffIt, The Unarchiver, and 7zX. There are also programs that create self-extracting archives on Unix as shell scripts which use programs like tar and gzip (which must be present on the destination system). Others (like 7-Zip or RAR) can create self-extracting archives as regular executables in ELF format. An early example of a self-extracting archive was the Unix shar archive, in which one or more text files were combined into a shell script that, when executed, recreated the original files.
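The shell-script approach can be sketched as a short script stub followed by a gzip-compressed tar archive. This is one possible layout, assuming a POSIX shell with `awk`, `tail`, `tar` and `gzip` on the target; the marker line `__ARCHIVE_BELOW__` and the helper `make_shell_sfx` are hypothetical names, not the format of any particular tool. (A shar archive instead embeds text files directly as here-documents.)

```python
from __future__ import annotations

import io
import tarfile

# Hypothetical shell stub: awk finds the line after the marker, and tail pipes
# everything from there into tar. "exit 0" stops the shell before the binary part.
STUB = """#!/bin/sh
# Requires awk, tail, tar and gzip on the destination system.
set -e
LINE=$(awk '/^__ARCHIVE_BELOW__$/ {print NR + 1; exit 0}' "$0")
tail -n +"$LINE" "$0" | tar -xzf -
exit 0
__ARCHIVE_BELOW__
"""


def make_shell_sfx(files: dict[str, bytes]) -> bytes:
    """Return the shell stub with a gzip-compressed tar archive appended."""
    payload = io.BytesIO()
    with tarfile.open(fileobj=payload, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return STUB.encode("ascii") + payload.getvalue()


sfx = make_shell_sfx({"hello.txt": b"hello\n", "docs/readme.txt": b"read me\n"})
```

Writing `sfx` to a file such as `somefiles.sh`, marking it executable (`chmod +x somefiles.sh`) and running it should unpack both files into the current directory, provided the required tools are installed.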
Advantages
Archiving files rather than sending them separately allows several related files to be combined into a single resource. Compression also reduces the size of files that are not already efficiently compressed, since many compression algorithms cannot make already compressed data any smaller. Compression will therefore usually shrink a plain text document considerably but hardly affect a JPEG picture or a word processor document, because most modern word processor file formats already apply some compression. Self-extracting archives also extend the advantages of compressed archives to users who are running a compatible operating system but do not have the programs needed to extract the contents. Even for users who do have archive managing software, a self-extracting archive may be slightly more convenient.
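The difference between compressible and already compressed data is easy to measure. In this sketch, random bytes stand in for already compressed content such as a JPEG (an assumption: like compressed output, they have essentially no redundancy left), and zlib's Deflate stands in for whatever algorithm an archiver uses.

```python
import os
import zlib

# Repetitive plain text compresses well.
text = ("Self-extracting archives bundle related files into one executable. " * 100).encode()

# Random bytes approximate data that is already compressed.
already_compressed = os.urandom(len(text))


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


print(f"plain text:         {ratio(text):.3f}")
print(f"already compressed: {ratio(already_compressed):.3f}")
```

The text shrinks to a small fraction of its size, while the random data stays the same size or grows slightly because of the format's own headers.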
Self-extracting archives also allow their contents to be encrypted for security, provided the chosen archive format supports it. In many cases, though, the file and directory names are not encrypted and can be seen by anyone, even without the key or password. Additionally, some weaker encryption schemes, such as the legacy ZIP encryption, are vulnerable to known-plaintext attacks: if an attacker can guess part of the contents of the files from their names or context alone, they may be able to break the encryption on the entire archive with a reasonable amount of computing power and time. Care therefore needs to be taken, or a stronger encryption algorithm used.
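Why known plaintext is dangerous can be shown with a deliberately flawed toy cipher. This is not ZipCrypto or any real archive cipher, and real attacks on legacy ZIP encryption work differently; `toy_keystream` is a hypothetical SHA-256 counter-mode keystream whose flaw is that every file reuses the same keystream. Because the file names are visible, the attacker guesses that `report.xml` begins with a standard XML declaration and uses that to read part of a different file.

```python
import hashlib


def toy_keystream(password: bytes, length: int) -> bytes:
    """Hypothetical toy keystream (SHA-256 in counter mode), for illustration only.

    Flaw on purpose: there is no per-file nonce, so every file gets the same stream.
    """
    stream = b""
    counter = 0
    while len(stream) < length:
        stream += hashlib.sha256(password + counter.to_bytes(8, "big")).digest()
        counter += 1
    return stream[:length]


def xor(data: bytes, key: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(d ^ k for d, k in zip(data, key))


password = b"correct horse"
files = {
    "report.xml": b'<?xml version="1.0" encoding="UTF-8"?>\n<report>figures</report>\n',
    "secret.txt": b"the PIN for the side entrance is 4821, do not share\n",
}
encrypted = {name: xor(data, toy_keystream(password, len(data))) for name, data in files.items()}

# The attacker sees the unencrypted names and guesses the XML declaration.
guess = b'<?xml version="1.0" encoding="UTF-8"?>'
recovered_keystream = xor(encrypted["report.xml"], guess)

# The recovered keystream decrypts the same byte range of the other file.
leaked = xor(encrypted["secret.txt"], recovered_keystream)
print(leaked)
```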
Disadvantages
A disadvantage of self-extracting archives is that running executables of unverified reliability, for example those sent as email attachments or downloaded from the Internet, may be a security risk. An executable file described as a self-extracting archive may actually be a malicious program. One protection against this is to open the file with an archive manager instead of executing it (losing the minor advantage of self-extraction); the archive manager will either report that the file is not an archive or show only the metadata of the executable file, a strong indication that the file is not actually a self-extracting archive.
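Inspecting a suspected self-extracting archive without running it can be sketched with Python's `zipfile` module. This assumes the archive part is ZIP; `list_archive_members` is a hypothetical helper, and the in-memory "downloads" below are made-up stand-ins for real files.

```python
from __future__ import annotations

import io
import zipfile
from typing import BinaryIO


def list_archive_members(source: str | BinaryIO) -> list[str] | None:
    """Return member names if `source` contains a ZIP archive, else None.

    The file is only read, never executed.
    """
    if not zipfile.is_zipfile(source):
        return None
    with zipfile.ZipFile(source) as zf:
        return zf.namelist()


# Hypothetical stand-ins: a genuine stub+archive and an executable with no archive.
zip_bytes = io.BytesIO()
with zipfile.ZipFile(zip_bytes, "w") as zf:
    zf.writestr("report.txt", "quarterly report\n")
genuine_sfx = io.BytesIO(b"MZ" + b"\x00" * 510 + zip_bytes.getvalue())
disguised_program = io.BytesIO(b"MZ" + b"\x00" * 510)

print(list_archive_members(genuine_sfx))        # ['report.txt']
print(list_archive_members(disguised_program))  # None
```

A file path works as well as a file object, since both `zipfile.is_zipfile` and `zipfile.ZipFile` accept either.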
Additionally, some systems for distributing files do not accept executable files, in order to prevent the transmission of malicious programs. These systems disallow self-extracting archives unless the sender cumbersomely renames them to, say, somefiles.exx, and the recipient later renames them back. This technique is gradually becoming less effective, however, as an increasing number of security suites and antivirus packages scan file headers for the underlying format rather than relying on the file extension. Such systems are not fooled by an incorrect extension and are particularly prevalent in the analysis of email attachments.
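Header-based detection can be sketched by checking a file's leading "magic number" bytes instead of its name. This is a minimal illustration assuming only the first few bytes are examined; real scanners do far more. `SIGNATURES`, `identify_format` and `identify_file` are hypothetical names, and the table covers only a few common formats.

```python
from __future__ import annotations

# Leading bytes of a few common formats. The file name is never consulted,
# so renaming somefiles.exe to somefiles.exx does not change the result.
SIGNATURES = {
    b"MZ": "DOS/Windows executable",
    b"\x7fELF": "ELF executable",
    b"#!": "script with interpreter line",
    b"PK\x03\x04": "plain ZIP archive",
}


def identify_format(header: bytes) -> str | None:
    """Classify data by its first bytes; return None if no signature matches."""
    for magic, description in SIGNATURES.items():
        if header.startswith(magic):
            return description
    return None


def identify_file(path: str) -> str | None:
    """Read just enough of the file to match the signatures above."""
    with open(path, "rb") as f:
        return identify_format(f.read(8))


print(identify_format(b"MZ\x90\x00\x03\x00"))  # DOS/Windows executable
print(identify_format(b"PK\x03\x04\x14\x00"))  # plain ZIP archive
```

A self-extracting ZIP is reported as an executable, because its stub, not the archive, occupies the start of the file.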
Self-extracting archives will only run under the operating system with which they are compatible. Also, since they must include executable code to handle the extraction of the contained archive, they are slightly larger than the original archive; this stub is a small overhead compared with a conventional archive.
See also
- Installer
- Shar
- Kolmogorov complexity, a theoretical lower bound on the size of a self-extracting archive