Fat binary
A fat binary (or multiarchitecture binary) is a computer executable program that has been expanded (or "fattened") with code native to multiple instruction sets, so that it can run on multiple processor types. The usual method of implementation is to include a version of the machine code for each instruction set, preceded by code compatible with all operating systems which executes a jump to the appropriate section. This results in a file larger than a normal single-architecture binary, hence the name.
Fat binaries are not common in operating system software. There are several alternative solutions to the same problem, such as using an installer program to choose an architecture-specific binary at install time, distributing software as source code and compiling it in place, or using a virtual machine (as with Java) with just-in-time compilation.
Apple
Apple's fat binary
A fat-binary scheme smoothed the Apple Macintosh's transition, beginning in 1994, from 68k microprocessors to PowerPC microprocessors. Many applications for the old platform ran transparently on the new platform under an evolving emulation scheme, but emulated code generally runs slower than native code. Applications released as "fat binaries" took up more storage space, but they ran at full speed on either platform. This was achieved by packaging both a 68000-compiled version and a PowerPC-compiled version of the same program into their executable files. The older 68k code (CFM-68K or classic 68k) continued to be stored in the resource fork, while the newer PowerPC code was contained in the data fork, in PEF format.[1]
Fat binaries were larger than programs supporting only the PowerPC or 68k, which led to the creation of a number of utilities that would strip out the unneeded version. In the era of small hard drives, when 80 MB was a common capacity, these utilities were sometimes useful, because program code generally made up a large percentage of overall drive usage.
NeXT's/Apple's multi-architecture binaries
NeXTSTEP Multi-Architecture Binaries
Fat binaries were a feature of NeXT's NeXTSTEP/OPENSTEP operating system, starting with NeXTSTEP 3.1. In NeXTSTEP, they were called "Multi-Architecture Binaries". Multi-Architecture Binaries were originally intended to allow software to be compiled to run both on NeXT's Motorola 68k-based hardware and on Intel IA-32-based PCs running NeXTSTEP, with a single binary file for both platforms. They were later used to allow OPENSTEP applications to run on PCs and the various RISC platforms OPENSTEP supported.
Multi-Architecture Binary files are in a special archive format, in which a single file stores one or more Mach-O subfiles, one for each architecture supported by the Multi-Architecture Binary. Every Multi-Architecture Binary starts with a structure (struct fat_header) containing two unsigned integers. The first integer ("magic") is a magic number that identifies the file as a fat binary. The second integer ("nfat_arch") defines how many Mach-O files the archive contains (that is, how many instances of the same program for different architectures). This header is followed by nfat_arch structures of type struct fat_arch, each of which defines the offset (from the start of the file) at which to find the Mach-O file, its alignment, its size, and the CPU type and subtype it is targeted at.
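The header layout described above can be read with a few lines of code. The sketch below is written in Python and assumes the field layout of struct fat_header and struct fat_arch as declared in Apple's <mach-o/fat.h>: both structures are stored big-endian, the magic number is 0xCAFEBABE, and each fat_arch entry holds cputype, cpusubtype, offset, size and align (the last as a power-of-two exponent), in that order. The function name parse_fat_header is hypothetical, and the sample input is synthetic rather than taken from a real binary.

```python
import struct

FAT_MAGIC = 0xCAFEBABE   # value of fat_header.magic (stored big-endian)
FAT_HEADER_SIZE = 8      # two uint32 fields: magic, nfat_arch
FAT_ARCH_SIZE = 20       # five 32-bit fields per fat_arch entry

def parse_fat_header(data: bytes) -> list[tuple[int, int, int, int, int]]:
    """Return (cputype, cpusubtype, offset, size, align) for each embedded Mach-O file.

    `align` is stored as a power of two, e.g. 12 means 4096-byte alignment.
    """
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError(f"not a fat binary (magic {magic:#010x})")
    entries = []
    for i in range(nfat_arch):
        # fat_arch structures follow the header back to back
        entries.append(struct.unpack_from(">iiIII", data, FAT_HEADER_SIZE + FAT_ARCH_SIZE * i))
    return entries

# Synthetic two-entry header (no actual Mach-O payloads follow it)
sample = (struct.pack(">II", FAT_MAGIC, 2)
          + struct.pack(">iiIII", 7, 3, 4096, 100, 12)    # CPU type 7 (x86)
          + struct.pack(">iiIII", 18, 0, 8192, 200, 12))  # CPU type 18 (PowerPC)
print(parse_fat_header(sample))
```

For a real file, the input would be the bytes read from disk; the offset and size of each entry then locate a complete Mach-O file inside the archive.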
The version of the GNU Compiler Collection shipped with the Developer Tools was able to cross-compile source code for the different architectures on which NeXTSTEP could run. For example, the target architectures could be chosen with multiple '-arch' options (each taking an architecture name as its argument). This was a convenient way to distribute a program for NeXTSTEP running on different architectures.
It was also possible to create libraries containing object files for different target architectures (e.g. using libtool).
Mach-O and Mac OS X
Apple Computer acquired NeXT in 1996 and continued to work with the OPENSTEP code. Mach-O became the native object file format in Apple's free Darwin operating system (2000) and Apple's Mac OS X (2001), and NeXT's Multi-Architecture Binaries continued to be supported by the operating system. Under Mac OS X, Multi-Architecture Binaries can be used to support multiple variants of an architecture, for instance to have different versions of 32-bit code optimized for the PowerPC G3, PowerPC G4, and PowerPC 970 generations of processors. They can also be used to support multiple architectures, such as 32-bit and 64-bit PowerPC, or PowerPC and x86.[2]
Apple's Universal binary
In 2005, Apple announced another transition, from PowerPC processors to Intel x86 processors. Apple promoted the distribution of new applications that support both PowerPC and x86 natively by using executable files in Multi-Architecture Binary format. Apple calls such programs "Universal applications" and the file format "Universal binary", perhaps as a way to distinguish this transition from the previous one, or from other uses of the Multi-Architecture Binary format.
The Universal binary format is not necessary for forward migration of pre-existing native PowerPC applications; for this role, Apple supplies Rosetta, a PowerPC (PPC) emulator. However, Rosetta has a fairly steep performance overhead, so developers are encouraged to offer both PPC and Intel binaries, using Universal binaries. The obvious cost of a Universal binary is that every installed executable file is larger, but in the years since the introduction of the PPC, hard-drive capacity has grown far faster than executable size; while a Universal binary might be double the size of a single-platform version of the same application, free disk space generally dwarfs the code size, making it a minor issue. In fact, a Universal-binary application will often be smaller than two single-architecture applications, because program resources can be shared rather than duplicated. Nevertheless, Mac OS X does include the lipo and ditto command-line applications, which can remove versions from a Multi-Architecture Binary image.
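The effect of removing unneeded versions can be illustrated by copying a single slice out of a fat file, roughly what lipo does when it thins a binary down to one architecture. The Python sketch below is not lipo's implementation: it assumes the big-endian fat_header/fat_arch layout from <mach-o/fat.h>, matches on CPU type only (ignoring the subtype), and uses a hypothetical helper name, extract_slice. The small test image is synthetic.

```python
import struct

FAT_MAGIC = 0xCAFEBABE

def extract_slice(fat: bytes, cputype: int) -> bytes:
    """Return the embedded Mach-O file built for `cputype` (subtype ignored)."""
    magic, nfat_arch = struct.unpack_from(">II", fat, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    for i in range(nfat_arch):
        ctype, _subtype, offset, size, _align = struct.unpack_from(">iiIII", fat, 8 + 20 * i)
        if ctype == cputype:
            return fat[offset:offset + size]  # a complete single-architecture file
    raise KeyError(f"no slice for CPU type {cputype}")

# Synthetic fat image: header + two fat_arch entries (48 bytes), then two payloads.
# align is 0 (2**0 = 1 byte) so that no padding is needed in this tiny sample.
image = (struct.pack(">II", FAT_MAGIC, 2)
         + struct.pack(">iiIII", 18, 0, 48, 4, 0)   # PowerPC slice at offset 48
         + struct.pack(">iiIII", 7, 3, 52, 4, 0)    # x86 slice at offset 52
         + b"PPC!" + b"X86!")
print(extract_slice(image, 7))  # b'X86!'
```

Writing the returned bytes to a new file would give a thin, single-architecture executable; the other slices are simply not copied.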
Apple includes utilities in the Xcode development environment which allow applications to be delivered in both 32-bit and 64-bit versions, targeted for the Intel and/or PowerPC architectures. Universal binaries created with this in mind can contain up to four versions of the executable code (32-bit PowerPC, 32-bit x86, 64-bit PowerPC, and 64-bit x86).
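A four-way Universal binary of the kind just described can be assembled by writing one fat_arch entry per slice and padding each slice to an aligned offset. The Python sketch below assumes the <mach-o/fat.h> layout and CPU type constants (CPU_TYPE_I386 = 7, CPU_TYPE_POWERPC = 18, and the CPU_ARCH_ABI64 flag 0x01000000 that marks the 64-bit variants). The helper name build_fat, the 4096-byte alignment, the subtype value 0 and the placeholder payloads are illustrative assumptions, and the sketch does not check that the payloads are valid Mach-O files.

```python
import struct

FAT_MAGIC = 0xCAFEBABE
CPU_ARCH_ABI64 = 0x01000000            # flag OR-ed into 64-bit CPU types
CPU_TYPE_I386 = 7                      # 32-bit x86
CPU_TYPE_POWERPC = 18                  # 32-bit PowerPC
CPU_TYPE_X86_64 = CPU_TYPE_I386 | CPU_ARCH_ABI64
CPU_TYPE_POWERPC64 = CPU_TYPE_POWERPC | CPU_ARCH_ABI64

def build_fat(slices: list[tuple[int, int, bytes]], align_log2: int = 12) -> bytes:
    """Pack (cputype, cpusubtype, payload) slices into one fat file.

    Each payload starts on a 2**align_log2 boundary (4096 bytes by default).
    """
    alignment = 1 << align_log2
    out = bytearray(struct.pack(">II", FAT_MAGIC, len(slices)))
    cursor = 8 + 20 * len(slices)      # first free byte after header and fat_arch table
    placements = []
    for cputype, subtype, payload in slices:
        offset = -(-cursor // alignment) * alignment   # round up to the alignment
        out += struct.pack(">iiIII", cputype, subtype, offset, len(payload), align_log2)
        placements.append((offset, payload))
        cursor = offset + len(payload)
    for offset, payload in placements:
        out += b"\0" * (offset - len(out))   # zero padding up to the slice offset
        out += payload
    return bytes(out)

# Placeholder payloads stand in for real single-architecture Mach-O files
universal = build_fat([
    (CPU_TYPE_POWERPC, 0, b"ppc code"),
    (CPU_TYPE_I386, 0, b"i386 code"),
    (CPU_TYPE_POWERPC64, 0, b"ppc64 code"),
    (CPU_TYPE_X86_64, 0, b"x86_64 code"),
])
print(len(universal))  # 16395: last slice starts at 16384 and is 11 bytes long
```

The alignment padding is why a fat file is somewhat larger than the sum of its slices; with page-sized alignment each slice can be mapped into memory directly from its offset.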
Linux
FatELF: Universal Binaries for Linux
FatELF is a fat binary implementation for Linux and other Unix-like operating systems. Technically, FatELF is an extension of the ELF binary format.[3] In addition to abstracting over the CPU architecture (byte order, word size, CPU instruction set, etc.), it offers the advantage of binaries that support multiple kernel ABIs and versions.
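The loader-side idea behind this, picking the one embedded ELF image that matches the running system, can be sketched as a simple matching function. The Python below is a model only: the Record and Host classes, their fields, and choose_record are hypothetical names chosen to mirror the attributes mentioned above (instruction set, word size, byte order, kernel ABI). They do not reproduce the on-disk layout defined in the FatELF specification.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

@dataclass(frozen=True)
class Record:
    """One embedded ELF image (hypothetical model, not the FatELF on-disk record)."""
    machine: str      # instruction set, e.g. "x86_64", "ppc"
    word_size: int    # 32 or 64
    byte_order: str   # "little" or "big"
    osabi: str        # kernel ABI, e.g. "linux", "freebsd"
    offset: int       # where the ELF image starts inside the fat file
    size: int

@dataclass(frozen=True)
class Host:
    """Properties of the running system that a loader would compare against."""
    machine: str
    word_size: int
    byte_order: str
    osabi: str

def choose_record(records: Sequence[Record], host: Host) -> Optional[Record]:
    """Return the first record whose attributes all match the host, or None."""
    wanted = (host.machine, host.word_size, host.byte_order, host.osabi)
    for rec in records:
        if (rec.machine, rec.word_size, rec.byte_order, rec.osabi) == wanted:
            return rec
    return None

records = [
    Record("ppc", 32, "big", "linux", offset=4096, size=1000),
    Record("x86_64", 64, "little", "linux", offset=8192, size=1200),
    Record("x86_64", 64, "little", "freebsd", offset=12288, size=1100),
]
host = Host("x86_64", 64, "little", "linux")
print(choose_record(records, host))
```

Because the choice is made in one place from the file's own metadata, the same file could serve several CPU architectures and kernel ABIs without per-platform wrapper scripts.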
FatELF has several use cases, according to its developers:[4]
- Distributions no longer need to have separate downloads for various platforms.
- Separate /lib, /lib32 and /lib64 trees are no longer required in the OS directory structure.
- The correct binary and libraries are centrally chosen by the system instead of shell scripts.
- If the ELF ABI changes someday, legacy users can still be supported.
- Distribution of web browser plug-ins that work out of the box with multiple platforms.
- Distribution of one application file that works across Linux and BSD OS variants, without a platform compatibility layer on them.
- One hard drive partition can be booted on different machines with different CPU architectures, for development and experimentation. Same root file system, different kernel and CPU architecture.
- Applications provided by network shares or USB sticks will work on multiple systems. This is also helpful for creating portable applications and cloud computing images for heterogeneous systems.[5]
A proof-of-concept Ubuntu 9.04 image is available (a VM image of Ubuntu 9.04 with fat binary support). So far, FatELF has not been integrated into the mainline kernel.[6] Progress on FatELF has stopped, and its developer, Ryan Gordon, has declared the project dead.[7] Gordon later said he would take up the project again if a distribution showed interest.[8]
DOS
Combined COM-style binaries for CP/M-80 and DOS
CP/M-80 executables for the Intel 8080 processor use the same .COM file extension as DOS-compatible operating systems for Intel 8086 binaries. In both cases programs are loaded at offset +100h and executed by jumping to the first byte in the file. As the opcodes of the two processor families are not compatible, attempting to start a program under the wrong operating system leads to incorrect and unpredictable behaviour.
In order to avoid this, some methods have been devised to build fat binaries which contain both a CP/M-80 and a DOS program, preceded by initial code which is interpreted correctly by both operating systems. The methods either combine two fully functional programs, each built for its corresponding environment, or add stubs which cause the program to exit gracefully if started on the wrong processor. For this to work, the first few instructions in the .COM file have to be valid code for both 8086 and 8080 processors, and must cause the two processors to continue at different locations within the code. For example, the utilities in the MYZ80 emulator start with EBh, 52h, EBh. An 8086 sees this as a jump and reads its next instruction from offset +154h, whereas an 8080 or compatible processor goes straight through and reads its next instruction from +103h.
Another method to keep an MS-DOS-compatible operating system from erroneously executing .COM programs for CP/M-80 and MSX-DOS machines is to start the 8080 code with C3h, 03h, 01h. x86 processors decode this as a "RET" instruction, thereby gracefully exiting the program, while 8080 processors decode it as a "JP 103h" instruction, which simply jumps to the next instruction in the program.
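Both byte sequences can be checked by decoding them the way each processor would. The Python sketch below models only the handful of opcodes named above (EBh, 52h and C3h); it is a hand-written illustration rather than a disassembler, and the function names are hypothetical. On the 8080 side, EBh is XCHG and 52h is MOV D,D, so the two XCHGs in the MYZ80 stub cancel out and execution falls through to 103h.

```python
COM_ORIGIN = 0x100  # CP/M-80 and DOS both load .COM files at offset 100h

def x86_first_step(code: bytes, origin: int = COM_ORIGIN):
    """Interpret the first x86 instruction of the stub.

    Only two opcodes are modelled: EBh = JMP rel8 (short jump), C3h = RET (near).
    """
    op = code[0]
    if op == 0xEB:
        disp = code[1] - 0x100 if code[1] >= 0x80 else code[1]  # signed 8-bit
        return ("jump", origin + 2 + disp)  # relative to the following instruction
    if op == 0xC3:
        # DOS pushes a zero word before starting a .COM program, and offset 0 of
        # the PSP holds INT 20h, so a RET here terminates the program.
        return ("exit", None)
    raise ValueError(f"opcode {op:02X}h not modelled")

def i8080_resume_address(code: bytes, origin: int = COM_ORIGIN, max_steps: int = 16) -> int:
    """Step through the stub as an 8080 would; return the address where it leaves the stub.

    Modelled opcodes: EBh = XCHG and 52h = MOV D,D (one byte each, no jump),
    and C3h = JMP a16 (target address stored little-endian).
    """
    pc = origin
    for _ in range(max_steps):
        i = pc - origin
        if not 0 <= i < len(code):
            return pc  # execution has left the stub
        op = code[i]
        if op in (0xEB, 0x52):
            pc += 1
        elif op == 0xC3:
            pc = code[i + 1] | (code[i + 2] << 8)
        else:
            raise ValueError(f"opcode {op:02X}h not modelled")
    raise RuntimeError("stub did not finish within max_steps")

MYZ80_STUB = bytes([0xEB, 0x52, 0xEB])
RET_STUB = bytes([0xC3, 0x03, 0x01])
for stub in (MYZ80_STUB, RET_STUB):
    print(stub.hex(), x86_first_step(stub), hex(i8080_resume_address(stub)))
```

For the first stub the x86 path lands at 154h while the 8080 continues at 103h; for the second the x86 path exits immediately while the 8080 again continues at 103h.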
Some CP/M-80 3.0 .COM files may have one or more RSX overlays attached to them by GENCOM.[9] If so, they start with an extra 256-byte header. To indicate this, the first byte in the header is set to C9h, which works both as a signature identifying this type of COM file to the CP/M 3.0 executable loader and as a RET instruction for 8080-compatible processors, leading to a graceful exit if the file is executed under older versions of CP/M-80.
C9h is never appropriate as the first byte of a program for any x86 processor (it has different meanings for different generations, but is never a meaningful first byte); the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding incorrect operation.
Combined COM and SYS files
DOS device drivers start with a file header whose first four bytes are FFFFFFFFh by convention, although this is not a requirement. These bytes are fixed up dynamically by the operating system when the driver loads (typically in the DOS BIOS when it executes DEVICE statements in CONFIG.SYS). Since DOS does not prevent files with a .COM extension from being loaded via DEVICE and does not test for FFFFFFFFh, it is possible to combine a COM program and a device driver in the same file by placing a jump instruction to the entry point of the embedded COM program within the first four bytes of the file (three bytes are usually sufficient). If the embedded program and the device-driver sections share a common portion of code or data, that code must cope with being loaded at offset +100h as a .COM-style program and at 0h as a device driver.
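The arithmetic for such a combined header can be sketched as follows. In this Python illustration the first three bytes are a near JMP (opcode E9h with a 16-bit displacement measured from the end of the instruction, i.e. from 103h when the file runs as a .COM program), followed by one filler byte. The remaining bytes follow the usual device-header layout: a 16-bit attribute word, the offsets of the strategy and interrupt routines, and an 8-byte space-padded device name. The function name, the filler value, and the example offsets and device name are hypothetical.

```python
import struct

COM_ORIGIN = 0x100  # DOS loads .COM images at offset 100h

def com_sys_header(com_entry: int, attributes: int, strategy: int,
                   interrupt: int, name: bytes) -> bytes:
    """Build an 18-byte device-driver header whose first bytes also form a .COM entry jump.

    com_entry, strategy and interrupt are file offsets. The driver is loaded at
    offset 0, so for the driver they are also its in-segment offsets.
    """
    if len(name) > 8:
        raise ValueError("device name is at most 8 bytes")
    # E9h rel16: displacement relative to the next instruction, which is at
    # COM_ORIGIN + 3 when the file is run as a .COM program.
    disp = (COM_ORIGIN + com_entry - (COM_ORIGIN + 3)) & 0xFFFF
    # Fourth byte is filler: as a .COM the CPU has already jumped away, and as a
    # driver DOS overwrites all four bytes with its next-driver pointer.
    jump = struct.pack("<BH", 0xE9, disp) + b"\x90"
    return (jump
            + struct.pack("<HHH", attributes, strategy, interrupt)
            + name.ljust(8, b" "))

# Hypothetical layout: COM entry at file offset 40h, character device (bit 15 set)
hdr = com_sys_header(0x40, 0x8000, 0x12, 0x1D, b"HYPO$")
print(hdr.hex())
```

Run as a .COM program, the jump lands at 100h + 40h = 140h, the same code that sits at file offset 40h; loaded by DEVICE, the same bytes are read as the driver header at offset 0.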
Crash-protected system files
Under DOS, some files have file extensions which do not reflect their actual file type. For example, COUNTRY.SYS is not a DOS device driver, but a binary NLS database file for use with the CONFIG.SYS COUNTRY statement. The PC DOS and DR-DOS system files IBMBIO.COM and IBMDOS.COM are special binary images, not COM-style programs. Trying to load COUNTRY.SYS with a DEVICE statement or executing IBMBIO.COM at the command prompt will cause unpredictable results.
It is sometimes possible to avoid this by using techniques similar to those described above. For example, under DR-DOS 7.02 or higher, if these files are called inappropriately, embedded stubs will just display some file version information and exit gracefully.
References
- [1] Apple Computer (March 11, 1997). "Creating Fat Binary Programs". Archived from the original on March 7, 2004 (original no longer available). Retrieved 2011-06-20.
- [2] Apple Computer (March 8, 2006). "Universal Binaries and 32-bit/64-bit PowerPC Binaries". Mac OS X ABI Mach-O File Format Reference. Retrieved 2006-07-13.
- [3] Gordon, Ryan. "fatelf-specification v1". icculus.org. Retrieved 2010-07-25.
- [4] Gordon, Ryan. "FatELF: Universal Binaries for Linux". icculus.org. Retrieved 2010-07-13.
- [5] Windisch, Eric (2009-11-03). "Re: FatELF patches...". gmane.linux.kernel (gmane.org). Retrieved 2010-07-08.
- [6] Gordon, Ryan. "FatELF: Turns out I liked the uncertainty better". icculus.org. Retrieved 2010-07-13.
- [7] Holwerda, Thom (2009-11-03). "Ryan Gordon Halts FatELF Project". osnews.com. Retrieved 2010-07-05.
- [8] Gordon, Ryan C. (2009-11-08). "No one will ever know it if I keep my mouth shut tight, tight, tight". icculus.org. Retrieved 2013-07-17. "If a distro wants to take a shot at FatELF, I'm totally on board. I'll happily contribute work, improvements, whatever you need. Just let me know. I've gotten emails from several interested parties, so I'll probably patch up a few more things and see what happens. I would love to ship games as FatELF files eventually, so maybe there's something to be said for getting the public on-board and then saying 'there.' to the kernel maintainers."
- [9] Elliott, John. "CP/M 3.0 COM file header". Article on the extended CP/M-80 3.0 COM file header.
External links
- Method and apparatus for architecture independent executable files (Google patents)
- FatELF: Universal Binaries for Linux