Class (file format)

From Wikipedia, the free encyclopedia

In the Java programming language, source files (.java files) are compiled into class files, which have a .class extension. Because Java is designed to be platform-independent, the compiler translates source code into bytecode and stores it in a .class file. If a source file declares more than one class, each class is compiled into a separate .class file. These .class files can be loaded by any Java Virtual Machine (JVM).

Since JVMs are available for many platforms, a .class file compiled on one platform will execute on a JVM running on another platform. This is what makes Java platform-independent.

As of 2006, the modification of the class file format is being considered under Java Specification Request (JSR) 202.


Structure

Table representation of the class file format

The structure of the class file format can be visualized by a table as follows:

| byte offset in file | value | item | size | description |
|---|---|---|---|---|
| 0x00000000 | 0xCA hexadecimal = 1100 1010 binary | magic number | 4 bytes | magic number used to identify the file as conforming to the class file format |
| 0x00000001 | 0xFE hexadecimal = 1111 1110 binary | | | |
| 0x00000002 | 0xBA hexadecimal = 1011 1010 binary | | | |
| 0x00000003 | 0xBE hexadecimal = 1011 1110 binary | | | |
| 0x00000004 to 0x00000005 | u2 minor_version | minor version number | 2 bytes | minor version number of the class file format being used |
| 0x00000006 to 0x00000007 | u2 major_version | major version number | 2 bytes | major version number of the class file format being used |
| 0x00000008 to 0x00000009 | u2 constant_pool_count (one plus the number of entries in the constant pool table) | constant pool count | 2 bytes | constant pool count |
| 0x0000000A onward | array of constant pool entries, indexed from 1 to constant_pool_count - 1 | constant pool | variable length (the sum of the sizes of the individual entries, which differ in size) | constant pool |
| new_address1 = 0x0000000A + sizeof(constant_pool) | u2 access_flags | access flags | 2 bytes | access flags for the class |
| new_address1 + 2 | u2 this_class | this class | 2 bytes | constant pool index of this class |
| new_address1 + 4 | u2 super_class | super class | 2 bytes | constant pool index of the superclass |
| new_address1 + 6 | u2 interfaces_count | interface count | 2 bytes | number of entries in the interface table |
| new_address1 + 8 | array of interfaces | interface table | variable length (interface count * 2 bytes) | interface table |
| new_address2 = new_address1 + 8 + sizeof(interfaces) | u2 fields_count | field count | 2 bytes | number of entries in the field table |
| new_address2 + 2 | array of fields | field table | variable length (the sum of the sizes of the field_info entries) | field table |
| new_address3 = new_address2 + 2 + sizeof(fields) | u2 methods_count | method count | 2 bytes | number of entries in the method table |
| new_address3 + 2 | array of methods | method table | variable length (the sum of the sizes of the method_info entries) | method table |
| new_address4 = new_address3 + 2 + sizeof(methods) | u2 attributes_count | attribute count | 2 bytes | number of entries in the attribute table |
| new_address4 + 2 | array of attributes | attribute table | variable length (the sum of the sizes of the attribute_info entries) | attribute table |
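
As an illustration, the fixed-size fields at the start of the file (offsets 0x00 to 0x09) can be read from a byte buffer as sketched below. This is a minimal sketch, not a reference implementation: the names ClassHeader, read_u2, read_u4 and parse_class_header are hypothetical, and the code assumes the big-endian byte order that the Java Virtual Machine Specification prescribes for multi-byte items.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the first four items of a class file. */
typedef struct {
    uint32_t magic;
    uint16_t minor_version;
    uint16_t major_version;
    uint16_t constant_pool_count;
} ClassHeader;

/* Read a big-endian 16-bit value starting at p. */
static uint16_t read_u2(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a big-endian 32-bit value starting at p. */
static uint32_t read_u4(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the first 10 bytes of a class file held in buf.
   Returns 0 on success, or -1 if the buffer is shorter than 10 bytes
   or does not start with the magic number 0xCAFEBABE. */
static int parse_class_header(const uint8_t *buf, size_t len, ClassHeader *out) {
    if (len < 10) {
        return -1;
    }
    out->magic = read_u4(buf);                   /* offsets 0x00 to 0x03 */
    if (out->magic != 0xCAFEBABEu) {
        return -1;
    }
    out->minor_version = read_u2(buf + 4);       /* offsets 0x04 to 0x05 */
    out->major_version = read_u2(buf + 6);       /* offsets 0x06 to 0x07 */
    out->constant_pool_count = read_u2(buf + 8); /* offsets 0x08 to 0x09 */
    return 0;
}
```

Given the first ten bytes of a class file, parse_class_header fills in the version numbers and constant_pool_count, and rejects input that does not begin with CA FE BA BE.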

C programming language representation of the class file format

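The structure of the class file format can be described in C-like notation as follows. The listing is not compilable C: each array has a length that is known only once the preceding count has been read, and the cp_info, field_info, method_info and attribute_info entries vary in size.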


struct Class_File_Format {
   u4 magic_number;   //unsigned, 4 byte (32 bit) number that
                       //indicates the start of a class file
                       //the actual value is defined in the Java
                       //Virtual Machine Specification as
                       //0xCAFEBABE in hexadecimal, which equals
                       //1100 1010 1111 1110 1011 1010 1011 1110
                       //in binary, and 3,405,691,582 in decimal

   u2 minor_version;   //unsigned, 2 byte (16 bit) minor version number
   u2 major_version;   //unsigned, 2 byte (16 bit) major version number

   u2 constant_pool_count;   //unsigned, 2 byte (16 bit) number
                             //indicating the number of entries
                             //in the constant pool table, plus
                             //one

   //the constant pool table
   cp_info constant_pool[constant_pool_count - 1];


   u2 access_flags;

   u2 this_class;
   u2 super_class;


   u2 interfaces_count;   //unsigned, 2 byte (16 bit) number
                           //indicating the number of entries
                           //in the table of superinterfaces
                           //of this class

   //the table of superinterfaces of this class
   u2 interfaces[interfaces_count];


   u2 fields_count;   //unsigned, 2 byte (16 bit) number
                       //indicating the number of entries in
                       //the table of fields of this class

   //the table of fields of this class
   field_info fields[fields_count];


   u2 methods_count;   //unsigned, 2 byte (16 bit) number
                       //indicating the number of entries in
                       //the table of methods of this class

   //the table of methods of this class
   method_info methods[methods_count];


   u2 attributes_count;   //unsigned, 2 byte (16 bit) number
                           //indicating the number of
                           //attributes in the attributes
                           //table

   //the attributes table
   attribute_info attributes[attributes_count];
};
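
Because constant pool entries differ in size, the offset of access_flags cannot be computed from constant_pool_count alone; a reader has to walk the pool entry by entry. The following sketch shows one way to do that. The function names are hypothetical, and the tag numbers and payload sizes are taken from the constant pool tags defined in the Java Virtual Machine Specification, not from the listing above.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a big-endian 16-bit value starting at p. */
static uint16_t read_u2(const uint8_t *p) {
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Number of bytes that follow the 1-byte tag of a fixed-size constant
   pool entry. Returns -1 for CONSTANT_Utf8 (tag 1, handled separately)
   and for unknown tags. Tag values are assumed from the JVM specification. */
static int fixed_payload_size(uint8_t tag) {
    switch (tag) {
    case 7: case 8: case 16: case 19: case 20:
        return 2;   /* Class, String, MethodType, Module, Package */
    case 15:
        return 3;   /* MethodHandle */
    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
        return 4;   /* Integer, Float, *ref, NameAndType, (Invoke)Dynamic */
    case 5: case 6:
        return 8;   /* Long, Double */
    default:
        return -1;
    }
}

/* Walk the constant pool and return the byte offset of access_flags,
   or 0 if the buffer is truncated or contains an unknown tag. */
static size_t find_access_flags_offset(const uint8_t *buf, size_t len) {
    if (len < 10) {
        return 0;
    }
    unsigned count = read_u2(buf + 8);
    size_t pos = 10;                          /* constant pool starts at 0x0A */
    for (unsigned i = 1; i < count; i++) {
        if (pos >= len) {
            return 0;
        }
        uint8_t tag = buf[pos++];
        if (tag == 1) {                       /* Utf8: u2 length, then bytes */
            if (len - pos < 2) {
                return 0;
            }
            size_t n = read_u2(buf + pos);
            pos += 2;
            if (len - pos < n) {
                return 0;
            }
            pos += n;
        } else {
            int n = fixed_payload_size(tag);
            if (n < 0 || len - pos < (size_t)n) {
                return 0;
            }
            pos += (size_t)n;
            if (tag == 5 || tag == 6) {
                i++;                          /* Long/Double use two indices */
            }
        }
    }
    return pos;
}
```

The extra increment for Long and Double entries reflects the rule that each of them occupies two constant pool indices, which is why constant_pool_count - 1 can exceed the number of entries actually stored in the file.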

Trivia

Class files are identified by the following 4-byte header (in hexadecimal): CA FE BA BE.

The history of this magic number was explained by James Gosling:

"We used to go to lunch at a place called St Michael's Alley. According to local legend, in the deep dark past, the Grateful Dead used to perform there before they made it big. It was a pretty funky place that was definitely a Grateful Dead Kinda Place. When Jerry died, they even put up a little Buddhist-esque shrine. When we used to go there, we referred to the place as Cafe Dead. Somewhere along the line it was noticed that this was a HEX number. I was re-vamping some file format code and needed a couple of magic numbers: one for the persistent object file, and one for classes. I used CAFEDEAD for the object file format, and in grepping for 4 character hex words that fit after "CAFE" (it seemed to be a good theme) I hit on BABE and decided to use it. At that time, it didn't seem terribly important or destined to go anywhere but the trash-can of history. So CAFEBABE became the class file format, and CAFEDEAD was the persistent object format. But the persistent object facility went away, and along with it went the use of CAFEDEAD - it was eventually replaced by RMI."
