Java performance

From Wikipedia, the free encyclopedia

Java is often perceived as significantly slower and more memory-consuming than natively compiled languages such as C or C++. However, the execution speed of Java programs has improved considerably, thanks to the introduction of just-in-time compilation[1] and, above all, to optimizations introduced in the Java Virtual Machine itself over time.

Virtual Machine optimization techniques

Many optimizations have improved the performance of the Java Virtual Machine over time. Although the Java Virtual Machine was often the first virtual machine to implement them successfully, they have frequently been adopted by other, similar platforms as well.

Just-in-time compilation

Further information: JIT compiler and HotSpot

Early Java Virtual Machines always interpreted bytecodes. This imposed a large performance penalty (between a factor of 10 and 20 for Java versus C in average applications).[3]

This is certainly a major reason why Java programs are still considered slow. However, Java 1.3 saw the introduction of a JIT compiler called HotSpot: the Virtual Machine continually analyzes the program's execution to find "hot spots" which are executed frequently or repeatedly.[2] These are then targeted for optimization, leading to high-performance execution with a minimum of overhead for less performance-critical code.

This speeds up Java programs considerably, bringing Java performance closer to that of C or C++.[3]
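
The effect of HotSpot compilation can be observed directly: a method usually runs faster once it has been executed often enough to be detected as a hot spot and compiled. The following sketch (the class name, method and iteration counts are illustrative and not taken from this article) times the same computation over several rounds; on a HotSpot Virtual Machine the later rounds are typically much faster than the first ones, because the interpreted code has been replaced by compiled code.

 public class WarmupDemo {
     // A small computation that the JIT compiler can identify as a hot spot.
     private static long sum(int n) {
         long total = 0;
         for (int i = 0; i < n; i++) {
             total += i * 31L;
         }
         return total;
     }

     public static void main(String[] args) {
         for (int round = 1; round <= 5; round++) {
             long start = System.nanoTime();
             long result = sum(10000000);
             long elapsed = System.nanoTime() - start;
             // The first rounds are interpreted; later rounds usually run compiled code.
             System.out.println("round " + round + ": " + elapsed + " ns (result " + result + ")");
         }
     }
 }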

Adaptive optimization

Further information: Adaptive optimization

Adaptive optimization is a technique in computer science that performs dynamic recompilation of portions of a program based on the current execution profile. In a simple implementation, an adaptive optimizer may merely make a trade-off between just-in-time compilation and interpreting instructions. At another level, adaptive optimization may take advantage of local data conditions to optimize away branches and to use inline expansion to reduce the cost of method calls.
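
As an illustration (a sketch under assumed names, not an example from the source), a virtual call whose receiver always has the same class at run time can be devirtualized and inlined by an adaptive optimizer, even though the static type would allow other implementations; if a second implementation is loaded later, the optimized code can be discarded (deoptimized) and recompiled.

 interface Shape {
     double area();
 }

 class Circle implements Shape {
     private final double radius;
     Circle(double radius) { this.radius = radius; }
     public double area() { return Math.PI * radius * radius; }
 }

 public class AdaptiveDemo {
     public static void main(String[] args) {
         Shape[] shapes = new Shape[100000];
         for (int i = 0; i < shapes.length; i++) {
             shapes[i] = new Circle(i);   // only one receiver type is ever created
         }
         double total = 0;
         for (Shape s : shapes) {
             // The runtime profile shows this call site is monomorphic, so an
             // adaptive optimizer may inline Circle.area() here.
             total += s.area();
         }
         System.out.println(total);
     }
 }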

Garbage collection

Further information: Garbage collection (computer science)

The 1.0 and 1.1 Virtual Machines used a mark-and-sweep collector, which could fragment the heap after a garbage collection. This carried a large performance penalty. Starting with Java 1.2, the Virtual Machines switched to a generational collector, which performs much better.[4] Modern Virtual Machines use a variety of techniques that have further improved garbage collection performance.[4]
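
Generational collectors exploit the observation that most objects die young. In the following sketch (illustrative only, not taken from the source), the temporary objects created in each iteration become unreachable almost immediately, so they can be reclaimed cheaply from the young generation without scanning the whole heap.

 public class YoungGarbageDemo {
     public static void main(String[] args) {
         long checksum = 0;
         for (int i = 0; i < 1000000; i++) {
             // The StringBuilder and the resulting String are short-lived garbage:
             // they are unreachable as soon as the next iteration starts.
             String s = new StringBuilder().append("item-").append(i).toString();
             checksum += s.length();
         }
         System.out.println(checksum);
     }
 }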

Other optimization techniques

Split bytecode verification

Before executing a class, the JVM verifies its bytecodes (see Bytecode verifier). This verification is performed lazily: a class's bytecodes are loaded and verified only when the class is used, not at the beginning of the program. However, as the Java class libraries are also regular Java classes, they must likewise be loaded when they are used, which means that the start-up time of a Java program is often longer than for C++ programs, for example.

A technique named split-time verification, first introduced in the J2ME edition of the Java platform, has been used in the Java Virtual Machine since Java 6. It splits the verification of bytecodes into two phases:[5]

  • design time: during the compilation of the class from source to bytecode;
  • runtime: when loading the class.
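
The lazy loading described above can be observed with a static initializer: in the sketch below (class and method names are illustrative, not from the source), the nested Helper class is initialized, and its bytecode verified, only at its first use rather than at program start-up.

 public class LazyLoadingDemo {
     static class Helper {
         static {
             // Runs only when Helper is first used, not when the program starts.
             System.out.println("Helper loaded and initialized");
         }
         static int answer() { return 42; }
     }

     public static void main(String[] args) {
         System.out.println("main started");
         // Helper is loaded, verified and initialized only here, at its first use.
         System.out.println(Helper.answer());
     }
 }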

Escape analysis and lock coarsening

Further information: Lock (computer science) and Escape analysis

Java is able to manage multithreading at the language level. Multithreading is a technique that makes it possible to:

  • improve the perceived responsiveness of a program, by allowing user actions while the program performs other tasks;
  • take advantage of multi-core architectures, by letting two unrelated tasks be performed at the same time by two different cores, as sketched below.
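
A minimal sketch of the second point (the task names are assumptions made for illustration, not part of the original text): two unrelated tasks are handed to two threads, which the operating system can schedule on different cores.

 public class TwoTasks {
     public static void main(String[] args) throws InterruptedException {
         Thread download = new Thread(new Runnable() {
             public void run() {
                 System.out.println("downloading data...");
             }
         });
         Thread render = new Thread(new Runnable() {
             public void run() {
                 System.out.println("rendering the user interface...");
             }
         });
         // Both tasks run concurrently; on a multi-core machine they can run in parallel.
         download.start();
         render.start();
         download.join();
         render.join();
     }
 }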

However, programs that use multithreading need to take extra care with objects shared between threads, locking access to shared methods or blocks of code when they are used by one of the threads. Locking a block or an object is itself a time-consuming operation, due to the nature of the underlying operating-system-level operation involved (see Concurrency control and Lock granularity).

As the Java library does not know which methods will be used by more than one thread, the standard library always locks blocks of code when necessary in a multithreaded environment.

Prior to Java 6, the virtual machine always locked objects and blocks when asked to by the program (see Lock implementation), even if there was no risk of an object being modified by two different threads at the same time. For example, in the following case, the local Vector was locked before each add operation to ensure that it would not be modified by other threads (Vector is synchronized), even though it cannot be modified by them, because it is strictly local to the method:

 public String getNames() {
      // The Vector never escapes this method, so its internal locking is unnecessary.
      Vector<String> v = new Vector<String>();
      v.add("Me");
      v.add("You");
      v.add("Her");
      return v.toString();
 }

Starting with Java 6, code blocks and objects are locked only when necessary,[6][7] so in the above case, the virtual machine would not lock the Vector object at all.
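
Lock coarsening is the complementary optimization: when adjacent synchronized operations repeatedly take the same lock, the JIT compiler may merge them into a single lock/unlock pair. A sketch (the StringBuffer example is an assumption for illustration, not taken from the source):

 public class CoarseningDemo {
     // A StringBuffer that is genuinely shared, so its locks cannot simply be elided.
     private static final StringBuffer shared = new StringBuffer();

     public static String buildMessage() {
         // StringBuffer.append() is synchronized, so each call below normally
         // acquires and releases the same lock. The JIT compiler may merge
         // ("coarsen") these adjacent lock regions into a single acquire/release.
         shared.append("Hello");
         shared.append(", ");
         shared.append("world");
         return shared.toString();
     }

     public static void main(String[] args) {
         System.out.println(buildMessage());
     }
 }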

Register allocation improvements

Prior to Java 6, the allocation of registers was very primitive (register values did not live across blocks), which was a problem on architectures that do not have many registers available, such as x86. If no register is available for an operation, the compiler must copy values between registers and memory, which takes time (registers are typically much faster to access).

An optimization of register allocation was introduced in this version:[8] it became possible to use the same registers across blocks (when applicable), reducing accesses to memory. This reportedly led to a performance gain of approximately 60% in some benchmarks.[9]

In this example, the same register could be used for result throughout the loop and across the call to the doSomethingElse method.

 public int doSomething() {
    int result = 1;
    // "result" can stay in the same register across the whole loop,
    // instead of being written back to memory at each block boundary.
    for (int i = 0; i < 10000; i++) {
       result = result + doSomethingElse(i * 20);
    }
    return result;
 }

 private int doSomethingElse(int value) {
    return value * 10;
 }

Class data sharing

Class data sharing (called CDS by Sun) is a mechanism which reduces the startup time of Java applications and also reduces their memory footprint. When the JRE is installed, the installer loads a set of classes from the system JAR file (the JAR file containing the whole Java class library, called rt.jar) into a private internal representation, and dumps that representation to a file called a "shared archive". During subsequent JVM invocations, this shared archive is memory-mapped in, saving the cost of loading those classes and allowing much of the JVM's metadata for these classes to be shared among multiple JVM processes.[10]

The corresponding improvement in start-up time is more noticeable for small programs.[11]

Performance improvements by Java version

Further information: Java version history

Apart from the improvements listed here, each of Sun's Java versions introduced many performance improvements in the Java API.

Java 1.2

Introduced at the Virtual machine level:

Java 1.3

Introduced at the Virtual machine level:

Java 1.4

See here for a Sun overview of performance improvements between the 1.3 and 1.4 versions.

Java 5.0

Introduced at the Virtual machine level:

See here for a Sun overview of performance improvements between the 1.4 and 5.0 versions.

Java 6

Introduced at the Virtual machine level:

Other improvements:


Comparison to other languages

Java is often just-in-time compiled at runtime by the virtual machine. Hence, when just-in-time compiled, its performance is:[13]

  • lower than the performance of compiled languages such as C or C++, but not significantly so,
  • close to that of other just-in-time compiled languages such as C# or OCaml,
  • much better than that of truly interpreted languages such as Perl, Ruby, Python, or PHP.

Comparison to C and C++

Java programs take longer to start, run more slowly, and use more memory than their C or C++ equivalents.[6]

Program speed

Although Java is still slower than C or C++, the average performance of Java programs has increased considerably over time, mostly in terms of speed. It must also be said that benchmarks often measure the performance of small programs; there is less difference between Java and C or C++ in many real-life programs, and sometimes no performance difference at all.[7]

Also, some optimizations that are possible in Java (or other JIT-compiled languages) are not possible in natively compiled languages:[14]

  • pointers make optimization difficult in native languages;
  • adaptive optimization is not possible in these languages, as the code is compiled once, before any program execution, and thus cannot take advantage of the architecture and the actual code paths taken;
  • some garbage-collection mechanisms are often necessary in big native applications, in order to avoid memory leaks, and they often perform worse than in virtual machines:
    • it is possible to obtain much better performance at the virtual machine (lower) level than at the programmer's level,
    • the structure of the Java memory model ensures that there is less memory fragmentation than in native languages, mainly because there are no pointers, so the compiler always knows where variables are accessed;
  • escape-analysis techniques cannot be used in C++, for example, because the compiler cannot know where an object will be used (also because of pointers).

Startup time

Java startup time is often much longer than for C or C++, because many classes (and first of all classes from the platform class libraries) must be loaded before being used,[8] and because the JIT compiler must translate parts of the program into native code before they can run at full speed. This has been reduced starting with Java 6 through the use of split bytecode verification. Startup overhead is also more visible in small programs that perform a simple operation and then exit, because Java library initialization and JIT compilation can represent many times the cost of the program's actual work.

Memory usage

Java memory usage is higher than for C or C++, because of the large class library that must be loaded before program execution,[15] because both the Java bytecode and its native recompilation are typically kept in memory at once, and because the virtual machine itself consumes memory.

In C or C++ the use of statically-linked libraries can increase memory usage, reducing the disparity between C/C++ and Java.

Trigonometric functions

The performance of trigonometric functions is poor compared to C, for example.[16] This is partly because, by specification, the Java Virtual Machine must ensure identical mathematical results across different architectures, which is not natively the case on some architectures, such as x86 (some hardware trigonometric routines are very poorly implemented).[17] The Virtual Machine must therefore execute additional code to ensure that the results are reproducible across architectures.
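
A rough way to see the cost is to compare Math.sin, which must produce reproducible results within the accuracy bounds of the specification, with StrictMath.sin, which always uses the portable fdlibm algorithms. The sketch below is illustrative only (the argument values and iteration counts are assumptions, not measurements from the cited sources).

 public class TrigDemo {
     public static void main(String[] args) {
         double x = 1.0e10;   // large arguments require software argument reduction on x86
         double sum = 0;
         long start = System.nanoTime();
         for (int i = 0; i < 1000000; i++) {
             // Math.sin cannot simply use the hardware fsin instruction here,
             // because its result would not be accurate enough to be reproducible.
             sum += Math.sin(x + i);
         }
         System.out.println("Math.sin:       " + (System.nanoTime() - start) / 1000000 + " ms (sum " + sum + ")");

         sum = 0;
         start = System.nanoTime();
         for (int i = 0; i < 1000000; i++) {
             sum += StrictMath.sin(x + i);   // always the portable fdlibm implementation
         }
         System.out.println("StrictMath.sin: " + (System.nanoTime() - start) / 1000000 + " ms (sum " + sum + ")");
     }
 }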

Notes

  1. ^ A similar JIT mechanism is also used by the .NET framework virtual machine.
  2. ^ With the JIT technique, code is first interpreted, then "hot spots" are compiled on the fly. This is why it is necessary to execute a program a few times before measuring its performance in benchmarks.
  3. ^ This article shows that the performance gain between interpreted mode and HotSpot is more than a factor of 10.
  4. ^ For example, the duration of pauses is less noticeable now. See for example this clone of Quake 2 written in Java: Jake2.
  5. ^ See here for a benchmark showing a large performance boost (about 60%) from Java 5.0 to Java 6 for the JFreeChart application.
  6. ^ It is also the case for the .NET platform for example, as it uses the same types of techniques.
  7. ^ See for example the benchmark of Jake2, a clone of Quake 2 written in Java by translating the original GPL C code. The Java 5.0 version performs better in some hardware configurations than its C counterpart: 260/250 fps versus 245 fps [1]
  8. ^ It seems that much of the startup time is due to IO-bound operations rather than JVM initialization or class loading [2]. Some tests showed that although the new split verification technique improved class loading by roughly 40%, it only translated to about a 5% startup improvement for large programs.

See also

External links

Benchmark results

Theory and Various Documents

Benchmark Definitions