Java Client VM Performance

When Java applications perform poorly on Windows, Java developers blame Windows and Windows developers blame Java. But most likely, the actual cause is using the wrong virtual machine. Oracle ships two VMs for 32-bit platforms, and Windows defaults to the slower one. On this page I compare the two VMs, and show how to use the faster one on 32-bit Windows.

Note: This article applies to Oracle Java SE 8u112 and earlier. Possibly Oracle will fix its 32-bit Java distribution in a future version, but I don’t have much hope for that.

Overview

Oracle traditionally provides two different Java virtual machines (VMs) on 32-bit platforms: Java HotSpot Client VM and Server VM. As the name implies, the Client VM was originally intended for client applications. It is “tuned for reducing start-up time and memory footprint” but lacks the Server VM’s high-quality optimizing compiler.

That sounds sensible but there are two problems with this distinction. First, increasingly powerful hardware and tiered compilation in the Server VM have largely erased the Client VM’s advantage in start-up time. Second, the Server VM is supposed to be available on all platforms, making the Client VM’s limitations irrelevant; but that’s not actually the case. This table is adapted from Oracle’s Server-Class Machine Detection:

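System                      Default VM
SPARC   32 bit   Solaris    Server VM if server-class; otherwise, Client VM
i586    32 bit   Solaris    Server VM if server-class; otherwise, Client VM
i586    32 bit   Linux      Server VM if server-class; otherwise, Client VM
i586    32 bit   Windows    Client VM
SPARC   64 bit   Solaris    Server VM (no Client VM)
AMD64   64 bit   Solaris    Server VM (no Client VM)
AMD64   64 bit   Linux      Server VM (no Client VM)
AMD64   64 bit   Windows    Server VM (no Client VM)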

One of these rows does not belong. Most 32-bit JREs ship with the Server VM and use it automatically on “server-class” systems, defined as “at least 2 CPUs and at least 2 GB of physical memory.” Since virtually every current machine has a multi-core CPU and more than 2 GB of memory, no current Linux or Solaris machine would ever default to the Client VM. However, the 32-bit JRE for Windows always defaults to the Client VM, and worse, it does not even include the Server VM! This is not apparent from the table, but it’s explicitly stated in Oracle’s redistribution license, and of course you can easily inspect the JRE and see for yourself.
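To make the “server-class” rule concrete, here is a minimal sketch that applies the quoted definition literally to the current machine. The class name ServerClassCheck and the method isServerClass are my own illustrative names, and HotSpot’s real detection logic may differ in detail. The memory query relies on com.sun.management.OperatingSystemMXBean, an Oracle/OpenJDK extension rather than a standard API.

```java
import java.lang.management.ManagementFactory;

/** Sketch of the quoted "server-class" rule: at least 2 CPUs and at least 2 GB of memory. */
final class ServerClassCheck {

    private static final long TWO_GB = 2L * 1024 * 1024 * 1024;

    /** Applies the quoted definition literally; HotSpot's actual check may differ in detail. */
    static boolean isServerClass(int cpus, long physicalMemoryBytes) {
        return cpus >= 2 && physicalMemoryBytes >= TWO_GB;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();

        // Assumption: an Oracle/OpenJDK runtime, where this vendor-specific bean is available.
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long memory = os.getTotalPhysicalMemorySize();

        System.out.printf("CPUs: %d, memory: %d MB, server-class: %b%n",
            cpus, memory / (1024 * 1024), isServerClass(cpus, memory));
    }
}
```

On 32-bit Windows the result makes no difference to the default, since that JRE always picks the Client VM.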

For this reason, the start-up flag -server that normally selects the Server VM on 32-bit systems has absolutely no effect on the 32-bit Windows JRE. (Tip: Use the -showversion flag to verify which VM is actually running.) Deploying to 64-bit Windows does not avoid this issue, either, because the 32-bit JRE supports 64-bit Windows and might well get installed there for better compatibility with other software. In that case, too, only the Client VM will be available.
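If you would rather check from inside the application than read the -showversion output, the standard system property java.vm.name carries the same information. The following is only a sketch: the class name VmCheck is hypothetical, and the sun.arch.data.model property is specific to Oracle/OpenJDK runtimes and may be missing elsewhere.

```java
/** Reports which HotSpot VM is running, as a programmatic complement to -showversion. */
final class VmCheck {

    /** True if the given VM name (system property "java.vm.name") identifies a Server VM. */
    static boolean isServerVm(String vmName) {
        return vmName != null && vmName.contains("Server VM");
    }

    public static void main(String[] args) {
        String vmName = System.getProperty("java.vm.name");
        // Assumption: "sun.arch.data.model" is vendor-specific, so fall back to "unknown".
        String bits = System.getProperty("sun.arch.data.model", "unknown");

        System.out.println("VM: " + vmName + " (" + bits + " bit)");
        if (!isServerVm(vmName)) {
            System.err.println("Warning: not running on the Server VM.");
        }
    }
}
```

Running this once with and once without the -server flag on a stock 32-bit Windows JRE should report the Client VM both times.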

Bizarrely, a 32-bit Server VM for Windows does exist and can be legally redistributed – but Oracle includes it only in the 32-bit JDK’s private JRE! I’m at a complete loss to come up with any good reason for this nonsense. In the following sections I’ll show how the Client VM can damage your Java application’s performance, and how you can work around Oracle’s wrong-headed JRE packaging.

Further Reading. Charlie Hunt & Binu John’s standard guide Java Performance (Addison-Wesley 2012) describes the two runtime options in “HotSpot VM JIT Compilers” (p.92ff). That book was written before Java 7, however. The tiered compilation that Hunt & John mention in “Future Enhancements” (p.100) became available as an option in Java 7 and is the Server VM’s default as of Java 8, and so their prediction has come true:

As tiered compilation improves in its optimization capabilities and matures, it is likely to be the recommended JIT compiler for both client and server families of Java applications.

But even on the old Java 6 platform, Hunt & John recommend using the slower Client VM only if absolutely necessary due to startup or memory footprint constraints (“Client or Server Runtime,” p.260).

VM Test Programs

Two small test programs demonstrate both the superiority of the Server VM optimizer and the deployment of a 32-bit Server VM on Microsoft Windows. Both programs are Java console executables. Our test runner first uses the -showversion flag to verify that we’ve selected the correct VM, and then performs the following tests on both VM versions:

  • The first program does nothing and returns immediately from the main method. This eliminates any possible benefit from code optimization and emphasizes start-up time, where the Client VM should excel. To make sure we capture the full elapsed time, we use external timing with echo %time% batch statements (10 ms resolution).
  • The same program performs 1 million calculations of a specified Fibonacci number, externally timed in the same fashion. We test three numbers (1, 100, 1000) covering a range from short tasks dominated by JVM start-up and compilation work, to long-running calculations dominated by payload code execution.
  • Finally, we run the popular Java benchmark SciMark 2.0a. We use standard parameters to obtain the Composite Score (CS) which is calculated internally by the benchmark itself. Higher numbers are better in this case.
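For readers who don’t want to unpack the download, the first two tests can be approximated as follows. This is a hypothetical stand-in, not the source from the package: the class name, the command-line convention (no argument means “do nothing”), and the use of wrapping long arithmetic are all my assumptions for illustration.

```java
/** Hypothetical stand-in for the Fibonacci test; the real source is in the download package. */
final class FibonacciTest {

    /** Iterative Fibonacci; long arithmetic silently wraps around for n > 92. */
    static long fibonacci(int n) {
        long a = 0, b = 1;
        for (int i = 0; i < n; i++) {
            long next = a + b;
            a = b;
            b = next;
        }
        return a;
    }

    public static void main(String[] args) {
        // No argument: the "Do Nothing" case, which returns immediately.
        if (args.length == 0) return;

        int n = Integer.parseInt(args[0]);
        long checksum = 0;
        for (int run = 0; run < 1_000_000; run++) {
            checksum += fibonacci(n);   // accumulate so the JIT cannot discard the work
        }
        System.out.println(checksum);
    }
}
```

The overflow for large arguments is harmless here, because the test measures how fast the loop runs rather than the numeric result.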

The download package JavaTest.zip (31 MB, ZIP archive) comprises the precompiled executables, with source code for the Fibonacci test, plus a customized bundled Oracle JRE which takes up most of the space. Please refer to the enclosed ReadMe.txt file and the various batch files for the required development tools and expected file paths.

Sample Test Results

The following table shows sample test results on my system, comprising Windows 10 (64 bit) on an Alienware Andromeda X51 R3 with one Intel Core i7 6700K CPU (4 GHz) and 16 GB RAM (dual-channel, 2133 MHz). The tests were not conducted with any kind of scientific rigor; I simply ran the programs three times and averaged the results.

The first four rows show total execution times in milliseconds. SciMark always runs for about 30 seconds, so the last two rows instead show SciMark’s Composite Score (CS) and peak private memory consumption, determined using Windows Task Manager. For the record, the same test on an older system that was about half as fast showed identical SciMark ratios, although less pronounced Fibonacci differences.

Test              Client     Server     Factor   Server/64
Do Nothing        67 ms      80 ms      (1.2)    60 ms
Fibonacci 1       60 ms      77 ms      (1.3)    57 ms
Fibonacci 100     297 ms     97 ms      3.1      73 ms
Fibonacci 1000    2387 ms    370 ms     6.5      340 ms
SciMark Score     1563 CS    2361 CS    1.5      2561 CS
SciMark Memory    6352 KB    8720 KB    (1.4)    12264 KB

Column Client shows the results for the bundled 32-bit Client VM of the Oracle Java SE Runtime Environment 8u112. Column Server shows the results for the 32-bit Server VM of the Oracle Java SE Development Kit 8u112, copied to the bundled JRE as described below. Column Factor shows the improvement for Server VM over Client VM, or vice versa (in parentheses).
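The Factor column can be recomputed from the Client and Server columns. The sketch below shows the rule as I read it from the description above: divide the worse value by the better one, and wrap the result in parentheses when the Client VM is the better one. The class and method names are illustrative only.

```java
import java.util.Locale;

/** Recomputes the Factor column from the Client and Server columns. */
final class Factor {

    /**
     * Returns the ratio of the larger value to the smaller one with one decimal,
     * in parentheses if the Client VM has the better result.
     */
    static String factor(double client, double server, boolean higherIsBetter) {
        boolean serverWins = higherIsBetter ? server >= client : server <= client;
        double ratio = Math.max(client, server) / Math.min(client, server);
        String text = String.format(Locale.ROOT, "%.1f", ratio);
        return serverWins ? text : "(" + text + ")";
    }

    public static void main(String[] args) {
        System.out.println(factor(67, 80, false));      // Do Nothing, times in ms
        System.out.println(factor(2387, 370, false));   // Fibonacci 1000, times in ms
        System.out.println(factor(1563, 2361, true));   // SciMark Score, higher is better
        System.out.println(factor(6352, 8720, false));  // SciMark Memory in KB
    }
}
```

For times and memory a lower value is better, so higherIsBetter is false; only the SciMark score uses true.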

For reference, column Server/64 shows the results for the 64-bit Server VM on my system. These tests are run by a separate batch script in the download package. As you can see, the 64-bit version is close to the 32-bit Server VM overall but somewhat faster in every test, at the price of higher memory use.

Bonus Test: Star Chess

The following table shows additional test results obtained for the console runner of Star Chess 2.0.4. All test runs use an initial planet distribution of 39213, and four computer players with a prediction tree depth of two turns. The test environment was the same as above.

Test                 Client     Server     Factor   Server/64
Star Chess Speed     274,808    444,134    1.6      451,074
Star Chess Memory    13.6 MB    65.4 MB    (4.8)    109 MB

Speed denotes the average number of game positions generated per second. Memory was measured as for SciMark. The speed differences are well in line with the SciMark test, while the differences in memory use are even more pronounced.

Bonus Test: Tektosyne

The following table shows additional test results obtained from the benchmark dialog of the Tektosyne 6.0.0 demo application. The test case was intersecting two planar subdivisions with 120 edges each (240 combined at 50–50 distribution). The test environment was the same as above.

Test                Client     Server    Factor   Server/64
Tektosyne Time      1058.75    675.83    1.6      675.83
Tektosyne Memory    50 MB      300 MB    (6.0)    850 MB

Time denotes average microseconds. Memory was measured as for SciMark. The speed differences are again in line with SciMark and Star Chess, and memory use once again grows for the Server VMs. Unlike the other tests this one runs a JavaFX GUI application, not a console executable, so some of the increased memory consumption is likely attributable to JavaFX overhead.

Test Conclusions

Do Nothing and Fibonacci 1 show a somewhat greater start-up delay for the 32-bit Server VM. However, the 64-bit Server VM seems to erase this delay, and on all VM versions the total time from start to exit is so short that it is difficult to measure. So on modern hardware, the Client VM’s alleged advantage in start-up time ranges from irrelevant to downright nonexistent.

On the other hand, Fibonacci 100 is already much faster on the Server VMs, and the difference increases to two seconds for Fibonacci 1000. The other tests – SciMark, Star Chess, Tektosyne – likewise show substantially better performance on the Server VMs, but also a higher peak memory use.

You can find a few more sample test results on my Uncontested Lock Performance and .NET Struct Performance pages, covering flow control and floating-point arithmetic respectively. The Server VMs mostly outperform the Client VM in these cases as well.

Prefer the Server VM

The Client VM may still be useful in edge cases when you must conserve memory at all costs, but you should certainly prefer the Server VMs unless your own measurements show otherwise. The 64-bit Server VM does have a very substantial memory overhead, but you can get nearly the same performance with less memory use from the 32-bit Server VM, so that should be your first choice if memory is low.

The advantage is obvious in fields such as numerical computing, but I would also recommend the Server VM for client applications which are supposedly the domain of the Client VM. Users should rarely notice the greater memory consumption – client applications typically have the whole machine to themselves, unlike server jobs. On the other hand, interactive users will get very cranky when they stare at an hourglass while unoptimized code works on some expensive task!

Finally, consider the fact that Oracle does not bother to provide a Client VM for 64-bit JREs, despite the ubiquity of 64-bit operating systems today, including “client” systems like Windows Vista and later. I believe the Client VM is best seen as a historical artifact that is largely obsolete, and should be pulled from the 32-bit distribution altogether. Oracle might retain it as a separate download (or JDK component) for severely memory-limited devices.

Bundling the Server VM

Having established that you typically want your Java program to run on the Server VM, we now get to the problem of how to accomplish that on Windows. The Server VM is selected automatically only if all of the following are true:

  • Your program will only ever deploy to 64-bit Windows.
  • Those systems will always have the 64-bit JRE installed.
  • Those systems will never have a 32-bit JRE installed that might accidentally get selected.

Most likely this is unrealistic when you don’t manage all deployment systems yourself. Using the -server switch to select the Server VM is not sufficient because the 32-bit JRE on Windows does not even include the Server VM. Only the 32-bit JDK includes the Server VM, and then only as part of the private JDK runtime rather than the public JRE.

What to do? You can refer fellow programmers to an explanation such as this one, but for the general public there’s only one realistic option: copy the Server VM from the 32-bit JDK’s private JRE to the public JRE, then bundle that augmented JRE with your application. This procedure is in fact allowed and recommended by Oracle:

On Microsoft Windows platforms, the JDK includes both the Java HotSpot Server VM and Java HotSpot Client VM. However, the JRE for Microsoft Windows platforms includes only the Java HotSpot Client VM. Those wishing to use the Java HotSpot Server VM with the JRE may copy the JDK’s jre\bin\server folder to a bin\server directory in the JRE. Software vendors may redistribute the Java HotSpot Server VM with their redistributions of the JRE.

All told, you must perform the following steps to guarantee that a Java program redistributed to arbitrary Windows systems will use the Server VM:

  1. Copy the Server VM from the 32-bit JDK’s private JRE to the public 32-bit JRE, as described above.
  2. Copy the jre folder containing the augmented 32-bit JRE to your Java application’s folder. Ensure that your deployment package also includes that jre folder.
  3. The batch script or launcher utility that runs your Java application must call jre\bin\java[w].exe to use your bundled JRE, and specify the -server switch. Example:
    jre\bin\java.exe -server -jar main.jar
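Since a missing bin\server folder is easy to overlook, it may be worth checking the finished application folder before packaging it. The following sketch assumes the layout from the steps above and that the Server VM’s library is named jvm.dll, as in the Windows JDK. The class name BundleCheck and its method are hypothetical.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Checks that an application folder contains a bundled JRE with the Server VM copied in. */
final class BundleCheck {

    /** True if both jre/bin/java.exe and jre/bin/server/jvm.dll exist below appDir. */
    static boolean hasBundledServerVm(Path appDir) {
        Path bin = appDir.resolve("jre").resolve("bin");
        return Files.isRegularFile(bin.resolve("java.exe"))
            && Files.isRegularFile(bin.resolve("server").resolve("jvm.dll"));
    }

    public static void main(String[] args) {
        // The application folder is the first argument, or the current directory by default.
        Path appDir = Paths.get(args.length > 0 ? args[0] : ".");
        if (hasBundledServerVm(appDir)) {
            System.out.println("OK: bundled JRE includes the Server VM.");
        } else {
            System.out.println("Missing jre\\bin\\java.exe or jre\\bin\\server\\jvm.dll.");
        }
    }
}
```

This only verifies that the files are present. To confirm that the Server VM is actually selected, run the bundled java.exe with -server -showversion.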

Annoyingly, bundling the augmented JRE 8u112 adds 59 MB to a ZIP-compressed distribution package. If you don’t need its full functionality you can save 28 MB by deleting various optional components listed in the JRE 8 Readme, but that process is rather laborious. Also, the savings will be smaller if your application requires JavaFX, which is one of these optional components.

Since the Server VM works fine, can be legally redistributed, and itself requires only 6 MB (uncompressed!), it’s really quite incomprehensible why Oracle doesn’t simply add it to the public JRE. Maybe an older version was considered too buggy to ship, and that decision was never revisited. Or maybe Scott McNealy and Larry Ellison just wanted to show how much they hate Microsoft…