Scaling Enterprise Java on 64-bit Multi-Core X86-Based Serversby Michael Juntao Yuan, Dave Jaffe
Multi-core and 64-bit CPUs are the hottest commodities in the enterprise server market these days. In recent years, as the cost and power requirement of faster CPU clock speeds has increased, the growth in raw clock speed (usually measured in megahertz) of single CPUs has slowed down. Hardware manufacturers continue to improve X86-based server performance by increasing both the multitasking capability and internal data bandwidth. Both Intel and Advanced Micro Devices are shipping 64-bit processors with two internal CPU cores, and quad core processors are soon to follow. Ninth-generation servers from Dell exploit this new generation of chips. The PowerEdge 1955 blade server, for example, supports up to two 64-bit dual core processors in a blade configuration, with up to ten such blades in a 7-rack unit (12.25") chassis.
However, those new generations of servers also pose new challenges for the software. For instance, to take advantage of the multi-core CPUs, the software application must be able to execute tasks in parallel across the CPUs; to take advantage of the 64-bit memory bandwidth, the application must also be able to manage a large amount of memory efficiently. As a key software platform on enterprise servers, Java Enterprise Edition (Java EE) is on the forefront of this multi-core, 64-bit revolution. Java EE developers must adapt to those challenges to make the most out of hardware investment.
When Java first came out in 1997, the state-of-the-art PC had a single CPU with a less than 300MHz clock speed and less than 64MB of RAM. The first Java applications were mostly on the client side. High-performance multitasking and large memory handling were clearly not the priority for Java designers at that time. But as Java became widely adopted for server-side applications, things started to change. Web applications are inherently multithreaded since each web request can be handled in a separate thread, parallel to other requests. The latest Java platform has greatly improved performance on modern server hardware.
In this article, we look at the current state of enterprise Java and analyze the challenges it faces with the new generation of servers. Based on our experience working on Java EE applications running on the JBoss Application Server in the Dell Scalable Enterprise Technology Center, we provide solutions and tips to scale your Java EE applications to the latest server hardware.
Tune the JVM
The core of the Java platform is the Java Virtual Machine (JVM). The entire Java application server runs inside a JVM. The JVM takes many startup parameters as command line flags, and some of them have great implications on the application performance. So, let's examine some of the important JVM parameters for server applications.
First, you should allocate as much memory as possible to the JVM using the
-Xms<size> (minimum memory) and
-Xmx<size> (maximum memory) flags. For instance, the
-Xms1g -Xmx1g tag allocates 1GB of RAM to the JVM. If you don't specify a memory size in the JVM startup flags, the JVM would limit the heap memory to 64MB (512MB on Linux), no matter how much physical memory you have on the server! More memory allows the application to handle more concurrent web sessions, and to cache more data to improve the slow I/O and database operations. We typically specify the same amount of memory for both flags to force the server to use all the allocated memory from startup. This way, the JVM wouldn't need to dynamically change the heap size at runtime, which is a leading cause of JVM instability. For 64-bit servers, make sure that you run a 64-bit JVM on top of a 64-bit operating system to take advantage of all RAM on the server. Otherwise, the JVM would only be able to utilize 2GB or less of memory space. 64-bit JVMs are typically only available for JDK 5.0.
With a large heap memory, the garbage collection (GC) operation could become a major performance bottleneck. It could take more than ten seconds for the GC to sweep through a multiple gigabyte heap. In JDK 1.3 and earlier, GC is a single threaded operation, which stops all other tasks in the JVM. That not only causes long and unpredictable pauses in the application, but it also results in very poor performance on multi-CPU computers since all other CPUs must wait in idle while one CPU is running at 100% to free up the heap memory space. It is crucial that we select a JDK 1.4+ JVM that supports parallel and concurrent GC operations. Actually, the concurrent GC implementation in the JDK 1.4 series of JVMs is not very stable. So, we strongly recommend you upgrade to JDK 5.0. Using the command line flags, you can choose from the following two GC algorithms. Both of them are optimized for multi-CPU computers.
If your priority is to increase the total throughput of the application and you can tolerate occasional GC pauses, you should use the
-XX:UseParallelOldGC(the latter is only available in JDK 5.0) flags to turn on parallel GC. The parallel GC uses all available CPUs to perform the GC operation, and hence it is much faster than the default single thread GC. It still pauses all other activities in the JVM during GC, however.
If you need to minimize the GC pause, you can use the
-XX:+UseConcMarkSweepGCflag to turn on the concurrent GC. The concurrent GC still pauses the JVM and uses parallel GC to clean up short-lived objects. However, it cleans up long-lived objects from the heap using a background thread running in parallel with other JVM threads. The concurrent GC drastically reduces the GC pause, but managing the background thread does add to the overhead of the system and reduces the total throughput.
Furthermore, there are a few more JVM parameters you can tune to optimize the GC operations.
On 64-bit systems, the call stack for each thread is allocated 1MB of memory space. Most threads do not use that much space. Using the
-XX:ThreadStackSize=256kflag, you can decrease the stack size to 256k to allow more threads.
-XX:+DisableExplicitGCflag to ignore explicit application calls to
System.gc(). If the application calls this method frequently, then we could be doing a lot of unnecessary GCs.
-Xmn<size>flag lets you manually set the size of the "young generation" memory space for short-lived objects. If your application generates lots of new objects, you might improve GCs dramatically by increasing this value. The "young generation" size should almost never be more than 50% of heap.
Since the GC has a big impact on performance, the JVM provides several flags to help you fine-tune the GC algorithm for your specific server and application. It's beyond the scope of this article to discuss GC algorithms and tuning tips in detail, but we'd like to point out that the JDK 5.0 JVM comes with an adaptive GC-tuning feature called ergonomics. It can automatically optimize GC algorithm parameters based on the underlying hardware, the application itself, and desired goals specified by the user (e.g., the max pause time and desired throughput). That saves you time trying different GC parameter combinations yourself. Ergonomics is yet another compelling reason to upgrade to JDK 5.0. Interested readers can refer to Tuning Garbage Collection with the 5.0 Java Virtual Machine. If the GC algorithm is misconfigured, it is relatively easy to spot the problems during the testing phase of your application. In a later section, we will discuss several ways to diagnose GC problems in the JVM.
Finally, make sure that you start the JVM with the
-server flag. It optimizes the Just-In-Time (JIT) compiler to trade slower startup time for faster runtime performance. There are more JVM flags we have not discussed; for details on these, please check out the JVM options documentation page.
Use New Platform APIs
Besides the JVM, the Java platform libraries have also gone through extensive changes to accommodate the newer server hardware. We strongly recommend you upgrade your application to JDK 5.0+ in order to take advantage of all the performance enhancements built into the platform. Three new library APIs introduced in the last two major versions of the JDK are of particular importance for multi-CPU computers.
The concurrency utility library in JDK 5.0 is very important for multithread applications. It simplifies the Java thread API and provides a thread-safe set of
Collectionimplementations. For instance, the new
ConcurrentHashMapis a thread-safe
HashMapand you can read/write it without a
synchronizedblock. We will get to this in more detail later in this article.
The NIO (New I/O) library was introduced in JDK 1.4. It allows multiple threads to share one physical connection (e.g., a socket) to the hard disk or network. A thread no longer needs to block the I/O socket to read or write data. Using NIO, we can greatly reduce the thread waiting time caused by a limited number of blocked sockets. The NIO is especially useful in multi-CPU computers where CPUs often wait on the I/O, and where there are many threads.
The logging library introduced in JDK 1.4 provides a convenient API to log information from the application to the console, logfiles, or network destinations. The important performance feature of the logging library is that you can configure the logging output by changing the logging level at runtime via configuration files. This helps us to reduce logging--which involves slow I/O operation and is a major cause for CPU waiting--at runtime, without recompiling the application code.
You should write new applications and upgrade older applications to use the concurrency, NIO, and logging APIs whenever possible. If you cannot upgrade, you should use alternative open source libraries that provides similar features. For instance, Doug Lea's util.concurrent library has many of the same features as the JDK 5.0 concurrency API; furthermore, the Apache Log4j library is comparable to the JDK 1.4+ logging library.
Pages: 1, 2