understanding java native memory

Recently when working on a Java application that used a lot of JNI code, I found myself asking a lot of questions about how memory is managed in the JNI layer. When trying to find answers, I ran into a lot of terminology that can be tricky to sort through. Specifically:

native memory
heap memory
virtual memory
resident memory
reserved memory
committed memory

After many hours of digging, I put this post together to document what I learned about all of these terms, to help others (and my future self) sort them out.

This post is focused on Java running on Linux, but the general concepts apply to any OS that is running Java.

native memory

I had never heard the term “native memory” before working with Java. I initially assumed it just described memory that is used in JNI native code. This is partly true, but there’s more to it than just this. It’s not an official term. I’ve seen it used in different contexts.

“native memory tracking”

Oracle’s Native Memory Tracking documentation hints at a definition of “native memory” without coming out and offering it directly. The NMT tool breaks down “native memory” into several categories:

Java Heap
Class
Thread
Code
GC
Compiler
Internal
Symbol
Memory Tracking
Pooled Free Chunks
Unknown

See NMT Memory Categories for explanations of what each of these means.
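To see these categories for a running process, NMT has to be switched on at startup with -XX:NativeMemoryTracking=summary, after which jcmd <pid> VM.native_memory summary prints the breakdown. Here is a minimal sketch of asking for the same report from inside the process. It assumes a HotSpot JVM that exposes the DiagnosticCommand MBean; the class name NmtSummary is made up for this example, and on other JVMs (or with NMT turned off) it just returns a message saying the report is not available.

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

final class NmtSummary {
    /**
     * Asks the running JVM for its NMT summary through the HotSpot
     * DiagnosticCommand MBean. Returns a message instead of throwing
     * when the MBean or the operation is not available.
     */
    static String summary() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName name = new ObjectName("com.sun.management:type=DiagnosticCommand");
            Object result = server.invoke(
                    name,
                    "vmNativeMemory",                           // MBean form of VM.native_memory
                    new Object[] { new String[] { "summary" } },
                    new String[] { String[].class.getName() });
            return String.valueOf(result);
        } catch (Exception e) {
            return "NMT summary unavailable: " + e;
        }
    }

    public static void main(String[] args) {
        // Start the JVM with -XX:NativeMemoryTracking=summary to get real numbers.
        System.out.println(summary());
    }
}
```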

This categorization gives us a high level idea of what “native memory” consists of, but this is not a concise definition. The best definition of “native memory” I could find is in an article from IBM: Thanks for the memory, Linux.

This is a lengthy article, and I highly recommend reading the entire thing, but let me offer the following TLDR summary:

what is native memory?

The short answer is that “native memory” basically refers to the memory used by the JVM process itself. This includes the heap, anything allocated from the JNI native code, the data structures used to manage the heap, the stack memory, class definitions, and anything else the JVM might need to function.

You could say that “native memory” also includes memory used by other processes and by the operating system itself.

At the end of the day, the JVM is just a regular process. It needs memory to function, and in the Java world this seems to be called “native memory.”

I feel like the rest of the world would just call this “memory.”

java heap memory

I’m going to assume most readers are familiar with the concept of heap memory.

The Java heap is a single heap space that is used for storing objects created with new. Java offers a lot of interesting heap parameters and garbage collection options for automatic management of the heap.

I only bring it up in this article to describe how it relates to these other memory terms.

how does heap relate to native memory?

Native memory is the memory used by the JVM as it runs on the OS. This includes the memory it uses for the Java heap space, so Java heap memory is just a JVM-managed subset of the native memory used by the rest of the JVM and the code you’ve written to run in the JVM.

heap is limited by java, native memory is not

When the Java process starts, it has a memory footprint on the host operating system. Part of this is the heap space. The heap has a limited size, and as it starts to fill, Java will automatically collect garbage. If the application exhausts the heap, the JVM throws an OutOfMemoryError, which typically brings the application down.

Native memory use is only limited by the operating system. If Java needs to allocate non-heap memory it will simply increase the size of its footprint on the operating system.
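One way to see non-heap allocation from plain Java, without writing any JNI, is a direct ByteBuffer. The sketch below allocates one and reads the JVM's "direct" buffer pool to show that the bytes are accounted for outside the Java heap. The class name OffHeapDemo is just for illustration. One caveat I should label as an aside: direct buffers are capped by -XX:MaxDirectMemorySize, so they are not a perfect stand-in for a raw malloc in native code.

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;

final class OffHeapDemo {
    /** Bytes currently held by the JVM's "direct" buffer pool, or -1 if the JVM does not report it. */
    static long directPoolBytes() {
        for (BufferPoolMXBean pool : ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class)) {
            if ("direct".equals(pool.getName())) {
                return pool.getMemoryUsed();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        long before = directPoolBytes();

        // 16 MiB allocated outside the Java heap; only the small ByteBuffer
        // object that wraps it lives on the heap.
        ByteBuffer buffer = ByteBuffer.allocateDirect(16 * 1024 * 1024);
        buffer.put(0, (byte) 1); // write to it so the memory is actually used

        long after = directPoolBytes();
        System.out.println("is direct: " + buffer.isDirect());
        System.out.println("direct pool before: " + before + " bytes");
        System.out.println("direct pool after:  " + after + " bytes");
    }
}
```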

To better understand what I mean by “footprint” and how native memory is managed within the JVM process space, we first need to clear up some additional terms.

virtual and resident memory

Virtual memory is an operating system concept that is in no way unique to Java. Any process running on the OS will use virtual memory. The JVM is just another process on the OS, so it will have its own “virtual memory space.” I won’t get into details about virtual memory and paging in this post. I just want to focus on what the terminology means and how the virtual memory is limited by the physical machine running the JVM.

physical limits of memory

The amount of virtual memory that the Java process uses is only limited by the ability for memory addresses to be referenced by CPU instructions. The virtual space on 32-bit operating systems is generally limited to between 2GB and 3GB (2^32 bytes is 4GB, but not all of those 4GB are usable in the address space). 64-bit operating systems can work with much larger virtual address spaces.

The resident memory is the actual physical space taken by the mapping of virtual memory addresses to physical memory space. The operating system decides when addresses that are in the virtual space get mapped to real physical memory. If virtual memory is not allocated to physical memory, it may be stored on disk (if paging is enabled) or it may be reserved but not yet committed.
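On Linux, a process can read its own virtual and resident sizes from /proc/self/status, where they show up as VmSize and VmRSS. Here is a small sketch that does this from Java. The class and method names are mine, and it simply reports that the file is missing on systems without /proc.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class ProcStatus {
    /** Finds a line such as "VmRSS:   51200 kB" and returns the number in kB, or -1 if absent. */
    static long parseKb(List<String> lines, String key) {
        for (String line : lines) {
            if (line.startsWith(key + ":")) {
                String[] parts = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        Path status = Paths.get("/proc/self/status");
        if (!Files.isReadable(status)) {
            System.out.println("/proc/self/status is not available on this system");
            return;
        }
        List<String> lines = Files.readAllLines(status);
        System.out.println("virtual  (VmSize): " + parseKb(lines, "VmSize") + " kB");
        System.out.println("resident (VmRSS):  " + parseKb(lines, "VmRSS") + " kB");
    }
}
```

On a typical JVM the virtual number is far larger than the resident one, which is the gap the rest of this post is about.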

committed and reserved memory

Before a process actually uses memory, it might want to reserve some virtual memory addresses ahead of time. This is done mostly to keep memory contiguous where appropriate. Generally speaking, it’s a good idea to store memory that is related (like all of the memory used in a single class instance) in the same place. It helps a lot with keeping the system running efficiently (this is mostly related to low-level CPU caching strategies).

A good example of “reserved memory” is the Java heap. When we start Java, we pass in the -Xmx parameter to tell the process how much heap we want to maintain. At startup, this amount of memory is reserved, but not committed.

When we want to actually assign a value to a virtual memory address, we have to commit the address. This will cause a write back to our physical memory device, and the virtual address will no longer be available in the JVM until we free it (either directly or indirectly via garbage collection).
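The JVM reports these numbers for the heap through the standard MemoryMXBean. In the sketch below, my reading is that max corresponds to the -Xmx reservation and committed to the part of that reservation the JVM has actually committed; the class name HeapCommitment is made up for the example.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

final class HeapCommitment {
    /** Snapshot of the heap as the JVM itself reports it. */
    static MemoryUsage heap() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    /** Formats used, committed and max in MiB; max is "undefined" when the JVM reports -1. */
    static String describe(MemoryUsage usage) {
        long mib = 1024L * 1024L;
        String max = usage.getMax() < 0 ? "undefined" : (usage.getMax() / mib) + " MiB";
        return "used=" + (usage.getUsed() / mib) + " MiB, committed="
                + (usage.getCommitted() / mib) + " MiB, max=" + max;
    }

    public static void main(String[] args) {
        // Try running with -Xmx4g: max jumps to about 4096 MiB while committed stays small.
        System.out.println(describe(heap()));
    }
}
```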

putting it all together

Let’s say I start up Java and give it an -Xmx param of 4GB. Let’s say that Java needs an additional 1GB of memory for all of the other Java things Java does (see the NMT stuff above). So on startup, Java will tell the OS that it wants 5GB of virtual space.

At this point, only the memory that is actively being used by Java will get a space on the physical memory device managed by the operating system. The other addresses we have set up will stay out of physical memory, because they’re so far unused.

Let’s say that the virtual memory address “0” is reserved to be the start of the heap. Now we have reserved the address range from “0” to “4G-1” for heap space. This means that our other 1GB of stuff will end up at address “4G” to “5G-1”

As Java starts up, it will need to use some of these “4G” to “5G-1” addresses. As we write to this virtual address space, the virtual addresses get mapped by the OS to actual physical addresses. “4G+512” might map to the physical address “2G+1024.” We won’t know where our actual allocations end up on the real system, because this is hidden by the OS. We call the actual mapped memory our resident memory. The diagram below shows this.

[Diagram: the JVM’s virtual memory (reserved heap, reserved other) mapped onto physical memory (other processes, committed java heap, committed non-heap native)]

Now let’s say I want some new Object(). Java now needs to commit some of its reserved heap space. As this happens, we will see the physical resident memory footprint increase.

Let’s say we commit the addresses “0” to “512”. We need to map these virtual addresses to physical memory. We might get addresses “4G+512” to “4G+1024” in physical memory, and we’ve increased our resident size by 512 bytes.

performance optimization

There’s one caveat to this: -Xms with -XX:+AlwaysPreTouch.

Maybe we don’t want to wait until we need our new Object() before we actually go and commit our heap space and get a physical allocation on the device in the resident memory.

That operation takes time, and if we’re latency critical we might want to get our memory sorted out on the OS before we need to start managing object creation.

If we know that our system needs 1GB of space upon startup, we can set -Xms to 1GB. If we use the -XX:+AlwaysPreTouch param, then when Java starts up, it will “touch” the virtual memory addresses in the heap from “0” to “1G” (I think by writing 0 to the address). This commitment of the reserved space will cause the OS to go ahead and assign the 1GB in physical memory. We can then skip that step later on when we want to actually use the space.

This can help us ensure that the allocation is contiguous, and can help us prevent any virtual memory juggling on the OS side when we’re ready to use our memory.
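If you want to confirm at runtime that a process was really started with these options, the RuntimeMXBean exposes the JVM's input arguments. This is only a sketch with hypothetical helper names, and it only inspects the flags; it does not measure whether the pages were actually touched.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

final class PreTouchCheck {
    /** True if the exact flag was passed to the JVM. */
    static boolean hasFlag(List<String> jvmArgs, String flag) {
        return jvmArgs.contains(flag);
    }

    /** Returns the value given to -Xms (for example "1g"), or null if it was not set. */
    static String initialHeapSetting(List<String> jvmArgs) {
        String value = null;
        for (String arg : jvmArgs) {
            if (arg.startsWith("-Xms")) {
                value = arg.substring("-Xms".length()); // the last occurrence wins
            }
        }
        return value;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("-Xms value:      " + initialHeapSetting(jvmArgs));
        System.out.println("AlwaysPreTouch:  " + hasFlag(jvmArgs, "-XX:+AlwaysPreTouch"));
    }
}
```

Starting the JVM with -Xms1g -XX:+AlwaysPreTouch and comparing the resident size against a run without the flag is a quick way to see the effect described above.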

garbage collection and native memory

When we use the Java heap correctly, we avoid memory issues by relying on the garbage collector to free up space as it’s needed.

However, when the garbage collector frees memory, it makes room for additional allocations in the heap reservation, but it might not reduce the resident size.

The resident size doesn’t get released for reasons I’ll get into later. For now, think of it like the JVM process knows it might need that memory in the future, so it won’t tell the OS it’s done with it yet. In this sense, the memory has been committed, and we don’t want to have to reclaim it from the OS if we know we might re-use that space for a different purpose later.
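A rough way to watch this from inside the JVM is to create some garbage, request a collection, and compare used against committed. This is a sketch with a made-up class name. System.gc() is only a request, and whether committed shrinks afterwards depends on the collector and its settings, so treat the output as an observation rather than a guarantee.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

final class GcFootprint {
    /** Allocates roughly the given number of MiB, drops it, requests a GC, and returns the heap usage. */
    static MemoryUsage churnAndCollect(int megabytes) {
        byte[][] blocks = new byte[megabytes][];
        for (int i = 0; i < megabytes; i++) {
            blocks[i] = new byte[1024 * 1024]; // 1 MiB of soon-to-be garbage
        }
        blocks = null;  // nothing references the arrays any more
        System.gc();    // a request, not a command
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage before = memory.getHeapMemoryUsage();
        MemoryUsage after = churnAndCollect(32);
        System.out.println("before: used=" + before.getUsed() + " committed=" + before.getCommitted());
        System.out.println("after:  used=" + after.getUsed() + " committed=" + after.getCommitted());
        // Used usually falls back after the collection; committed often stays
        // at or above where it was before the allocation.
    }
}
```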

The diagram below is an overly simplified example of how garbage collection keeps our memory usage within our defined limits, but might not impact resident size.

[Diagram: the JVM’s virtual memory (reserved heap, reserved other) mapped onto physical memory (other processes, committed java heap, committed non-heap native)]

In a perfect world, we can rely on garbage collection to prevent over-commitment of memory onto the physical memory. As long as we keep our heap use within our -Xmx param, we can predict roughly how much memory the Java process will consume.

If we’ve set the heap to a limit that is small enough to fit on our physical system, we shouldn’t have to worry about native memory exhaustion.

native memory exhaustion

In the diagram above we can watch how the heap memory gets magically cleaned up and recycled in the resident space as necessary. However, the native memory that was initially allocated (shown in pink) never goes away (neither do the blue committed spaces for heap, but they are re-used so let’s not get into that yet).

This is because any non-heap native memory that is used by Java is managed separately from the Java heap. Garbage collection does not clean up the memory allocated in this space.

This is generally OK if the overall memory use is low, but it can be problematic with memory-bound applications.

Let’s say we are using JNI code to allocate a large amount of memory. Because the virtual memory used by the Java process is not limited, we can allocate as much native memory as we’d like (until the process is killed). Every time we write to this memory, we expand our footprint in the resident memory on the physical device.

[Diagram: the JVM’s virtual memory (reserved heap, reserved other) mapped onto physical memory (other processes, committed java heap, committed non-heap native)]

After we’ve committed so much native memory space, there are a few things that could go wrong.

Failure case 1: We could try to allocate more native memory, and not have enough physical memory for the allocation.

It’s unlikely you’ll just over-allocate native memory at the start of your application. It’s more likely your application code will be allocating native memory and heap memory throughout the lifetime of the running process. This will make identifying this problem a bit more difficult. This also might increase fragmentation of your process memory.

This is demonstrated in the following diagram.

[Diagram: the JVM’s virtual memory (reserved heap, reserved other) mapped onto physical memory (other processes, committed java heap, committed non-heap native)]

Failure case 2: Java might try to commit some of the reserved heap memory, and find that there’s no more physical memory available.

In this scenario, we haven’t committed all of the heap yet, and we start to allocate too much native memory. When the JVM goes to allocate and commit physical memory for so-far-unused heap, the OS will not have enough physical memory to accommodate the JVM’s need.

This is demonstrated in the following diagram.

[Diagram: the JVM’s virtual memory (reserved heap, reserved other) mapped onto physical memory (other processes, committed java heap, committed non-heap native)]

To prevent these failures, it’s important to understand not only how much heap your application will be using, but also how much native memory it will use. If you are using JNI code and allocating a lot of memory throughout the life of the application, it might make sense to dial back your heap size.

malloc and free (or new and delete in c++)

There’s one big thing I’m omitting from all of this discussion, and that’s what it means to “allocate native memory.” Also I previously mentioned that the heap memory that is first committed and then freed isn’t actually released back to the operating system.

The reason for this relates to malloc. The JVM you’re using is likely implemented in C++. That implementation will likely either use malloc directly when allocating native memory for Java internals and for the heap, or use new, which typically calls malloc under the hood.

The retention of process memory, both for the heap and for anything else you allocate in JNI code with malloc or new and release with free or delete, comes from the internals of the malloc library call.

There’s a whole world of malloc implementations and this gets to be very complex indeed. In general, as a Java developer, you wouldn’t need to worry about this. However, if you’re using a lot of JNI code, you might quickly run into problems with the internals of the malloc library.

That’s all I’ll say about this for now, as this can easily be the topic of another post. For now, if you’re interested in learning more, see the glibc malloc internals wiki page or read about my personal favorite, jemalloc.
