It's well known that a Java application (the JVM) typically won't release much memory back to the operating system once it has warmed up, even when the application is later lightly loaded or idle.
If you plot the memory usage (or the resident set size) of a Java application, it typically looks like a mostly flat line after an upslope at the beginning. At a low level, this corresponds to the memory pages of the Java heap gradually getting allocated, and once all the pages are allocated, the memory usage stays mostly flat even when a large portion of the heap is unused * **.
This can be a problem if a Java application is run on a non-dedicated system (a server or desktop) where it co-exists with other (non-Java) applications. In a non-dedicated system, one application that's not playing nice with others by dominating the memory can slow down the other applications, or prevent them from running.
This is where an experimental JVM feature, DeallocateHeapPages, that I worked on comes in. It causes the underlying memory pages that correspond to the unused (free) parts of the heap to be deallocated (released) and helps reduce the memory usage of a Java application. Internally, it calls the system call madvise(MADV_DONTNEED) for the bodies of free chunks in the old generation without unmapping the heap address space.
Another way to look at this is that this feature makes the memory usage of a Java application behave more like that of a C/C++ application where the process memory usage is more in line with the memory actually used by the application.
This has been very useful for servers and desktop tools that we have at Google and has saved a significant amount of memory (RAM).
The implementation currently supports the concurrent mark sweep (CMS) collector and the Linux platform.
Here's the email thread on the OpenJDK mailing list and a link to the JVM patch:
The patch hasn't been accepted (yet) because support for all the other OS platforms is deemed necessary for acceptance, and the patch currently lacks it. I might be able to address that at some point, if I have the time and resources to make it happen.
* For simplicity, I am ignoring memory use other than the heap, such as the native C heap and the thread stacks, as the heap is usually by far the largest consumer of memory.
** Though the serial garbage collector (-XX:+UseSerialGC) of the JVM can occasionally shrink the heap and return memory, it's almost never used in production for obvious performance reasons. The parallel collector and the concurrent mark sweep (CMS) collector, which are often used in production, almost never shrink the heap and return memory, in my experience.