It's well known that a Java application (the JVM) typically won't release much memory back to the operating system once it has warmed up, even when the application is later lightly loaded or idle.
If you plot the memory usage (or the resident set size) of a Java application, it typically looks like a mostly flat line after an upslope at the beginning. At a low level, this corresponds to the memory pages of the Java heap gradually getting allocated, and once all the pages are allocated, the memory usage stays mostly flat even when a large portion of the heap is unused * **.
This can be a problem if a Java application is run on a non-dedicated system (a server or desktop) where it co-exists with other (non-Java) applications. In a non-dedicated system, one application that's not playing nice with others by dominating the memory can slow down the other applications, or prevent them from running.
This is where an experimental JVM feature, DeallocateHeapPages, that I worked on comes in. It causes the underlying memory pages that correspond to the unused (free) parts of the heap to be deallocated (released) and helps reduce the memory usage of a Java application. Internally, it calls the system call madvise(MADV_DONTNEED) for the bodies of free chunks in the old generation without unmapping the heap address space.
Another way to look at this is that this feature makes the memory usage of a Java application behave more like that of a C/C++ application where the process memory usage is more in line with the memory actually used by the application.
This has been very useful for servers and desktop tools that we have at Google and has saved a significant amount of memory (RAM).
The implementation currently supports the concurrent mark sweep (CMS) collector and the Linux platform.
Here's the email thread on the OpenJDK mailing list and a link to the JVM patch:
The patch hasn't been accepted (yet) because support for all the other OS platforms is deemed necessary for acceptance, and the patch currently lacks it. I might be able to address that at some point, if I have the time and resources to make it happen.
* For simplicity, I am ignoring memory use other than the heap, such as the native C heap and the thread stacks, as the heap is usually by far the largest consumer of memory.
** Though the serial garbage collector (-XX:+UseSerialGC) of the JVM can occasionally shrink the heap and return memory, it's almost never used in production for obvious performance reasons. The parallel collector and the concurrent mark sweep (CMS) collector, which are often used in production, almost never shrink the heap and return memory, in my experience.