Two JVM flags control how many GC threads the JVM uses: ParallelGCThreads and ParallelCMSThreads.
ParallelGCThreads
This flag controls the number of threads used by the parallel garbage collector, including the young generation collector used by default. If the parallel GC is enabled explicitly (-XX:+UseParallelGC) or turned on by default on a 'server-class' machine, this is the flag you care about with regard to the number of GC threads. Here's the formula that decides how many GC threads the JVM uses by default on Linux/x86, where ncpus is the number of CPUs:
ParallelGCThreads = (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8)
Some examples are:
When ncpus=4, ParallelGCThreads=4
When ncpus=8, ParallelGCThreads=8
When ncpus=16, ParallelGCThreads=13
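To make the integer arithmetic easy to check, here is a minimal Java sketch of the formula above. The class and method names are my own, purely for illustration; this is not the JVM's actual source, only a transcription of the formula as quoted.

```java
// Illustrative transcription of the default ParallelGCThreads formula (Linux/x86).
final class ParallelGcThreadsDefault {
    // ncpus: number of CPUs visible to the JVM. The method name is hypothetical.
    static int defaultParallelGcThreads(int ncpus) {
        // Up to 8 CPUs: one GC thread per CPU.
        // Beyond 8 CPUs: 3 plus five eighths of the CPU count (integer division).
        return (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8);
    }

    public static void main(String[] args) {
        for (int ncpus : new int[] {4, 8, 16}) {
            System.out.println("ncpus=" + ncpus
                    + " -> ParallelGCThreads=" + defaultParallelGcThreads(ncpus));
        }
        // The same formula applied to the CPU count this JVM reports.
        int here = Runtime.getRuntime().availableProcessors();
        System.out.println("this machine: ncpus=" + here
                + " -> ParallelGCThreads=" + defaultParallelGcThreads(here));
    }
}
```

Note that the division truncates, so on a 32-CPU machine the same formula gives 3 + 160 / 8 = 23 threads.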
One rationale I can think of for using fewer GC threads than cores on machines with many cores is that parallel GC does not scale perfectly, so the extra threads did not help or even degraded performance.
ParallelCMSThreads
This flag controls the number of threads used by the CMS (concurrent mark and sweep) garbage collector (-XX:+UseConcMarkSweepGC). CMS is often used to minimize server latency by running the old generation GC mostly concurrently with the application threads. Even when CMS is used for the old generation heap, a parallel GC is still used for the young generation heap, so the value of ParallelGCThreads still matters. Here's how the default value of ParallelCMSThreads is computed on Linux/x86:
ParallelCMSThreads = (ParallelGCThreads + 3) / 4
Some examples are:
When ncpus=4, ParallelCMSThreads=1
When ncpus=8, ParallelCMSThreads=2
When ncpus=16, ParallelCMSThreads=4
Typically, when the CMS GC is active, the CMS threads occupy that many cores, and the rest of the cores are available for application threads. For example, on an 8-core machine, since ParallelCMSThreads is 2, the remaining 6 cores are available for application threads. (As a side note, because all the threads in the JVM have the same scheduling priority at the POSIX thread level under Linux/x86, the CMS threads are not necessarily running on cores all of the time.)
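The CMS formula and the core split described above can be sketched the same way. Again, the class and method names are hypothetical, and coresLeftForApplication simply assumes every CMS thread keeps one core busy, which, per the side note, is an idealization.

```java
// Illustrative sketch: default ParallelCMSThreads and the cores left for the application.
final class CmsThreadsDefault {
    // Default ParallelGCThreads formula for Linux/x86, as quoted in this post.
    static int defaultParallelGcThreads(int ncpus) {
        return (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8);
    }

    // Default ParallelCMSThreads, derived from ParallelGCThreads (integer division).
    static int defaultParallelCmsThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    // Idealized estimate: assumes each CMS thread occupies exactly one core.
    static int coresLeftForApplication(int ncpus) {
        return ncpus - defaultParallelCmsThreads(defaultParallelGcThreads(ncpus));
    }

    public static void main(String[] args) {
        for (int ncpus : new int[] {4, 8, 16}) {
            int cms = defaultParallelCmsThreads(defaultParallelGcThreads(ncpus));
            System.out.println("ncpus=" + ncpus
                    + " -> ParallelCMSThreads=" + cms
                    + ", cores left for application=" + coresLeftForApplication(ncpus));
        }
    }
}
```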
Takeaways
Here are the takeaways for GC tuners out there:
- Since ParallelCMSThreads is computed based on the value of ParallelGCThreads, overriding ParallelGCThreads when using CMS affects ParallelCMSThreads and the CMS performance.
- Knowing how the default values of the flags are computed helps you better tune both the parallel GC and the CMS GC. Since the Sun JVM engineers probably determined the default values empirically in a certain environment, they may not necessarily be the best for your environment.
- If you have worked around some multithreaded CMS crash bug in older Sun JDKs by running it single-threaded (for example this one), the workaround would have caused a tremendous performance degradation on many-core machines. So, if you run a newer JDK and still use the workaround, it's time to get rid of it and allow CMS to take advantage of multiple cores.
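To illustrate the first two takeaways, here is a rough Java sketch that shows how overriding ParallelGCThreads changes the derived ParallelCMSThreads on a 16-CPU machine, and then asks the running JVM for its effective values. The class and method names are made up for this example. The lookup uses HotSpotDiagnosticMXBean, so it assumes a HotSpot-based JVM; a JVM that does not know a flag (for instance one that does not ship CMS) will report it as unavailable.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;
import java.util.Optional;

// Illustrative sketch: effect of overriding ParallelGCThreads, plus a live flag lookup.
final class GcThreadOverrideSketch {
    // Default formulas for Linux/x86, as quoted in this post.
    static int defaultParallelGcThreads(int ncpus) {
        return (ncpus <= 8) ? ncpus : 3 + ((ncpus * 5) / 8);
    }

    static int defaultParallelCmsThreads(int parallelGcThreads) {
        return (parallelGcThreads + 3) / 4;
    }

    // Returns the effective value of a VM flag, or empty if this JVM does not expose it.
    static Optional<String> effectiveFlag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return Optional.empty();
            }
            return Optional.ofNullable(bean.getVMOption(name).getValue());
        } catch (IllegalArgumentException e) {
            // Unknown flag, or the diagnostic bean is not supported on this JVM.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        int ncpus = 16; // example machine size
        int byDefault = defaultParallelGcThreads(ncpus);
        System.out.println("default:  ParallelGCThreads=" + byDefault
                + " -> ParallelCMSThreads=" + defaultParallelCmsThreads(byDefault));

        int overridden = 1; // as if started with -XX:ParallelGCThreads=1
        System.out.println("override: ParallelGCThreads=" + overridden
                + " -> ParallelCMSThreads=" + defaultParallelCmsThreads(overridden));

        for (String flag : new String[] {"ParallelGCThreads", "ParallelCMSThreads"}) {
            System.out.println(flag + " in this JVM: "
                    + effectiveFlag(flag).orElse("not available"));
        }
    }
}
```

With the defaults, the 16-CPU case gets 13 parallel GC threads and 4 CMS threads; forcing ParallelGCThreads down to 1 also drops the derived CMS thread count to 1, which is the kind of side effect the first takeaway warns about.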