Friday, April 10, 2009

An event dispatch bug in JVMTI

I encountered an event dispatch bug in JVMTI. See here for the communication on the openjdk mailing list. Here's a summary of what happened.

According to the JVMTI spec, no JVMTI events should be sent during the JVMTI dead phase (after the VMDeath event has been sent). However, I observed CompiledMethodLoad and CompiledMethodUnload events being sent during the dead phase, after the Agent_OnUnload callback had already run. These compile events were for the last Java method that was JIT-compiled. This can cause a nasty memory corruption bug, because Agent_OnUnload is usually where a JVMTI agent deallocates its data structures, and the callback handlers for these compile events then touch the already-deallocated data structures.

After looking into the HotSpot code, I noticed that event dispatch and JVMTI phase changes are not synchronized at all (i.e., there are race conditions). In theory, this bug can happen for any event, not just the two compile events. In practice, it would probably show up for the compile events because those are triggered by the compiler threads rather than application threads. I was able to suppress this bug in two ways.
  1. By not deallocating memory (perhaps only the memory related to the compile events) in Agent_OnUnload. That way, late event callback handlers only touch still-valid memory.
  2. By adding extra synchronization in the HotSpot JVMTI code (the details are in the mailing list log).
Option 1 is the more practical approach when you cannot change the VM code or you want to stay portable. Option 2 is harder, since it's not obvious what the performance implications would be, and because once we start fixing these race conditions, we find more race conditions that need fixing.
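Here is a minimal sketch of option 1 in plain C. It does not include jvmti.h, so it compiles on its own. The names agent_on_load, agent_on_unload and on_compiled_method_load are hypothetical stand-ins for Agent_OnLoad, Agent_OnUnload and the CompiledMethodLoad callback, and the agent_state struct is made up for illustration. The "unloaded" flag is an extra precaution I am assuming here, not something the workaround strictly requires.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical agent state: whatever the compile-event handlers touch. */
typedef struct {
    atomic_long compiled_methods_seen;
} agent_state;

static agent_state *g_state = NULL;     /* allocated once, never freed */
static atomic_bool g_unloaded = false;  /* set when the unload hook runs */

/* Stand-in for Agent_OnLoad: allocate the state the handlers will use. */
int agent_on_load(void) {
    g_state = calloc(1, sizeof *g_state);
    if (g_state == NULL) {
        return -1;
    }
    atomic_init(&g_state->compiled_methods_seen, 0);
    return 0;
}

/* Stand-in for the CompiledMethodLoad callback. It may run after unload. */
void on_compiled_method_load(const char *method_name) {
    assert(method_name != NULL);
    (void)method_name;
    if (atomic_load(&g_unloaded)) {
        return;  /* late event: ignore it, the state is still valid memory */
    }
    atomic_fetch_add(&g_state->compiled_methods_seen, 1);
}

/* Stand-in for Agent_OnUnload: mark the agent as unloaded, free nothing. */
void agent_on_unload(void) {
    atomic_store(&g_unloaded, true);
    /* Deliberately no free(g_state): a late callback may still read it. */
}

/* Accessor used to inspect the counter. */
long compiled_methods_seen(void) {
    return atomic_load(&g_state->compiled_methods_seen);
}
```

The leak is intentional and bounded: the process is exiting anyway, so the OS reclaims the memory. The flag alone would not be enough if the state were freed, because a handler could pass the check just before the free.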

As far as I can tell, the same race conditions exist when updating event callback handlers (SetEventCallbacks) and when enabling or disabling individual events (SetEventNotificationMode). So what does this mean? It means that, if you are a JVMTI agent writer, your request to change the event callback handlers in the middle of an application run, or to disable an event temporarily and re-enable it later, may not be honored due to the race conditions. Scary? Yes, especially on modern multicore machines.
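One way an agent might reduce its exposure is to register a single, never-changing callback with the VM and do the switching on the agent side through an atomic function pointer. The sketch below is plain C without jvmti.h. The names trampoline, set_handler, handler_a and handler_b are all hypothetical, and the trampoline stands in for the one function that would be passed to SetEventCallbacks. This is an assumption about how an agent could be structured, not something taken from the mailing list thread.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical handler signature; a real JVMTI callback takes more arguments. */
typedef void (*event_handler)(const char *method_name);

/* The handler currently in effect; NULL means "event disabled". */
static _Atomic(event_handler) g_handler = NULL;

/* Agent-side replacement for swapping callbacks or toggling the event. */
void set_handler(event_handler h) {
    atomic_store(&g_handler, h);
}

/* The only function registered with the VM; it is never replaced. */
void trampoline(const char *method_name) {
    event_handler h = atomic_load(&g_handler);
    if (h != NULL) {
        h(method_name);
    }
}

/* Two sample handlers that only count their invocations. */
static atomic_int g_calls_a = 0;
static atomic_int g_calls_b = 0;

void handler_a(const char *method_name) {
    (void)method_name;
    atomic_fetch_add(&g_calls_a, 1);
}

void handler_b(const char *method_name) {
    (void)method_name;
    atomic_fetch_add(&g_calls_b, 1);
}
```

This narrows the window rather than closing it: a thread that loaded the old pointer just before set_handler ran will still call the old handler once, so handlers must remain safe to call after they have been replaced.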