Wednesday, December 31, 2008

A server compiler crash during loop optimization

I ran into a server compiler crash that happens in a Google server. Here's a link to more details:

It appears to happen in the split-if graph transformation during loop optimization. It only happens in a big Java server, and I couldn't create a small test that reproduces it. As a workaround, I wrote a small patch that detects the crash condition early and aborts JIT compilation of that particular method.

Unfortunately, it is often the case that JVM crashes only happen in a full application run (production code) and are not reproducible in tests. One possible reason is that only some particular method inlining patterns trigger the crash during JIT compilation.

JNI crashes in FontManager code

I hit a JVM crash happening in the Java 2D font manager code. Here's the stack trace at the crash point:

Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V []
C []
C [] FT_Stream_Close+0x19
C [] FT_Stream_Free+0x25
C []
C [] FT_Done_Face+0x78
C []

This appeared to be due to two JNI pitfalls:
  • The JNIEnv is unique to the thread. It cannot be saved by one thread and reused by another. Use GetEnv instead.
  • The font2D jobject in freetypeScaler.c needs to be converted into a global reference because its lifetime exceeds the lifetime of a native method call.
BTW, here's a link to one of the JNI references:

Martin Buchholz and I suggested a patch, but here's the fix that was actually submitted by Igor Nekrestyanov:

Thanks, guys.

Java 2D incompatibility between Sun JDK and OpenJDK

When Sun JDK was open-sourced as OpenJDK, the Java 2D font renderer component was replaced with an open source version. The replacement appeared to have a bug: a gap in the vertical position of rendered fonts. I reported it on the OpenJDK mailing list, and a bug was filed at:

The results from the test (included in the bug page):

Sun JDK: x=5.78125,y=-47.796875,w=633.71484,h=57.515625
OpenJDK: x=5.78125,y=0.0,w=637.21875,h=96.34375

As you can see, the vertical positions are significantly off.
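
The actual test is on the bug page and is not reproduced here. As a minimal sketch of how numbers of this shape could be obtained, the following measures the bounds of a laid-out string with TextLayout.getBounds(). The class name, sample text, font, and size are all assumptions, so it will not reproduce the figures above.

```java
import java.awt.Font;
import java.awt.font.FontRenderContext;
import java.awt.font.TextLayout;
import java.awt.geom.Rectangle2D;

// Hypothetical probe; not the test attached to the bug report.
public class TextBoundsProbe {

    // Formats a rectangle in the same x/y/w/h style as the results quoted above.
    static String describe(Rectangle2D r) {
        return "x=" + (float) r.getX() + ",y=" + (float) r.getY()
             + ",w=" + (float) r.getWidth() + ",h=" + (float) r.getHeight();
    }

    // Lays out the text and returns its bounds relative to the layout origin.
    static Rectangle2D measure(String text, Font font) {
        // No transform, antialiasing on, fractional metrics on.
        FontRenderContext frc = new FontRenderContext(null, true, true);
        return new TextLayout(text, font, frc).getBounds();
    }

    public static void main(String[] args) {
        // Assumed font and size; the original test may have used something else.
        Font font = new Font(Font.SERIF, Font.PLAIN, 72);
        System.out.println(describe(measure("Hello, Java 2D", font)));
    }
}
```

The bounds are relative to the layout's origin on the baseline, so text that extends above the baseline is expected to report a negative y. Running the same probe on two JDKs gives a quick side-by-side comparison.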

A fix was provided in OpenJDK7 by Sun folks:

Thanks to Phil Race and Igor Nekrestyanov.

Stabilizing AsyncGetCallTrace

AsyncGetCallTrace() is the unofficial interface for low-overhead CPU profiling in the JVM. It allows CPU profilers to use a signal (SIGPROF) to collect samples of stack traces. Unlike the many Java profiling tools that instrument bytecode, it has much lower runtime overhead. The problem was that it was a bit unstable and sometimes caused JVM crashes in OpenJDK6. So, I backported the change

into OpenJDK6 b11 (the patch attached) to avoid JVM crashes in AsyncGetCallTrace(). The change appears to have been first introduced in OpenJDK7 b27 (Hotspot v13-b01).

I sent the patch to the OpenJDK mailing list, but unfortunately it has not been accepted yet.

[Note: the patch is in Hotspot V14, so upgrading to a JDK based on it will fix the crashes.]
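
To illustrate the general idea of sample-based profiling (as opposed to instrumenting bytecode), here is a minimal sketch in plain Java. It does not use AsyncGetCallTrace(), which is a native interface meant to be called from a signal handler. It swaps in Thread.getStackTrace() polled from another thread, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sampler; a stand-in for the sampling idea, not for AsyncGetCallTrace().
public class StackSampler {

    // Takes `samples` snapshots of the target thread's stack, `intervalMillis` apart,
    // and counts how often each method shows up as the top frame.
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> hits = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            StackTraceElement[] stack = target.getStackTrace();
            if (stack.length > 0) {
                String top = stack[0].getClassName() + "." + stack[0].getMethodName();
                hits.merge(top, 1, Integer::sum);
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
        }
        return hits;
    }

    // Assumed CPU-bound workload so the sampler has something to observe.
    static volatile boolean running = true;
    static volatile double sink;

    static void spin() {
        double x = 0;
        while (running) {
            x += Math.sqrt(x + 1);
            sink = x;
        }
    }

    public static void main(String[] args) {
        Thread worker = new Thread(StackSampler::spin, "worker");
        worker.setDaemon(true);
        worker.start();
        Map<String, Integer> hits = sample(worker, 50, 10);
        running = false;
        hits.forEach((method, count) -> System.out.println(count + "\t" + method));
    }
}
```

The profiled code runs unmodified, and the cost is paid only once per sample. A signal-driven interface takes the same approach but captures the stack of whichever thread receives the signal.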

Inaccurate permgen usage stats (jmap -permstat)

jmap is a JVM memory inspection tool. "jmap -permstat pid" is the option that shows stats on the permgen (permanent generation) heap inside the JVM, where class metadata and interned strings are allocated.
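
For comparison, pool-level usage can also be read from inside a running JVM through the standard java.lang.management API. The following is a minimal sketch, not what jmap does internally. The class name NonHeapPools and the name fragments are assumptions for illustration: HotSpot builds with a permanent generation expose a pool whose name contains "Perm Gen", while later releases replaced permgen with "Metaspace".

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Hypothetical helper for reading non-heap pool usage from inside the JVM.
public class NonHeapPools {

    // Sums the used bytes of all non-heap pools whose name contains the given fragment.
    static long usedBytes(String nameFragment) {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP
                    && pool.getName().contains(nameFragment)) {
                MemoryUsage usage = pool.getUsage(); // may be null for an invalid pool
                if (usage != null) {
                    total += usage.getUsed();
                }
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Pool names differ between JVM versions and collectors, so list them all.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            long used = (usage == null) ? 0 : usage.getUsed();
            System.out.println(pool.getType() + "\t" + pool.getName() + "\t" + used);
        }
        System.out.println("Perm: " + usedBytes("Perm"));
        System.out.println("Metaspace: " + usedBytes("Metaspace"));
    }
}
```

On a JVM with a permanent generation, usedBytes("Perm") gives one number to set beside the jmap output. On a JVM without a matching pool it simply returns 0.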

I happened to find a bug in jmap. In a small test, the "jmap -permstat" command reported only about 50-60% of the permgen memory usage (compared to the actual permgen usage that the "jmap" command reports). To fix it, I contributed the following patch to OpenJDK, which raises that figure to 80-90%:

Thanks to Martin Buchholz, who submitted the patch on my behalf, and to Swamy Venkataramanappa and Daniel Daugherty for reviewing it.