The patch + the commit message: http://hg.openjdk.java.net/jdk7/hotspot-comp/hotspot/rev/e8443c7be117

Sun bug:

http://bugs.sun.com/view_bug.do?bug_id=6921969

Mailing list posts:http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2010-January/002639.html

http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2010-February/002660.html

It applies well to the BigInteger.mulAdd loop:

private final static long LONG_MASK = 0xffffffffL; static int mulAdd(int[] out, int[] in, int offset, int len, int k) { long kLong = k & LONG_MASK; long carry = 0; offset = out.length-offset - 1; for (int j=len-1; j >= 0; j--) { long product = (in[j] & LONG_MASK) * kLong + (out[offset] & LONG_MASK) + carry; out[offset--] = (int)product; carry = product >>> 32; } return (int)carry; }

It improved an internal microbenchmark that uses BigInteger.mulAdd by 12%, and crypto.rsa and crypto.signverify in SPECjvm2008 by 7% and 2.3%, respectively in my measurements.

Here's low level details. On x86 (32 bit), a 64 bit long multiplication is implemented as a sequence of multiple 32 bit multiplications as in:

MOV EBP,ECX.lo IMUL EBP,EDX MOV EDX,ECX.hi IMUL EDX,EAX ADD EBP,EDX MUL EDX:EAX,ECX.lo ADD EDX,EBP

This is based on the following idea:

Result = x * y

is implemented as

lo(result) = lo(x_lo * y_lo)

hi(result) = hi(x_lo * y_lo) + lo(x_hi * y_lo) + lo(x_lo * y_hi)

where lo(x) indicates the lower 32 bits of x and hi(x) the higher 32 bits of x.

In the above mulAdd code, the higher 32 bits of both of operands are known to be zero because of the '&' operations. In such cases, some 32 bit multiplications are redundant. In the above formula, if x_hi=0 and y_hi=0,

lo(result) = lo(x_lo * y_lo)

hi(result) = hi(x_lo * y_lo)

because lo(x_hi * y_lo) = 0 and lo(x_lo * y_hi) = 0.

In this case, we need to emit only

MUL EDX:EAX,ECX.lo

which is one instruction instead of 7 and is faster to execute. There are variations of this where only one of x_hi and y_hi is zero.

I thank Chuck Rasbold, Tom Rodriguez, Christian Thalinger, and Vladimir Kozlov who reviewed this change.

## No comments:

Post a Comment