Saturday, June 26, 2010

Short/Character.reverseBytes compiler intrinsics

I worked on a small improvement to the HotSpot server compiler.

Here's the mailing list thread:

Here's the committed change:

The Short/Character.reverseBytes() methods swap the two bytes of a 2-byte short or char value. This is commonly needed when converting a value from big endian to little endian, or vice versa. The Short.reverseBytes() method looks like:

public static short reverseBytes(short i) {
  return (short) (((i & 0xFF00) >> 8) | (i << 8));
}
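As a small illustration of the endianness conversion mentioned above, here is a minimal sketch. The class and method names (EndianSketch, readLittleEndianShort) are hypothetical and only for illustration; it assumes the two input bytes hold a little-endian value and relies on ByteBuffer reading in big-endian order by default.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not part of the JDK change.
final class EndianSketch {
    // Interprets two bytes stored in little-endian order as a short.
    static short readLittleEndianShort(byte[] twoBytes) {
        // ByteBuffer defaults to big-endian, so this reads the bytes "the wrong way round".
        short bigEndianView = ByteBuffer.wrap(twoBytes).getShort();
        // Swapping the two bytes yields the little-endian interpretation.
        return Short.reverseBytes(bigEndianView);
    }

    public static void main(String[] args) {
        byte[] data = {0x34, 0x12}; // little-endian encoding of 0x1234
        System.out.println(Integer.toHexString(readLittleEndianShort(data)));
    }
}
```

In practice one could also set the buffer's byte order directly; the explicit swap is shown here only because it is the operation this post is about.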

Ordinarily, this method would most likely be compiled into a machine instruction sequence that includes a logical AND and two shift instructions. Since x86 has an instruction that does the byte swapping (BSWAP), why not let the compiler generate it through a special code generation rule (called a compiler intrinsic)? In fact, the Integer/Long.reverseBytes() methods already take advantage of this instruction, and this improvement is similar. On x86 (32 bit), the intrinsified Short.reverseBytes() looks like:

BSWAP  reg
SAR    reg, 16

Suppose the register contained four bytes B1 B2 B3 B4 (from the highest to the lowest on 32 bit). Since it was a short value (signed 2-byte integer), it must be either 0x00 0x00 B3 B4 (if the highest bit of B3 is 0) or 0xFF 0xFF B3 B4 (if the highest bit of B3 is 1). What we want eventually is either 0x00 0x00 B4 B3 (if the highest bit of B4 is 0) or 0xFF 0xFF B4 B3 (if B4's highest bit is 1).

The BSWAP instruction reverses the byte order of the word (four bytes) in register reg and produces B4 B3 B2 B1 in the register. Then, the SAR instruction shifts the word to the right by 2 bytes so that the lower two bytes are B4 B3. Since SAR fills the higher bits with the same bit as B4's highest bit, we get what we want.
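The reasoning above can be checked in plain Java. The following is a minimal sketch, not the HotSpot code itself: the class and method names (ShortSwapSketch, swapViaBswapSar) are made up for illustration, Integer.reverseBytes() stands in for BSWAP, and the >> operator stands in for SAR.

```java
// Hypothetical demo class; models the two-instruction sequence in Java.
final class ShortSwapSketch {
    static short swapViaBswapSar(short s) {
        int reg = s;                      // sign-extended: 0x0000B3B4 or 0xFFFFB3B4
        reg = Integer.reverseBytes(reg);  // models BSWAP: register now holds B4 B3 B2 B1
        reg = reg >> 16;                  // models SAR reg, 16: B4's top bit fills the upper half
        return (short) reg;
    }

    public static void main(String[] args) {
        // Compare against the library method for every possible short value.
        for (int i = Short.MIN_VALUE; i <= Short.MAX_VALUE; i++) {
            short s = (short) i;
            if (swapViaBswapSar(s) != Short.reverseBytes(s)) {
                throw new AssertionError("mismatch for " + i);
            }
        }
        System.out.println("all short values match");
    }
}
```

For example, 0x1280 becomes 0x80120000 after the byte reversal, and the arithmetic shift turns that into 0xFFFF8012, which is the sign-extended form of the swapped value 0x8012.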

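Character.reverseBytes() is handled likewise, except that the value needs to be treated as unsigned: the upper two bytes must end up as zeros, which calls for a logical right shift (SHR) rather than the arithmetic SAR.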
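The unsigned case can be modeled the same way. This is again only a sketch with hypothetical names (CharSwapSketch, swapViaBswapShr); it assumes the upper half of the register is zero-filled by a logical shift, which Java expresses with the >>> operator.

```java
// Hypothetical demo class; models the unsigned variant in Java.
final class CharSwapSketch {
    static char swapViaBswapShr(char c) {
        int reg = c;                      // zero-extended: 0x0000B3B4
        reg = Integer.reverseBytes(reg);  // models BSWAP: register now holds B4 B3 00 00
        reg = reg >>> 16;                 // logical shift: zeros fill the upper half
        return (char) reg;
    }

    public static void main(String[] args) {
        // Compare against the library method for every possible char value.
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            if (swapViaBswapShr(c) != Character.reverseBytes(c)) {
                throw new AssertionError("mismatch for " + i);
            }
        }
        System.out.println("all char values match");
    }
}
```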

In a microbenchmark bundled with the JDK, jdk/test/java/nio/Buffer/SwapMicroBenchmark, the char and short byte swap methods sped up by 32% and 42%, respectively, with this improvement.

Thanks to my colleagues Chuck Rasbold (who put up the webrev on my behalf), Martin Buchholz (who worked on the JDK change to make it effective), Tom Rodriguez (who reviewed the change and finished the SPARC implementation), and Christian Thalinger (who also reviewed the change).
