Fine Tuning Java Code - JVM An 'int' Oriented Architecture
Summary
To avoid integer type conversions inside the JVM, and so improve speed, prefer the int primitive type to short, byte and char wherever possible.
Java virtual machine: an 'int' oriented architecture
Tip 2 (int versus short/byte/char)
Although the Java language defines several primitive types narrower than int (short, char, byte and boolean), internally the JVM performs its computations on them as int values. It uses specific instructions to narrow an int to one of the smaller integer types: i2c (int to char), i2b (int to byte) and i2s (int to short). It follows that using byte, short or char may slow down runtime execution.
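As a small illustration of these three conversions, the sketch below wraps each cast in its own method. The class and method names are made up for this example, and the instruction named in each comment is what javac typically emits for the cast; it can be checked with javap -c.

```java
class NarrowingCasts {
    // Each cast is typically compiled to the instruction named in the comment.
    static char toChar(int v)   { return (char) v; }   // i2c: keep low 16 bits, zero-extend
    static byte toByte(int v)   { return (byte) v; }   // i2b: keep low 8 bits, sign-extend
    static short toShort(int v) { return (short) v; }  // i2s: keep low 16 bits, sign-extend

    public static void main(String[] args) {
        int v = 0x1FF80;                        // does not fit in 16 bits
        System.out.println((int) toChar(v));    // 65408 (0xFF80, unsigned)
        System.out.println(toByte(v));          // -128  (0x80, signed)
        System.out.println(toShort(v));         // -128  (0xFF80, signed)
    }
}
```

The same int yields three different results because char is unsigned while byte and short are signed.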
The only area where byte, short and char values might exist in the JVM is in arrays. A JVM may use compacted representations for arrays of such types. There are therefore dedicated instructions to read and write them: baload (to load a byte from a byte[]), bastore (to store a byte into a byte[]), saload and sastore (to read/write a short[]), caload and castore (to read/write a char[]). Notice that there is NO dedicated instruction for boolean[], which is accessed with the byte[] instructions (for an application that uses a huge number of booleans, such as automation processes, it could be valuable to add two instructions, zaload and zastore, in order to minimise the space taken by a boolean[]).
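The following sketch exercises each of these array instructions. The class and method names are hypothetical; the comments name the instructions javac typically emits for each access.

```java
class SmallArrayAccess {
    static int roundTripByte(int v) {
        byte[] a = new byte[1];
        a[0] = (byte) v;    // i2b, then bastore
        return a[0];        // baload: the element comes back sign-extended to int
    }

    static int roundTripShort(int v) {
        short[] a = new short[1];
        a[0] = (short) v;   // i2s, then sastore
        return a[0];        // saload: sign-extended to int
    }

    static int roundTripChar(int v) {
        char[] a = new char[1];
        a[0] = (char) v;    // i2c, then castore
        return a[0];        // caload: zero-extended to int
    }

    static boolean roundTripBoolean(boolean v) {
        boolean[] a = new boolean[1];
        a[0] = v;           // bastore, the same instruction as for byte[]
        return a[0];        // baload
    }

    public static void main(String[] args) {
        System.out.println(roundTripByte(200));      // -56
        System.out.println(roundTripShort(40000));   // -25536
        System.out.println(roundTripChar(-1));       // 65535
        System.out.println(roundTripBoolean(true));  // true
    }
}
```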
Let's illustrate all this with some examples.
Let's say that a designer knows that an int variable is bounded to [-128..+127], and therefore decides to declare a local variable of type byte. This example illustrates the difference between using a local of type int and a local of type byte.
int i = 127;
byte b = (byte) i ;
i++;
b++;
The compiler generates the following bytecode (the number before each colon is the hexadecimal offset of the instruction):
....
1B // 3 : iload_1
91 // 4 : i2b
3D // 5 : istore_2
84 01 01 // 6 : iinc local variable 1 by 1
1C // 9 : iload_2
04 // A : iconst_1
60 // B : iadd
91 // C : i2b
3D // D : istore_2
....
The instruction at offset 4 converts i (local variable 1) into a byte using the i2b bytecode.
The instruction at offset 6 performs the ++ on i, incrementing its value by one. This is done in ONE instruction (iinc takes two operands: the local variable index and the increment). The next 5 instructions, at offsets 9 to D, perform the same ++ increment on local variable 2 (i.e. b). Let's consider why it costs so much to do an increment on a byte.
To be operated on, a local variable must first be pushed onto the Java operand stack, which is int oriented. Because there is no instruction for byte addition, iadd is used instead: local variable 2 is pushed, the constant 1 is pushed too, then iadd adds the two ints. A conversion is then required to bring the result back into the [-128..+127] range of the byte type. This is done by the i2b at offset C, just before the new value is stored back into local variable 2.
Note that the space required for a byte local is the same as that required for a local variable of type int. So unless one really wants to constrain a local to [-128..+127], there is little reason to use byte for a local variable.
In the previous example, i is set to 127, so i++ sets i to 128. For b, adding 1 also gives 128, which is outside the byte range, so once b++ has executed b is -128. Indeed, i2b is a narrowing conversion: it discards all but the lowest 8 bits of the 32-bit int, then sign-extends the result so that the byte value is again represented as an int.
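The wrap-around can be observed directly with a runnable version of the example. This is a minimal sketch; the class and method names are made up for illustration.

```java
class ByteIncrement {
    static int incrementInt(int i) {
        i++;        // typically a single iinc instruction
        return i;
    }

    static byte incrementByte(byte b) {
        b++;        // behaves like b = (byte) (b + 1): int addition, then narrowing
        return b;
    }

    public static void main(String[] args) {
        System.out.println(incrementInt(127));           // 128
        System.out.println(incrementByte((byte) 127));   // -128
    }
}
```

Written out explicitly, b = b + 1 does not compile without a cast, because b + 1 is an int; the ++ operator applies the narrowing conversion implicitly.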
In Java, i2b could be formally coded as
int b = theByteAsAnInt & 0xFF;
if (b > 127) b |= 0xFFFFFF00;
// b is now the byte value coded as an int
For example, with the bit pattern 0x00000080:
(int) 0x00000080 --> 128
(byte) 0x00000080 --> -128 (coded as the int 0xFFFFFF80)
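To check that this formulation agrees with the real conversion, the sketch below compares it with the (byte) cast over a range of inputs. The class and method names are hypothetical.

```java
class I2bEmulation {
    // Emulates i2b: keep the lowest 8 bits, then sign-extend to 32 bits.
    static int i2b(int value) {
        int b = value & 0xFF;
        if (b > 127) {
            b |= 0xFFFFFF00;
        }
        return b;
    }

    // Returns true if the emulation matches the cast for every int in [from..to].
    static boolean matchesCast(int from, int to) {
        for (int v = from; v <= to; v++) {
            if (i2b(v) != (byte) v) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(i2b(0x00000080));           // -128
        System.out.println(matchesCast(-1000, 1000));  // true
    }
}
```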
In Java there is NO unsigned byte. So when working with values in the range [0..255], one can use an array of bytes, but extra code is needed to undo the sign extension performed by the JVM when it represents a byte as an int.
byte[] arrayOfBytes = .....
byte b = arrayOfBytes[n]; //b is ranged in [-128..+127]
int i = b&0xFF ; //i is ranged in [0..+255]
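A self-contained version of this idiom might look as follows. The class and method names are invented for the example; Byte.toUnsignedInt is a standard library method available since Java 8 that performs the same masking.

```java
class UnsignedBytes {
    // Undo the sign extension: map [-128..+127] onto [0..255].
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    // Sum the elements of a byte[] treating each one as an unsigned value.
    static int sumUnsigned(byte[] data) {
        int total = 0;
        for (byte b : data) {
            total += b & 0xFF;
        }
        return total;
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0x00, (byte) 0x7F, (byte) 0x80, (byte) 0xFF };
        System.out.println(toUnsigned(data[3]));           // 255
        System.out.println(Byte.toUnsignedInt(data[2]));   // 128
        System.out.println(sumUnsigned(data));             // 510
    }
}
```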
In brief: (i) at the source level, the designer is often obliged to add extra bit manipulation (such as &0xFF); (ii) the compiler adds instructions under the hood in order to simulate byte semantics (note: everything above applies to short and char too). So, unless byte is really required (for instance to keep a large array compact), it is better to use int, and int[] instead of byte[], because each time a byte gets in the way the JVM may have to do extra work at runtime.
