Fine Tuning Java Code - JVM: An 'int' Oriented Architecture

Summary

To avoid integer type conversions within the JVM, and so gain speed, prefer the int primitive type over short, byte and char wherever possible.

 

JAVA virtual machine : an 'int' oriented architecture

Tip 2 (int versus short/byte/char)


Although the JAVA language defines several small integral primitive data types (short, char and byte, plus boolean, which the JVM also represents as an int), internally the JVM performs its integer arithmetic on int values (long being the only other integral type with its own instructions). It uses specific instructions to convert an int to one of the narrower types: i2c (to turn an int into a char), i2b (to turn an int into a byte), i2s (to turn an int into a short). It follows that using byte, short or char may slow down runtime execution.
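The same int orientation is visible at the source level: arithmetic on byte, short or char operands yields an int expression. The sketch below makes this observable through overload resolution. The class and method names (PromotionDemo, typeOf and so on) are hypothetical and chosen only for illustration.

```java
// Illustrative sketch; all names here are hypothetical, not from the article.
final class PromotionDemo {
    // One overload per type, so the compiler's choice reveals the static type.
    static String typeOf(byte x)  { return "byte"; }
    static String typeOf(short x) { return "short"; }
    static String typeOf(char x)  { return "char"; }
    static String typeOf(int x)   { return "int"; }

    static String typeOfByteSum() {
        byte a = 10;
        byte b = 20;
        return typeOf(a + b);           // a + b is an int expression
    }

    static String typeOfNarrowedSum() {
        byte a = 10;
        byte b = 20;
        return typeOf((byte) (a + b));  // explicit cast back to byte
    }

    static String typeOfCharPlusOne() {
        char c = 'a';
        return typeOf(c + 1);           // char + int is an int expression
    }

    public static void main(String[] args) {
        System.out.println(typeOfByteSum());      // prints int
        System.out.println(typeOfNarrowedSum());  // prints byte
        System.out.println(typeOfCharPlusOne());  // prints int
    }
}
```

The explicit cast in typeOfNarrowedSum is the source-level counterpart of the i2b instruction.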

The only place where byte, short and char values exist as such in the JVM is inside arrays. A JVM may use compact representations for arrays of these types, so there are dedicated instructions to read and write them: baload (to load a byte from a byte[]), bastore (to store a byte into a byte[]), saload and sastore (to read/write a short[]), caload and castore (to read/write a char[]). Notice that there is NO dedicated instruction for boolean[]: boolean arrays are accessed with baload and bastore, as if they were byte[]. For an application that uses a huge number of booleans (automation processes, for example), it could be valuable to add two instructions, zaload and zastore, in order to minimise the space taken by boolean[].
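Where the space taken by a large number of booleans matters, a packed representation is already available in the standard library. The sketch below uses java.util.BitSet as one possible alternative to boolean[]. It is a library-level substitute, not the zaload/zastore instructions suggested above, and the class and method names (PackedFlags, everyThird) are hypothetical.

```java
import java.util.BitSet;

// Illustrative sketch; PackedFlags and everyThird are hypothetical names.
final class PackedFlags {
    // Returns a bit set in which every third flag (0, 3, 6, ...) below count is set.
    static BitSet everyThird(int count) {
        BitSet flags = new BitSet(count);
        for (int i = 0; i < count; i += 3) {
            flags.set(i);
        }
        return flags;
    }

    public static void main(String[] args) {
        BitSet flags = everyThird(10);
        System.out.println(flags.get(3));         // prints true
        System.out.println(flags.get(4));         // prints false
        System.out.println(flags.cardinality());  // prints 4
    }
}
```

Each access then goes through a method call and bit masking, so this trades some speed for space.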

Let's illustrate all this with some examples.

Let's say that a designer knows that an int variable is bounded within [-128..+127], and therefore decides to declare it as a local variable of type byte. This example illustrates the difference between using a local of type int and a local of type byte.

int i = 127;
byte b = (byte) i ;
i++;
b++;
          

The compiler generates the following bytecode:

....
1B           // 3   : iload_1 
91           // 4   : i2b 
3D           // 5   : istore_2 
84 01 01     // 6   : iinc local variable 1 by 1
1C           // 9   : iload_2 
04           // A   : iconst_1 
60           // B   : iadd 
91           // C   : i2b 
3D           // D   : istore_2 
....
          

Line 4 shows the conversion of i (local variable 1) into a byte using the i2b bytecode.

Line 6 performs the ++ on i, incrementing its value by one. This is done in ONE instruction (iinc takes two arguments: the local variable index and the increment). The next 5 instructions, from steps 9 to D, perform the same ++ increment on local variable 2 (i.e. b). Let's consider why it costs so much to do an increment on a byte.

A local variable has to be pushed onto the JAVA operand stack, an int oriented stack, before it can be operated on. Because there is no support for byte addition, the iadd instruction is used instead. So local variable 2 is pushed, 1 is pushed too, then the iadd between the two int values occurs. A conversion is then required in order to remain in the [-128..+127] range of the byte type. This is done by the i2b of line C, just before the new value is assigned back to local variable 2!
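At the source level, this instruction sequence corresponds to an addition performed in int followed by a narrowing cast. A minimal sketch, with hypothetical class and method names, showing that b++ and the spelled-out form give the same result:

```java
// Illustrative sketch; ByteIncrementForms and its methods are hypothetical names.
final class ByteIncrementForms {
    static byte withIncrementOperator(byte b) {
        b++;                 // the narrowing back to byte is implicit here
        return b;
    }

    static byte withExplicitCast(byte b) {
        // "b = b + 1;" would not compile, because b + 1 has type int.
        b = (byte) (b + 1);  // add as int, then narrow back to byte
        return b;
    }

    public static void main(String[] args) {
        byte start = 5;
        System.out.println(withIncrementOperator(start));  // prints 6
        System.out.println(withExplicitCast(start));       // prints 6
    }
}
```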

Note that the space required for a byte local is the same as that required for a local variable of type int. So unless one really wants to force a local to stay within [-128..+127], there is little reason to use byte for a local variable.

In the previous example, i is set to 127. So i++ sets i to 128, whereas b++ would take b outside the boundaries of the byte type. Therefore once b++ has executed, b is -128. Indeed, i2b is a narrowing conversion: it discards all but the 8 lowest bits of the 32-bit int, and then sign-extends the result so as to represent the byte value as an int.
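The wrap-around can be observed directly. The sketch below reuses the values of the example; the class and method names (WrapAroundDemo, incrementInt, incrementByte) are hypothetical.

```java
// Illustrative sketch; names are hypothetical.
final class WrapAroundDemo {
    static int incrementInt(int i) {
        i++;
        return i;
    }

    static byte incrementByte(byte b) {
        b++;  // the result is narrowed back to byte, so 127 wraps to -128
        return b;
    }

    public static void main(String[] args) {
        int i = 127;
        byte b = (byte) i;
        System.out.println(incrementInt(i));   // prints 128
        System.out.println(incrementByte(b));  // prints -128
    }
}
```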

In JAVA, i2b could be formally coded as

int b = theByteAsAnInt & 0xFF;
if (b>127) b |= 0xFFFFFF00;
// b is now the byte value coded as an int

(int) 0x00000080 -->  128
(byte)0x00000080 --> -128   (0xFFFFFF80 as an int)
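The mask-and-extend formula can be checked against the language's own cast operator. A minimal sketch follows, with hypothetical names (I2bSketch, i2bWithMask, i2bWithShifts). The shift-based variant is an equivalent formulation added here for comparison; it is not taken from the original text.

```java
// Illustrative sketch; names are hypothetical.
final class I2bSketch {
    // Keep the 8 lowest bits, then sign-extend by hand.
    static int i2bWithMask(int value) {
        int b = value & 0xFF;
        if (b > 127) {
            b |= 0xFFFFFF00;
        }
        return b;
    }

    // Same result using a left shift followed by an arithmetic right shift.
    static int i2bWithShifts(int value) {
        return (value << 24) >> 24;
    }

    public static void main(String[] args) {
        System.out.println(i2bWithMask(128));    // prints -128
        System.out.println(i2bWithShifts(128));  // prints -128
        System.out.println((byte) 128);          // prints -128
    }
}
```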
          

In JAVA there is NO unsigned byte. So when working with values in the range [0..255], one can use an array of bytes, but extra work is needed to undo the sign extension performed by the JVM when it represents a byte as an int.

byte[] arrayOfBytes = .....
byte b = arrayOfBytes[n]; //b is ranged in [-128..+127]
int i = b&0xFF ; //i is ranged in [0..+255]
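On Java 8 and later, the standard library offers Byte.toUnsignedInt for the same purpose. The sketch below compares it with the manual mask; the class and method names (UnsignedByteDemo, maskToUnsigned, libraryToUnsigned) and the sample array contents are assumptions made for illustration.

```java
// Illustrative sketch; names and sample data are hypothetical.
final class UnsignedByteDemo {
    // Manual form: mask off the sign-extension bits.
    static int maskToUnsigned(byte b) {
        return b & 0xFF;
    }

    // Library form, available since Java 8.
    static int libraryToUnsigned(byte b) {
        return Byte.toUnsignedInt(b);
    }

    public static void main(String[] args) {
        byte[] arrayOfBytes = { (byte) 0xFF, 0x7F, (byte) 0x80 };
        for (byte b : arrayOfBytes) {
            // prints "-1 -> 255", then "127 -> 127", then "-128 -> 128"
            System.out.println(b + " -> " + maskToUnsigned(b));
        }
    }
}
```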
          

To put it briefly: (i) at the source level, the designer is often obliged to add extra bit manipulation (such as &0xFF); (ii) the compiler adds extra instructions under the hood in order to simulate byte semantics (note: all of the above applies to short and char too). So, unless byte is really required, it is better to use int, and arrays of int instead of byte[], because each time a byte gets in the way the JVM must do extra work at runtime.
