First, look at the simplest print statement:
System.out.println(new Object());
This prints the fully qualified class name of the object, followed by @ and a hexadecimal string:
java.lang.Object@6659c656
What comes after the @ symbol? Is it the hashCode, the memory address of the object, or something else?
In fact, what follows the @ is simply the object's hashCode, displayed in hexadecimal. Let's verify:
Object o = new Object();
int hashcode = o.hashCode();
// toString
System.out.println(o);
// hashCode in hexadecimal
System.out.println(Integer.toHexString(hashcode));
// hashCode in decimal
System.out.println(hashcode);
// identityHashCode also returns the object's hash code, but unlike
// Object.hashCode() it ignores any overridden hashCode() method
System.out.println(System.identityHashCode(o));
Output result:
java.lang.Object@6659c656
6659c656
1717159510
1717159510
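The difference between hashCode() and System.identityHashCode() only becomes visible once a class overrides hashCode(). The following is a minimal sketch; the Point class and the IdentityHashDemo wrapper are hypothetical names introduced only for this illustration.

```java
public class IdentityHashDemo {
    // Hypothetical value class, used only for this illustration.
    static class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public int hashCode() {
            // Value-based hash, unrelated to the hash the JVM generates.
            return 31 * x + y;
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Point
                    && ((Point) other).x == x
                    && ((Point) other).y == y;
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        // The overridden hashCode(): 31 * 1 + 2 = 33
        System.out.println(p.hashCode());
        // The JVM-generated hash; the value varies between runs
        System.out.println(System.identityHashCode(p));
        // Point inherits Object.toString(), which calls the overridden
        // hashCode(), so the suffix is 33 in hexadecimal: @21
        System.out.println(p);
    }
}
```

Note that the inherited toString() reports the overridden value, so the text after the @ is only the JVM-generated hash for classes that do not override hashCode().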
How is the hashcode of the object generated? Is it really the memory address?
The content of this article is based on Java 8 HotSpot.
Generation logic of hashCode
The logic of generating hashCode in the JVM is not so simple. It provides several strategies, and the generation results of each strategy are different.
Take a look at the core method that generates the hashCode in the OpenJDK source code:
static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}
The source code shows that the generation strategy is controlled by a global variable named hashCode, which defaults to 5. This variable is defined in another header file:
product(intx, hashCode, 5, "(Unstable) select hashCode generation algorithm" )
The description in the source code says it plainly: "(Unstable) select hashCode generation algorithm". Because the flag is declared this way, it can be set through a JVM startup parameter. First confirm the default value:
java -XX:+PrintFlagsFinal -version | grep hashCode
     intx hashCode = 5 {product}
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)
Therefore, we can select a different hashCode generation algorithm through a JVM startup parameter and test the results each algorithm produces:
-XX:hashCode=N
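To compare the algorithms, a small probe program can be run several times with different values of N. The sketch below is one possible way to do it; the class name HashCodeProbe and the helper describe are hypothetical names chosen for this example.

```java
public class HashCodeProbe {
    // Formats an object the way Object.toString() does, but always uses the
    // JVM-generated identity hash, even if hashCode() were overridden.
    static String describe(Object o) {
        return o.getClass().getName() + "@"
                + Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        // Print the hash of a few fresh objects so that patterns
        // (constant, sequential, random-looking) are easy to spot.
        for (int i = 0; i < 5; i++) {
            System.out.println(describe(new Object()));
        }
    }
}
```

On the Java 8 HotSpot build used in this article, it could be started with, for example, java -XX:hashCode=3 HashCodeProbe. The printed values depend on the chosen algorithm and on the run.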
Now let's look at how each hashCode generation algorithm behaves.
Algorithm 0
if (hashCode == 0) {
   // This form uses an unguarded global Park-Miller RNG,
   // so it's possible for two threads to race and generate the same RNG.
   // On MP system we'll have lots of RW access to a global, so the
   // mechanism induces lots of coherency traffic.
   value = os::random() ;
}
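This algorithm generates the hash with a Park-Miller random number generator. Note, however, that the generator's state is a single unguarded global: as the source comment warns, under high concurrency two threads can race on it and produce the same value, and the constant reads and writes to that global cause a lot of cache-coherency traffic.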
Algorithm 1
if (hashCode == 1) {
   // This variation has the property of being stable (idempotent)
   // between STW operations.  This can be useful in some of the 1-0
   // synchronization schemes.
   intptr_t addrBits = intptr_t(obj) >> 3 ;
   value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
}
This algorithm really is derived from the object's memory address: it casts the object pointer to intptr_t, then shifts the address bits and XORs them together with GVars.stwRandom.
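The arithmetic of this address-based variant can be reproduced outside the JVM. The following is only a sketch under stated assumptions: Java code cannot read real object addresses, so address and stwRandom are made-up inputs, and the 31-bit HASH_MASK is an assumed stand-in for markOopDesc::hash_mask on a 64-bit JVM.

```java
public class AddressHashSketch {
    // Assumption: 31-bit mask standing in for markOopDesc::hash_mask.
    static final long HASH_MASK = 0x7FFFFFFFL;

    // address and stwRandom are hypothetical inputs for illustration.
    static long addressHash(long address, long stwRandom) {
        // Same shift as the HotSpot code: intptr_t(obj) >> 3
        long addrBits = address >> 3;
        // Fold the address bits and mix in the per-STW random value.
        long value = addrBits ^ (addrBits >> 5) ^ stwRandom;
        value &= HASH_MASK;
        // Mirrors the "if (value == 0) value = 0xBAD" fallback.
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        long address = 0x1000L; // made-up address
        // With stwRandom = 0 only the address bits contribute.
        System.out.println(Long.toHexString(addressHash(address, 0L)));
        // Same address, same stwRandom: the result is stable.
        System.out.println(addressHash(address, 42L) == addressHash(address, 42L));
        // A different stwRandom changes the hash for the same address.
        System.out.println(addressHash(address, 42L) == addressHash(address, 43L));
    }
}
```

The sketch illustrates the "stable between STW operations" remark in the source comment: as long as neither the address nor stwRandom changes, the computed value stays the same.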
Algorithm 2
if (hashCode == 2) {
   value = 1 ;            // for sensitivity testing
}
This one needs little explanation: it always returns 1, and is presumably meant for internal test scenarios.
Interested readers can start the JVM with -XX:hashCode=2 to enable this algorithm and check whether every hashCode becomes 1.
Algorithm 3
if (hashCode == 3) {
   value = ++GVars.hcSequence ;
}
This algorithm is also very simple: a self-incrementing counter. The same counter supplies the hashCode of every object. Let's try it:
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
// output
java.lang.Object@144
java.lang.Object@145
java.lang.Object@146
java.lang.Object@147
java.lang.Object@148
java.lang.Object@149
Sure enough, the values increase by one each time. Quite interesting.
Algorithm 4
if (hashCode == 4) {
   value = intptr_t(obj) ;
}
In fact, this algorithm differs little from the hashCode == 1 algorithm. Both are derived from the object's address: this one uses the address directly (before the final hash_mask is applied), while hashCode == 1 uses a shifted and XOR-ed variant of it.
Algorithm 5
The last one is also the default generation algorithm. It is used whenever the hashCode setting is anything other than 0, 1, 2, 3, or 4:
else {
   // Marsaglia's xor-shift scheme with thread-specific state
   // This is probably the best overall implementation -- we'll
   // likely make this the default in future releases.
   unsigned t = Self->_hashStateX ;
   t ^= (t << 11) ;
   Self->_hashStateX = Self->_hashStateY ;
   Self->_hashStateY = Self->_hashStateZ ;
   Self->_hashStateZ = Self->_hashStateW ;
   unsigned v = Self->_hashStateW ;
   v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
   Self->_hashStateW = v ;
   value = v ;
}
This is Marsaglia's xor-shift scheme: the hash value is produced by shifting and XOR-ing state values that belong to the current thread. Because it touches no shared global, it should be more efficient than the self-incrementing and random algorithms above, although its repetition rate is probably somewhat higher. Repeated hashCode values are not a real problem, though.
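The xor-shift step can be translated to Java almost line by line. This is a sketch, not the JVM's code: the class name XorShiftHashSketch is hypothetical, the seed values are arbitrary, one instance stands in for the state of one thread, and the 31-bit mask is an assumed stand-in for markOopDesc::hash_mask. Java's >>> replaces the unsigned right shift of the C++ code.

```java
public class XorShiftHashSketch {
    // Stand-ins for Self->_hashStateX .. _hashStateW of a single thread.
    private int x;
    private int y;
    private int z;
    private int w;

    XorShiftHashSketch(int x, int y, int z, int w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    int next() {
        int t = x;
        t ^= (t << 11);
        // Rotate the state: X <- Y, Y <- Z, Z <- W
        x = y;
        y = z;
        z = w;
        int v = w;
        // >>> is the unsigned shift, matching "unsigned" in the C++ code
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        // Assumption: 31-bit mask standing in for markOopDesc::hash_mask
        int value = v & 0x7FFFFFFF;
        // Mirrors the "if (value == 0) value = 0xBAD" fallback
        return value == 0 ? 0xBAD : value;
    }

    public static void main(String[] args) {
        // Arbitrary illustrative seeds, not the values HotSpot uses.
        XorShiftHashSketch generator = new XorShiftHashSketch(1, 2, 3, 4);
        for (int i = 0; i < 5; i++) {
            System.out.println(Integer.toHexString(generator.next()));
        }
    }
}
```

Each call only reads and writes the four fields of its own instance, which corresponds to the thread-specific state mentioned in the source comment.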
The JVM has never guaranteed that this value is unique. HashMap, for example, resolves hash conflicts with the chained-address method (separate chaining).
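A short experiment shows that colliding hash codes do not break a HashMap. In this sketch the Key class is hypothetical and deliberately returns the same hashCode for every instance, similar in spirit to what -XX:hashCode=2 does for plain objects.

```java
import java.util.HashMap;
import java.util.Map;

public class CollisionDemo {
    // Hypothetical key type whose hashCode always collides.
    static final class Key {
        final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public int hashCode() {
            return 1; // every key lands in the same bucket
        }

        @Override
        public boolean equals(Object other) {
            return other instanceof Key && ((Key) other).name.equals(name);
        }
    }

    static Map<Key, Integer> build() {
        Map<Key, Integer> map = new HashMap<>();
        map.put(new Key("a"), 1);
        map.put(new Key("b"), 2);
        map.put(new Key("c"), 3);
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build();
        // All three entries survive even though their hashes are identical.
        System.out.println(map.size());
        // Lookup falls back to equals() within the shared bucket.
        System.out.println(map.get(new Key("b")));
    }
}
```

Collisions cost lookup speed, because entries in the same bucket must be compared with equals(), but they do not affect correctness.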
Summary
hashCode may be a memory address, or it may not. It can even be the constant 1 or a self-incrementing number! You can pick whichever algorithm you want!