How is a Java object's hashCode generated? Is it related to the object's memory address?

First, look at the simplest possible print statement:

System.out.println(new Object());

This prints the fully qualified class name followed by a string of characters:

java.lang.Object@6659c656

What comes after the @ symbol? Is it the hashcode, the memory address of the object, or something else?

In fact, what follows the @ is simply the hashcode of the object, displayed in hexadecimal. Let's verify:

Object o = new Object();
int hashcode = o.hashCode();
// toString
System.out.println(o);
// hashcode hex
System.out.println(Integer.toHexString(hashcode));
// hashcode
System.out.println(hashcode);
// This method also returns the object's hashcode, but unlike Object.hashCode() it ignores an overridden hashCode()
System.out.println(System.identityHashCode(o));

Output result:

java.lang.Object@6659c656
6659c656
1717159510
1717159510
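
To illustrate the last comment in the code above, here is a minimal sketch with a class that overrides hashCode(). The class names IdentityHashDemo and Fixed and the constant 42 are assumptions made up for this illustration; they are not part of the original example.

```java
// Sketch: hashCode() can be overridden, System.identityHashCode() cannot.
final class IdentityHashDemo {
    // Hypothetical class whose hashCode() always returns the same constant.
    static final class Fixed {
        @Override
        public int hashCode() {
            return 42;
        }
    }

    public static void main(String[] args) {
        Fixed f = new Fixed();
        // Object.toString() calls hashCode(), so the overridden value (hex 2a) follows the '@'.
        System.out.println(f);
        System.out.println(f.hashCode());
        // The identity hash is produced by the JVM and ignores the override.
        System.out.println(System.identityHashCode(f));
        // Asking twice for the same object gives the same identity hash.
        System.out.println(System.identityHashCode(f) == System.identityHashCode(f));
    }
}
```

So the value after the @ is whatever hashCode() returns, while System.identityHashCode always gives the JVM-generated value discussed in the rest of this article.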

How is the hashcode of the object generated? Is it really the memory address?

The content of this article is based on the Java 8 HotSpot JVM.

Generation logic of hashCode

The logic of generating hashCode in the JVM is not so simple. It provides several strategies, and the generation results of each strategy are different.

Take a look at the core method of generating hashCode in openjdk source code:

static inline intptr_t get_next_hash(Thread * Self, oop obj) {
  intptr_t value = 0 ;
  if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random() ;
  } else
  if (hashCode == 1) {
     // This variation has the property of being stable (idempotent)
     // between STW operations.  This can be useful in some of the 1-0
     // synchronization schemes.
     intptr_t addrBits = intptr_t(obj) >> 3 ;
     value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
  } else
  if (hashCode == 2) {
     value = 1 ;            // for sensitivity testing
  } else
  if (hashCode == 3) {
     value = ++GVars.hcSequence ;
  } else
  if (hashCode == 4) {
     value = intptr_t(obj) ;
  } else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

  value &= markOopDesc::hash_mask;
  if (value == 0) value = 0xBAD ;
  assert (value != markOopDesc::no_hash, "invariant") ;
  TEVENT (hashCode: GENERATE) ;
  return value;
}
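
Whichever strategy is selected, the last lines of get_next_hash are the same: the value is masked with markOopDesc::hash_mask, and a result of 0 is replaced by 0xBAD (the assert against markOopDesc::no_hash suggests that 0 is reserved to mean "no hash yet"). A rough Java sketch of that final step follows. The 31-bit mask is an assumption that, as far as I know, matches 64-bit HotSpot builds; the class and method names are made up for illustration.

```java
// Sketch of the final masking step in get_next_hash.
final class HashMaskSketch {
    // Assumed mask width: 31 bits, so the result is never negative as a Java int.
    static final long HASH_MASK = 0x7FFFFFFFL;

    // Mirrors "value &= hash_mask; if (value == 0) value = 0xBAD;".
    static int finish(long value) {
        long masked = value & HASH_MASK;
        if (masked == 0) {
            masked = 0xBAD;
        }
        return (int) masked;
    }

    public static void main(String[] args) {
        // Upper bits are dropped by the mask.
        System.out.println(Integer.toHexString(finish(0x123456789L)));
        // Zero is replaced by 0xBAD.
        System.out.println(Integer.toHexString(finish(0L)));
        // Even an all-ones input ends up non-negative.
        System.out.println(finish(-1L) >= 0);
    }
}
```

Under this assumption, a raw address that is wider than the mask (as in algorithm 4) cannot survive unchanged as the hashcode.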

The source code shows that the generation strategy is controlled by a global variable named hashCode, which defaults to 5. This variable is defined in another header file:

product(intx, hashCode, 5,                                            
         "(Unstable) select hashCode generation algorithm" ) 

The description in the source marks the flag as "(Unstable)": it selects the hashCode generation algorithm, and it can be set through a JVM startup parameter. First confirm the default value:

java -XX:+PrintFlagsFinal -version | grep hashCode

intx hashCode                                  = 5                                   {product}
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.282-b08, mixed mode)

Therefore, we can configure different hashcode generation algorithms through the jvm startup parameters and test the generation results under different algorithms:

-XX:hashCode=N
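
A small probe program makes it easy to compare the strategies. This is only a sketch; the class name HashProbe and the sample size of 5 are arbitrary choices of mine.

```java
// Sketch: print the identity hash codes of a few fresh objects.
final class HashProbe {
    // Collects the identity hash codes of n newly created objects.
    static int[] sample(int n) {
        int[] hashes = new int[n];
        for (int i = 0; i < n; i++) {
            hashes[i] = System.identityHashCode(new Object());
        }
        return hashes;
    }

    public static void main(String[] args) {
        for (int h : sample(5)) {
            System.out.println(Integer.toHexString(h));
        }
    }
}
```

Run it several times with different settings, for example java -XX:hashCode=3 HashProbe, and compare the output. This was written with the Java 8 flag shown above in mind; other JDK versions may treat the flag differently, which I have not checked here.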

Now let's look at how each hashcode generation algorithm behaves.

Algorithm 0

if (hashCode == 0) {
     // This form uses an unguarded global Park-Miller RNG,
     // so it's possible for two threads to race and generate the same RNG.
     // On MP system we'll have lots of RW access to a global, so the
     // mechanism induces lots of coherency traffic.
     value = os::random();
  }

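This algorithm uses a Park-Miller random number generator. Note, however, what the comment says: the generator state is an unguarded global, so two threads can race and produce the same value, and under high concurrency the many reads and writes to that shared global cause a lot of cache-coherency traffic.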
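
For readers who have not met it before, the textbook Park-Miller "minimal standard" generator can be sketched in a few lines of Java. This illustrates the recurrence only (multiplier 16807, modulus 2^31 - 1); it is not a copy of HotSpot's os::random(), and the class name is made up.

```java
// Sketch of the textbook Park-Miller generator: seed = (16807 * seed) mod (2^31 - 1).
final class ParkMillerSketch {
    private static final long A = 16807L;       // multiplier, 7^5
    private static final long M = 2147483647L;  // modulus, 2^31 - 1

    // Must start in the range 1 .. M-1.
    private long seed;

    ParkMillerSketch(long seed) {
        this.seed = seed;
    }

    // Not synchronized: like the "unguarded" global in the comment,
    // concurrent callers sharing one instance could see the same value.
    long next() {
        seed = (A * seed) % M;
        return seed;
    }

    public static void main(String[] args) {
        ParkMillerSketch rng = new ParkMillerSketch(1);
        System.out.println(rng.next()); // 16807
        System.out.println(rng.next()); // 282475249
    }
}
```

Every call reads and writes the single seed field, which is why one shared generator becomes a hot spot when many threads ask for hashcodes at once.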

Algorithm 1

if (hashCode == 1) {
    // This variation has the property of being stable (idempotent)
    // between STW operations.  This can be useful in some of the 1-0
    // synchronization schemes.
    intptr_t addrBits = intptr_t(obj) >> 3 ;
    value = addrBits ^ (addrBits >> 5) ^ GVars.stwRandom ;
}

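This algorithm really is based on the memory address of the object: it casts the object pointer to intptr_t, shifts it right by 3 bits, and XORs the result with itself shifted right by 5 bits and with GVars.stwRandom. According to the comment, the result is stable between STW (stop-the-world) operations.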
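
Java cannot read an object's address directly, but the arithmetic itself is easy to reproduce. The sketch below uses made-up addresses and a made-up stwRandom value purely for illustration; the class and method names are also hypothetical.

```java
// Sketch of the address scrambling used when hashCode == 1.
final class AddressHashSketch {
    // Mirrors: addrBits = addr >> 3; value = addrBits ^ (addrBits >> 5) ^ stwRandom.
    static long scramble(long address, long stwRandom) {
        long addrBits = address >> 3;
        return addrBits ^ (addrBits >> 5) ^ stwRandom;
    }

    public static void main(String[] args) {
        long first = 0x1000L;       // hypothetical object address
        long second = first + 16;   // hypothetical neighbour 16 bytes away
        long stwRandom = 0x5EEDL;   // hypothetical random value
        System.out.println(Long.toHexString(scramble(first, 0)));
        System.out.println(Long.toHexString(scramble(second, 0)));
        // The same address gives a different result once stwRandom changes.
        System.out.println(Long.toHexString(scramble(first, stwRandom)));
    }
}
```

The shift by 3 presumably discards low bits that carry no information because objects are aligned, and mixing in stwRandom means the result is not a plain address even in this mode.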

Algorithm 2

if (hashCode == 2) {
    value = 1 ;            // for sensitivity testing
}

This one needs little explanation: it always returns 1, and the comment says it is for sensitivity testing, so it is presumably meant for internal test scenarios.

Interested readers can start the JVM with -XX:hashCode=2 to enable this algorithm and check whether every hashCode result becomes 1.

Algorithm 3

if (hashCode == 3) {
    value = ++GVars.hcSequence ;
}

This algorithm is also very simple: a self-incrementing counter. A single counter is shared by the hashCode of all objects. Let's try it:

System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());
System.out.println(new Object());

//output
java.lang.Object@144
java.lang.Object@145
java.lang.Object@146
java.lang.Object@147
java.lang.Object@148
java.lang.Object@149

Sure enough, the values increase by one each time. Kind of interesting.

Algorithm 4

if (hashCode == 4) {
    value = intptr_t(obj) ;
}

In fact, this algorithm differs little from the one used when hashCode == 1. Both are based on the object address: this one takes the address as is, while the hashCode == 1 variant scrambles it first.

Algorithm 5

The last one is also the default generation algorithm. This algorithm is used when the hashCode configuration is not equal to 0 / 1 / 2 / 3 / 4:

else {
     // Marsaglia's xor-shift scheme with thread-specific state
     // This is probably the best overall implementation -- we'll
     // likely make this the default in future releases.
     unsigned t = Self->_hashStateX ;
     t ^= (t << 11) ;
     Self->_hashStateX = Self->_hashStateY ;
     Self->_hashStateY = Self->_hashStateZ ;
     Self->_hashStateZ = Self->_hashStateW ;
     unsigned v = Self->_hashStateW ;
     v = (v ^ (v >> 19)) ^ (t ^ (t >> 8)) ;
     Self->_hashStateW = v ;
     value = v ;
  }

This algorithm derives the hash value from the thread's current state using XOR and shift operations (Marsaglia's xor-shift scheme). Because the state is thread-specific, it avoids the shared global that the random and self-incrementing algorithms update, so it should be more efficient than both, although the repetition rate may be somewhat higher. But does it matter if hashCode values repeat?

The JVM does not guarantee that this value is unique. Code that relies on hashCode has to handle collisions anyway; HashMap, for example, resolves hash conflicts with separate chaining.
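
The default xor-shift algorithm translates almost line by line into Java. The sketch below is my own port for illustration, not HotSpot code: the class name is made up, and the seeds are the example constants from Marsaglia's xorshift paper rather than whatever HotSpot uses. A real implementation would also seed each thread differently; here every thread starts from the same state.

```java
// Sketch: Marsaglia xor-shift with per-thread state, ported from the C++ above.
final class XorShiftHashSketch {
    // Counterparts of _hashStateX, _hashStateY, _hashStateZ, _hashStateW.
    private int x, y, z, w;

    XorShiftHashSketch(int x, int y, int z, int w) {
        this.x = x;
        this.y = y;
        this.z = z;
        this.w = w;
    }

    // One step of the generator. The C++ code uses unsigned values,
    // so the right shifts become >>> in Java.
    int next() {
        int t = x;
        t ^= (t << 11);
        x = y;
        y = z;
        z = w;
        int v = w;
        v = (v ^ (v >>> 19)) ^ (t ^ (t >>> 8));
        w = v;
        return v;
    }

    // Each thread gets its own state, so no shared variable is contended.
    // Seeds are illustrative (from Marsaglia's paper), not HotSpot's.
    static final ThreadLocal<XorShiftHashSketch> PER_THREAD = ThreadLocal.withInitial(
            () -> new XorShiftHashSketch(123456789, 362436069, 521288629, 88675123));

    public static void main(String[] args) {
        XorShiftHashSketch state = PER_THREAD.get();
        for (int i = 0; i < 3; i++) {
            // Mask to 31 bits (an assumption about hash_mask) to get a non-negative value.
            System.out.println(Integer.toHexString(state.next() & 0x7FFFFFFF));
        }
    }
}
```

Nothing here depends on the object or its address: the value comes purely from the state of the thread that first asks for the hashcode.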

Keywords: Java Back-end Interview

Added by Geteburg on Fri, 28 Jan 2022 13:51:45 +0200