What is hashCode and the relationship between hashCode() and equals()

1. What is hashCode:

hashCode is an object's hash code: an integer derived from some of the object's information. The default Object.hashCode() is an identity hash, which is often described as the object's storage address. The Java specification does not require that, and as section 6 shows, HotSpot keeps the value stable even when the object moves. Hash codes improve retrieval efficiency: they are mainly used to quickly locate where an object is stored in a hash-based structure such as Hashtable or HashMap.

Why can hash codes improve retrieval efficiency? Consider the simplest way to check whether a collection contains an object: take the elements out one by one and compare each with the target using equals(). Stop and return true as soon as a comparison succeeds; otherwise return false. If the collection has 10,000 elements and does not contain the target, the program has to compare all 10,000 elements before it can reach a conclusion, which is very inefficient. A hash algorithm improves this by distributing the data into different areas. The collection is divided into several storage areas (buckets). Every object computes a hash code, the hash codes are grouped by a hash function, and each group corresponds to one storage area. From an object's hash code you can determine which area it belongs to, which greatly reduces the number of elements that have to be compared.

For example, HashSet stores its objects using a hash algorithm. Conceptually, it groups hash codes by taking the remainder modulo a number n, and that remainder selects the storage area. When looking up an object in a HashSet, Java first calls the object's hashCode() method to get its hash code, then finds the corresponding storage area, and finally compares each element in that area with the object using equals(). This way it reaches a conclusion without traversing every element of the collection.
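
The lookup described above can be sketched with a toy hash set. This is a minimal illustration, not how java.util.HashSet is really implemented: the class name ToyHashSet, the fixed bucket count, and the use of Math.floorMod as "taking the remainder" are assumptions made for the example, and null elements are not supported.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy set (NOT java.util.HashSet): pick a storage area by hash code,
// then compare with equals() only inside that area. Null elements are not supported.
class ToyHashSet<E> {
    private final List<List<E>> buckets = new ArrayList<>();

    ToyHashSet(int bucketCount) {
        for (int i = 0; i < bucketCount; i++) {
            buckets.add(new ArrayList<>());
        }
    }

    // Storage area = remainder of the hash code divided by the bucket count.
    // floorMod keeps the index non-negative even for negative hash codes.
    private List<E> bucketFor(Object o) {
        return buckets.get(Math.floorMod(o.hashCode(), buckets.size()));
    }

    // Only the elements of one bucket are compared with equals().
    boolean contains(Object o) {
        for (E e : bucketFor(o)) {
            if (e.equals(o)) {
                return true;
            }
        }
        return false;
    }

    // Adds the element unless an equal one is already in its bucket.
    boolean add(E e) {
        if (contains(e)) {
            return false;
        }
        bucketFor(e).add(e);
        return true;
    }

    public static void main(String[] args) {
        ToyHashSet<String> set = new ToyHashSet<>(16);
        System.out.println(set.add("ok"));             // true
        System.out.println(set.add(new String("ok"))); // false: same hash code and equals() is true
        System.out.println(set.contains("ko"));        // false
    }
}
```

Only one bucket is scanned per lookup, which is the whole point of the hash code; equals() still makes the final decision.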

Next, let's compute a few hash codes with String's hashCode() and compare them with StringBuilder's:

  public class HashCodeTest {
      public static void main(String[] args) {
          String s = "ok";
          StringBuilder sb = new StringBuilder(s);
          System.out.println(s.hashCode() + " " + sb.hashCode());
          String t = new String("ok");
          StringBuilder tb = new StringBuilder(t);
          System.out.println(t.hashCode() + " " + tb.hashCode());
      }
  }
  Operation results:
  3548 1829164700
  3548 2018699554

We can see that the strings s and t have the same hash code, because a string's hash code is derived from its content. The StringBuilder objects sb and tb have different hash codes because StringBuilder does not override hashCode(): it inherits Object's default identity hash, so two distinct objects naturally get different values (these numbers also change from run to run). Writing a better hashCode() method is not difficult. As long as we combine the hash codes of an object's fields sensibly, different objects will get more evenly distributed hash codes. For example:

  public class Model {
      private String name;
      private double salary;
      private int sex;

      @Override
      public int hashCode() {
          // Double.hashCode/Integer.hashCode (Java 8+) avoid the deprecated new Double(...)/new Integer(...)
          return name.hashCode() + Double.hashCode(salary) + Integer.hashCode(sex);
      }
  }

In the code above, the hash codes of the individual fields are combined to produce a reasonably well-distributed hash code. This is only a reference example and other implementations work too, as long as the hash codes are spread evenly (uniformity here means that different objects should collide as rarely as possible; collisions cannot be ruled out entirely). Note, however, that Java 7 added two helpers for writing hashCode(). The first is the null-safe method Objects.hashCode (note: java.util.Objects, not Object). If its argument is null it returns 0; otherwise it returns the result of calling hashCode() on the argument. By contrast, name.hashCode() in the code above throws a NullPointerException when name is null. The source code of Objects.hashCode is as follows:

  public static int hashCode(Object o) {
      return o != null ? o.hashCode() : 0;
  }

Therefore, our modified code is as follows:

  import java.util.Objects;

  public class Model {
      private String name;
      private double salary;
      private int sex;

      @Override
      public int hashCode() {
          // Objects.hashCode(name) returns 0 when name is null instead of throwing
          return Objects.hashCode(name) + Double.hashCode(salary) + Integer.hashCode(sex);
      }
  }

Java 7 also provides java.util.Objects.hash(Object... values), which can be called when several hash values need to be combined. It simplifies the code above further:

  import java.util.Objects;

  public class Model {
      private String name;
      private double salary;
      private int sex;

      @Override
      public int hashCode() {
          // Null-safe combination of all fields; same as Arrays.hashCode(new Object[] {name, salary, sex})
          return Objects.hash(name, salary, sex);
      }
  }
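
To make the difference between these versions concrete, here is a small sketch (the class name HashCombineDemo and the sample values are made up for illustration). It relies on the documented behavior that Objects.hash(values) returns Arrays.hashCode(values), which starts from 1 and folds in each element as 31 * result + hash, counting null as 0. Because of the multiplier, the result depends on the order of the fields, whereas the plain sum used in the first Model version does not.

```java
import java.util.Arrays;
import java.util.Objects;

class HashCombineDemo {
    // Same combination as the first Model version: a plain sum.
    static int sumHash(Object a, Object b) {
        return Objects.hashCode(a) + Objects.hashCode(b);
    }

    // Hand-written equivalent of what Objects.hash(a, b) computes.
    static int manualHash(Object a, Object b) {
        int result = 1;
        result = 31 * result + Objects.hashCode(a);
        result = 31 * result + Objects.hashCode(b);
        return result;
    }

    public static void main(String[] args) {
        // Objects.hash delegates to Arrays.hashCode over its varargs array.
        System.out.println(Objects.hash("a", "b") == Arrays.hashCode(new Object[] {"a", "b"})); // true
        System.out.println(Objects.hash("a", "b") == manualHash("a", "b"));                     // true
        // The sum ignores order, so swapped fields collide; the 31-based fold does not.
        System.out.println(sumHash(1, 2) == sumHash(2, 1));           // true
        System.out.println(Objects.hash(1, 2) == Objects.hash(2, 1)); // false (994 vs 1024)
        // null fields are tolerated and count as 0.
        System.out.println(Objects.hash(null, "b") == manualHash(null, "b")); // true
    }
}
```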

That covers what needs to be said about hashCode() itself. One more point: for a variable of array type, we can call java.util.Arrays.hashCode() to compute a hash code built from the hash codes of the array's elements.
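
An array's own hashCode() is the identity hash inherited from Object, so two arrays with the same contents usually have different hash codes; Arrays.hashCode() computes the hash from the elements instead, and Arrays.deepHashCode() also looks inside nested arrays. A small sketch (the class name is made up):

```java
import java.util.Arrays;

class ArrayHashDemo {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        // Identity hashes inherited from Object: almost certainly different for two distinct arrays.
        System.out.println(a.hashCode() + " " + b.hashCode());
        // Content-based hashes: equal because the elements are equal.
        System.out.println(Arrays.hashCode(a) == Arrays.hashCode(b)); // true
        System.out.println(Arrays.hashCode(a));                       // 30817
        // Nested arrays need deepHashCode to hash the inner arrays by content.
        int[][] m1 = {{1, 2}, {3}};
        int[][] m2 = {{1, 2}, {3}};
        System.out.println(Arrays.deepHashCode(m1) == Arrays.deepHashCode(m2)); // true
    }
}
```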

2. Relationship between equals() and hashCode():

The Java superclass Object already defines equals() and hashCode(). In Object, equals() checks whether two references point to the same object (this == obj), and hashCode() returns the object's identity hash, which is commonly associated with its memory address. hashCode is mainly used for lookup, while equals() decides whether two objects are equal. Sometimes we need to override these two methods for specific requirements. When doing so, keep the following rules in mind:

(1) If equals() returns true for two objects, their hash codes must be the same;

(2) If two objects have the same hashCode(), equals() is not necessarily true for them. It only means that they fall into the same storage area of a hash structure (a hash collision);

(3) If a class overrides equals(), it must also override hashCode().
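
Rule (2) is easy to observe with String, whose hash code is computed from its characters: "Aa" and "BB" happen to produce the same hash code although they are not equal. A minimal check (the class name is illustrative):

```java
class CollisionDemo {
    public static void main(String[] args) {
        String s1 = "Aa";
        String s2 = "BB";
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112 and 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println(s1.hashCode() + " " + s2.hashCode()); // 2112 2112
        // Same hash code, but not equal: they only share a storage area.
        System.out.println(s1.equals(s2));                        // false
    }
}
```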

3. Why override the hashCode() method when overriding equals():

Before answering this question, let's first understand the process of putting elements into the collection, as shown in the following figure:

When putting an object into the collection, the collection first checks whether the hash code of the object to be added equals the hash code of any element in the corresponding storage area. If not, the object is put into the collection directly. If a hash code matches, equals() is used to check whether the new object equals any object in that area. If equals() says they are not equal, the element is added; otherwise it is not added.

Similarly, when get() is used to look up an element, the collection calls key.hashCode() to calculate the array index (the bucket) and then checks the result of equals(). If it is true, the element is found; otherwise it is not.
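
In JDK 8 and later, java.util.HashMap does not literally take a remainder: it mixes the high 16 bits of the hash code into the low bits and then masks with the table length, which is always a power of two. The sketch below reproduces that calculation outside HashMap for illustration only; the class and method names are made up, and the real code lives in HashMap's internals and may change between versions.

```java
class BucketIndexDemo {
    // Same spreading step as JDK 8+ HashMap.hash(key): XOR the high 16 bits into the low 16 bits.
    static int spread(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    // Bucket index for a table whose length is a power of two.
    // (n - 1) & hash gives the same result as Math.floorMod(hash, n) when n is a power of two.
    static int indexFor(Object key, int tableLength) {
        return (tableLength - 1) & spread(key);
    }

    public static void main(String[] args) {
        // "ok".hashCode() is 3548, which is below 2^16, so spreading leaves it unchanged.
        System.out.println(indexFor("ok", 16)); // 12 (3548 & 15)
        System.out.println(indexFor(null, 16)); // 0: null keys always go to bucket 0
    }
}
```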

Suppose we override equals() in a class but not hashCode(). The inherited Object.hashCode() returns an identity hash, which almost always differs between different objects. Overriding equals() then has no real effect in hash-based collections, because nothing guarantees that two objects for which equals() returns true are hashed into the same storage area: obj1.equals(obj2) may be true while obj1.hashCode() == obj2.hashCode() is false. In that case the data is no longer unique, because when the hash codes differ, equals() is never called for the comparison at all, so the overridden equals() is pointless.

Take HashSet as an example. If a class's hashCode() does not meet the requirements above, two instances whose equals() comparison returns true should not both be stored in the set, yet they can be. Their hashCode() return values differ (the class uses Object's hashCode(), whose value is the identity hash), so the second object may be placed in a different region from the first based on its hash code. It is then never compared with the first object through equals(), and it ends up stored in the HashSet as well. So Object's hashCode() cannot meet the requirements for objects stored in a HashSet: its value is tied to the object's identity, and the same object returns the same value at any time while the program runs. As long as there are two different instances, their default hashCode() values differ, even if equals() says they are equal.

Next, let's give a few small examples to test:

3.1 test 1: overriding equals() but not hashCode() results in duplicate data.

  import java.util.Collection;
  import java.util.HashSet;
  import java.util.Iterator;

  public class HashCodeTest {
      public static void main(String[] args) {
          Collection<Point> set = new HashSet<>();
          Point p1 = new Point(1, 1);
          Point p2 = new Point(1, 1);

          System.out.println(p1.equals(p2));
          set.add(p1);    //(1)
          set.add(p2);    //(2)
          set.add(p1);    //(3)

          Iterator<Point> iterator = set.iterator();
          while (iterator.hasNext()) {
              Object object = iterator.next();
              System.out.println(object);
          }
      }
  }

  class Point {
      private int x;
      private int y;

      public Point(int x, int y) {
          super();
          this.x = x;
          this.y = y;
      }

      @Override
      public boolean equals(Object obj) {
          if (this == obj)
              return true;
          if (obj == null)
              return false;
          if (getClass() != obj.getClass())
              return false;
          Point other = (Point) obj;
          if (x != other.x)
              return false;
          if (y != other.y)
              return false;
          return true;
      }

      // hashCode() is deliberately NOT overridden here

      @Override
      public String toString() {
          return "x:" + x + ",y:" + y;
      }
  }
  Output results:
  true
  x:1,y:1
  x:1,y:1

Cause analysis:

  • When set.add(p1) at (1) executes, the set is empty, so p1 is stored directly;
  • When set.add(p2) at (2) executes, the set first checks whether the storage area for p2's hash code already holds an element with the same hash code. Because hashCode() is not overridden, Object's default hashCode() is used, which returns an identity hash. Different objects almost always have different identity hashes, so no element has the same hash code as p2, and p2 is stored directly.
  • When set.add(p1) at (3) executes, p1 is already in the set and the same object always returns the same hash code, so equals() is checked next. Because it is the same object, equals() returns true, the JDK considers the object already present in the set, and the add is discarded.

3.2 test 2: overriding hashCode() but not equals() still leads to duplicate data.

Modify the Point class:

  class Point {
      private int x;
      private int y;

      public Point(int x, int y) {
          super();
          this.x = x;
          this.y = y;
      }

      @Override
      public int hashCode() {
          final int prime = 31;
          int result = 1;
          result = prime * result + x;
          result = prime * result + y;
          return result;
      }

      // equals() is deliberately NOT overridden here

      @Override
      public String toString() {
          return "x:" + x + ",y:" + y;
      }
  }
  Output results:
  false
  x:1,y:1
  x:1,y:1

Cause analysis:

  • When set.add(p1) at (1) executes, the set is empty, so p1 is stored directly;
  • When set.add(p2) at (2) executes, the set first checks whether the storage area for p2's hash code already holds an element with the same hash code. hashCode() is overridden here and p1 and p2 have equal hash codes, so equals() is checked next. Because equals() is not overridden, the default implementation uses "==", which checks whether the two references point to the same object. equals() therefore returns false, the set treats them as different objects, and p2 is stored.
  • When set.add(p1) at (3) executes, p1 is already in the set, the same object returns the same hash code, and equals() returns true, so the object is considered already present and the add is discarded.

Combining the two tests above: to guarantee element uniqueness, you must override both hashCode() and equals().
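
For completeness, here is a sketch of the Point class with both methods overridden, using java.util.Objects from section 1. With it, the same three add() calls leave exactly one element in the set. The class name FixedPoint is made up only to avoid clashing with the Point examples above.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class FixedPoint {
    private final int x;
    private final int y;

    FixedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        FixedPoint other = (FixedPoint) obj;
        return x == other.x && y == other.y;
    }

    // Uses exactly the fields that equals() compares, so equal points get equal hash codes.
    @Override
    public int hashCode() {
        return Objects.hash(x, y);
    }

    @Override
    public String toString() {
        return "x:" + x + ",y:" + y;
    }

    // Same sequence of adds as in the two tests above.
    static Set<FixedPoint> addThree() {
        Set<FixedPoint> set = new HashSet<>();
        FixedPoint p1 = new FixedPoint(1, 1);
        FixedPoint p2 = new FixedPoint(1, 1);
        set.add(p1); // (1) stored
        set.add(p2); // (2) same hash code and equals() is true: rejected
        set.add(p1); // (3) rejected
        return set;
    }

    public static void main(String[] args) {
        System.out.println(addThree()); // [x:1,y:1]
    }
}
```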

(Note: when an equal element (equal hashCode() and equals() returning true) is inserted into a HashSet, the newly added element is discarded; when an equal key with a different value is put into a HashMap, the old value is overwritten by the new one.)
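
The difference in the note can be checked through the standard return values: HashSet.add() returns false when an equal element is already present, and HashMap.put() returns the previous value while replacing it. In OpenJDK's HashMap the key object stored first also stays in place; only the value changes. A small sketch (class name made up):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class DuplicateInsertDemo {
    public static void main(String[] args) {
        String first = new String("k");
        String second = new String("k"); // equal to first, but a different object

        Set<String> set = new HashSet<>();
        System.out.println(set.add(first));  // true
        System.out.println(set.add(second)); // false: the new element is not stored

        Map<String, Integer> map = new HashMap<>();
        map.put(first, 1);
        System.out.println(map.put(second, 2)); // 1: the old value is returned...
        System.out.println(map.get("k"));       // 2: ...and replaced by the new one
        // The original key object is kept (OpenJDK behavior).
        System.out.println(map.keySet().iterator().next() == first); // true
    }
}
```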

4. Memory leak caused by hashCode():

  public class RectObject {
      public int x;
      public int y;

      public RectObject(int x, int y) {
          this.x = x;
          this.y = y;
      }

      @Override
      public int hashCode() {
          final int prime = 31;
          int result = 1;
          result = prime * result + x;
          result = prime * result + y;
          return result;
      }

      @Override
      public boolean equals(Object obj) {
          if (this == obj)
              return true;
          if (obj == null)
              return false;
          if (getClass() != obj.getClass())
              return false;
          final RectObject other = (RectObject) obj;
          if (x != other.x) {
              return false;
          }
          if (y != other.y) {
              return false;
          }
          return true;
      }
  }

We have overridden hashCode() and equals() inherited from Object. As both methods show, if two RectObject objects have equal x and y values, their hash codes are equal and equals() returns true.

  import java.util.HashSet;

  public class Demo {
      public static void main(String[] args) {
          HashSet<RectObject> set = new HashSet<RectObject>();
          RectObject r1 = new RectObject(3, 3);
          RectObject r2 = new RectObject(5, 5);
          RectObject r3 = new RectObject(3, 5);
          set.add(r1);
          set.add(r2);
          set.add(r3);
          r3.y = 7; // changes a field used by hashCode() while r3 is in the set
          System.out.println("Size before deletion: " + set.size()); // 3
          set.remove(r3);
          System.out.println("Size after deletion: " + set.size()); // still 3
      }
  }
  Operation results:
  Size before deletion: 3
  Size after deletion: 3

Here we run into a problem. We called remove() to delete r3 and expected it to be gone, but it was not actually deleted. This is a memory leak: an object we no longer use is still held in memory. If this happens many times, memory usage keeps growing until memory is eventually exhausted. Let's look at the source code of HashSet.remove:

  public boolean remove(Object o) {
      return map.remove(o) == PRESENT;
  }

Then look at the source code of HashMap.remove:

  public V remove(Object key) {
      Entry<K,V> e = removeEntryForKey(key);
      return (e == null ? null : e.value);
  }

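Now look at the source code of removeEntryForKey (this is the JDK 7 implementation; JDK 8 and later use removeNode instead, but the lookup by hash and then equals() works the same way):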

  /**
   * Removes and returns the entry associated with the specified key
   * in the HashMap. Returns null if the HashMap contains no mapping
   * for this key.
   */
  final Entry<K,V> removeEntryForKey(Object key) {
      int hash = (key == null) ? 0 : hash(key);
      int i = indexFor(hash, table.length);
      Entry<K,V> prev = table[i];
      Entry<K,V> e = prev;
      while (e != null) {
          Entry<K,V> next = e.next;
          Object k;
          if (e.hash == hash &&
              ((k = e.key) == key || (key != null && key.equals(k)))) {
              modCount++;
              size--;
              if (prev == e)
                  table[i] = next;
              else
                  prev.next = next;
              e.recordRemoval(this);
              return e;
          }
          prev = e;
          e = next;
      }
      return e;
  }

We can see that remove() first uses the object's hash code to locate it and only then deletes it. The problem arises because we changed r3's y field after storing it. Since y takes part in RectObject.hashCode(), r3's hash code changed, remove() did not find r3, and the deletion failed. In other words, r3's hash code changed but its storage location was not updated; it still sits at the original location, so it cannot be found with its new hash code.

This memory leak teaches one lesson: if a field takes part in an object's hashCode() calculation, that field must not be modified while the object is stored in a hash-based collection. Otherwise the object can no longer be removed, and a memory leak results.
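
One way to follow this rule when such a field really has to change is to remove the object from the set first, modify it, and then add it back. The sketch below uses a compact stand-in for RectObject (the names SafeUpdateDemo and MutableRect are made up) and shows both the failing removal and the safe sequence.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class SafeUpdateDemo {
    // Compact stand-in for RectObject: x and y take part in equals() and hashCode().
    static class MutableRect {
        int x;
        int y;

        MutableRect(int x, int y) {
            this.x = x;
            this.y = y;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MutableRect)) return false;
            MutableRect r = (MutableRect) o;
            return x == r.x && y == r.y;
        }

        @Override
        public int hashCode() {
            return Objects.hash(x, y);
        }
    }

    // Wrong: mutate while the object is inside the set, then try to remove it.
    static int sizeAfterUnsafeRemove() {
        Set<MutableRect> set = new HashSet<>();
        MutableRect r = new MutableRect(3, 5);
        set.add(r);
        r.y = 7;           // hash code changes, but the entry stays where it was stored
        set.remove(r);     // searches with the NEW hash code and finds nothing
        return set.size(); // still 1
    }

    // Safe: remove, mutate, re-add.
    static boolean containsAfterSafeUpdate() {
        Set<MutableRect> set = new HashSet<>();
        MutableRect r = new MutableRect(3, 5);
        set.add(r);
        set.remove(r);     // removed while its hash code still matches where it is stored
        r.y = 7;
        set.add(r);        // stored again under the new hash code
        return set.contains(r) && set.size() == 1;
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterUnsafeRemove());   // 1
        System.out.println(containsAfterSafeUpdate()); // true
    }
}
```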

5. hashCode() and equals() of the basic data types and String:

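(1) hashCode(): primitives themselves have no methods, but the hashCode() of the eight wrapper classes is simple. Integer, Short, Byte and Character return their numeric value directly; Long returns (int)(value ^ (value >>> 32)); Float and Double are computed from the value's bit pattern; Boolean returns 1231 for true and 1237 for false. String uses a more involved calculation, s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1], which guarantees that equal strings have equal hash codes.

(2) equals(): the wrapper classes of the eight basic types compare values directly, provided the argument is of the same wrapper type (for example, Integer.valueOf(1).equals(1L) is false because 1L is boxed to a Long). String's equals() compares the characters of the two strings.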
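
A quick sketch checking several of these rules (the class name is made up). The expected values follow from the formulas above; the Double line shows that the hash depends on the bit pattern, so 0.0 and -0.0 hash differently.

```java
class WrapperHashDemo {
    public static void main(String[] args) {
        System.out.println(Integer.valueOf(42).hashCode()); // 42: the value itself
        long big = 1L << 32;
        // Long: (int) (value ^ (value >>> 32)) = (int) (2^32 ^ 1) = 1
        System.out.println(Long.valueOf(big).hashCode());   // 1
        System.out.println(Boolean.TRUE.hashCode());        // 1231
        System.out.println(Boolean.FALSE.hashCode());       // 1237
        // Double: based on the bit pattern, not the numeric value, so 0.0 and -0.0 differ.
        System.out.println(Double.valueOf(0.0).hashCode() == Double.valueOf(-0.0).hashCode()); // false
        // String: 'o' * 31 + 'k' = 111 * 31 + 107
        System.out.println("ok".hashCode());                // 3548
    }
}
```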

6. Does the hashCode value change before and after a JVM GC?

To answer a reader's question from the comments: an object's storage location may change after GC, so does its hash code change too? If a user thread obtains an object's hash code before GC, will the object be impossible to find by that hash code after GC? The answer is no: the hash code does not change.

As mentioned earlier, when hashCode() is not overridden, the hash code is often said to be mapped from the object's memory address. java.lang.Object specifies three conventions for the hashCode() method:

  • First, during one run of the application, as long as the fields used by an object's equals() method do not change, repeated calls to hashCode() on that object must return the same value.
  • Second, if two objects are equal according to equals(Object o), their hashCode() values must be equal.
  • Third, if two objects are not equal according to equals(Object o), their hashCode() values are not required to differ, but producing different hash codes in this case improves the performance of hash tables.

We know that during GC the JVM may move objects (for example with a copying or a mark-compact algorithm), so an object's memory address can change, yet its hash code must stay the same. How does the JVM achieve this?

Before hashCode() is called, the bits reserved for the hash code in the object header are 0. When the identity hashCode() is called for the first time, the value is computed and stored in the object header; later calls read the stored value from the header directly.

This guarantees that the hash code is unaffected even if GC moves the object. If hashCode() was called before GC, the value is already stored in the header, so a changed address does not matter. If it is first called after GC, it is simply computed at that point and stored from then on.
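
The stability can also be checked with plain JDK APIs, without JOL. The sketch below (class name made up) compares System.identityHashCode() before and after a System.gc() call. System.gc() is only a request, so this shows that the value is stable, not that the object was actually moved.

```java
class IdentityHashStability {
    static boolean stableAcrossGc() {
        Object obj = new Object();
        // First call: the identity hash is computed now (HotSpot caches it in the object header).
        int before = System.identityHashCode(obj);
        // Allocate some garbage and request a collection; the JVM may ignore the request.
        for (int i = 0; i < 100_000; i++) {
            new Object();
        }
        System.gc();
        int after = System.identityHashCode(obj);
        return before == after;
    }

    public static void main(String[] args) {
        System.out.println(stableAcrossGc()); // true
    }
}
```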

(1) Code validation:

The following simple code checks an object's memory address and hash code before and after GC. First, add the JOL (Java Object Layout) dependency to the project:

  <dependency>
      <groupId>org.openjdk.jol</groupId>
      <artifactId>jol-core</artifactId>
      <version>0.10</version>
  </dependency>

The verification code is as follows:

  import org.openjdk.jol.vm.VM;

  public class GcHashCodeTest {
      public static void main(String[] args) {
          Object obj = new Object();
          long address = VM.current().addressOf(obj);
          long hashCode = obj.hashCode();
          System.out.println("Before GC - memory address: " + address);
          System.out.println("Before GC - hashcode value: " + hashCode);
          // Create some garbage and request a collection (System.gc() is only a hint)
          new Object();
          new Object();
          new Object();
          System.gc();
          long afterAddress = VM.current().addressOf(obj);
          long afterHashCode = obj.hashCode();
          System.out.println("After GC - memory address: " + afterAddress);
          System.out.println("After GC - hashcode value: " + afterHashCode);
          System.out.println("---------------------");
          System.out.println("Memory address equal: " + (address == afterAddress));
          System.out.println("hashcode equal: " + (hashCode == afterHashCode));
      }
  }

Output results:

  Before GC - memory address: 31883104632
  Before GC - hashcode value: 331844619
  After GC - memory address: 29035177568
  After GC - hashcode value: 331844619
  ---------------------
  Memory address equal: false
  hashcode equal: true

How the hash code is stored was also described above. Let's verify it by observing how the object header changes:

  import org.openjdk.jol.info.ClassLayout;

  public class ObjectHeaderTest {
      public static void main(String[] args) {
          // Create an object and print its layout in the JVM
          Object person = new Object();
          System.out.println(ClassLayout.parseInstance(person).toPrintable());
          // Call hashCode(). If the class overrides hashCode(), call System.identityHashCode instead,
          // because only the identity hash is stored in the object header
          System.out.println(person.hashCode());
          // System.out.println(System.identityHashCode(person));
          // Print the object's layout again
          System.out.println(ClassLayout.parseInstance(person).toPrintable());
      }
  }

Execution results:

  java.lang.Object object internals:
   OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
        0     4        (object header)                           01 00 00 00 (00000001 00000000 00000000 00000000) (1)
        4     4        (object header)                           00 00 00 00 (00000000 00000000 00000000 00000000) (0)
        8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
       12     4        (loss due to the next object alignment)
  Instance size: 16 bytes
  Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
  863831416
  java.lang.Object object internals:
   OFFSET  SIZE   TYPE DESCRIPTION                               VALUE
        0     4        (object header)                           01 78 05 7d (00000001 01111000 00000101 01111101) (2097510401)
        4     4        (object header)                           33 00 00 00 (00110011 00000000 00000000 00000000) (51)
        8     4        (object header)                           e5 01 00 f8 (11100101 00000001 00000000 11111000) (-134217243)
       12     4        (loss due to the next object alignment)
  Instance size: 16 bytes
  Space losses: 0 bytes internal + 4 bytes external = 4 bytes total

Comparing the output before and after the hashCode() call, the VALUE in the row at OFFSET 0 changes from 1 to 2097510401 and the row at OFFSET 4 changes from 0 to 51: the hash code has been written into the object header. The header word at offsets 0 to 7 now contains 863831416 (0x337d0578) shifted left by 8 bits. If hashCode() is never called, the hash is not stored.
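
The stored value can be decoded by hand. This sketch assumes the 64-bit HotSpot mark-word layout used by JDK 8-era JVMs for an unlocked object (bits 0 to 7 hold the lock bits, the biased-lock flag, the age and one unused bit; bits 8 to 38 hold the 31-bit identity hash) and reads the 8 header bytes at offsets 0 to 7 in little-endian order. The class and method names are hypothetical, and the layout differs on other JVMs and on newer HotSpot versions.

```java
class MarkWordDecoder {
    // Header bytes at offsets 0..7 exactly as printed by JOL after hashCode() was called.
    static final int[] HEADER_BYTES = {0x01, 0x78, 0x05, 0x7d, 0x33, 0x00, 0x00, 0x00};

    // Assumed layout (64-bit HotSpot, unlocked object): hash in bits 8..38 of the mark word.
    static int identityHashFromHeader(int[] bytes) {
        long mark = 0;
        for (int i = bytes.length - 1; i >= 0; i--) {
            mark = (mark << 8) | (bytes[i] & 0xFF); // little-endian: byte 0 is the least significant
        }
        return (int) ((mark >>> 8) & 0x7FFFFFFFL);
    }

    public static void main(String[] args) {
        System.out.println(identityHashFromHeader(HEADER_BYTES)); // 863831416, the value printed by hashCode()
        // Before hashCode() was called the header was 01 00 00 00 00 00 00 00, i.e. no hash yet.
        System.out.println(identityHashFromHeader(new int[] {0x01, 0, 0, 0, 0, 0, 0, 0})); // 0
    }
}
```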


Added by jek1134 on Tue, 04 Jan 2022 08:44:21 +0200