[JVM] StringTable - string constant pool

1. StringTable

StringTable location of different JDK versions

1.1 basic characteristics of string

  • String: a string, written as a literal enclosed in a pair of double quotes ("").
  • The String class is declared final and cannot be subclassed.
  • String implements the Serializable interface, so strings support serialization, and the Comparable interface, so strings can be compared with one another.
  • In JDK 8 and earlier, String stores its data in a final char[] value field. In JDK 9 this was changed to byte[] (http://openjdk.java.net/jeps/254).

Conclusion: String is no longer backed by a char[] but by a byte[] plus an encoding flag (coder), which saves space for strings whose characters fit in one byte each.

public final class String implements java.io.Serializable, Comparable<String>, CharSequence {
    @Stable
    private final byte[] value;
    // ...
}
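
As a rough illustration of the space saving: with the JDK 9 representation, a string whose characters all fit in one byte (Latin-1) needs one byte per character, otherwise two bytes per character (UTF-16). The helper below is hypothetical; it is not part of the JDK and does not read the private value or coder fields. It only mirrors that rule to estimate the size of the backing array, assuming compact strings are enabled (the default).

```java
class CompactStringSketch {
    // Hypothetical helper: true if every char is in the Latin-1 range 0x00..0xFF.
    static boolean fitsLatin1(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Estimated length in bytes of the backing byte[] under the compact representation.
    static int estimatedPayloadBytes(String s) {
        return fitsLatin1(s) ? s.length() : s.length() * 2;
    }

    public static void main(String[] args) {
        System.out.println(estimatedPayloadBytes("hello"));        // 5
        System.out.println(estimatedPayloadBytes("\u4f60\u597d")); // 4 (two chars, two bytes each)
    }
}
```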

What about StringBuffer and StringBuilder? JEP 254 states:

String-related classes such as AbstractStringBuilder, StringBuilder and StringBuffer will be updated to use the same representation, as will the HotSpot VM's intrinsic string operations.

  • String represents an immutable character sequence (immutability):

    • When a string variable is reassigned, a new memory area is allocated for the new value; the original value is not modified in place.
    • When an existing string is concatenated with another, a new memory area is allocated for the result; the original value is not modified in place.
    • When replace() is called to change a character or substring, a new memory area is allocated for the result; the original value is not modified in place.
  • When a string is assigned from a literal (as opposed to new), its value lives in the string constant pool.

  • The string constant pool never holds two strings with the same content.

    • The String Pool is a fixed-size Hashtable (default length 1009 in JDK 6). If the pool holds too many strings, hash collisions become frequent and the bucket linked lists grow long, which directly and significantly degrades the performance of String.intern().
    • Use -XX:StringTableSize to set the length of the StringTable.
    • In JDK 6 the StringTable length defaults to 1009, so performance drops quickly once the constant pool holds too many strings. There is no required minimum for StringTableSize.
    • In JDK 7 the default StringTable length is 60013.
    • Starting with JDK 8, 1009 is the smallest value that StringTableSize can be set to.
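
The immutability rules and the literal-sharing rule above can be checked with a small sketch. The class and method names are made up for illustration; the behaviour relied on (reassignment, concatenation and replace() all yield other objects, and equal literals share one pooled instance) is what the bullets describe.

```java
class ImmutabilitySketch {
    // True if none of the "modifying" operations changed the original object.
    static boolean originalUnchanged() {
        String original = "abc";
        String alias = original;              // second reference to the same pooled object

        String reassigned = original;
        reassigned = "hello";                 // the variable now points at another object

        String concatenated = original;
        concatenated += "def";                // concatenation builds a new String

        String replaced = original.replace('a', 'm'); // replace() returns a new String

        return alias == original
                && original.equals("abc")
                && reassigned.equals("hello")
                && concatenated.equals("abcdef")
                && replaced.equals("mbc");
    }

    // Two literals with the same content refer to the same pooled instance.
    static boolean literalsShared() {
        String a = "abc";
        String b = "abc";
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(originalUnchanged()); // true
        System.out.println(literalsShared());    // true
    }
}
```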

1.2 memory allocation of string

  • The Java language has eight primitive data types plus one special type, String. To make them faster and more memory-efficient at run time, each of these types is given a constant pool.
  • A constant pool acts like a cache provided at the Java system level. The constant pools of the eight primitive types are managed by the system, while the String constant pool is special. There are two main ways to use it:
    • String objects declared directly with double quotes are stored in the constant pool.
      • For example: String info = "ABC";
    • String objects that are not declared with double quotes can be put there with the intern() method of String.
  • In Java 6 and earlier, the string constant pool was stored in the permanent generation.
  • In Java 7, Oracle engineers changed the string pool logic substantially: the string constant pool was moved to the Java heap.
    • All strings are now kept in the heap like other ordinary objects, so you only need to adjust the heap size when tuning the application.
    • The string constant pool concept was already widely used, and this change is reason enough to reconsider using String.intern() in Java 7.
  • In Java 8, the method area implementation changed from the permanent generation to Metaspace; string constants are still stored in the heap.

Why was the StringTable moved?

  1. PermSize is small by default.
  2. The permanent generation is garbage-collected infrequently.

Official website:

https://www.oracle.com/technetwork/java/javase/jdk7-relnotes-418459.html#jdk7changes

1.3 basic operation of string

The Java Language Specification requires that identical string literals, that is, literals containing the same sequence of Unicode characters (the same sequence of code points), refer to the same instance of class String.
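
A minimal sketch of this rule, with hypothetical class names: the same literal written in two unrelated classes still resolves to one shared String instance, so the reference comparison succeeds.

```java
class LiteralHolderA {
    static final String GREETING = "hello";
}

class LiteralHolderB {
    static String greeting() {
        return "hello";   // same character sequence, written in a different class
    }
}

class SharedLiteralSketch {
    // Compares references, not contents.
    static boolean sameInstance() {
        return LiteralHolderA.GREETING == LiteralHolderB.greeting();
    }

    public static void main(String[] args) {
        System.out.println(sameInstance()); // true
    }
}
```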

1.4 string concatenation

  1. Concatenating constants with constants yields a result in the constant pool. This works through compile-time optimization.
  2. The constant pool never contains two constants with the same content.
  3. As soon as one operand is a variable, the result is in the heap. Variable concatenation is implemented with StringBuilder.
  4. If intern() is called on the concatenation result, a string that is not yet in the constant pool is actively added to it, and the address of the pooled object is returned.

Test code - compile time optimization:

@Test
public void test1() {
    String s1 = "a" + "b" + "c";	// Compile-time optimization: equivalent to "abc"
    String s2 = "abc";

    /*
     * After the .java source is compiled to a .class file, the code is equivalent to:
     * String s1 = "abc";
     * String s2 = "abc";
     */
    System.out.println(s1 == s2);       //true
    System.out.println(s1.equals(s2));  //true
}

Test code - concatenation involving variables:

@Test
public void test2() {
    String s1 = "javaEE";
    String s2 = "hadoop";

    String s3 = "javaEEhadoop";
    String s4 = "javaEE" + "hadoop";   // Compile-time optimization: equivalent to "javaEEhadoop"

    // If a variable appears on either side of +, the result is created in heap space, as if by new String(); its content is the concatenated string.
    String s5 = s1 + "hadoop";
    String s6 = "javaEE" + s2;
    String s7 = s1 + s2;


    System.out.println(s3 == s4);   //true
    System.out.println(s3 == s5);   //false
    System.out.println(s3 == s6);   //false
    System.out.println(s3 == s7);   //false

    System.out.println(s5 == s6);   //false
    System.out.println(s5 == s7);   //false
    System.out.println(s6 == s7);   //false

    // intern() checks whether the string constant pool already contains the value javaEEhadoop:
    // If it does: returns the address of that string in the constant pool.
    // If it does not: adds javaEEhadoop to the constant pool and returns the address of that object.
    String s8 = s6.intern();
    System.out.println(s3 == s8);   //true
}

Test code - how variable concatenation is executed:

@Test
public void test3() {
    String s1 = "a";
    String s2 = "b";
    String s3 = "ab";
    /*
     * How s1 + s2 is executed (variables on both sides of +; s is a temporary variable):
     * 1. StringBuilder s = new StringBuilder();
     * 2. s.append(s1);
     * 3. s.append(s2);
     * 4. s.toString();   --->   (heap) roughly equivalent to new String("ab")
     *
     * Note: StringBuilder is used from JDK 5.0 on; StringBuffer was used before JDK 5.0.
     * From JDK 9 on, javac emits an invokedynamic call (StringConcatFactory) instead; the result is still a new heap object.
     */

    String s4 = s1 + s2;
    System.out.println(s3 == s4);   //false
}

Test code - constants on both sides of the concatenation:

@Test
public void test4() {
    final String s1 = "a";
    final String s2 = "b";
    String s3 = "ab";
    String s4 = s1 + s2;

    // Both operands of + are still constants (final references), so the expression is folded at compile time
    System.out.println(s3 == s4);   //true
}

String concatenation does not necessarily use StringBuilder:

  • If both sides of + are string constants (literals in "") or constant references (final variables initialized with a constant), compile-time optimization still applies.

Whether for classes, methods, primitive variables or reference variables, it is recommended to use final wherever it can be used.

Test code - efficiency of concatenation vs. append():

public void method1() {
    String str = "";
    for (int i = 0; i < 100000; i++) {
        str = str + "a";	// Each iteration creates a new StringBuilder and a new String
    }
}

public void method2() {
    // Only one StringBuilder is created
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < 100000; i++) {
        builder.append("a");
    }
}

@Test
public void test() {
    long startTime = System.currentTimeMillis();

    //method1();
    //method2();

    long endTime = System.currentTimeMillis();

    System.out.println("Elapsed time:" + (endTime - startTime));
}

Appending with StringBuilder's append() method is much more efficient than concatenating Strings with +.

Comparison:

  • Object creation:
    • StringBuilder.append(): only one StringBuilder object is created from start to finish.
    • String concatenation: a new StringBuilder and a new String are created on every iteration.
  • Memory: because concatenation creates so many StringBuilder and String objects, it uses more memory, and GC then costs additional time.

Optimization:

  • In practice, if the final length of the string is known to stay below some upper bound highLevel, pass that capacity to the constructor: StringBuilder s = new StringBuilder(highLevel). This avoids repeated expansion of the internal array and the extra memory it consumes.
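
A small sketch of the presizing advice, assuming the final length can be computed up front. The class and method names are invented for this example; only the StringBuilder(int capacity) constructor and append() come from the JDK.

```java
class PresizedBuilderSketch {
    // Repeats `unit` `times` times using a builder sized for the final length.
    static String repeat(String unit, int times) {
        int expectedLength = unit.length() * times;
        // The backing array is allocated once, so append() never has to grow it.
        StringBuilder builder = new StringBuilder(expectedLength);
        for (int i = 0; i < times; i++) {
            builder.append(unit);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        System.out.println(repeat("a", 100000).length()); // 100000
    }
}
```

When the bound is only an estimate, a slightly generous capacity still avoids most of the intermediate array copies.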

1.5 use of intern()

1.5.1 general

intern() checks whether the string constant pool already contains a string with this content (for example javaEEhadoop):

  • If it does: it returns the address of that string in the constant pool.
  • If it does not: it adds javaEEhadoop to the constant pool and returns the address of that object.

For any two strings s and t, s.intern() == t.intern() is true if and only if s.equals(t) is true.
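
The equivalence can be exercised with a short sketch (the helper name is hypothetical). Two strings built in different ways are distinct heap objects, yet their interned references coincide exactly when their contents are equal.

```java
class InternEquivalenceSketch {
    // Reference comparison of the canonical (interned) representatives.
    static boolean sameAfterIntern(String s, String t) {
        return s.intern() == t.intern();
    }

    public static void main(String[] args) {
        String s = new String("javaEEhadoop");
        String t = new StringBuilder("javaEE").append("hadoop").toString();

        System.out.println(s == t);                    // false: two heap objects
        System.out.println(s.equals(t));               // true: same content
        System.out.println(sameAfterIntern(s, t));     // true
        System.out.println(sameAfterIntern(s, "ab"));  // false: different content
    }
}
```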

If a String object is not declared with double quotes, you can use the intern() method of String: it looks up the current string in the string constant pool and, if the string is not there yet, puts it into the pool.

// For example:
String myInfo = new String("I love you").intern();

That is, if you call String.intern() on any string, the instance it returns must be exactly the same instance as the one denoted by the literal with the same content.

Therefore, the result of the following expression must be true:

("a" + "b" + "c").intern() == "abc"

Generally speaking, an interned string guarantees that only one copy of the string exists in memory, which saves memory and speeds up string operations. Note that this value is stored in the String Intern Pool.

/**
 * How can we make sure that the variable s points to data in the string constant pool?
 * There are two ways:
 * 1. Literal declaration: String s = "ABC";
 * 2. The intern() method: String s = new String("ABC").intern();
 *                         String s = new StringBuilder("ABC").toString().intern();
 */
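
As a quick check of the comment above, the following sketch (class name invented) compares the references obtained in each way. A plain new String("ABC") without intern() is included as the counterexample.

```java
class PooledReferenceSketch {
    // True if the literal and both intern() variants yield the same pooled object,
    // while a plain new String(...) yields a different heap object.
    static boolean allPooled() {
        String literal = "ABC";
        String viaNew = new String("ABC").intern();
        String viaBuilder = new StringBuilder("ABC").toString().intern();
        String notInterned = new String("ABC");

        return literal == viaNew
                && literal == viaBuilder
                && literal != notInterned;
    }

    public static void main(String[] args) {
        System.out.println(allPooled()); // true
    }
}
```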

Question: how many objects does new String("ab") create?

  • Two objects (assuming "ab" is not already in the pool):
    • One object created in heap space by the new keyword
    • One object "ab" in the string constant pool (bytecode instruction ldc)

Extension: how many objects does new String("a") + new String("b") create?

  • Object 1: new StringBuilder()
  • Object 2: new String("a")
  • Object 3: "a" in the constant pool
  • Object 4: new String("b")
  • Object 5: "b" in the constant pool

  • Looking deeper, at StringBuilder's toString():
    • Object 6: new String("ab")
  • Calling toString() does not create "ab" in the string constant pool.

1.5.2 examples

JDK 6 – vs – JDK 7/8

public class StringIntern1 {
    public static void main(String[] args) {
        
        // 1. The first case
        String s = new String("1");
        s.intern();			// "1" is already in the string constant pool before this call
        String s2 = "1";
        System.out.println(s == s2);
        /*
         * JDK 6 and JDK 7/8: false
         *   s : address of the object created in heap space by new
         *   s2: address of the object in the string constant pool
         */

        // 2. The second case
        String s3 = new String("1") + new String("1");		// s3 holds the address of a heap object equivalent to new String("11")
        // After the previous line, "11" does not exist in the string constant pool
        s3.intern();			// Puts "11" into the string constant pool.
        // JDK 6: a new object "11" with a new address is created in the pool.
        // JDK 7+: no new object is created; the pool records the address of the "11" already in the heap.
        String s4 = "11";		// s4 holds the address of the "11" that the previous line put into the pool
        System.out.println(s3 == s4);
        System.out.println(s3 == s4);
    }
}

JDK 6: false + false

JDK 7/8: false + true

JDK 8

public class StringIntern2 {
    public static void main(String[] args) {
        // 3. The third case
        String s3 = new String("1") + new String("1");
        // After the previous line, "11" does not exist in the string constant pool
        String s4 = "11";	// Creates the object "11" in the string constant pool
        String s5 = s3.intern();	// "11" is already in the pool, so s5 simply receives the address of the pooled object
        System.out.println(s3 == s4);	//false
        System.out.println(s5 == s4);	//true
    }
}

Summary of String.intern():

  • In JDK 1.6, intern() tries to put the string object into the string pool.
    • If the pool already contains the string, nothing is added, and the address of the existing pooled object is returned.
    • If the pool does not contain it, the object is copied, the copy is placed in the pool, and the address of the pooled copy is returned.
  • From JDK 1.7 on, intern() tries to put the string object into the string pool.
    • If the pool already contains the string, nothing is added, and the address of the existing pooled object is returned.
    • If the pool does not contain it, the reference to the object is stored in the pool, and that reference is returned.
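
The JDK 7+ branch of this summary can be observed with a string that is assembled at run time, so that its content is very unlikely to be in the pool beforehand. This is only a sketch: the class and helper names are invented, and the reference comparison on the last line reflects the behaviour described above for JDK 7 and later (on JDK 6 it would be expected to print false). Only the equals() check is guaranteed by the intern() contract itself.

```java
class InternReferenceSketch {
    // Builds a string at run time; the result is a fresh heap object, not a literal.
    static String buildAtRuntime(String left, String right) {
        return new StringBuilder().append(left).append(right).toString();
    }

    public static void main(String[] args) {
        // nanoTime() makes the content effectively unique for this run (an assumption of the demo).
        String built = buildAtRuntime("string-table-", Long.toString(System.nanoTime()));
        String pooled = built.intern();

        // Guaranteed: the pooled string has the same content.
        System.out.println(pooled.equals(built));
        // JDK 7+: expected to be the very same object, because the pool stores the reference to `built`.
        System.out.println(pooled == built);
    }
}
```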

1.5.3 intern(): space efficiency test

/**
 * Testing the efficiency of intern() in terms of space
 */
public class StringInternTest {
    static final int MAX_COUNT = 1000 * 1000;
    static final String[] arr = new String[MAX_COUNT];

    public static void main(String[] args) {
        Integer[] data = new Integer[]{1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

        long start = System.currentTimeMillis();
        for (int i = 0; i < MAX_COUNT; i++) {
           /* Test focus: whether to call the intern() method */
           //  arr[i] = new String(String.valueOf(data[i % data.length]));
           //  arr[i] = new String(String.valueOf(data[i % data.length])).intern();
        }
        long end = System.currentTimeMillis();

        System.out.println("Time spent:" + (end - start));

        try {
            Thread.sleep(1000000);
        } catch (InterruptedException e) {
            e.printStackTrace();
        }

        System.gc();
    }
}

When a program holds a large number of strings, and especially when many of them are duplicates, using intern() can save memory.

Large website platforms such as social networking sites need to keep a large number of strings in memory. If those strings are interned with intern(), memory usage drops significantly.
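
The saving comes from the number of distinct String objects that stay reachable. The sketch below (names invented, much smaller than the test above) counts distinct object identities with an identity-based set instead of measuring heap usage, so it shows the effect without needing a profiler.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class InternSpaceSketch {
    // Creates `total` strings drawn from ten distinct values and counts
    // how many distinct objects (by identity, not by equals) are retained.
    static int countDistinctObjects(int total, boolean useIntern) {
        Set<String> identities = Collections.newSetFromMap(new IdentityHashMap<>());
        for (int i = 0; i < total; i++) {
            String s = new String(String.valueOf(i % 10));
            identities.add(useIntern ? s.intern() : s);
        }
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println(countDistinctObjects(1000, false)); // 1000
        System.out.println(countDistinctObjects(1000, true));  // 10
    }
}
```

Without intern() every element keeps its own object alive; with intern() the temporary objects become garbage immediately and only the ten pooled instances remain referenced.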

1.6 garbage collection of StringTable

/**
 * String garbage collection:
 * -Xms15m -Xmx15m -XX:+PrintStringTableStatistics -XX:+PrintGCDetails
 *
 * -XX:+PrintStringTableStatistics prints string constant pool statistics
 */
public class StringGCTest {
    public static void main(String[] args) {
        // Increase the loop bound step by step, 0 -> 100 -> 100000, and observe whether GC occurs
        for (int i = 0; i < 100; i++) {
            String.valueOf(i).intern();
        }
    }
}

1.7 String deduplication in G1

Official website:

http://openjdk.java.net/jeps/192

Deduplication applies to the underlying char array:

String str1 = new String("hello");
String str2 = new String("hello");

  • Background: tests on many Java applications (large and small) gave the following results:

    • String objects account for 25% of the live heap data set
    • Duplicate String objects account for 13.5% of the live heap data set
    • The average length of a String object is 45
  • The bottleneck of many large-scale Java applications is memory. Tests show that in these applications almost 25% of the live data in the Java heap consists of String objects, and almost half of those String objects are duplicates, that is:

    • string1.equals(string2) == true

    • Duplicate String objects on the heap are a waste of memory. This project implements automatic, continuous deduplication of String objects in the G1 garbage collector to avoid that waste.

  • Implementation:

    • While the garbage collector works, it visits the live objects on the heap. For each object it visits, it checks whether the object is a String that is a candidate for deduplication.
    • If so, a reference to the object is inserted into a queue for later processing. A deduplication thread runs in the background and processes the queue. Processing a queue entry means removing it from the queue and then attempting to deduplicate the String object it references.
    • A hashtable records all unique char arrays used by String objects. During deduplication, the hashtable is checked for an identical char array that is already on the heap.
    • If one exists, the String object is adjusted to refer to that array; the reference to the original array is released, and the original array is eventually reclaimed by the garbage collector.
    • If the lookup fails, the char array is inserted into the hashtable so that it can be shared later.
  • Command line options:

    • UseStringDeduplication (bool): enables String deduplication. It is disabled by default and must be turned on manually.
    • PrintStringDeduplicationStatistics (bool): prints detailed deduplication statistics
    • StringDeduplicationAgeThreshold (uintx): String objects that reach this age are considered candidates for deduplication
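
The lookup-or-insert step of the implementation can be modelled in a few lines. This is a toy model under loud assumptions: FakeString stands in for java.lang.String, the table is an ordinary HashMap keyed by content, and there is no queue, background thread or age threshold. It is not how G1 is actually coded; it only shows why two equal strings end up sharing one array while remaining two separate objects.

```java
import java.util.HashMap;
import java.util.Map;

class DedupSketch {
    // Hypothetical stand-in for java.lang.String: a wrapper around a char array.
    static final class FakeString {
        char[] value;

        FakeString(String content) {
            this.value = content.toCharArray();
        }
    }

    // Table of the unique char arrays seen so far, keyed by their content.
    private final Map<String, char[]> uniqueArrays = new HashMap<>();

    // Processes one candidate: reuse an identical array if one is known, else register this one.
    void deduplicate(FakeString candidate) {
        String key = new String(candidate.value);
        char[] existing = uniqueArrays.get(key);
        if (existing != null) {
            candidate.value = existing;             // the old array becomes unreachable
        } else {
            uniqueArrays.put(key, candidate.value); // available for later sharing
        }
    }

    // True if two equal strings are separate objects that share one array after deduplication.
    static boolean sharesArrayAfterDedup() {
        DedupSketch table = new DedupSketch();
        FakeString a = new FakeString("hello");
        FakeString b = new FakeString("hello");
        boolean distinctBefore = a.value != b.value;
        table.deduplicate(a);
        table.deduplicate(b);
        return distinctBefore && a.value == b.value && a != b;
    }

    public static void main(String[] args) {
        System.out.println(sharesArrayAfterDedup()); // true
    }
}
```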

Keywords: Java jvm Back-end string

Added by prometheos on Mon, 31 Jan 2022 00:12:14 +0200