String constant pool (StringTable) summary

Contents

1, Basic properties of String

2, Immutability of String

3, Memory allocation for String

4, Basic operation of String

5, Use of intern()

6, Common interview questions

1, Basic properties of String

  • String: a string, written between a pair of double quotes ("").
    • String s1 = "mogublog"; // defined as a literal
    • String s2 = new String("abc"); // created with new
  • String is declared final and cannot be subclassed;
  • String implements the Serializable interface, which means strings support serialization. It also implements the Comparable interface, which means strings can be compared for ordering;
  • In JDK 8 and earlier, String stores its data in a final char[] value field. In JDK 9 this was changed to byte[];

Why did JDK9 change the structure?

The current implementation of the String class stores characters in a char array, using two bytes (16 bits) per character. Data collected from many different applications shows that strings are a major component of heap usage, and that most String objects contain only Latin-1 characters, which need only one byte of storage each. Half of the space in the internal char arrays of such String objects therefore goes unused.

The proposal is to change the internal representation of the String class from a UTF-16 char array to a byte array plus an encoding flag field. The new String class stores characters encoded either as ISO-8859-1/Latin-1 (one byte per character) or as UTF-16 (two bytes per character), depending on the content of the string. The encoding flag indicates which encoding is used.

Conclusion: String data is no longer stored in a char[] but in a byte[] with an encoding flag, which saves space.

// before
private final char value[];
// after
private final byte[] value;

At the same time, string-related classes such as StringBuffer and StringBuilder were modified as well.
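
The idea of "byte array plus encoding flag" can be sketched roughly as follows. This is a toy illustration, not the JDK's implementation: the class CompactTextDemo, the constants LATIN1 and UTF16, and the helpers chooseCoder and encode are hypothetical names introduced here.

```java
import java.nio.charset.StandardCharsets;

class CompactTextDemo {
    // Hypothetical encoding flags, named here for illustration only.
    static final byte LATIN1 = 0;
    static final byte UTF16 = 1;

    // Picks LATIN1 when every character fits in one byte, UTF16 otherwise.
    static byte chooseCoder(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return UTF16;
            }
        }
        return LATIN1;
    }

    // Encodes the text with one or two bytes per character, depending on the flag.
    static byte[] encode(String s) {
        return chooseCoder(s) == LATIN1
                ? s.getBytes(StandardCharsets.ISO_8859_1)
                : s.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        System.out.println(encode("hello").length);        // 5 bytes for 5 Latin-1 characters
        System.out.println(encode("\u4f60\u597d").length); // 4 bytes for 2 non-Latin-1 characters
    }
}
```

A Latin-1 string therefore needs half the bytes that a char[] of the same length would use.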

2, Immutability of String

String represents an immutable character sequence. In short: immutability.

When a String variable is reassigned, a new memory area is used for the new value; the original value is not modified in place. When an existing String is concatenated with something, the result is likewise written to a new memory area and the original value is left unchanged. When the replace() method is called to change a character or substring, the result is again a new memory area, and the original value is not modified. When a String is assigned a literal (as opposed to being created with new), the string value lives in the string constant pool.

The following code demonstrates the immutability of String:

public class StringTest01 {
    private static void test1() {
        // The literal is defined in such a way that "abc" is stored in the string constant pool
        String s1 = "abc";
        String s2 = "abc";
        System.out.println(s1 == s2);  //true
        s1 = "hello";
        System.out.println(s1 == s2);  //false
        System.out.println(s1);        //hello
        System.out.println(s2);        //abc
        System.out.println("----------------");
    }

    private static void test2() {
        String s1 = "abc";
        String s2 = "abc";
        // As long as you make changes, you will recreate an object, which is immutability
        s2 += "def";
        System.out.println(s1);        //abc
        System.out.println(s2);        //abcdef
        System.out.println("----------------");
    }

    private static void test3() {
        String s1 = "abc";
        String s2 = s1.replace('a', 'm');
        System.out.println(s1);       //abc
        System.out.println(s2);       //mbc
    }

    public static void main(String[] args) {
        test1();
        test2();
        test3();
    }

}

Output:

true
false
hello
abc
----------------
abc
abcdef
----------------
abc
mbc

Through the above example, we should understand the immutability of String. Next, we will strengthen our understanding of String immutability through a common interview question.

public class StringTest02 {
    String str = new String("good");
    char[] ch = {'t', 'e', 's', 't'};

    public void change(String str, char ch[]) {
        str = "test ok";
        ch[0] = 'b';
    }

    public static void main(String[] args) {
        StringTest02 stringTest02 = new StringTest02();
        stringTest02.change(stringTest02.str, stringTest02.ch);
        System.out.println(stringTest02.str); //good
        System.out.println(stringTest02.ch);  //best
    }
}

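Output:

good
best

The parameter str inside change() is a copy of the reference held by the field. Assigning "test ok" to it only rebinds that local copy, so the field still refers to "good". ch[0] = 'b', by contrast, writes through the shared reference into the same array, so the field sees "best".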

We should note that the string constant pool will not store strings with the same content.

The string pool is implemented as a fixed-size hash table (the StringTable); in JDK 6 its default size is 1009. If too many strings are put into the pool, hash collisions become frequent and the linked lists in the buckets grow long. The direct consequence of long linked lists is that calls to String.intern() slow down considerably.
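
The effect of the table size on chain length can be sketched with a toy model. This is not how the JVM's StringTable is implemented; the class BucketChainDemo and the helper longestChain are hypothetical, and the sketch simply assigns strings to buckets by String.hashCode().

```java
class BucketChainDemo {
    // Hypothetical helper: spreads n distinct strings over a fixed number of
    // buckets by hash code and returns the length of the longest chain.
    static int longestChain(int n, int buckets) {
        int[] chainLength = new int[buckets];
        int longest = 0;
        for (int i = 0; i < n; i++) {
            String s = "s" + i;
            int bucket = Math.floorMod(s.hashCode(), buckets);
            chainLength[bucket]++;
            longest = Math.max(longest, chainLength[bucket]);
        }
        return longest;
    }

    public static void main(String[] args) {
        System.out.println("1009 buckets:  " + longestChain(100_000, 1009));
        System.out.println("60013 buckets: " + longestChain(100_000, 60013));
    }
}
```

With 100,000 entries and only 1009 buckets, at least one chain must hold 100 or more entries, and every lookup that lands in such a chain is a linear scan.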

We can use -XX:StringTableSize to set the size of the StringTable.

In JDK 6, the StringTable size is fixed at 1009, so if the constant pool holds too many strings, efficiency drops quickly.

In JDK 7, the default StringTable size is 60013. In JDK 8, the minimum value that can be set is 1009.
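
To check which value a running JVM actually uses, one option is to read the flag through the HotSpot diagnostic MXBean. This is a minimal sketch that assumes a HotSpot-based JVM; the class StringTableSizeDemo and the helper currentStringTableSize are hypothetical names introduced here.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

class StringTableSizeDemo {
    // Reads the effective value of the StringTableSize flag.
    // Assumption: the JVM is HotSpot-based and exposes this flag.
    static long currentStringTableSize() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        return Long.parseLong(bean.getVMOption("StringTableSize").getValue());
    }

    public static void main(String[] args) {
        System.out.println("StringTableSize = " + currentStringTableSize());
    }
}
```

Running the program with, for example, -XX:StringTableSize=2000 should make it print the overridden value.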

3, Memory allocation for String

Java has eight primitive data types and one special type, String. To make these types faster to use and cheaper in memory, Java provides the concept of a constant pool for them.

A constant pool is similar to a cache provided at the Java system level. The constant pools of the eight primitive types are managed by the system, while the constant pool of String is special. There are two main ways to use it:

(1) String objects declared directly in double quotes are stored in the constant pool.

  • For example: String info = "abcd";

(2) For String objects that are not declared in double quotes, you can use the intern() method provided by String.

  • In Java 6 and earlier, the string constant pool is stored in the permanent generation;
  • Java 7 moved the string constant pool to the Java heap, so all strings are kept in the heap like other ordinary objects. When tuning an application, it is therefore enough to adjust the heap size;
  • Java 8 replaced the permanent generation with Metaspace, but the string constant pool remains in the heap;

Why is StringTable adjusted from permanent generation to heap?

In JDK 7, interned strings are no longer allocated in the permanent generation of the Java heap, but in the main part of the Java heap (the young and old generations), together with the other objects created by the application. This change results in more data residing in the main Java heap and less data in the permanent generation, so heap sizes may need to be adjusted. Most applications will see only relatively small differences in heap usage because of this change, but larger applications that load many classes or make heavy use of String.intern() will see more significant differences. There are two main reasons for the move:

  • The permanent generation is relatively small by default, and its size is difficult to adjust dynamically through parameters;
  • The permanent generation is garbage collected infrequently, so a large number of unused string constants cannot be reclaimed;

4, Basic operation of String

The Java Language Specification requires that identical string literals, that is, literals containing the same sequence of Unicode characters (the same sequence of code points), refer to the same instance of class String.
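
A small sketch of this rule: the same literal written in two different classes yields the same instance. The class names LiteralIdentityDemo and OtherLiteralHolder are hypothetical and exist only for this illustration.

```java
class LiteralIdentityDemo {
    static String greeting() {
        return "hello";
    }

    public static void main(String[] args) {
        // The same literal in another class refers to the same String instance.
        System.out.println(greeting() == OtherLiteralHolder.greeting()); // true
    }
}

class OtherLiteralHolder {
    static String greeting() {
        return "hello";
    }
}
```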

(1) String concatenation

  • The result of concatenating constants with constants is in the constant pool. This works because of compile-time optimization;
  • The constant pool never holds two constants with the same content;
  • As soon as one operand is a variable, the result is in the heap. Concatenation involving variables is implemented with StringBuilder;
  • If intern() is called on the concatenation result, a string that is not yet in the constant pool is put into the pool, and the address of the pooled object is returned;

Let's take an example of string splicing:

public class StringTest03 {
    private static void test1() {
        String s1 = "a" + "b" + "c";  // folded by the compiler into the constant "abc"
        String s2 = "abc"; // "abc" is in the constant pool; s2 gets the pooled reference
        /*
         * The .java source is compiled to a .class file, which is then executed;
         * by then s1 is already the constant "abc".
         */
        System.out.println(s1 == s2);       // true: both refer to the same pooled constant
        System.out.println(s1.equals(s2));  // true
    }

    private static void test2() {
        String s1 = "javaEE";
        String s2 = "hadoop";
        String s3 = "javaEEhadoop";
        String s4 = "javaEE" + "hadoop";
        String s5 = s1 + "hadoop";
        String s6 = "javaEE" + s2;
        String s7 = s1 + s2;

        System.out.println(s3 == s4); // true
        System.out.println(s3 == s5); // false
        System.out.println(s3 == s6); // false
        System.out.println(s3 == s7); // false
        System.out.println(s5 == s6); // false
        System.out.println(s5 == s7); // false
        System.out.println(s6 == s7); // false

        String s8 = s6.intern();
        System.out.println(s3 == s8); // true: "javaEEhadoop" is already in the constant pool, so intern() returns the pooled reference
    }

    public static void main(String[] args) {
        test1();
        test2();
    }
}

From the results above we can conclude:

If a variable appears on either side of the concatenation operator, the result is equivalent to a new String() in the heap whose content is the concatenated text. When intern() is called, it checks whether the string constant pool already contains the value "javaEEhadoop". If so, it returns the reference from the constant pool; otherwise, the string is added to the constant pool.

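(2) Underlying principle

Under the hood, concatenation that involves a variable uses StringBuilder. This can be seen in the compiled bytecode of the program above (for example, as printed by javap -c).

The execution of s1 + s2 in the program above proceeds as follows:

  • StringBuilder s = new StringBuilder();
  • s.append(s1);
  • s.append(s2);
  • s.toString(); -> similar to new String("javaEEhadoop");

From JDK 5 onward StringBuilder is used; before JDK 5, StringBuffer was used. (From JDK 9 onward, javac compiles string concatenation to an invokedynamic call instead, but a concatenation that involves a variable still produces a new object in the heap.)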
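
The steps above can be written out by hand. This is a sketch of the expansion, not the compiler's actual output; the class ConcatExpansionDemo and the helper concatLikeCompiler are hypothetical names.

```java
class ConcatExpansionDemo {
    // Hand-written equivalent of the StringBuilder-based expansion of a + b.
    static String concatLikeCompiler(String a, String b) {
        StringBuilder sb = new StringBuilder();
        sb.append(a);
        sb.append(b);
        return sb.toString(); // creates a new String object in the heap
    }

    public static void main(String[] args) {
        String literal = "javaEEhadoop";
        String built = concatLikeCompiler("javaEE", "hadoop");
        System.out.println(built.equals(literal));     // true: same content
        System.out.println(built == literal);          // false: different objects
        System.out.println(built.intern() == literal); // true: intern() returns the pooled object
    }
}
```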

Comparison of String, StringBuffer and StringBuilder:

  • String
    • Immutable;
    • Because the value of a String is immutable, every operation on a String produces a new String object, which is inefficient and wastes a lot of limited memory;
  • StringBuffer
    • Mutable;
    • Thread safe;
    • Operations on the string it holds do not produce a new object. Each StringBuffer object has a buffer with a certain capacity. As long as the string does not exceed that capacity, no new space is allocated; when it does, the capacity is increased automatically;
    • Suited to string operations from multiple threads;
  • StringBuilder
    • Mutable, and faster;
    • Not thread safe;
    • Suited to string operations from a single thread;
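
The thread-safety property of StringBuffer can be illustrated with a small sketch. The class BufferThreadDemo and the helper appendFromThreads are hypothetical names. Because StringBuffer.append is synchronized, no append is lost even when several threads write to the same buffer; a StringBuilder gives no such guarantee.

```java
class BufferThreadDemo {
    // Appends perThread characters from each of several threads to one shared
    // StringBuffer and returns the final length.
    static int appendFromThreads(int threads, int perThread) {
        StringBuffer shared = new StringBuffer();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.append('x');
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return shared.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromThreads(4, 10_000)); // 40000
    }
}
```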

Note that when variables appear on either side of the concatenation operator, a new StringBuilder is needed to concatenate them. If the operands are declared final and initialized with string constants, however, the result comes from the constant pool. In other words, if both sides of the concatenation operator are string constants or references to constants, compile-time optimization still applies. A variable modified by final becomes a constant; a class modified by final cannot be inherited, and a method modified by final cannot be overridden.

  • Where final can be used in development, it is recommended to use it.

public class StringTest04 {
    public static void test4() {
        final String s1 = "a";
        final String s2 = "b";
        String s3 = "ab";
        String s4 = s1 + s2;
        System.out.println(s3 == s4);  //true
    }

    public static void main(String[] args) {
        test4();
    }
}

Output:

true

Why true? The bytecode (for example, as printed by javap -c) shows that references to string constants are already concatenated at compile time. This is still a compiler optimization, so the comparison returns true.
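
One caveat is worth a sketch: final alone is not enough. The variable must also be initialized with a constant expression for the compiler to fold the concatenation. The class FinalConstantDemo and its two helpers are hypothetical names used for illustration.

```java
class FinalConstantDemo {
    // Both operands are final and initialized with literals, so a + b is
    // a constant expression and is folded to "ab" at compile time.
    static boolean foldedAtCompileTime() {
        final String a = "a";
        final String b = "b";
        return (a + b) == "ab";
    }

    // a is final, but new String("a") is not a constant expression,
    // so a + b is evaluated at run time and yields a new heap object.
    static boolean notFolded() {
        final String a = new String("a");
        final String b = "b";
        return (a + b) == "ab";
    }

    public static void main(String[] args) {
        System.out.println(foldedAtCompileTime()); // true
        System.out.println(notFolded());           // false
    }
}
```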

(3) Concatenation vs. append performance comparison

public class StringTest05 {

    public static void method1(int highLevel) {
        long startTime = System.currentTimeMillis();
        String src = "";
        for (int i = 0; i < highLevel; i++) {
            src += "a"; // Each loop creates a StringBuilder object
        }
        System.out.println("Splicing operation takes time:" + (System.currentTimeMillis() - startTime));
    }

    public static void method2(int highLevel) {
        long startTime = System.currentTimeMillis();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < highLevel; i++) {
            sb.append("a");
        }
        System.out.println("append Splicing operation takes time:" + (System.currentTimeMillis() - startTime));
    }

    public static void main(String[] args) {
        method1(100000);
        System.out.println("============");
        method2(100000);
    }
}

Output:

Splicing operation takes time:7656
============
append Splicing operation takes time:8

In this run, method1 took 7656 ms and method2 took 8 ms.

Conclusion:

  • Appending strings with StringBuilder's append() method is far more efficient than concatenating String values with +.

Reasons:

  • With StringBuilder's append(), only one StringBuilder object is created from beginning to end;
  • With String concatenation, a new StringBuilder object is created on every iteration, plus a new String object each time toString() is called;
  • Because many more StringBuilder and String objects are created, memory usage is higher, and any garbage collection that is triggered costs additional time;

Room for improvement:

  • The code above uses the no-argument constructor of StringBuilder, whose default capacity is 16. When the capacity is exceeded, a larger array is allocated and the existing content is copied into it. We can pass a larger initial capacity to reduce the number of expansions;
  • So in actual development, if we know that the total length of the appended strings will not exceed some limit, it is recommended to pass that limit to the constructor as the initial capacity;
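
A minimal sketch of the presizing advice, assuming the final length is known in advance. The class CapacityDemo and the helper capacityAfterFill are hypothetical names.

```java
class CapacityDemo {
    // Creates a builder sized for n characters, fills it completely,
    // and reports its capacity afterwards.
    static int capacityAfterFill(int n) {
        StringBuilder sb = new StringBuilder(n); // presized: no expansion needed
        for (int i = 0; i < n; i++) {
            sb.append('a');
        }
        return sb.capacity();
    }

    public static void main(String[] args) {
        System.out.println(new StringBuilder().capacity()); // 16, the default
        System.out.println(capacityAfterFill(100_000));     // 100000, unchanged by the appends
    }
}
```

Because the capacity never has to grow, no intermediate arrays are allocated and no content is copied during the loop.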

5, Use of intern()

intern() is a native method; it is implemented in the JVM's underlying native (C/C++) code.

When intern() is called, if the constant pool already contains a string equal to this String object as determined by equals(Object), the string from the pool is returned. Otherwise, this String object is added to the pool and a reference to it is returned.

For String objects that are not declared in double quotes, you can use the intern() method provided by String: it looks up the current string in the string constant pool and, if it is not there, puts it into the pool. For example:

String myInfo = new String("abc").intern();

That is, if you call String.intern() on any string, the instance that the returned reference points to must be exactly the same as the string instance that appears directly as a literal. Therefore, the value of the following expression must be true:

("a" + "b" + "c").intern() == "abc"  // true

Generally speaking, interning strings ensures that only one copy of each string exists in memory, which saves memory and speeds up string operations. Note that this value is stored in the string intern pool.

Conclusion: when a large number of existing strings are used in the program, especially when there are many repeated strings, the use of intern() method can save memory space.
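
The memory effect can be sketched by counting distinct String instances with and without intern(). The class InternDedupDemo, its helpers, and the "city-" sample values are hypothetical and serve only as an illustration.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class InternDedupDemo {
    // Builds n strings at run time that take only 10 different values.
    static String[] build(int n, boolean intern) {
        String[] out = new String[n];
        for (int i = 0; i < n; i++) {
            String s = "city-" + (i % 10); // a new String object on every iteration
            out[i] = intern ? s.intern() : s;
        }
        return out;
    }

    // Counts distinct objects by identity (==), not by equals().
    static int distinctInstances(String[] values) {
        Set<String> seen = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        for (String value : values) {
            seen.add(value);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctInstances(build(1000, false))); // 1000
        System.out.println(distinctInstances(build(1000, true)));  // 10
    }
}
```

With intern(), the array keeps references to only 10 objects, so the other 990 become eligible for garbage collection.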

6, Common interview questions

(1) How many objects does new String("ab") create?

public class StringTest06 {
    public static void main(String[] args) {
        String str = new String("ab");
    }
}

Let's look at the bytecode of the above program:

 0 new #2 <java/lang/String>  // first object: created in the heap by the new keyword
 3 dup
 4 ldc #3 <ab>  // second object: the "ab" object in the string constant pool
 6 invokespecial #4 <java/lang/String.<init>>
 9 astore_1
10 return

According to the bytecode, two objects are created:

  • One object is created in the heap by the new keyword;
  • The other is the "ab" object in the string constant pool;
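
That the two objects are distinct can be checked with a short sketch. The class NewStringDemo and the helper fresh are hypothetical names.

```java
class NewStringDemo {
    // Each call creates a new heap object that copies the pooled "ab".
    static String fresh() {
        return new String("ab");
    }

    public static void main(String[] args) {
        String s = fresh();
        System.out.println(s == "ab");          // false: heap object vs. pooled object
        System.out.println(s.equals("ab"));     // true: same content
        System.out.println(s.intern() == "ab"); // true: intern() returns the pooled object
    }
}
```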

(2) How many objects does new String("a") + new String("b") create?

public class StringTest07 {
    public static void main(String[] args) {
        String str = new String("a") + new String("b");
    }
}

Let's look at the bytecode of the above program:

 0 new #2 <java/lang/StringBuilder>  // first object: new StringBuilder()
 3 dup
 4 invokespecial #3 <java/lang/StringBuilder.<init>>
 7 new #4 <java/lang/String>  // second object: new String("a")
10 dup
11 ldc #5 <a>  // third object: "a" in the constant pool
13 invokespecial #6 <java/lang/String.<init>>
16 invokevirtual #7 <java/lang/StringBuilder.append>
19 new #4 <java/lang/String>  // fourth object: new String("b")
22 dup
23 ldc #8 <b>  // fifth object: "b" in the constant pool
25 invokespecial #6 <java/lang/String.<init>>
28 invokevirtual #7 <java/lang/StringBuilder.append>
31 invokevirtual #9 <java/lang/StringBuilder.toString>
34 astore_1
35 return

According to the bytecode, a total of 5 objects are created:

  • Object 1: new StringBuilder()
  • Object 2: new String("a")
  • Object 3: constant pool "a"
  • Object 4: new String("b")
  • Object 5: constant pool "b"

To be more precise, however, there are six objects, because StringBuilder.toString() also creates an object through new String().

// java.lang.StringBuilder#toString (as implemented in JDK 8)
public String toString() {
    return new String(value, 0, count);
}

A total of 6 objects were created:

  • Object 1: new StringBuilder()
  • Object 2: new String("a")
  • Object 3: "a" in the constant pool
  • Object 4: new String("b")
  • Object 5: "b" in the constant pool
  • Object 6: toString() creates a new String("ab")
    • Calling toString() does not create an "ab" constant in the constant pool.

(3) Extended interview questions

Let's look at the following program. Its output differs slightly between JDK versions.

First, let's look at JDK6:

String s = new String("1");  // the literal "1" is already in the constant pool
s.intern(); // tries to put the string into the constant pool; no effect here because "1" already exists
String s2 = "1";
System.out.println(s == s2); //jdk6: false

String s3 = new String("1") + new String("1");
s3.intern();
String s4 = "11";
System.out.println(s3 == s4); //jdk6: false

Explanation:

a. s refers to the new String("1") object in the heap, while s2 refers to "1" in the constant pool. The two addresses are obviously different, so s == s2 is false.

b. s3 refers to a String object with the value "11" in the heap. After new String("1") + new String("1") executes, the string constant pool does not contain "11". The call s3.intern() then stores an "11" constant in the constant pool, and s4 refers to that "11", which was put into the pool by the preceding intern() call. The address of s3 obviously differs from that of s4, so s3 == s4 is false.

Then let's look at the running results in JDK7/8:

String s = new String("1");  // the literal "1" is already in the constant pool
s.intern(); // tries to put the string into the constant pool; no effect here because "1" already exists
String s2 = "1";
System.out.println(s == s2); //jdk6: false     jdk7/8: false

String s3 = new String("1") + new String("1");
// In JDK 6, intern() creates a new "11" constant in the pool, with a new address.
// In JDK 7/8, intern() finds no "11" in the pool; because the pool now lives in the heap,
// it records a reference to the "11" object created on the previous line instead of copying it.
s3.intern();  
String s4 = "11";
System.out.println(s3 == s4); //jdk6: false     jdk7/8: true

Explanation:

a. As in JDK 6, s refers to the new String("1") object in the heap, while s2 refers to "1" in the constant pool. The two addresses are obviously different, so s == s2 is false.

b. s3 refers to a String object with the value "11" in the heap. After new String("1") + new String("1") executes, the string constant pool does not contain "11". Then s3.intern() is called. Note that from JDK 7/8 on, the string constant pool lives in the heap. intern() finds no "11" in the pool, but an "11" object was just created in the heap by the previous line, new String("1") + new String("1"). The pool therefore records the address of that heap object instead of creating a copy, and s4 refers to the "11" entry in the pool. The address of s3 is therefore the same as that of s4, so s3 == s4 is true.

After understanding the above example, let's take a look at the following example:

public class StringTest09 {
    public static void main(String[] args) {
        String s3 = new String("1") + new String("1");
        String s4 = "11";  // String generated in constant pool
        s3.intern();  // finds "11" already in the constant pool, so nothing is added; s3 is unchanged
        System.out.println(s3 == s4);  //false
    }
}

Explanation:

a. After String s3 = new String("1") + new String("1"); executes, the constant pool does not contain the "11" string constant. String s4 = "11" then creates "11" in the constant pool. When s3.intern() runs, it finds "11" in the constant pool and does nothing further. So s3 still refers to the new object in the heap, while s4 refers to "11" in the constant pool. The two addresses differ, so s3 == s4 is false.

To summarize the behavior of String.intern():

In JDK 1.6, intern() tries to put the string object into the constant pool:

  • If the pool already contains an equal string, nothing is added, and the address of the existing object in the pool is returned;
  • If not, the object is copied, the copy is put into the pool, and the address of the copy in the pool is returned;

From JDK 1.7 on, intern() tries to put the string object into the constant pool:

  • If the pool already contains an equal string, nothing is added, and the address of the existing object in the pool is returned;
  • If not, a reference to the object is stored in the pool, and that reference is returned;
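
The JDK 1.7+ rule can be checked with a small sketch, assuming it runs on JDK 7 or later and that the generated text has not been interned before. The class InternReferenceDemo, its helpers, and the "intern-demo-" prefix are hypothetical names chosen so that the full text never appears as a literal.

```java
class InternReferenceDemo {
    // Builds a string at run time and checks whether intern() returns
    // that same heap object (expected on JDK 7+ when the pool has no equal entry yet).
    static boolean internKeepsHeapObject(String suffix) {
        String built = new StringBuilder("intern-demo-").append(suffix).toString();
        return built.intern() == built;
    }

    // Here the literal "11" reaches the pool first, so intern() returns
    // the pooled object, not the one built at run time.
    static boolean internAfterLiteral() {
        String built = new StringBuilder("1").append("1").toString();
        String literal = "11";
        return built.intern() == built;
    }

    public static void main(String[] args) {
        System.out.println(internKeepsHeapObject("main-run")); // true on JDK 7+
        System.out.println(internAfterLiteral());              // false
    }
}
```

Calling internKeepsHeapObject twice with the same suffix returns false the second time, because the first call has already placed an equal string in the pool.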

Added by billcoker on Fri, 14 Jan 2022 23:41:34 +0200