Eh, Java is surprisingly particular about splitting strings

When it comes to splitting strings in Java, nine times out of ten you will scoff, "What's so hard about that? Just call the split() method of the String class!" If that is what you think, pay attention: things are far from that simple.

Come on, pull up a small stool and sit down.

Suppose we have the string "rain today,snow today" and need to split it on the comma ",": the first part is "rain today", before the comma, and the second part is "snow today", after the comma (stating the obvious, I know). In addition, before splitting, check whether the string contains a comma; if it does not, an exception should be thrown.

public class Test {
    public static void main(String[] args) {
        // The sample string must actually contain the separator
        String cmower = "rain today,snow today";

        if (cmower.contains(",")) {
            String[] parts = cmower.split(",");
            System.out.println("Part I:" + parts[0] + " Part II:" + parts[1]);
        } else {
            throw new IllegalArgumentException("The current string does not contain a comma");
        }
    }
}

This code looks rigorous, doesn't it? The program's output is exactly what we expect:

Part I:rain today Part II:snow today

This works because the string is known in advance and, most importantly, the delimiter is known in advance. Otherwise, trouble follows.

There are about 12 special symbols in English punctuation. If you replace the separator (the comma) in the code above with one of these symbols, in both contains() and split(), the program behaves as follows when it runs.

Backslash \ (PatternSyntaxException)

Caret ^ (ArrayIndexOutOfBoundsException)

Dollar sign $ (ditto)

Dot . (ditto)

Vertical bar | (no exception, but the string is split between every character)

Question mark ? (PatternSyntaxException)

Asterisk * (ditto)

Plus sign + (ditto)

Left or right parenthesis ( ) (ditto)

Left square bracket [ (ditto; a lone right bracket ] is treated as a literal)

Left brace { (ditto; a lone right brace } is treated as a literal)
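To see a few of these outcomes for yourself, here is a minimal sketch. The class name SplitMetaCharDemo, the helper trySplit() and the sample strings are made up for illustration; the exact behavior assumes Java 8 or later.

```java
class SplitMetaCharDemo {

    // Mimics the snippet above: split, then read parts[0] and parts[1].
    // Returns the two parts on success, or the simple name of the
    // exception that was thrown.
    static String trySplit(String text, String separator) {
        try {
            String[] parts = text.split(separator);
            return "Part I:" + parts[0] + " Part II:" + parts[1];
        } catch (RuntimeException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Hypothetical sample separators; each one is placed between "a" and "b".
        String[] separators = {",", ".", "$", "?", "|"};
        for (String sep : separators) {
            System.out.println(sep + " -> " + trySplit("a" + sep + "b", sep));
        }
    }
}
```

The comma splits cleanly, the dot and the dollar sign end in an ArrayIndexOutOfBoundsException, the question mark is rejected with a PatternSyntaxException, and the vertical bar throws nothing but returns "a" and "|" as the first two parts.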

Seeing this, some readers may say, "Isn't this just nitpicking?" No, not at all.

What should you do when you run into special symbols? Use regular expressions.

A regular expression is a special piece of text made up of letters and symbols. It is used to find the parts of a text that match a given format.

Then a friend may say, "I can't remember so many regular expressions!" Don't worry, I've worked out a plan for you.

The link below is an online guide to regular expressions on GitHub, and it is very detailed. When you need a regular expression, pull out this guide and you're set. It doesn't matter if you can't memorize them all; look them up and apply them as needed.

https://github.com/cdoco/learn-regax-zh

Besides this guide, there is one more:

https://github.com/cdoco/common-regax

The author has collected regular expressions that come up often in project development, ready to use as they are. Wonderful.

With that worry out of the way, let's use the English period "." as the separator:

String cmower = "It's raining today.it's snowing today";

if (cmower.contains(".")) {

    String [] parts = cmower.split("\\.");

    System.out.println("Part I:" + parts[0] +" Part II:" + parts[1]);

}

When calling split(), you need the regular expression \\. in place of the bare special character ".". Why two backslashes? Because the backslash is itself a special character in a Java string literal, so it has to be escaped first.

You can also put the period inside a character class, [.]. A character class is a regular expression that matches any one of the characters inside the square brackets.

cmower.split("[.]");

In addition, you can use the quote() method of the Pattern class to wrap the period "."; this method returns the string wrapped between \Q and \E.

With that, String.split() can be called as follows:

String [] parts = cmower.split(Pattern.quote("."));
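The three ways of treating the period literally can be put side by side. This is only a small sketch; the class name LiteralDotSplitDemo and its helper methods are invented for the comparison.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralDotSplitDemo {

    // Escape the period with a backslash (doubled inside a Java string literal).
    static String[] splitWithBackslash(String s) {
        return s.split("\\.");
    }

    // Put the period inside a character class, where it loses its special meaning.
    static String[] splitWithCharClass(String s) {
        return s.split("[.]");
    }

    // Let Pattern.quote() wrap the period in \Q and \E.
    static String[] splitWithQuote(String s) {
        return s.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String cmower = "It's raining today.it's snowing today";
        System.out.println(Arrays.toString(splitWithBackslash(cmower)));
        System.out.println(Arrays.toString(splitWithCharClass(cmower)));
        System.out.println(Arrays.toString(splitWithQuote(cmower)));
        // Shows what quote() actually produces: \Q.\E
        System.out.println(Pattern.quote("."));
    }
}
```

All three calls return the same two parts. Pattern.quote() is probably the most convenient choice when the separator is not known until runtime, because it needs no knowledge of which characters are special.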

If you step into the source code of String.split() in debug mode, you will find the following detail:

return Pattern.compile(regex).split(this, limit);

The split() method of the String class calls the split() method of the Pattern class. This gives us another option: we can split with Pattern directly instead of going through String.split().

import java.util.regex.Pattern;

public class TestPatternSplit {

    /**
     * Compile the pattern once up front to improve efficiency
     */
    private static final Pattern twopart = Pattern.compile("\\.");

    public static void main(String[] args) {
        String[] parts = twopart.split("It's raining today.it's snowing today");
        System.out.println("Part I:" + parts[0] + " Part II:" + parts[1]);
    }
}

In addition, you can use the Pattern and Matcher classes together to split strings. The advantage is that you can impose strict requirements on the string being split. Here is an example:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestPatternMatch {

    /**
     * Compile the pattern once up front to improve efficiency
     */
    private static final Pattern twopart = Pattern.compile("(.+)\\.(.+)");

    public static void main(String[] args) {
        checkString("It's raining today.it's snowing today");
        checkString("It's raining today.");
        checkString(".it's snowing today");
    }

    private static void checkString(String str) {
        Matcher m = twopart.matcher(str);

        // matches() succeeds only if the whole string fits the pattern
        if (m.matches()) {
            System.out.println("Part I:" + m.group(1) + " Part II:" + m.group(2));
        } else {
            System.out.println("Mismatch");
        }
    }
}

Here the regular expression is (.+)\\.(.+), which divides the string around the period into two capture groups; that is the role of the parentheses () (see the regular expression guide linked earlier). Because .+ requires at least one character, a string with nothing before or nothing after the period does not match.

Because the pattern is fixed, you can declare the Pattern as a static field outside the main() method, so that it is compiled only once and the program runs more efficiently.

Let's take a look at the output of the program:

Part I:It's raining today Part II:it's snowing today

Mismatch

Mismatch

However, using Matcher on simple strings like these is rather heavyweight. The split() method of the String class is still the first choice, because it has some other neat features.

For example, if you want to keep the separator at the end of the first part, you can do this:

// The sample string must actually contain the separator
String cmower = "rain today,snow today";

if (cmower.contains(",")) {

    String [] parts = cmower.split("(?<=,)");

    System.out.println("Part I:" + parts[0] +" Part II:" + parts[1]);

}

The results of the program output are as follows:

Part I:rain today, Part II:snow today

You can see that the separator "," stays at the end of the first part. If you want it at the start of the second part instead, you can do this:

String [] parts = cmower.split("(?=,)");

 

Warm reminder: if you are unfamiliar with lookaround assertions such as (?<=,) and (?=,), check the regular expression guide linked earlier.
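The two assertions can be compared directly. The sketch below uses an invented class KeepDelimiterDemo and a short sample string; it only shows where the comma ends up in each case.

```java
import java.util.Arrays;

class KeepDelimiterDemo {

    // Lookbehind: split at the position right AFTER a comma,
    // so the comma stays at the end of the preceding part.
    static String[] keepInFirst(String s) {
        return s.split("(?<=,)");
    }

    // Lookahead: split at the position right BEFORE a comma,
    // so the comma stays at the start of the following part.
    static String[] keepInSecond(String s) {
        return s.split("(?=,)");
    }

    public static void main(String[] args) {
        String sample = "rain today,snow today"; // hypothetical sample input
        System.out.println(Arrays.toString(keepInFirst(sample)));
        System.out.println(Arrays.toString(keepInSecond(sample)));
    }
}
```

Both patterns match a position rather than a character, which is why nothing is removed from the string: the comma is merely assigned to one side of the cut or the other.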

In addition, if the string contains multiple separators and we only need two parts, we can do this:

String cmower = "It rained today, it snowed today, and it was sunny today";

if (cmower.contains(",")) {

    String [] parts = cmower.split(",", 2);

    System.out.println("Part I:" + parts[0] +" Part II:" + parts[1]);

}

The split() method accepts two parameters. The first is the separator (a regular expression) and the second is the limit, that is, the maximum number of parts to split into. If you view the source code of this method, you can see how the second parameter is applied.

The results of the program output are as follows:

Part I:It rained today Part II: it snowed today, and it was sunny today
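To get a feel for the second parameter, here is a small sketch that tries a few limit values on the same input. The class name SplitLimitDemo, the helper splitWithLimit() and the sample string are made up for illustration.

```java
import java.util.Arrays;

class SplitLimitDemo {

    // Thin wrapper so the effect of each limit value is easy to compare.
    static String[] splitWithLimit(String text, int limit) {
        return text.split(",", limit);
    }

    public static void main(String[] args) {
        String text = "a,b,c,,"; // hypothetical input with two trailing separators

        // Positive limit: at most that many parts; the last part keeps the rest.
        System.out.println(Arrays.toString(splitWithLimit(text, 2)));  // [a, b,c,,]

        // Zero (the same as split(",")): no cap, trailing empty strings dropped.
        System.out.println(Arrays.toString(splitWithLimit(text, 0)));  // [a, b, c]

        // Negative limit: no cap, trailing empty strings kept.
        System.out.println(Arrays.toString(splitWithLimit(text, -1))); // [a, b, c, , ]
    }
}
```

A limit of 2 is what the example above relies on: the split stops after the first separator and everything else stays in the second part, separators included.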

Well, dear readers, that's all for this article. Doesn't splitting a string suddenly seem like a rather delicate business?



Added by Goofan on Thu, 30 Dec 2021 05:15:04 +0200