Go 1.18 new feature: strings.Title is deprecated, trading one pitfall for another!

Hello, I'm Fried Fish.

While reading the Go 1.18 release notes, I found that the Title function in the strings and bytes standard libraries has been marked as Deprecated. Why?

In today's article, Fried Fish will walk through it with everyone.

Introduction

Taking the strings standard library as an example, strings.Title maps the Unicode letters that begin each word to their Unicode title case.

Examples are as follows:

package main

import (
    "fmt"
    "strings"
)

func main() {
    fmt.Println(strings.Title("her royal highness"))
    fmt.Println(strings.Title("eddy cjy"))
    fmt.Println(strings.Title("хлеб"))
}

Output result:

Her Royal Highness
Eddy Cjy
Хлеб

The first letter of each word is converted to its title case.
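Title case is not always the same as upper case. The following is a minimal sketch using only the standard unicode package. The helper caseForms is hypothetical, and the digraph 'ǆ' (U+01C6) is chosen purely for illustration; it does not come from the release notes.

```go
package main

import (
	"fmt"
	"unicode"
)

// caseForms returns the upper-case and title-case mappings of r.
// Hypothetical helper, used only for this illustration.
func caseForms(r rune) (upper, title rune) {
	return unicode.ToUpper(r), unicode.ToTitle(r)
}

func main() {
	// For 'h' and 'х' the two forms coincide; for 'ǆ' they differ.
	for _, r := range []rune{'h', 'х', 'ǆ'} {
		upper, title := caseForms(r)
		fmt.Printf("%c: upper=%c (U+%04X) title=%c (U+%04X)\n",
			r, upper, upper, title, title)
	}
}
```

For most letters the two mappings give the same rune, which is why the output above looks like plain capitalization. For 'ǆ', the title form is 'ǅ' (U+01C5) while the upper form is 'Ǆ' (U+01C4).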

Problem

Everything looks fine, but in fact the function currently has two obvious defects.

Namely:

  • Unicode punctuation cannot be handled correctly.
  • The capitalization rules of specific human languages are not considered.

Next, let's look at each of them in detail.

Unicode punctuation

The first problem looks like this:

package main

import (
    "fmt"
    "strings"
)

func main() {
    a := strings.Title("go.go\u2024go")
    b := "Go.Go\u2024Go"
    if a != b {
        fmt.Printf("%s != %s\n", a, b)
    }
}

Output result:

Go.Go․go != Go.Go․Go

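The conversion result in variable a is "Go.Go․go", but it should be "Go.Go․Go". The character U+2024 (ONE DOT LEADER) is Unicode punctuation, yet strings.Title does not treat it as a word boundary, so the "go" that follows it is not capitalized.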
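The cause is the word-boundary rule. The sketch below approximates that rule from the observed behaviour. The name isSeparatorSketch is hypothetical, and the real unexported implementation in the standard library may differ in detail. Under this rule, ASCII punctuation such as '.' ends a word, but outside ASCII only white space does.

```go
package main

import (
	"fmt"
	"unicode"
)

// isSeparatorSketch approximates the word-boundary rule behind strings.Title.
// Assumption: this mirrors observed behaviour, not necessarily the exact
// unexported code in the standard library.
func isSeparatorSketch(r rune) bool {
	if r <= 0x7F {
		// In ASCII, everything except letters, digits and '_' ends a word.
		switch {
		case '0' <= r && r <= '9', 'a' <= r && r <= 'z', 'A' <= r && r <= 'Z', r == '_':
			return false
		}
		return true
	}
	// Outside ASCII, letters and digits never end a word.
	if unicode.IsLetter(r) || unicode.IsDigit(r) {
		return false
	}
	// Of everything else, only white space ends a word.
	return unicode.IsSpace(r)
}

func main() {
	// Compare what Unicode calls punctuation with what the rule treats as a boundary.
	for _, r := range []rune{'.', ' ', '\u2024'} {
		fmt.Printf("%U punct=%t separator=%t\n", r, unicode.IsPunct(r), isSeparatorSketch(r))
	}
}
```

U+2024 is reported as punctuation by unicode.IsPunct but is not a separator under this rule, which matches the "Go.Go․go" result.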

Language-specific rules

The code for the second problem is as follows:

func main() {
    fmt.Println(strings.Title("ijsland"))
}

Output result:

Ijsland

In Dutch, "ijsland" should be capitalized as "IJsland", but the result is "Ijsland".

Solution

This problem was reported back in 2013, in the issue "strings: Title function incorrectly handles word breaks". Rob Pike, one of the creators of the Go language, marked it as Unplanned.

Because of the Go 1 compatibility promise, the problem is "impossible" to fix. Any fix would change the function's output, which is a breaking change.

There is another option, though: the "deprecation" this article is about. The function is now annotated as follows:

// Title returns a copy of the string s with all Unicode letters that begin words
// mapped to their Unicode title case.
//
// BUG(rsc): The rule Title uses for word boundaries does not handle Unicode punctuation properly.
//
// Deprecated: Use golang.org/x/text/cases instead.
func Title(s string) string {

Once a function is marked "Deprecated", the corresponding Go documentation collapses the entry and clearly shows it as deprecated. The recommendation is to use the golang.org/x/text/cases package directly.

An example using the new x/text/cases package is as follows:

package main

import (
    "fmt"

    "golang.org/x/text/cases"
    "golang.org/x/text/language"
)

func main() {
    src := []string{
        "hello world!",
        "i with dot",
        "'n ijsberg",
        "here comes O'Brian",
    }
    for _, c := range []cases.Caser{
        cases.Lower(language.Und),
        cases.Upper(language.Turkish),
        cases.Title(language.Dutch),
        cases.Title(language.Und, cases.NoLower),
    } {
        fmt.Println()
        for _, s := range src {
            fmt.Println(c.String(s))
        }
    }
}

Output result:

hello world!
i with dot
'n ijsberg
here comes o'brian

HELLO WORLD!
İ WİTH DOT
'N İJSBERG
HERE COMES O'BRİAN

Hello World!
I With Dot
'n IJsberg
Here Comes O'brian

Hello World!
I With Dot
'N Ijsberg
Here Comes O'Brian

The output shows conversions under several language settings. Our core focus is on the cases.Title calls, such as cases.Title(language.Dutch), which are used in the form:

  • cases.Title(<language>).Bytes(<bytes>)
  • cases.Title(<language>).String(<string>)

By specifying the language explicitly in code, the caller gets the punctuation and capitalization rules of that particular human language, instead of a one-size-fits-all rule.

Summary

On the surface, this is just a small problem with a single function. In essence, it reflects the limits of what was understood when the API was designed.

In addition, in practice strings.Title and bytes.Title are often misunderstood as functions that simply capitalize the first letter, which is not what they were designed for.
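To make the difference concrete, here is a minimal sketch of what many callers actually want: capitalizing only the first letter of the whole string. capitalizeFirst is a hypothetical helper, not part of the standard library, and like strings.Title it applies no language-specific rules.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
	"unicode/utf8"
)

// capitalizeFirst maps only the first rune of s to its title case and
// leaves the rest of the string untouched. Hypothetical helper.
func capitalizeFirst(s string) string {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError && size <= 1 {
		return s // empty string or invalid leading byte: return unchanged
	}
	return string(unicode.ToTitle(r)) + s[size:]
}

func main() {
	s := "her royal highness"
	fmt.Println(capitalizeFirst(s)) // first letter of the string only
	fmt.Println(strings.Title(s))   // deprecated: first letter of every word
}
```

A helper like this is enough when the input is a single known word or sentence. When correct word boundaries or language rules matter, golang.org/x/text/cases remains the recommended route.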

Although in the end this misunderstanding has done more good than the defects have done harm, the function still has serious problems in certain special scenarios and in its language support.

It's a blessing in disguise.

If you have any questions, feel free to leave feedback and discuss in the comment section. The best relationships are the ones where both sides help each other succeed. Your likes are the biggest motivation for Fried Fish to keep writing. Thank you for your support.

Articles are updated continuously. You can search for [fried fish in your brain] on WeChat. This article is included in the GitHub repository github.com/eddycjy/blog, where you can also find a Go learning map and roadmap. Stars are welcome.

Added by Warmach on Wed, 16 Feb 2022 06:42:58 +0200