[R language string processing] stringr package

Main content of stringr package:

1. String splitting: str_split
2. String replacement: str_replace and str_replace_all
3. String extraction: str_extract, str_extract_all and str_match_all
4. Substring fetching: str_sub

The four most common operations in string processing are "disassemble, replace, extract and fetch". The stringr package is highly recommended for all four. I find it much easier to use than base R functions such as grep, regexpr, strsplit and sub.

Sharp tool 1: disassemble: str_split

str_split(string, pattern, n = Inf, simplify = FALSE)
string: The string vector to process
pattern: The separator to split on; may be a regular expression
n: Maximum number of pieces to return. The default (Inf) splits at every match.
simplify: If TRUE, return a character matrix; the default (FALSE) returns a list
> library(stringr)
> str_split(c('lsxxx2011@163.com','0511-87208801'), '[@-]')
[[1]]
[1] "lsxxx2011" "163.com"  

[[2]]
[1] "0511"     "87208801"
# Example: a data table has a column of email addresses. How do we split the address and domain name into two new columns?
email <- c('lsxxx2011@163.com','1029776077@qq.com','qazwsx@gmail.com','abc123edc@126.com')
# Combine with sapply to pick the pieces before and after the @ separator
add <- sapply(str_split(email,'@'),'[',1)
domain <- sapply(str_split(email,'@'),'[',2)
df <- data.frame(email, add, domain)
> df
              email        add    domain
1 lsxxx2011@163.com  lsxxx2011   163.com
2 1029776077@qq.com 1029776077    qq.com
3  qazwsx@gmail.com     qazwsx gmail.com
4 abc123edc@126.com  abc123edc   126.com
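
The n and simplify arguments described above are not used in the example, so here is a minimal sketch of both. The variable names parts and limited are only illustrative. With simplify = TRUE the split is done once and the columns are read straight from the resulting matrix, which avoids the two sapply calls.

```r
library(stringr)

email <- c('lsxxx2011@163.com', 'qazwsx@gmail.com')

# simplify = TRUE returns a character matrix:
# one row per input string, one column per piece
parts  <- str_split(email, '@', simplify = TRUE)
add    <- parts[, 1]
domain <- parts[, 2]

# n caps the number of pieces; the last piece keeps the unsplit remainder
limited <- str_split('2017-04-14', '-', n = 2)
limited[[1]]   # "2017" "04-14"
```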

Sharp tool 2: replace: str_replace and str_replace_all

str_replace(string, pattern, replacement)
str_replace_all(string, pattern, replacement)
string: The string vector to process
pattern: The substring to replace; may be a regular expression
replacement: The string to substitute for each match

The two functions differ in that str_replace replaces only the first match in each string, whereas str_replace_all replaces every match.

# Convert strings containing thousands separators or percent signs to numeric data
commadata <- c('123,456','780,123,433','45,234')
percentdata <- c('23.4%','34.56%','44.12%')
commadatanew <- as.numeric(str_replace_all(commadata, ',', ''))
percentdatanew <- as.numeric(str_replace_all(percentdata, '%', ''))/100
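
To make the first-match versus all-matches difference concrete, here is a small sketch on one of the values above. The names x, first_only and every are only illustrative.

```r
library(stringr)

x <- '780,123,433'

# str_replace removes only the first comma
first_only <- str_replace(x, ',', '')      # "780123,433"

# str_replace_all removes every comma
every <- str_replace_all(x, ',', '')       # "780123433"
```

This is why the conversion above uses str_replace_all: a value with two separators would still contain a comma after str_replace, and as.numeric would turn it into NA.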

Sharp tool 3: extract: str_extract, str_extract_all and str_match_all

str_extract(string, pattern)
str_extract_all(string, pattern, simplify = FALSE)
string: The string vector to process
pattern: The pattern to match, usually a regular expression
simplify: If TRUE, return a character matrix; the default (FALSE) returns a list
The two functions differ in that str_extract returns only the first match in each string, whereas str_extract_all returns every match. When nothing matches, str_extract returns NA and str_extract_all returns character(0).
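
A short sketch of both behaviours, including the no-match case. The input vector x is made up for illustration.

```r
library(stringr)

x <- c('a1b22c333', 'no digits')

# One result per input string: the first run of digits, or NA if none
first <- str_extract(x, '[0-9]+')          # "1" NA

# A list with one character vector per input string
all_matches <- str_extract_all(x, '[0-9]+')
all_matches[[1]]                           # "1" "22" "333"
all_matches[[2]]                           # character(0)
```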

str_match(string, pattern)
str_match_all(string, pattern)
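The parameters have the same meaning as in str_extract. str_match returns a character matrix whose first column is the full match and whose remaining columns are the capture groups; str_match_all returns a list of such matrices, one per input string.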

# Extract the date and traffic (pv) values from each string
s <- c('date:2017-04-14,pv:223453','date:2017-04-15,pv:228115','date:2017-04-16,pv:201233','date:2017-04-17,pv:324123')

date <- str_extract_all(s, '[0-9]{4}-[0-9]{2}-[0-9]{2}')
pv <- str_extract_all(s, 'pv:([0-9]*)')
# The pv result still contains the 'pv:' prefix, so use str_match_all instead, which also returns the capture group.
pv <- str_match_all(s, 'pv:([0-9]*)')
# Each list element is a matrix: element 1 is the full match, element 2 the captured digits
pv <- sapply(pv,'[',2)
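
Since each string here holds exactly one date and one pv value, a possible alternative is a single str_match call with two capture groups. This is a sketch rather than the approach used above; the combined pattern and the name m are my own choices.

```r
library(stringr)

s <- c('date:2017-04-14,pv:223453', 'date:2017-04-15,pv:228115')

# Column 1 is the full match, columns 2 and 3 are the capture groups
m <- str_match(s, 'date:([0-9-]+),pv:([0-9]+)')

date <- as.Date(m[, 2])     # convert the captured text to Date
pv   <- as.numeric(m[, 3])  # convert the captured digits to numbers
```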

Sharp tool 4: fetch: str_sub

str_sub(string, start = 1L, end = -1L)
string: The string vector to process
start: The position of the first character of the substring
end: The position of the last character of the substring
Note: a negative start or end counts backward from the end of the string, so -1 is the last character

Case study
# Get the last 4 digits of each mobile number (negative start)
s <- c('13611235678','13912343344','17888886666')
(tail4 <- str_sub(s, -4))
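
The case above only uses a negative start. Here is a sketch that combines start with a negative end, plus the assignment form of str_sub for masking the middle digits. The masking step is my own addition for illustration.

```r
library(stringr)

s <- c('13611235678', '13912343344')

# Positive start and end: the first three digits
prefix3 <- str_sub(s, 1, 3)      # "136" "139"

# Negative end: from position 4 up to the fifth character from the end
middle <- str_sub(s, 4, -5)      # "1123" "1234"

# Assignment form: overwrite positions 4 to 7 in a copy
masked <- s
str_sub(masked, 4, 7) <- '****'  # "136****5678" "139****3344"
```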

Added by ud2008 on Sat, 26 Oct 2019 18:05:13 +0300