How to trim leading and trailing whitespace?

I've had some trouble with leading and trailing whitespace in a data.frame. For example, I want to view specific rows of a data.frame that meet certain conditions:

> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)] 

[1] codeHelper     country        dummyLI    dummyLMI       dummyUMI       
[6] dummyHInonOECD dummyHIOECD    dummyOECD      
<0 rows> (or 0-length row.names)

I wondered why I didn't get the expected output, since Austria obviously exists in my data.frame. After going through my code history to figure out what went wrong, I tried:

> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
   codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18        AUT Austria        0        0        0              0           1
   dummyOECD
18         1

The only change I made was adding a space after Austria.

Obviously this leads to even more annoying problems, for example when I want to merge two data frames on the country column: one data.frame uses "Austria " and the other uses "Austria", so the rows don't match.

  1. Is there a good way to "show" whitespace on screen so that I notice the problem?
  2. Can I remove leading and trailing whitespace in R?

So far I've written a simple Perl script that strips the whitespace, but it would be great if I could do it directly in R.

Answer #1

To trim whitespace, use str_trim() from the stringr package. The package manual (dated February 15, 2013) is available on CRAN. The function is vectorized, so it works on whole character vectors.

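install.packages("stringr", dependencies = TRUE)  # one-time install
library(stringr)
example(str_trim)                 # run the examples from the help page
d4$clean2 <- str_trim(d4$V2)      # d4: your data.frame; V2: the column to clean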

(Credit: R. Cotton)

Answer #2

A simple function to remove leading and trailing spaces:

trim <- function( x ) {
  gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}

Usage:

> text = "   foo bar  baz 3 "
> trim(text)
[1] "foo bar  baz 3"
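A quick check of the trim() function above on a small, made-up character vector (the values are illustrative, not from the original data). Because the pattern uses [[:space:]], tabs and newlines are stripped along with spaces, internal spaces are kept, and NA passes through unchanged:

```r
trim <- function(x) {
  # strip a whitespace run anchored at the start OR at the end of each string
  gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
}

# made-up values: trailing tab, leading/trailing newlines, an internal space, NA
x <- c("  Austria", "Austria \t", "\nAustria\n", "Bosnia and Herzegovina ", NA)
trimmed <- trim(x)
print(trimmed)
# [1] "Austria"                "Austria"                "Austria"
# [4] "Bosnia and Herzegovina" NA
```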

Answer #3

The best approach may be to handle the whitespace when reading the data file. If you use read.csv or read.table, you can set strip.white = TRUE (for read.table this only takes effect when sep is specified, and it applies to unquoted fields).
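A minimal sketch, assuming a small inline CSV (passed through the text argument instead of a file) with stray spaces around the country names. Without strip.white the padding survives; with it, the fields come back clean:

```r
# made-up CSV: a trailing space after Austria, a leading space before Germany
csv <- "code,country\nAUT,Austria \nDEU, Germany\n"

raw   <- read.csv(text = csv, stringsAsFactors = FALSE)
clean <- read.csv(text = csv, strip.white = TRUE, stringsAsFactors = FALSE)

print(raw$country)    # padding kept:    "Austria " " Germany"
print(clean$country)  # padding removed: "Austria"  "Germany"
```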

If you want to clean up the strings afterwards, you can use one of the following functions:

# returns string w/o leading whitespace
trim.leading <- function (x)  sub("^\\s+", "", x)

# returns string w/o trailing whitespace
trim.trailing <- function (x) sub("\\s+$", "", x)

# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
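A quick comparison of the three helpers on one made-up string. It also shows why trim() uses gsub rather than sub: sub replaces only the first match, so with the same pattern it would strip only the leading run.

```r
trim.leading  <- function (x) sub("^\\s+", "", x)
trim.trailing <- function (x) sub("\\s+$", "", x)
trim          <- function (x) gsub("^\\s+|\\s+$", "", x)

s <- "  Austria  "
print(trim.leading(s))                  # "Austria  "
print(trim.trailing(s))                 # "  Austria"
print(trim(s))                          # "Austria"

# sub() stops after the first match, so the trailing spaces survive
print(sub("^\\s+|\\s+$", "", s))        # "Austria  "
```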

To apply one of these functions to myDummy$country:

 myDummy$country <- trim(myDummy$country)
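A minimal sketch of the merge problem from the question, using two small hypothetical data frames (codes and oecd are made-up names). It trims every character column of the untidy frame rather than just the key, which is one possible design choice:

```r
trim <- function (x) gsub("^\\s+|\\s+$", "", x)

# hypothetical data: the country key in `codes` carries stray whitespace
codes <- data.frame(country = c("Austria ", " Belgium"),
                    codeHelper = c("AUT", "BEL"),
                    stringsAsFactors = FALSE)
oecd  <- data.frame(country = c("Austria", "Belgium"),
                    dummyOECD = c(1, 1),
                    stringsAsFactors = FALSE)

before <- merge(codes, oecd, by = "country")  # no rows: keys differ by whitespace

# trim every character column, not only the merge key
is_chr <- vapply(codes, is.character, logical(1))
codes[is_chr] <- lapply(codes[is_chr], trim)

after <- merge(codes, oecd, by = "country")   # both countries now match
print(nrow(before))
print(after)
```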

To make the whitespace visible, you can use:

 paste(myDummy$country)

This prints each value in double quotes ("), which makes leading and trailing spaces easier to spot.
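A small illustration with a made-up vector. Character vectors already print in quotes; wrapping values in brackets with sprintf() or comparing nchar() makes the padding even more explicit. If the column is a factor, print() shows it without quotes, which is presumably why paste() (which converts to character) helps here:

```r
x <- c("Austria ", "Austria", " Austria")

print(x)                       # quotes expose the padding
bracketed <- sprintf("[%s]", x)
print(bracketed)               # "[Austria ]" "[Austria]" "[ Austria]"
lens <- nchar(x)
print(lens)                    # 8 7 8

f <- factor(x)
print(f)                       # factor levels print without quotes
print(paste(f))                # paste() converts back to quoted character
```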

Answer #4

Use grep or grepl to find observations that end in whitespace, and sub to remove it.

names<-c("Ganga Din\t","Shyam Lal","Bulbul ")
grep("[[:space:]]+$",names)
[1] 1 3
grepl("[[:space:]]+$",names)
[1]  TRUE FALSE  TRUE
sub("[[:space:]]+$","",names)
[1] "Ganga Din" "Shyam Lal" "Bulbul"  
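The pattern above only catches trailing whitespace. A sketch that also flags leading whitespace, using made-up country names:

```r
countries <- c("Austria ", "Belgium", " Chile", "Denmark\t")

# TRUE where a value starts or ends with whitespace
has_ws <- grepl("^[[:space:]]+|[[:space:]]+$", countries)
print(which(has_ws))   # 1 3 4

# strip the trailing run, then the leading run
clean <- sub("^[[:space:]]+", "", sub("[[:space:]]+$", "", countries))
print(clean)           # "Austria" "Belgium" "Chile" "Denmark"
```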

Answer #5

Regarding question 1: to see the spaces, you can call print() on the data.frame (which dispatches to print.data.frame) with quote = TRUE:

print(head(iris), quote=TRUE)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
# 1        "5.1"       "3.5"        "1.4"       "0.2" "setosa"
# 2        "4.9"       "3.0"        "1.4"       "0.2" "setosa"
# 3        "4.7"       "3.2"        "1.3"       "0.2" "setosa"
# 4        "4.6"       "3.1"        "1.5"       "0.2" "setosa"
# 5        "5.0"       "3.6"        "1.4"       "0.2" "setosa"
# 6        "5.4"       "3.9"        "1.7"       "0.4" "setosa"

See also ?print.data.frame.

Added by PrivatePile on Mon, 02 Mar 2020 07:31:37 +0200