I had some trouble with leading and trailing whitespace in a data.frame. For example, I want to view specific rows of a data.frame that match a condition:
> myDummy[myDummy$country == c("Austria"),c(1,2,3:7,19)]
[1] codeHelper     country        dummyLI        dummyLMI       dummyUMI
[6] dummyHInonOECD dummyHIOECD    dummyOECD
<0 rows> (or 0-length row.names)
I wondered why I didn't get the expected output, because the country Austria obviously exists in my data.frame. After looking through my code history and trying to figure out what went wrong, I tried:
> myDummy[myDummy$country == c("Austria "),c(1,2,3:7,19)]
   codeHelper  country dummyLI dummyLMI dummyUMI dummyHInonOECD dummyHIOECD
18        AUT Austria        0        0        0              0           1
   dummyOECD
18         1
The only thing I changed in the command was adding a trailing space after Austria.
Obviously, this can cause more annoying problems, for example when I want to merge two data frames on the country column. If one data.frame uses "Austria " and the other uses "Austria", the match fails.
- Is there a good way to "show" the whitespace on screen so that I notice the problem?
- Can I remove leading and trailing whitespace in R?
So far, I've written a simple Perl script that removes the whitespace, but it would be great if I could do it in R somehow.
Answer #1
To trim whitespace, use str_trim() from the stringr package. The package's manual, dated February 15, 2013, is available on CRAN. The function also works on vectors of strings.
install.packages("stringr", dependencies=TRUE)
require(stringr)
example(str_trim)
d4$clean2<-str_trim(d4$V2)
(Credit: R. Cotton)
Answer #2
A simple function to remove leading and trailing whitespace:
trim <- function( x ) { gsub("(^[[:space:]]+|[[:space:]]+$)", "", x) }
Usage:
> text = " foo bar baz 3 "
> trim(text)
[1] "foo bar baz 3"
Answer #3
Probably the best way is to deal with the surrounding whitespace when you read the data file. If you use read.csv or read.table, you can set the parameter strip.white=TRUE.
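As a minimal sketch of the strip.white approach, the snippet below reads the same CSV text twice, once with the default settings and once with strip.white=TRUE. The CSV content and the variable names (csv_text, raw, clean) are made up for illustration. Note that strip.white only affects unquoted character fields.

```r
# Hypothetical CSV content with a trailing space after "Austria"
# and a leading space before "Belgium".
csv_text <- "code,country\nAUT,Austria \nBEL, Belgium\n"

# Default: surrounding whitespace in character fields is kept.
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# strip.white = TRUE removes leading and trailing whitespace
# from unquoted character fields while reading.
clean <- read.csv(text = csv_text, strip.white = TRUE,
                  stringsAsFactors = FALSE)

raw$country == "Austria"    # no match in the untrimmed data
clean$country == "Austria"  # matches the first row
```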
If you want to clean up the strings afterwards, you can use one of the following functions:
# returns string w/o leading whitespace
trim.leading <- function (x) sub("^\\s+", "", x)
# returns string w/o trailing whitespace
trim.trailing <- function (x) sub("\\s+$", "", x)
# returns string w/o leading or trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
To use one of these functions on myDummy$country:
myDummy$country <- trim(myDummy$country)
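As an alternative that is not mentioned in this answer, base R (version 3.2.0 and later) ships trimws(), which does the same job without a helper function. The sketch below uses a made-up vector and a made-up data frame df; the lapply line is one possible way to trim every character column at once.

```r
# Hypothetical example values with different kinds of whitespace.
country <- c("Austria ", " Belgium", "\tChile\n")

both  <- trimws(country)                   # trim both ends (default)
left  <- trimws(country, which = "left")   # leading whitespace only
right <- trimws(country, which = "right")  # trailing whitespace only

# Trim every character column of a (hypothetical) data frame,
# leaving all other columns untouched.
df <- data.frame(code = c("AUT ", "BEL"), value = c(1, 2),
                 stringsAsFactors = FALSE)
df[] <- lapply(df, function(col) if (is.character(col)) trimws(col) else col)
```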
To make the whitespace visible, you can use:
paste(myDummy$country)
This prints the strings surrounded by quotation marks ("), which makes the whitespace easier to spot.
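A few other base R tools can help to spot stray whitespace; the following is a sketch with made-up values, not part of the original answer. encodeString() wraps each value in quotes without padding, nchar() exposes unexpected lengths, and grepl() flags the affected entries.

```r
# Hypothetical values: trailing space, clean value, leading space.
country <- c("Austria ", "Austria", " Belgium")

# Wrap each value in double quotes so the whitespace is visible.
quoted <- encodeString(country, quote = '"')

# Character counts: "Austria " has 8 characters, "Austria" has 7.
lens <- nchar(country)

# TRUE where a value starts or ends with whitespace.
has_ws <- grepl("^\\s|\\s$", country)

data.frame(quoted, lens, has_ws)
```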
Answer #4
Use grep or grepl to find observations with spaces, and sub to get rid of them.
names<-c("Ganga Din\t","Shyam Lal","Bulbul ")
grep("[[:space:]]+$",names)
[1] 1 3
grepl("[[:space:]]+$",names)
[1]  TRUE FALSE  TRUE
sub("[[:space:]]+$","",names)
[1] "Ganga Din" "Shyam Lal" "Bulbul"
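To connect this to the merge problem from the question, here is a small sketch under assumed data: the data frames gdp and oecd and their columns are hypothetical. It finds the affected rows with grepl, cleans them with gsub, and compares the merge result before and after.

```r
# Two hypothetical data frames; one has a trailing space in "Austria ".
gdp  <- data.frame(country = c("Austria ", "Belgium"), gdp = c(1, 2),
                   stringsAsFactors = FALSE)
oecd <- data.frame(country = c("Austria", "Belgium"), oecd = c(1, 1),
                   stringsAsFactors = FALSE)

# Before cleaning, only Belgium matches.
before <- merge(gdp, oecd, by = "country")

# Locate values with leading or trailing whitespace.
bad <- grepl("^[[:space:]]+|[[:space:]]+$", gdp$country)
gdp$country[bad]

# Remove the whitespace at both ends and merge again.
gdp$country <- gsub("^[[:space:]]+|[[:space:]]+$", "", gdp$country)
after <- merge(gdp, oecd, by = "country")

nrow(before)
nrow(after)
```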
Answer #5
Ad 1) To view the whitespace, you can call print.data.frame with modified arguments:
print(head(iris), quote=TRUE)
#   Sepal.Length Sepal.Width Petal.Length Petal.Width  Species
# 1        "5.1"       "3.5"        "1.4"       "0.2" "setosa"
# 2        "4.9"       "3.0"        "1.4"       "0.2" "setosa"
# 3        "4.7"       "3.2"        "1.3"       "0.2" "setosa"
# 4        "4.6"       "3.1"        "1.5"       "0.2" "setosa"
# 5        "5.0"       "3.6"        "1.4"       "0.2" "setosa"
# 6        "5.4"       "3.9"        "1.7"       "0.4" "setosa"
See also ?print.data.frame.