Splitting Strings
In Python, strings are represented as str objects, which are immutable: this means that the object as represented in memory cannot be directly altered. These two facts can help you learn (and then remember) how to use .split().
Have you guessed how these two features of strings relate to splitting in Python? If you guessed that .split() is an instance method because strings are a special type, you are right! In some other languages, such as Perl, the original string serves as an input to a standalone split() function rather than being the object a method is called on.
Note: Ways to Call String Methods
String methods like .split() are mainly shown here as instance methods that are called on strings. They can also be called through the str class, in the style of a static method, with the string passed as the first argument, but this isn't ideal because it's more verbose. For completeness, here is an example:
# Avoid this:
str.split('a,b,c', ',')
This is bulky and unwieldy when you compare it with the preferred usage:
# Do this instead:
'a,b,c'.split(',')
For more information about instances, classes, and static methods in Python, see our in-depth tutorial.
What about string immutability? It should remind you that string methods are not in-place operations: they return a new object in memory.
Note: In-Place Operations
In-place operations are operations that directly change the object on which they are called. A common example is the .append() method used on lists: when you call .append() on a list, that list is changed directly by adding the input of .append() to the same list.
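The contrast between the two kinds of operation can be sketched as follows. The variable names are illustrative only; the point is that the string methods hand back new objects, while .append() changes the list itself and returns None.

```python
# Strings: methods return a new object and leave the original untouched.
greeting = 'hello world'
words = greeting.split()      # a new list is created
shouted = greeting.upper()    # a new string is created

print(greeting)  # still 'hello world'
print(words)
print(shouted)

# Lists: .append() modifies the list in place and returns None.
numbers = [1, 2]
result = numbers.append(3)

print(numbers)   # the same list object, now with a third element
print(result)    # None
```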
Splitting Without Parameters
Before going deeper, let's look at a simple example:
>>> 'this is my string'.split()
['this', 'is', 'my', 'string']
This is actually a special case of a .split() call, which I chose for its simplicity. Without any separator specified, .split() counts any whitespace as a separator.
Another feature of the bare call to .split() is that it automatically cuts out leading and trailing whitespace, as well as consecutive whitespace. Compare calling .split() on the following string without a separator parameter and with ' ' as the separator parameter:
>>> s = ' this   is  my string '
>>> s.split()
['this', 'is', 'my', 'string']
>>> s.split(' ')
['', 'this', '', '', 'is', '', 'my', 'string', '']
The first thing to notice is that this showcases the immutability of strings in Python: subsequent calls to .split() work on the original string, not on the list returned by the first call to .split().
The second, and main, thing you should see is that the bare .split() call extracts the words in the sentence and discards any whitespace.
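"Whitespace" here covers more than the space character: tabs and newlines count too. A small sketch, using a made-up string, shows how the bare call and the literal ' ' separator treat the same input:

```python
# A deliberately messy string containing a tab, newlines, and repeated spaces.
messy = '\tthis  is\nmy \r\n string\n'

# The bare call treats every run of whitespace as one separator.
words = messy.split()
print(words)

# Passing ' ' splits on single spaces only; tabs and newlines stay in the pieces.
literal = messy.split(' ')
print(literal)
```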
Specifying Separators
.split(' '), on the other hand, is much more literal. When there are leading or trailing separators, you get an empty string, which you can see in the first and last elements of the resulting list.
Where there are multiple consecutive separators (for example, between "this" and "is" and between "is" and "my"), the first one is used as the separator, and the subsequent ones find their way into your result list as empty strings.
Note: Separators in Calls to .split()
While the above example uses a single space character as the separator input to .split(), you aren't limited in the types of characters or the length of the string you use as a separator. The only requirement is that your separator be a non-empty string. You could use anything from "..." to even "separator".
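As a quick illustration of that note, here is a sketch with two multi-character separators and the one case that is rejected, the empty string. The sample strings are invented for the example.

```python
# A three-character separator.
dotted = 'alpha...beta...gamma'
dotted_parts = dotted.split('...')
print(dotted_parts)

# A whole word as the separator.
worded = 'oneseparatortwoseparatorthree'
worded_parts = worded.split('separator')
print(worded_parts)

# An empty separator is not allowed and raises ValueError.
try:
    dotted.split('')
    empty_separator_error = None
except ValueError as error:
    empty_separator_error = str(error)
print(empty_separator_error)
```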
Limiting Splits With Maxsplit
.split() has another optional parameter called maxsplit. By default, .split() makes all possible splits when called. When you give a value to maxsplit, however, only the given number of splits is made. Using our previous example string, we can see maxsplit in action:
>>> s = "this is my string"
>>> s.split(maxsplit=1)
['this', 'is my string']
As you see above, if you set maxsplit to 1, the first whitespace region is used as the separator, and the rest are ignored. Let's do some exercises to test everything we have learned so far.
Exercise: "Try It Yourself: Maxsplit"
What happens when you give a negative number as the maxsplit parameter?
Solution: "Try It Yourself: Maxsplit"
.split() will split your string on all available separators, which is also the default behavior when maxsplit isn't set.
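You can check this answer yourself with a few calls. The following sketch compares the default call with a negative, a positive, and a zero value for maxsplit:

```python
s = 'this is my string'

default = s.split()               # no limit
negative = s.split(maxsplit=-1)   # a negative value also means "no limit"
limited = s.split(maxsplit=2)     # at most two splits, so three elements
unsplit = s.split(maxsplit=0)     # no splits at all

print(default)
print(negative)
print(limited)
print(unsplit)
```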
Exercise: "Section Comprehension Check"
You were recently handed a badly formatted comma-separated value (CSV) file. Your job is to extract each row into a list, with each element of that list representing a column of the file. What makes it badly formatted? The address field contains multiple commas but needs to be represented as a single element in the list!
Suppose your file has been loaded into memory as the following multiline string:
Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1098 XG Amsterdam, NL
Your output should be a list of lists:
[
    ['Mike Smith', '15554218841', '123 Nice St, Roy, NM, USA'],
    ['Anita Hernandez', '15557789941', '425 Sunny St, New York, NY, USA'],
    ['Guido van Rossum', '315558730', 'Science Park 123, 1098 XG Amsterdam, NL']
]
Each inner list represents a row of the CSV that we're interested in, while the outer list holds them all together.
Solution: "Section Comprehension Check"
Here's my solution. There are a few ways to attack this. The important thing is that you used .split() with all of its optional parameters and got the expected output:
input_string = """Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1098 XG Amsterdam, NL"""

def string_split_ex(unsplit):
    results = []

    # Bonus points for using splitlines() here instead,
    # which will be more readable
    for line in unsplit.split('\n')[1:]:
        results.append(line.split(',', maxsplit=2))

    return results

print(string_split_ex(input_string))
We call .split() twice here. The first usage can look intimidating, but don't worry! We'll step through it, and you'll get comfortable with expressions like these. Let's take another look at the first .split() call: unsplit.split('\n')[1:].
The first element is unsplit, which is just the variable that points to your input string. Then we have our .split() call: .split('\n'). Here, we are splitting on a special character called the newline character.
What does \n do? As the name implies, it tells whatever is reading the string that every character after it should be shown on the next line. In a multiline string like our input_string, there is a hidden \n at the end of each line.
The final part might be new: [1:]. The statement so far gives us a new list in memory, and [1:] looks like list index notation, which it is, sort of! This extended index notation gives us a list slice. In this case, we take the element at index 1 and everything after it, discarding the element at index 0.
In all, we iterate through a list of strings, where each element represents one line of the multiline input string except for the very first line.
On each string, we call .split() again using , as the split character, but this time we use maxsplit to split only on the first two commas, leaving the address intact. We then append the result of that call to the aptly named results list and return it to the caller.
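The comment in the solution hints at .splitlines() as a more readable alternative to .split('\n'). One possible version is sketched below; the function name split_rows is made up for this illustration, and the rest mirrors the solution above.

```python
input_string = """Name,Phone,Address
Mike Smith,15554218841,123 Nice St, Roy, NM, USA
Anita Hernandez,15557789941,425 Sunny St, New York, NY, USA
Guido van Rossum,315558730,Science Park 123, 1098 XG Amsterdam, NL"""

def split_rows(unsplit):
    """Hypothetical variant of the solution that uses splitlines()."""
    results = []

    # splitlines() breaks the text at line boundaries; [1:] skips the header row.
    for line in unsplit.splitlines()[1:]:
        # Split on the first two commas only, so the address stays in one piece.
        results.append(line.split(',', maxsplit=2))

    return results

rows = split_rows(input_string)
for row in rows:
    print(row)
```

Unlike .split('\n'), .splitlines() also recognizes other line endings such as \r\n, which can matter when the text comes from a file created on another platform.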
Concatenating and Joining Strings
The other fundamental string operation is the opposite of splitting strings: string concatenation. If you haven't seen this word, don't worry. It's just a fancy way of saying "gluing together."
Concatenating With the + Operator
There are a few ways of doing this, depending on what you're trying to achieve. The simplest and most common method is to use the plus sign (+) to add multiple strings together. Simply place a + between as many strings as you want to join together:
>>> 'a' + 'b' + 'c'
'abc'
In keeping with the math theme, you can also multiply a string to repeat it:
>>> 'do' * 2
'dodo'
Remember, strings are immutable! If you concatenate or repeat a string stored in a variable, you must assign the new string to another variable to preserve it.
>>> orig_string = 'Hello'
>>> orig_string + ', world'
'Hello, world'
>>> orig_string
'Hello'
>>> full_sentence = orig_string + ', world'
>>> full_sentence
'Hello, world'
If we didn't have immutable strings, full_sentence would instead output 'Hello, world, world'.
Another note is that Python does not do implicit string conversion. If you try to concatenate a string with a non-string type, Python will raise a TypeError:
>>> 'Hello' + 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: must be str, not int
This is because you can only concatenate strings with other strings, which may be new behavior for you if you're coming from a language like JavaScript, which attempts to do implicit type conversion. The exact wording of the error message varies between Python versions.
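The usual way around this error is to convert the value yourself before concatenating. The sketch below shows two common options, an explicit str() call and an f-string (available in Python 3.6 and later); the variable names are illustrative.

```python
count = 2

# Concatenating a str and an int directly fails.
try:
    broken = 'Hello ' + count
except TypeError:
    broken = None

# Option 1: convert explicitly with str() before concatenating.
converted = 'Hello ' + str(count)

# Option 2: let an f-string handle the conversion.
formatted = f'Hello {count}'

print(converted)
print(formatted)
```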
Going From a List to a String in Python With .join()
There is another, more powerful, way to join strings together. You can go from a list to a string in Python with the .join() method.
The common use case here is when you have an iterable, such as a list, made up of strings, and you want to combine those strings into a single string. Like .split(), .join() is a string instance method. If all of your strings are in an iterable, which one do you call .join() on?
This is a bit of a trick question. Remember that when you use .split(), you pass it the string or character you want to split on. The opposite operation is .join(), and here the roles are flipped: you call it on the string or character you want to use to join your iterable of strings together, and pass the iterable as the argument:
>>> strings = ['do', 're', 'mi']
>>> ','.join(strings)
'do,re,mi'
Here, we join each element of the strings list with a comma (,) and call .join() on the comma rather than on the strings list.
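One detail worth knowing is that .join() only accepts strings. Just as with +, there is no implicit conversion, so an iterable of numbers has to be converted first. A minimal sketch, with an invented list of integers:

```python
numbers = [1, 2, 3]

# Joining non-string items raises TypeError.
try:
    ','.join(numbers)
    join_failed = False
except TypeError:
    join_failed = True

# Convert each item to a string first, then join.
joined = ','.join(str(number) for number in numbers)
print(joined)

# Splitting on the same separator gets the (string) pieces back.
pieces = joined.split(',')
print(pieces)
```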
Exercise: "Readability Improvement With Join"
How could you make the output text more readable?
Solution: "Readability Improvement With Join"
One thing you could do is add spacing:
>>> strings = ['do', 're', 'mi']
>>> ', '.join(strings)
'do, re, mi'
By doing nothing more than adding a space to our join string, we've vastly improved the readability of our output. This is something you should always keep in mind when joining strings for human readability.
.join() is smart in that it inserts your "joiner" in between the strings in the iterable you want to join, rather than just adding your joiner at the end of every string in the iterable. This means that if you pass an iterable of size 1, you won't see your joiner:
>>> 'b'.join(['a'])
'a'
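A few more edge cases follow from the same rule that the joiner only goes between items. The sketch below uses made-up inputs; note that a string is itself an iterable of characters, so it can be passed to .join() too.

```python
joiner = ', '

single = joiner.join(['a'])        # one item: nothing to put the joiner between
empty = joiner.join([])            # no items: the result is an empty string
from_tuple = joiner.join(('do', 're', 'mi'))  # any iterable of strings works
letters = '-'.join('abc')          # a string is iterated character by character

print(repr(single))
print(repr(empty))
print(from_tuple)
print(letters)
```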
Exercise: "Section Comprehension Check"
Using our web scraping tutorial, you've built a great weather scraper. However, it loads string information into a list of lists, each holding a unique row of information you want to write out to a CSV file:
[
    ['Boston', 'MA', '76F', '65% Precip', '0.15 in'],
    ['San Francisco', 'CA', '62F', '20% Precip', '0.00 in'],
    ['Washington', 'DC', '82F', '80% Precip', '0.19 in'],
    ['Miami', 'FL', '79F', '50% Precip', '0.70 in']
]
Your output should be a single string that looks like this:
"""
Boston,MA,76F,65% Precip,0.15 in
San Francisco,CA,62F,20% Precip,0.00 in
Washington,DC,82F,80% Precip,0.19 in
Miami,FL,79F,50% Precip,0.70 in
"""
Solution: "Section Comprehension Check"
For this solution, I used a list comprehension, a powerful feature of Python that allows you to build lists quickly. If you want to learn more about them, check out this great article that covers all the comprehensions available in Python.
Here is my solution, starting with a list and ending with a single string:
input_list = [
    ['Boston', 'MA', '76F', '65% Precip', '0.15 in'],
    ['San Francisco', 'CA', '62F', '20% Precip', '0.00 in'],
    ['Washington', 'DC', '82F', '80% Precip', '0.19 in'],
    ['Miami', 'FL', '79F', '50% Precip', '0.70 in']
]

# We start with joining each inner list into a single string
joined = [','.join(row) for row in input_list]

# Now we transform the list of strings into a single string
output = '\n'.join(joined)

print(output)
Here we use .join() not once, but twice. First, we use it in the list comprehension, which does the work of combining all the strings in each inner list into a single string. Next, we join each of these strings with the newline character \n that we saw earlier. Finally, we simply print the result so we can verify that it is as we expected.
Tying It All Together
While this concludes the overview of the most basic string operations in Python (splitting, concatenating, and joining), there is still a whole universe of string methods that can make manipulating strings easier for you.
Once you have mastered these basic string operations, you may want to know more.