Intermediate Sed Tutorial: Manipulating Streams of Text in a Linux Environment

Provided by: ZStack Cloud Computing

Tutorial Series

This tutorial is the second of two articles in the Guide to Sed Use series.

Introduction

The sed stream editor is a powerful editing tool that can perform many operations with very little input. In the previous tutorial, we covered the basics of the sed text editor.

This article continues with more advanced topics.

Supplying Multiple Editing Sequences

Sometimes we need to pass several commands to sed at the same time. There are a few ways to do this.

If the files we are going to edit do not exist yet, recreate them here so that the following steps work:

cd
cp /usr/share/common-licenses/BSD .
cp /usr/share/common-licenses/GPL-3 .
echo "this is the song that never ends
yes, it goes on and on, my friend
some people started singing it
not knowing what it was
and they'll continue singing it forever
just because..." > annoying.txt

Because sed reads from standard input and writes to standard output, we can chain separate sed calls together with a pipeline (remember to escape the &, because an unescaped & stands for the entire matched pattern):

sed 's/and/\&/' annoying.txt | sed 's/people/horses/'

this is the song that never ends
yes, it goes on & on, my friend
some horses started singing it
not knowing what it was
& they'll continue singing it forever
just because...
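
To make the role of the backslash concrete, here is a small sketch on a one-line sample string (the sample text and variable names are illustrative only). It compares an unescaped & with an escaped one:

```shell
# An unescaped & in the replacement expands to the text that matched.
wrapped=$(echo "it goes on and on" | sed 's/and/[&]/')

# An escaped \& inserts a literal ampersand instead.
literal=$(echo "it goes on and on" | sed 's/and/\&/')

echo "$wrapped"   # it goes on [and] on
echo "$literal"   # it goes on & on
```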

The pipeline works, but it starts an extra sed process, adds overhead, and does not take advantage of sed's built-in capabilities.

We can instead pass several commands to a single sed call by putting the -e option before each one:

sed -e 's/and/\&/' -e 's/people/horses/' annoying.txt

Another way to combine commands is to separate them with semicolons. This produces the same result as the previous command:

sed 's/and/\&/;s/people/horses/' annoying.txt

Note that with -e, each command gets its own quoted group. With semicolons, all of the commands fit inside a single quoted group.
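
As a quick check, the following sketch runs the three forms on a small temporary file and compares their output. The file name sample.txt and the variable names are hypothetical, chosen only for this illustration:

```shell
# Build a two-line sample file in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' 'some people started singing it' 'and they will continue' > "$workdir/sample.txt"

# The same two substitutions, expressed three ways.
piped=$(sed 's/and/\&/' "$workdir/sample.txt" | sed 's/people/horses/')
with_e=$(sed -e 's/and/\&/' -e 's/people/horses/' "$workdir/sample.txt")
with_semicolon=$(sed 's/and/\&/;s/people/horses/' "$workdir/sample.txt")

if [ "$piped" = "$with_e" ] && [ "$with_e" = "$with_semicolon" ]; then
    echo "all three forms agree"
fi
echo "$with_e"

rm -rf "$workdir"
```

For independent substitutions like these, the choice between the forms is mostly a matter of readability.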

Although both approaches work, there are cases where we still need the pipeline approach shown earlier.

Take the "=" command, for example. It inserts a line containing the line number before each existing line, so the output looks like this:

sed '=' annoying.txt

1
this is the song that never ends
2
yes, it goes on and on, my friend
3
some people started singing it
4
not knowing what it was
5
and they'll continue singing it forever
6
just because...

Let's see what happens when we try to change the formatting of this numbered output by modifying the text.

For demonstration purposes, we use the "G" command here, which by default adds a blank line after each line (it appends the hold buffer, discussed later, which starts out empty).

sed 'G' annoying.txt

this is the song that never ends

yes, it goes on and on, my friend

some people started singing it

not knowing what it was

and they'll continue singing it forever

just because...

If we combine the two commands, we might expect a blank line between each number line and its content line, but that is not what happens:

sed '=;G' annoying.txt

1
this is the song that never ends

2
yes, it goes on and on, my friend

3
some people started singing it

4
not knowing what it was

. . .
. . .

This is because the "=" command writes the line number directly to the output stream, so the rest of the script never sees it and cannot edit it.

We can solve this with two sed calls: the output of the first sed becomes a plain text stream for the second one to operate on:

sed '=' annoying.txt | sed 'G'

1

this is the song that never ends

2

yes, it goes on and on, my friend

3

some people started singing it
. . .
. . .

The result now matches our expectations. Keep in mind that some commands behave this way; if combining several commands in one script produces unexpected output, try separate sed calls instead.
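
The same effect can be seen with a substitution instead of "G". In this sketch (the sample input and the "> " prefix are arbitrary), a prefix added in the same script never reaches the number lines, while a second sed call sees them as ordinary text:

```shell
# One script: "=" prints the number immediately, so s/// only edits the content line.
same_script=$(printf 'alpha\nbeta\n' | sed '=;s/^/> /')

# Two calls: the numbers are plain input lines for the second sed.
two_passes=$(printf 'alpha\nbeta\n' | sed '=' | sed 's/^/> /')

echo "$same_script"
# 1
# > alpha
# 2
# > beta

echo "$two_passes"
# > 1
# > alpha
# > 2
# > beta
```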

Advanced Addressing

One of the greatest advantages of sed's addressable commands is that regular expressions can be used as selection criteria. This means that we are not limited to known line numbers, as in this example:

sed '1,3s/.*/Hello/' annoying.txt

Hello
Hello
Hello
not knowing what it was
and they'll continue singing it forever
just because...

Instead, we can use regular expressions to match lines that contain a specific pattern. We place the pattern between two slashes (/) before the command:

sed '/singing/s/it/& loudly/' annoying.txt

this is the song that never ends
yes, it goes on and on, my friend
some people started singing it loudly
not knowing what it was
and they'll continue singing it loudly forever
just because...

In this example, we put "loudly" after the first "it" on each line that contains the string "singing". Note that the second and fourth lines contain "it" but not "singing", so they remain unchanged.

Address expressions can be made more complex, which makes command execution more flexible.

The next example is simple, but it shows how a regular expression can generate the address for another command. It matches every blank line (a line start immediately followed by a line end) and deletes it:

sed '/^$/d' GPL-3

                GNU GENERAL PUBLIC LICENSE
                   Version 3, 29 June 2007
 Copyright (C) 2007 Free Software Foundation, Inc.
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
                        Preamble
  The GNU General Public License is a free, copyleft license for
. . .
. . .

Note that regular expressions can be used at either end of a range.

For example, we can delete the lines from a line containing only "START" through a line containing only "END":

sed '/^START$/,/^END$/d' inputfile

Note that this deletes everything from the first "START" line through the first "END" line, including both markers, and then starts deleting again when it meets the next "START" line.
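
Since inputfile is only a placeholder name, here is a sketch that builds a small sample file with two START/END blocks (the contents are made up for illustration) and applies the range deletion to it:

```shell
# Create a sample file containing two START/END blocks.
workdir=$(mktemp -d)
printf '%s\n' 'keep 1' 'START' 'drop 1' 'END' \
              'keep 2' 'START' 'drop 2' 'END' 'keep 3' > "$workdir/inputfile"

# Delete each block, including the START and END marker lines.
result=$(sed '/^START$/,/^END$/d' "$workdir/inputfile")
echo "$result"
# keep 1
# keep 2
# keep 3

rm -rf "$workdir"
```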

If we need to invert an address (that is, operate on every line that does not match the pattern), we can add an exclamation mark after the pattern.

For example, we can delete all non-blank lines (not useful in practice, just an example):

sed '/^$/!d' GPL-3
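
A somewhat more practical use of an inverted address, sketched here on three sample lines (the sample text and variable names are illustrative), is to keep only the matching lines or to edit only the non-matching ones:

```shell
lyrics=$(printf '%s\n' 'some people started singing it' \
                       'not knowing what it was' \
                       'and they will continue singing it forever')

# Delete every line that does NOT contain "singing".
only_singing=$(printf '%s\n' "$lyrics" | sed '/singing/!d')

# Substitute only on lines that do NOT contain "singing".
others_changed=$(printf '%s\n' "$lyrics" | sed '/singing/!s/it/that/')

echo "$only_singing"
echo "$others_changed"
```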

Using the Hold Buffer

The hold buffer greatly improves sed's multi-line editing ability. It is a temporary storage area that can be modified by specific commands.

With this extra buffer, we can store lines while working on other lines, and then bring the stored lines back when we need them.

The following commands operate on the hold buffer:

  • h: Copies the current pattern buffer (the line currently matched and being processed) into the hold buffer, overwriting the hold buffer's previous contents.
  • H: Appends the current pattern buffer to the end of the hold buffer, separated by a newline character (\n).
  • g: Copies the hold buffer into the pattern buffer, overwriting the pattern buffer's previous contents.
  • G: Appends the hold buffer to the end of the pattern buffer, separated by a newline character (\n).
  • x: Swaps the contents of the pattern buffer and the hold buffer.

The contents of the hold buffer cannot be operated on until they are transferred to the pattern buffer.
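
As a small illustration of "G" and "h" working together, the following sketch reverses the order of the input lines. This is a commonly cited sed idiom rather than something taken from this tutorial, and the three-line input is made up:

```shell
# 1!G : on every line except the first, append the hold buffer to the pattern buffer
# h   : copy the (growing) pattern buffer back into the hold buffer
# $p  : on the last line, print the accumulated result (-n suppresses everything else)
reversed=$(printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p')

echo "$reversed"
# three
# two
# one
```

Each cycle places the newest line in front of everything seen so far, so the final pattern buffer holds the whole input in reverse order.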

Let's look at a more complex example.

The following example shows how to join adjacent lines. (sed actually has a built-in command for this kind of task: the N command appends the next line to the current line. Here we do it the hard way for practice.)

sed -n '1~2h;2~2{H;g;s/\n/ /;p}' annoying.txt

this is the song that never ends yes, it goes on and on, my friend
some people started singing it not knowing what it was
and they'll continue singing it forever just because...

Let's walk through the command piece by piece.

First comes the "-n" option, which disables automatic output. This way, sed prints only what we explicitly ask it to print.

The first part of the script is "1~2h". It begins with an address, which selects the first line and then every second line after it (that is, the odd-numbered lines). The "h" command copies each matching line into the hold buffer. Note that the first~step address form is a GNU sed extension.

The second half of the script is more complex. It also begins with an address, this time selecting the even-numbered lines (the opposite of the first command).

The rest of the script is enclosed in braces. This means that every command inside the braces inherits the address given just before them. Without the braces, only the "H" command would inherit the address, and the remaining commands would run on every line.

The "H" command appends a newline character and then the current pattern buffer to the end of the hold buffer.

The hold buffer (an odd-numbered line, a newline character, and then an even-numbered line) is then copied back into the pattern buffer with the "g" command, overwriting the previous pattern buffer.

Next, the newline character is replaced with a space, and the line is printed with the "p" command.

With the "N" command, the script becomes much shorter and produces the same result:

sed -n 'N;s/\n/ /p' annoying.txt    

this is the song that never ends yes, it goes on and on, my friend
some people started singing it not knowing what it was
and they'll continue singing it forever just because...
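
One caveat: annoying.txt has an even number of lines. With an odd number of lines, the "-n ... p" form above prints nothing for the final, unpaired line. A common workaround, sketched here on a three-line sample input, is to skip "N" on the last line with the $! address and let sed's automatic output do the printing:

```shell
# With -n and an explicit p, the unpaired last line "c" is never printed.
pairs_only=$(printf 'a\nb\nc\n' | sed -n 'N;s/\n/ /p')

# $!N runs N on every line except the last; without -n, each cycle
# prints its pattern buffer, so the leftover line is kept.
pairs_and_rest=$(printf 'a\nb\nc\n' | sed '$!N;s/\n/ /')

echo "$pairs_only"
# a b

echo "$pairs_and_rest"
# a b
# c
```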

Using Sed scripts

As commands become more complex, it can help to compose them in a text editor. This also makes it easy to apply a large number of commands to a single target.

For example, if you write plain text notes that must be converted to a standardized format before use, a sed script can simplify the whole process.

Instead of typing commands one by one, we can collect them in a script file and pass that file to sed as an argument. A sed script is simply a list of raw sed commands (the part that is usually enclosed in single quotes).

For example:

s/this/that/g
s/snow/rain/g
1,5s/pinecone/apricot/g

Once these commands are saved in a file, we can run it with the following syntax:

sed -f sedScriptName fileToEdit

In this way, we can edit a file and produce whatever text format we need.
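
Here is an end-to-end sketch of that workflow. The names cleanup.sed and notes.txt are hypothetical stand-ins for sedScriptName and fileToEdit, and the sample sentence is made up:

```shell
workdir=$(mktemp -d)

# Write the script file; lines starting with # are comments in a sed script.
cat > "$workdir/cleanup.sed" <<'EOF'
# Replace words everywhere, but only touch "pinecone" in lines 1-5.
s/this/that/g
s/snow/rain/g
1,5s/pinecone/apricot/g
EOF

# A one-line file to edit.
printf '%s\n' 'this snow fell on the pinecone' > "$workdir/notes.txt"

# Run every command in the script against the file.
result=$(sed -f "$workdir/cleanup.sed" "$workdir/notes.txt")
echo "$result"   # that rain fell on the apricot

rm -rf "$workdir"
```

Keeping the commands in a file also avoids shell quoting issues, since the script is never interpreted by the shell.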

Summary

In this tutorial, we explored sed in more depth.

Sed commands can be hard to understand at first, and it usually takes repeated experimentation to master them. Practice will help consolidate what you have learned.

Still, the examples should have given you a sense of how powerful sed is. With some creativity, this excellent tool can help you get everyday tasks done more effectively.

This article comes from the DigitalOcean Community. English original: Intermediate Sed: Manipulating Streams of Text in a Linux Environment, by Justin Ellingwood

Translation: diradw

Keywords: REST Linux

Added by neel_basu on Mon, 15 Jul 2019 03:53:14 +0300