Unix Geekery for Line Edits

The other day I found myself with a chat transcript about email security that I wanted to put online on my other blog. Unfortunately, to do it properly I needed to add <p> to the beginning of each line and </p> to the end.

As I started manually adding the proper HTML paragraph markup to the lines I realized that what I was doing was pretty stupid. There are a number of ways to automate these simple changes to a file. Usually this can be done with the find-and-replace function of a good text editor. In this case, however, I didn’t want to replace anything. I could have run a find-and-replace on line-endings (aka newlines), replacing each one with a line-ending followed by <p>, but I didn’t know how to match line-endings in the program I was using (Notepad2).

Then I realized that I already had access to the perfect utility to perform such a function, and one that could find anything I wanted in a file, including line-endings: sed. Sed is the Unix Stream Editor, and I had been meaning to learn how to use it for some time. Finally, I had the perfect excuse to read a sed tutorial! Combined with some input and output redirection, I executed the following command in my shell:

cat chat.txt | sed -e 's/^/<p>/' -e 's/$/<\/p>/' > chat-sedified.txt

Here’s what happened:

  1. First, the chat.txt file was printed on STDOUT, but instead of just displaying that output I piped it into sed with the | (the vertical bar, or, incidentally, pipe symbol). In other words, the output of cat chat.txt was turned into the input for sed.

    Now that sed was given input, I told it to run two expressions (using the -e flag or option) one after the other.

    1. The first expression runs a search-and-replace (the s in the expression), looking for the beginning of the line (the ^ or caret character) and inserting an HTML paragraph mark (<p>, obviously).

    2. The second expression does the same thing, only for the end of the line instead of the beginning. ($ matches the end of the line.) There is only one complication: sed’s search-and-replace uses the forward slash (/) as the delimiter of the input fields. In other words, the usage of sed’s search-and-replace is thus: s/searchForThisText/replaceWithThisText/.

      In order to replace (or insert) text that includes a literal forward slash, the slash needs to be escaped by immediately preceding it with a backslash. Thus the somewhat-more-cryptic <\/p> closing paragraph tag in the command.

  2. Finally, the result (output) of sed’s changes was redirected to the not-yet-existent chat-sedified.txt file using the greater-than sign, >.
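For what it’s worth, here is a slightly more compact sketch of the same edit. sed accepts a delimiter other than / for the s command, which sidesteps the backslash escape, and it can read a file named on the command line instead of taking it from cat. The file names and sample lines below are made up for illustration; they are not the real transcript.

```shell
# Create a small stand-in for the chat transcript.
# (sample-chat.txt and its contents are hypothetical, for illustration only.)
printf 'alice: hi\nbob: hello\n' > sample-chat.txt

# Same two substitutions as before, but with | as the s-command delimiter,
# so the / in </p> needs no escaping. sed reads the file directly.
sed -e 's|^|<p>|' -e 's|$|</p>|' sample-chat.txt > sample-chat-sedified.txt

# Show the result:
#   <p>alice: hi</p>
#   <p>bob: hello</p>
cat sample-chat-sedified.txt
```

Any character that does not appear in the search or replacement text should work as the delimiter; | is just a common choice when the text is full of slashes.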

Sure, it took me about 20 minutes to read the tutorial and figure out the proper command, when it would have taken at least 10 minutes to make the changes manually. But the next time I need to make similar changes by hand, it would take me another ten minutes, and the same is true for every future time, assuming the file to be changed is the same length or shorter. The sed command, on the other hand, can simply be run again.
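One way to make the command easy to reuse is to wrap it in a small shell function. This is only a sketch: the name wrap_p is something I made up, not a sed feature, and it assumes a POSIX-style shell such as bash.

```shell
# wrap_p: hypothetical helper name, not part of sed.
# Wraps every line of the named file(s), or of standard input if no file
# is given, in <p>...</p>.
wrap_p() {
    sed -e 's/^/<p>/' -e 's/$/<\/p>/' "$@"
}

# Usage as a filter; the two input lines are made-up sample text.
# Prints:
#   <p>first line</p>
#   <p>second line</p>
printf 'first line\nsecond line\n' | wrap_p
```

Dropped into a shell startup file, something like wrap_p chat.txt > chat-sedified.txt would then do the whole job in one step.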

And, of course, I can do much more with sed than simply wrap the lines of a file in arbitrary text. It’s as powerful as, if not more powerful than, any kind of search-and-replace command in a text editor. And, naturally, it can do much more than just search for text and replace it. But I’ve rambled on enough by now and I’m getting hungry for dinner.