sed Exercise

Do the exercise on pages 155-156 of your book. Put your answers to numbers 1-9 into a file named lastname_firstname_sed.sh. Put your answer to number 10 into a file named lastname_firstname_edit.sed. Here is the datebook file for this exercise.

Note: Don't use sed’s a command to append the asterisks at the end of the line in step 6. The a command adds new lines. You want the asterisks to be at the end of the line, not on a new line.
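The difference can be seen in a minimal sketch. It uses a made-up one-line input rather than the datebook file; the sample text, the three-asterisk suffix, and the variable names are assumptions for illustration only.

```shell
# Hypothetical one-line input, not the datebook file.
line='Sample entry'

# Substituting at the end-of-line anchor ($) keeps everything on one line.
same_line=$(printf '%s\n' "$line" | sed 's/$/ ***/')

# The a command instead writes its text as a separate line after the match.
new_line=$(printf '%s\n' "$line" | sed 'a\
***')

printf '%s\n' "$same_line"
printf '%s\n' "$new_line"
```

The first result is a single line ending in the asterisks; the second is two lines, with the asterisks alone on the second one.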

Also do the following task, and put your script in a file named lastname_firstname_docbook.sed.

The input file will be a file written in Docbook format, which is a format designed for writing books and documentation. It looks a little bit like HTML, the language that is used for Web pages.

Docbook files consist of text with markup, or “tags”, that tell how the text is to be displayed. An opening tag is a command enclosed in angle brackets < and >. For example, <para> means “start a paragraph”. The corresponding closing tag, which means “end a paragraph”, is written </para> (note the slash after the less-than sign).

Here is the document that you will be modifying; just copy and paste it into a text file on your system.


<article>
<title>About the Web</title>

<para>
This is an article about the World Wide Web.
The World Wide Web is a collection of documents that are linked to
one another. The Web is <emphasis>not</emphasis> the same as the
Internet. The Internet is a world-wide network of networks, and it
does far more than simply serve up Web pages.
</para>

<para>Tim Berners-Lee, the inventor of the World Wide Web, put special
emphasis on the portability of web pages. Rather than create a
proprietary format, he made Web pages dependent only upon plain ASCII
text.</para>

<para>
Web pages are written in a markup language called HTML. Here is what it
looks like. The &lt; and &gt; mark off elements.
</para>

<listing>
&lt;body&gt;
&lt;div id="top-navig"&gt;
&lt;a id="top"&gt;&lt;/a&gt;
&lt;a href="index.html"&gt;CIT 040 Index&lt;/a&gt;
&amp;gt;
Assignment 1
&lt;/div&gt;

&lt;h1&gt;Assignment 1&lt;/h1&gt;
&lt;p&gt;This exercise shows you how to use the two computer environments that
you will use in this class. You will:&lt;/p&gt;
&lt;ol class="upper-roman"&gt;
&lt;li&gt;Set up your directories on Windows. This is
where you will write your HTML documents.&lt;/li&gt;
&lt;/ol&gt;
</listing>

<para>It looks difficult, but it is possible to learn HTML in a few
weeks. <emphasis>You, too can create web pages for viewing by
friends and family!</emphasis>
Note that, in our listing, we had to encode &gt; as &amp;gt.
</para>
</article>

Write a sed script that does the following. It should work on any Docbook file, not just this one. That means you can’t count on a particular tag always being on a particular line.

  1. Lines with <article> and </article> should be deleted.
  2. Replace <title> with Title:, and replace </title> with nothing.
  3. Replace all <para> and </para> tags with the null string. If the resulting line is empty, delete the line. (You may need to use curly braces to make this happen.)
  4. Replace all <emphasis> and </emphasis> tags with asterisks. Thus:
    This is a <emphasis>great</emphasis> bargain.
    will become
    This is a *great* bargain.
  5. Replace the word web with Web everywhere.
  6. Replace lines starting with <listing> by ---begin listing
  7. Replace lines starting with </listing> by ---end listing
  8. Between the <listing> and </listing>, do these things (you must use curly braces to do this!): Note: you must do these operations in the order shown above; otherwise, you will get the wrong results!
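As a sketch of the "substitute, then delete the line if nothing is left" idea that curly braces make possible, consider the following. It uses a hypothetical <note> tag and made-up input, not the Docbook tags from the task, and the variable names are illustrative assumptions.

```shell
# Hypothetical input: the first line holds only a tag, the second has text too.
input='<note>
<note>Keep this text.
plain line'

# On lines that contain the tag, remove it; if that leaves the line empty,
# delete the line. Lines without the tag never enter the braces.
script='/<note>/{
   s/<note>//g
   /^$/d
}'

cleaned=$(printf '%s\n' "$input" | sed "$script")
printf '%s\n' "$cleaned"
```

Because the empty-line test sits inside the braces, blank lines that were already in the input are left alone; only lines emptied by the substitution are removed.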

Note: The & character is a special metacharacter when used in the “replacement” portion of a substitution. For example, if you want to replace the word “and” with “&”, you would do this:

s/and/\&/
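To see why the backslash matters, here is a short comparison on a made-up phrase (the phrase and the variable names are assumptions for illustration). An unescaped & in the replacement stands for whatever the pattern matched.

```shell
# Unescaped: & is replaced by the matched text, so "and" is wrapped in brackets.
wrapped=$(printf 'salt and pepper\n' | sed 's/and/[&]/')

# Escaped: \& is a literal ampersand.
literal=$(printf 'salt and pepper\n' | sed 's/and/\&/')

printf '%s\n' "$wrapped"   # salt [and] pepper
printf '%s\n' "$literal"   # salt & pepper
```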

Your resulting file will look like this.

Important: This is a sed script. That means you will not send me a series of sed -e ... commands. This is covered in the tutorial, but it is important enough that I will duplicate the information here. The following is an example of what I mean by a sed script.

First, create a file named test.txt with this text in it:

This line is good.
This line is bad.
Numerals one and two
Words one and two
The pessimist looks for the best.
This is another bad line.
This is another good line.
The pessimist sees a half-empty glass.
Numerals one and two again

So, for example, if I wanted you to write a sed script that deletes any line with the word "bad" on it, and also replaces the word "pessimist" with "optimist" on lines 5 through 7, you would make a file named, say, example.sed with these contents:

/bad/d
5,7s/pessimist/optimist/

This file example.sed that you just created is a sed script. Now do this from the command line, and you will see that the script does both operations on the test.txt file.

sed -f example.sed test.txt
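If you want to reproduce the whole example in one go, the following sketch builds both files and runs the script. It assumes a POSIX shell with mktemp available; the scratch directory and the variable names workdir and result are illustrative choices, not part of the assignment.

```shell
# Scratch directory so no existing files are overwritten (an assumption).
workdir=$(mktemp -d)

# The sample input from above.
cat > "$workdir/test.txt" <<'EOF'
This line is good.
This line is bad.
Numerals one and two
Words one and two
The pessimist looks for the best.
This is another bad line.
This is another good line.
The pessimist sees a half-empty glass.
Numerals one and two again
EOF

# The two-command sed script from above.
cat > "$workdir/example.sed" <<'EOF'
/bad/d
5,7s/pessimist/optimist/
EOF

# Run the script file with -f and show what sed prints.
result=$(sed -f "$workdir/example.sed" "$workdir/test.txt")
printf '%s\n' "$result"
```

Lines 2 and 6 are deleted, and the "pessimist" on line 5 becomes "optimist". The "pessimist" on line 8 is unchanged because it lies outside the 5,7 address range.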

Here is a hint for the script. Put the following into a file named example2.sed. This sed script will change the words “one” and “two” to the digits 1 and 2 on any line that also contains the word “Numerals”.

/Numerals/{
   s/one/1/
   s/two/2/
}

Now run this from the command line:

sed -f example2.sed test.txt
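Curly braces can also follow a range address, so that a group of commands runs only on the lines from one pattern through another. The following is a sketch on made-up input; the BEGIN and END marker lines and the variable names are hypothetical and are not part of the assignment files.

```shell
# Hypothetical input: only the lines from BEGIN through END should change.
input='one outside
BEGIN
one inside
two inside
END
two outside'

# The braces group both substitutions under the single range address.
script='/^BEGIN/,/^END/{
   s/one/1/
   s/two/2/
}'

ranged=$(printf '%s\n' "$input" | sed "$script")
printf '%s\n' "$ranged"
```

Only the two lines between the markers are changed to "1 inside" and "2 inside"; the lines outside the range keep the words "one" and "two".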

When You Finish

When you finish, put all three files into a single .zip file and mail it to the instructor. Please make sure that you put CIT052 in the subject line of your email.