
sed Exercise

Do the exercise on pages 155-156 of your book. Put your answers to numbers 1-9 into a file named lastname_firstname_sed.sh, and put your answer to number 10 into a file named lastname_firstname_edit.sed. Here is the datebook file for this exercise.

Note: In step 6, don't use sed’s a command to append the asterisks. The a command adds a new line after the current one, but you want the asterisks at the end of the existing line, not on a line of their own.
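A minimal sketch of the end-of-line technique, assuming a POSIX sed: the s command with the $ anchor inserts text at the end of the existing line instead of creating a new one. The input lines, the pattern Sir, and the function name append_stars are hypothetical and are not taken from the book's exercise.

```shell
# Hypothetical helper: append " ***" to every line that contains "Sir".
# The $ anchor matches the end of the line, so s/$/ ***/ adds the text to
# the same line. (The a command would put the text on a new line instead.)
append_stars() {
  sed '/Sir/s/$/ ***/'
}

# Usage: prints "Sir Lancelot ***" and then "King Arthur" unchanged.
printf 'Sir Lancelot\nKing Arthur\n' | append_stars
```

In the replacement part of s, the asterisk is an ordinary character, so it does not need to be escaped.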

Also, do the following task and put your sed script in a file named lastname_firstname_docbook.sed.

The input will be a file written in Docbook format, which is designed for writing books and documentation. It looks a little like HTML, the language used for Web pages.

Docbook files consist of text with markup, or “tags,” that tell how the text is to be displayed. An opening tag is a name enclosed in angle brackets, < and >. For example, <para> means “start a paragraph.” The corresponding closing tag, which means “end a paragraph,” is written </para> (note the slash after the less-than sign).
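Because the slash is the usual delimiter of sed’s s command, the slash in a closing tag needs special handling. The sketch below shows two common ways to remove an opening and closing tag pair, assuming a POSIX sed; the <note> tag and the function names are hypothetical and are not tags from this exercise.

```shell
# Hypothetical helper 1: keep / as the delimiter and escape the slash
# in the closing tag as <\/note>.
strip_note_escaped() {
  sed -e 's/<note>//g' -e 's/<\/note>//g'
}

# Hypothetical helper 2: pick a different delimiter (here |), so the
# slash in </note> needs no escaping.
strip_note_delim() {
  sed -e 's|<note>||g' -e 's|</note>||g'
}

# Usage: both commands print "Read this first."
echo 'Read <note>this</note> first.' | strip_note_escaped
echo 'Read <note>this</note> first.' | strip_note_delim
```

Any character other than a backslash or newline can serve as the delimiter; choosing one that does not appear in the pattern keeps the command readable.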

Here is the document that you will be modifying; just copy and paste it into a text file on your system.


<article>
<title>About the Web</title>

<para>
This is an article about the World Wide Web.
The World Wide Web is a collection of documents that are linked to
one another. The Web is <emphasis>not</emphasis> the same as the
Internet. The Internet is a world-wide network of networks, and it
does far more than simply serve up Web pages.
</para>

<para>Tim Berners-Lee, the inventor of the World Wide Web, put special
emphasis on the portability of web pages. Rather than create a
proprietary format, he made Web pages dependent only upon plain ASCII
text.</para>

<para>
Web pages are written in a markup language called HTML. Here is what it
looks like:
</para>

<listing>
&lt;body&gt;
&lt;div id="top-navig"&gt;
&lt;a id="top"&gt;&lt;/a&gt;
&lt;a href="index.html"&gt;CIT 040 Index&lt;/a&gt;
&amp;gt;
Assignment 1
&lt;/div&gt;

&lt;h1&gt;Assignment 1&lt;/h1&gt;
&lt;p&gt;This exercise shows you how to use the two computer environments that
you will use in this class. You will:&lt;/p&gt;
&lt;ol class="upper-roman"&gt;
&lt;li&gt;Set up your directories on Windows. This is
where you will write your HTML documents.&lt;/li&gt;
&lt;/ol&gt;
</listing>

<para>It looks difficult, but it is possible to learn HTML in a few
weeks. <emphasis>You, too can create web pages for viewing by
friends and family!</emphasis>
Note that, in our listing, we had to encode < as &lt.
</para>
</article>

Write a sed script file that does the following. It should work on any Docbook file, not just this one, so you can’t count on a particular tag always being on a particular line.

  1. Lines with <article> and </article> should be deleted.
  2. Replace <title> with Title:, and replace </title> with nothing.
  3. Replace all <para> and </para> tags with the null string. If the resulting line is empty, delete the line. (You may need to use curly braces to make this happen.)
  4. Replace all <emphasis> and </emphasis> tags with asterisks. Thus:
    This is a <emphasis>great</emphasis> bargain.
    will become
    This is a *great* bargain.
  5. Replace the word web with Web everywhere.
  6. Replace each line that starts with <listing> with the line ---begin listing
  7. Replace each line that starts with </listing> with the line ---end listing
  8. Between the <listing> and </listing> lines, do these things (you must use curly braces to do this!):
     Note: you must do these operations in the order shown above; otherwise, you will get the wrong results!
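The curly-brace technique from steps 3 and 8 might look like the following sketch, assuming a POSIX sed. It uses hypothetical <example> and <note> tags, and indenting the lines of the range is only a placeholder operation, not one of the operations this assignment asks for. A range address such as /start/,/end/ selects both boundary lines and everything between them; the commands inside the braces run only on the selected lines.

```shell
# Hypothetical helper showing two brace blocks:
#  1. /^<example>/,/^<\/example>/{...} runs its commands only on lines
#     from <example> through </example>, inclusive (here: indent them).
#  2. /<\/*note>/{...} strips <note> and </note> tags, then deletes the
#     line only if the substitution left it empty, so blank lines
#     elsewhere in the file are kept.
docbook_demo() {
  sed -e '
/^<example>/,/^<\/example>/{
  s/^/    /
}
/<\/*note>/{
  s/<\/*note>//g
  /^$/d
}'
}

# Usage
printf '<example>\nx = 1\n</example>\n<note>\nSee <note>this</note>.\n</note>\n\nend\n' | docbook_demo
```

With this input, the three lines of the example block come out indented by four spaces, the lines that held only a <note> or </note> tag disappear, "See <note>this</note>." becomes "See this.", and the originally blank line before "end" survives.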

Note: The & character is a metacharacter in the “replacement” part of a substitution: it stands for the entire text that the pattern matched. To insert a literal &, escape it with a backslash. For example, if you want to replace the word “and” with “&”, you would do this:

s/and/\&/
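A short demonstration of both behaviors, assuming a POSIX sed; the function names and the input text are hypothetical. An unescaped & repeats the matched text, while \& produces a literal ampersand.

```shell
# Unescaped &: inserts the matched text ("and") inside brackets.
bracket_and() { sed 's/and/[&]/'; }

# Escaped \&: replaces "and" with a literal ampersand.
literal_amp() { sed 's/and/\&/'; }

echo 'salt and pepper' | bracket_and   # prints: salt [and] pepper
echo 'salt and pepper' | literal_amp   # prints: salt & pepper
```

Without the g flag, each command changes only the first match on a line.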

Your resulting file will look like this.

When You Finish

When you finish, put all three files into a single .zip file and mail it to the instructor. Please make sure that you put CIT052 in the subject line of your email.