Using sed to transform XML to be displayed in HTML

Pasting an XML snippet into HTML can be full of surprises: the rendered result may not be what you had in mind. Most browsers will try to interpret the tags in the snippet, and you end up with something completely different. 😉
If the XML is well-formed, XSLT is one option for transforming the offending characters into entities, which any web browser then renders as the literal characters. But if we are talking about just a couple of lines cut out of a larger document, things are different: such a fragment is often no longer well-formed (it may, for example, lack a single root element), so XSLT is not an option.
A simple sed one-liner could look like this:

sed 's/</\&lt;/g' file1.xml | sed 's/>/\&gt;/g' > file2.txt

Let’s have a detailed look:

  1. sed is called and
  2. told to perform a substitution s of every < with &lt; in file1.xml, globally g, i.e. for all occurrences on a line rather than just the first. The backslash in \&lt; is needed because a bare & in the replacement stands for the matched text.
  3. The output goes via the pipe | to a second call of sed, which
  4. performs a similar substitution, this time for all occurrences of >.
  5. The resulting text is written to file2.txt.
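One thing the two-step pipeline does not handle is an & that is already in the XML, for example in an entity such as &amp;: it passes through unchanged, so the browser decodes it instead of showing it literally. The following is a minimal sketch of a variant, assuming a POSIX-compatible sed, that escapes & first and runs all three substitutions in a single sed call. The function name escape_xml is a hypothetical choice for illustration, not an existing command.

```shell
# escape_xml is a hypothetical helper name, not a standard command.
# It reads the files given as arguments (or stdin if there are none)
# and writes the escaped text to stdout.
escape_xml() {
  # Order matters: '&' is replaced first, so the '&' introduced by
  # the later substitutions ('&lt;', '&gt;') is not escaped again.
  # '\&' in a replacement is a literal '&'; a bare '&' would insert
  # the matched text instead.
  sed -e 's/&/\&amp;/g' \
      -e 's/</\&lt;/g' \
      -e 's/>/\&gt;/g' "$@"
}

# Usage (replaces the one-liner): escape_xml file1.xml > file2.txt
```

With this variant, a line like <para>Fish &amp; Chips</para> comes out as &lt;para&gt;Fish &amp;amp; Chips&lt;/para&gt;, which a browser displays exactly as it is written in the XML source.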

Now you’re able to cut and paste snippets like this

<figure>
  <title>Tags Window</title>
  <screenshot>
    <mediaobject>
      <imageobject>
        <imagedata fileref="pictures/ch03/ch03-02tagswindow1.png"
          format="PNG" />
      </imageobject>
    </mediaobject>
  </screenshot>
</figure>
<para>The menu of the <quote>Tags</quote> window
allows us to create a new tag.</para>

into the HTML editor of your choice.
