
Notes on the XML Article Sample

This downloadable package consists of three basic components: (1) an article-in-progress of mine (to be published this winter by the Toyo Gakuen University Bulletin) in XML format, written according to the guidelines for XML encoding set forth by the Text Encoding Initiative (TEI); (2) a subdirectory containing the TEI DTD files; and (3) a subdirectory containing the related XSL stylesheets, which generate new HTML files from the original XML file.

Why write in XML?

As a greater volume of serious research and teaching materials is published on the web (whether directly or as an afterthought), the question of how to make these materials available in the most systematic manner, so that they can readily be converted to other formats and used in other ways, becomes increasingly important.

Of course, all recent word processors can save as HTML, and the HTML editors built into Web browsers such as Mozilla are gradually approaching the functionality of a word processor. So if one's purpose in HTML publishing is simply to convert an old article or two into HTML format, or to make a simple page for a small web site, the word-processor "save-as-HTML" solution can often be good enough. But if one wants to produce web materials on a steady basis, and wants these materials to remain available for ready updating or for conversion into other formats[1] with full control over the formatting, one will quickly find that simple HTML publication brings problems. Also, when it comes to encoding the information contained in the work, HTML alone cannot help, since the only thing it can encode is appearance; it cannot encode meaning. XML, when used in conjunction with XSL style sheets, provides a powerful way to merge content-based markup with the style of one's choice. And since all XML code is written in human-readable format, it is unusually accessible to authors of articles, books, and so forth who have no special training in computer programming.

XML, used with XSL(T) stylesheets, provides the basic underlying technology for full-featured web publication. But XML only provides a set of general rules for the creation of markup tags. To publish literary manuscripts, research articles and the like on the web in a systematic manner, some standard set of guidelines needs to be in place, and it is toward the establishment of such guidelines that the Text Encoding Initiative (TEI) has been working. Its aim is to provide a standard for the smooth interchange of digitized scholarly research information, and thus to overcome the impediments and information loss incurred by proprietary software systems.
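To make the contrast between appearance and meaning concrete, here is a small sketch of the same sentence marked up twice. The sentence and the choice of elements are my own illustration and are not taken from the sample article; <title> and <foreign> are TEI element names, but how a given project uses them is a matter for its own encoding practice.

```xml
<!-- Fragment 1, presentational HTML: both phrases are merely "italic",
     so nothing records that one is a work title and the other a foreign term. -->
<p>The <i>Odyssey</i> is the story of a <i>nostos</i>.</p>

<!-- Fragment 2, content-based markup with TEI element names: the tags say
     what each phrase is. Whether a title or a foreign word prints in italics
     is decided later, in the XSL stylesheet. -->
<p>The <title>Odyssey</title> is the story of a <foreign>nostos</foreign>.</p>
```

With the second form, a stylesheet could italicize both phrases for the web, while another could collect every <title> into a list of works cited, which is not possible once the distinction has been flattened into <i>.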

A growing number of scholars are using TEI's application of XML to digitize previously written works,[2] and to write new works in XML from the beginning. Doing so leaves full freedom in the choice of future publication format, especially if there is any chance that a version of the text will eventually make it onto the web. And for those who manage full-fledged web-publication projects, this form of publication has no equal in terms of broad functionality and ease of use.[3]

Since there is ample material available both on the web and in book format explaining the structure and function of XML and XSL, we do not want to waste time in attempting to duplicate that here. Also, it has been my experience that most of the real learning of an application such as XML comes with being involved with the material in a "hands-on" manner. This is what we have attempted to offer here. If you have just a few basic tools on your computer, along with a good book or two, you can begin to write your own works in XML almost immediately. You will just need a few things to get started.

Preparation

You will need:

Books are not necessary just to get started, but if you want to experiment seriously with these materials, you will probably want to get a good XML/XSL(T) book or two, for example The XML Bible by Elliotte Rusty Harold and Beginning XSLT by Jeni Tennison.

Download and Usage

Now, getting down to business:

  1. Download this ZIP file and unzip it into a directory. It should create a subdirectory named xml-tei-tut.
  2. Open the file ogahae-tgu2003.xml with your XML editor of choice.
  3. Find the "XSL Transform" function of your XML program, and try generating an HTML file. Depending on the function of your XML editor and how you set it up, it may output the HTML code in another window for you to save with a new filename. Or it might fire up your system's web browser automatically to display it for you. If you are using oXygen, you will get a pop-up dialog asking you to input the filename and path of your XSLT file, which you can find listed near the top of the document.
  4. Try doing some input/editing, and see how XML makes you adhere to its structural rules. Then try outputting as HTML again. If you have damaged any of the tag structure, you will not get output, but instead an error report.
  5. The type of output done thus far is for web publication. Let's say you want to submit the article to a paper journal instead (as I will eventually be doing with this piece). Go to the top of the document and find the link to the XSLT style sheet, currently set as "xsl/art-trans-web.xslt", and change this to "xsl/art-trans-paper.xslt". Now try your XSLT transformation again and observe the differences. If you want to make your own changes in format, you can go to the /xsl/ subdirectory, open up the files ending with *.xslt, and experiment. If you mess something up, just unpack the ZIP file again and start over.
  6. Note that the endnotes and table of contents are automatically numbered, placed, and linked. This is done using standard XSL functions, and not Java programming. For example, any note that you bracket with <note place="end">A note.</note> will automatically be numbered and moved to the end upon XSL transformation.
  7. Questions? If they have to do with TEI, please do take a look around the TEI site first. There is an awful lot of explanatory material there. If you can't find what you need, write to me.
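For readers curious about how steps 5 and 6 look in practice, here is a minimal sketch. It is not the code from the stylesheets in the package; the tiny input document, the file name notes-sketch.xslt, and the id values "ref" and "note" are all hypothetical, and the input is not a complete TEI document (it has no header). I am also assuming that the stylesheet link takes the form of a standard xml-stylesheet processing instruction, which may differ from what your editor expects.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed form of the stylesheet link; change href to switch stylesheets. -->
<?xml-stylesheet type="text/xsl" href="notes-sketch.xslt"?>
<text>
  <body>
    <p>First claim.<note place="end">A note.</note></p>
    <p>Second claim.<note place="end">Another note.</note></p>
  </body>
</text>
```

A stylesheet along the following lines, written in plain XSLT 1.0, would number the notes and move them to the end:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Page skeleton: the body text first, then every end note in one place. -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates select="//body"/>
        <h2>Notes</h2>
        <xsl:apply-templates select="//note[@place='end']" mode="endnotes"/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="body">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- In the running text, a note is replaced by a numbered link. -->
  <xsl:template match="note[@place='end']">
    <xsl:variable name="n">
      <xsl:number level="any" count="note[@place='end']"/>
    </xsl:variable>
    <a id="ref{$n}" href="#note{$n}"><sup><xsl:value-of select="$n"/></sup></a>
  </xsl:template>

  <!-- At the end, the note text is written out under the same number,
       with a link back to its place in the text. -->
  <xsl:template match="note[@place='end']" mode="endnotes">
    <xsl:variable name="n">
      <xsl:number level="any" count="note[@place='end']"/>
    </xsl:variable>
    <p id="note{$n}">
      <xsl:value-of select="$n"/>. <xsl:apply-templates/>
      <a href="#ref{$n}">[back]</a>
    </p>
  </xsl:template>
</xsl:stylesheet>
```

The numbering comes entirely from xsl:number with level="any", which counts the matching notes in document order, so inserting or deleting a note renumbers everything on the next transformation. Note that the unprefixed match patterns assume the source elements are in no namespace.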

Acknowledgments

The most critical aspects of the XSLT formatting, including the integration of the style sheets and the handling of footnotes and table of contents, were either directly written by, or derived from examples provided by, Michael Beddow. Other sophisticated functions were applied through examples received from Jeni Tennison and Wendell Piez. However, since none of these people were directly involved in the production of the final style sheets, they should not be considered responsible for any poorly-conceived code to be seen therein.

Charles Muller


Notes

1. Such as other arrangements of the HTML; conversion to e-book, PDF, or MS-Word; assimilation into databases; or incorporation into larger projects.

2. Every article and translation that I have done in the past four years has been written this way. One might, for example, download the XML files available for some of my online translations or digitized versions of previous print publications, place these in the same folder, set the links to the DTD and XSL files, and validate/output those files in the same way.

3. There are other formats, such as TeX, that are good for web publication, but TeX does not encode semantic information, and is not as readily learnable by non-programmers as XML.

4. Actually, most of the work on this project was done with Emacs, but I don't recommend it for beginners.