TEI-XML-XSLT Article-Bibliography Sample Package


Table of Contents

1. Why write in XML? (a.k.a. “Born Digital”)
2. Preparation
3. This package consists of:
4. Download and Usage
5. Acknowledgments

1. Why write in XML? (a.k.a. “Born Digital”)

[Note: This is a much-belated update (2013.10.7) of the resources that I originally posted in 2003, written for TEI-P4. That original page is available here.]

As a greater volume of serious research and teaching material is published on the web, whether directly or as an afterthought, it becomes increasingly important to ask how to make these materials available in the most systematic manner, so that they can readily be converted to other formats and used in other ways.

Of course, all recent word processors can save a document as HTML. So if one's purpose in web publishing is simply to convert an old document or two into HTML format, or just to make a simple page for a small web site, the word-processor “save-as-HTML” solution can often be good enough. But if one wants to produce systematically formatted web materials on a steady basis, and wants these materials to remain available for ready updating or conversion into other formats,[1] one will quickly find that simple HTML publication is inadequate. This is because HTML used in this way essentially encodes only appearance; it doesn't encode content. XML, when used in conjunction with XSLT style sheets, provides a powerful way to merge content-based markup with the style of one's choice. And since XML code is written in human-readable format, it is unusually accessible to authors of articles, books, and so forth who possess no special training in computer programming.
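
To make the distinction concrete, here is a minimal sketch of content-based markup. It is not taken from the sample article: the element names are standard TEI, but the sentence and the journal name are invented placeholders. The markup records that one phrase is a foreign-language term and the other a journal title, so a stylesheet can later treat them differently.

```xml
<!-- Illustrative fragment only; the sentence is a placeholder. -->
<p xmlns="http://www.tei-c.org/ns/1.0">
  The motto <foreign xml:lang="la">ad fontes</foreign> is discussed in
  <title level="j">Sample Journal of Examples</title>.
  <!-- Presentational HTML would mark both phrases the same way:
       <i>ad fontes</i> ... <i>Sample Journal of Examples</i> -->
</p>
```

Because the two phrases carry different tags, one could later decide to italicize foreign terms but underline journal titles, or to index every journal title, without touching the text itself.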

XML, used with XSLT stylesheets, provides the basic underlying technology to do fully-empowered web publication. But XML only provides a set of general rules for the creation of markup tags. To publish literary manuscripts, research articles and the like on the web in a systematic manner, it is helpful to have some sort of standardized approach in the academic realm. It is toward the establishment of these sorts of guidelines that the Text Encoding Initiative (TEI) has been working. Their aim is to provide a standard for the interchange of digitized scholarly research information, which has the effect, among other things, of overcoming the impediments and information loss incurred by proprietary software systems.
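
As a rough illustration of how an XSLT stylesheet attaches presentation to TEI content markup, consider the sketch below. It is not one of the stylesheets in this package, and the styling choices (italic journal titles, quoted article titles) are assumptions made only for the example. It is written in XSLT 1.0 and assumes the source document uses the TEI P5 namespace.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: not one of the stylesheets shipped in this package. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Suppress the header so that only the text itself is output. -->
  <xsl:template match="tei:teiHeader"/>

  <!-- Each TEI paragraph becomes an HTML paragraph. -->
  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Journal titles (level="j") are italicized. -->
  <xsl:template match="tei:title[@level='j']">
    <i><xsl:apply-templates/></i>
  </xsl:template>

  <!-- Article titles (level="a") are placed in quotation marks instead. -->
  <xsl:template match="tei:title[@level='a']">
    <xsl:text>&#8220;</xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&#8221;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Changing the house style then means editing one template rather than every title in every document.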

There is a growing number of scholars who are using TEI's application of XML to digitize previously written works and to write new works from the beginning,[2] a practice that allows full freedom in the choice of future publication format, especially if there is any chance that a version of the text will eventually make it onto the web. And for those who manage full-fledged web-publication projects, this form of publication has no equal in terms of broad functionality and ease of use.[3]

There is ample introductory material available both on the web and in book format explaining the structure and function of XML and XSLT, so I need not attempt to duplicate that here.[4] It has been my experience that most of the real learning of an application such as XML comes from working with the material in a “hands-on” manner. This is what I have attempted to offer here. If you have just a few basic tools on your computer, along with a good book or a user-friendly web site, you can begin to write and publish your own works in XML almost immediately.


2. Preparation

You will need:


3. This package consists of:

  1. A recent article of mine written in TEI-XML (P5). The bibliography for this article is composed using the TEI <biblStruct> framework.
  2. A set of XSLT files, including transformations for global styles, bibliographic styles, endnote-creation styles, and hyperlinking styles. These are subdivided into two general kinds of output: (1) web publication as HTML and (2) output in MS-Word (but there are some files that are shared between these two systems).
  3. Two MS-Word VBA macros that make the final conversions in Word, generating the table of contents and either footnotes or endnotes.
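
For readers who have not seen the <biblStruct> framework mentioned in item 1, here is a minimal sketch of a single entry. It is not copied from the sample article's bibliography: the author, titles, and numbers are invented placeholders, and the exact set of child elements used in the package may differ.

```xml
<!-- Placeholder entry: author, titles, and numbers are invented. -->
<biblStruct xmlns="http://www.tei-c.org/ns/1.0" xml:id="sample2013">
  <!-- analytic: the item itself (here, a journal article) -->
  <analytic>
    <author>
      <persName><surname>Sample</surname><forename>Alex</forename></persName>
    </author>
    <title level="a">An Example Article Title</title>
  </analytic>
  <!-- monogr: the publication that contains the item -->
  <monogr>
    <title level="j">Sample Journal of Examples</title>
    <imprint>
      <date when="2013">2013</date>
      <biblScope unit="volume">12</biblScope>
      <biblScope unit="page">1-20</biblScope>
    </imprint>
  </monogr>
</biblStruct>
```

Because each part of the reference has its own tag, a bibliographic stylesheet can reorder and punctuate the parts to suit whichever citation style is required.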

Caveats: I am not a professional-level XSLT programmer, nor am I an “expert” in the usage of TEI. Therefore these sheets do not output high-level types of formatting. I would not even begin to imagine that the way I have written my XML and XSLT is the best way. On the other hand, since my programming skills are relatively low, the beginner can be assured that there is nothing here that is terribly arcane. Thus, hopefully, newcomers can look at the way I have done things as a kind of beginner's exercise. After a brief amount of time of working with these materials, the newcomer can, and should, improve upon and correct this code.


4. Download and Usage

Please follow the steps below:

  1. Download this ZIP file and unzip it into a directory. It should create a subdirectory named xml-xslt-teip5.
  2. Open the file 2013-06-10-views-and-faith.xml with your XML editor.
  3. To output the file as an HTML page, use the currently designated XSLT processing instruction, <?xml-stylesheet type="text/xsl" href="xslt/art-trans-web.xsl"?>. If you want to generate an HTML file intended for final publication in MS-Word, change the filename to “art-trans-paper.xsl”.
  4. Click on the “Apply Transformation Scenario” button (the red right-arrow in the middle of the above menu), and answer “Yes” to the apply-transformation-scenario query. On most systems, once the file has been generated, your system browser will open with the output file.
  5. Note that the endnotes and table of contents are automatically numbered, placed, and linked. For example, any note that you bracket with <note place="end">A note.</note> will automatically be numbered and moved to the end of the document. Try doing some input/editing of the sample file, and see how XML makes you adhere to its structural rules. Then try outputting as HTML again. If you have damaged any of the tag structure, you will not get output, but instead an error report.
  6. The type of output done thus far is for web publication. But let's say you want to submit the article to a paper journal instead. Go to the top of the document and find the link to the XSLT style sheet, currently set as “xsl/art-trans-web.xslt”, and change this to “xsl/art-trans-paper.xslt”. You will now generate an HTML file that needs to be processed one more time in MS-Word. To do this, open the file in Word and run one of two VBA macros. To have access to these, please copy the Word template file html-xml-indexing.dotm into either your Word templates folder or the Startup folder (in the former case, you will need to manually register it each time; in the latter case it will always be loaded automatically). You can then run either the macro xmlPublishasWordEndnotes or xmlPublishasWordFootnotes. You can do this directly through Macros --> Run, but when you load the template, there should also be a new dropdown menu under the Add-ins tab that lists these macros.
  7. If you are working with a full-length book monograph rather than a short article, please change the <extent> value in the header from analytic to monogr.
  8. If you find any blatant errors based on missing dependencies that break the XSLT or the VBA, please let me know. If you have questions about the TEI, please do take a look around the TEI site first. There is a vast amount of explanatory material there. If you can't find what you need, write to me.
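
To give a sense of how the automatic endnote handling described in step 5 can work, here is a minimal XSLT 1.0 sketch. It is not the code used in the package's stylesheets; the template structure, the class names, and the id scheme (ref1, note1, and so on) are assumptions made for illustration. It numbers every <note place="end"> in document order, leaves a linked number in the running text, and gathers the notes themselves after the body.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: not the package's own endnote code. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tei="http://www.tei-c.org/ns/1.0"
    exclude-result-prefixes="tei">

  <xsl:output method="html" encoding="UTF-8"/>

  <!-- Suppress the header so that only the text itself is output. -->
  <xsl:template match="tei:teiHeader"/>

  <!-- Render the body, then collect every endnote after it. -->
  <xsl:template match="tei:body">
    <div class="body">
      <xsl:apply-templates/>
    </div>
    <div class="notes">
      <xsl:apply-templates select=".//tei:note[@place='end']" mode="endnote"/>
    </div>
  </xsl:template>

  <xsl:template match="tei:p">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- In the running text, replace the note with a linked number. -->
  <xsl:template match="tei:note[@place='end']">
    <xsl:variable name="n">
      <xsl:number level="any" count="tei:note[@place='end']"/>
    </xsl:variable>
    <a id="ref{$n}" href="#note{$n}"><sup><xsl:value-of select="$n"/></sup></a>
  </xsl:template>

  <!-- At the end, print the number, the note text, and a link back. -->
  <xsl:template match="tei:note[@place='end']" mode="endnote">
    <xsl:variable name="n">
      <xsl:number level="any" count="tei:note[@place='end']"/>
    </xsl:variable>
    <p id="note{$n}">
      <a href="#ref{$n}"><xsl:value-of select="$n"/></a>
      <xsl:text>. </xsl:text>
      <xsl:apply-templates/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The same note is matched twice, once in the default mode and once in the "endnote" mode, which is what lets a single piece of markup appear both as a reference number and as a full note. Since the numbering is computed at transformation time, inserting or deleting a note in the source renumbers everything on the next run.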

5. Acknowledgments

The most critical programmatic aspects of the XSLT formatting, including the integration of the style sheets and the handling of footnotes and table of contents, were either directly written by, or derived from examples provided by, Michael Beddow. Other sophisticated functions were applied through examples received from Jeni Tennison and Wendell Piez. Much of the development of the <biblStruct> area has occurred in collaboration with the students in my XML courses at the University of Tokyo. However, since none of these people were directly involved in the production of the final style sheets, they should not be considered responsible for any poorly conceived code to be seen therein.

Charles Muller [acmuller@l.u-tokyo.ac.jp]


Notes

1. Such as: other arrangements of the HTML; conversion to e-book, PDF, or MS-Word; assimilation into databases; or incorporation into larger projects.

2. I am a rather extreme example, as virtually every article, book manuscript, and digitization project I have undertaken since 1998 has been done using TEI-XML. My online dictionaries are also developed in a form of XML that is strongly influenced by the TEI Dictionary Module.

3. There are other formats, such as TeX, that are good for web publication, but TeX does not encode content information, and is not as readily learnable by non-programmers as XML.

4. For learning basic XSLT, I recommend the books by Jeni Tennison. For advanced XSLT programming, see the works by Michael Kay.

5. For my own XML work, I still often use the free text editor Emacs, which comes automatically set up with nXML mode, and which I plug into a TEI schema. I use it mainly because this is how I started out, and so I am used to it. It is light and fast, and one can take advantage of recording keyboard macros. The problem is that it uses idiosyncratic keystrokes and menus that make it extremely difficult for beginners, and veteran Emacs users are often not that friendly toward newbies.