TEI-XML-XSLT Article-Bibliography Sample Package
Table of Contents
1. Why write in XML? (a.k.a. “Born Digital”)
2. Preparation
3. This package consists of:
4. Download and Usage
5. Acknowledgments
1. Why write in XML? (a.k.a. “Born Digital”)
[Note: This is a much-belated update (2013.10.7) of the resources that I originally posted in 2003, which were written for TEI-P4. That original page is still available.]
As a growing volume of serious research and teaching material is published on the web, whether directly or as an afterthought, it becomes increasingly important to ask how these materials can be made available in the most systematic manner, so that they can readily be converted to other formats and used in other ways.
Of course, virtually all modern word processors can save documents as HTML. So if one's purpose in web publishing is simply to convert an old document or two into HTML, or just to make a simple page for a small web site, the word processor's “save as HTML” function is often good enough. But if one wants to produce systematically formatted web materials on a steady basis, and wants these materials to remain easy to update or to convert into other formats,[1] one will quickly find that simple HTML publication is inadequate. This is because such HTML encodes only appearance; it does not encode content. XML, used in conjunction with XSLT style sheets, provides a flexible and powerful way to combine content-based markup with the presentation style of one's choice. And since XML is written in a human-readable format, it is unusually accessible to authors of articles, books, and so forth who have no special training in computer programming.
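To make the distinction between appearance and content concrete, here is a minimal sketch in Python. It is not part of this package: the package does this job with XSLT style sheets, and since Python's standard library has no XSLT processor, the sketch uses xml.etree.ElementTree to play the role of a style sheet. The sample sentence, the STYLE mapping, and the functions to_html and cited_titles are all hypothetical. The TEI-style source marks one phrase as a <title> and another as a <foreign> term; the HTML output renders both as <i>, so the difference survives only in the source.

```python
import xml.etree.ElementTree as ET
from html import escape

# ElementTree writes namespaced tags as "{namespace-uri}localname".
TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical sample: the markup records WHAT each phrase is, not how it looks.
SAMPLE = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">'
    "The article cites <title>A Hypothetical Book</title> "
    "and glosses the term <foreign>dharma</foreign>."
    "</p>"
)

# Presentation choices live in one place (the job an XSLT template does).
# Both kinds of phrase are shown in italics, so in HTML they look identical.
STYLE = {TEI + "title": "i", TEI + "foreign": "i"}


def to_html(elem: ET.Element) -> str:
    """Render the contents of a TEI element as an HTML fragment."""
    parts = [escape(elem.text or "")]
    for child in elem:
        tag = STYLE.get(child.tag, "span")  # unknown elements become <span>
        parts.append(f"<{tag}>{to_html(child)}</{tag}>")
        parts.append(escape(child.tail or ""))  # text following the child
    return "".join(parts)


def cited_titles(elem: ET.Element) -> list[str]:
    """Collect every cited title: possible only because content was encoded."""
    return ["".join(t.itertext()) for t in elem.iter(TEI + "title")]


root = ET.fromstring(SAMPLE)
print(f"<p>{to_html(root)}</p>")
print(cited_titles(root))
```

Running it prints the HTML paragraph and then ['A Hypothetical Book']. Once both phrases have become <i> in the HTML, no program can reliably tell the book title from the foreign word, whereas from the XML source the titles are one query away. Changing the presentation (say, rendering titles with <cite>) means editing STYLE only, not the text.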
XML, used with XSLT style sheets, provides the basic underlying technology for full-featured web publication. But XML itself only provides a set of general rules for creating markup tags. To publish literary manuscripts, research articles, and the like on the web in a systematic manner, it helps to have a standardized approach shared across the academic realm. The Text Encoding Initiative (TEI) has been working toward establishing exactly these sorts of guidelines. Its aim is to provide a standard for the interchange of digitized scholarly research information, which, among other things, overcomes the impediments and information loss imposed by proprietary software systems.
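What a shared standard buys in practice can be sketched the same way. The Python below is again a hypothetical illustration using the standard library's ElementTree, not this package's XSLT. It parses a minimal TEI P5 document: a teiHeader holding the required file description, and a text whose back matter contains one <biblStruct> bibliography entry. Because TEI fixes the element names, their nesting, and their namespace, the same short paths locate the title and the bibliography in any TEI file that follows this structure. The bibliographic data, the citation style, and the helpers text_of and format_entry are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Prefix map so find()/iterfind() paths can use "tei:" for the TEI namespace.
NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical minimal TEI P5 document: header metadata plus a text whose
# back matter holds one structured (monograph-level) bibliography entry.
DOC = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>Sample Article</title></titleStmt>
      <publicationStmt><p>Unpublished draft</p></publicationStmt>
      <sourceDesc><p>Born digital</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Body text goes here.</p></body>
    <back>
      <listBibl>
        <biblStruct>
          <monogr>
            <author>Doe, Jane</author>
            <title level="m">A Hypothetical Monograph</title>
            <imprint>
              <pubPlace>Example City</pubPlace>
              <publisher>Example Press</publisher>
              <date>2013</date>
            </imprint>
          </monogr>
        </biblStruct>
      </listBibl>
    </back>
  </text>
</TEI>"""


def text_of(parent: ET.Element, path: str) -> str:
    """Return the text at a namespaced path, or '' if the element is absent."""
    node = parent.find(path, NS)
    return "".join(node.itertext()).strip() if node is not None else ""


def format_entry(bibl: ET.Element) -> str:
    """Format a monograph <biblStruct> as 'Author. Title. Place: Publisher, Date.'"""
    m = "tei:monogr/"
    return "{}. {}. {}: {}, {}.".format(
        text_of(bibl, m + "tei:author"),
        text_of(bibl, m + "tei:title"),
        text_of(bibl, m + "tei:imprint/tei:pubPlace"),
        text_of(bibl, m + "tei:imprint/tei:publisher"),
        text_of(bibl, m + "tei:imprint/tei:date"),
    )


root = ET.fromstring(DOC)
# TEI puts the document title at teiHeader/fileDesc/titleStmt/title.
print(text_of(root, "tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title"))
for bibl in root.iterfind(".//tei:biblStruct", NS):
    print(format_entry(bibl))
```

For this sample it prints "Sample Article" followed by "Doe, Jane. A Hypothetical Monograph. Example City: Example Press, 2013." A real style sheet would also need to handle analytic-level entries (articles in journals or collections), multiple authors, and missing fields; format_entry covers only the single-author monograph case.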
A growing number of scholars are using TEI's application of XML both to digitize previously written works and to write new works from the beginning,[2] a practice that leaves the choice of future publication format completely open, especially if there is any chance that a version of the text will eventually make it onto the web. And for those who manage full-fledged web-publication projects, this form of publication has, in my view, no equal in terms of broad functionality and ease of use.[3]
There is ample introductory material available, both on the web and in book form, explaining the structure and function of XML and XSLT, so I need not duplicate it here.[4] In my experience, most of the real learning of a technology such as XML comes from working with the material in a “hands-on” manner, and that is what I have tried to offer here. If you have a few basic tools on your computer, along with a good book or a user-friendly web site, you can begin to write and publish your own works in XML almost immediately.
2. Preparation
You will need:
Caveats: I am not a professional-level XSLT programmer, nor am I an “expert” in the use of TEI. These style sheets therefore do not produce sophisticated formatting, and I do not imagine for a moment that the way I have written my XML and XSLT is the best way. On the other hand, since my programming skills are modest, the beginner can be assured that there is nothing terribly arcane here. Newcomers can, I hope, treat the way I have done things as a kind of beginner's exercise. After working with these materials for a short time, the newcomer can, and should, improve upon and correct this code.
Please follow the steps below:
5. Acknowledgments
The most critical programmatic aspects of the XSLT formatting, including the integration of the style sheets and the handling of footnotes and the table of contents, were either written directly by Michael Beddow or derived from examples he provided. Other sophisticated functions were implemented following examples received from Jeni Tennison and Wendell Piez. Much of the development of the <biblStruct> section took place in collaboration with the students in my XML courses at the University of Tokyo. However, since none of these people were directly involved in producing the final style sheets, they should not be held responsible for any poorly conceived code to be found therein.
Charles Muller [acmuller@l.u-tokyo.ac.jp]
Notes
1. For example: other arrangements of the HTML; conversion to e-book, PDF, or MS-Word format; assimilation into databases; or incorporation into larger projects.
2. I am a rather extreme example, as virtually every article, book manuscript, and digitization project I have undertaken since 1998 has been done using TEI-XML. My online dictionaries are also developed in a form of XML that is strongly influenced by the TEI Dictionary Module.
3. There are other formats, such as TeX, that are good for web publication, but TeX does not encode content information, and it is not as readily learnable by non-programmers as XML.
4. For learning basic XSLT, I recommend the books by Jeni Tennison. For advanced XSLT programming, see the works of Michael Kay.
5. For my own XML work, I still often use the free text editor Emacs, which comes with NXML mode built in. I plug this into a TEI schema. I use it mainly because this is how I started out, and so I am used to it. It is light and fast, and one can take advantage of its keyboard-macro recording. The problem is that it uses idiosyncratic keystrokes and menus that make it extremely difficult for beginners, and veteran Emacs users are often not especially friendly toward newbies.