Digitization and the Revolution in the Media of Buddhist and Asian Studies: Where We Have Come, and Where We Are Going

 

 

Charles Muller

Toyo Gakuen University

9/19/1999

Revised: 10/6/1999


Table of Contents


Organizing the Information
E-Journals
Databases
The Digital Buddhist Canons
Birth of the EBTI
Examples of Application
Not Only Texts
Tasks at Hand
The Future of Academic Publishing: Some Predictions
Libraries
Acknowledgments


 

 

In view of the continual media hype surrounding the advent of the "digital age," it might already seem superfluous to point out that research methods for scholars in the humanities and social sciences are being radically transformed by the ongoing shift of major segments of the scholarly medium from paper to digital form. For those with a greater awareness of the possibilities of the new medium, however, the view is that we have barely begun to scratch the surface in terms of the overall consequences of this revolution. The main reason for the lack of general awareness of the magnitude of these changes is that the popular media rarely report with any depth on the ramifications of computer-based research outside the IT industry or the natural sciences. Nor is it yet customary for major academic journals to give extensive attention to matters of research technique in the digital domain, since such journals tend to focus on actual research in their particular field rather than on questions of the application of new technologies.

Since the late 1980s (at least in North America and Europe), the vast majority of scholars have come to use computers for the composition of their articles. During the same period, most academics have become accustomed to using e-mail, through which they do the bulk of their professional communication and collaboration. It was not until the full advent of HTML browsing on the World Wide Web (starting in about 1994), however, that data exchange became easy enough that useful humanities-related data could be obtained from sites and persons with whom one had no previous contact, and that such data could be implemented on one's local system by persons with relatively little computing expertise. In the past few years, this situation has developed rapidly, as access to the World Wide Web becomes easier and faster, and begins to offer access to large amounts of high-quality research materials useful to scholars of the humanities, particularly in the areas of Asian Studies and Buddhist Studies. The Web has become, in a short time, a vitally important medium for the exchange of scholarly information.

  

Organizing the Information

There are, of course, numerous new problems that have arisen, and that will continue to arise, as the proliferation of information has occurred suddenly, without any pre-established means for its organization and evaluation. In the world of print media, norms have long been established in the scholarly community for the organized, value-appraised dissemination of knowledge through academic book and journal publication. With the development of the WWW, however, it has become possible for anyone with the ability to create an HTML document and access to server space to publish his or her materials. This material, in turn, is turned up indiscriminately by web search engines that have no real evaluative capacity. This sudden flood of material of little relevance, or of poor quality, has frustrated many of those who attempt to use the Internet to seek information.

Some attempts have been made to deal with this proliferation of information, one example being the World Wide Web Virtual Library (WWWVL), which attempts to gather information on useful sites in various categories. But the present system is flawed in several respects: many sites are run by persons with no expertise in the areas they are supposed to cover, and the links in the library carry no annotation, so we must inevitably check for ourselves to see whether a particular site is worthwhile. And, like any other page of links, these collections often end up poorly maintained, so that the usefulness of link libraries quickly disappears.

When we try to navigate through a link system such as the WWWVL, or other related resource lists, to find information relevant to our interests, we still find a vast range in the type and quality of materials. Some pages with grandiose titles turn out to be nothing more than simple announcements of events. Many others are announcements of data that is promised to come but very well may not. It is all too common to see an individual or group start a web page with grand intentions but then fail to follow through, leaving the page to languish on the Internet, continually re-found by web crawlers.

The wide range in the quality of resources, and the ephemeral nature of web links, are therefore serious problems, which will no doubt continue to obstruct the Web's viability as a research medium for some time. This problem should eventually see some rectification through the development and usage of XML/TEI (explained below), which will begin to provide a way to classify material so that it may be located by more sophisticated search engines.

 

E-Journals

E-journals, despite their short history, have already become predominant in the fields of medicine and the hard sciences. University libraries have found them in many ways much easier to deal with than print journals, especially in view of the high cost usually required for institutional subscriptions, as well as the handling and maintenance involved. Stanford University was one of the first to establish a comprehensive digitization and hyperlinking program for science journals, wherein all references to other related articles are automatically hyperlinked. This kind of rapid access has become invaluable to researchers in such fast-changing fields as medicine.

The Stanford project, however, wherein print journals are converted to digital format and enhanced with hyperlinks, touches on only one part of the E-journal domain. Another important dimension of this new phenomenon is the case where a journal for a certain area of scholarly interest does not yet exist (due to the various difficulties in starting up a new journal for a narrow field of interest), and a group of scholars decides to put together a journal that is exclusively web-based. Here the humanities have special relevance, because it is often in this realm that the establishment of a new paper journal is cost-prohibitive and a major gamble--the landscape is littered with startup journals that were simply not able to garner the various bases of support necessary to survive.

A model case of an eminently successful E-journal is the Journal of Buddhist Ethics (JBE), started by Charles Prebish and Damien Keown in 1994. Interested for some time in publishing a journal in their area of specialty, but perceiving the difficulties of setting up something for a sub-field as specific as Buddhist Ethics, Drs. Keown and Prebish investigated the possibilities of publishing online, without a print version. Five years later, their team hosts a widely read and highly regarded scholarly publication, refereed by an editorial staff that boasts some of the major names in the field. As reported in Religious Studies News,[1] Dr. Prebish and the editors of a half-dozen other journals in the field of religious studies have recently established a consortium called the Association of Peer-Reviewed Electronic Journals in Religion, aimed at the development of standards for scholarly E-journals, to ensure their parity in standing with print journals. One of the basic parameters recommended by the group is the registration of an ISSN, along with a clearly defined referee system, the purpose being the establishment of publication requirements as stringent as those of print journals. It is clear that with the establishment of these sorts of journals, a greater number of good-quality articles will enter the realm of scholarly access.

 

Databases

Another major use of the Internet by humanities scholars is as a means for the acquisition of online digitized textual collections and information databases--material that has heretofore been exclusively contained in books. When the Internet first came into wide use, the amount of available useful humanities research data was negligible. But as time passes, we are witnessing the rapid development of online editions of high-quality, valuable, and even rare data. This new availability is especially welcome to scholars working at small, isolated schools, but it is a significant boon to any scholar who knows how to take good advantage of these resources.

The mere attainment of access to these materials, however, is only a small dimension of the actual change being wrought. While digital availability makes materials more readily accessible, it is in what happens after the materials arrive on one's own desktop that the dramatic differences begin to show themselves. Not only does the digitization and networking of resources allow scholars to do things much faster--it allows them to do things that were heretofore impossible. There are of course many ways in which research access has been enhanced by digitization, but one of the most obvious involves research with large textual corpora, such as the various Asian literary canons, one of the prime examples of which is the Buddhist canon.

 

The Digital Buddhist Canons

Among the canonical bodies of the world religions, the Buddhist canon is distinguished by its vast size, the result of a combination of distinctive attributes of the Buddhist tradition, including its rich philosophical character, its long history, its passage through a variety of cultures (there are Tibetan, Pali, Sanskrit, East Asian [CJKV], and various South Asian vernacular versions and translations of this canon), and its theoretical basis as an "open canon." When doing research with the Buddhist canon, Buddhologists are, like their counterparts in other textual religious traditions, expected to give precise scriptural citations by volume, page, column, and line. But the vast size of the Buddhist canon in itself makes for a quite different situation from that of, say, the Bible--a single volume which many scholars can cite by chapter, line, and verse from memory. Conversely, it is also due to the relatively limited size of the combined Old and New Testaments that biblical scholarship was among the first to be able to utilize computer technology to create electronic research tools. Many of the prerequisites for this kind of development were already fulfilled: the availability of complete, well-established English translations, and a manageable document size, which required a relatively short amount of time and energy for input. Also, the original source languages (Greek, Hebrew) were written in phonetic scripts with a limited number of characters, for which software applications became available soon after English word-processing became popular.

Buddhist canonical collections, however, especially those of the Chinese-Japanese-Korean-Vietnamese (CJKV) Buddhist canon, still do not fulfill any of these basic requirements (except perhaps for the fact that modern memory capacities allow a PC to handle the largest of the single texts). Only a very small percentage of the CJKV canonical collections has been translated into English. Their size is immense--consisting of hundreds of volumes, most of which are several hundred pages in length.[2] The range of characters contained in the CJKV canonical collections is also extensive--more than 30,000, far beyond what is contained in the East Asian local character sets for computers.[3] The extreme magnitude of the Buddhist canon has made philological research a daunting task for scholars, as the mere work of searching for citations can consume the largest portion of a scholar's energy in the completion of a research project. These citations sometimes cannot be found at all, especially when the citation varies to some degree from the wording contained in the actual source.

It is precisely these characteristics that make the Buddhist canon the perfect object for digitization, as it is exactly in these kinds of mechanical tasks that a computer can so drastically outdistance the human senses. It is for the same reason that, beginning in the early part of this decade, a number of Buddhist clerical and academic centers began to make efforts at inputting the Buddhist canon. In the case of Pali and other South Asian phonetic scripts (which have a limited number of characters), although the task was intimidating, the problems involved in input have mainly been those of simple volume--and therefore, labor. Once the input was done, the technical sophistication with which the material ended up being presented was also a major consideration, but it was nonetheless mainly a matter of applying already existing technology.

The Chinese canon, on the other hand, presented technical difficulties of large proportions in the mere matter of input, due to the large number of characters that needed to be represented outside the local character sets. This difficulty was further compounded by the fact that the encoding systems in use in the various local systems of the East Asian countries were all incompatible with each other. That is, even though the Chinese, Taiwanese, Koreans, Japanese, and Vietnamese all share the use of Chinese (Han) characters, any single Chinese character contained in their respective software applications was encoded differently in each country. This problem is only recently being resolved (to a limited extent) through the implementation of the Unicode character set, in which the characters contained in most of the world's scripts are each assigned a single code number, now shared by commercial software around the world.

But since the present Unicode set contains only approximately 21,000 code points for Han characters, the missing-character problem is still almost as much a thorn in the side of input project managers as it was ten years ago, when the earliest attempts at input were being made.[4] Among the earliest pioneers in CJK textual input to devise solutions to this problem were Urs App and Christian Wittern of the International Research Institute for Zen Studies (IRIZ) at Hanazono University, who relied on the principles of Standard Generalized Markup Language (SGML), placing reference entities in the text in place of missing characters; these referred browser systems to a small graphic image of the character. Done in a systematic manner, this provided a standardized piece of information at the point in the text where the character should be, and allowed for a display of the character in a browser environment.[5] It was nonetheless not a perfect solution, since standard word-processor-type searches would not be able to find the indicated character.
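The basic mechanism can be sketched roughly as follows. This is only an illustration of the principle in XML-style syntax, not the actual IRIZ encoding; the entity name and image file are hypothetical:

    <!-- A character absent from the local character set is represented
         by a named entity; the declaration maps it to a small image that
         a browser system can display in its place. -->
    <!DOCTYPE passage [
      <!ENTITY gaiji-1234 '<img src="gaiji/1234.gif" alt="[character 1234 not in local set]"/>'>
    ]>
    <passage>[...text of the sutra...] &gaiji-1234; [...text continues...]</passage>

The entity reference marks the exact spot of the missing character in a standardized way, but, as noted above, an ordinary text search for the character itself will pass right over it.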

Birth of the EBTI

During the early-to-mid nineties, a large number of CJK input projects began to spring up: at major academic centers such as Academia Sinica, UC Berkeley, the above-mentioned IRIZ project at Hanazono, and the Pali Canon project at Mahidol University; at Buddhist monastic centers such as Haein-sa; and in dozens of small, individually run projects. The Internet still being relatively new, it was quite easy for these projects to develop with little or no knowledge of each other, and so the possibility of duplicated work became a real problem. The concern was not only with the double work of physically inputting the same text. It was also a tremendous waste of energy if one group had come up with a technical solution for handling some aspect of the task, and another group doing similar work was unaware of this solution. Fortunately, the need for opening up some kind of forum for communication between these various groups was noted at a fairly early stage by Lewis Lancaster of UC Berkeley, who, with the solution of these sorts of concerns in mind, convened a small informal meeting at Berkeley in 1993, at which the Electronic Buddhist Text Initiative (EBTI) was founded.[6]

The initial purpose of the EBTI was to foster communication between in-progress textual input projects, but it gradually grew in scope to include a wide variety of computer-related Buddhist scholarly endeavors, including the digitization of all sorts of Buddhist cultural information, such as art, music, and various social-scientific resources. It also began to focus on the various forms of presentation, including database-search technique, multimedia technology, and so forth. Branching out into all kinds of research tools, including lexicons and various uses of the World Wide Web, the EBTI also became an advisory body for new start-up projects, whose leaders could learn much from those who had already made significant headway.

The three bywords of the EBTI have been "cooperation," "collaboration," and "standardization." Since the realm of digitized data is not limited by the normal constraints of the paper book, it is possible to develop resources of unlimited magnitude at a fraction of the prior costs. These kinds of very large projects are almost always better accomplished where the workload is shared between individuals, groups, or organizations. A prominent example of this is the ongoing task of digitizing the East Asian Buddhist canon. At present, there are a few major projects dealing with this work: the Tripitaka Koreana group and the Dongguk University group in Korea, the Chinese Buddhist Electronic Text Association (CBETA) in Taiwan, and the IRIZ project and SAT Electronic Taisho group in Japan. Since all these groups send delegates to the EBTI meetings, they are able to share both data and implementation techniques. This sharing will greatly accelerate the preparation of a high-quality digital East Asian Buddhist canon.

In order for data to be shared between various input projects, and then transmitted effectively to scholars and students in every corner of the globe, standardization is also of critical importance. The data-makers will need to present their materials in a way that the average scholar with only a minimum of technical expertise will be able to use. This means that those who develop digital resources need to pay attention to matters of storage and delivery in a non-proprietary format. Problems occur when a well-funded group with a sophisticated technical staff creates its own system (either purposely or unconsciously) that is not accessible to standard software, with the result that the data is not readily sharable. Such projects will not fare well in the long run in terms of worldwide sharing. Therefore, a continual emphasis in the EBTI has been on learning how to take advantage of open data-handling systems, such as SGML, its subset XML, and the Text Encoding Initiative (TEI), which seeks to apply open text-markup standards to the needs of humanities scholarship.

The existence of the Internet in itself brings the possibility of collaboration into a totally new dimension, as those who are doing any sort of data input or analysis can readily share their information by placing it on their own web site, or by submitting it directly to the coordinator of a group project. Thus, the kind of collaboration that would in the past have required the physical gathering of a group by means of a large monetary grant can be coordinated to a large extent by Internet communication and data transfer. My own East Asian lexicographical project is a good case in point, as entries to the online Dictionary of East Asian Buddhist Terms (DEABT) and Dictionary of East Asian Literary Terms (DEALT) are being contributed by e-mail from scholars around the world. The same sort of method is also being used by a number of canonical text input projects and image databases. This kind of collaboration is not easily done if the transfer format is not some kind of shareable common denominator such as XML/TEI (more on these below).

The EBTI has served an important role as an organization of coordination and direction for digital research materials, not only in Buddhist Studies, but also in the larger fields of Asian Studies, Religious Studies, and other culturally oriented disciplines. With similar aims and intentions, an organization with a somewhat broader scope, called Scholars Engaged in Electronic Resources (SEER), has also come into existence. Spreading beyond the confines of Buddhism (but nonetheless largely aimed at scholars in Asian Studies), SEER seeks to provide a forum for communication, collaboration, standardization, and guidance similar in nature to the EBTI.

We now have a state of affairs in which a significant amount of data and tools are already available to the scholar of the humanities who knows how to take advantage of them. Admittedly, at the present moment, the scholar who would make good use of these resources probably needs computer skills that are a bit better than average, but the need for "techie"-level skills will not exist forever.

 

Examples of Application

I will start close to home, by giving some examples of personal experience with digital research tools. At the time that I first began to put together my dissertation proposal (around 1990), I intended to do an annotated translation of a Korean Buddhist text titled the System of the Two Hindrances (Ijangui), in which the Korean master Wŏnhyo (617-686) analyzes in detail the two obstructions to enlightenment (āvaraṇa) taught in the Yogācāra school, drawing on the Chinese versions of approximately fifty different Mahayana works, among them the massive Yogācārabhūmi-śāstra and Avataṃsaka-sūtra. My advisor quickly warned me off this plan, explaining that it was simply not a dissertation-size project, as it might well take me ten years just to locate all of the citations--if I were indeed able to locate them at all! I heeded his advice and chose a more suitable project, but nonetheless maintained my strong interest in this text.

Now, however, several years later, I have returned to the System of the Two Hindrances as a contributor to a large Wonhyo translation project.[7] After preparing a digital version of my text, having obtained works-in-progress of digitized versions of most of the Chinese canon, and equipped with a shareware applet called Search and Replace,[8] I was able to locate 95% of the citations within a few weeks. Refined searching and various other strategies have allowed me to find the rest, and in the case of citations that cannot be found, I can declare with reasonable certainty that they are not contained in the editions of the canon presently in our possession. This means that instead of consuming the largest part of my energies in painstaking detective work through the thick volumes of the Taisho, I can now fully invest my energies in matters of interpretation--which I find to be far more stimulating. This will decrease the overall time to completion of the project by several years.

The study and translation of such a text carried out on my PC, rather than with a paper-print version, also offers a wide variety of other advantages. Since I also have on my computer my own extensive dictionaries of Buddhist and non-Buddhist literary Chinese, along with an array of commercial software dictionaries and a number of other digitized indexes, I am able to look up logographs and compound words with the click of a mouse, and can instantly either find a new meaning, or simply refresh my memory regarding the sense of a term about which I already have a pretty good idea. As any translator knows, it is so often the words that one thinks one knows that cause the most embarrassing mistakes in the final product. Since lookup in digital dictionaries on the computer screen is instantaneous, however, there is no reason not to recheck. When a term is not contained in any of my online dictionaries, I can consult a digitized comprehensive index of Buddhist lexicons[9] that tells me the exact page number of any of the dictionaries in which the term is located. There is no need to search through several dictionaries--just grab the one(s) indicated and open to the right page, and that's it. In the case where a term is not contained in a dictionary, or where a dictionary definition does not seem satisfactory, I can use a multi-file search applet to search through the entire Buddhist canon for examples of all the contexts in which it is used. These are all techniques that were heretofore impossible.

Having textual corpora in digitized format further allows a wide range of computer-assisted research techniques to be applied, far beyond the mere tasks of lookup and translation. For example, we can do textual comparisons that reveal hard facts about the relationships of origin among texts--facts that would be impossible to discern with the human senses alone. Basically, the work of philology--always considered to be a somewhat mechanical form of scholarship--can be largely done by the machines that best handle it.

 

Not Only Texts

There is a wide range of other applications being made of computerized resources in Asian Studies. One excellent example of a whole new genre in the development and sharing of data is the International Dunhuang Project (IDP), administered by Susan Whitfield of the British Library. The IDP has been digitizing (as images) the Dunhuang manuscripts for several years now, compiling them into an extensive image database. Thus, in this single project, not only are the texts identified and categorized, they are also preserved for posterity, giving the images a wide range of interpretive value: as historical artifacts, historical records, and textual resources. One does not need to go to the trouble of trying to borrow a rare and expensive volume from the library, because the IDP has made this database openly accessible via a web site, through which scholars can not only retrieve, but can also input, information regarding individual objects. Similar work with images is being done by John and Susan Huntington at the Huntington Archives at Ohio State University, where thousands of images of art and architecture are being made available for everyone's perusal.

The level of technical skill required to make end use of these resources is still a bit higher than average, but this demand for technical expertise will steadily fall as computer-based research methodologies become assimilated into the field and software becomes more sophisticated. Nonetheless, for the creators of these digital materials, there are, and will continue to be, pressing and sometimes conflicting demands for near-professional-level computing skills on the part of people who were trained to do drastically different kinds of work. Most of the present creators of digital materials are humanities scholars who lack significant basic training in computing skills, and who, in the development of their materials, are required to solve not only basic computing problems, but more often than not are forced to solve technical problems that have not yet been dealt with even by IT professionals. There is, therefore, for these scholars, a large drain of personal energy in trying to solve problems at the "bleeding edge" of technology, while at the same time trying to carry out the traditional scholarship that is their basic motivating interest.

This is a matter that was the focus of an article by Jamie Hubbard a few years back,[10] in which he pointed out the degree of additional technical skill that the new age of digitization is requiring of scholars--at least those who feel compelled to keep abreast of the development of digital tools that have application in their own research. These scholars have already been expected to be skilled in textual analysis, foreign languages, hermeneutics, history, philosophy, and so forth, but were not expected to have a grasp of database usage, or to be able to program in Perl, Visual Basic, Java, or XML. There are many who now feel compelled to know something about these new "languages," which were in the past strictly the domain of engineers and programmers. Even if one is not engaged in the creation of digital resources, merely to take advantage of available materials one needs at least to know how to locate data on the web, download it to one's local system, and unpack a ZIP or LZH file.

I think that Dr. Hubbard's observations are far-sighted, especially his overall assessment of the magnitude of the sociological changes being wrought at the present time. If we look to the more distant future, however, I believe that the present demand for technical skills will eventually subside in a relative sense, both for the average end-user and for the creators of digital materials. This is because the difficulties presently being encountered by "scholars engaged in electronic resources" have much to do with the newness, coupled with the vast scope, of the transformation of the medium.

The PC is only about fifteen years old, and by the time this article reaches print, the HTML-based Web will be only about five years old. Yet the paradigm shift is radical, vast, and multidimensional. One comparison often made is with the prior major media revolution that occurred with the invention of the printing press. Yet while that shift certainly had huge implications for the reproduction, dissemination, and preservation of textual materials, it does not come anywhere close in scope to the multidimensional shift we are presently witnessing. The digitization of our data allows for the instantaneous transmission, location, and presentation not only of print media, but also of digitized and integrated images and sounds--all at the reach of one's fingertips. With the scope of the change being as vast and radical as it is, things have not yet begun to sort themselves out, and the necessary adjustments in the functioning of society itself have not yet come about. Eventually society, and along with it academia, will adjust, and the usage of computers will be an almost unnoticed fact of life. This means that people who want to use the technology will not have to be technicians.

Our generation, which sits on the cusp of this transition, can experience all the novelty and excitement of the changes. For the following generation, the situation will no doubt be quite different. First, two decades from now, the basic input work of digitizing the record of human history and culture will be complete. Of course, computer-aided methods for its presentation and analysis will continually grow in sophistication, in ways that we cannot possibly imagine. But this kind of work will increasingly be handled by pre-existing technology, developed by specialized engineers and programmers. For humanities scholars, there will be a wide range of specific software applications available that will allow them to acquire and analyze their data. Also, the next generation of scholars will have been raised within the new medium, and will therefore naturally possess the basic computer skills through the course of standard education and cultural conditioning.

 

Tasks at Hand

But that time is still a generation away. And during this generation, there is going to be a lot of work to be done, a lot of bleeding at the edge, and a lot of changes wrought on the world of scholarship and the related domains of academic publication and library science. One of the biggest problems that we will encounter is that of markup, a concept still unknown to most humanities scholars, but one that will become of vital importance, especially during the period of transition of data sources from print to digital. The concept of markup can best be explained through a couple of simple examples.

One of the greatest advantages of having textual materials in digital format is the ability to do various kinds of searches that allow the production of data analyses. That means that if I want to engage in some kind of research project concerning the relationship of the Buddhist sangha with the royal families of Tang and Sung China, I can use a search applet to find all occurrences of the term wang (king) 王 in the Chinese Buddhist corpus. Using a blunt search engine in this way, however, we will quickly be reminded of the fact that wang is also one of the most common surnames in Chinese history, and therefore our search will turn up a large number of instances of the term that are inapplicable. But if the textual corpus in question had gone through a process of analysis and markup, then all cases where wang means "king" would be meta-tagged with something like <ruler>王</ruler>, and all cases where wang is a surname would be tagged as <surname>王</surname>, and a markup-aware search engine would report only the instances of the term that are applicable to our purposes. Of course, we experience a similar problem all the time when we do searches on the Internet. Without markup it is almost impossible for search engines to distinguish between the turkey that is eaten in the United States at Thanksgiving, the one that is a country, and whatever other kinds of turkeys there may be.
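To make the idea concrete, the following is a minimal sketch of what such content markup might look like in practice. The tag names are simply the illustrative ones used above, and the two sentences are invented examples; an actual TEI encoding would use its standardized elements for names and roles rather than ad hoc tags:

    <!-- Hypothetical tagged fragments: the same character 王 (wang) is
         distinguished by its function in context. -->
    <p>The monk petitioned the <ruler>王</ruler> for patronage of the sangha.</p>
    <p>The layman <surname>王</surname> donated land to the monastery.</p>

A markup-aware search for <ruler>王</ruler> would then return only the first sentence, passing over occurrences of the character as a surname.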

The problem with doing markup, however, is that it is not merely a matter of hiring a group of undergraduate students to go through a text and mark its content. While relatively untrained persons will probably be able to distinguish the varieties of turkeys, most of the more important markup needs to be done by well-trained specialists. Once a person understands the principles of markup of humanities materials, it will of course be advantageous for him or her to mark up his or her own article if there is any chance that the piece will eventually be published electronically--if one wants one's article to turn up appropriately in Web searches. In the future, it will no doubt be the case that most of the time articles will be available in digital format, even if they were initially composed for a print edition.

For large pre-existing textual corpora, it will be vitally important for specialists to go through these works and do markup editing. The problem with this notion is that, at present, and certainly for the near future, no academic institution is going to give academic credit for this kind of work. Can you imagine someone applying for tenure at your institution, offering as his or her major work the "Full Text Markup of the Analects"? Yet this work clearly needs to be done as soon as possible. In the meantime, of course, we may still take a middle path by doing simple structural and content markup that does not require great expertise.[11]
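As an illustration of what such "simple structural markup" might look like, here is a minimal, hypothetical sketch in the TEI style, in which only the divisions, headings, and paragraphs of a text are tagged--a level of work that requires no specialist interpretation of the content (the bracketed text stands in for the text itself):

    <text>
      <body>
        <div type="chapter" n="1">
          <head>[chapter title]</head>
          <p>[first paragraph of the chapter]</p>
          <p>[second paragraph]</p>
        </div>
      </body>
    </text>

Content markup of the kind described above--tagging rulers, surnames, dates, place names, and so forth--can then be layered onto this structural skeleton by specialists as time and expertise allow.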

Another characteristic of the task of markup is that it is potentially an infinite process, as scholars from different disciplines are going to be interested in different aspects of an object of knowledge. Nonetheless, it will be extremely useful if they mark up an agreed-upon, centrally located version of that object, so that knowledge and insights regarding it can be continually enhanced. This leads us back once again to the next major dimension of the ramifications of the change in medium--the bywords of the EBTI and SEER: cooperation, collaboration, and standardization.

Up to the present, the model of academic accreditation in the humanities fields has been quite different from that in the natural sciences and medicine. In the latter fields, it has long been the case that a large majority of the projects, reports, and articles are collaborative in nature. This cannot but be the case, as sophisticated scientific research usually has to be carried out as a team effort. Therefore academic institutions must expect, and respect, co-authorship as a viable form of scholarly accreditation in these disciplines. In the humanities fields, however, the basic model for credit has long been the single-authorship system, especially at the level of the article or essay.

The predominance of the digital medium, however, and especially of the World Wide Web, will bring some pressure to bear upon this model. One case in point is that of the above-mentioned editing, markup, and other preparation of online textual corpora. Fixing a data source at a specified location, with access allowed to predetermined "editors,"[12] will permit contributions from a broad range of scholars around the world who have expertise on a topic. Managers of canonical collections will be able to allow an expert on a particular text to do its editing and markup, and indicate the exact nature and extent of the contribution in the document's header.[13] Another example is that of my own dictionary project, where entries are contributed by scholars from around the world over the Internet by e-mail. In this case, the initials of the contributor are recorded in a designated field of the entry, as are the initials of subsequent editors of the entry. In short, collaboration on large projects will on one hand become much easier, and on the other hand become a basic necessity, if we are going to take full advantage of the new medium.
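To give a concrete sense of what such attribution in a document header might look like, here is a minimal, hypothetical sketch of the relevant portion of a TEI header (the title and name are placeholders, and the other required parts of the header are omitted for brevity):

    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>[title of the canonical text]</title>
          <respStmt>
            <resp>Markup and editorial correction of this text</resp>
            <name>[contributing scholar]</name>
          </respStmt>
        </titleStmt>
        <!-- publication statement, source description, revision history, etc. omitted -->
      </fileDesc>
    </teiHeader>

Because such statements of responsibility are part of the encoded document itself, the record of who contributed what travels with the text wherever it is copied or distributed.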

In order to regulate such projects, and to have them carried out in such a way that their data communicates with each other, standardization is also going to be of great importance. If one compiler of historical materials uses one proprietary system, and another compiler uses something incompatible, the very power of the computer medium is immediately undermined. This, of course, is a major reason for the establishment of the above-mentioned digital consortia, which are playing a major role in advising on standards that are clearly defined as well as open. The most basic standard for many forms of data will most likely be the new Internet markup language XML, in combination with its special applications as outlined by the TEI. XML will provide an infinitely more flexible system for transferring and displaying information on the Web than the present, relatively fixed HTML format. As XML-encoded texts proliferate, Internet search engines will finally have the means to distinguish between various kinds of turkeys and various kinds of wangs.

The TEI will take XML encoding principles into detailed application in the various formats and genres of humanities texts, art, and other media. This means that it should not be the case that each person who does markup work on data creates his or her own new set of tags, since the basic categories of tags are already set up by the TEI. New tags, when necessary, can be devised based upon the same principles and can therefore be easily understood by newcomers to a certain field or genre.

Beyond the matter of credit for collaboration, the humanities disciplines will be challenged by a wide range of new transformations, as yet unthought-of by many. One will be the challenge that multimedia research and presentation poses to the traditional notion of the book-style dissertation. As young students come up through the ranks possessing strong basic skills in multimedia development, will it not make sense for a dissertation to be presented in the format of a multimedia CD-ROM? Or to be published on the Web, rather than being stored as microfilm on a library shelf? There will no doubt initially be much resistance to these kinds of changes, but such resistance will inevitably be swept away by time.[14] There is also a strong precedent in present humanities scholarship for the encouragement of such multimedia work, in the form of projects such as the Electronic Cultural Atlas Initiative (ECAI)--another of Lewis Lancaster's brainchildren. Unlike the EBTI and SEER, the ECAI is not merely a loose consortium seeking to create communication between disparate projects. It is itself a large, organized project aimed at creating multilayered (in time and space), multimedia, digital maps of the cultures of various regions around the globe. The project is (and clearly has to be) fully collaborative in nature. Each of the teams from the various regions (for instance, China, the Caucasus, North America, the Middle East, and so on) may include scholars from any of the sub-disciplines of the humanities and social sciences, as well as engineers and information technology specialists of various sorts. The size and scope of this project is expanding rapidly, and its eventual consummation is bound to bring about a wide range of paradigm shifts, especially concerning the matter of what is considered creditable scholarly work.

 

The Future of Academic Publishing: Some Predictions

Academic publication, and the various forms of accreditation currently attached to it, will undergo dramatic shifts--even simply at the level of pure text, not to mention multimedia. Here I will confine myself to some predictions regarding three general types of texts used in Buddhist, Asian, and humanities studies: the reference work, the full-length book, and the journal and its articles.

(1) Reference Works - The fact has not yet dawned on many scholars and publishers, but the large reference work is already well on its way to rapid extinction. The giant lexicons, dictionaries, encyclopedias, indexes, and concordances that presently fill our shelves are already the proverbial dinosaur. Relatively small units of compartmentalized data are ideally suited to computerized search and retrieval, and are not bothersome to read on a computer screen--where the textual scholars of the future will be doing almost all of their work. Using digital reference works, researchers will be able to search for words and phrases instantaneously. Once they find the object of their search, its entry will be filled with hyperlinks to related information, allowing them to course around in a sea of information at a speed that simply defies comparison with the printed work. This, however, is still only a part of the distinction. To see further, we need to grasp the difference between two important concepts: the static document and the dynamic document.

The static document is typified by the printed page in a book. Once it is completed, it is done forever, no matter how incomplete or imperfect it may be. Since the static document has been the model for textuality since the invention of writing, it is still very hard for many people to conceive of documents in any other way. But the birth of the word processor, much aided by the Web, has brought with it a vastly different model--that of the dynamic document. The dynamic document is constantly in a state of growth and development--alive in the way that any document destined to become static is during the period of its composition, prior to publication. The Internet is naturally filled with dynamic documents, but at this time only a very small percentage of them contain useful, locatable information. Once, however, you have something like an online, sharable database (whether it be a lexicon, image database, encyclopedia, or some unthought-of combination of various types of documents, such as are being developed in the ECAI) that is not aimed at eventual petrification as a static document, you have a radically new way of storing, sharing, and developing scholarly information. As is the case with the DEABT/DEALT, there is not a single entry that is ever "finished"--all units of information are capable of being revised at any time, and there is hardly a time when I look at a prior entry and do not do something to improve it. This continued revision is now being dramatically enhanced by the input I am receiving from scholars around the world.

The DEABT and DEALT, then, are compilations operating in a different dimension from their static ancestors, as they are open-ended in time and space, and unlimited in authorship. By comparison, a large reference compilation project, sponsored by a major company and based on a fixed budget and a fixed publication date, can only have the static model as its eventual goal--even if that static document is in digital format. No matter how large the reference work is made, it will be limited in the number and size of its entries, and decisions will have to be made about what to leave out. Therefore, even if it is digitized and released for sale as a CD-ROM, it still cannot compete with an open, dynamic work, which is endlessly incorporating new information, endlessly undergoing revision, and continually hyperlinking to other related resources. The dynamic document-as-reference-work is borderless, and thus infinite in scope.

It is instructive to note that the publishers of the Encyclopedia Britannica have already realized the inevitability of the predominance of electronic publication, and have all but given up on print versions of their famous reference work, now dealing almost exclusively in CD-ROMs. This type of implementation certainly takes good advantage of the technology, but it is a middle stage with a yet uncertain destination, as the attitude toward the material is still basically that of the static document. It cannot be corrected and edited by specialists in various fields, and it cannot be developed in concert with related projects around the globe. Nor can it be effectively interlinked with them. And no matter what, once it's done, it's done.

I venture to say, therefore, that scholarly reference works undertaken after the full transition to the new medium will necessarily be dynamic, open, collaborative projects. They will be primarily Internet-based (although they can be distributed on CD, or even printed as a book, if necessary). As Internet documents, they will be able to take full advantage of being networked with other related resources, resulting in a vast, interlinked knowledge base, with each part of the base being a single point--not unlike the image of Indra's Net. The large, desktop reference book is history, and digital works that are based on the static model will not be able to compete, no matter how large the funding, as their contents will eventually be assimilated into the dynamic Network.

(2) The Book - At the other end of the spectrum is the traditional, full-length book. Whether or not the book will finally be supplanted by an electronic equivalent remains difficult to see at the present moment, but there are clearly aspects of the book that will tend to ensure its survival for a much longer time than the reference work. In the case of a full-length book, be it a popular novel or a scholarly monograph, the benefits of digitization are much less prominent. Yes, one may still search for a word or phrase more quickly in a digital version, but the main purpose of a book is not data searching, but long-term, and often relaxed, reading. We like to carry books with us on the train or plane, leave them in the bathroom, and browse them from our shelves. Although the "electronic book" is already making a bit of headway, it will remain a fairly expensive piece of electronic equipment. It cannot be dropped or allowed to get wet; it needs a power source, and has a range of complicated settings, and so forth. The continued success of daily newspapers may also be taken as an indicator of the perdurance of certain forms of reading habit.

Nonetheless, since virtually all books will originally be developed in digital form, software versions of books will be available. We might imagine, therefore, that publishers may be motivated to include a digital version of the book, or at least some of the book's essential data, on an attached CD-ROM (as is already often done with computer-related books), so that its contents may also be searched and utilized digitally. Libraries may also find it easier and cheaper to acquire and store digital versions of books, rather than their paper equivalents. Thus, as will be the case in other areas, simple economics will no doubt be a major influence in determining the trend.

(3) Articles and Journals - Occupying a fascinating and somewhat unclear middle ground are the scholarly journal and its articles. Whatever the final extent of the changes, there is no doubt that this area will be radically transformed in the coming years, as articles are also well suited in many ways for Internet publication. First, they share with reference works, to a great extent, the character of being relatively small, compartmentalized units of data. Their elements can be easily hyperlinked to other articles (once these are also on the Web). The speed of getting articles prepared for Internet publication is far greater, and the cost far lower, than for print journals. Besides being expensive, print journals are also often not that easy to get hold of, compared with Web-based articles. While there is no doubt that there are many people (such as myself) who would prefer to have a hard copy of the journal in hand for relaxed perusal, the final outcome may well be out of our hands, as the ultimate terms of determination are going to be primarily economic. The simple fact of shrinking library budgets, coupled with the rapid profusion of E-journals, makes it hard to imagine the long-term survival of the paper journal--at least in its presently predominant form. Whether or not journals discontinue their print versions, they are going to have to put a digital version on the web. Scholars of the future will do a large portion of their research by means of incredibly fast, efficient, and accurate web searches, and journals that are not digitized and on the network will simply become invisible.

The transformation to the web-based journal is going to bring with it some very radical changes to the humanities disciplines, the likes of which have never been seen before. This is because peer-reviewed journal publication has for so long been one of the most important standards for the awarding of academic accreditation and promotion. With the advent of the latest generation of software, the publication of one's article on the Internet has become an easy option for anyone who can use a word processor and an FTP program. This means that scholars (and non-scholars) will be able to place any of their articles on their own home page at will.

Due to this fact, even if the E-journal peer-review system can be well established, E-journals will face a far different type of competitive environment from the one experienced by their print predecessors. During the age of the print journal, the journal provided two vital services: (1) bringing the work into public availability, and (2) academic accreditation. If a strong peer-review system can be developed, academic journals will still be able to provide the latter, but they will have little or nothing special to offer regarding the former, as the Internet provides an easy, fast way to publish one's works without having to jump through any bureaucratic or academic hoops. Furthermore, digital data, and the articles that contain it, will be found not by looking at the tables of contents of E-journals, but by Web search engines.

Another dimension of the matter is the fact that the peer-review system itself has always been plagued by the tendency to turn into a buddy system, or at least a system that prioritizes the works of well-known scholars over those of unknown scholars. So if a competent scholar writes an article that he or she is quite satisfied with, and is not especially concerned, for one reason or another, about getting the piece peer-reviewed, that scholar may just as well place the article on his or her own web site, or some other well-trafficked site. Therefore, the E-journals are going to need to do something special to continue to draw good material.

The other important aspect of journal and article publication that will be challenged is the static document model. Presently, online academic journals, despite their dynamic potential, are mere replications of the static/print model on the digital/dynamic matrix. That means that the publishers are essentially still perceiving publication through the paradigm of print media, and just placing the equivalent of hard copy on a web site, with some hyperlinks added. As in the case of reference works, there is no reason why articles cannot be published as dynamic documents, to be updated, re-edited, and corrected as necessary. No doubt in many situations the static document will be satisfactory to its author. But what happens if it is not? In the case of a dynamic document, the author would be better off having the article in a place where he or she may edit it freely.

The traditional paper journal will eventually disappear, or take a decidedly secondary role. But this will happen much more slowly, and perhaps not quite so completely as is the case with reference works. E-journals will come to the fore, but in a variety of types and taking advantage of different types of technology. There will be much trial and tribulation in the reworking of the processes of academic accreditation, and a whole new type of dynamic of canonicity is likely to develop on its own.

 

Aside from these three main kinds of scholarly media, we will also have greater access to unpublished theses and dissertations. Since these are now all composed on computers anyway, and the present generation of word-processing software offers automatic conversion to HTML format, it will require little effort or skill simply to place one's unpublished work (whatever it may be) on a web site or CD-ROM. Once again, there will be a strong need for new types of scholarly accreditation bodies for digital materials. This has already begun to occur in the case of the peer-reviewed E-journals, but the model will need to be extended to other forms of publication as well. We will need some means of sifting the works of acceptable quality from the rubbish--which brings us back around to the topic of finding, acquiring, and cataloging.

 

Libraries

There is probably no single field in academia that will be transformed as much as library science. Since the traditional book and journal are not likely to disappear at any time in the foreseeable future, librarians (at least some of them) will need to retain the full gamut of present methodologies for dealing with paper texts (along with whatever new digital means appear to support the traditional model using computerized data systems). In addition, however, they will be obliged to attempt to provide their campuses with access to a wide range of digital information, which they will have to categorize and present in new ways. During this generation of transition, there is little doubt that some libraries (and some librarians) are going to adapt better than others.[15] The field of library science does perhaps have some advantage over other humanities disciplines, in that the models of collaboration, cooperation, and standardization have traditionally been a central component of the field, and it is therefore possible that pre-established principles can be more readily reworked to meet the demands of the digital age. Librarians will also have no recourse but to work closely with scholars who are engaged with digital materials, and lines of distinction and responsibility will often become blurred.

We can at this point only begin to fathom the extent to which this dramatic shift in the media of communication is going to change humanities scholarship--but we will without doubt experience a level of impact greater than that felt by society at large. What is certain is that, whether we like it or not, it will come.

 

 

Acknowledgments

Significant direct input on the composition of this article came from Christian Wittern, Charles Prebish, Matthew Ciolek, C.C. Hsieh, and Jamie Hubbard. Indirect, but valuable input was also received from Lew Lancaster, John Lehman, Howie Lan, Tetsuya Katsumura, Susan Whitfield, Maureen Donovan, Jost Gippert, Kosei Ishii, Shigeki Moro, John Huntington, Lou Burnard, and Iain Sinclair.




[1] "E-Journals Make Their Move", RSN, Nov. 1998, p. 32-33

[2] Digital text versions of the Taisho portion of the Chinese canon (i.e., excluding the Zokuzokyo, Han'guk pulgyo chonso and other canonical anthologies) usually occupy approximately 200 MB of disk space.

[3] The Big5 character set used in Taiwan contains about 13,000 characters; up until recently the JIS set used in Japan contained about 6300 characters, while the KSC set used in Korea contained less than 6000 characters. The Han character portion of the present Unicode set contains almost 21,000 characters.

[4] The computing group that has been digitizing the Chinese literary corpus at Academia Sinica (headed by C. C. Hsieh), has already identified some 140,000 separate characters, or "glyphs."

[5] This system has since been borrowed by a wide number of CJK textual input projects, with some variations.

[6] Since that time the EBTI has been meeting on a regular basis, including meetings in 1994 at Haein-sa (Korea), in 1996 at Fo Kuang Shan (Taiwan), in 1997 at Otani University (Japan), and in 1999 at Academia Sinica (Taiwan). The next meeting is scheduled for the year 2000 at UC Berkeley. Each of these meetings has shown growth in the number and sophistication of projects, as well as in the degree of cooperation and standardization.

[7] Dongguk University and SUNY Stony Brook, with the collaboration of about twenty scholars in the field of East Asian Buddhism, began this project in 1997, and hope to start producing the first volumes within the next couple of years.

[8] Search and Replace is a very inexpensive piece of shareware that one may use on any Windows system to search any language. The multiple-file search applet contained in the recent release of the text editor Textpad (www.textpad.com) is probably four or five times as fast as Search and Replace.

[9] Compiled by Urs App and Christian Wittern (available on the ZenBase CD), to which I have also been adding.

[10] "Upping the Ante" Journal of the International Association of Buddhist Studies (JIABS)  18.2, Winter '95 pp. 309-322.

[11] Those who are interested in understanding the basic principles of humanities markup may browse through the online materials supplied by the Text Encoding Initiative. See URL below.

[12] For an example of such a project already underway, in which contributors enter information into a central database, see the Ricci 21st Century Roundtable and International Dunhuang Project. URLs below.

[13] The Text Encoding Initiative offers clearly defined header structures for this purpose.

[14] Christian Wittern adds here: "While in the 19th century it was unthinkable to publish a dissertation in a language other than Latin, the situation has been completely reversed – although still possible in some cases, nobody even thinks of writing a study of Confucius in Latin."

[15] An early example of library science paying direct attention to the rapid developments in digital textuality can be seen in the creation of AsianDoc, a combination mail list and web site forum, administered by Maureen Donovan of the Ohio State University library.