Module 0: Introduction to Text Encoding and the TEI

1. Introduction

Computers can only process texts whose characters are represented in a system that maps onto the binary code computers can interpret. This is called character encoding. One such character encoding scheme, based on the English alphabet, is ASCII (American Standard Code for Information Interchange). Character encoding facilitates the storage of text in computers and the transmission of text through telecommunication networks. Character encoding, however, says nothing about the semantics, interpretation, or structure of a text. Such information about a text is called meta-information. If we want to add meta-information to a text so that computers can process it, we need to encode, or mark up, the text. We can do this by inserting natural language expressions (or codes representing them) into the text, using the same character encoding as the text itself but separated from the text by specific markers. Such an expression is called a tag. All of the tags used to encode a text together constitute a markup language. The application of a markup language to a text is called text encoding.
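The distinction between character encoding and text encoding can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the sample sentence and the tag names line and title are invented for this example and are not prescribed by the text above. The first half shows how ASCII maps characters to numbers and bit patterns. The second half shows how tags, set off by the markers < and >, add meta-information that a program can read separately from the text itself.

```python
import xml.etree.ElementTree as ET

# Character encoding: each character is mapped to a number,
# which the computer stores as a pattern of bits.
text = "TEI"
ascii_bytes = text.encode("ascii")
code_points = list(ascii_bytes)
bit_patterns = [format(b, "08b") for b in ascii_bytes]
print(code_points)   # [84, 69, 73]
print(bit_patterns)  # ['01010100', '01000101', '01001001']

# Text encoding (markup): tags are written in the same character
# encoding as the text, but set apart by the markers < and >.
# The tag names "line" and "title" are assumptions for illustration.
encoded = "<line>The <title>Odyssey</title> begins here.</line>"

# The marked-up text is still plain ASCII.
encoded_bytes = encoded.encode("ascii")

# A parser can now separate the meta-information from the text.
root = ET.fromstring(encoded)
title = root.find("title").text
plain = "".join(root.itertext())
print(title)  # Odyssey
print(plain)  # The Odyssey begins here.
```

The character encoding alone cannot tell a program that "Odyssey" is a title. Only the inserted tags make that information available for processing.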

The Text Encoding Initiative (TEI) is a standard for the representation of textual material in digital form by means of text encoding. This standard is the collaborative product of a community of scholars, chiefly from the humanities, social sciences, and linguistics, who are organised in the TEI Consortium (TEI-C, https://tei-c.org). The TEI Consortium is a non-profit membership organisation which governs a wide variety of activities, such as the development, publication, and maintenance of the text encoding standard documented in the TEI Guidelines, the discussion and development of the standard on the TEI mailing list (TEI-L) and in Special Interest Groups (SIGs), the gathering of the TEI community at yearly members' meetings, and the promotion of the standard in publications and at workshops, training courses, colloquia, and conferences. These activities are generally open to non-members as well.

The term TEI Guidelines may refer both to the markup language and tag set proposed by the TEI Consortium and to its documentation, online or in print. Informally, TEI Guidelines is often abbreviated to TEI. In these tutorials, TEI Guidelines is used as the general term for the encoding standard. The TEI Guidelines are widely used by libraries, museums, publishers, and individual scholars to present texts for online research, teaching, and preservation. Since the TEI is expressed in terms of the eXtensible Markup Language (XML), and since it provides procedures and mechanisms for adapting it to a project's own needs, the TEI Guidelines define an open standard that is generally applicable to any text and purpose.

This introductory tutorial first introduces the concepts of text encoding and markup languages in the humanities, and then the TEI encoding principles. Next, it provides a brief historical survey of the TEI Guidelines, and it ends with a presentation of the Consortium’s organisation.
