Module 1: Common Structure, Elements, and Attributes
2. General TEI Document Structure
The TEI makes use of XML as its governing metalanguage. This means that all TEI metadata are expressed as XML elements and thus comply with the World Wide Web Consortium XML Recommendation. Information (plain text) is contained in XML elements, delimited by start tags (e.g., <TEI>) and end tags (e.g., </TEI>). Additional information about these elements can be given in attributes, each consisting of a name (e.g., xml:id) and a value (e.g., "text1"). XML comments are delimited by start markers (<!--) and end markers (-->).
Note
In these TEI by Example tutorials, names of TEI components are formatted in a specific way:
- Element names are printed in monospace between pointy brackets, e.g., <TEI>
- Attribute names are displayed in monospace and prefixed with the “at” sign, e.g., @n
- Class, datatype, and macro names are displayed in monospace, e.g., att.global
A full TEI document consists of one single <TEI> element, which consists of two major components:
- <teiHeader>: an element containing all the metadata describing the document.
- <text>: an element containing the actual document.
This common structure is mandatory for all “standard” TEI documents.
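A minimal sketch of such a document could look as follows. The title, the publication and source statements, and the paragraph text are illustrative placeholders, not taken from any real edition.

```xml
<!-- Outermost element, declared in the TEI namespace -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Metadata describing the document -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>A sample title</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sample document.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital; no prior source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- The actual document -->
  <text>
    <body>
      <p>Some sample text.</p>
    </body>
  </text>
</TEI>
```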
This is an example of a TEI XML text, containing both information and meta-information. This example, as any TEI text, is recognizable as a TEI text by the outermost <TEI> element, which is declared in the dedicated TEI namespace (http://www.tei-c.org/ns/1.0). Before proceeding, let’s first have a look at the namespace declaration. In the previous example, the TEI namespace is declared as the “default” namespace, i.e., without any prefix. It could have been expressed equally as follows:
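A sketch of this prefixed variant, assuming the same kind of placeholder header and text content as any minimal TEI document:

```xml
<!-- The TEI namespace is bound to the prefix "tei" -->
<tei:TEI xmlns:tei="http://www.tei-c.org/ns/1.0">
  <tei:teiHeader>
    <tei:fileDesc>
      <tei:titleStmt>
        <tei:title>A sample title</tei:title>
      </tei:titleStmt>
      <tei:publicationStmt>
        <tei:p>Unpublished sample document.</tei:p>
      </tei:publicationStmt>
      <tei:sourceDesc>
        <tei:p>Born digital; no prior source.</tei:p>
      </tei:sourceDesc>
    </tei:fileDesc>
  </tei:teiHeader>
  <!-- The same namespace URI, now bound to the empty prefix -->
  <text xmlns="http://www.tei-c.org/ns/1.0">
    <body>
      <p>Some sample text.</p>
    </body>
  </text>
</tei:TEI>
```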
Here, the namespace declaration xmlns:tei="http://www.tei-c.org/ns/1.0" on the <TEI> element binds the TEI namespace URI (http://www.tei-c.org/ns/1.0) to the namespace prefix tei. All descendant elements that use this prefix before the actual element name belong to this namespace (e.g., <tei:teiHeader>). The <text> element, however, carries its own namespace declaration, xmlns="http://www.tei-c.org/ns/1.0", which this time binds the URI to the empty namespace prefix. All descendant elements without a namespace prefix (that is, in the “default” namespace) belong to this namespace. Since both namespace declarations reference the same namespace URI, this example is equivalent to the first.
Because the TEI namespace is vital to any TEI element, the examples in these TEI by Example tutorials will explicitly render their top-level element(s) with a “default” (i.e., without namespace prefix) namespace declaration for the TEI namespace URI. In order not to hamper legibility, no namespace prefix will be used, and the namespace declaration won’t be repeated on any descendant elements.
2.1. TEI Header
The TEI header (<teiHeader>) is mandatory and contains descriptive meta-information about the document. The <teiHeader> minimally contains a file description (<fileDesc>) that describes the electronic file. The latter element consists of three mandatory components:
- the title statement (<titleStmt>), providing information about the title (<title>), author (<author>), and others responsible for the electronic text
- the publication statement (<publicationStmt>), providing publication details about the electronic text in a structured way or as prose inside a paragraph (<p>)
- a description of the source (<sourceDesc>), documenting bibliographic details about the electronic text’s material source (if any) in a structured way or in a prose paragraph (<p>)
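As an illustration, the three mandatory components might be filled in as follows. The title, author name, and prose statements are invented placeholders; both the publication statement and the source description use the prose (<p>) form here.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <!-- Title statement: title, author, and others responsible -->
    <titleStmt>
      <title>A Sample Poem: an electronic edition</title>
      <author>Jane Example</author>
    </titleStmt>
    <!-- Publication statement, given as prose -->
    <publicationStmt>
      <p>Prepared for demonstration purposes only; not formally published.</p>
    </publicationStmt>
    <!-- Source description, given as prose -->
    <sourceDesc>
      <p>Transcribed from a hypothetical printed edition.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```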
2.2. Text
2.2.1. Body
The <text> element contains a single text of any kind, together with its encoding. A <text> minimally contains a text body (<body>). The body contains lower-level text structures such as paragraphs (<p>), or different structures for text genres other than prose: lines (<l>) for poetry, speeches (<sp>) for drama.
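The sketch below shows one body holding each of the three structures named above. The sample wording, the <div> wrappers with their @type values, and the <speaker> label inside <sp> are assumptions added for illustration.

```xml
<text xmlns="http://www.tei-c.org/ns/1.0">
  <body>
    <!-- Prose: a paragraph -->
    <div type="prose">
      <p>It was a quiet morning in the village.</p>
    </div>
    <!-- Poetry: verse lines -->
    <div type="poem">
      <l>The first line of a verse,</l>
      <l>followed by a second line.</l>
    </div>
    <!-- Drama: a speech, here with a speaker label -->
    <div type="scene">
      <sp>
        <speaker>First Speaker</speaker>
        <p>Who goes there?</p>
      </sp>
    </div>
  </body>
</text>
```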
2.2.2. Front
Next to the <body>, a text can optionally contain front matter, which may be encoded with <front>. Clear examples are title pages, headers, prefaces, or dedications. Prologues in drama, or forewords and introductions in prose, may also be considered prefatory material. This encoding is optional: the encoder may simply choose not to encode the front matter of a text as such. With the exception of the title page, for which the TEI defines specific elements, front matter should be encoded using the same elements as the rest of a text. This means that there are no specific elements for prefaces, dedications, abstracts, frontispieces, etc. Instead, either numbered or un-numbered divisions (<div>) with a @type attribute are used to distinguish between the different components of a <front> section. The following suggested values for the @type attribute may be used for this purpose:
- "preface": a foreword or preface addressed to the reader
- "ack": a formal declaration of acknowledgement by the author
- "dedication": a formal offering or dedication of a text by the author
- "abstract": a summary of the content of a text as continuous prose
- "contents": a table of contents. A <list> element should be used to mark its structure
- "frontispiece": a pictorial frontispiece, possibly including some text
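A short sketch of front matter using three of the suggested @type values; the headings and wording are invented placeholders.

```xml
<front xmlns="http://www.tei-c.org/ns/1.0">
  <div type="dedication">
    <p>To my readers.</p>
  </div>
  <div type="preface">
    <head>Preface</head>
    <p>This edition was prepared as a teaching example.</p>
  </div>
  <div type="contents">
    <head>Contents</head>
    <!-- A list marks the structure of the table of contents -->
    <list>
      <item>Chapter 1</item>
      <item>Chapter 2</item>
    </list>
  </div>
</front>
```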
2.2.3. Back
All back matter to a text may be grouped within <back>. As is the case with <front>, either numbered or un-numbered divisions (<div>) with a @type attribute are used to distinguish the different components. The following values may be supplied for the @type attribute in order to distinguish the various kinds of division characteristic of back matter:
- "appendix": an appended self-contained section of a work, often providing additional information or text
- "glossary": contains a list (<list>) of terms and their explanations
- "notes": a section in which textual or other kinds of notes are gathered together
- "bibliogr": contains a list of bibliographical citations (<listBibl>)
- "index": any form of index to the work
- "colophon": a statement appearing at the end of a book describing the conditions of its physical production
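Back matter could be sketched along the same lines. The content below is invented; the @type="gloss" value on <list>, the <label> and <item> pairing, and the <bibl> entry are assumptions about how such sections are commonly filled in, not prescriptions from this tutorial.

```xml
<back xmlns="http://www.tei-c.org/ns/1.0">
  <div type="appendix">
    <head>Appendix</head>
    <p>Additional information about the text.</p>
  </div>
  <div type="glossary">
    <head>Glossary</head>
    <!-- Terms paired with their explanations -->
    <list type="gloss">
      <label>colophon</label>
      <item>A statement at the end of a book about its production.</item>
    </list>
  </div>
  <div type="bibliogr">
    <head>Bibliography</head>
    <listBibl>
      <bibl>A hypothetical cited work.</bibl>
    </listBibl>
  </div>
</back>
```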
2.2.4. Full Example <text>
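A compact sketch of a <text> element that combines front matter, a body, and back matter as sibling components. All content is placeholder text, and the @type values are drawn from the suggested lists above.

```xml
<text xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Optional front matter -->
  <front>
    <div type="preface">
      <head>Preface</head>
      <p>A few words to the reader.</p>
    </div>
  </front>
  <!-- Mandatory body -->
  <body>
    <p>The main text begins here.</p>
    <p>It continues in a second paragraph.</p>
  </body>
  <!-- Optional back matter -->
  <back>
    <div type="notes">
      <head>Notes</head>
      <p>A remark on the text.</p>
    </div>
  </back>
</text>
```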
2.2.5. Unitary or Composite Texts
Apart from simple texts, TEI provides means to encode composite texts, either by grouping structurally related texts in a <group> element inside <text>, or by treating them as a corpus of diverse texts, using <teiCorpus> as the outermost element.
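Both approaches can be sketched briefly; all titles and text content are placeholders. In the first, two related texts are grouped inside a single <text>:

```xml
<text xmlns="http://www.tei-c.org/ns/1.0">
  <group>
    <text>
      <body>
        <p>Text of the first related work.</p>
      </body>
    </text>
    <text>
      <body>
        <p>Text of the second related work.</p>
      </body>
    </text>
  </group>
</text>
```

In the second, complete TEI documents are gathered under <teiCorpus>. The sketch assumes the usual arrangement in which the corpus carries its own header in addition to the header of each member document:

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Header describing the corpus as a whole -->
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample corpus</title></titleStmt>
      <publicationStmt><p>Unpublished sample.</p></publicationStmt>
      <sourceDesc><p>Compiled from placeholder texts.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- First member document -->
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt><title>First text</title></titleStmt>
        <publicationStmt><p>Unpublished sample.</p></publicationStmt>
        <sourceDesc><p>Born digital.</p></sourceDesc>
      </fileDesc>
    </teiHeader>
    <text><body><p>Content of the first text.</p></body></text>
  </TEI>
  <!-- Second member document -->
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt><title>Second text</title></titleStmt>
        <publicationStmt><p>Unpublished sample.</p></publicationStmt>
        <sourceDesc><p>Born digital.</p></sourceDesc>
      </fileDesc>
    </teiHeader>
    <text><body><p>Content of the second text.</p></body></text>
  </TEI>
</teiCorpus>
```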
2.2.6. Summary
The following example shows the empty framework of a basic TEI document structure:
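A sketch of such a framework, assuming the minimal header of section 2.1 and a text with optional front and back matter. The XML comments mark where content would go; the skeleton is well-formed XML but would need that content filled in before it could validate as TEI.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title><!-- title of the electronic text --></title>
      </titleStmt>
      <publicationStmt>
        <p><!-- publication details --></p>
      </publicationStmt>
      <sourceDesc>
        <p><!-- description of the source --></p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <front>
      <!-- optional front matter -->
    </front>
    <body>
      <!-- the text itself -->
    </body>
    <back>
      <!-- optional back matter -->
    </back>
  </text>
</TEI>
```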
The following example fills this empty framework with the text of the examples: