Module 2: The TEI Header

4. The Header of a Complex Text

Before we end, let’s go back to where we left you: in the library, in front of the library catalogue or computer screen. Prepared for the possibility that this copy of the book may be on loan, you find another reference to The Wild Ass’s Skin in the record of La Comédie Humaine (look for “505 8 0 |gPhilosophic and analytic studies: v. 41. The|tmagic skin”; The Magic Skin is an alternative title for the English translation):

Figure 4. A library catalogue record for a complex work.
Now that’s a record! If you thought the truckload of possibilities for describing electronic texts in the TEI header had already set your head spinning, imagine what an electronic edition of La Comédie Humaine might look like! Code 300 tells us that it has no fewer than 53 volumes, with different titles per volume.

One way of encoding this majestic work as a whole would be to treat La Comédie Humaine as a kind of “supertext” containing all the different works. This can be done in TEI by treating the whole as a <teiCorpus>, containing each separate work in its own <TEI> text. Because each of these <TEI> texts needs its own TEI header, you can imagine the amount of meta-information involved, much of which would have to be repeated. This repetition can be avoided by placing the common meta-information in the <teiHeader> element of the <teiCorpus> element, while retaining all work-specific meta-information in the TEI header of the respective <TEI> text. This mechanism allows you to be maximally expressive in the description of all texts in a TEI corpus, and maximally efficient in reducing the common information in the individual TEI headers.
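Stripped of all detail, the nesting described above can be sketched as the following skeleton. It is only an illustration under stated assumptions: the placeholder paragraphs in <publicationStmt> and <sourceDesc> and the single dummy paragraph in <text> are not taken from any real edition, and a real corpus would hold one <TEI> element per work.

```xml
<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
  <!-- corpus-level header: metadata shared by every text in the corpus -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>La Comédie Humaine</title>
        <author>Honoré de Balzac</author>
      </titleStmt>
      <publicationStmt>
        <!-- placeholder content (assumption) -->
        <p>Publication details common to the whole corpus.</p>
      </publicationStmt>
      <sourceDesc>
        <!-- placeholder content (assumption) -->
        <p>Source details common to the whole corpus.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- one <TEI> element per work, each with its own, text-specific header -->
  <TEI>
    <teiHeader>
      <fileDesc>
        <titleStmt>
          <title>The Wild Ass’s Skin: an electronic edition</title>
        </titleStmt>
        <publicationStmt>
          <p>Publication details specific to this text.</p>
        </publicationStmt>
        <sourceDesc>
          <p>Source details specific to this text.</p>
        </sourceDesc>
      </fileDesc>
    </teiHeader>
    <text>
      <body>
        <p>...</p>
      </body>
    </text>
  </TEI>
  <!-- further <TEI> elements for the other works would follow here -->
</teiCorpus>
```

The corpus-level <teiHeader> comes first and is followed by the individual <TEI> elements, so anything stated once at the top does not need to be restated in each text-level header.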

The following example gives an impression of what a TEI header for an electronic edition of La Comédie Humaine might look like:

<teiCorpus xmlns="http://www.tei-c.org/ns/1.0">
<!-- general TEI header information for the entire corpus -->
<title>La Comédie Humaine</title>
<author xml:id="HdB">Honoré de Balzac</author>
<editor role="editor" xml:id="TBEcrew">The TBE crew</editor>
<name xml:id="RvdB">Ron Van den Branden</name>
<sponsor>Association for Literary and Linguistic Computing (ALLC)</sponsor>
<sponsor>Centre for Data, Culture and Society, University of Edinburgh, UK</sponsor>
<sponsor>Centre for Computing in the Humanities (CCH) - King's College London</sponsor>
<sponsor>University College London (UCL)</sponsor>
<addrLine>Centre for Scholarly Editing and Document Studies (CTB)</addrLine>
<addrLine>Royal Academy of Dutch Language and Literature</addrLine>
<addrLine>Koningstraat 18</addrLine>
<addrLine>9000 Gent</addrLine>
<principal xml:id="EV">Edward Vanhoutte</principal>
<principal xml:id="MT">Melissa Terras</principal>
<extent>0.5 Gb</extent>
<publisher>Centre for Scholarly Editing and Document Studies (CTB)</publisher>
<distributor>Centre for Computing in the Humanities (CCH) - King's College London</distributor>
<name type="institution">Centre for Scholarly Editing and Document Studies (CTB)</name>
<name type="institution">Royal Academy of Dutch Language and Literature</name>
<street>Koningstraat 18</street>
<name type="city">Gent</name>
<name type="country">Belgium</name>
<idno type="ISBN">0-00-000000-9</idno>
<availability status="free">
<licence>Published under a
<ref target="http://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution ShareAlike 3.0 License</ref>
<date when="2110-01-01">1 January 2110</date>
<title>The TBE collection: sample texts encoded with TEI.</title>
<name>Edward Vanhoutte</name>
<idno type="ISSN">0000-0001</idno>
<note>OCR scanning done at KANTL, Ghent.</note>
<title>La Comédie Humaine</title>
<author>Honoré de Balzac</author>
<date from="1895" to="1900">1895-1900</date>
<extent>53 v.</extent>
<correction method="markup">
<p>Apparent errors have been corrected using the
elements, wrapped in a
<normalization method="markup" source="http://www.oed.com/">
<p>Spelling has been modernised using the
elements, wrapped in a
<quotation marks="all">
<p>Diplomatic transcription: all original quotation marks have been retained and normalised to double quotation marks.</p>
<hyphenation eol="none">
<p>End-of-line hyphenation has been removed. All other hyphenation has been retained.</p>
<p>Thematic analysis added, studying the main motifs.</p>
<p>Names and dates are marked.</p>
<p>Text encoded for
<soCalled>The TBE collection: sample texts encoded with TEI</soCalled>
, aiming at providing a collection of prime exemplar TEI encoded materials.</p>
<p>The paragraphs in the text are numbered with the
attribute. Each number consists of four digits; numbering is consecutive throughout the book. For example:
numbers the 203rd paragraph of the book.</p>
<p>Each chapter is identified with a formal identification code inside the
attribute. Chapters are numbered using Arabic numerals. The codes are composed by concatenating the identification codes of all ancestor text divisions down to the chapter level, with a dot as the separator. For example:
identifies the third chapter of the second book of the first volume.</p>
<taxonomy xml:id="DDC">
<title>Dewey Decimal Classification</title>
<edition>Abridged Edition 14</edition>
<ptr target="http://www.oclc.org/dewey/versions/abridgededition14/default.htm"/>
<taxonomy xml:id="lcsh">
<title>Library of Congress Subject Headings</title>
<metSym value="-">long syllable</metSym>
<metSym value="u">short syllable</metSym>
<metSym value="|">foot boundary</metSym>
<metSym value="/">line boundary</metSym>
<variantEncoding method="parallel-segmentation" location="internal"/>
<language ident="en" usage="98">English</language>
<language ident="fr" usage="1">French</language>
<language ident="ar" usage="1">Arabic</language>
<change when="2109-12-20" who="#RvdB">final proofing</change>
<change when="2009-03-08" who="#MT">addition of thematic analysis</change>
<change when="2009-03-08" who="#RvdB">addition of explanatory notes</change>
<change when="2008-10-25" who="#RvdB">spell check</change>
<change when="2008-08-25" who="#RvdB">addition of phrase level markup</change>
<change when="2008-08-20" who="#RvdB">file creation</change>
<!-- ... -->
<!-- specific TEI header information for the distinct TEI documents -->
<title>The Wild Ass’s Skin: an electronic edition</title>
<editor role="translator" xml:id="EM">Ellen Marriage</editor>
<editor role="preface" xml:id="GS">George Saintsbury</editor>
<edition n="2.0">
<title>Version 2.0, enriched with thematic annotations.</title>
<date when="2010">2010</date>
<name>Melissa Terras</name>
<resp>Added thematic annotations.</resp>
<extent>572 Kb</extent>
<idno type="ISBN">0-00-000000-0</idno>
<date when="2010-01-01">1 January 2010</date>
<title>The Wild Ass’s Skin</title>
<author>Honoré de Balzac</author>
<date when="1906">1906</date>
<extent>288 p.</extent>
. Translated by
<editor role="translator">Ellen Marriage</editor>
; preface by
<editor role="preface">George Saintsbury</editor>
<creation>Original written in
<date when="1831">1831</date>
<name type="city">Paris</name>
<language ident="en" usage="98">English</language>
<language ident="fr" usage="1">French</language>
<language ident="ar" usage="1">Arabic</language>
<keywords scheme="#lcsh">
<item>Fiction and juvenile belles lettres</item>
<item>Literature--Translations into English</item>
<classCode scheme="#DDC">843.7</classCode>
<catRef target="#BCS.phil" scheme="#BCS"/>
<handNote xml:id="JH" scribe="JamesHarding" script="hand" medium="ink.blue">handwriting in blue ink by James Harding, previous owner of the book</handNote>
<handNote xml:id="ar" script="arabic">Arabic script</handNote>
<change when="2009-03-08" who="#MT">addition of thematic analysis</change>
<change when="2009-03-08" who="#RvdB">addition of explanatory notes</change>
<change when="2008-10-25" who="#RvdB">spell check</change>
<change when="2008-08-25" who="#RvdB">addition of phrase level markup</change>
<change when="2008-08-20" who="#RvdB">file creation</change>
<!-- ... -->
<!-- ... -->
Example 39. Encoding common metadata of a composite work in the <teiHeader> of a <teiCorpus> element.


The TEI Guidelines provide refined ways of associating contextual information with specific (parts of) texts. See section 15.3 Associating Contextual Information with a Text for more information.
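As a rough impression of what such an association can look like, the fragment below uses the default attribute on declarable header elements and the decls attribute on a text, both of which are defined in the TEI Guidelines. The identifiers ED-modern and ED-diplomatic and the wording of the declarations are hypothetical and chosen only for this sketch; the fragment is not a complete TEI document.

```xml
<!-- In the corpus-level <teiHeader>: two alternative editorial declarations.
     The xml:id values are hypothetical. -->
<encodingDesc>
  <!-- applies to every text that does not select another declaration -->
  <editorialDecl xml:id="ED-modern" default="true">
    <p>Spelling has been modernised.</p>
  </editorialDecl>
  <editorialDecl xml:id="ED-diplomatic">
    <p>Original spelling has been retained.</p>
  </editorialDecl>
</encodingDesc>

<!-- In one of the corpus texts: this text points at the non-default declaration. -->
<text decls="#ED-diplomatic">
  <body>
    <p>...</p>
  </body>
</text>
```

Texts that carry no decls attribute fall back on the declaration marked as the default, so only the exceptions need to be pointed out explicitly.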


A complex text encoded as a <teiCorpus> should have a <teiHeader> in its own right. This corpus-level TEI header can contain the general descriptive information that applies to all corpus texts embedded as <TEI> documents. Each corpus text should then have its own <teiHeader>, describing only those aspects that are specific to that text.