TEI by Example
Module 2: The TEI Header
Ron Van den Branden, Edward Vanhoutte, Melissa Terras
Association for Literary and Linguistic Computing (ALLC); Centre for Data, Culture and Society, University of Edinburgh, UK; Centre for Digital Humanities (CDH), University College London, UK; Centre for Computing in the Humanities (CCH), King’s College London, UK; Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium
Centre for Scholarly Editing and Document Studies (CTB) Royal Academy of Dutch Language and Literature Koningstraat 18 9000 Gent Belgium
ctb@kantl.be

Licensed under a Creative Commons Attribution ShareAlike 3.0 License

9 July 2010
TEI by Example. Edited by Edward Vanhoutte, Ron Van den Branden and Melissa Terras.

Digitally born

TEI by Example offers a series of freely available online tutorials that walk individuals through the different stages of marking up a document in TEI (Text Encoding Initiative). Besides a general introduction to text encoding, step-by-step tutorial modules provide example-based introductions to eight different aspects of electronic text markup for the humanities. Each tutorial module is accompanied by a dedicated examples section illustrating actual TEI encoding practice with real-life examples. The theory of the tutorial modules can be tested in interactive tests and exercises.

en-GB
Revision history: technical revision; corrected significant typo (biblStruct for biblFull), removed ref around gi; release; corrected typos + examples; creation.
Module 2: The TEI Header
Desiderius Erasmus: Colloquia familiaria

This example features the TEI header for the transcription of Colloquia familiaria, a series of colloquia written by Desiderius Erasmus. They are encoded and made available by the Stoa Consortium, University of Kentucky.

This is an excellent example of a TEI header. The file description provides the minimal required information about the title and responsibilities of the electronic text, its publication, and its source. Editorial principles are documented in encodingDesc, which also includes a statement about sampling decisions in samplingDecl (see section 2.3.2 The Sampling Declaration of the TEI Guidelines). It also contains a formal declaration of a reference system, using refState elements (see section 2.3.5.3 Milestone Method of the TEI Guidelines). Two classification systems are declared in classDecl: Library of Congress Subject Headings and Library of Congress Classification. The next header section, profileDesc, contains the actual classification of the text according to both systems, in textClass. This nicely illustrates two classification strategies: natural language keywords (keywords) and abstract classification codes (classCode). The languages of the text are also formally declared in langUsage. Finally, a complete revision history is available in revisionDesc.
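
The two classification strategies described above can be sketched in TEI P5 roughly as follows. This is a minimal illustration under stated assumptions, not the Stoa Consortium's actual markup: the xml:id values (lcsh, lcc) and the language codes are chosen for this sketch, while the subject headings, the class code and the language names are taken from the header reproduced in this example.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <!-- fileDesc is left out for brevity; a valid header requires it -->
  <encodingDesc>
    <classDecl>
      <!-- xml:id values are hypothetical labels chosen for this sketch -->
      <taxonomy xml:id="lcsh">
        <bibl>Library of Congress Subject Headings</bibl>
      </taxonomy>
      <taxonomy xml:id="lcc">
        <bibl>Library of Congress Classification</bibl>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <langUsage>
      <!-- language codes are assumed, not copied from the source file -->
      <language ident="la">Latin</language>
      <language ident="grc">Ancient Greek</language>
    </langUsage>
    <textClass>
      <!-- strategy 1: natural language keywords -->
      <keywords scheme="#lcsh">
        <term>Dialogues, Latin (Medieval and modern)</term>
        <term>Folly -- Early works to 1800</term>
      </keywords>
      <!-- strategy 2: an abstract classification code -->
      <classCode scheme="#lcc">PA8501</classCode>
    </textClass>
  </profileDesc>
</teiHeader>
```

The scheme attributes point back to the taxonomies declared in classDecl, which is what ties the classification in profileDesc to its formal declaration in encodingDesc.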

Colloquia familiaria Desiderius Erasmus Encoded by Jennifer K. Nelson Gretche Ermer A. Ross Scaife Cultural Heritage Language Technologies NSF-EU Stoa Consortium Lexington, KY
Department of Modern and Classical Languages, Literatures and Cultures 1055 Patterson Office Tower University of Kentucky Lexington, KY 40506-0027 url:http://www.stoa.org
2002-09-05
Erasmus, Desiderius, d. 1536 Desiderii Erasmi Roterodami colloquia familiaria ad optimarum editionum fidem diligenter emendata, cum succincta difficiliorum explanatione Ed. stereotypa Lipsiae sumptibus Ottonis Holtze 1867-1872 771 p. (2 vols) ; 15 cm.

Editorial notes in the Holtze edition have not been reproduced

Original punctuation conventions in the Holtze edition have been retained as much as possible

Hyphenated words that appear at the end of the line in the Holtze edition have been reformed

Italics are recorded without interpretation

Library of Congress Subject Headings Library of Congress Classification
Latin
Ancient Greek
Dialogues, Latin (Medieval and modern)
Folly -- Early works to 1800
PA8501
[ed] added text of "De utilitate colloquiorum ad lectorem"
[ed] expanded TEI header to include more information, e.g. LC subject headings and LC classification
[markup]: began tei-compliant markup
Adapted from a TEI P4 XML encoding of the header for Desiderius Erasmus’ Colloquia Familiaria (Erasmus, 1872).
Thomas Wentworth Higginson: Letter of 7 November 1885

This example shows the TEI header of the digital edition of a letter of 7 November 1885 by the American minister and writer Thomas Wentworth Higginson, encoded and made available by the Lincoln Electronic Text Center of the University of Nebraska.

This TEI header provides detailed documentation about the electronic text in fileDesc. The title statement identifies not only the people responsible for transcription and markup, but also those responsible for the technical processing of the letters by means of stylesheets. The extent section still needs to be completed; of course, this can only be done once the encoding is finished. Notice the detailed statement of availability in availability. The source text in which this letter has been published is described using the biblFull element; notice how its sections mirror the file description in the TEI header of the electronic text (apart from the sourceDesc section). The notesStmt seems to be used to record some loose annotations about the source text.
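
To make the parallel between biblFull and fileDesc concrete, here is a rough TEI P5 sketch. The text content is taken from the header reproduced in this example, but the assignment of each string to a particular element is an assumption made for illustration, since the original markup is not publicly available.

```xml
<sourceDesc xmlns="http://www.tei-c.org/ns/1.0">
  <biblFull>
    <!-- same subsections as fileDesc, in the same order -->
    <titleStmt>
      <title>Carlton and Territa Lowenberg Collection, Archives and Special Collections, University of Nebraska–Lincoln Libraries.</title>
      <author>Lowenberg, Carlton; Lowenberg, Territa A., 1825–1991.</author>
    </titleStmt>
    <publicationStmt>
      <!-- element choice assumed for this sketch -->
      <publisher>Archives and Special Collections, Electronic Text Center</publisher>
      <date>2001</date>
    </publicationStmt>
    <notesStmt>
      <note>No other plays by Miss Parker have come to light.</note>
    </notesStmt>
  </biblFull>
</sourceDesc>
```

Because biblFull reuses the structure of fileDesc, a header for a source that already exists in TEI form can be copied into sourceDesc with little rework.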

The encoding description section only contains a description of the editorial practice in editorialDecl. This is done in a prose paragraph. The header is concluded by a minimal revision description, recording only one change.

Correspondence of Thomas Wentworth Higginson, 1865–1910 Thomas Wentworth Higginson Transcribed by Melissa Sinner Encoded by Margaret Mering Laura Weakly Stylesheet created by Brian L. Pytlik Zillig Commentary on the Letters by Linda Ray Pratt *** kb University of Nebraska–Lincoln Electronic Text Center LC1885k07

This work is the property of the University of Nebraska–Lincoln. It may be copied freely by individuals for personal use, research, and teaching (including distribution to classes) as long as this statement of availability is included in the text. It may be linked to freely in Internet editions of all kinds, including for-profit works.

Publishers, libraries, and other information providers interested in providing this text in a commercial or non-profit product or from an information server must contact the University of Nebraska–Lincoln for licensing and cost information.

Scholars interested in changing or adding to these texts by, for example, creating a new edition of the text (electronically or in print) with substantive editorial changes, may do so with the permission of the University of Nebraska–Lincoln. This is the case whether the new publication will be made available at a cost or free of charge.

2001
Carlton and Territa Lowenberg Collection, Archives and Special Collections, University of Nebraska–Lincoln Libraries.
Lowenberg, Carlton; Lowenberg, Territa A., 1825–1991.
Archives and Special Collections, Electronic Text Center University of Nebraska–Lincoln Libraries
2001
Electronic Text Center 319 Love Library P.O. Box 884100 University of Nebraska–Lincoln Lincoln, NE 68588-4100
Nov. 7, 1885
No other plays by Miss Parker have come to light.

Line breaks, paragraph breaks, and indentations have been preserved within the transcription. The layout of the page has been preserved whenever possible. Abbreviations and spellings have been maintained within the transcriptions, and the full word and corrected or modern spellings have been provided. Images of the original letters have been provided in order to show the original page layout and other markings that are not the author's. Such markings include letter head, postcards, postal stamps, etc. and writing by other people. Words or phrases that are deemed indecipherable have been noted as "unclear." To provide further information as to the context of a particular letter, notations have been provided.

Add and revise header info, change lbs and divs
Adapted from a TEI P4 XML encoding of a letter by Thomas Higginson (1885). TEI XML source file is not publicly available.
Christopher Marlowe: The Tragedie of Doctor Faustus (B text)

This example contains the TEI header of the digital edition of Christopher Marlowe’s The Tragedie of Doctor Faustus (B text), encoded and made available by the Perseus Digital Library.

This TEI header provides decent descriptions of the publication details of the electronic text (publicationStmt), and the languages occurring in the text (langUsage). A reference system is declared in the encodingDesc section of the header, using refState elements (see section 2.3.5.3 Milestone Method of the TEI Guidelines).
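
As a rough illustration of the milestone method, a reference system declared with refState elements might look like the sketch below. The units (act, scene, line), the delimiters and the length are hypothetical values chosen for this sketch; they are not taken from the Perseus file.

```xml
<encodingDesc xmlns="http://www.tei-c.org/ns/1.0">
  <refsDecl>
    <!-- hypothetical units: a reference such as 1.3.45 would mean act 1, scene 3, line 45 -->
    <refState unit="act" delim="."/>
    <refState unit="scene" delim="."/>
    <!-- the last component has no delimiter, so a maximum length is given instead -->
    <refState unit="line" length="4"/>
  </refsDecl>
</encodingDesc>
```

Each refState corresponds to milestone elements with the same unit value in the text (for example, a milestone with unit="scene" and n="3"), so that a processor can resolve a reference by tracking the most recent milestone of each kind.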

The revision description is interesting in both a positive and a negative way. It clearly contains a detailed list of the changes. The list seems to be generated by an automated versioning system, which makes it possible to keep complete track of a file’s historical states and to document changes with log messages. Integrating automated revision control in the revisionDesc section of the TEI header is an interesting idea, as it combines processability and expressiveness. However, on the encoding level, this integration could be improved. In this case, a single change element is (ab)used to record the complete revision history. If the output of the automated version control system were formatted as distinct change elements per revision (either directly, or via a post-processing step), the information would comply much better with the semantics of the TEI header.
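
A minimal sketch of what such per-revision change elements could look like in TEI P5 is given here, using the first four entries of the log reproduced in this example. The use of the n attribute for the revision number and of who pointers built from the log's user names are assumptions made for illustration; the pointers presuppose matching xml:id declarations elsewhere in the header.

```xml
<revisionDesc xmlns="http://www.tei-c.org/ns/1.0">
  <!-- one change element per log entry; who values assume xml:ids declared elsewhere -->
  <change n="1.2" when="2004-04-22T14:24:57" who="#cwulfman">*** empty log message ***</change>
  <change n="1.1" when="2004-04-22T13:55:24" who="#cwulfman">Making xml files the canonical ones.</change>
  <change n="1.11" when="2003-07-01T22:14:53" who="#yorkc">Updated texts to TEI P4 and Perseus P4 extensions; minor cleanup (esp. character encodings and typos.)</change>
  <change n="1.10" when="2000-04-27T23:22:22" who="#dasmith">Hopperized TEI header. Fixed typos.</change>
</revisionDesc>
```

With the date, agent and revision number held in attributes, each revision can be queried or sorted individually, which a single undifferentiated log string does not allow.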

One essential point of criticism concerns the missing description of the source document in sourceDesc. In this case, the title and author of the source work (which can be gathered from the information in the titleStmt subsection) still provide clues to its origin, but this could be much harder for lesser-known texts. It is reasonable to suppose that the source texts of the files in the Perseus Digital Library are documented externally, but then the TEI header sections of these files should at least contain a pointer to these resources.
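
One way such a pointer could be recorded is sketched below. The author, title and date follow the bibliographic reference given for this example (Marlowe 1616); the target URL is a placeholder invented for this sketch, not an actual Perseus address.

```xml
<sourceDesc xmlns="http://www.tei-c.org/ns/1.0">
  <bibl>
    <author>Christopher Marlowe</author>
    <title>The Tragedie of Doctor Faustus</title>
    <date when="1616">1616</date>
    <!-- placeholder URL: stands in for externally maintained source documentation -->
    <ptr target="https://example.org/source-documentation"/>
  </bibl>
</sourceDesc>
```

Even this brief description would let a reader identify the source without relying on the title statement of the electronic text.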

The Tragedie of Doctor Faustus (B text) Christopher Marlowe Hilary Binda Perseus Project, Tufts University Gregory Crane Prepared under the supervision of Lisa Cerrato William Merrill Elli Mylonas David Smith Tufts University Trustees of Tufts University Medford, MA Perseus Project

This text may be freely distributed, subject to the following restrictions: You credit Perseus, as follows, whenever you use the document: Text provided by Perseus Digital Library, with funding from Tufts University. Original version available for viewing and download at http://www.perseus.tufts.edu/hopper/. You leave this availability statement intact. You use it for non-commercial purposes only. You offer Perseus any modifications you make.

English
Latin
Greek
Italian
$Log: marl.faustb.xml,v $
Revision 1.2 2004/04/22 14:24:57 cwulfman
*** empty log message ***
Revision 1.1 2004/04/22 13:55:24 cwulfman
Making xml files the canonical ones.
Revision 1.11 2003/07/01 22:14:53 yorkc
Updated texts to TEI P4 and Perseus P4 extensions; minor cleanup (esp. character encodings and typos.)
Revision 1.10 2000/04/27 23:22:22 dasmith
Hopperized TEI header. Fixed typos.
Revision 1.9 1999/09/01 17:15:34 dasmith
Fixed preamble and added encodingDesc.
Revision 1.8 1997/09/11 16:00:37 textgod
Updated for nsgmls.
Revision 1.7 1997/07/02 21:57:11 textgod
Added CASTLIST HEAD
Revision 1.6 1997/06/30 21:57:09 textgod
Added group for the prologue.
Revision 1.5 1997/06/25 14:51:39 textgod
Fixed broken NAME tag.
Revision 1.4 1997/06/25 14:40:00 textgod
Added log messages to file.
Adapted from a TEI P4 XML encoding of Christopher Marlowe’s play The Tragedie of Doctor Faustus (Marlowe 1616). TEI XML source available from .
William Shakespeare: Sonnet 17

The following example illustrates the TEI header for a sonnet by William Shakespeare, containing a detailed metrical analysis of the poem. Both the electronic text and its source are bibliographically described in the fileDesc section. The text encoding process is described in encodingDesc, providing details about the encoding project (projectDesc), the editorial policy (editorialDecl), and the system used to analyse the metre of the poem (metDecl). Notice how the editorialDecl subsection had to be repeated, as it documents both features that can be described in a dedicated TEI category (segmentation and interpretation) and features for which no such TEI labels are available (p). The standard TEI scheme does not allow both systems (formal and informal) to be mixed, hence the repetition of the editorialDecl section. The same goes for the metDecl sections: as both a formal (metSym) and an informal (p) description are provided for the metrical system, repeating the metDecl element was the easiest solution. Of course, this could also have been addressed by adapting the TEI schema.
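
The repetition described above can be sketched as follows in TEI P5. The paragraph texts are taken from the header reproduced in this example, but their distribution over segmentation, interpretation and plain paragraphs, as well as the metSym value attributes, are assumptions made for this sketch. Later releases of the TEI schema may be more permissive about mixing, so the constraint should be checked against the schema version actually used for validation.

```xml
<encodingDesc xmlns="http://www.tei-c.org/ns/1.0">
  <!-- first editorialDecl: specialised child elements only -->
  <editorialDecl>
    <segmentation>
      <p>Each sonnet has been divided into the stanzaic line groupings.</p>
    </segmentation>
    <interpretation>
      <p>The metrical interpretation of the text, defined with the segmentation of the text into units of feet, was added by hand by the encoder.</p>
    </interpretation>
  </editorialDecl>
  <!-- second editorialDecl: plain paragraphs only -->
  <editorialDecl>
    <p>All punctuation marks, excluding dashes or hyphenation, have been encoded as entities.</p>
  </editorialDecl>
  <!-- first metDecl: formal symbol definitions (value attributes assumed) -->
  <metDecl>
    <metSym value="+">metrical prominence</metSym>
    <metSym value="-">metrical non-prominence</metSym>
    <metSym value="|">foot boundary</metSym>
    <metSym value="/">metrical line boundary</metSym>
  </metDecl>
  <!-- second metDecl: informal prose description -->
  <metDecl>
    <p>Metrically prominent syllables are marked '+' and other syllables '-'.</p>
  </metDecl>
</encodingDesc>
```

Keeping the formal and informal descriptions in separate sibling elements avoids a schema customisation, at the cost of splitting related documentation over two elements.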

A Selection of Sonnets: electronic edition encoded in XML with a TEI DTD William Shakespeare Transcribed and encoded by Mubina Islam 64 KB University College London London The Complete Works of William Shakespeare William Shakespeare Peter Alexander Collins London 1978 0-00-435634-9

A total of ten sonnets collected and encoded according to the metrical interpretation of the verse by Mubina Islam, August 2004. This document was created as part of a Master's dissertation on the markup of poetic metre, for the course MA Electronic Communication and Publishing at UCL.

Each sonnet has been divided into the stanzaic line groupings.

Line groups have been further divided to mark individual lines of verse.

Segmentation tags have been used to represent the division of each line into metrical feet.

The metrical interpretation of the text, defined with the segmentation of the text into units of feet, was added by hand by the encoder. This has not been checked and may be subject to alternative readings.

All punctuation marks, excluding dashes or hyphenation, have been encoded as entities.

Caesuras and line enjambement have been recorded in this document as accurately as possible by the encoder.

+-
-+
++
--
-+-
--+
metrical prominence
metrical non-prominence
foot boundary
metrical line boundary

Metrically prominent syllables are marked '+' and other syllables '-'. Foot divisions are marked by a vertical bar, and line divisions with a solidus.

This notation may be applied to any metrical unit, of any size (including, for example, individual feet as well as groups of lines).

The 'real' attribute has been used to indicate possible variations in the iambic base metre. Where this attribute is not included, it is assumed each foot inherits the iambic metre defined for the overall division of text.

The 'met' attribute has been used in feet which have a missing or additional syllable rather than the two syllables expected, although the line may still conform to the metre of the poem.

Adapted from a TEI P4 XML encoding by Mubina Islam (Islam 2004) of William Shakespeare’s poem Sonnet 17 (Shakespeare 1978). TEI source not publicly available.
Walt Whitman: After the Argument

This example contains the TEI header of the digital edition of a manuscript draft of After the Argument, a poem by Walt Whitman. It was encoded and made available by the Walt Whitman Archive.

This TEI header contains a detailed description of the electronic text in fileDesc. Apart from the required subsections, the edition of the electronic text is identified briefly in editionStmt. The notesStmt element contains a general remark about the dating of the manuscript.

Besides the file description, the header contains a detailed account of the file’s history in revisionDesc.

For the header of a manuscript transcription, however, one would have expected at least an encodingDesc documenting how the electronic version relates to the source text. Seen in isolation, this header falls short of explaining the editorial choices (which are, however, referred to in the revisionDesc). Of course, this text probably features in the wider context of the Walt Whitman Archive, where uniform encoding practices were used for all texts. Still, without repeating boilerplate information in each text of the archive, it would have made sense to provide an editorialDecl section with at least pointers to the external documentation of these practices available at and . Furthermore, as the transcription records editorial phenomena (additions, deletions, substitutions) in considerable detail, identifying the different document hands in profileDesc would have been useful.

(Of course, these are only minor remarks, relative to the quality of the surrounding documentation of the archive in which this text is embedded. Yet, even if such external documentation exists, it makes sense to provide pointers in the document.)
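
The two suggestions made above could be realised along the following lines. This is only a sketch: the URL, the xml:id values and the wording of the notes are placeholders invented for illustration and do not describe the actual Whitman manuscript or the Archive's documentation.

```xml
<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <!-- fileDesc is left out for brevity; a valid header requires it -->
  <encodingDesc>
    <editorialDecl>
      <!-- placeholder URL standing in for the external encoding documentation -->
      <p>Editorial practice follows the project-wide encoding guidelines, documented at
        <ref target="https://example.org/encoding-guidelines">the project documentation</ref>.</p>
    </editorialDecl>
  </encodingDesc>
  <profileDesc>
    <handNotes>
      <!-- placeholder descriptions: replace with the hands actually found in the manuscript -->
      <handNote xml:id="h1" scope="major">Placeholder: describe the principal hand here.</handNote>
      <handNote xml:id="h2" scope="minor">Placeholder: describe any secondary hand here.</handNote>
    </handNotes>
  </profileDesc>
</teiHeader>
```

Additions and deletions in the transcription could then refer to these declarations through a hand attribute such as hand="#h1", keeping the description of each hand in one place.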

After the Argument a machine readable transcription Walt Whitman Ken Price Ed Folsom Transcription and encoding Nicole Gray Andrew Jewell Kenneth M. Price Brett Barney Zach Bajaber Nick Krauter Melissa Sinner Justin St. Clair The Institute for Advanced Technology in the Humanities University of Iowa University of Nebraska-Lincoln The National Endowment for the Humanities The United States Department of Education 2002 The Walt Whitman Archive loc.00001
The Institute for Advanced Technology in the Humanities Alderman Library University of Virginia P.O. Box 400115 Charlottesville, VA 22904-4115 whitman@jefferson.village.virginia.edu

The text of the original item is in the public domain. The text encoding and editorial notes were created and/or prepared by the Walt Whitman Archive and are licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Any reuse of the material should credit the Walt Whitman Archive.

"After the Argument" was published first in Lippincott's Magazine, March, 1891. This manuscript was likely written in 1890 or early 1891, shortly before the poem's publication. Walt Whitman After the Argument 1890 or 1891 The Charles E. Feinberg Collection of the Papers of Walt Whitman, 1839–1919, Library of Congress, Washington, D.C. Transcribed from The Walt Whitman Archive I: Whitman Manuscripts at the Library of Congress, ed. Joel Myerson (New York: Garland, 1993), 1:121; Major American Authors on CD-Rom: Walt Whitman (Westport, CT: Primary Source Media, 1997). The transcription was then checked against digital images of the original.
corrected
converted, added schematron declaration
Added third digit to leaf numbers
Addition of Date and Work Markup
Addition of Date and Work Markup
Updated closer/signature
Conversion to camel-case
Blessed
Updated to current practice
Checked by editor
Revised
Encoded
Transcribed
Encoding of a manuscript of Walt Whitman’s poem After the Argument (1890). TEI XML source available from .
Oscar Wilde: The Importance of Being Earnest

This example contains the TEI header for an electronic edition of Oscar Wilde’s The Importance of Being Earnest, encoded and made available by Corpus of Electronic Texts (CELT), a project of University College, Cork.

This is an excellent TEI header example, featuring quality descriptions of the electronic text (fileDesc), its relation to the source text (encodingDesc), the context in which it came about (profileDesc), and a revision history (revisionDesc).

An outstanding feature of this example is the level of detail of the bibliographic description of the source text, in sourceDesc. It contains a complete bibliography in three sections: select editions, select bibliography, and the edition used in the digital edition. The former two categories consist of bibliographic lists, with a listBibl element grouping the separate bibl elements. The actual edition used for the electronic text is described in detail with a biblStruct element.
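
The three-part structure described above might be encoded in TEI P5 roughly as follows. The bibliographic data are taken from the header reproduced in this example, with one entry shown per list; the exact nesting and the attribute values (level, unit, from, to, when) are assumptions of this sketch, as the original is a TEI P3 SGML file.

```xml
<sourceDesc xmlns="http://www.tei-c.org/ns/1.0">
  <listBibl>
    <head>Select editions</head>
    <!-- loosely structured entries: bibl -->
    <bibl>Complete works of Oscar Wilde (Glasgow: HarperCollins, 1994).</bibl>
  </listBibl>
  <listBibl>
    <head>Select bibliography</head>
    <bibl>Richard Ellmann, Oscar Wilde: a biography (London: Hamilton 1987).</bibl>
  </listBibl>
  <listBibl>
    <head>The edition used in the digital edition</head>
    <!-- fully structured entry: biblStruct separates the play from the volume containing it -->
    <biblStruct>
      <analytic>
        <author>Oscar Wilde</author>
        <title level="a">The Importance of Being Earnest</title>
      </analytic>
      <monogr>
        <title level="m">Plays, Prose Writings and Poems</title>
        <imprint>
          <pubPlace>London</pubPlace>
          <publisher>Everyman</publisher>
          <date when="1930">1930</date>
          <biblScope unit="page" from="450" to="509">450-509</biblScope>
        </imprint>
      </monogr>
    </biblStruct>
  </listBibl>
</sourceDesc>
```

Using bibl for the reading lists and biblStruct for the copy text keeps the effort proportional to need: only the entry that a processor is likely to cite or index is broken down into its components.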

The Importance of Being Earnest A trivial comedy for serious people An electronic edition Oscar Wilde Electronic edition compiled by Margaret Lantry University College, Cork Second draft. Proof corrections by Margaret Lantry 23410 CELT: Corpus of Electronic Texts: a project of University College, Cork
College Road, Cork, Ireland—http://www.ucc.ie/celt
1997 2008 CELT online at University College, Cork, Ireland. E850003-002

Available with prior consent of the CELT programme for purposes of academic research and teaching only.

There is not as yet an authoritative edition of Wilde's works.
Select editions
The writings of Oscar Wilde (London; New York: A. R. Keller & Co. 1907) 15 vols.
Robert Ross (ed), The First Collected Edition of the Works of Oscar Wilde (London: Methuen & Co. 1908). 15 vols. Reprinted Dawsons: Pall Mall 1969.
Complete works of Oscar Wilde (Glasgow: HarperCollins, 1994).
Select bibliography
'Notes for a bibliography of Oscar Wilde', Books and book-plates (A quarterly for collectors) 5, no. 3 (April 1905), 170-183.
Karl E. Beckson, The Oscar Wilde encyclopedia (New York: AMS Press 1998). AMS Studies in the nineteenth century 18.
Richard Ellmann (ed), The Artist as Critic: Critical Writings of Oscar Wilde (Chicago 1982).
Richard Ellmann; John Espey, Oscar Wilde: two approaches: papers read at a Clark Library seminar, April 17, 1976 (Los Angeles: William Andrews Clark Memorial Library, University of California 1977).
Richard Ellmann, Oscar Wilde at Oxford: a lecture delivered at the Library of Congress on March 1, 1983 (Washington, DC: Library of Congress 1984).
Richard Ellmann, Oscar Wilde: a biography (London: Hamilton 1987).
Juliet Gardiner, Oscar Wilde: a life in letters, writings and wit (Dublin: Gill & Macmillan 1995).
Frank Harris, Oscar Wilde, including My memories of Oscar Wilde, by George Bernard Shaw and an introductory note by Lyle Blair (London: Robinson, 1992).
Rupert Hart-Davis (ed), Selected letters of Oscar Wilde (Oxford: Oxford University Press 1979).
Rupert Hart-Davis (ed), More letters of Oscar Wilde (London: Murray 1985).
Vyvyan Beresford Holland, Oscar Wilde: a pictorial biography (London: Thames & Hudson 1960).
H. Montgomery Hyde, Oscar Wilde: a biography (London: Methuen 1977).
Andrew McDonnell, Oscar Wilde at Oxford: an annotated catalogue of Wilde manuscripts and related items at the Bodleian Library, Oxford, including many hitherto unpublished letters, photographs and illustrations (A. McDonnell 1996). Limited edition of 170 copies.
Stuart Mason, Bibliography of Oscar Wilde (London: E. G. Richards 1907). Also pubd. New York 1908, London 1914 in 2 vols. Repr. of 1914 edition: New York: Haskell House 1972.
E. H. Mikhail, Oscar Wilde: an annotated bibliography of criticism (London: Macmillan 1978). Also pubd. Totowa NJ: Rowman & Littlefield 1978.
Thomas A. Mikolyzk, Oscar Wilde: an annotated bibliography (Westport CT: Greenwood Press 1993). Bibliographies and indexes in world literature, 38.
Norman Page, An Oscar Wilde chronology (London: Macmillan 1991).
Hesketh Pearson, A Life of Oscar Wilde (London 1946).
Richard Pine, The thief of reason: Oscar Wilde and modern Ireland (Dublin: Gill & Macmillan 1996).
Horst Schroeder, Additions and corrections to Richard Ellmann's Oscar Wilde (Braunschweig: H. Schroeder 1989)
The edition used in the digital edition
Oscar Wilde The Importance of Being Earnest Plays, Prose Writings and Poems London Everyman 1930 450-509

CELT: Corpus of Electronic Texts

All the editorial text with the corrections of the editor has been retained.

Text has been checked, proof-read and parsed using NSGMLS.

The electronic text represents the edited text.

Direct speech is marked q.

The editorial practice of the hard-copy editor has been retained.

div0=the whole text.

Names of persons (given names), and places are not tagged. Terms for cultural and social roles are not tagged.

The n attribute of each text in this corpus carries a unique identifying number for the whole text.

The title of the text is held as the first head element within each text.

div0 is reserved for the text (whether in one volume or many).

By Oscar Wilde (1854-1900).
1895
The text is in English.
One word occurring twice in Anglo-French.
Keywords added; file validated; new wordcount made.
Minor changes made to header.
Normalised language codes and edited langUsage for XML conversion
Converted to XML
Text parsed using NSGMLS.
Proof corrections entered and mark-up corrected; text spell-checked.
Text proofed.
Header created; structural mark-up inserted.
Text captured by scanning.
Adapted from a TEI P3 SGML encoding of Oscar Wilde’s play The Importance of Being Earnest, in the anthology Plays, Prose Writings and Poems by Oscar Wilde (Wilde 1930). TEI SGML source available from .
Erasmus, Desiderius. 1867-1872. Desiderii Erasmi Roterodami colloquia familiaria. Lipsiae: sumptibus Ottonis Holtze. Encoded and made available by the Stoa Consortium, University of Kentucky at .
Higginson, Thomas Wentworth. 1885. Letter of November 7, 1885. Encoded and made available by the Lincoln Electronic Text Center of the University of Nebraska at .
Islam, Mubina. 2004. A Selection of Sonnets: electronic edition encoded in XML with a TEI DTD. Unpublished Master’s Dissertation, London: University College London.
Marlowe, Christopher. 1616. The Tragedie of Doctor Faustus. Encoded and made available by the Perseus Digital Library. Available online at .
Shakespeare, William. 1978. The Complete Works of William Shakespeare. Edited by Peter Alexander. London: Collins.
Whitman, Walt. 1890. After the Argument. Manuscript encoded and made available by the Walt Whitman Archive at .
Wilde, Oscar. 1930. The Importance of Being Earnest. In: Plays, Prose Writings and Poems. London: Everyman. Encoded and made available by CELT: Corpus of Electronic Texts: a project of University College, Cork. Available online at .