TEI by Example. Module 2: The TEI Header
Edward Vanhoutte, Ron Van den Branden, Melissa Terras
Association for Literary and Linguistic Computing (ALLC)
Centre for Digital Humanities (CDH), University College London, UK
Centre for Computing in the Humanities (CCH), King's College London, UK
Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium
Centre for Scholarly Editing and Document Studies (CTB) Royal Academy of Dutch Language and Literature Koningstraat 18 9000 Gent Belgium
ctb@kantl.be

Licensed under a Creative Commons Attribution ShareAlike 3.0 License

9 July 2010
TEI By Example. Edited by Edward Vanhoutte, Ron Van den Branden, and Melissa Terras.

Digitally born

TEI By Example offers a series of freely available online tutorials walking individuals through the different stages in marking up a document in TEI (Text Encoding Initiative). Besides a general introduction to text encoding, step-by-step tutorial modules provide example-based introductions to eight different aspects of electronic text markup for the humanities. Each tutorial module is accompanied by a dedicated examples section, illustrating actual TEI encoding practice with real-life examples. The theory of the tutorial modules can be tested in interactive tests and exercises.

en-GB
Revision history: corrected significant typo (biblStruct for biblFull), removed ref around gi; release; corrected typos + examples; creation.
Examples for Module 2: The TEI Header
Desiderius Erasmus: Colloquia familiaria

This example features the TEI header for the transcription of Colloquia familiaria, a series of colloquia written by Desiderius Erasmus. They are encoded and made available by the Stoa Consortium, University of Kentucky.

This is an excellent example of a TEI header. The file description provides the minimally required information about the title of the electronic text and the responsibilities for it, its publication, and its source. Editorial principles are documented in encodingDesc, which also contains a statement about sampling decisions in samplingDecl (see the TEI Guidelines, section 2.3.2 The Sampling Declaration). It also contains a formal declaration of a reference system, for which it makes use of refState elements (see the TEI Guidelines, section 2.3.5.3 Milestone Method). Two classification systems are declared in classDecl: Library of Congress Subject Headings and Library of Congress Classification. The next header section, profileDesc, contains the actual classification of the text according to both systems, in textClass. This is a nice illustration of two classification strategies: using natural language keywords (keywords) or abstract classification codes (classCode). The languages of the text are formally declared in langUsage. Finally, a complete revision history is available in revisionDesc.
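
The interplay between the declaration of the two classification systems and their use in textClass could look roughly as follows. This is a minimal sketch in TEI P5 syntax, not the Stoa Consortium's actual markup: the xml:id values lcsh and lcc are hypothetical, while the subject headings and the classification code are the ones that appear in this header.

```xml
<teiHeader>
  <!-- fileDesc omitted from this sketch -->
  <encodingDesc>
    <classDecl>
      <!-- the xml:id values are assumptions, chosen for illustration -->
      <taxonomy xml:id="lcsh">
        <bibl>Library of Congress Subject Headings</bibl>
      </taxonomy>
      <taxonomy xml:id="lcc">
        <bibl>Library of Congress Classification</bibl>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <textClass>
      <!-- strategy 1: natural language keywords -->
      <keywords scheme="#lcsh">
        <term>Dialogues, Latin (Medieval and modern)</term>
        <term>Folly -- Early works to 1800</term>
      </keywords>
      <!-- strategy 2: an abstract classification code -->
      <classCode scheme="#lcc">PA8501</classCode>
    </textClass>
  </profileDesc>
</teiHeader>
```

The scheme attributes point back to the taxonomies declared in classDecl, so that a processor can tell which classification system each keyword or code belongs to.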

Colloquia familiaria Desiderius Erasmus Encoded by Jennifer K. Nelson Gretche Ermer A. Ross Scaife Cultural Heritage Language Technologies NSF-EU Lexington, KY Stoa Consortium
Department of Modern and Classical Languages, Literatures and Cultures 1055 Patterson Office Tower University of Kentucky Lexington, KY 40506-0027 url:http://www.stoa.org
2002-09-05
Erasmus, Desiderius, d. 1536 Desiderii Erasmi Roterodami colloquia familiaria ad optimarum editionum fidem diligenter emendata, cum succincta difficiliorum explanatione Ed. stereotypa Lipsiae sumptibus Ottonis Holtze 1867-1872 771 p. (2 vols) ; 15 cm.

Editorial notes in the Holtze edition have not been reproduced

Original punctuation conventions in the Holtze edition have been retained as much as possible

Hyphenated words that appear at the end of the line in the Holtze edition have been reformed

Italics are recorded without interpretation

Library of Congress Subject Headings Library of Congress Classification
Latin Ancient Greek Dialogues, Latin (Medieval and modern) Folly -- Early works to 1800 PA8501 [ed] added text of "De utilitate colloquiorum ad lectorem" [ed] expanded TEI header to include more information, e.g. LC subject headings and LC classification [markup]: began tei-compliant markup
Based on a TEI P4 XML encoding of Erasmus, Desiderius (1867-1872). Desiderii Erasmi Roterodami colloquia familiaria. Lipsiae: sumptibus Ottonis Holtze. Encoded and made available by the Stoa Consortium, University of Kentucky, at http://www.stoa.org/hopper/text.jsp?doc=Stoa:text:2003.02.0006
Thomas Wentworth Higginson: Letter of 7 November 1885

This example shows the TEI header of the digital edition of a letter of 7 November 1885 by the American minister and writer Thomas Wentworth Higginson, encoded and made available by the Lincoln Electronic Text Center of the University of Nebraska.

This TEI header provides detailed documentation about the electronic text in fileDesc. The title statement identifies not only the people responsible for transcription and markup, but also those responsible for the technical processing of the letters by means of stylesheets. The extent section still needs to be completed; of course, this can only be done after completion of the encoding. Note the detailed statement of availability in availability. The source text in which this letter has been published is described using the biblFull element; note how its sections mirror those of the file description in the TEI header of the electronic text (apart from the sourceDesc section). The notesStmt seems to be used to record some loose annotations about the source text.
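
How a biblFull source description mirrors the sections of a file description can be sketched as follows. This is an illustrative outline in TEI P5 syntax only: the assignment of the source information to particular elements is an assumption, and the original P4 markup may well distribute it differently.

```xml
<sourceDesc>
  <biblFull>
    <!-- same section names and order as in fileDesc -->
    <titleStmt>
      <title>Carlton and Territa Lowenberg Collection, Archives and Special Collections, University of Nebraska-Lincoln Libraries</title>
      <author>Lowenberg, Carlton; Lowenberg, Territa A.</author>
    </titleStmt>
    <publicationStmt>
      <publisher>University of Nebraska-Lincoln Libraries</publisher>
      <date>2001</date>
    </publicationStmt>
    <notesStmt>
      <!-- a loose annotation about the source text -->
      <note>No other plays by Miss Parker have come to light.</note>
    </notesStmt>
  </biblFull>
</sourceDesc>
```

Unlike bibl or biblStruct, biblFull reuses the header's own statements (titleStmt, publicationStmt, notesStmt), which is convenient when the source itself is already described in a TEI-like fashion.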

The encoding description section only contains a description of the editorial practice in editorialDecl. This is done in a prose paragraph. The header is concluded by a minimal revision description, recording only one change.

Correspondence of Thomas Wentworth Higginson, 1865-1910 Thomas Wentworth Higginson Transcribed by Melissa Sinner Encoded by Margaret Mering Laura Weakly Stylesheet created by Brian L. Pytlik Zillig Commentary on the Letters by Linda Ray Pratt *** kb LC1885k07 University of Nebraska-Lincoln Electronic Text Center

This work is the property of the University of Nebraska-Lincoln. It may be copied freely by individuals for personal use, research, and teaching (including distribution to classes) as long as this statement of availability is included in the text. It may be linked to freely in Internet editions of all kinds, including for-profit works.

Publishers, libraries, and other information providers interested in providing this text in a commercial or non-profit product or from an information server must contact the University of Nebraska-Lincoln for licensing and cost information.

Scholars interested in changing or adding to these texts by, for example, creating a new edition of the text (electronically or in print) with substantive editorial changes, may do so with the permission of the University of Nebraska-Lincoln. This is the case whether the new publication will be made available at a cost or free of charge.

2001
Carlton and Territa Lowenberg Collection, Archives and Special Collections, University of Nebraska-Lincoln Libraries. Lowenberg, Carlton; Lowenberg, Territa A., 1825-1991. Archives and Special Collections, Electronic Text Center University of Nebraska-Lincoln Libraries 2001
Electronic Text Center 319 Love Library P.O. Box 884100 University of Nebraska-Lincoln Lincoln, NE 68588-4100
Nov. 7, 1885 No other plays by Miss Parker have come to light.

Line breaks, paragraph breaks, and indentations have been preserved within the transcription. The layout of the page has been preserved whenever possible. Abbreviations and spellings have been maintained within the transcriptions, and the full word and corrected or modern spellings have been provided. Images of the original letters have been provided in order to show the original page layout and other markings that are not the author's. Such markings include letter head, postcards, postal stamps, etc. and writing by other people. Words or phrases that are deemed indecipherable have been noted as "unclear." To provide further information as to the context of a particular letter, notations have been provided.

Add and revise header info, change lbs and divs
Based on a TEI P4 XML encoding of a letter by Thomas Wentworth Higginson (1885). Encoded and made available by the Lincoln Electronic Text Center of the University of Nebraska at http://higginson.unl.edu/letters/LC1885k07.html
Christopher Marlowe: The Tragedie of Doctor Faustus (B text)

This example contains the TEI header of the digital edition of Christopher Marlowe's The Tragedie of Doctor Faustus (B text), encoded and made available by the Perseus Digital Library.

This TEI header provides decent descriptions of the publication details of the electronic text (publicationStmt), and the languages occurring in the text (langUsage). A reference system is declared in the encodingDesc section of the header, using refState elements (see the TEI Guidelines, section 2.3.5.3 Milestone Method).

The revision description is interesting both in a positive and a negative way. It clearly contains a detailed list of changes. The list seems to be generated by an automated versioning system, which makes it possible to keep complete track of a file's historical states and to document changes with log messages. Integrating automated revision control in the revisionDesc section of the TEI header is an interesting idea, as it combines processability and expressiveness. At the encoding level, however, this integration could be improved. In this case, a single change element is (ab)used to record the complete revision history. If the output of the version control system were formatted as distinct change elements, one per revision (either directly or via a post-processing step), the information would comply much better with the semantics of the TEI header.
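
As an illustration of this suggestion, the first three entries of the version control log in this header could be recast as separate change elements. This is a sketch in TEI P5 syntax, not Perseus's markup: the use of the n attribute for the revision number and the who pointers (which assume that the user names are declared elsewhere with matching xml:id values) are assumptions.

```xml
<revisionDesc>
  <!-- one change element per revision, most recent first -->
  <change n="1.2" when="2004-04-22T14:24:57" who="#cwulfman">*** empty log message ***</change>
  <change n="1.1" when="2004-04-22T13:55:24" who="#cwulfman">Making xml files the canonical ones.</change>
  <change n="1.11" when="2003-07-01T22:14:53" who="#yorkc">Updated texts to TEI P4 and Perseus P4 extensions; minor cleanup (esp. character encodings and typos.)</change>
</revisionDesc>
```

With the date, the responsible person, and the log message in separate places, each revision can be sorted, filtered, or displayed on its own.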

One essential point of criticism concerns the missing description of the source document in sourceDesc. In this case, the title and author of the source work (which can be gathered from the information in the titleStmt subsection) still provide clues to its origin, but this would be much harder for lesser-known texts. It is reasonable to suppose that the source texts of the files in the Perseus Digital Library are documented externally, but then the TEI headers of these files should at least contain a pointer to these resources.
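
A minimal source description for this text might look like the following sketch. It only restates what can be gathered from the title statement and from the citation accompanying this example (author, title, and the date 1616); it is a hypothetical reconstruction and does not describe the actual copy used by Perseus.

```xml
<sourceDesc>
  <bibl>
    <author>Christopher Marlowe</author>
    <title>The Tragedie of Doctor Faustus</title>
    <!-- date taken from the citation of the B text accompanying this example -->
    <date when="1616">1616</date>
  </bibl>
</sourceDesc>
```

If the source is documented externally, a ref or ptr element pointing to that documentation could be added inside the bibl.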

The Tragedie of Doctor Faustus (B text) Christopher Marlowe Hilary Binda Perseus Project, Tufts University Gregory Crane Prepared under the supervision of Lisa Cerrato William Merrill Elli Mylonas David Smith Tufts University Trustees of Tufts University Medford, MA Perseus Project

This text may be freely distributed, subject to the following restrictions: You credit Perseus, as follows, whenever you use the document: Text provided by Perseus Digital Library, with funding from Tufts University. Original version available for viewing and download at http://www.perseus.tufts.edu/hopper/. You leave this availability statement intact. You use it for non-commercial purposes only. You offer Perseus any modifications you make.

English Latin Greek Italian $Log: marl.faustb.xml,v $ Revision 1.2 2004/04/22 14:24:57 cwulfman *** empty log message *** Revision 1.1 2004/04/22 13:55:24 cwulfman Making xml files the canonical ones. Revision 1.11 2003/07/01 22:14:53 yorkc Updated texts to TEI P4 and Perseus P4 extensions; minor cleanup (esp. character encodings and typos.) Revision 1.10 2000/04/27 23:22:22 dasmith Hopperized TEI header. Fixed typos. Revision 1.9 1999/09/01 17:15:34 dasmith Fixed preamble and added encodingDesc. Revision 1.8 1997/09/11 16:00:37 textgod Updated for nsgmls. Revision 1.7 1997/07/02 21:57:11 textgod Added CASTLIST HEAD Revision 1.6 1997/06/30 21:57:09 textgod Added group for the prologue. Revision 1.5 1997/06/25 14:51:39 textgod Fixed broken NAME tag. Revision 1.4 1997/06/25 14:40:00 textgod Added log messages to file.
Based on a TEI P4 XML encoding of Marlowe, Christopher (1616). The Tragedie of Doctor Faustus. Encoded and made available by the Perseus Digital Library. Available online at http://www.perseus.tufts.edu/hopper/text?doc=Perseus:text:1999.03.0011
William Shakespeare: Sonnet 17

The following example illustrates the TEI header for a sonnet by William Shakespeare, containing a detailed metrical analysis of the poem. Both the electronic text and its source are bibliographically described in the fileDesc section. The text encoding process is described in encodingDesc, providing details about the encoding project (projectDesc), the editorial policy (editorialDecl), and the system used to analyse the metre of the poem (metDecl). Note how the editorialDecl subsection had to be repeated, as it documents both features that can be encoded in a TEI category (segmentation and interpretation) and features for which no such TEI labels are available (p). The standard TEI scheme does not allow both systems (formal and informal) to be mixed, hence the repetition of the editorialDecl section. The same goes for the metDecl sections: as both a formal (metSym) and an informal (p) description is provided for the metrical system, repeating the metDecl element was the easiest solution. Of course, this could have been addressed as well by adapting the TEI schema.
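
The repetition of metDecl can be sketched as follows, in TEI P5 syntax. The symbols, their glosses, and the prose description are taken from this header; the foot names in the value attributes (trochee, iamb, and so on) are assumptions based on common prosodic terminology, as they are not visible in the text reproduced here.

```xml
<encodingDesc>
  <!-- formal declaration: metSym elements only -->
  <metDecl>
    <metSym value="trochee" terminal="false">+-</metSym>
    <metSym value="iamb" terminal="false">-+</metSym>
    <metSym value="spondee" terminal="false">++</metSym>
    <metSym value="pyrrhic" terminal="false">--</metSym>
    <metSym value="amphibrach" terminal="false">-+-</metSym>
    <metSym value="anapaest" terminal="false">--+</metSym>
    <metSym value="+">metrical prominence</metSym>
    <metSym value="-">metrical non-prominence</metSym>
    <metSym value="|">foot boundary</metSym>
    <metSym value="/">metrical line boundary</metSym>
  </metDecl>
  <!-- informal declaration: prose only, hence a second metDecl -->
  <metDecl>
    <p>Metrically prominent syllables are marked '+' and other syllables '-'. Foot divisions are marked by a vertical bar, and line divisions with a solidus.</p>
  </metDecl>
</encodingDesc>
```

Because the content model of metDecl accepts either prose or a series of metSym elements, but not a mixture, the two kinds of description end up in separate elements.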

A Selection of Sonnets: electronic edition encoded in XML with a TEI DTD William Shakespeare Transcribed and encoded by Mubina Islam 64 KB University College London London The Complete Works of William Shakespeare William Shakespeare Peter Alexander Collins London 1978 0-00-435634-9

A total of ten sonnets collected and encoded according to the metrical interpretation of the verse by Mubina Islam, August 2004. This document was created as part of a Master's dissertation on the markup of poetic metre, for the course MA Electronic Communication and Publishing at UCL.

Each sonnet has been divided into the stanzaic line groupings.

Line groups have been further divided to mark individual lines of verse.

Segmentation tags have been used to represent the division of each line into metrical feet.

The metrical interpretation of the text, defined with the segmentation of the text into units of feet, was added by hand by the encoder. This has not been checked and may be subject to alternative readings.

All punctuation marks, excluding dashes or hyphenation, have been encoded as entities.

Caesuras and line enjambement have been recorded in this document as accurately as possible by the encoder.

+- -+ ++ -- -+- --+ metrical prominence metrical non-prominence foot boundary metrical line boundary

Metrically prominent syllables are marked '+' and other syllables '-'. Foot divisions are marked by a vertical bar, and line divisions with a solidus.

This notation may be applied to any metrical unit, of any size (including, for example, individual feet as well as groups of lines).

The 'real' attribute has been used to indicate possible variations in the iambic base metre. Where this attribute is not included, it is assumed each foot inherits the iambic metre defined for the overall division of text.

The 'met' attribute has been used in feet which have a missing or additional syllable rather than the two syllables expected, although the line may still conform to the metre of the poem.

Based on a TEI P4 XML encoding of Islam, Mubina (2004). A Selection of Sonnets: electronic edition encoded in XML with a TEI DTD. Unpublished Master's Dissertation, London: University College London (based on Alexander, Peter (1978) The Complete Works of William Shakespeare. London: Collins.).
Walt Whitman: After the Argument

This example contains the TEI header of the digital edition of a manuscript draft of After the Argument, a poem by Walt Whitman. It was encoded and made available by the Walt Whitman Archive.

This TEI header contains a detailed description of the electronic text in fileDesc. Apart from the required subsections, the edition of the electronic text is identified briefly in editionStmt. The notesStmt element contains a general remark about the dating of the manuscript.

Besides the file description, the header contains a detailed account of the file's history in revisionDesc.

For the header of a manuscript transcription, however, one would have expected at least an encodingDesc, documenting how the electronic version relates to the source text. When this text is seen in isolation, the header falls short in explaining the editorial choices (which are nevertheless referred to in the revisionDesc). Of course, this text probably features in the wider context of the Walt Whitman Archive, where uniform encoding practices were used for all texts. Still, without repeating boilerplate information in each text of the archive, it would have made sense to provide an editorialDecl section with at least pointers to the external documentation of these practices, available at http://www.whitmanarchive.org/about/editorial.html and http://www.whitmanarchive.org/mediawiki/index.php/Whitman_Encoding_Guidelines. Furthermore, as the transcription is fairly detailed in the recording of editorial phenomena (additions, deletions, substitutions), identification of the different document hands in profileDesc could have made sense.
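
The additions suggested here could be as small as the following sketch, in TEI P5 syntax. The two URLs are the ones cited above; the wording of the paragraph, the xml:id value ww, and the description of the hand are hypothetical and are not taken from the Walt Whitman Archive.

```xml
<teiHeader>
  <!-- fileDesc omitted from this sketch -->
  <encodingDesc>
    <editorialDecl>
      <!-- pointers to external documentation instead of repeated boilerplate -->
      <p>This transcription follows the editorial policy of the Walt Whitman Archive
        (<ref target="http://www.whitmanarchive.org/about/editorial.html">editorial policy statement</ref>)
        and its encoding guidelines
        (<ref target="http://www.whitmanarchive.org/mediawiki/index.php/Whitman_Encoding_Guidelines">Whitman Encoding Guidelines</ref>).</p>
    </editorialDecl>
  </encodingDesc>
  <profileDesc>
    <handNotes>
      <!-- hypothetical identification of the main hand -->
      <handNote xml:id="ww" scope="major">The hand of Walt Whitman.</handNote>
    </handNotes>
  </profileDesc>
</teiHeader>
```

Additions and deletions in the transcription could then refer to a declared hand through their hand attribute, for instance hand="#ww".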

(Of course, these are only minor remarks, relative to the quality of the surrounding documentation of the archive in which this text is embedded. Yet, even if such external documentation exists, it makes sense to provide pointers in the document.)

After the Argument a machine readable transcription Walt Whitman Ken Price Ed Folsom Transcription and encoding the Walt Whitman Archive staff Andrew Jewell Brett Barney Zach Bajaber Melissa Sinner The Institute for Advanced Technology in the Humanities University of Iowa University of Nebraska-Lincoln The National Endowment for the Humanities The United States Department of Education 2002 loc.00001 The Walt Whitman Archive
The Institute for Advanced Technology in the Humanities Alderman Library University of Virginia P.O. Box 400115 Charlottesville, VA 22904-4115 whitman@jefferson.village.virginia.edu

Copyright © 2001 by Ed Folsom and Kenneth M. Price, all rights reserved. Items in the Archive may be shared in accordance with the Fair Use provisions of U.S. copyright law. Redistribution or republication on other terms, in any medium, requires express written consent from the editors and advance notification of the publisher, The Institute for Advanced Technology in the Humanities. Permission to reproduce the graphic images in this archive has been granted by the owners of the originals for this publication only.

This manuscript was likely written in 1890 or early 1891, shortly before the poem's publication. Walt Whitman After the Argument 1890 or 1891 The Charles E. Feinberg Collection of the Papers of Walt Whitman, Library of Congress, Washington, DC Transcribed from Joel Myerson, ed. The Walt Whitman Archive I: Whitman Manuscripts at the Library of Congress, New York: Garland, 1993, Part I: 121; Major American Authors on Cd-Rom: Walt Whitman, Westport, CT: Primary Source Media, 1997; our own digital image of original manuscript.
Addition of Date and Work Markup Updated closer/signature Conversion to camel-case Blessed Updated to current practice Checked by editor Revised Encoded Transcribed
Based on a TEI P4 XML encoding of Whitman, Walt, After the Argument, a manuscript encoded and made available by the Walt Whitman Archive at http://www.whitmanarchive.org/manuscripts/transcriptions/loc.00001.html.
Oscar Wilde: The Importance of Being Earnest

This example contains the TEI header for an electronic edition of Oscar Wilde's The Importance of Being Earnest, encoded and made available by Corpus of Electronic Texts (CELT), a project of University College, Cork.

This is an excellent TEI header example, featuring quality descriptions of the electronic text (fileDesc), its relation to the source text (encodingDesc), the context in which it came about (profileDesc), and a revision history (revisionDesc).

An outstanding feature of this example is the level of detail of the bibliographic description of the source text, in sourceDesc. It contains a complete bibliography, in three sections: select editions, select bibliography, and the edition used in the digital edition. The former two categories consist of bibliographic lists, with a listBibl element grouping the separate bibl elements. The actual edition used for the electronic text is described in detail with a biblStruct element.
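
The difference between the loosely structured bibl entries and the structured biblStruct description can be sketched as follows. The bibliographic data are taken from this header, but the markup is an illustrative reconstruction in TEI P5 syntax, not CELT's original P3 encoding; the level attributes and the placement of the page range are assumptions.

```xml
<sourceDesc>
  <listBibl>
    <head>Select editions</head>
    <!-- bibl: free mix of text and bibliographic elements -->
    <bibl><title>Complete works of Oscar Wilde</title> (<pubPlace>Glasgow</pubPlace>: <publisher>HarperCollins</publisher>, <date>1994</date>).</bibl>
  </listBibl>
  <listBibl>
    <head>The edition used in the digital edition</head>
    <!-- biblStruct: fixed structure, separating the play from the volume containing it -->
    <biblStruct>
      <analytic>
        <author>Oscar Wilde</author>
        <title level="a">The Importance of Being Earnest</title>
      </analytic>
      <monogr>
        <title level="m">Plays, Prose Writings and Poems</title>
        <imprint>
          <pubPlace>London</pubPlace>
          <publisher>Everyman</publisher>
          <date>1930</date>
          <biblScope unit="page">450-509</biblScope>
        </imprint>
      </monogr>
    </biblStruct>
  </listBibl>
</sourceDesc>
```

The analytic level describes the individual work, while the monogr level describes the volume in which it was published.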

The Importance of Being Earnest A trivial comedy for serious people An electronic edition Oscar Wilde compiled by Margaret Lantry University College, Cork First draft, revised and corrected. Proof corrections by Margaret Lantry 19 648 CELT: Corpus of Electronic Texts: a project of University College, Cork
College Road, Cork, Ireland.
1997 CELT online at University College, Cork, Ireland. E850003.002

Available with prior consent of the CELT programme for purposes of academic research and teaching only.

There is not as yet an authoritative edition of Wilde's works. Select editions. The writings of Oscar Wilde (London; New York: A. R. Keller & Co. 1907) 15 vols. Robert Ross (ed), The First Collected Edition of the Works of Oscar Wilde (London: Methuen & Co. 1908). 15 vols. Reprinted Dawsons: Pall Mall 1969. Complete works of Oscar Wilde (Glasgow: HarperCollins, 1994). Select bibliography. 'Notes for a bibliography of Oscar Wilde', Books and book-plates (A quarterly for collectors) 5, no. 3 (April 1905), 170-183. Karl E. Beckson, The Oscar Wilde encyclopedia (New York: AMS Press 1998). AMS Studies in the nineteenth century 18. Richard Ellmann (ed), The Artist as Critic: Critical Writings of Oscar Wilde (Chicago 1982). Richard Ellmann; John Espey, Oscar Wilde: two approaches: papers read at a Clark Library seminar, April 17, 1976 (Los Angeles: William Andrews Clark Memorial Library, University of California 1977). Richard Ellmann, Oscar Wilde at Oxford: a lecture delivered at the Library of Congress on March 1, 1983 (Washington, DC: Library of Congress 1984). Richard Ellmann, Oscar Wilde: a biography (London: Hamilton 1987). Juliet Gardiner, Oscar Wilde: a life in letters, writings and wit (Dublin: Gill & Macmillan 1995). Frank Harris, Oscar Wilde, including My memories of Oscar Wilde, by George Bernard Shaw and an introductory note by Lyle Blair (London: Robinson, 1992). Rupert Hart-Davis (ed), Selected letters of Oscar Wilde (Oxford: Oxford University Press 1979). Rupert Hart-Davis (ed), More letters of Oscar Wilde (London: Murray 1985). Vyvyan Beresford Holland, Oscar Wilde: a pictorial biography (London: Thames & Hudson 1960). H. Montgomery Hyde, Oscar Wilde: a biography (London: Methuen 1977). Andrew McDonnell, Oscar Wilde at Oxford: an annotated catalogue of Wilde manuscripts and related items at the Bodleian Library, Oxford, including many hitherto unpublished letters, photographs and illustrations (A. McDonnell 1996). Limited edition of 170 copies. 
Stuart Mason, Bibliography of Oscar Wilde (London: E. G. Richards 1907). Also pubd. New York 1908, London 1914 in 2 vols. Repr. of 1914 edition: New York: Haskell House 1972. E. H. Mikhail, Oscar Wilde: an annotated bibliography of criticism (London: Macmillan 1978). Also pubd. Totowa NJ: Rowman & Littlefield 1978. Thomas A. Mikolyzk, Oscar Wilde: an annotated bibliography (Westport CT: Greenwood Press 1993). Bibliographies and indexes in world literature, 38. Norman Page, An Oscar Wilde chronology (London: Macmillan 1991). Hesketh Pearson, A Life of Oscar Wilde (London 1946). Richard Pine, The thief of reason: Oscar Wilde and modern Ireland (Dublin: Gill & Macmillan 1996). Horst Schroeder, Additions and corrections to Richard Ellmann's Oscar Wilde (Braunschweig: H. Schroeder 1989) The edition used in the digital edition. Oscar Wilde The Importance of Being Earnest Plays, Prose Writings and Poems London Everyman 1930 450-509

CELT: Corpus of Electronic Texts

All the editorial text with the corrections of the editor has been retained.

Text has been checked, proof-read and parsed using NSGMLS.

The electronic text represents the edited text. Compound words have not been hyphenated after CELT practice.

Direct speech is marked q.

The editorial practice of the hard-copy editor has been retained.

div0=the whole text.

Names of persons (given names), and places are not tagged. Terms for cultural and social roles are not tagged.

The n attribute of each text in this corpus carries a unique identifying number for the whole text.

The title of the text is held as the first head element within each text.

div0 is reserved for the text (whether in one volume or many).

By Oscar Wilde (1854-1900). 1895 Whole text in English. One word occurring twice in Anglo-French. Text parsed using NSGMLS. Proof corrections entered and mark-up corrected; text spell-checked. Text proofed. Header created; structural mark-up inserted. Text captured by scanning.
Based on a TEI P3 SGML encoding of Wilde, Oscar, The Importance of Being Earnest. Encoded and made available by CELT: Corpus of Electronic Texts: a project of University College, Cork. Available online at ftp://ftp.ucc.ie/pub/celt/texts/E850003.002.sgml.