TEI by Example. Module 6: Primary Sources
Ron Van den Branden, Edward Vanhoutte, Melissa Terras
Association for Literary and Linguistic Computing (ALLC); Centre for Data, Culture and Society, University of Edinburgh, UK; Centre for Digital Humanities (CDH), University College London, UK; Centre for Computing in the Humanities (CCH), King’s College London, UK; Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Belgium
Centre for Scholarly Editing and Document Studies (CTB), Royal Academy of Dutch Language and Literature, Koningstraat 18, 9000 Gent, Belgium
ctb@kantl.be

Licensed under a Creative Commons Attribution ShareAlike 3.0 License

9 July 2010
TEI by Example. Edited by Edward Vanhoutte, Ron Van den Branden and Melissa Terras.

Digitally born

TEI by Example offers a series of freely available online tutorials walking individuals through the different stages in marking up a document in TEI (Text Encoding Initiative). Besides a general introduction to text encoding, step-by-step tutorial modules provide example-based introductions to eight different aspects of electronic text markup for the humanities. Each tutorial module is accompanied by a dedicated examples section, illustrating actual TEI encoding practice with real-life examples. What is covered in the tutorial modules can be tested in interactive tests and exercises.

Module 6: Primary Sources
Jeremy Bentham: manuscript JB/116/010/001

This manuscript page was written by the philosopher and jurist Jeremy Bentham (1748–1832).

The manuscript page JB/116/010/001

Since this is a prose text, the basic structural units are encoded as paragraphs (p), with line breaks marked by lb elements where they occur. Marginal notes are encoded with the note element; the note occurring on the sixth line in this example contains a simple deletion (the final word “disorder”), which is marked with the del element.

This manuscript contains many deletions and additions. Some are simple, such as the addition of the word “still” in the phrase “the same barbarity is still shown” on line 6. This is indicated in the encoding by wrapping the added content in an add element. More often, deletions and additions occur in combination, in which case the transcriber tries to reflect their order in the nesting of del and add elements. Take, for example, the fragment “forth, turned adrift and thought no more of out of”, which is marked as an addition. Bentham then emended this addition by inserting the phrase “turned adrift” (a second-level addition), and later cancelled that insertion by deleting it again: that is why the phrase is encoded inside add, with a nested del, indicating that the added text was deleted in its entirety. Further, at the end of this fragment, another second-level addition is indicated with an add element, and again this entire addition was deleted. Since the encoder could no longer decipher the deleted text, it is represented by an empty gap element, which signals that text was present on the manuscript but omitted from the transcription. To record why the transcriber decided to omit this text, a reason attribute could have been provided on gap (with a value such as “illegible”).
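The effect of this nesting can be checked mechanically. The sketch below uses Python's standard xml.etree.ElementTree on a simplified, hypothetical reconstruction of the fragment discussed above; it is not the actual Transcribe Bentham encoding, and the reason="illegible" value on gap is an assumption. The helper names (final_text, normalise) are our own: final_text keeps the content of add but skips del and gap, giving the text as Bentham left it, whereas itertext() returns everything that was transcribed, deleted words included.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, simplified fragment: an addition containing two
# second-level additions, both of which were later deleted.
# The reason value on <gap> is an assumption for illustration.
FRAGMENT = (
    f'<p xmlns="{TEI_NS}">and then '
    "<add>forth, <add><del>turned adrift</del></add> and thought no more of out of "
    '<add><del><gap reason="illegible"/></del></add></add></p>'
)


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def final_text(elem: ET.Element) -> str:
    """Return the text the author let stand: keep <add>, drop <del> and <gap>."""
    parts = []
    if local(elem.tag) not in ("del", "gap"):
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            parts.append(final_text(child))
    # The tail follows the element's end tag, so it survives even a deletion.
    if elem.tail:
        parts.append(elem.tail)
    return "".join(parts)


def normalise(text: str) -> str:
    """Collapse runs of whitespace left behind by removed elements."""
    return " ".join(text.split())


root = ET.fromstring(FRAGMENT)
# Everything transcribed, deleted text included (<gap> contributes no text).
all_text = normalise("".join(root.itertext()))
# The reading that results once all deletions are applied.
reading = normalise(final_text(root))
# Reasons recorded for omitted text.
gap_reasons = [g.get("reason") for g in root.iter(f"{{{TEI_NS}}}gap")]
```

ElementTree stores the text that follows an end tag in that element's tail, which is why final_text keeps the tail of a del even though it discards the deletion's own content.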

Where text could still be transcribed, but the encoder is not certain of the reading, this reading is recorded in an unclear element, as is the case with the word “that”, which occurs in a deletion on the penultimate line.
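A short Python sketch, again with xml.etree.ElementTree, shows how such uncertain readings can be listed together with their context. The fragment is hypothetical and simplified, the reason="faded" value is invented for illustration, and uncertain_readings is a helper name of our own; it reports each unclear reading, its reason, and whether it sits inside a deletion.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical fragment: an uncertain reading inside a deletion.
# The reason value is an assumption for illustration only.
FRAGMENT = (
    '<p xmlns="http://www.tei-c.org/ns/1.0">in the instance of '
    '<del><unclear reason="faded">that</unclear> class</del> '
    "<add>the other</add> description</p>"
)


def uncertain_readings(root: ET.Element) -> list:
    """List each <unclear> as (reading, @reason, inside a <del>?)."""
    # Identities of every <unclear> that has a <del> ancestor.
    in_deletion = {
        id(u) for d in root.iter(TEI + "del") for u in d.iter(TEI + "unclear")
    }
    return [
        ("".join(u.itertext()), u.get("reason"), id(u) in in_deletion)
        for u in root.iter(TEI + "unclear")
    ]


root = ET.fromstring(FRAGMENT)
readings = uncertain_readings(root)
```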

Finally, obvious mistakes spotted by the encoder have been identified with the sic element, as is the case with the word “compleat”. The encoder could equally have provided a correction by wrapping both the incorrect form (sic) and the correction (corr) inside a choice element, pairing “compleat” with “complete”.
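The advantage of choice is that a single encoding can serve two presentations. In this hypothetical Python sketch (the fragment and the render helper are illustrative assumptions, not the project's encoding, which uses sic on its own), a diplomatic view keeps the sic form and a corrected view keeps the corr form.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical fragment using the choice alternative described above.
FRAGMENT = (
    f'<p xmlns="{TEI_NS}">could not be more '
    "<choice><sic>compleat</sic><corr>complete</corr></choice>, "
    "in the case of</p>"
)


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def render(elem: ET.Element, prefer: str = "corr") -> str:
    """Flatten elem to text, keeping only the `prefer` child of each <choice>."""
    parts = []
    if local(elem.tag) == "choice":
        for child in elem:
            if local(child.tag) == prefer:
                parts.append("".join(child.itertext()))
    else:
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            parts.append(render(child, prefer))
    if elem.tail:
        parts.append(elem.tail)
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
diplomatic = render(root, prefer="sic")  # the form as written
corrected = render(root, prefer="corr")  # the editor's correction
```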

home system neglect is as impossible, as attention is in the other foreign one.

Among savages, when to a certain degree a man is sick in body, he is cast out of society, and then forth, turned adrift and thought no more of out of no more thought of. sight or from thenceforth out of mind. Among nations In a nation, civilized in other respects, the same barbarity is still shown to this at least equally unable class of patients, in whose case the disorder patient, the seat of disorder has is in the mind. Not indeed to every order division in this last class of patients. For upon patients labouring under insanity, known or called and characterized by that name, no man has yet thought proposed prescribed a voyage to New South Wales. The inefficacy of such a prescription however could not be more compleat, in the case of that class description of patients, than in has hitherto been, and from the nature of the case ever must be, in the instance of that class the other description to which it continues to be applied.

Encoding of manuscript JB/116/010/001 by Jeremy Bentham (1802). TEI XML source available from .
Walt Whitman: After the Argument

This manuscript, featuring an early version of the poem After the Argument, was likely written in 1890 or early 1891, shortly before the poem’s publication.

A facsimile of the manuscript for After the Argument.

This example illustrates how the TEI transcr module can be applied to verse texts as well. The entire poem is encoded inside an lg element with type="poem", which contains a heading (head) and two verse lines (l), in which physical line breaks have been marked with lb elements. As the facsimile shows, this short manuscript features some complex editorial traces. Sequential deletions (del) and additions (add) are grouped into substitutions (subst). Moreover, inside the substitutions, the exact order of the editing interventions is specified by a sequence number in a seq attribute, making explicit that the deletions occurred before the additions. (The seq attribute is a more advanced concept, documented in chapter 11, “Representation of Primary Sources”, of the TEI Guidelines.) Note that this explicit sequence number is not strictly needed here, as deletions logically precede additions and only one deletion is involved.
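As a rough illustration of what seq makes explicit, the following Python sketch (standard library only) orders the members of a subst by their seq values and reconstructs the reading before and after the substitution. The heading fragment is a simplified, hypothetical reconstruction loosely modelled on the opening words of the transcription, not the Walt Whitman Archive's actual markup; revision_steps and reading are our own helper names.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, simplified heading: one substitution whose deletion (seq 1)
# preceded its addition (seq 2).
FRAGMENT = (
    f'<head xmlns="{TEI_NS}">After '
    '<subst><del seq="1">an</del><add seq="2">the</add></subst>'
    " unsolv'd argument</head>"
)


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def revision_steps(subst: ET.Element) -> list:
    """Return (operation, text) pairs for a <subst>, ordered by @seq."""
    steps = [
        (int(child.get("seq", "0")), local(child.tag), "".join(child.itertext()))
        for child in subst
    ]
    return [(op, text) for _, op, text in sorted(steps)]


def reading(elem: ET.Element, drop: str) -> str:
    """Flatten elem to text, omitting the content of every element named `drop`."""
    parts = []
    if local(elem.tag) != drop:
        if elem.text:
            parts.append(elem.text)
        for child in elem:
            parts.append(reading(child, drop))
    if elem.tail:
        parts.append(elem.tail)
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
steps = revision_steps(root.find(f"{{{TEI_NS}}}subst"))
before = reading(root, drop="add")  # the text before the substitution
after = reading(root, drop="del")   # the text after the substitution
```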

This example nicely illustrates how additions and deletions can nest: in both cases in the example, an addition contains further deletions. The rend attribute on del and add records how deletions (overstrike or overwrite) and additions (insertion, overwrite, unmarked) were realised on the manuscript. An additional place attribute on add indicates whether an addition is located above the line (supralinear), over existing text, or inline.
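The nesting and the rend and place values can be surveyed with a small tree walk. In this Python sketch the line is hypothetical and simplified, the attribute values are taken from the prose above rather than from the Archive's files, and interventions is a helper name of our own; it records each add and del with its nesting depth.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical, simplified line: an addition that itself contains a deletion
# and a further addition. Attribute values follow the prose above.
FRAGMENT = (
    f'<l xmlns="{TEI_NS}"><del rend="overstrike">The</del> '
    '<add place="supralinear" rend="insertion">Coming in, '
    '<del rend="overwrite">a</del><add place="over" rend="overwrite">A</add>'
    "</add> group of little children</l>"
)


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def interventions(elem, depth=0, out=None):
    """Walk the tree in document order, recording each <add>/<del> as
    (nesting depth, element name, @rend, @place)."""
    if out is None:
        out = []
    name = local(elem.tag)
    if name in ("add", "del"):
        out.append((depth, name, elem.get("rend"), elem.get("place")))
        depth += 1  # anything inside this element is one level deeper
    for child in elem:
        interventions(child, depth, out)
    return out


root = ET.fromstring(FRAGMENT)
report = interventions(root)
```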

Page breaks are marked with pb elements, whose type attribute records whether the manuscript page is a recto or a verso. The facs attribute points to a digital facsimile of the page that begins at the pb element.
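Because pb is an empty milestone element, the text belonging to a page is everything that follows it up to the next pb. The hypothetical Python sketch below groups text by page on that basis; the two-page fragment and the facs file names (page-1.jpg, page-2.jpg) are placeholders rather than the Archive's real values, and split_pages is our own helper.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical two-page fragment; text and facs file names are placeholders.
FRAGMENT = (
    f'<lg xmlns="{TEI_NS}" type="poem">'
    '<pb type="recto" facs="page-1.jpg"/>'
    "<head>Heading</head><l>text on the recto</l>"
    '<pb type="verso" facs="page-2.jpg"/>'
    "<l>text on the verso</l></lg>"
)


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def split_pages(root: ET.Element) -> list:
    """Group text by the <pb> milestone that precedes it.

    Returns (type, facs, text) per page. Text before the first <pb> is
    ignored, and text chunks are joined with spaces, which suits block-level
    children like <head> and <l> but would split words broken by inline tags.
    """
    pages = []

    def walk(elem):
        if local(elem.tag) == "pb":
            pages.append((elem.get("type"), elem.get("facs"), []))
        elif elem.text and pages:
            pages[-1][2].append(elem.text)
        for child in elem:
            walk(child)
        # A <pb>'s own tail already belongs to the page it opens.
        if elem.tail and pages:
            pages[-1][2].append(elem.tail)

    walk(root)
    return [(kind, facs, " ".join(" ".join(chunks).split())) for kind, facs, chunks in pages]


root = ET.fromstring(FRAGMENT)
pages = split_pages(root)
```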

After an the unsolv'd argument The Coming in, a A group of little children, and their ways and chatter, flow in, upon me Like welcome rippling water o'er my heated nerves and flesh. Walt Whitman

Encoding of manuscript loc.00001 of the Walt Whitman Archive (Whitman 1890). TEI XML source available from .
Bentham, Jeremy. 1802. Manuscript JB/116/010/001. Manuscript encoded and made available by the Transcribe Bentham project at .
Whitman, Walt. 1890. After the Argument. Manuscript encoded and made available by the Walt Whitman Archive at .