Module 6: Primary sources

4. Editorial interventions

4.1. Unclear, supplied, omitted text

Depending on the quality of the source material or the handwriting, transcription of primary source texts may be more or less straightforward. As any further interpretation of an electronic transcription depends on this first interpretative act, it may be desirable for an encoder to indicate places of uncertainty, either for further inspection or to take intellectual responsibility. Text for which the reading is uncertain can be encoded in an <unclear> element. The reason for the unclear reading can be stated in a @reason attribute. If it has been caused by an identifiable human, the @hand attribute can provide a link to the definition of a hand in the TEI header of the transcription. If the legibility is affected by damage, the cause of the damage can be described in the @agent attribute. For example, as our previous transcription of the word dioxide as diacxside was quite uncertain, this could be indicated with the <unclear> element:
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title>There and Back Again: digital edition</title>
<author xml:id="HannaRenton">Hanna Renton</author>
<editor xml:id="TBE">The TBE crew</editor>
<!--...-->
</titleStmt>
<!--...-->
</fileDesc>
<!--...-->
</teiHeader>
<text>
<body>
<!--...-->
<p>
<!--...-->
di<unclear reason="hardly legible substitution" resp="#TBE" hand="#HR">
<subst>
<del rend="overwritten">ox</del>
<add>acx</add>
</subst>
</unclear>side
<!--...-->
</p>
<!--...-->
</body>
</text>
</TEI>
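The hand="#HR" pointer in this example assumes that a hand with that identifier is defined in the header, in a part that is elided above. A minimal sketch of what such a definition might look like follows; the identifiers HR and teacher are the ones referenced in this section, but the descriptive wording and the placement inside <profileDesc> are assumptions for illustration, not the actual header of the edition:

```xml
<profileDesc>
  <handNotes>
    <!-- Hypothetical description: hand of the author of the assignment -->
    <handNote xml:id="HR">Hand of the author, Hanna Renton.</handNote>
    <!-- Hypothetical description: hand of the teacher who corrected the text -->
    <handNote xml:id="teacher">Hand of the teacher who corrected the assignment.</handNote>
  </handNotes>
</profileDesc>
```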
Similarly, if we decided that the damaged dateline at the start of the document could still be deciphered, the uncertain status of this part of the text could be indicated with an <unclear> element:
<dateline>
<date when="2008-08-26"><damage agent="stapling" hand="#teacher" unit="chars" quantity="3">
<unclear>26/</unclear>
</damage>8/08</date>
</dateline>
...or without the <damage> element, if this is deemed less important to the transcription:
<dateline>
<date when="2008-08-26"><unclear reason="damage" agent="stapling" hand="#teacher" unit="chars" quantity="3">26/</unclear>8/08</date>
</dateline>
Note, however, that the use of the <unclear> element implies that the text it encloses is still present in the document source and still legible to some degree. If the encoder considers text too unclear to be transcribed in any way, he or she may opt to omit this part of the text and indicate this editorial intervention with a <gap> element. This is typically used as an empty element that only marks the omission, possibly characterising the reason (@reason), the hand responsible for it, if any (@hand), or the damage that caused it, if any (@agent). As with the <damage> element, the extent of the omission can be described loosely with the @extent attribute, or more precisely by combining the @unit and @quantity attributes.
For example, on page 3 of our sample text, the phrase Yess!! The door!!! is followed by some words that can't all be deciphered confidently, as they appear to have been erased by the author. When transcribing this passage, we could opt to encode an informed guess and mark it with the <unclear> element, while leaving out the truly illegible words. This omission can be marked with <gap>:
Yess!! The door!!! <unclear reason="erasure" hand="#HR" resp="#TBE"><gap hand="#HR" unit="cm" quantity="2.5"/> I got out.</unclear>
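The same omission could also be recorded with the looser @extent attribute instead of the @unit and @quantity pair. This is a minimal sketch: the wording of the @extent value is an assumption made for illustration, and a project would normally settle on one of the two styles and apply it consistently:

```xml
<!-- Same passage, with the size of the gap given as a free description -->
Yess!! The door!!! <unclear reason="erasure" hand="#HR" resp="#TBE"><gap hand="#HR" extent="about 2.5 cm"/> I got out.</unclear>
```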
Similarly, the damage in the dateline could be deemed too destructive for a confident reading of the day, which may motivate the encoder to leave it out. In this case, too, a <gap> element can be used, either within or without a surrounding <damage> element:
<dateline>
<date when="2008-08-26"><damage agent="stapling" hand="#teacher" unit="chars" quantity="3">
<gap unit="chars" quantity="3"/>
</damage>8/08</date>
</dateline>
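For the variant without a surrounding <damage> element, the information about the damage could move onto the <gap> element itself, in parallel with the earlier <unclear> example. This is a sketch that simply mirrors the attributes used there; whether each of them is allowed on <gap> depends on the version of the TEI schema in use, so it should be checked by validating against that schema:

```xml
<dateline>
  <!-- The illegible day is omitted; the damage is described on <gap> directly -->
  <date when="2008-08-26"><gap reason="damage" agent="stapling" hand="#teacher" unit="chars" quantity="3"/>8/08</date>
</dateline>
```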
In contrast, the editor may wish to make a stronger intervention by supplying text that is missing from, or illegible in, the document source. This can be done by wrapping the added text in a <supplied> element. The reason for this editorial addition can be given in a @reason attribute. For the dateline example, if the text is considered illegible but the encoder feels able to reconstruct the date, this can result in the following encoding:
<dateline>
<date when="2008-08-26"><damage agent="stapling" hand="#teacher" unit="chars" quantity="3">
<supplied resp="#TBE">26/</supplied>
</damage>8/08</date>
</dateline>
The same element allows us to encode other missing text as well: at the end of page 2, the final words on some lines are incomplete because they were cut off during xeroxing. They can be reconstructed fairly straightforwardly for the transcription, but such reconstructions are best signalled with the <supplied> element:
<p>
<!--...-->
Nothing was happeni<supplied reason="cutoff while xeroxing" resp="#TBE">ng.</supplied> Then, <quote>Hello, dearie</quote>, an old woman answere<supplied reason="cutoff while xeroxing" resp="#TBE">d.</supplied> <quote>Who is it?</quote> I hung up. Uh, oh. How do I get back then?</p>

Note:

Note the crucial difference between the encoding of text added or deleted by the author or editor of the source document on the one hand, and by the encoder of the electronic transcription on the other. Additions and deletions present in the source should be encoded as <add> and <del>, respectively, while text that has been added or omitted by editorial emendation must be encoded as <supplied> or <gap>, respectively.

Summary

When text in the source document is still partly legible, but needs interpretation in order to be transcribed, this uncertainty can be expressed by enclosing the text in an <unclear> element. If text is deemed totally illegible, it can be omitted from the transcription, but signalled with a <gap> element (without any content). Both elements can indicate the reason for the editorial intervention (@reason), the human responsible for this reason (@hand), or the nature of the damage (@agent). An editor wishing to supply text in the electronic transcription for text that is illegible or missing in the source can encode it with the <supplied> element. The reason for this intervention can be stated in a @reason attribute.

4.2. Corrections

If we look back at the comparison between the facsimiles and the initial transcription, we notice that a lot of words have been silently corrected by the transcriber. Although some errors had been corrected by the teacher (who can be considered an editor or corrector of the source document), many have slipped through. Depending on the aim of the transcription, such apparent errors may be transcribed as they stand, corrected silently, marked explicitly, or corrected explicitly. All of these practices are perfectly legitimate as long as they are applied consistently and motivated in the <editorialDecl> element of the electronic document's header. An encoder adhering to a more explicit practice will want to signal at least the apparent errors, the editorial corrections, or both. The TEI provides specific elements for this purpose: <sic>, for indicating apparent errors, and <corr>, for indicating editorial corrections. For example, the sentence Now I know what all the black air is: all the polution. on page 2 could be transcribed as follows:
Now I know what all the black air is: all the <sic>polution</sic>.
...if the encoder is interested in transcribing the source text as accurately as possible, or:
Now I know what all the black air is: all the <corr>pollution</corr>.
...if the content matters most (perhaps to ease searching operations in a digital edition of the text). However, both 'views' on the text can be combined in a <choice> element. This enables an encoder to express alternative encodings of the same text. Both views could thus be combined as:
Now I know what all the black air is: all the <choice>
<sic>polution</sic>
<corr>pollution</corr>
</choice>.
The <sic> element may contain all elements that are necessary to represent the original source text, like deletions, damage, and so on. The diacxside fragment on page two can thus be corrected as follows:
<choice>
<sic>di<subst resp="#TBE">
<del hand="#HR" rend="overwritten">ox</del>
<add hand="#HR">acx</add>
</subst>side</sic>
<corr>dioxide</corr>
</choice>
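Whichever practice is chosen, it should be documented in the <editorialDecl> element of the header, as mentioned at the start of this section. A minimal sketch of such a declaration for the combined <choice> approach follows; the wording of the paragraph is an assumption for illustration, not the declaration of the actual edition:

```xml
<encodingDesc>
  <editorialDecl>
    <!-- method="markup" states that corrections are recorded explicitly in the markup -->
    <correction method="markup">
      <p>Apparent errors in the source are retained in sic elements and paired
        with an editorial correction in corr elements, both wrapped in a
        choice element.</p>
    </correction>
  </editorialDecl>
</encodingDesc>
```

An edition that corrects errors without marking them would declare method="silent" instead and describe the policy in the paragraph.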

Note:

Again, a crucial distinction must be pointed out between the encoding of corrections present in the source document and that of editorial corrections in the electronic transcription. The latter must always be encoded using <sic> and <corr>, possibly wrapped in a <choice> element. Corrections present in the source document must be encoded using combinations of <del> and <add>, possibly grouped in a <subst> element, and preferably specified with attributes identifying the responsible document hand (@hand) and the editor responsible for this identification (@resp).

Summary

Apparent errors in the source text may be indicated explicitly in a <sic> element, or corrected with a <corr> element. Both the original and the correction can be included in the transcription, if they are wrapped in a <choice> element.