Module 6: Primary Sources
4. Editorial Interventions
4.1. Unclear, Supplied, Omitted Text
Depending on the quality of the source material or the handwriting, transcription of primary source texts may be more or less straightforward. As any further interpretation of an electronic transcription depends on this first interpretative act, it may be desirable for an encoder to indicate places of uncertainty, either to flag them for further inspection or to take intellectual responsibility for the reading. Text for which the reading is uncertain can be encoded in an <unclear> element. The reason for the unclear reading can be stated in a @reason attribute, which takes either a single keyword or a whitespace-separated list of keywords. If the legibility is affected by damage, the cause of the damage can be described in the @agent attribute. For example, as our previous transcription of the word “dioxide” as “diacxside” was quite uncertain, this could be indicated with the <unclear> element:
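A minimal sketch of such an encoding is given here. The value "illegible" for @reason is an assumption chosen for illustration; any keyword documented in the project's encoding guidelines would do.

```xml
<!-- Uncertain reading of the word "dioxide"; the @reason value is illustrative -->
<unclear reason="illegible">diacxside</unclear>
```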
Similarly, if we decided that the damaged dateline at the start of the document could still be deciphered, the uncertain status of this part of the text could be indicated with an <unclear> element:
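The following sketch shows one possible shape for this encoding. The bracketed text is a placeholder for the tentative reading, not the actual content of the sample document, and the @reason value is an assumption.

```xml
<dateline>
  <!-- <damage> records that this area of the page is damaged -->
  <damage>
    <!-- placeholder text: replace with the tentative reading of the date -->
    <unclear reason="damage">[tentative reading of the date]</unclear>
  </damage>
</dateline>
```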
...or without the <damage> element, if this is deemed less important to the transcription:
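A sketch of this simpler variant, again with placeholder text standing in for the tentative reading and an assumed @reason value:

```xml
<dateline>
  <!-- no <damage> wrapper: only the uncertainty of the reading is recorded -->
  <unclear reason="damage">[tentative reading of the date]</unclear>
</dateline>
```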
Notice, however, that the use of the <unclear> element implies that the text it encloses must still be present in the document source, and still be legible to some degree. If the encoder considers text too unclear to be transcribed in any way, he or she may opt to omit this part of the text, and indicate this editorial intervention with a <gap> element. This is an empty element, whose sole purpose is to indicate the omission, possibly with a characterisation of the reason for it (@reason) or of the cause of the damage behind it, if any (@agent). As with the <damage> element, the extent of the omission can be specified implicitly with the @extent attribute, or more explicitly by combining the @unit and @quantity attributes.
For example, on page 3 of our sample text, the phrase “Yess!! The door!!!” is followed by some words that can’t all be deciphered confidently, as they appear to have been erased by the author. When transcribing this passage, we could opt to encode an informed guess and mark it with the <unclear> element, while leaving out the truly illegible words. This omission can be marked with <gap>:
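A sketch of how the two elements might be combined in this passage. The bracketed text is a placeholder for the encoder's guess, and the attribute values (including the number of omitted words) are assumptions made for illustration only.

```xml
Yess!! The door!!!
<!-- placeholder text: the encoder's informed guess at the partly erased words -->
<unclear reason="erased">[informed guess]</unclear>
<!-- empty element marking the words left out; the quantity is hypothetical -->
<gap reason="illegible" unit="word" quantity="2"/>
```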
Similarly, the damage in the dateline could be deemed too destructive for a confident reading of the day, which may motivate the encoder to leave it out. In this case, too, a <gap> element can be used, either within or without a surrounding <damage> element:
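Both variants are sketched below. The bracketed text is a placeholder for the legible remainder of the date, and the @reason, @unit and @quantity values are assumptions.

```xml
<!-- Variant 1: the omission is recorded inside a <damage> element -->
<dateline>
  <damage><gap reason="illegible" unit="word" quantity="1"/></damage>
  [legible remainder of the date]
</dateline>

<!-- Variant 2: the omission is recorded on its own -->
<dateline>
  <gap reason="illegible" unit="word" quantity="1"/>
  [legible remainder of the date]
</dateline>
```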
In contrast, the editor may wish to make a stronger intervention, by supplying text that is lacking from or illegible in the document source. This can be done by wrapping the added text in a <supplied> element. In a @reason attribute, the reason for this editorial addition can be given. For the dateline example, if the text is considered illegible, but the encoder feels able to reconstruct the date, this can result in the following encoding:
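A sketch under the same assumptions as before: the bracketed texts are placeholders rather than the actual date of the sample document, and the @reason value is illustrative.

```xml
<dateline>
  <!-- placeholder text: the part of the date reconstructed by the encoder -->
  <supplied reason="illegible">[reconstructed day]</supplied>
  [legible remainder of the date]
</dateline>
```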
This allows us to encode other lacking text as well: at the end of page 2, a couple of final words on some lines are incomplete due to xeroxing. These can be reconstructed fairly straightforwardly for the transcription. However, these reconstructions are best signalled with the <supplied> element:
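The sketch below illustrates the pattern with an invented word, since the actual truncated words are not quoted here. Both the word "window" and the @reason value "xeroxing" are hypothetical.

```xml
<!-- hypothetical example: "win" survives on the photocopy, "dow" is supplied -->
win<supplied reason="xeroxing">dow</supplied>
```

Only the reconstructed characters go inside <supplied>, so the surviving part of the word remains distinguishable from the editorial addition.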
Note
Notice the crucial difference between the encoding of text added or deleted by the author or editor of the source document on the one hand, and by the encoder of the electronic transcription on the other hand. Additions or deletions present in the source may only be encoded as <add> and <del>, respectively, while text that has been added or deleted by editorial emendation must be encoded as <supplied> or <gap>, respectively.
Summary
When text in the source document is still partly legible, but needs interpretation in order to be transcribed, this uncertainty can be expressed by enclosing the text in an <unclear> element. If text is deemed totally illegible, it can be omitted from the transcription, but signalled with a <gap> element (without any content). Both elements can indicate the reason for the editorial intervention (@reason), and the nature of the damage (@agent). An editor wishing to supply text in the electronic transcription for illegible or lacking text in the source text can encode this supplied text with the <supplied> element. In a @reason attribute, the reason for this intervention can be stated.
4.2. Corrections
If we look back at the comparison between the facsimiles (see figure 1) and the initial transcription (see example 1), we notice that a lot of words have been silently corrected by the transcriber. Although some errors had been corrected by the teacher (who can be considered an editor or corrector of the source document), many have slipped through. Depending on the aim of the transcription, such apparent errors may be transcribed unmediatedly, corrected silently, marked explicitly, or corrected explicitly. All of these practices are perfectly legitimate as long as they are applied consistently and motivated in the <editorialDecl> element of the electronic document’s header. An encoder adhering to a more explicit practice would like to at least signal apparent errors, editorial corrections, or both. The TEI provides specific elements for this purpose: <sic>, for indicating apparent errors, and <corr>, for indicating editorial corrections. For example, the sentence “Now I know what all the black air is: all the polution.” on page 2 could be transcribed as follows:
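A minimal sketch of the first option, marking the apparent error without correcting it:

```xml
<!-- <sic> keeps the source spelling and flags it as an apparent error -->
Now I know what all the black air is: all the <sic>polution</sic>.
```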
...if the encoder is interested in transcribing the source text as accurately as possible, or:
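A minimal sketch of the second option, giving only the editorial correction:

```xml
<!-- <corr> holds the corrected form; the source spelling is not recorded -->
Now I know what all the black air is: all the <corr>pollution</corr>.
```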
...if the content matters most (perhaps to ease searching operations in a digital edition of the text). However, both “views” on the text can be combined in a <choice> element.
Note
Again, a crucial distinction must be pointed out between encoding of corrections present in the source document, and editorial corrections in the electronic transcription. The latter must always be encoded using <sic> and <corr>, possibly wrapped in a <choice> element. Corrections present in the source document must be encoded using combinations of <del> and <add>, possibly grouped in a <subst> element, and preferably specified with attributes identifying the responsible document hand (@hand), and the editor responsible for this identification (@resp).
The <choice> element enables an encoder to express alternative encodings of the same text. Both views could thus be combined as:
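A minimal sketch of the combined encoding, pairing the source spelling with its correction:

```xml
Now I know what all the black air is: all the
<choice>
  <sic>polution</sic>    <!-- reading as found in the source -->
  <corr>pollution</corr> <!-- editorial correction -->
</choice>.
```

Processing software can then pick either child of <choice>, for instance <sic> for a diplomatic view and <corr> for searching.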
The <sic> element may contain all elements that are necessary to represent the original source text, like deletions, damage, and so on. The “diacxside” fragment on page two can thus be corrected as follows:
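One possible sketch, assuming the uncertain reading from section 4.1 is kept inside <sic>. The @reason value is illustrative, and other transcriptional elements such as <del> or <damage> could be nested in the same way.

```xml
<choice>
  <!-- source reading, still marked as uncertain -->
  <sic><unclear reason="illegible">diacxside</unclear></sic>
  <!-- editorial correction -->
  <corr>dioxide</corr>
</choice>
```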