Module 7: Critical Editing

6. Caveats

Until the release of version 2.9.1 of the TEI Guidelines in 2015, <lem> and <rdg> could only contain phrase-level elements. For a very long time, this caused problems for variants that involve larger structural units. Since version 2.9.1, however, <lem> and <rdg> can contain chunk-level elements such as <div>, <p>, <ab>, <lg>, and <l>. This addition has greatly increased the use of <lem> and <rdg> for encoding real-life textual variation.
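
As a minimal sketch of what this change allows, the following apparatus entry records two versions of a whole paragraph. The witness sigla #A and #B and the paragraph texts are hypothetical placeholders, not taken from the examples in this module:

```xml
<app xmlns="http://www.tei-c.org/ns/1.0">
  <!-- #A and #B are hypothetical witness sigla -->
  <rdg wit="#A">
    <p>The first version of the paragraph.</p>
  </rdg>
  <rdg wit="#B">
    <p>A rewritten version of the paragraph.</p>
  </rdg>
</app>
```

Before version 2.9.1, each <rdg> could only have held phrase-level content, so the <p> elements would not have been allowed inside the readings.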

One tough problem remains, however, when textual variation occurs on a structural level. For example, if you look closely at the facsimiles of the TEI Guidelines above (see figures 1, 2, 3, and 4), you’ll notice that there is a paragraph shift at the sentence starting with “Historically, the word markup has been used”:

  • in the p2 and p3 versions, this sentence starts the third paragraph
  • in the p4 and p5 versions, this sentence is part of the second paragraph
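
To make the shift concrete, the following sketch places the relevant passage of p2 and p4 side by side, as it might appear in their individual transcriptions. The wrapping <body> and <div> elements, the n values, and the elided text are assumptions made only for this illustration:

```xml
<body xmlns="http://www.tei-c.org/ns/1.0">
  <!-- p2 (p3 has the same paragraph structure): the sentence opens the third paragraph -->
  <div n="p2">
    <!-- first paragraph omitted -->
    <p>SGML is an international standard <!-- ... --> Before going any further we should define these terms.</p>
    <p>Historically, the word markup has been used <!-- ... --></p>
  </div>
  <!-- p4 (p5 has the same paragraph structure): the sentence continues the second paragraph -->
  <div n="p4">
    <!-- first paragraph omitted -->
    <p>XML is an extensible markup language <!-- ... --> Historically, the word markup has been used <!-- ... --></p>
  </div>
</body>
```

Note that the paragraph break in p2 has no counterpart element in p4: it falls inside a single <p>.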

This poses a harder encoding problem, as it involves the markup itself: the paragraph boundary (i.e., the end tag of the second paragraph and the start tag of the third) is the subject of variation. As XML requires proper nesting of elements, this is a problem in any XML representation of this kind of structural variation. Again, two strategies could be followed (neither of which is ideal, however):

  • Encode structural variants as variant structures. However, this may obscure their alignment.
  • Encode structural variants using milestone elements instead of full-blown XML structures. However, depending on your view on the texts, this could be considered a less orthodox approach, as it implies some notion of a base text that determines the encoding of the others.

The first option would compare the individual transcriptions of these text witnesses, some of which spread more or less the same textual contents over 3 paragraphs, while others use only 2 paragraphs. In a parallel segmented apparatus, this might look as follows:

<app xmlns="http://www.tei-c.org/ns/1.0">
<rdg wit="#p2_p #p3_p">
<p>SGML is an international standard for the description of marked-up electronic text. More exactly, SGML is a
<app>
<rdg wit="#p2_p">metalanguage</rdg>
<rdg wit="#p3_p">
<hi>metalanguage</hi>
</rdg>
</app>
, that is, a means of formally describing a language, in this case, a
<app>
<rdg wit="#p2_p">markup language</rdg>
<rdg wit="#p3_p">
<hi>markup language</hi>
</rdg>
</app>
. Before going any further we should define these terms.</p>
</rdg>
<rdg wit="#p4 #p5"/>
</app>
<p xmlns="http://www.tei-c.org/ns/1.0">
<app>
<rdg wit="#p4">XML is an extensible markup language used for the description of marked-up electronic text. More exactly, XML is a
<hi>metalanguage</hi>
, that is, a means of formally describing a language, in this case, a
<hi>markup language</hi>
.</rdg>
<rdg wit="#p5">Strictly speaking, XML is a metalanguage, that is, a language used to describe other languages, in this case, markup languages.</rdg>
<rdg wit="#p2_p #p3_p"/>
</app>
Historically, the word
<!-- ... -->
</p>
Example 29. Encoding structural variants as variant structures.

This approach treats the shifting paragraph as a variant in its own right, one that is present in some witnesses (p2 and p3) and absent from the others (p4 and p5). The second apparatus entry then omits the text of p2 and p3, while including the corresponding text of p4 and p5. However, as this example illustrates, the alignment of the corresponding text fragments between the two groups of witnesses (those starting a new paragraph and those that don’t) is lost: there is no way of telling how the phrases “SGML is an international standard … . More exactly, SGML …” (in p2 and p3) and “XML is an extensible markup language … . More exactly, XML …” (in p4) correspond. This kind of encoding could be less problematic when generating an electronic critical edition automatically, in which case the more complicated apparatus encoding could be produced by a collation routine. When creating a digital edition by hand, constructing such complex apparatus entries could be less desirable.

The other solution would be to encode the paragraph break in the p2 and p3 versions using an empty “milestone” marker: an empty element that indicates some kind of structural boundary in the text where it occurs, as in this parallel segmented example:

<p xmlns="http://www.tei-c.org/ns/1.0">
<app>
<rdg wit="#p2 #p3">SGML is an international standard for the description of marked-up electronic text. More exactly</rdg>
<rdg wit="#p4">XML is an extensible markup language used for the description of marked-up electronic text. More exactly</rdg>
<rdg wit="#p5">Strictly speaking</rdg>
</app>
,
<app>
<rdg wit="#p2 #p3">SGML</rdg>
<rdg wit="#p4 #p5">XML</rdg>
</app>
is a
<app>
<rdg wit="#p2 #p5">metalanguage</rdg>
<rdg wit="#p3 #p4">
<hi>metalanguage</hi>
</rdg>
</app>
, that is, a
<app>
<rdg wit="#p2 #p3 #p4">means of formally describing a language</rdg>
<rdg wit="#p5">language used to describe other languages</rdg>
</app>
, in this case,
<app>
<rdg wit="#p2">a markup language</rdg>
<rdg wit="#p3 #p4">a
<hi>markup language</hi>
</rdg>
<rdg wit="#p5">markup languages</rdg>
</app>
.
<app>
<rdg wit="#p2 #p3">Before going any further we should define these terms.
<milestone unit="p"/>
</rdg>
<rdg wit="#p4 #p5"/>
</app>
Historically, the word
<!-- ... -->
</p>
Example 30. Encoding structural variation with “milestone” markers.

Because the milestone paragraph boundary marker (<milestone unit="p"/>) replaces the intrusive XML boundaries, the text of all versions can be compared. However, this implies that the encoding of the third paragraph in the p2 and p3 versions is suppressed: unlike the other paragraphs in these versions, it is no longer a <p> element of its own. This could be less of a problem when creating an electronic critical edition by hand than when generating one automatically. In the latter case, the milestone encoding would reflect a dependency on a base text (one that does not have the paragraph break). Moreover, it presupposes some kind of structural alignment prior to the encoding of the individual texts.
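
To illustrate the effect on the p2 and p3 versions, here is a sketch of the p2 text as it would read if only the p2 readings of Example 30 were selected. It was assembled by hand from the readings above, not produced by a tool, and the line breaks are added for readability:

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">SGML is an international standard for the description of
  marked-up electronic text. More exactly, SGML is a metalanguage, that is, a means of formally
  describing a language, in this case, a markup language. Before going any further we should
  define these terms.
  <!-- the paragraph break of p2 survives only as an empty marker -->
  <milestone unit="p"/>
  Historically, the word
  <!-- ... -->
</p>
```

What would be two <p> elements in an individual transcription of p2 end up here as a single <p> with a milestone inside it.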

Summary

Problems can arise when the variation involves text structures as well, since this leads to overlapping XML structures. Such overlap can be avoided either by giving up the alignment of these structures in the apparatus, or by representing some structural boundaries with empty milestone elements.
