Module 7: Critical Editing

6. Caveats

While the <app> element provides a powerful and efficient means of representing textual variation, some caveats must be pointed out. First, the content model of the <lem> and <rdg> elements is limited to phrase-level elements. This may pose a problem for variants that involve larger structural units, such as the addition or deletion of a paragraph. For example, consider the first block of text in the #p4 version:
 As originally published in previous editions of the Guidelines, this
 chapter provided a gentle introduction to 'just enough' SGML for anyone to
 understand how the TEI used that standard. Since then, the Gentle Guide
 seems to have taken on a life of its own independent of the Guidelines,
 having been widely distributed (and flatteringly imitated) on the web. In
 revising it for the present draft, the editors have therefore felt free to
 reduce considerably its discussion of SGML-specific matters, in favour of a
 simple presentation of how the TEI uses XML. 
All other versions start with the phrase 'The encoding scheme defined by these Guidelines', which also starts the second paragraph in version #p4. This block of text can therefore be considered an addition in #p4 compared to the earlier versions, and a deletion in version #p5. Intuitively, one might wish to encode this as follows:
<app>
<rdg wit="#p4"> <p>As originally published in previous editions of the Guidelines, this chapter provided a gentle introduction to 'just enough' SGML for anyone to understand how the TEI used that standard. Since then, the Gentle Guide seems to have taken on a life of its own independent of the Guidelines, having been widely distributed (and flatteringly imitated) on the web. In revising it for the present draft, the editors have therefore felt free to reduce considerably its discussion of SGML-specific matters, in favour of a simple presentation of how the TEI uses XML.</p> </rdg>
<rdg wit="#p2 #p3 #p5"/>
</app>
However, since <p> is a member of a 'chunk-level' model class of elements, it is not allowed as content of <rdg>. There are two ways of solving such problems:
  • changing the encoding: if the content allows it, you can look for alternative ways to encode the contents (without resorting to tag abuse, however!)
  • changing the TEI scheme: by redefining the content model of <rdg>, you can make sure that the encoding validates in a TEI-conformant way. For details on how to customise a TEI scheme, see TBE Module 8: Customising TEI, ODD, Roma.
In this example, the content of the variant text block permits an interpretation as a note, which could be characterised as a disclaimer. In TEI, the <note> element is a member of the global model class, and thus may occur inside <rdg>. A valid alternative to the previous encoding could be:
<app>
<rdg wit="#p4">
<note type="disclaimer">As originally published in previous editions of the Guidelines, this chapter provided a gentle introduction to 'just enough' SGML for anyone to understand how the TEI used that standard. Since then, the Gentle Guide seems to have taken on a life of its own independent of the Guidelines, having been widely distributed (and flatteringly imitated) on the web. In revising it for the present draft, the editors have therefore felt free to reduce considerably its discussion of SGML-specific matters, in favour of a simple presentation of how the TEI uses XML.</note>
</rdg>
<rdg wit="#p2 #p3 #p5"/>
</app>
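The difference between the two encodings can be made concrete with a small Python sketch. It is not a substitute for validating against a TEI schema. It only flags <rdg> children whose names appear in CHUNK_LEVEL, a hand-picked and hypothetical stand-in for TEI's chunk-level model class. The function name chunk_level_readings is illustrative. The readings are abbreviated with '...', and the TEI namespace is omitted, as in the examples above.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for TEI's chunk-level elements; the real class
# membership is defined by the TEI schema, not by this set.
CHUNK_LEVEL = {"p", "ab"}


def chunk_level_readings(app_xml: str) -> list[tuple[str, str]]:
    """Return (wit, element name) pairs for chunk-level direct children of <rdg>."""
    app = ET.fromstring(app_xml)
    problems = []
    for rdg in app.iter("rdg"):
        for child in rdg:
            if child.tag in CHUNK_LEVEL:
                problems.append((rdg.get("wit"), child.tag))
    return problems


as_paragraph = """<app>
  <rdg wit="#p4"><p>As originally published ...</p></rdg>
  <rdg wit="#p2 #p3 #p5"/>
</app>"""

as_note = """<app>
  <rdg wit="#p4"><note type="disclaimer">As originally published ...</note></rdg>
  <rdg wit="#p2 #p3 #p5"/>
</app>"""

print(chunk_level_readings(as_paragraph))  # [('#p4', 'p')]
print(chunk_level_readings(as_note))       # []
```

A schema-aware validator that uses the TEI RELAX NG schema remains the proper check. The sketch only shows why the <note> version sidesteps the restriction.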
A harder problem arises when the variation affects the text structure itself. For example, if you look closely at the facsimiles of the TEI Guidelines above, you'll notice a paragraph shift at the sentence starting with 'Historically, the word markup has been used':
  • in the #p2 and #p3 versions, this sentence starts the third paragraph
  • in the #p4 and #p5 versions, this sentence is part of the second paragraph
This is harder to represent, as the variation involves the markup itself (i.e. the end tag and start tag that mark the beginning of the third paragraph in #p2 and #p3 are themselves subject to variation). As XML requires proper nesting, this is a problem for any XML representation of this kind of structural variation. Again, two strategies could be followed (neither of which is ideal, however):
  • encode structural variants as variant structures. However, this may obscure their alignment.
  • encode structural variants using milestone elements instead of full-blown XML structures. However, depending on your view of the texts, this could be considered less orthodox from an encoding point of view, as it implies some notion of a base text that determines the encoding of the others.
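The nesting constraint behind both strategies can be checked with any XML parser. The Python sketch below contains a deliberately broken, hypothetical attempt, not a TEI recommendation. That attempt places the #p2/#p3 paragraph break (</p><p>) inside a reading. Python's standard ElementTree parser rejects it as not well-formed. The same text parses fine once the break is left out of the apparatus. The function name is_well_formed is illustrative.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the string parses as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical attempt: the paragraph break of #p2/#p3 (</p><p>) is put
# inside an <rdg>, so <p> and <app>/<rdg> would have to overlap.
overlapping = (
    "<div><p>... define these terms. "
    '<app><rdg wit="#p2 #p3"></p><p></rdg><rdg wit="#p4 #p5"/></app>'
    "Historically, the word ...</p></div>"
)

# The same text with the paragraph break left out of the apparatus.
nested = (
    "<div><p>... define these terms. "
    '<app><rdg wit="#p2 #p3"/><rdg wit="#p4 #p5"/></app>'
    "Historically, the word ...</p></div>"
)

print(is_well_formed(overlapping))  # False
print(is_well_formed(nested))       # True
```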
The first option would compare the individual transcriptions of these text witnesses, some of which spread more or less the same textual content over three paragraphs, while others use only two. In a parallel segmented apparatus, this might look as follows:
<app>
<rdg wit="#p2 #p3">
<p>SGML is an international standard for the description of marked-up electronic text. More exactly, SGML is a <app>
<rdg wit="#p2">metalanguage</rdg>
<rdg wit="#p3">
<hi>metalanguage</hi>
</rdg>
</app>, that is, a means of formally describing a language, in this case, a <app>
<rdg wit="#p2">markup language</rdg>
<rdg wit="#p3">
<hi>markup language</hi>
</rdg>
</app>. Before going any further we should define these terms.</p>
</rdg>
<rdg wit="#p4 #p5"/>
</app>
<p><app>
<rdg wit="#p4">XML is an extensible markup language used for the description of marked-up electronic text. More exactly, XML is a <hi>metalanguage</hi>, that is, a means of formally describing a language, in this case, a <hi>markup language</hi>. </rdg>
<rdg wit="#p5">Strictly speaking, XML is a metalanguage, that is, a language used to describe other languages, in this case, markup languages. </rdg>
<rdg wit="#p2 #p3"/>
</app>Historically, the word
<!-- ... -->
</p>

Note:

The <p> element inside the first apparatus entry would require a modification of the TEI scheme.
This approach treats the shifting paragraph as a variant in its own right, which is present in some witnesses (#p2 and #p3) and absent in the others (#p4 and #p5). The second apparatus entry then omits the text of #p2 and #p3, while including the (corresponding) text of #p4 and #p5. However, as this example illustrates, the alignment of the corresponding text fragments between both groups of witnesses (those starting a new paragraph and those that don't) is lost. There is no way of telling how the phrases 'SGML is an international standard [...]. More exactly, SGML [...]' (in #p2 and #p3) and 'XML is an extensible markup language [...]. More exactly, XML [...]' (in #p4) correspond. This kind of encoding may be less problematic when an electronic critical edition is generated from independent transcriptions of the witnesses (in which case the more complicated apparatus encoding could be produced by an automatic collation routine). When an electronic edition is created by manual encoding, however, constructing such complex apparatus entries could be less desirable.
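The loss of alignment can be illustrated by listing, for each apparatus entry, the witnesses that have a non-empty reading. The Python sketch below works on a heavily abbreviated, hypothetical version of the two top-level entries above. The readings are shortened with '...', and the nested entries and the TEI namespace are dropped. The function name witnesses_with_text is illustrative. Neither entry contains text from both groups of witnesses, so nothing in the encoding relates the #p2/#p3 text to the #p4/#p5 text.

```python
import xml.etree.ElementTree as ET

# Abbreviated version of the two top-level apparatus entries; readings are
# shortened and nested <app> elements are omitted for brevity.
doc = """<div>
  <app>
    <rdg wit="#p2 #p3"><p>SGML is an international standard ...</p></rdg>
    <rdg wit="#p4 #p5"/>
  </app>
  <p><app>
    <rdg wit="#p4">XML is an extensible markup language ...</rdg>
    <rdg wit="#p5">Strictly speaking, XML is a metalanguage ...</rdg>
    <rdg wit="#p2 #p3"/>
  </app>Historically, the word ...</p>
</div>"""


def witnesses_with_text(app: ET.Element) -> set[str]:
    """Witness sigla whose reading in this entry contains any text."""
    found = set()
    for rdg in app.findall("rdg"):
        if "".join(rdg.itertext()).strip():
            found.update(rdg.get("wit", "").split())
    return found


root = ET.fromstring(doc)
for n, app in enumerate(root.iter("app"), start=1):
    print(n, sorted(witnesses_with_text(app)))
# 1 ['#p2', '#p3']
# 2 ['#p4', '#p5']
```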
The other solution would be to encode the paragraph break in the #p2 and #p3 versions using an empty milestone marker: an empty element that indicates some kind of structural boundary in the text where it occurs, as in this parallel segmented example:
<p><app>
<rdg wit="#p2 #p3">SGML is an international standard for the description of marked-up electronic text. More exactly</rdg>
<rdg wit="#p4">XML is an extensible markup language used for the description of marked-up electronic text. More exactly</rdg>
<rdg wit="#p5">Strictly speaking</rdg>
</app>, <app>
<rdg wit="#p2 #p3">SGML</rdg>
<rdg wit="#p4 #p5">XML</rdg>
</app> is a <app>
<rdg wit="#p2 #p5">metalanguage</rdg>
<rdg wit="#p3 #p4">
<hi>metalanguage</hi>
</rdg>
</app>, that is, a <app>
<rdg wit="#p2 #p3 #p4">means of formally describing a language</rdg>
<rdg wit="#p5">language used to describe other languages</rdg>
</app>, in this case, <app>
<rdg wit="#p2">a markup language</rdg>
<rdg wit="#p3 #p4">a <hi>markup language</hi></rdg>
<rdg wit="#p5">markup languages</rdg>
</app>. <app>
<rdg wit="#p2 #p3">Before going any further we should define these terms. <milestone unit="p"/></rdg>
<rdg wit="#p4 #p5"/>
</app>Historically, the word
<!-- ... -->
</p>
Since the milestone paragraph boundary marker (<milestone unit="p"/>) avoids the intrusive XML element boundaries, the text of all versions can be compared. However, this means that the third paragraph of the #p2 and #p3 versions is not encoded as a <p> element, unlike the other paragraphs in these versions. This may be less of a problem when an electronic critical edition is created by manual encoding than when it is generated from independent transcriptions of the witnesses. In the latter case, the milestone encoding would reflect a dependency on a base text (one that lacks the paragraph break). Moreover, it presupposes some kind of structural alignment prior to the encoding of the individual texts.
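A single witness's text can be extracted from the milestone encoding, which shows that all versions remain comparable. The procedure keeps all text outside <app>, and inside each <app> only the <rdg> whose @wit lists that witness. The Python sketch below applies this to an abbreviated, hypothetical excerpt of the example above, with the TEI namespace omitted. It handles <rdg> only, not <lem>. The names witness_text and FRAGMENT are illustrative, and any <milestone> is rendered as a pilcrow.

```python
import xml.etree.ElementTree as ET

# Abbreviated excerpt of the milestone-based apparatus above.
FRAGMENT = (
    '<p><app>'
    '<rdg wit="#p2 #p3">SGML is an international standard ... More exactly</rdg>'
    '<rdg wit="#p4">XML is an extensible markup language ... More exactly</rdg>'
    '<rdg wit="#p5">Strictly speaking</rdg>'
    '</app>, <app>'
    '<rdg wit="#p2 #p3">SGML</rdg>'
    '<rdg wit="#p4 #p5">XML</rdg>'
    '</app> is a <app>'
    '<rdg wit="#p2 #p5">metalanguage</rdg>'
    '<rdg wit="#p3 #p4"><hi>metalanguage</hi></rdg>'
    '</app>. <app>'
    '<rdg wit="#p2 #p3">Before going any further we should define these terms. '
    '<milestone unit="p"/></rdg>'
    '<rdg wit="#p4 #p5"/>'
    '</app>Historically, the word ...</p>'
)


def witness_text(elem: ET.Element, wit: str) -> str:
    """Reconstruct the running text of one witness from a parallel-segmented
    fragment. Any <milestone> is rendered as a pilcrow ("¶ ")."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "app":
            # Keep only the reading(s) whose @wit lists this witness.
            for rdg in child.findall("rdg"):
                if wit in rdg.get("wit", "").split():
                    parts.append(witness_text(rdg, wit))
        elif child.tag == "milestone":
            parts.append("¶ ")
        else:
            # Other elements (e.g. <hi>) are flattened to their text.
            parts.append(witness_text(child, wit))
        # Text following a child belongs to the enclosing element.
        parts.append(child.tail or "")
    return "".join(parts)


root = ET.fromstring(FRAGMENT)
for wit in ("#p2", "#p3", "#p4", "#p5"):
    print(wit, witness_text(root, wit))
```

Only #p2 and #p3 contain the pilcrow. The paragraph break is thus recorded for those witnesses without breaking the nesting of the surrounding <p>. A display tool could start a new paragraph at that point instead of printing a sign.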

Summary

Problems may arise when textual variation involves units at or above the paragraph level, as the <rdg> element may only contain phrase-level elements. Such problems may be overcome by looking for an alternative (phrase-level) encoding of such text structures, or by modifying the TEI scheme so that the content model of <rdg> is widened. Other problems arise when the variation affects the text structure itself, leading to overlapping XML structures. These can be avoided either by ignoring the possible alignment of such structures in the apparatus, or by representing some structural boundaries with empty milestone elements.