Module 7: Critical Editing

2. Textual Variation

As with all other TEI modules, the elements and attributes defined in the TEI textcrit module can be used either to encode existing source materials (whether in print or digital form) or to create electronic documents from scratch. However, using this module in the context of electronic critical editing adds another perspective to this traditional authorial/editorial angle (see Vanhoutte and Van den Branden 2009). Electronic or digital critical editions can be created either by encoding different primary source materials directly as a critical edition, or by generating the edition from previously encoded electronic transcriptions of those materials as independent texts in their own right. Therefore, the tags defined in the TEI textcrit module can be used to:

  • digitise an existing print edition
  • create a digital edition, e.g., by recording some or all of the known variations among different witnesses to the text in a critical apparatus of variants
  • generate a digital edition from encoded transcriptions of the documentary source material

In the examples in this TBE module, “critical editing with TEI” will be understood as the act of encoding material sources in a TEI representation that allows for the creation or generation of a digital edition in some form (using any output format in the digital medium, e.g., HTML pages, PDF, or Flash movies), rather than as the digitisation of an existing critical edition. In this sense, the authorial/editorial angle of this TBE module differs from that of the other modules, which focus on the digitisation of a material source text in a certain genre. However, the strategies discussed in this tutorial for representing textual variation can equally be applied to the digitisation of existing critical editions. Where there are differences, these will be pointed out explicitly.

For example, consider the following texts:

Figure 1. A page of version P2 of the TEI Guidelines.
Figure 2. A page of version P3 of the TEI Guidelines.
Figure 3. A page of version P4 of the TEI Guidelines.
Figure 4. A page of version P5 of the TEI Guidelines.

Some of these images may look more or less familiar to you: they are facsimiles of the first page of chapter 2 of the printed TEI Guidelines throughout their different incarnations, from version P2 (1992) to the latest version, P5 (2009). As you can imagine, the technological evolutions of these 17 years have prompted considerable changes to this chapter, which introduces the technological background of text encoding with TEI. The changes range from rephrasing and the addition or deletion of notes to changes in italicisation and the restructuring of paragraphs. One way of approaching this textual variation could consist of encoding these text versions as physically distinct TEI documents, in which corresponding text structures are aligned by a common identification mechanism. For example, the first couple of paragraphs in these four text witnesses could be encoded in different TEI documents as follows:

<pb xmlns="http://www.tei-c.org/ns/1.0" n="2"/>
<head xmlns="http://www.tei-c.org/ns/1.0">Chapter 2
<lb/>
A GENTLE INTRODUCTION TO SGML</head>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p1" corresp="P3.xml#p1 P4.xml#p1 P5.xml#p1">The encoding scheme defined by these Guidelines is formulated as an application of a system known as the Standard Generalized Markup Language (SGML).
<note place="foot" xml:id="n1" corresp="P3.xml#n1 P4.xml#n2">
<bibl>
<editor>International Organization for Standardization</editor>
,
<title>ISO 8879: Information processing--Text and office systems--Standard Generalized Mark-up Language (SGML)</title>
, ([
<pubPlace>Geneva</pubPlace>
]:
<publisher>ISO</publisher>
,
<date>1986</date>
).</bibl>
Although widely said to be short for the surnames of its progenitors, the official expansion of this abbreviation is "Standard Generalized Markup Language."</note>
SGML is an international standard for the definition of device-independent, system-independent methods of representing texts in electronic form. This chapter presents a brief tutorial guide to its main features, for those readers who have not encountered it before. For a more technical account of TEI practice in using the SGML standard, see chapter 30, "TEI Conformance," [in separate fascicle]; for a more technical description of the subset of SGML used by the TEI encoding scheme, see chapter 39, "Formal Grammar for the TEI-Interchange-Format Subset of SGML," [in separate fascicle].</p>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p2a" corresp="P3.xml#p2a P4.xml#p2 P5.xml#p2">SGML is an international standard for the description of marked-up electronic text. More exactly, SGML is a metalanguage, that is, a means of formally describing a language, in this case, a markup language. Before going any further we should define these terms.</p>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p2b" corresp="P3.xml#p2b P4.xml#p2 P5.xml#p2">Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out. Examples include wavy underlining to indicate boldface, special symbols for passages to be omitted or printed in a particular font and so forth. As the formatting and printing of texts was automated, the term was extend-ed to cover all sorts of special markup codes inserted into electronic texts to govern formatting, printing, or other processing.</p>
Example 1. An encoded page of version P2 of the TEI Guidelines.
<pb xmlns="http://www.tei-c.org/ns/1.0" n="13"/>
<head xmlns="http://www.tei-c.org/ns/1.0">Chapter 2
<lb/>
A Gentle Introduction to SGML</head>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p1" corresp="P2.xml#p1 P4.xml#p1 P5.xml#p1">The encoding scheme defined by these Guidelines is formulated as an application of a system known as the Standard Generalized Markup Language (SGML).
<note place="foot" xml:id="n1" corresp="P2.xml#n1 P4.xml#n2">
<bibl>
<editor>International Organization for Standardization</editor>
,
<title>ISO 8879: Information processing - Text and office systems - Standard Generalized Markup Language (SGML)</title>
, ([
<pubPlace>Geneva</pubPlace>
]:
<publisher>ISO</publisher>
,
<date>1986</date>
)</bibl>
</note>
SGML is an international standard for the definition of device-independent, system-independent methods of representing texts in electronic form. This chapter presents a brief tutorial guide to its main features, for those readers who have not encountered it before. For a more technical account of TEI practice in using the SGML standard, see chapter 28, "Conformance," on page 727. For a more technical description of the subset of SGML used by the TEI encoding scheme, see chapter 39, "Formal Grammar for the TEI-Interchange-Format Subset of SGML," on page 1247.</p>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p2a" corresp="P2.xml#p2a P4.xml#p2 P5.xml#p2">SGML is an international standard for the description of marked-up electronic text. More exactly, SGML is a
<hi>metalanguage</hi>
, that is, a means of formally describing a language, in this case, a
<hi>markup language</hi>
. Before going any further we should define these terms.</p>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p2b" corresp="P2.xml#p2b P4.xml#p2 P5.xml#p2">Historically, the word
<hi>markup</hi>
has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out. Examples include wavy underlining to indicate boldface, special symbols for passages to be omitted or printed in a particular font and so forth. As the formatting and printing of texts was automated, the term was extended to cover all sorts of special
<hi>markup codes</hi>
inserted into electronic texts to govern formatting, printing, or other processing.</p>
Example 2. An encoded page of version P3 of the TEI Guidelines.
<pb xmlns="http://www.tei-c.org/ns/1.0" n="13"/>
<head xmlns="http://www.tei-c.org/ns/1.0">2 A Gentle Introduction to XML</head>
<note xmlns="http://www.tei-c.org/ns/1.0" type="disclaimer" xml:id="n1">As originally published in previous editions of the Guidelines, this chapter provided a gentle introduction to 'just enough' SGML for anyone to understand how the TEI used that standard. Since then, the Gentle Guide seems to have taken on a life of its own independent of the Guidelines, having been widely distributed (and flatteringly imitated) on the web. In revising it for the present draft, the editors have therefore felt free to reduce considerably its discussion of SGML-specific matters, in favour of a simple presentation of how the TEI uses XML.</note>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p1" corresp="P2.xml#p1 P3.xml#p1 P5.xml#p1">The encoding scheme defined by these Guidelines may be formulated either as an application of the ISO Standard Generalized Markup Language (SGML)
<note place="foot" xml:id="n2" corresp="P2.xml#n1 P3.xml#n1">
<bibl>
<editor>International Organization for Standardization</editor>
,
<title>ISO 8879: Information processing - Text and office systems - Standard Generalized Markup Language (SGML)</title>
, ([
<pubPlace>Geneva</pubPlace>
]:
<publisher>ISO</publisher>
,
<date>1986</date>
)</bibl>
</note>
or of the more recently developed W3C Extensible Markup Language (XML)
<note place="foot" xml:id="n3">
<bibl>
<editor>World Wide Web Consortium</editor>
:
<title>Extensible Markup Language (XML) 1.0</title>
, available from
<ref target="http://www.w3.org/TR/REC-xml">http://www.w3.org/TR/REC-xml</ref>
</bibl>
</note>
. Both SGML and XML are widely-used for the definition of device-independent, system-independent methods of storing and processing texts in electronic form; XML being in fact a simplification or derivation of SGML. In the present chapter we introduce informally the basic concepts underlying such markup languages and attempt to explain to the reader encountering them for the first time how they are actually used in the TEI scheme. Except where the two are explicitly distinguished, references to XML in what follows may be understood to apply equally well to the TEI usage of SGML. For a more technical account of TEI practice see chapter 28
<hi>Conformance</hi>
; for a more technical description of the subset of SGML used by the TEI encoding scheme, see chapter 39
<hi>Formal Grammar for the TEI-Interchange-Format Subset of SGML</hi>
.</p>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p2" corresp="P2.xml#p2a P2.xml#p2b P3.xml#p2a P3.xml#p2b P5.xml#p2">XML is an extensible markup language used for the description of marked-up electronic text. More exactly, XML is a
<hi>metalanguage</hi>
, that is, a means of formally describing a language, in this case, a
<hi>markup language</hi>
. Historically, the word
<hi>markup</hi>
has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out. Examples include wavy underlining to indicate boldface, special symbols for passages to be omitted or printed in a particular font and so forth. As the formatting and printing of texts was automated, the term was extended to cover all sorts of special codes inserted into electronic texts to govern formatting, printing, or other processing.</p>
Example 3. An encoded page of version P4 of the TEI Guidelines.
<pb xmlns="http://www.tei-c.org/ns/1.0" n="xxxi"/>
<head xmlns="http://www.tei-c.org/ns/1.0">v
<lb/>
A Gentle Introduction to XML</head>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p1" corresp="P2.xml#p1 P3.xml#p1 P4.xml#p1">The encoding scheme defined by these Guidelines is formulated as an application of the Extensible Markup Language (XML) (Bray et al. (eds.) (2006)). XML is widely used for the definition of device-independent, system-independent methods of storing and processing texts in electronic form. It is now also the interchange and communication format used by many applications on the World Wide Web. In the present chapter we informally introduce some of its basic concepts and attempt to explain to the reader encountering them for the first time how and why they are used in the TEI scheme. More detailed technical accounts of TEI practice in this respect are provided in chapters
<hi>23. Using the TEI</hi>
,
<hi>1. The TEI Infrastructure</hi>
, and
<hi>22. Documentation Elements</hi>
of these Guidelines.</p>
<p xmlns="http://www.tei-c.org/ns/1.0" xml:id="p2" corresp="P2.xml#p2a P2.xml#p2b P3.xml#p2a P3.xml#p2b P4.xml#p2">Strictly speaking, XML is a metalanguage, that is, a language used to describe other languages, in this case, markup languages. Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out. Examples include wavy underlining to indicate boldface, special symbols for passages to be omitted or printed in a particular font, and so forth. As the formatting and printing of texts was automated, the term was extended to cover all sorts of special codes inserted into electronic texts to govern formatting, printing, or other processing.</p>
Example 4. An encoded page of version P5 of the TEI Guidelines.

This would allow for maximal representation of the distinct material sources, and leave the identification of the actual variation either to further processing or to human inspection. A variant of this approach could integrate the transcriptions of the text in all material witnesses in a single TEI document, and make use of appropriate linking attributes to point out the alignment between the different text structures. Such naive systems are both redundant and crude: while they provide the full text of every witness and align the corresponding text structures, they offer little insight into the places where the witnesses actually differ.
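The single-document variant described above might be sketched roughly as follows. This is only an illustration, not an encoding prescribed by this tutorial: it assumes that the witnesses are gathered as separate <text> elements inside a <group>, and the xml:id values (P2.p2a, P4.p2, and so on) are hypothetical names invented here. Only P2 and P4 are shown, and the pointers to P3 and P5 are left out for brevity.

```xml
<!-- Sketch under stated assumptions: all xml:id values are hypothetical. -->
<text xmlns="http://www.tei-c.org/ns/1.0">
  <group>
    <!-- Witness P2: two paragraphs that correspond to one paragraph in P4 -->
    <text xml:id="P2">
      <body>
        <p xml:id="P2.p2a" corresp="#P4.p2">SGML is an international standard for the description of marked-up electronic text.
          <!-- rest of the paragraph as in Example 1 --></p>
        <p xml:id="P2.p2b" corresp="#P4.p2">Historically, the word markup has been used to describe annotation or other marks within a text intended to instruct a compositor or typist how a particular passage should be printed or laid out.
          <!-- rest of the paragraph as in Example 1 --></p>
      </body>
    </text>
    <!-- Witness P4: one paragraph pointing back to both P2 paragraphs -->
    <text xml:id="P4">
      <body>
        <p xml:id="P4.p2" corresp="#P2.p2a #P2.p2b">XML is an extensible markup language used for the description of marked-up electronic text.
          <!-- rest of the paragraph as in Example 3 --></p>
      </body>
    </text>
    <!-- P3 and P5 would be added in the same way -->
  </group>
</text>
```

Because all witnesses now live in one document, the corresp pointers become local references (#P4.p2) instead of references into other files (P4.xml#p2). The alignment is still recorded only at paragraph level, so the differences in wording remain implicit.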

In order to encode the actual textual variation between the different text versions in a meaningful way, the TEI Guidelines provide a specialised module of elements and attributes that allow you to encode textual variation at word level. This TBE tutorial will first discuss how to describe the different text witnesses represented in the critical edition; next deal with the encoding of textual variants between these witnesses in isolation; then treat different ways of integrating such records of variation within the encoding of the critical edition; and finally point out potential problems and pitfalls when creating a critical edition with TEI.
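To give a first impression of what such word-level encoding can look like, here is a minimal sketch using the <app> and <rdg> elements of the textcrit module on the opening sentence of the second paragraph in Examples 1 to 3. It assumes that the sigla P2, P3, and P4 have been declared as witnesses elsewhere in the document (for instance in a <listWit>), which is not shown here. P5 is left out because its second paragraph opens with a different sentence.

```xml
<!-- Sketch only: the #P2, #P3, #P4 pointers assume witness declarations not shown here. -->
<p xmlns="http://www.tei-c.org/ns/1.0">
  <app>
    <rdg wit="#P2 #P3">SGML</rdg>
    <rdg wit="#P4">XML</rdg>
  </app>
  is an
  <app>
    <rdg wit="#P2 #P3">international standard</rdg>
    <rdg wit="#P4">extensible markup language used</rdg>
  </app>
  for the description of marked-up electronic text.
</p>
```

Unlike the paragraph-level alignment shown earlier, this form states the text shared by the witnesses only once and makes each point of divergence explicit.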

Bibliography

  • Vanhoutte, Edward, and Ron Van den Branden. 2009. “Describing, Transcribing, Encoding, and Editing Modern Correspondence Material: a Textbase Approach.” Literary and Linguistic Computing 24 (1): 77–98. doi:10.1093/llc/fqn035.