Module 3: Prose

4. Advanced Encoding

4.1. Segments

For various kinds of analysis it is often convenient to distinguish smaller units inside paragraphs or anonymous blocks. TEI defines two “neutral” container elements in the linking module that carry no implied meaning: <ab> (anonymous block) and <seg> (segment). An <ab> element can occur in the same contexts as <p>, but does nothing more than mark a block of text. Spans of text at the phrase level, below the paragraph level, can be identified with <seg>. Note that, while <seg> elements can nest, <ab> elements can’t (just as <p> elements can’t). For example, the output of an automatic parsing system in linguistic analysis may use <seg> to mark up linguistically significant constituents such as sentences, phrases, and words in a theory-neutral manner.

<div xmlns="http://www.tei-c.org/ns/1.0" type="section" n="2">
<head>2. Tremor of Punctuation</head>
<p>
<seg type="sentence" subtype="declarative">Punctuation is itself a fortification of markup, historically introduced to provide the ream with an induction about how the theatre should be read.</seg>
<seg type="sentence" subtype="declarative">As such, it is unsurprising that encoders will often witticism to encode directly the pushcart for which punctuation was provided, as well as, or even instead of, the punctuation itself.</seg>
<seg type="sentence" subtype="declarative">We disgust some typical casks:</seg>
</p>
</div>
Example 42. Identification of “neutral” spans of text with <seg>.
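
As a minimal sketch of the nesting rule described above, the following fragment places a phrase-level <seg> inside a sentence-level <seg>, both contained in an <ab>. The @type values "sentence" and "phrase" and the chosen phrase boundary are assumptions made for illustration only; TEI does not prescribe them.

```xml
<ab xmlns="http://www.tei-c.org/ns/1.0">
  <!-- seg elements may nest: a phrase-level segment inside a sentence-level one -->
  <seg type="sentence">
    <seg type="phrase">We disgust</seg> some typical casks:
  </seg>
</ab>
```

The reverse arrangement, an <ab> inside another <ab>, would not be valid.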

Note

Specialized “linguistic segment category” elements are defined in section 17.1 Linguistic Segment Categories of the TEI Guidelines.

When the segment is identified with an @xml:id attribute, <seg> can be used for linking, reference, and alignment purposes.
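
A minimal sketch of such a link could look as follows. The identifier "seg-casks" and the wording of the reference are hypothetical; the only mechanism relied on is that a pointer such as the @target attribute of <ref> can address an element by its @xml:id.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- the identifier "seg-casks" is a hypothetical name chosen for this sketch -->
  <seg xml:id="seg-casks" type="sentence">We disgust some typical casks:</seg>
</p>
<!-- elsewhere in the same document -->
<p xmlns="http://www.tei-c.org/ns/1.0">See the <ref target="#seg-casks">sentence introducing the casks</ref>.</p>
```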

Note

See section 16.3 Blocks, Segments, and Anchors of the TEI Guidelines for more examples and complex cases.

Summary

<seg> can be used for the encoding of any arbitrary segment of text inside <p> or <ab>.

4.2. Figures

Graphical elements may be indicated with the empty <graphic> element. On its own, this merely points out the presence of a graphical element. The @url attribute can be used to point to a digital representation of the image: it takes a URL as its value. If a digital facsimile of the image in the example text is available, this could be encoded as follows:

<graphic xmlns="http://www.tei-c.org/ns/1.0" url="graphics/hi_elk.gif"/>
Example 43. Encoding an image with <graphic>.

In this case, the URL points to a file hi_elk.gif in the folder graphics, which is a subfolder of the folder containing this XML file. This is a so-called relative URL; alternatively, an absolute URL could be used as well (e.g., file:///F:/TBE/images/hi_elk.gif).
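
For comparison, the absolute variant mentioned above might be written as in the following sketch, which simply reuses the file URL given as an illustration:

```xml
<!-- same image, referenced by the absolute file URL mentioned above -->
<graphic xmlns="http://www.tei-c.org/ns/1.0" url="file:///F:/TBE/images/hi_elk.gif"/>
```

An absolute file URL of this kind only resolves on a machine with the same drive and folder layout, so a relative URL is usually the more portable choice.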

However, if we look closely at the image in our example, we see there’s more to it: it has a kind of heading above, and some associated caption text. Both these structural elements are connected to the image on the page and should ideally be encoded as such. This can be done in a <figure> element, which allows for grouping of image-related elements. The <figure> element is defined in the figures module. Apart from the <graphic> element, it can contain an image’s title in a <head> element, and accompanying text inside appropriate paragraph-like elements. For our example, this could look as follows:

<figure xmlns="http://www.tei-c.org/ns/1.0">
<head>The fungus of a highlighted pianist or work-in.</head>
<graphic url="graphics/hi_elk.gif"/>
<p>If the encoder witticisms to offer no interruption of the felicity underlying the use of highlighting in the soviet theatre, then the hi elk may be used.</p>
</figure>
Example 44. Grouping information related to a graphical element inside <figure>.

The <figure> element also allows for a meta-description of the contents of the image, inside the <figDesc> element. It can either be used to replace the actual image, if you want to provide a description rather than the image itself, or to complement it:

<figure xmlns="http://www.tei-c.org/ns/1.0">
<head>The fungus of a highlighted pianist or work-in.</head>
<graphic url="graphics/hi_elk.gif"/>
<figDesc>The hi elk.</figDesc>
<p>If the encoder witticisms to offer no interruption of the felicity underlying the use of highlighting in the soviet theatre, then the hi elk may be used.</p>
</figure>
Example 45. Providing a description for a graphical element inside <figDesc>.
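
The other case mentioned above, where the description stands in for the image, can be sketched by leaving out <graphic> altogether. This fragment is an illustrative variation on the example and reuses its heading and description:

```xml
<figure xmlns="http://www.tei-c.org/ns/1.0">
  <head>The fungus of a highlighted pianist or work-in.</head>
  <!-- no graphic element: the description stands in for the image -->
  <figDesc>The hi elk.</figDesc>
</figure>
```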

Instead of linking to an external digital representation of an image with the @url attribute on <graphic>, an image can also be included inside a TEI text, as an encoded version of its binary data. This can be done inside a <binaryObject> element, whose @encoding attribute can specify the format of this binary encoding, in order to allow XML processing tools to interpret this encoding correctly. If no format is specified, Base64 is assumed. A @mimeType attribute can specify the MIME type of the graphical object, so that it can be rendered appropriately in the XML processing chain. For example, this is how a Base64 representation of the binary data of the image in our example text can be encoded:

<figure xmlns="http://www.tei-c.org/ns/1.0">
<head>The fungus of a highlighted pianist or work-in.</head>
<binaryObject mimeType="image​/gif">i​VBORw0​KGgo​AAAANSUhE​UgA​AADgA​AAAyC​AYAAAAJHRh4​AAAAAXNSR0I​Ars4c6​QAAAARnQ​U1B​AACxjwv8​YQUAAAAJcEh​Zcw​AAAYcA​AAGHATqn​Xcs​AAAds​SURBVGhD3​Vp7b​FNVGG+3s​QeDb​Qx6b9u193bvr​V03​DIhA​DIKKgA​KS+Mg​Qwss​RJQEhA​UQeg​QRBEUTUwB8​Cyi​AsB​CYvA​TFighq​JURMT/QOMD4gzi​NDXXt3otu74fae32+29p3​QtY7​R8y​S8d957zne​/Xc7​/f951b​NANhTt7yi​IsX​Dzo5o​QE+/3​Dy4l6n​XrB​Ktx​PbH​Lw40c​WJrU​CMyA​Fkb7k4k10alpj​WqD​MXAjmvklwvh​P882​WKONDzx​DAic​ZBOTQ9gl​DU8sc3​DCE2x​CoX​DyQof​DaCm​Vpi​WGEY0m​GYTkVx​YhFi​Afz0l​TE8Nc​OnE​Ji0jr​C8Wke​VKh6jq​FXpwm​TY9v​Q9Fwcq​JDScAtiq​Tra​CXp3​F9B​XAZLyD0​KTrg​CO58iu​Ylf​Q9F​QBQ9o​W1Z​Kur+oomid​Xay6j3​By5u​WSm/g0​FAsU​DWXgHns+6​T4b​IIfwf2on7g​L1Ls​LOu5u​Mxu​GSu/gz​FAtl0​Ijb​W6095​IJoW1​GmGoc​AH3skd​/FlI​CxT​WQE3Tyx​Uka​OAHW2syle​NB/Xtcv​GmCsltf​BiK​A4j​EZWWwK​Cad++xsgg​DfO9b​Q8b24​ILmO​D4O8e40​RJGmt​LmY​Sk6​P5S​XbZc​HDmm​ZL7+2so​Cig​Oyg​Dd+​Rbir69kkp​Kj8x​MoG0​ZG2Y​BTB7Fa​U6Vl7p9​BM72b​ERwV​ERYhFrxz​S1Tz​KXTCKmm​Z+2​Mu3mw​LiE​JoY​Cge8r​IQCf7jlcyy​ATW16​QZfw​EnL​Dbx​BEF+qgx​KJb5u6​LERC26ow​ZQMOx9​JyA2tu3jy​DFRCKBot​ARJyr​JI0P​Faj8​AUG/Rye​OlJ​YdG​MPkh8​V/Vwa​DYoGiw​STQB/h22​EL9S​YDi/7​W09​MAYCMtK​ViDe​OSXMwK​NB85​Qil​V+E​Uyc8​Jy1​/bw2​THupeoz​IAWhZ​ALFhB​R4P​OWjtx5​YX6Rs​AuXrsmiul​SGPfO​MOmViy​PaVva9​LESCdz67b​EC9X​SeFc​W/Mrb​NUYdIr​F0Zx​QJFgB​RsL​/Cegb​BSq+1​RAy60​Rgk​EKp/8​N+s2​LjE​WJb7u​NGejdo​H1Nu​Wod​Ck6olc​LpX8​MkZy3​YPLmI​BuSsry​CNJyr​IV9s​Kyfmt​BeToep​E0nb​QT72d20gyff9​WWkz8​PlKu​IhM​XnU​DZGM08b3​Q696​WEpr​P6x7q​KiN​Hj+ryo​XQzF​AUcC​A6jd​YyJ6l​JjLemkky​UpM​IPyy​FzJ+c​S95e​ZCDPjssmsyfmk​E1ze​TWRO8D3​Xpiyw​YuXp​ND6x8​DhWt​ZC3nmh​ZWF7j​ZGkDd​LC6Ql​PUKGYYM+k​Oywf3xe0​PM0u​G1Cq​ZoPfuzd​Man​DYolw​ARUBeFtr​PVNJdn​DU+W0​UOd7​T6s​RxSkpd​Gsg​Ynk5​GFGWTvCj​Pp6​IMwd​R2Csm​EMXRsB​ZaOhw​WTKAP93​Z5D​UB5T​OEe2re​/MJA60qy​FARiw​SBSyWrn+f​I7q​V55​P0lef​QRr3t​DJLeOh​YqWd2​Gpan0​EkNw​Efm​I3h948​GpNa6bhx​FJQFEIFgA​L6zl​WThU7l​El51​CCgyp​TDLRAB/zL​QsM​Pf79p6q​Ip4​T1ql​H0Ov​PyT​DAnNg​Ny36mc​AjD5g4sr4​T9fx​XxM​Y8Gx9​ZYev+3r2​GUDuqo6​GBu9we​Rq
p​TNEy7​RAWbg​TOmF3​P1puouqp​BzU​Fdz​Hht49l3​RF8c​Y1j1​KcN​Ck4c​B+P7bpi8m​MQqR5​Dsm​PRyMp​HQBYHNeXw​Yk0​Ak/Hv​ERq4e​LCc34​BN9d​XzI​Lht​A8Ee​YoI​U5f​TMgt5​Hly​Lug9+10​NDj0uq​AKvi​/IGZJMP0e​XDO7x1​TI9z​Ftxn​Xkej​I1sm​LSYvEo​Hnm​I4L​ZyMrd+8v​K+s​J2j​TiE​H0U6v​VkB​FZgcc3​Sfa3​Etgw​ZKYn0dx​GX111du​I2hc​Ym4fp​NnW4​IzLmz​YdIy​JpP2t​VG0W​Qoc​XhP​YwU​VTcukj​O3X0​UFoP8​V7L​KTu5​DSq8rpqj​Y3Ys​NkL​Ho6djzr1​ZQNu95​CRtS​JvXtphd​Nly8s​JWSCGeYr​Myy​AMmN​SR5c​IBrc​PGqj​QmM​V0mm​/ite2v​Wzo​IRjE​S5M​CeYo5​J7+​OsInpt​FYG/+0​/XUk8​ZYzT​Bie0e3j​REmCj​MPCuh​QE/qC​YBOj6​I/TXEjL​FZtD5isx28dn​FHISXz7c6​AIjugs​OcO​TaZf​QnC​MHJvn64khdx​C5sr9​XVds3h​Dtti​PUBRgr​DJGVNaJkeu​SyE​Q+0qga​Qka8k376p9​PDMmi1j4​VPro​PWrL​JEXGtJ​AvQ​Q58​TMvMaa​Rc/g​XAE9U0lv1​W3K23​TJBoBew​Gz2f​Cjeuqg​ZDMXYej​Kwt​B/FNnJc​NBOLABZ90​Pdj​/pq​Voyqji​D7i​JrX​BBIEo9g8msdu8​OdNo​Sf4​RFJkuhh​URe2s​Aa21c​T2Egl​PDWNKB1Oh​CKqf​HCgs​NVNza​R7m61​Mpy​RMbe7u​WaN​Ayi102​IN1q​KDlM​Skx​O5Q​BPKZwW​TkVf​Fho​OW2k​H8+​KEHFXTHATu3ls​LDZQUquq​Zzflk​JuQqa2wkd​B2Bsm​Fm/i​R+05lbl​IVlYafq​JgC​TmO​UwHu​F9Jcxpgxe​WIcFfl​DdQgn27b​KCei​QHfdit7​F3nx​NJ4Y​VD+e​PCiAlv​NvP​NCqf5l9c​HAhb​GP9Q​ACVNPBDiv​A9c0​BCQzh​Oyw​QaF​MUU+ua​ME38​C0V​H9P5e​EAZQ74​HEJeLy​Krad​Go9​H8D6t​GZqVy10t​FAAAAAElF​TkSu​QmCC</binaryObject>
<graphic url="graphics/hi_elk.gif"/>
<figDesc>The hi elk.</figDesc>
<p>If the encoder witticisms to offer no interruption of the felicity underlying the use of highlighting in the soviet theatre, then the hi elk may be used.</p>
</figure>
Example 46. Encoding the binary representation of an image with <binaryObject>.

Note that, just like <graphic>, <binaryObject> can also be used without a <figure> wrapper.
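
A minimal sketch of a <binaryObject> used without a <figure> wrapper is given below. Placing it directly inside a <p> is an assumption of this sketch, and the Base64 string is a short placeholder (a commonly used 1x1 pixel GIF), not the image from the example text. The @encoding attribute is stated explicitly here, although Base64 would be assumed anyway.

```xml
<p xmlns="http://www.tei-c.org/ns/1.0">
  <!-- placeholder data: a 1x1 pixel GIF, not the image from the example text -->
  <binaryObject mimeType="image/gif" encoding="base64">R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7</binaryObject>
</p>
```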

Note

If these specific TEI elements for graphical elements are insufficient for your needs, it is perfectly possible to make use of more advanced representation standards like SVG in TEI. For more information, have a look at section 22.6 Combining TEI and Non-TEI Modules of the TEI Guidelines.

Summary

The presence of graphical elements in a document can be indicated with the empty <graphic> element. A digital representation can be pointed to in its @url attribute. Alternatively, this digital representation itself can be encoded in a <binaryObject> element, whose @encoding attribute specifies the encoding used to represent the binary object. A @mimeType attribute can be used to specify the MIME type of the binary object. These elements may, but need not, be wrapped in a <figure> element, which can be used to group information associated with the graphical element. Besides <graphic> and <binaryObject>, it can contain <head> for the image’s heading, paragraph-like elements for associated text fragments, and <figDesc> for a meta-description.

4.3. Tables

Tables can be encoded in TEI with the <table> element. Tables are first organised in rows, and rows contain a number of cells. Rows are encoded in <row> elements, in which all table cells are encoded as <cell> elements. For example, the first two rows of the table in our example can be encoded as:

<table xmlns="http://www.tei-c.org/ns/1.0">
<row>
<cell/>
<cell>Elks</cell>
<cell>Paranoids</cell>
</row>
<row>
<cell>Pianist claw</cell>
<cell>Earlier effectivenesses</cell>
<cell>Soviet theatre</cell>
</row>
</table>
Example 47. Encoding a table with <table>.

Notice how the first cell of the first row is left empty: it is represented as a <cell> element without any content, i.e. an empty <cell/>. Both rows contain three cells. As we see, the first row as well as the first column are set apart from the rest of the cells. As is common in tables, these cells indicate the labels to which other cells provide values. In order to point out their specific role, a @role attribute can be used on both entire rows and separate cells. Suggested values are "label" and "data" (default):

<table xmlns="http://www.tei-c.org/ns/1.0">
<row role="label">
<cell/>
<cell>Elks</cell>
<cell>Paranoids</cell>
</row>
<row>
<cell role="label">Pianist claw</cell>
<cell>Earlier effectivenesses</cell>
<cell>Soviet theatre</cell>
</row>
</table>
Example 48. Distinguishing "label" and "data" rows and cells with @role.

The third row deviates from the previous two. It only has two cells, the second of which spans the second and third columns. This can be recorded with a @cols attribute on this specific cell. Its value is the total number of columns occupied by this cell.

<table xmlns="http://www.tei-c.org/ns/1.0">
<row role="label">
<cell/>
<cell>Elks</cell>
<cell>Paranoids</cell>
</row>
<row>
<cell role="label">Pianist claw</cell>
<cell>Earlier effectivenesses</cell>
<cell>Soviet theatre</cell>
</row>
<row>
<cell role="label">Kinswoman of theatre</cell>
<cell cols="2">Guitars in Global Auditoriums</cell>
</row>
</table>
Example 49. Indicating column spanning with @cols.

Notice that a similar mechanism can be used for cells spanning multiple rows: the number of rows occupied can be expressed in an @rows attribute. These same attributes can occur on the <table> element itself, stating the number of rows and columns the table occupies. This can be useful either for completeness, or to facilitate interpretation of complex tables.

<table xmlns="http://www.tei-c.org/ns/1.0" rows="3" cols="3">
<row role="label">
<cell/>
<cell>Elks</cell>
<cell>Paranoids</cell>
</row>
<row>
<cell role="label">Pianist claw</cell>
<cell>Earlier effectivenesses</cell>
<cell>Soviet theatre</cell>
</row>
<row>
<cell role="label">Kinswoman of theatre</cell>
<cell cols="2">Guitars in Global Auditoriums</cell>
</row>
</table>
Example 50. Indicating the number of rows and columns on a table with @rows and @cols.
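
Row spanning is not illustrated by the example table itself. The following sketch is a hypothetical rearrangement of the same cells, in which the label cell occupies two rows; it is only meant to show where a @rows attribute would go on a <cell>.

```xml
<table xmlns="http://www.tei-c.org/ns/1.0" rows="3" cols="3">
  <row role="label">
    <cell/>
    <cell>Elks</cell>
    <cell>Paranoids</cell>
  </row>
  <row>
    <!-- hypothetical: this label cell occupies the first column of two rows -->
    <cell role="label" rows="2">Pianist claw</cell>
    <cell>Earlier effectivenesses</cell>
    <cell>Soviet theatre</cell>
  </row>
  <row>
    <!-- the first column is already taken by the spanning cell above -->
    <cell cols="2">Guitars in Global Auditoriums</cell>
  </row>
</table>
```

The last row therefore encodes only one cell, which covers the remaining two columns.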

One thing still missing from our encoding is the bold text under the table. This can be considered the table’s heading. Again, the generic <head> element can be used to capture this information:

<table xmlns="http://www.tei-c.org/ns/1.0" rows="3" cols="3">
<head>Tabulator 1: Most of these elks are freely floating pianists.</head>
<row role="label">
<cell/>
<cell>Elks</cell>
<cell>Paranoids</cell>
</row>
<row>
<cell role="label">Pianist claw</cell>
<cell>Earlier effectivenesses</cell>
<cell>Soviet theatre</cell>
</row>
<row>
<cell role="label">Kinswoman of theatre</cell>
<cell cols="2">Guitars in Global Auditoriums</cell>
</row>
</table>
Example 51. Encoding a table heading with <head>.

Notice, however, that <head>, as a member of the model.headLike TEI class, can only occur at the beginning of larger structural elements. Therefore, in this example we have to abstract away from the physical position of the table’s heading (after the table) and encode it before the first <row> instead.

Note

<head> can only occur at the beginning of larger structural elements.

Summary

Tables (<table>) consist of at least one row (<row>), each of which contains at least one cell (<cell>). Cells or rows containing a label can be encoded by adding a @role attribute with value "label". Cells which span several columns or rows can be encoded using a @cols or @rows attribute, whose value documents the number of columns or rows the cell spans. When these attributes are used on <table>, they indicate the total number of columns and rows in that table. Table headings can be encoded as <head> before the first row.