Representing Middle English Manuscripts on the Web with UTF-8

Frank da Cruz, The Kermit Project, Columbia University, August 2002
St. Erkenwald Manuscript, lines 257-264:

ye bisshop baythes hȳ ȝet wt bale at his hert   257
yag̅h̅ mē menskid hī so how hit myȝt worthe   258
yt his clothes wer so clene in cloutes me thynkes   259
hom burde haue rotid & bene rent ī ratt long sythen   260
yi body may be enbawmyd hit bashis me noght   261
yt hit thar ryne ne route ne no ronke wormes   262
bot yi colour ne yi clothe I know ī no wise   263
how hit myȝt lye by mōnes lor & last so longe   264

St. Erkenwald is a Middle English alliterative poem written about 1390 by an unknown author; the manuscript copy, British Library MS Harley 2250, is dated 1477. Text from J.A. Burrow and Thorlac Turville-Petre, A Book of Middle English, Blackwell Publishers, Oxford (1992).

The passage above is encoded in UTF-8 with minimal HTML markup. The manuscript makes liberal use of overlining, mostly to show that a vowel is followed by an "m" or "n" the scribe left out; for example, "mē" stands for "men". The overline is represented here by U+0304 Combining Macron, since HTML has no font-style element for overlining as it does for underlining (<u>..</u>). The meaning of the line over "gh" in line 258 is unclear. Here we code it with U+0305 Combining Overline (rather than Combining Macron) after both "g" and "h", because adjacent macrons do not necessarily join together. We don't use Combining Overline over single letters, however, because it is too wide.

If your browser does not handle the combination of a Latin letter and Combining Macron (or Overline), the overline appears to the right of the letter with a dotted circle underneath or, if the character is not in your browser's font at all, as an "unknown character" symbol. (See Notes below about future developments.)
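
A minimal Python sketch of the encoding described above, under the assumption that each overline is stored as a separate combining character following its base letter. It prints the code points and UTF-8 bytes of "mē" and of the "gh" form from line 258, then checks what Unicode normalization form NFC does to them. The helper name overline() is hypothetical and not part of any library.

```python
import unicodedata

COMBINING_MACRON = "\u0304"    # over single letters, as in "mē" (men)
COMBINING_OVERLINE = "\u0305"  # over "gh" in line 258, one per letter

def overline(letters: str, mark: str = COMBINING_MACRON) -> str:
    """Return `letters` with `mark` after each one (a combining mark follows its base)."""
    return "".join(ch + mark for ch in letters)

men = "m" + overline("e")                          # m, e, U+0304
thagh = "ya" + overline("gh", COMBINING_OVERLINE)  # y, a, g, U+0305, h, U+0305

for word in (men, thagh):
    codepoints = " ".join(f"U+{ord(c):04X}" for c in word)
    print(word, "|", codepoints, "|", word.encode("utf-8").hex(" "))

# NFC composes "e" + U+0304 into the precomposed U+0113 (e with macron).
# There is no precomposed g or h with overline, so thagh comes through unchanged.
print(unicodedata.normalize("NFC", men) == "m\u0113")   # True
print(unicodedata.normalize("NFC", thagh) == thagh)     # True
```

Both marks take two bytes in UTF-8 (CC 84 for the macron, CC 85 for the overline). Because NFC silently swaps "e" + U+0304 for the single character U+0113, text that has been normalized will not match the decomposed form byte for byte, which matters when searching or comparing transcriptions.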

The copyist uses underlining (reproduced here with markup) to flag material that is questionable, glossed in the margins, or both. Note also the crossed-out "u" in "route" in line 262 (marked up as "<strike>u</strike>"), which indicates a correction by the copyist.
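
The markup involved is simple: part of a word is wrapped in an element and everything else is escaped as ordinary text. The sketch below uses a hypothetical helper, mark_span (our own name, not taken from the original page), to produce the struck-out "u" of line 262. The tag name is a parameter, so "u" gives underlining. A modern page might substitute <s> or <del>, since <strike> is obsolete in HTML5.

```python
from html import escape

def mark_span(word: str, start: int, end: int, tag: str) -> str:
    """Wrap word[start:end] in <tag>..</tag>; all text is HTML-escaped."""
    return (escape(word[:start])
            + f"<{tag}>" + escape(word[start:end]) + f"</{tag}>"
            + escape(word[end:]))

# The copyist's deleted "u" in "route" (line 262).
print(mark_span("route", 2, 3, "strike"))   # ro<strike>u</strike>te
```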

The letter "ȝ" (yogh) represents "y" at the beginning of a word or between vowels ("ȝet", yet; "yȝe", eye; "faȝerest", fairest); sometimes "w" between vowels ("oȝen", own; "ȝoȝelinge", yowling); "gh" (the German ich-Laut) at the end of a word or before another consonant ("roȝ", rough; "myȝt", might); and, in Old English, "g" ("wiȝa", man; "fuȝel", bird).
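
The positional pattern above can be turned into a rough rule of thumb. The Python sketch below is a simplified heuristic of our own, not a method from the original page. The function name modernize_yogh is hypothetical. The sketch assumes the text uses U+021D LATIN SMALL LETTER YOGH (not the look-alike U+0292 ezh) and handles lowercase only. A yogh that starts a word or falls between vowels becomes "y"; any other yogh becomes "gh". It cannot detect the "w" cases or the Old English "g".

```python
VOWELS = set("aeiouy")   # "y" counts as a vowel here, as in "myȝt" and "yȝe"
YOGH = "\u021d"          # assumed: U+021D LATIN SMALL LETTER YOGH

def modernize_yogh(word: str) -> str:
    """Replace each lowercase yogh with "y" or "gh" by position (heuristic only)."""
    out = []
    for i, ch in enumerate(word):
        if ch != YOGH:
            out.append(ch)
            continue
        prev = word[i - 1] if i > 0 else ""
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if i == 0 or (prev in VOWELS and nxt in VOWELS):
            out.append("y")    # word-initial or between vowels
        else:
            out.append("gh")   # word-final or next to a consonant
    return "".join(out)

for w in ("ȝet", "faȝerest", "myȝt", "roȝ", "oȝen"):
    print(w, "->", modernize_yogh(w))
```

The last case shows the limit of the rule: "oȝen" (own) comes out as "oyen", because a position-only rule cannot tell the "w" value from the "y" value between vowels.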