UTF-8 Sampler

UTF-8 is an ASCII-preserving encoding method for Unicode (ISO 10646), the Universal Character Set (UCS). The UCS encodes most of the world's writing systems in a single character set, allowing you to mix languages and scripts within a document without needing any tricks for switching character sets. This web page is encoded directly in UTF-8.
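
As a minimal sketch of what "ASCII-preserving" means in practice, the following Python fragment encodes a few characters and prints their UTF-8 bytes. Python is an assumption here, since the page itself contains no code; the sample characters and the helper name utf8_hex are chosen only for illustration.

```python
# Illustrative only: show how UTF-8 encodes characters of increasing code point.
def utf8_hex(ch):
    """Return the UTF-8 bytes of ch as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in ch.encode("utf-8"))

# An ASCII letter, a Latin-1 letter, the Euro sign, and a rune.
samples = ["A", "\u00e9", "\u20ac", "\u16a0"]
for ch in samples:
    print(f"U+{ord(ch):04X} {ch} -> {utf8_hex(ch)}")

# ASCII text is byte-for-byte identical in ASCII and in UTF-8.
assert "Kermit".encode("utf-8") == "Kermit".encode("ascii")
```

Characters in the ASCII range take one byte each and keep their ASCII values; every other character takes two to four bytes, none of which fall in the ASCII range.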

Kermit 95 can display UTF-8 plain text in Windows NT, XP, or 2000 when using a monospace Unicode font like Lucida Console or Courier New. The forthcoming GUI version of Kermit 95 will be able to display it too, even in Windows 95, 98, and ME. C-Kermit 7.0 and later can handle it too, if you have a Unicode display. As many languages as are representable in your font can be seen on the screen at the same time.

This, however, is a Web page. Some Web browsers can handle UTF-8, some can't. And those that can might not have a sufficiently populated font to work with. CLICK HERE for a survey of Unicode fonts for Windows.

First, the Euro symbol: €.

From the Anglo-Saxon Rune Poem (Rune version):

ᚠᛇᚻ᛫ᛒᛦᚦ᛫ᚠᚱᚩᚠᚢᚱ᛫ᚠᛁᚱᚪ᛫ᚷᛖᚻᚹᛦᛚᚳᚢᛗ
ᛋᚳᛖᚪᛚ᛫ᚦᛖᚪᚻ᛫ᛗᚪᚾᚾᚪ᛫ᚷᛖᚻᚹᛦᛚᚳ᛫ᛗᛁᚳᛚᚢᚾ᛫ᚻᛦᛏ᛫ᛞᚫᛚᚪᚾ
ᚷᛁᚠ᛫ᚻᛖ᛫ᚹᛁᛚᛖ᛫ᚠᚩᚱ᛫ᛞᚱᛁᚻᛏᚾᛖ᛫ᛞᚩᛗᛖᛋ᛫ᚻᛚᛇᛏᚪᚾ᛬

From Laȝamon's Brut (The Chronicles of England, Middle English, West Midlands):

An preost wes on leoden, Laȝamon was ihoten
He wes Leovenaðes sone -- liðe him be Drihten.
He wonede at Ernleȝe at æðelen are chirechen,
Uppen Sevarne staþe, sel þar him þuhte,
Onfest Radestone, þer he bock radde.

From the Tagelied of Wolfram von Eschenbach (Middle High German):

Sîne klâwen durh die wolken sint geslagen,
er stîget ûf mit grôzer kraft,
ich sih in grâwen tägelîch als er wil tagen,
den tac, der im geselleschaft
erwenden wil, dem werden man,
den ich mit sorgen în verliez.
ich bringe in hinnen, ob ich kan.
sîn vil manegiu tugent michz leisten hiez.

Some lines of Odysseus Elytis (Greek):

Τη γλώσσα μου έδωσαν ελληνική
το σπίτι φτωχικό στις αμμουδιές του Ομήρου.
Μονάχη έγνοια η γλώσσα μου στις αμμουδιές του Ομήρου.

από το Άξιον Εστί
του Οδυσσέα Ελύτη

The first stanza of Pushkin's Bronze Horseman (Russian):

На берегу пустынных волн
Стоял он, дум великих полн,
И вдаль глядел. Пред ним широко
Река неслася; бедный чёлн
По ней стремился одиноко.
По мшистым, топким берегам
Чернели избы здесь и там,
Приют убогого чухонца;
И лес, неведомый лучам
В тумане спрятанного солнца,
Кругом шумел.

Šota Rustaveli's Veṗxis Ṭq̇aosani, The Knight in the Tiger's Skin (Georgian):

ვეპხის ტყაოსანი შოთა რუსთაველი

ღმერთსი შემვედრე, ნუთუ კვლა დამხსნას სოფლისა შრომასა, ცეცხლს, წყალსა და მიწასა, ჰაერთა თანა მრომასა; მომცნეს ფრთენი და აღვფრინდე, მივჰხვდე მას ჩემსა ნდომასა, დღისით და ღამით ვჰხედვიდე მზისა ელვათა კრთომაასა.

And from the sublime to the ridiculous, here is a certain phrase in an assortment of languages (1):

  1. Sanskrit: काचं शक्नोम्यत्तुम् । नोपहिनस्ति माम् ।
  2. Sanskrit (standard transcription): kācaṃ śaknomyattum; nopahinasti mām.
  3. Classical Greek: ὕαλον ϕαγεῖν δύναμαι· τοῦτο οὔ με βλάπτει.
  4. Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
  5. Etruscan: (NEEDED)
  6. Latin: Vitrum edere possum; mihi non nocet.
  7. Esperanto: Mi povas manĝi vitron, ĝi ne damaĝas min.
  8. French: Je peux manger du verre, cela ne me fait pas mal.
  9. Provençal / Occitan: Pòdi manjar de veire, me nafrariá pas.
  10. Québécois: J'peux manger d'la vitre, ça m'fa pas mal.
  11. Walloon: Dji pou magnî do vêre, çoula m' freut nén må.
  12. Champenois: (NEEDED)
  13. Lorrain: (NEEDED)
  14. Picard: (NEEDED)
  15. Corsican: (NEEDED)
  16. Basque: Kristala jan dezaket, ez dit minik ematen.
  17. Catalan: Puc menjar vidre que no em fa mal.
  18. Spanish: Puedo comer vidrio, no me hace daño.
  19. Aragones: Puedo minchar beire, no me'n fa mal.
  20. Galician: Eu podo xantar cristais e non cortarme.
  21. Portuguese: Posso comer vidro, não me faz mal.
  22. Brazilian Portuguese: Consigo comer vidro. Não me machuca.
  23. Cabo Verde Creole: M' podê cumê vidru, ca ta maguâ-m'.
  24. Papiamentu: (NEEDED)
  25. Italian: Posso mangiare il vetro e non mi fa male.
  26. Roman: Me posso magna' er vetro, e nun me fa male.
  27. Sicilian: Puotsu mangiari u vitru, nun mi fa mali.
  28. Milanese: Sôn bôn de magnà el véder, el me fa minga mal.
  29. Venetian: Mi posso magnare el vetro, no'l me fa mae.
  30. Rheto-Romance: (NEEDED)
  31. Romanian: Pot să mănânc sticlă și ea nu mă rănește.
  32. Pictish: (NEEDED)
  33. Breton: (NEEDED)
  34. Cornish: Mý a yl dybry gwéder hag éf ny wra ow ankenya.
  35. Welsh: Dw i'n gallu bwyta gwydr, dwy e ddim yn gwneud dolur i mi.
  36. Manx Gaelic: Foddym gee glonney agh cha jean eh gortaghey mee.
  37. Irish: Tá mé in ann gloine a ithe; Ní chuireann sé isteach ná amach orm.
  38. Scottish Gaelic: S urrainn dhomh gloinne ithe; cha ghoirtich i mi.
  39. Anglo-Saxon: Ic mæg glæs eotan ond hit hearmiað me ne.
  40. Middle English: Ich canne glas eten and hit hirtiþ me nouȝt.
  41. English: I can eat glass and it doesn't hurt me.
  42. Lalland Scots / Doric: Ah can eat gless, it disnae hurt us.
  43. Glaswegian: (NEEDED)
  44. Gullah: (NEEDED)
  45. Gothic: (NEEDED)
  46. Old Norse: (NEEDED)
  47. Norsk / Norwegian (Nynorsk): Eg kan eta glas utan å skada meg.
  48. Norsk / Norwegian (Bokmål): Jeg kan spise glass uten å skade meg.
  49. Føroyskt / Faroese: (NEEDED)
  50. Íslenska / Icelandic: Ég get etið gler án þess að meiða mig.
  51. Dansk / Danish: Jeg kan spise glas, det gør ikke ondt på mig.
  52. Soenderjysk: Æ ka æe glass uhen at det go mæ naue.
  53. Frysk / Frisian: Ik kin glês ite, it docht me net sear.
  54. Nederlands / Dutch: Ik kan glas eten. Het doet me geen pijn.
  55. Afrikaans: Ek kan glas eet, maar dit maak my nie seer nie.
  56. Lëtzebuergescht / Luxemburgish: Ech kan Glas iessen, daat deet mir nët wei.
  57. Deutsch / German: Ich kann Glas essen, ohne mir weh zu tun.
  58. Ruhrdeutsch: Ich kann Glas verkasematucken, ohne dattet mich wat jucken tut.
  59. Sächsisch / Saxon: 'sch kann Glos essn, ohne dass'sch mer wehtue.
  60. Pfälzisch: Isch konn Glass fresse ohne dasses mer ebbes ausmache dud.
  61. Schwäbisch / Swabian: I kå Glas frässa, ond des macht mr nix!
  62. Bayrisch / Bavarian: I koh Glos esa, und es duard ma ned wei.
  63. Allemannisch: I kaun Gloos essen, es tuat ma ned weh.
  64. Schwyzerdütsch: Ich chan Glaas ässe, das tuet mir nöd weeh.
  65. Svensk / Swedish: Jag kan äta glas, det skadar mig inte.
  66. Suomea / Finnish: Pystyn syömään lasia. Se ei koske yhtään.
  67. Hungarian: Meg tudom enni az üveget, nem lesz tőle bajom.
  68. Estonian: Ma võin klaasi süüa, see ei tee mulle midagi.
  69. Latvian: Es varu ēst stiklu, tas man nekaitē.
  70. Lithuanian: Aš galiu valgyti stiklą ir jis manęs nežeidžia
  71. Croatian: Ja mogu jesti staklo i ne boli me.
  72. Czech: Mohu jíst sklo, neublíží mi.
  73. Slovak: Môžem jesť sklo. Nezraní ma.
  74. Polska / Polish: Mogę jeść szkło i mi nie szkodzi.
  75. Albanian: Unë mund të ha qelq dhe nuk më gjen gjë.
  76. Slovenian: Lahko jem steklo, ne da bi mi škodovalo.
  77. Serbian: Mogu jesti staklo a da mi ne škodi.
  78. Serbian: Могу јести стакло а да ми не шкоди.
  79. Macedonian: Можам да јадам стакло, а не ме штета.
  80. Russian: Я могу есть стекло, это мне не вредит.
  81. Ukrainian: Я можу їсти шкло, й воно мені не пошкодить.
  82. Bulgarian: Мога да ям стъкло и не ме боли.
  83. Georgian: მინას ვჭამ და არა მტკივა.
  84. Armenian: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։
  85. Turkish: Cam yiyebilirim, bana zararı dokunmaz.
  86. Marathi: मी काच खाऊ शकतो, मला ते दुखत नाही.
  87. Hindi: मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती.
  88. Urdu(2): میں کانچ کھا سکتا ہوں اور مجھے تکلیف نہیں ہوتی ۔
  89. Pashto(2): زه شيشه خوړلې شم، هغه ما نه خوږوي
  90. Farsi / Persian: .من می توانم بدونِ احساس درد شيشه بخورم
  91. Arabic(2): أنا قادر على أكل الزجاج و هذا لا يؤلمني.
  92. Aramaic: (NEEDED)
  93. Hebrew(2): אני יכול לאכול זכוכית וזה לא מזיק לי.
  94. Yiddish(2): איך קען עסן גלאָז און עס טוט מיר נישט װײ.
  95. Ladino: (NEEDED)
  96. Gǝʼǝz: (NEEDED)
  97. Amharic: (NEEDED)
  98. Twi: Metumi awe tumpan, ɛnyɛ me hwee.
  99. Hausa (Latin): Inā iya taunar gilāshi kuma in gamā lāfiyā.
  100. Hausa (Ajami) (2): إِنا إِىَ تَونَر غِلَاشِ كُمَ إِن غَمَا لَافِىَا
  101. Yoruba(3): Mo lè je̩ dígí, kò ní pa mí lára.
  102. Malay: Saya boleh makan kaca dan ia tidak mencederakan saya.
  103. Tagalog: Kaya kong kumain nang bubog at hindi ako masaktan.
  104. Chamorro: Siña yo' chumocho krestat, ti ha na'lalamen yo'.
  105. Javanese: Aku isa mangan beling tanpa lara.
  106. Vietnamese (quốc ngữ): Tôi có thể ăn thủy tinh mà không hại gì.
  107. Vietnamese (nôm) (4): 些 𣎏 世 咹 水 晶 𦓡 空 𣎏 害 咦
  108. Chinese: 我能吞下玻璃而不伤身体。
  109. Japanese: 私はガラスを食べられます。それは私を傷つけません。
  110. Korean: 나는 유리를 먹을 수 있어요. 그래도 아프지 않아요
  111. Thai: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ
  112. Navajo: Tsésǫʼ yishą́ągo bííníshghah dóó doo shił neezgai da.
  113. Cherokee (and other Native American languages): (NEEDED)
  114. Lojban: mi kakne le nu citka le blaci .iku'i le se go'i na xrani mi

For testing purposes, some of these are repeated in a monospace font . . .

  1. Euro Symbol: €.
  2. Greek: Μπορώ να φάω σπασμένα γυαλιά χωρίς να πάθω τίποτα.
  3. Icelandic: Ég get borðað gler, það meiðir mig ekki.
  4. Polish: Mogę jeść szkło i mi nie szkodzi.
  5. Romanian: Pot să mănânc sticlă și ea nu mă rănește.
  6. Ukrainian: Я можу їсти шкло, й воно мені не пошкодить.
  7. Armenian: Կրնամ ապակի ուտել և ինծի անհանգիստ չըներ։
  8. Georgian: მინას ვჭამ და არა მტკივა.
  9. Hindi: मैं काँच खा सकता हूँ, मुझे उस से कोई पीडा नहीं होती.
  10. Hebrew(2): אני יכול לאכול זכוכית וזה לא מזיק לי.
  11. Yiddish(2): איך קען עסן גלאָז און עס טוט מיר נישט װײ.
  12. Arabic(2): أنا قادر على أكل الزجاج و هذا لا يؤلمني.
  13. Japanese: 私はガラスを食べられます。それは私を傷つけません。
  14. Thai: ฉันกินกระจกได้ แต่มันไม่ทำให้ฉันเจ็บ

Notes:

  1. The numbering of the samples is arbitrary, done only to keep track of how many there are, and can change any time a new entry is added. The arrangement is also arbitrary but with some attempt to group related examples together. Bug #1: the (NEEDED) examples shouldn't count. Fix: Fill them in! Bug #2: All languages not listed are wanted, not just the ones that say (NEEDED).
  2. Correct right-to-left display of these languages depends on the capabilities of your browser. The period should appear on the left. In the monospace Yiddish example, the Yiddish digraphs should occupy one character cell. Note: unlike the other RTL examples, the Farsi phrase was entered "backwards".
  3. The third word is Latin letter small 'j' followed by small 'e' with U+0329, Combining Vertical Line Below. This displays correctly only if your Unicode font includes the U+0329 glyph and your browser supports combining diacritical marks. Some of the Indic examples also include combining sequences.
  4. Includes Unicode 3.1 Plane 2 characters.
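
Notes 3 and 4 can be checked programmatically. The sketch below, in Python with the standard unicodedata module, lists the code points of the combining sequence described in note 3 and shows that a character outside the Basic Multilingual Plane takes four bytes in UTF-8. The code point U+2338F is used here only as an assumed example of a Plane 2 character, and the helper name describe is hypothetical.

```python
import unicodedata

def describe(text):
    """Return (code point, Unicode name) pairs for each character in text."""
    return [(f"U+{ord(c):04X}", unicodedata.name(c, "<unnamed>")) for c in text]

# Note 3: 'j', then 'e' followed by U+0329. Escapes are used so the source
# does not depend on the editor preserving the combining mark.
word = "je\u0329"
for cp, name in describe(word):
    print(cp, name)

# The combining mark is a separate code point, so the word has 3 code points
# even though it displays as two letters.
assert len(word) == 3
assert unicodedata.combining("\u0329") != 0

# Note 4: a Plane 2 code point (assumed example) needs four UTF-8 bytes.
plane2 = "\U0002338F"
print(f"plane {ord(plane2) >> 16}, {len(plane2.encode('utf-8'))} bytes")
```

Whether such sequences render correctly still depends on the font and the display software, as the notes say; the code only confirms what is stored in the text.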

(Additions, corrections, completions, gratefully accepted.)

Credits:
The "I can eat glass" phrase and the initial collection of translations: Ethan Mollick. Transcription / conversion to UTF-8: Frank da Cruz. Albanian: Sindi Keesan. Afrikaans: Johan Fourie. Anglo Saxon: Frank da Cruz. Arabic: Najib Tounsi. Armenian: Vaçe Kundakçı. Bulgarian: Sindi Keesan, Guentcho Skordev. Cabo Verde Creole: Cláudio Alexandre Duarte. Chinese: Jack Soo. Cornish: Chris Stephens. Croatian: Marjan Baće. Czech: Stanislav Pecha. Dutch: Peter Gotink. Esperanto: Franko Luin. Farsi/Persian: Payam Elahi. Galician: Laura Probaos. Georgian: Giorgi Lebanidze. Greek: Ariel Glenn, Constantine Stathopoulos, Siva Nataraja. Hebrew: Jonathan Rosenne. Hausa: Malami Buba, Tom Gewecke. Hindi: Shirish Kalele. Hungarian: András Rácz. Icelandic: Andrés Magnússon. Italian: Thomas De Bellis. Japanese: Makoto Takahashi. Korean: Jungshik Shin. Lëtzebuergescht: Stefaan Eeckels. Lithuanian: Gediminas Grigas. Lojban: Edward Cherlin. Macedonian: Sindi Keesan. Malay: Zarina Mustapha. Manx: Éanna Ó Brádaigh. Marathi: Shirish Kalele. Middle English: Frank da Cruz. Milanese: Marco Cimarosti. Navajo: Tom Gewecke. Norwegian: Herman Ranes. Pashto: N.R. Liwal. Pfälzisch: Dr. Johannes Sander. Polish: Juliusz Chroboczek. Québécois: Laurent Detillieux. Roman: Pierpaolo Bernardi. Romanian: Juliusz Chroboczek, Ionel Mugurel. Ruhrdeutsch: Timwi. Sanskrit: Siva Nataraja. Sächsisch: André Müller. Schwäbisch: Otto Stolz. Scots: Jonathan Riddell. Serbian: Sindi Keesan, Ranko Narancic, Boris Daljevic, Szilvia Csorba. Slovak: G. Adam Stanislav. Slovenian: Albert Kolar. Tagalog: Jim Soliven. Thai: Alan Wood's wife. Turkish: Vaçe Kundakçı. Ukrainian: Michael Zajac. Urdu: Mustafa Ali. Vietnamese: Dixon Au, [James] Đỗ Bá Phước 杜 伯 福. Walloon: Pablo Saratxaga. Yiddish: Mark David.

Commentary:
Date: Wed, 27 Feb 2002 13:21:59 +0100
From: "Bruno DEDOMINICIS" <b.dedominicis@cite-sciences.fr>
Subject: Je peux manger du verre, cela ne me fait pas mal.

I just found out your website and it makes me feel like proposing an interpretation of the choice of this peculiar phrase.

Glass is transparent and can hurt as everyone knows. The relation between people and civilisations is sometimes effusional and more often rude. The concept of breaking frontiers through globalization, in a way, is also an attempt to deny any difference. Isn't "transparency" the flag of modernity? Nothing should be hidden any more, authority is obsolete, and the new powers are supposed to reign through loving and smiling and no more through coercion...

Eating glass without pain sounds like a very nice metaphor of this attempt. That is, frontiers should become glass transparent first, and be denied by incorporating them. On the reverse, it shows that through globalization, frontiers undergo a process of displacement, that is, when they are not any more speakable, they become repressed from the speech and are therefore incorporated and might become painful symptoms, as for example what happens when one tries to eat glass.

The frontiers that used to separate bodies one from another tend to divide bodies from within and make them suffer.... The chosen phrase then appears as a denial of the symptom that might result from the destitution of traditional frontiers.

Best,
Bruno De Dominicis, Paris, France


UTF-8 Sampler / The Kermit Project / Columbia University / kermit@columbia.edu / 7 May 2002