Malayalam: ഔ-sign double encoding problem

Refer: Unicode Public Review Issue #93

Current Usage

<U+0D4C> and <U+0D46, U+0D57> are canonically equivalent and represent the two-part archaic sign, as in സൌമ്യം.

<U+0D57> on its own represents the one-part modern sign, as in സൗമ്യം.
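
The equivalence described above can be checked with a short sketch using Python's standard `unicodedata` module. The constant and function names are chosen here for illustration only; the code compares the sequences after canonical decomposition (NFD).

```python
import unicodedata

PRECOMPOSED_AU = "\u0d4c"        # MALAYALAM VOWEL SIGN AU (two-part form)
DECOMPOSED_AU = "\u0d46\u0d57"   # VOWEL SIGN E followed by AU LENGTH MARK
STANDALONE_AU = "\u0d57"         # AU LENGTH MARK alone (one-part form)


def nfd(text):
    """Return the canonical decomposition (NFD) of text."""
    return unicodedata.normalize("NFD", text)


# The two encodings of the two-part sign decompose to the same sequence.
same_two_part = nfd(PRECOMPOSED_AU) == nfd(DECOMPOSED_AU)

# The standalone one-part sign is a different sequence under normalization.
same_one_part = nfd(PRECOMPOSED_AU) == nfd(STANDALONE_AU)

print(same_two_part, same_one_part)  # prints: True False
```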

Issues

  • The ഔ-sign is spelled as a single entity in both traditional and modern Malayalam, so it should have only one encoding in Unicode.
  • Interchanging the one-part and two-part signs does not affect the meaning of any Malayalam word. This shows that the difference is one of formatting, not of spelling.
  • Two different encodings for the ഔ-sign can confuse users, for example in domain names. Under the current Unicode standard, the domain names www.സൌമ്യം.com and www.സൗമ്യം.com are different. A user cannot convey the difference without adding extra information about the old and new orthographies.
  • Introducing a difference between the modern and traditional spellings of the ഔ-sign serves only to fragment the language further.
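
As a rough illustration of the domain-name issue, the sketch below compares the two labels after NFC normalization using Python's standard `unicodedata` module. The variable and function names are hypothetical, and the code makes no claim about how any particular registry or browser processes such names; it only shows that normalization alone does not unify the two spellings.

```python
import unicodedata

# The two labels from the example, written as escapes to make the
# differing code point explicit.
ARCHAIC_LABEL = "\u0d38\u0d4c\u0d2e\u0d4d\u0d2f\u0d02"  # സൌമ്യം, uses U+0D4C
MODERN_LABEL = "\u0d38\u0d57\u0d2e\u0d4d\u0d2f\u0d02"   # സൗമ്യം, uses U+0D57


def normalized_label(label):
    """Return the label in Normalization Form C."""
    return unicodedata.normalize("NFC", label)


# U+0D4C and a standalone U+0D57 are not canonically equivalent,
# so the labels remain distinct strings after normalization.
labels_match = normalized_label(ARCHAIC_LABEL) == normalized_label(MODERN_LABEL)

print(labels_match)  # prints: False
print([f"U+{ord(ch):04X}" for ch in normalized_label(ARCHAIC_LABEL)])
print([f"U+{ord(ch):04X}" for ch in normalized_label(MODERN_LABEL)])
```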

Arguments against

Comparison to 'ou', 'o' pair in English

Argument

If at some point in time English words like “colour” start getting written as “color”, nobody would expect the encoded representation to remain the same. That is patently obvious. And likewise it is patently obvious that the change from “ൌ” to “ൗ” constitutes a change in spellings and that the two different spellings should have two different encoded representations.

Counter-Argument

These two scenarios are not comparable. The reason is that 'ou' and 'o' can produce a difference in meaning, at least in some words, e.g. pound and pond. However, "ൌ" and "ൗ" cannot produce a difference in meaning in any Malayalam word. That implies it is just a difference in orthographic style, which should not be encoded differently.

English written in the Latin script is encoded linearly. The argument above would be correct only if Indic scripts had also been encoded linearly. They were not, and there are scads of examples where Indic Unicode strings display quite differently depending on the user's choice of font.

Possible Solution

  • <U+0D4C> or <U+0D46, U+0D57>

The rendering engine should request the modern one-part au-sign. If the font does not provide it, the archaic two-part form should be requested.

  • <ZWNJ, U+0D4C> or <ZWNJ, U+0D46, U+0D57>

The rendering engine should request the archaic two-part au-sign. If the font does not provide it, the modern one-part form should be requested.

  • U+0D57 should not be used on its own; it should appear only when preceded by U+0D46.
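
The selection rules proposed above can be sketched as follows. This is only an illustration of the proposal, not the behavior of any existing rendering engine: the function names, the form labels "one-part" and "two-part", and the idea of passing the font's available forms as a set are all assumptions made for this example.

```python
ZWNJ = "\u200c"
AU_SIGN = "\u0d4c"          # precomposed two-part au-sign
E_SIGN = "\u0d46"           # first half of the decomposed sequence
AU_LENGTH_MARK = "\u0d57"   # second half of the decomposed sequence


def au_form_preferences(sequence):
    """Return the glyph forms to request, in order, or None if invalid.

    Hypothetical reading of the proposal: a leading ZWNJ asks for the
    archaic two-part form first; without it the modern form comes first.
    """
    archaic_first = sequence.startswith(ZWNJ)
    core = sequence[1:] if archaic_first else sequence
    if core not in (AU_SIGN, E_SIGN + AU_LENGTH_MARK):
        # Includes standalone U+0D57, which the proposal disallows.
        return None
    if archaic_first:
        return ["two-part", "one-part"]
    return ["one-part", "two-part"]


def pick_au_form(sequence, available_forms):
    """Pick the first preferred form that the font provides."""
    preferences = au_form_preferences(sequence)
    if preferences is None:
        return None
    for form in preferences:
        if form in available_forms:
            return form
    return None


# A font with both forms shows the modern sign unless ZWNJ is present.
print(pick_au_form(AU_SIGN, {"one-part", "two-part"}))
print(pick_au_form(ZWNJ + AU_SIGN, {"one-part", "two-part"}))
```

With this sketch, a font that carries only one of the two forms always falls back to the form it has, whichever sequence the text uses.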

Related Issues

The same methodology could also be applied to the ഉ and ഊ signs.

ഉ-sign

  • <CONS, U+0D41>

The rendering engine should request the traditional conjoint form. If the font does not provide it, the modern independent u-sign should be requested.

  • <CONS, ZWNJ, U+0D41>

The rendering engine should request the modern independent ഉ-sign.

ഊ-sign

  • <CONS, U+0D42>

The rendering engine should request the traditional conjoint form. If the font does not provide it, the modern independent uu-sign should be requested.

  • <CONS, ZWNJ, U+0D42>

The rendering engine should request the modern independent ഊ-sign.
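
The rules for the ഉ and ഊ signs could be sketched in the same spirit. The function name and the labels "conjoint" and "independent" are hypothetical, and the consonant check is a simplification that assumes CONS means a letter in the range U+0D15 to U+0D39.

```python
ZWNJ = "\u200c"
U_SIGN = "\u0d41"    # MALAYALAM VOWEL SIGN U
UU_SIGN = "\u0d42"   # MALAYALAM VOWEL SIGN UU


def is_consonant(ch):
    """Simplified check: treat U+0D15..U+0D39 as the consonant range."""
    return "\u0d15" <= ch <= "\u0d39"


def u_sign_preferences(sequence):
    """Return the forms to request, in order, or None if not matched.

    <CONS, sign>       : traditional conjoint form, then independent sign
    <CONS, ZWNJ, sign> : modern independent sign only
    """
    if not sequence or not is_consonant(sequence[0]):
        return None
    rest = sequence[1:]
    if rest in (U_SIGN, UU_SIGN):
        return ["conjoint", "independent"]
    if rest in (ZWNJ + U_SIGN, ZWNJ + UU_SIGN):
        return ["independent"]
    return None


print(u_sign_preferences("\u0d15" + U_SIGN))          # ക + u-sign
print(u_sign_preferences("\u0d15" + ZWNJ + UU_SIGN))  # ക + ZWNJ + uu-sign
```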
