The Unicode model of text makes a clear distinction between character and glyph and, in so doing, paradoxically creates the impression that the ultimate representation of text is some form of abstraction from its visual presentation. However, the level of abstraction at which different languages are encoded “naturally” in Unicode differs widely from one language to another.
We propose instead that text be encoded as sequences of context-tagged indices into arbitrary indexed structures, including not only character sets such as Unicode but also dictionaries of words or compound words. Furthermore, the elements of a single sequence need not all come from the same indexed structure.
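To make the proposed representation concrete, here is a minimal sketch, not the authors' implementation. It assumes that each element of a text is a triple (structure name, index, context tags), that an indexed structure can be modelled as a function from an index and a context to a string, and that an element's own tags override the ambient context. The names `Element`, `STRUCTURES`, `WORDS` and `render`, and the `spelling` tag, are all hypothetical. A single sequence mixes Unicode code points with indices into a small word dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Mapping

Context = Mapping[str, str]

@dataclass(frozen=True)
class Element:
    """One unit of text: an index into a named indexed structure, plus context tags."""
    structure: str
    index: int
    tags: Mapping[str, str] = field(default_factory=dict)

def unicode_entry(index: int, ctx: Context) -> str:
    # Unicode viewed as one indexed structure among others: the index is a code point.
    return chr(index)

# Hypothetical dictionary of words; an entry may have context-dependent forms.
WORDS: Dict[int, Dict[str, str]] = {
    0: {"GB": "colour", "US": "color"},
    1: {"GB": "centre", "US": "center"},
}

def word_entry(index: int, ctx: Context) -> str:
    # Pick the form selected by the (hypothetical) "spelling" tag, defaulting to GB.
    forms = WORDS[index]
    return forms[ctx.get("spelling", "GB")]

# Registry of indexed structures available to a text (hypothetical names).
STRUCTURES: Dict[str, Callable[[int, Context], str]] = {
    "unicode": unicode_entry,
    "words": word_entry,
}

def render(text: List[Element], ctx: Context) -> str:
    parts = []
    for el in text:
        local = {**ctx, **el.tags}  # an element's own tags override the ambient context
        parts.append(STRUCTURES[el.structure](el.index, local))
    return "".join(parts)

# One sequence mixing word indices and a Unicode code point.
doc = [
    Element("words", 0),
    Element("unicode", ord(" ")),
    Element("words", 1),
]
```

With this sketch, `render(doc, {"spelling": "US"})` yields `"color center"` and `render(doc, {"spelling": "GB"})` yields `"colour centre"`, so the same encoded text can be printed with either spelling.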
Our approach yields natural solutions to a wide range of problems, including the creation of documents that can be printed with any of several alternate spellings, the automatic generation of error messages with arguments, and the correct generation of nouns and adjectives with number, case or gender markers, as well as of verb conjugations.
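As an illustration of error messages with arguments, the following is a minimal sketch in the spirit of the approach, not the authors' method. It assumes that a message template is a sequence of literal strings, argument slots, and references into a word dictionary whose grammatical number agrees with a named numeric argument. The names `NOUNS`, `TEMPLATE`, `generate` and `number_of` are hypothetical, and the singular/plural rule is English-only; case and gender would require further keys in each dictionary entry.

```python
from typing import Any, Dict, List, Mapping, Tuple, Union

# Hypothetical word dictionary: each entry lists its inflected forms by number.
NOUNS: Dict[str, Dict[str, str]] = {
    "error": {"sg": "error", "pl": "errors"},
}

# Template pieces: a literal string, an argument slot ("arg", name), or a noun
# reference ("noun", dictionary key, name of the argument it agrees with).
Piece = Union[str, Tuple[str, str], Tuple[str, str, str]]

def number_of(n: int) -> str:
    # English-only agreement rule, assumed for this sketch.
    return "sg" if n == 1 else "pl"

def generate(template: List[Piece], args: Mapping[str, Any]) -> str:
    out = []
    for piece in template:
        if isinstance(piece, str):
            out.append(piece)                      # literal text
        elif piece[0] == "arg":
            out.append(str(args[piece[1]]))        # substituted argument
        else:
            _, key, agree = piece                  # noun agreeing with an argument
            out.append(NOUNS[key][number_of(args[agree])])
    return "".join(out)

TEMPLATE: List[Piece] = [
    "Found ", ("arg", "count"), " ", ("noun", "error", "count"),
    " in ", ("arg", "path"), ".",
]
```

Here `generate(TEMPLATE, {"count": 1, "path": "a.txt"})` gives `"Found 1 error in a.txt."`, while a count of 3 gives `"Found 3 errors in a.txt."`: the noun is stored once as a dictionary index, and its surface form is chosen from context at generation time rather than hard-coded into the message.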