To understand the genetics of XML, it helps to look at the different layers of recognition involved in reading any text:
- Recognizing that a text is written in a particular alphabet: that jaqsohy is written in the Greek alphabet, whereas фрукт is written in the Cyrillic alphabet.
- Identifying the particular language within a given alphabet: ‘frucht’ and ‘fruta’ are German and Portuguese respectively, but which language is ‘fruit’? English? French? And is фрукт Russian or Ukrainian? This is where things start to get trickier.
- Understanding the meaning of the particular string of characters that makes up a word. In the previous example, we could hazard a guess at the meaning and reasonably assume that ‘frucht’, ‘fruta’, and ‘fruit’ all mean the same thing in each of the four languages (as does фрукт, once you recognize that it is pronounced ‘frukt’). On the other hand, what do ‘plod’, ‘toradh’, ‘hedelmä’ and ‘gyümölcs’ mean, and in which languages?
- For words with multiple meanings, the intended meaning can often be deduced from context. If the French word avocat appears in a recipe book, it most likely means ‘avocado’; if the same word appears in a legal tome, it almost certainly means ‘lawyer’.
The context provides valuable insight into the probable meaning in a particular situation.
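The first of these layers is the one most easily handed to a machine. For most alphabetic characters, the Unicode character name begins with the name of its script, so a program can tell Greek from Cyrillic from Latin without knowing any of the languages involved. The Python sketch below assumes that the first word of a character's Unicode name is a good enough stand-in for its script (a simplification; a fuller approach would consult the Unicode Script property). The helper name `scripts_of` and the sample words, including the Greek φρούτο, are our own illustrative choices.

```python
import unicodedata


def scripts_of(word: str) -> set[str]:
    """Return the set of script labels found among the letters of `word`.

    Hypothetical helper: approximates a character's script by the first
    word of its Unicode name, e.g. 'CYRILLIC SMALL LETTER EF' -> 'CYRILLIC'.
    """
    found = set()
    for ch in word:
        if ch.isalpha():
            # Fall back to 'UNKNOWN' for any character without a Unicode name.
            name = unicodedata.name(ch, "UNKNOWN")
            found.add(name.split()[0])
    return found


if __name__ == "__main__":
    for w in ["фрукт", "φρούτο", "frucht", "fruit", "hedelmä", "gyümölcs"]:
        print(w, scripts_of(w))
```

Note that the sketch stops at the first layer: it reports LATIN for ‘fruit’ and gives no hint whether the word is English or French, which is exactly where the second layer begins.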