Markets in South Asia and the Near East

Meeting of the 3 Shir
Meeting of the three "shir": in Persian, "shir" can mean "milk", "tap" or "lion".

 

The German automobile industry is moving east: the Chinese, Korean and Indian markets offer companies a promising future. The Near East, Iran for instance, also offers opportunities for car manufacturers and suppliers. The numerous ventures which European companies have undertaken in that country and a study conducted by the Verband der Automobilindustrie (VDA) confirm this trend. (Source: Handelsblatt, 2 September 2004)

 

European languages manage with 30 to 40 characters and therefore need little in the way of encoding. The character sets of most Asian languages are far larger, so managing and converting them is more complex. A single byte per character is not enough for them; they need two bytes and are therefore known as double-byte character sets (often loosely called double-byte typefaces or fonts). In this context, it is important that the operating systems used for producing the technical documentation support the required encoding for the respective languages.
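Whether a given encoding can represent a text at all can be checked before production starts. The following is a minimal Python sketch of such a check; the helper name `can_encode` and the sample character are illustrative assumptions, and the codec names are those shipped with Python's standard library.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


sample = "中"  # a common Chinese character, used here only as an example

# Single-byte Western encodings fail; Chinese encodings and UTF-8 succeed.
for codec in ("ascii", "latin-1", "gb2312", "big5", "utf-8"):
    print(codec, can_encode(sample, codec))
```

A check like this only confirms that the characters can be stored; it says nothing about whether the fonts and layout tools in the workflow can display them.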

Chinese versus Chinese

Mainland Chinese (People's Republic of China) is written in "simplified Chinese", which uses simplified characters. This variant is the result of the writing reforms carried out by the Chinese government during the 1950s. In contrast, Chinese in Taiwan and Hong Kong is known as "traditional Chinese" and retains the full characters. Each variant can be converted into the other using an editing and terminology application. The terminological adaptation is necessary because many modern terms, such as "pump" and "laser", are expressed differently in the two variants, so converting the characters alone is not enough.
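The combination of character conversion and terminology adaptation might be sketched in Python as follows. The two tables are hypothetical miniature examples (one term pair for "laser" and four character pairs), not the data of any real conversion tool; a production workflow would rely on a curated terminology database.

```python
# Hypothetical miniature tables, for illustration only.
TERMS = {"激光": "雷射"}  # "laser": mainland term -> term used in Taiwan
CHARS = str.maketrans({"汉": "漢", "语": "語", "国": "國", "机": "機"})


def to_traditional(text: str) -> str:
    """Convert simplified to traditional Chinese in two stages."""
    # Stage 1: replace whole terms, longest first so longer terms win.
    for term in sorted(TERMS, key=len, reverse=True):
        text = text.replace(term, TERMS[term])
    # Stage 2: map the remaining characters one by one.
    return text.translate(CHARS)


print(to_traditional("汉语"))  # character mapping only
print(to_traditional("激光"))  # terminology replacement
```

Running the term stage first matters in this sketch: once the characters have been mapped individually, the original term can no longer be recognized.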

 

Japanese, whose writing system has its origins in Chinese, still uses many Chinese characters (kanji) after a number of script reforms. Korean has its own alphabet, hangul, which was created in the 15th century and has largely replaced Chinese characters in everyday writing. Vietnamese is written in a Latin-based alphabet with many diacritical marks, which was developed in the 17th century and became the standard script in the early 20th century.

 

In Persian, words often have several meanings because short vowels are usually not written. At first sight, this may not seem problematic. However, the word "shir" can mean "milk", "tap" or "lion", depending on the context, which shows how carefully localization must be carried out.

Publishing – the heart of the matter

Modern Chinese, Japanese and Korean are mostly written from left to right and from top to bottom. A number of literary texts are still written vertically, in columns running from the top right to the bottom left. In contrast, Arabic, Persian and Hebrew run from right to left, the "wrong way round" to Western eyes, although the lines on the page still run from top to bottom. The entire document structure appears to be "back to front".

From a Western point of view, such a document begins on the last printed page. Documents of this kind must therefore also be structured in "reverse" order, which means completely converting the layout for publishing.

Before a translation is converted for the first time, it is therefore always necessary to check whether the character sets and their writing direction can be reproduced in the documentation environment, or whether it is advisable to set up another environment, since most common typesetting programs have difficulties here. In addition, checks must be carried out to ascertain whether the existing workflow can be reproduced in the CAT tool. For a full-service provider such as euroscript, this standard workflow forms a natural part of production and work procedures.
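One small part of such a check, determining the base writing direction of a text, can be sketched with Python's standard `unicodedata` module. The function name `base_direction` is an assumption made for this example, and the logic is a simplification that looks only at the first strongly directional character; it does not implement the full Unicode bidirectional algorithm and cannot detect vertical layout.

```python
import unicodedata


def base_direction(text: str) -> str:
    """Guess the writing direction from the first strongly directional character."""
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):  # Hebrew, or Arabic-script letters (incl. Persian)
            return "rtl"
        if bidi == "L":          # Latin and other left-to-right letters
            return "ltr"
    return "neutral"             # only digits, spaces or punctuation found


for sample in ("Hello", "شیر", "שלום", "123"):
    print(repr(sample), base_direction(sample))
```

Digits and spaces are skipped because they carry no strong direction of their own, so a Persian sentence that starts with a number is still classified as right-to-left.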

 

Example of double-byte fonts

 

Did you know?

Double-byte typefaces: 8-bit coding allows only 256 characters, which is not enough for character sets that go beyond the Latin alphabet. Therefore, character sets were introduced which use more than one byte for coding each character. A character set which uses two bytes to represent its characters is referred to as a "double-byte character set" (DBCS).
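The difference in storage can be made visible with a few lines of Python. This is only an illustration: the sample characters and the helper `byte_length` are assumptions chosen for this example, while the codec names are those of Python's standard library.

```python
def byte_length(ch: str, encoding: str) -> int:
    """Number of bytes needed to store `ch` in `encoding`."""
    return len(ch.encode(encoding))


SAMPLES = [
    ("A", "latin-1"),     # Western single-byte encoding
    ("中", "gbk"),        # simplified Chinese
    ("中", "big5"),       # traditional Chinese
    ("日", "shift_jis"),  # Japanese
    ("한", "euc_kr"),     # Korean
]

for ch, codec in SAMPLES:
    print(ch, codec, byte_length(ch, codec))
```

In practice these encodings are of variable width: a plain Latin letter such as "A" still takes one byte in GBK, and only the Asian characters take two.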

Diacritical characters: a diacritical mark is placed on, over or under a letter to indicate pronunciation or stress more clearly. The American Standard Code for Information Interchange (ASCII) does not contain any letters with diacritical marks. Almost all of them are contained in Unicode.
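How Unicode represents diacritical marks can be illustrated with Python's standard `unicodedata` module. In the sketch below, the function name `strip_diacritics` is an assumption made for this example; the approach is to decompose each letter into a base letter plus combining marks (normalization form NFD) and then drop the marks.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


word = "Tiếng Việt"  # "Vietnamese language", written with stacked diacritics
print(word.isascii())          # False: ASCII has no accented letters
print(strip_diacritics(word))  # base letters only
```

Stripping marks in this way changes the meaning of Vietnamese words, so it is shown here only to make the structure of the characters visible, not as a recommended fallback for publishing. Letters that are not composed of a base letter and a mark, such as "đ", are left unchanged by this approach.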

Unicode: Unicode aims to bring together all known text characters in a single character set, i.e. all the letters of the Latin, Greek, Cyrillic, Arabic, Hebrew and Thai alphabets, and the so-called CJK scripts, i.e. the different Chinese (traditional and simplified), Japanese (katakana, hiragana and kanji) and Korean (hangul) scripts.
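That all of these scripts live in one character set can be seen by asking Python's standard `unicodedata` module for the official name of a character. The helper `script_hint` is an assumption made for this example; it simply takes the first word of the character name, which is a rough heuristic and not the formal Unicode script property.

```python
import unicodedata


def script_hint(ch: str) -> str:
    """First word of the Unicode character name, e.g. 'LATIN' or 'HANGUL'."""
    name = unicodedata.name(ch, "UNKNOWN")  # default for unnamed code points
    return name.split()[0]


# One string mixing the scripts listed above, stored in a single character set.
mixed = "Aαяاאก中カひ한"
for ch in mixed:
    print(ch, f"U+{ord(ch):04X}", script_hint(ch))
```

Because every character has its own code point, a mixed string like this can be stored and exchanged in a single encoding such as UTF-8 without switching character sets.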
