Welcome to ChaosZone!
Prasenjeet Dutta's Home Page.


Localization in Vain

Localizing software for Indian languages is an interesting task, but has little practical value other than preserving the agendas of assorted state governments. Software developers (especially open source ones) looking to reach ‘untouched’ users would be better off spending volunteer time adding better internationalization support to applications, better fonts, IMEs and input technologies. (This is a bit of a long post, so hang on.)

It started with an interesting thread at india-gii on localizing Linux for India, especially the quality of translation. This screenshot is symptomatic — it relies far too much on English technical jargon and is therefore totally out of touch with the language actually spoken by those who most need localized software.

It’s easy to dismiss this problem as poor translation by volunteers (indeed, Microsoft’s translations have been consistently better), but there is a deeper point hiding here.

The point is an interesting, hard problem, especially for India. India’s approach to education is peculiarly skewed: the majority of children, especially those from lower income groups, attend state-run schools that teach in the local language. A minority (usually better-off) go to private schools that teach in English (a proliferation of private English schools has ensured that more children today go to them compared to, say, a generation ago). After school, however, the tables are turned. The majority of state-run colleges and nearly all private colleges teach in English (and these are accessible to all based on merit plus affirmative action); a minority of students choose to go to (usually state-run) local-language colleges, and almost always enter government jobs or academia after graduating.

Based on this, there are five categories of potential Indian computer users, in order of size:

Category  Description
1         Those who have tenth-grade or lower read+understand skills in their local language only.
2         Those who can read and understand English moderately, and use it mostly at work.
3         People who are (actually or effectively) illiterate.
4         Those who read+understand English comfortably (usually have used it as a medium of instruction all their lives, or from college onwards).
5         Those who have college-level skills in their local language only (nearly all of this category can also read and understand English moderately to well, but do not need to use it at work).

The current marketplace for software is overwhelmingly dominated by Categories 4 and 2, in that order. In sheer numbers, on the other hand, Categories 1 and 3 dominate — being (wild-ass guess) around 60% of India’s population. In order to grow, however, commercial software companies (and Open Source Advocates) are localizing their software to attract a wider base. The thinking goes like this:

  • Category 2 may buy a computer for the home if it were localized. This is IMHO misguided thinking. The Indian middle class (especially the Middle and Lower Income groups) from which Category 2 springs has time and again shown its preference for English (and upward mobility) rather than the native-language initiatives of their own state governments. Private schools discovered a long time ago that people do not wish to pay to receive a native-language education. The software industry will discover the same thing, as long as most Indian home computers are bought with the kids in mind, or for self-improvement.
  • Category 5 will use computers at work if the UI is localized. This is a valid point, because Category 5 mostly is the state governments and bits and pieces of the central government. Localized UI may ensure state and central government orders — and this is a pretty big market.
  • Category 1 and/or Category 3 will be better able to use software if it is localized. Generally, neither of these audiences buys or downloads software. (In fact, they have very little disposable income or purchasing power.) Rather, they use software that is chosen for them. Quite often, that software is custom-written to do a specialized task. It is increasingly becoming clear that you cannot make an average farmer sit down in front of a computer and expect him to use Evolution (or Outlook) comfortably, simply because the culture and worldview of the developer are dramatically different from those of the user. The successful apps for Categories 1 and 3 will not be localized; they will be specialized and custom-developed for the worldview of their audience. Of course, ideally, Categories 1 and 3 themselves would eventually disappear.

So we see that for all the activity to localize software for India, the primary benefit will be gee-whiz value to government bureaucrats who were perfectly capable of using the un-localized versions. Meanwhile, the masses (Category 1 and 3 primarily, but others as well) see little research directed at their real needs, which are:

  • It is too hard to type in Indian languages! (It is also an unnatural activity for Indian scripts, most of which use conjuncts extensively.) What we need is research on handwriting recognition, something China (for fairly obvious reasons) has been working on vigorously.
  • Deep internationalization support in applications. Categories 2, 4 and 5 are perfectly capable of reading English if that is the UI language. However, they need to be able to work with their native languages in their applications. Most post-2000 apps from Microsoft (and most apps that target Windows 2000 and above) satisfy this criterion, but it is the open source movement’s Achilles’ heel, largely due to lack of cohesion between OSS projects.
  • Good IMEs (for as many Indian languages as possible) that work well on US-English keyboards, the cheapest and most common variety found in India. Historically, Microsoft’s IMEs have been very good indeed, and OSS IMEs haven’t. However, standard OS-based IMEs for all major Indian scripts do not exist yet, even on Windows.
  • Good fonts and display logic. Most Indian scripts are too complex to be displayed reasonably at 8px, which is the minimum readable size for Latin characters, so the display logic has to account for this. Hinting for Indian fonts also leaves much to be desired.
  • Specialized applications that are culturally sensitive to Category 1 and 3 users.
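
To make the IME point concrete, here is a minimal sketch of a phonetic input method that turns keystrokes from a US-English keyboard into Devanagari. The key tables and the `transliterate` function are hypothetical and cover only a handful of letters; they are not taken from any real IME. The sketch only shows the core difficulty: consecutive consonants must be joined with a virama to form a conjunct, and vowels take a different shape depending on whether they follow a consonant.

```python
# Toy phonetic input method: Latin keystrokes -> Devanagari.
# The key tables are illustrative assumptions, not a real IME layout.
VIRAMA = "\u094d"  # suppresses the inherent vowel, joining consonants into a conjunct

CONSONANTS = {
    "k": "\u0915", "t": "\u0924", "n": "\u0928", "m": "\u092e",
    "r": "\u0930", "l": "\u0932", "s": "\u0938",
}
# Each vowel key maps to (independent letter, dependent sign used after a consonant).
# The short "a" is inherent in every consonant, so its dependent sign is empty.
VOWELS = {
    "aa": ("\u0906", "\u093e"), "a": ("\u0905", ""),
    "ii": ("\u0908", "\u0940"), "i": ("\u0907", "\u093f"),
    "e": ("\u090f", "\u0947"), "o": ("\u0913", "\u094b"),
}
# Try longer keys first so "aa" is not read as two separate "a" keys.
KEYS = sorted(list(CONSONANTS) + list(VOWELS), key=len, reverse=True)


def transliterate(keys: str) -> str:
    out = []
    bare = False  # True while the last consonant has no vowel attached yet
    i = 0
    while i < len(keys):
        token = next((k for k in KEYS if keys.startswith(k, i)), None)
        if token is None:
            # Unknown key: pass it through unchanged.
            out.append(keys[i])
            bare = False
            i += 1
            continue
        if token in CONSONANTS:
            if bare:
                out.append(VIRAMA)  # consonant after consonant: build a conjunct
            out.append(CONSONANTS[token])
            bare = True
        else:
            independent, sign = VOWELS[token]
            out.append(sign if bare else independent)
            bare = False
        i += len(token)
    return "".join(out)


print(transliterate("namaste"))
```

Typing `namaste` yields नमस्ते, where the `s` and `t` keys are joined into a single conjunct. A word-final consonant is left without a virama here, which follows common Hindi spelling but is a simplification; a usable IME would also need the full alphabet, nasal signs, and a way to correct mistakes.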

Microsoft is one company that has traditionally had the deepest internationalization support, although the state of their work on handwriting recognition for Indic scripts is unknown. Indeed, Microsoft’s marketing campaigns on Indian-language computing have focused more on their deep internationalization and IMEs than on localized UIs (the UIs in the demos are in English).

Open Source solutions clearly have their work cut out for them in all these departments, although text display is likely to be less of a problem in the future with software like Pango being used by Gnome.
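
As a rough illustration of why text display needs dedicated logic, the sketch below inspects a single Devanagari conjunct using only Python's standard `unicodedata` module. It does not call Pango or any rendering library; the `describe` helper is a hypothetical name introduced here. The point is that what the reader sees as one shape is stored as several code points, so an application that counts, truncates, or draws text one code point at a time will get Indian scripts wrong.

```python
import unicodedata

# The Devanagari conjunct "ksha": drawn as one cluster, stored as three code points.
conjunct = "\u0915\u094d\u0937"


def describe(text: str) -> list[str]:
    """List each code point in text with its official Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


for line in describe(conjunct):
    print(line)

# Naive truncation by code point cuts the cluster in half,
# leaving a consonant followed by a dangling virama.
print(conjunct[:2].endswith("\u094d"))
```

The loop prints KA, VIRAMA, and SSA as three separate entries, and the final check prints `True`. Shaping engines exist precisely to group such sequences into the right glyphs before drawing them.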

Some may argue that internationalization and localization are orthogonal activities, involving different skillsets, and that it is harmless if one continues while the other doesn’t. In fact, premature localization is harmful:

  • localization heightens user expectation about the software’s internationalization
  • poor localization (especially translation by amateurs) is a turn-off

If deep internationalization is achieved, then does localizing software for Indian audiences become easy? The answer is: only for some audiences (and these are the audiences who would probably have used un-localized software if they had to).
Translating the jargon of the tech industry (from files, folders, and directories to cookies, web-bugs and defragmentation) into terms that will make sense to Categories 1 and 3 remains a daunting literary task, given how ill-equipped most Indian languages are to handle technical terms in common usage. How do you translate ‘cookie’ without knowing why a cookie is called a cookie in the first place?

We run the risk today of turning software localized for India into superficial ‘Hinglish’-style translations at one end, or Academie Francaise-style ivory tower translations at the other. Companies with larger budgets, who can employ better writers/linguists, will see better results, yet the comfort zone between Hinglish and the Ivory Tower is a thin one.


19 April 2003 11:53 pm


Copyright © 2001-2006, Prasenjeet Dutta. Terms of Use.
