I have had the opportunity to work for a number of different software companies, each of which has localized its product into different languages. These opportunities have led me to experience a number of different processes around localization, and a great deal of pain surrounding the whole issue. Localization sucks. However, I don’t believe that it has to suck, and I think there are processes that, if put in place, can make it relatively painless.
I believe the biggest problem area that I have seen in localization is the interweaving of responsibilities. Let me describe the process from one company I have worked for. The developers create their resources in an English-only version, and the sources for these resources are checked into source control. Then, at some predetermined point, a developer packages up all the resource source files and sends them off to the localization team. The localization team has a database into which the strings are loaded and translated (often by a third party), and then new source files are generated for the various languages. The source files are then sent back to the developers, who have to check them in and test them to make sure that they compile. Net result: days of turnaround time for even trivial localization changes, and hours of wasted time.
The root issue here is that you have developers who are involved in localization, which is not their primary responsibility, and localizers who are involved in development (by producing source files that have to be compiled). I think that we can resolve these issues and streamline the process by simply separating the functions more distinctly.
First, we need a method whereby localization can take place during the build process without any intervention. My first thought here is that we could create a program, run as part of the build system, that localizes an English source file into a source file for another language. The data that this program consumes is a translation database that is checked into the source control system. The localization team is responsible simply for checking in a new copy of the database when they have updates. This system is a good improvement on the original, but it still leaves the problem that the files have to be compiled after the localization process changes them. One wrong string (with a misplaced quote character, etc.) and the build fails.
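As a rough sketch of that build step, here is what the substitution might look like in Python. It assumes resources can be treated as a simple mapping of string ID to text and that the translation database is keyed by the English text. Both are assumptions made for illustration, and the names `localize_resources` and `translations` are hypothetical.

```python
def localize_resources(english, translations):
    """Return a localized copy of `english` (resource ID -> English text).

    `translations` maps English text -> translated text. A string with no
    translation yet falls back to English, so a gap in the database does
    not stop the build.
    """
    return {res_id: translations.get(text, text) for res_id, text in english.items()}


# Hypothetical sample data standing in for a real resource file and database.
english = {"IDS_OPEN": "Open", "IDS_SAVE": "Save", "IDS_EXIT": "Exit"}
german = {"Open": "Öffnen", "Save": "Speichern"}

localized = localize_resources(english, german)
```

Note that this only covers the lookup. Writing the result back out as a source file is where the quoting problem described above comes in.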
A better way to solve this problem is to have the build system build the initial binaries first. The localization program is then pointed at the binary file containing the English resources. It reads this binary file and produces a copy of it with updated resources. Because nothing is compiled after the strings are swapped in, a bad string can no longer break the build. In this way, we have isolated the localization process from the development process as much as possible.
The program that is responsible for performing the localization of binaries should also produce, as part of its output, a file that shows which resources are new or updated. These resources need further localization work. The localizers also need a tool for editing the localization database, and that tool should be able to consume this build output to import the new strings into their database. In this way the localizers never have to deal directly with the product. They only deal with the inputs to a tool that they own, and the outputs from a tool that they own.
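One way such a report could be produced is sketched below in Python. It assumes the database remembers the English text each translation was made from, so that a changed English string can be told apart from a brand new one. The database layout, the `build_report` name, and the JSON output format are all hypothetical choices for illustration.

```python
import json


def build_report(english, database):
    """List the resources that still need localization work.

    `english` maps resource ID -> current English text.
    `database` maps resource ID -> (English text when translated, translation).
    """
    new, updated = {}, {}
    for res_id, text in english.items():
        if res_id not in database:
            new[res_id] = text          # never seen by the localizers
        elif database[res_id][0] != text:
            updated[res_id] = text      # English changed since translation
    return {"new": new, "updated": updated}


# Hypothetical sample data.
english = {"IDS_OPEN": "Open file", "IDS_SAVE": "Save", "IDS_PRINT": "Print"}
database = {"IDS_OPEN": ("Open", "Öffnen"), "IDS_SAVE": ("Save", "Speichern")}

report = build_report(english, database)
# Serialized form that the build could drop next to the localized binaries.
report_text = json.dumps(report, indent=2, sort_keys=True)
```

Here `IDS_PRINT` lands under `new` and `IDS_OPEN` under `updated`, while `IDS_SAVE` is left out because its translation is still current. The localizers' tool would read the serialized report and queue those entries for translation.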
Another side improvement to the general localization process is the ability to do a large share of the localization work before any translation takes place. This process is what I have heard referred to as “pseudo-localization”. The localizing tool should be capable of producing a “pseudo-localized” binary that does not have ANY real strings. Rather, it should take all existing string resources and generate enough gibberish to pad each value, making it long enough to simulate a translated string. (Strings typically grow by something like 30% when translated into certain languages.) The product can then be installed and tested with these pseudo-localized strings to find spacing issues, etc., long before any translation work is done.
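A minimal sketch of a pseudo-localizing function in Python follows. It uses one common variant of the idea rather than anything prescribed here: vowels are swapped for accented look-alikes so no real English string survives, padding is added to simulate growth, and the result is wrapped in brackets so truncation is easy to spot. The function name, the 30% default, and the padding character are assumptions.

```python
# Accented look-alikes keep the text recognizable but clearly not English.
ACCENTED = str.maketrans("aeiouAEIOU", "áéíóúÁÉÍÓÚ")


def pseudo_localize(text, growth_percent=30, pad_char="~"):
    """Return a pseudo-localized version of `text`.

    The string is padded by `growth_percent` of its length (rounded up)
    and wrapped in brackets so clipped text is visible in the UI.
    """
    # Integer arithmetic avoids floating-point surprises when rounding up.
    pad_length = (len(text) * growth_percent + 99) // 100
    return "[" + text.translate(ACCENTED) + pad_char * pad_length + "]"


sample = pseudo_localize("Save file")  # "[Sávé fílé~~~]"
```

If a dialog shows the opening bracket but not the closing one, the control is too small for a translated string. This sketch does not protect format placeholders such as `%s`, which a real tool would need to leave untouched.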