Translate 13,000 Application Strings in One Shot

In a browser, Modern computer applications contain thousands of text strings used within the user interface to populate help text, buttons, dialogs, and other controls. Localization is used to swap these text strings out for those appropriate to other languages so the application can readily adapt to users around the globe. This post isn’t about how that’s done but about a method for addressing the single greatest weakness of localization–getting all those strings translated into every language supported by the application. Generally, they’re stored in properties files of one form or another and sent out to language experts (or volunteers) for translation.

This quickly becomes a logistical nightmare nightmare, and unless each translator is very adept, and every translations is made in the context of the application, results can be mixed. It’s a lot of work and can be a great expense that many smaller developers simply can’t afford, meaning that many applications simply don’t support as many languages as they might, or support them as well as they should.

I developed Story Design Studio, the “power tool” for using my Story Topography method for novel construction, from an open  source mind-mapping application originally developed by a team at Tufts University. It had already been adapted to support 15 languages aside from English, but my application added a lot of new strings, eliminated the need for many more, and changed the values of still others. Desiring to share my invention with the world but lacking the translation budget of a software company or university, I needed a tool to help me.

Fortunately, Google provided a solution in the form of its spreadsheet application and a translation cell formula that works pretty reliably for the short phrases and words needed here. From this, I developed a simple process that updates and helps maintain all the localization data in my application, and let me add support for five additional languages with almost no work at all.

Along the way, I discovered the failings of the manual methods employed at Tufts. Of 1,600 strings used by the application, 192 (12%) had been overlooked and either were never translated or were never coded to use translated versions at all. To fix that would have required 3,000 translations, plus about that many more to update the translations to fit my new application, plus nearly 8,000 more to add five additional languages. There is simply no way I could have performed or paid for translation of 13,000 strings in 20 languages, but with this method, it was easy.

The approach:

  • The key/value data to be translated is cleaned and loaded into a Google Sheets spreadsheet, which produces rough translations.
  • The translations are exported back to Excel, where a VBA program (Visual Basic For Applications) uses that to apply or append updates to properties files.

 

Getting ready:

  • This method makes bulk updates with no undo. You MUST backup your properties files before using this method. I’m not responsible for the consequences if you don’t.
  • In this example, I’ll be working with Java properties files, but the method can be easily adapted to any file-based localization system.
  • Different systems store localization data differently, but one way or another, it’s always in the form of “key/value pairs,” where the key is referenced in code to fetch the value appropriate to the language in use. Java properties files are plain text, where each key/value pair is on a line by itself in the form “theKey = The value that key corresponds to.”
  • Each language should have its own property file, and the name of that file should include the international language identifier (ex. es, ru, pt-BR etc.).
  • We’ll be assuming an application developed in English, with localization files for all other supported languages.
  • This method uses the Google Sheets GoogleTranslate function to look up translations. These are returned in unicode, but most development environments read ordinary (ASCII) text files and require that the unicode be encoded in “escaped unicode” format (ex. \u0020). The supplied VBA code takes care of this, but if you need to return the actual binary unicode values for some reason, you’ll need to modify the code to do so.

 

The Details:

  1. From the master (English in our case) properties file, extract as text those lines we want to translate, excluding comments and non-human readable properties like image file references, etc.
  2. In Excel or another spreadsheet program, create a Localization spreadsheet with two sheets (or tabs): “Master” and “Translations”.
  3. On the master tab, paste your cleaned up key/value list into the first column and create two columns to the right with formulas to split out the key and the value. In Excel:
  4. Open a browser and navigate to Google Sheets. Create a new Google sheet called “Localization Translator” (or whatever you’d like). Leaving the first two rows blank for adding a header, paste your key and value columns (copied from Excel) into the leftmost two columns as shown below:
  5. This of course is just the top left of the spreadsheet. It can go down for hundreds of rows if you like, and across for any number of language columns, and all will be handled together. Go down through the “values” column (column B) and adjust any values you think might not be handled correctly in translation. These might include ampersands (&) used for “and”, slashes (/) used for “or”, jargon, etc.
  6. Now, ignoring the third column for now (“encoded values”, column c above) create one column for each language you need to translate to. For each column:
    1. The name of the language goes on the top row. This is just for your reference.
    2. The international language specifier goes on the second row. This will be used to tell Google Translate what language to translate to, and later your computer which file to update.
    3. All rows contain a formula to translate the value in column B (actually C, but we’ll come back to that) into the language set for this column, and glue it back to the key from column A to make a new key/value pair:
      • =Concat(Concat($A3,” = “),GoogleTranslate($C3, “en”, D$2))
  7. Notice that the formula is set to allow the key and value to vary by row and the language to vary by column. When copied to all cells for all your strings and languages, and after giving Google time to update, you’ll now have a giant table of key/value pairs for each language, ready to be integrated into your program except…what’s the deal with column “C”?
  8. Column C:
    • You might have noticed that the cell formula above takes the “value” from column C, not column B (which we pasted in from Excel).
    • That’s because in Java, C#, and many other languages, strings can contain special codes that are substituted at run time that we need to account for. So for example, the string might be “Successfully updated %d out of %d rows.” and at run time, the two “%d” codes will be replaced by numbers. In a language that has this feature, it can help a lot with translation.
    • Automated translation tools like Google Translate are imperfect but work well for words and short phrases and can even generally figure out the correct word order. That’s important, because sentence construction varies by language, and it’s almost impossible to write code that can rearrange sentence order to get inserted numbers and names in the right places. GoogleTranslate can do that for us except for one big limitation: it doesn’t know those are codes and may interpret them in odd, language-dependent ways, moving them to the wrong spot or removing them entirely.
    • The GoogleTranslate function does not have any way of “escaping” or ignoring such codes, but there is a simple workaround. GoogleTranslate recognizes (and preserves untranslated) the acronyms for widely-known English language organizations like “NASA” or the “USDA”. Since these acronyms will never appear in our application GUI, and since we are doing this at compile time and will have an opportunity to catch and correct any problems before sending the results out to the world, we can use such acrynyms to encode anything we don’t want translated or moved out of context, then undo the encoding later (in our case, after downloading the translation to the VBA program that will update our application.
    • Column C in our Google sheet is where we convert our special codes to such acronyms so that GoogleTranslate will pass them on in the correct semantic positions without butchering them. The column C formula can be as complex as you need to support whatever codes you need to support, but I just handled the two most common (in Java), plus the Java linefeed code:
      • =SUBSTITUTE(SUBSTITUTE(SUBSTITUTE($B4,“%s”,“USDA”),“%d”,“NASA”),“\n”,“NAACP”)
  9. Once Google Sheets has translated all these modified value strings and built a table of translations, we need to get them into our app, but first, we need to get the calculated values (and not the formulas) into our Excel spreadsheet. To do this, we select the entire table range, from the top of column D (the heading for “bulgarian” in the example) all the way down to the bottom of the rightmost language column. Once selected, we press control-c. Now we switch back to Excel and the “Translations” tab we created above and paste in the values. This is just going to be input to a Visual Basic for Applications program, and you shouldn’t need to do anything else to it, but do make sure the language identifiers on row 2 match those used by your properties files, because the VBA program is going to construct the file names using them.
  10. If you don’t see the “Developer” tab at the top of Excel, go to “File” – “Options” – “Customize Ribbon” and check the “Developer” checkbox. This may change (because Microdaft can’t leave well enough alone) and you might need to save your spreadsheet as an .xlsm and/or take other actions to get the product to pipe down and do what you say.
  11. On the Developer tab, click “Macros”, then in the pop-up, create a new macro. Then paste the code below into the editor. This will create two new runable functions, LocalizationAppender and LocalizationUpdater. The difference between the two is in the name, the first one will only append missing key/value pairs to the end of properties files, while the updater will find keys wherever they are and replace their values, and also append missing pairs to the end of the file. You can use whichever is appropriate or modify the code to suit your needs. You might, for example, want to prompt on each change, or add code to present some sort of list for approval before applying changes.
  12. Modify the constants at the top of the listing to tell the code where to find your properties files and how to construct the file names using the international identifier from the top of each translation column.
  13. Run the code and check the results. I use an Eclipse plugin called “ResourceBundle Editor” in which I can open the master properties file and it will display all the available translations, in correctly rendered character sets.

That’s it. It looks like a lot written out this way, but it’s really extremely easy and is vastly faster and easier than making updates by hand. And even though GoogleTranslate is sure to botch a few words here and there, it generally does well with the sort of single words and short phrases used in application interfaces, and surely the automatic translations will be a much better starting point than a blank untranslated properties file. The best part is, once it works for one language, it’ll work just as well for many. You can translate 3,000 strings or 30,000 as easily as one.

Good luck and let me know if you find this handy.

 

 
[code lang="vb"]</pre>
Const pathToPropertiesFiles As String = "F:\Box.com\Software Development\Story Design Studio\SDS\src\main\resources\tufts\sds\"
Const propertyFilePathPattern As String = "VueResources_{0}.properties"
Const masterPropertyFileName As String = "VueResources.properties"

Sub LocalizationAppender()
' Reads Translations page and scans all matching language properties files for missing key value entrees which are then appended at the end of the file.

Dim wsSrc As Worksheet: Set wsSrc = Worksheets("Translations")
Dim LastRow As Long: LastRow = wsSrc.UsedRange.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
Dim LastCol As Long: LastCol = wsSrc.UsedRange.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
Dim i As Long, j As Long, key As String, kval As String
Dim eqIndex As Integer, valIndex As Integer

With CreateObject("scripting.filesystemobject") ' Used in scanning files to see if key already exists
For j = 1 To LastCol
fullpath = pathToPropertiesFiles & StringFormat(propertyFilePathPattern, wsSrc.Cells(2, j))
Open fullpath For Append As #1

For i = 3 To LastRow
eqIndex = InStr(wsSrc.Cells(i, j), "=")
key = Trim(Left(wsSrc.Cells(i, j), eqIndex - 1))
valIndex = eqIndex + 1
If Mid(wsSrc.Cells(i, j), valIndex, 1) = " " Then valIndex = valIndex + 1

If key = "menu.windows" Then
stopper = 0
End If

kval = Mid(wsSrc.Cells(i, j), eqIndex + 1)
If Len(kval) > 0 Then ' Ignore empty keys
If Not ContainsKey(.OpenTextFile(fullpath).ReadAll, key) Then
Print #1, key + " = " + escapeUnicodes(kval)
End If
End If
Next i
Close #1
Next j
End With

End Sub

Sub LocalizationUpdater()
' Reads Translations page and scans all matching language properties files for Existing key value entrees which are then replaced in place or appended

Dim wsSrc As Worksheet: Set wsSrc = Worksheets("Translations")
Dim LastRow As Long: LastRow = wsSrc.UsedRange.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
Dim LastCol As Long: LastCol = wsSrc.UsedRange.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
Dim i As Long, j As Long, key As String, kval As String
Dim eqIndex As Integer, valIndex As Integer
Dim lines() As String, scanBuffer As String

With CreateObject("scripting.filesystemobject") ' Used in scanning files for key presence.
For j = 1 To LastCol
fullpath = pathToPropertiesFiles & StringFormat(propertyFilePathPattern, wsSrc.Cells(2, j))
With .OpenTextFile(fullpath) ' Read existing properties into a buffer
lines = Split(.ReadAll, Chr(10))
.Close
End With
scanBuffer = .OpenTextFile(fullpath).ReadAll

Dim k As Long, lineCount As Long, escapedKey As String
lineCount = UBound(lines)

For i = 1226 To LastRow
eqIndex = InStr(wsSrc.Cells(i, j), "=")
key = Trim(Left(wsSrc.Cells(i, j), eqIndex - 1)) ' Get key, trimmed of any leading or trailing spaces
valIndex = eqIndex + 1 ' Set index to value to skip any space after the equals sign
If Mid(wsSrc.Cells(i, j), valIndex, 1) = " " Then valIndex = valIndex + 1
If key = "addLibrary.mycomputer.label" Then
stopper = 0
End If
kval = Mid(wsSrc.Cells(i, j), eqIndex + 1)
If Len(kval) > 0 Then ' Ignore empty keys
If Not ContainsKey(scanBuffer, key) Then ' properties file does NOT already contain key so append to buffer.
lineCount = lineCount + 1
ReDim Preserve lines(lineCount)
lines(lineCount) = key + " = " + escapeUnicodes(kval)
Else ' properties DOES already contain key, so find and update in buffer
For k = LBound(lines) To lineCount
If ContainsKey(lines(k), key) Then
escapedKey = escapeUnicodes(kval)
If InStr(lines(k), escapedKey) = 0 Then
lines(k) = key + " = " + escapedKey
End If
End If
Next
End If
End If
Next i ' Repeat for each row

Open fullpath For Output As #1 ' Write buffer out over properties file
For k = LBound(lines) To lineCount
Print #1, Replace(Replace(lines(k), Chr(10), ""), Chr(13), "")
Next
Close #1
Next j ' Repeat for each language column
End With

End Sub

Sub LocalizationCleaner()
' Scans master properties file and removes from all language properties files listed on Translations page any keys not on master.

Dim wsSrc As Worksheet: Set wsSrc = Worksheets("Translations")
Dim LastRow As Long: LastRow = wsSrc.UsedRange.Find("*", SearchOrder:=xlByRows, SearchDirection:=xlPrevious).Row
Dim LastCol As Long: LastCol = wsSrc.UsedRange.Find("*", SearchOrder:=xlByColumns, SearchDirection:=xlPrevious).Column
Dim i As Long, key As String
Dim eqIndex As Integer, valIndex As Integer
Dim lines() As String, masterBuffer As String

With CreateObject("scripting.filesystemobject") ' Used in scanning files for key presence.
masterBuffer = .OpenTextFile(pathToPropertiesFiles & masterPropertyFileName).ReadAll
Dim skip As Integer
skip = 0

For j = 1 To LastCol
fullpath = pathToPropertiesFiles & StringFormat(propertyFilePathPattern, wsSrc.Cells(2, j))
With .OpenTextFile(fullpath) ' Read existing properties into a buffer
lines = Split(.ReadAll, Chr(10))
.Close
End With

Dim k As Long, lineCount As Long, escapedKey As String
lineCount = UBound(lines)

For i = 1 To lineCount
eqIndex = InStr(lines(i), "=")
If eqIndex > 0 Then
key = Trim(Left(lines(i), eqIndex - 1)) ' Get key, trimmed of any leading or trailing spaces
If Not ContainsKey(masterBuffer, key) Then ' then remove this key from subordinate properties file.
lines(i) = "OMIT"
End If
End If
Next i ' Repeat for each row

Open fullpath For Output As #1 ' Write buffer out over properties file
For k = LBound(lines) To lineCount
If Not lines(k) = "OMIT" Then
Print #1, Replace(Replace(lines(k), Chr(10), ""), Chr(13), "")
Else
skip = skip + 1
End If
Next
Close #1
Next j ' Repeat for each language column
End With

End Sub

Function escapeUnicodes(strInput As String) As String
Dim Counter As Integer
strOutput = ""
' GoogleTranslate has no escape mechanism, so what should not be translated can be encoded at a US Government Agency
'Note: USDA=%s, NASA=%d, NAACP=\n

For Counter = 1 To Len(strInput)
strOutput = strOutput + CODEW(Mid(strInput, Counter, 1))
Next
escapeUnicodes = Replace(Replace(Replace(strOutput, "NAACP", "\n"), "NASA", " %d "), "USDA", " %s ")

End Function

Function CODEW(Character As String, Optional Unicode_value As Boolean = True, _
Optional Exact_functionality As Boolean = False) As Variant
Dim Characters As String
Dim i As Long

If Exact_functionality Then
CODEW = AscW(Character)
If Unicode_value Then CODEW = “U” & Hex(CODEW)
Exit Function
End If
Characters = " ?%1234567890-=+_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ&#*[]{}()"
For i = 128 To 159
Characters = Characters & Chr(i)
Next i

If InStr(1, Characters, Left$(Character, 1), vbBinaryCompare) Then
CODEW = Character
Else
CODEW = "\u" + Right("0000" + Hex(AscW(Character)), 4)
End If

End Function

Function StringFormat(ByVal mask As String, ParamArray tokens()) As String
' Gives VBA the useful ability to handle variable injection in strings, like a real programming language.

Dim i As Long
For i = LBound(tokens) To UBound(tokens)
mask = Replace(mask, "{" & i & "}", tokens(i))
Next
StringFormat = mask

End Function

Function ContainsKey(str As String, key As String) As Boolean
' Checks to see if string contains key followed by either equals sign or space
Dim firstTry As Long
firstTry = InStr(str, Chr(10) + key + " =")
If firstTry > 0 Then
ContainsKey = True
Else
ContainsKey = (InStr(str, Chr(10) + key + "=") > 0)
End If
End Function
<pre>[/code]