I’ve spent a lot of time wrestling with i18n and l10n in Java, for various projects. On the back of that, I’ve put together an awful lot of little code snippets to demonstrate and clarify various “interesting” things about Java’s handling of Locales.

This article simply lists some of the differences between the available Locales in Java 8, 9 and 10. I’ll update this with info for more recent Java versions once I’ve had a need to use them.

Most of the differences mentioned here are between Java 8 and Java 9 - that’s where the big effort occurred. Between Java 9 and Java 10, there was only a tiny change, mentioned at the end.

The code used as the source of this article is here, but in outline it does the following:

  • get a Unicode-capable output stream
  • get a sorted list of all the available locales
  • get a sorted list of all the available countries
  • print locale.getDisplayName() for every locale
  • print country.getCountry() and country.getDisplayCountry() for every country
  • print the count of locales and countries
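The steps above can be sketched roughly as follows. This is not the linked code, just a minimal reconstruction under some assumptions: the class and method names (LocaleDump, sortedLocales, sortedCountries) are made up for illustration, the country list is assumed to come from Locale.getISOCountries(), and sorting by English display name (for locales) and by code (for countries) is a guess. The output labels mirror the ones in the results quoted later in this article.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical name; a sketch of the outline above, not the original program.
class LocaleDump {

    /** All available locales, sorted by their English display name (assumed sort order). */
    static List<Locale> sortedLocales() {
        List<Locale> locales = new ArrayList<>(Arrays.asList(Locale.getAvailableLocales()));
        locales.sort(Comparator.comparing((Locale l) -> l.getDisplayName(Locale.ENGLISH)));
        return locales;
    }

    /** One country-only Locale per ISO country code, sorted by code (assumed source of countries). */
    static List<Locale> sortedCountries() {
        List<String> codes = new ArrayList<>(Arrays.asList(Locale.getISOCountries()));
        Collections.sort(codes);
        List<Locale> countries = new ArrayList<>();
        for (String code : codes) {
            // A Locale with a region but no language.
            countries.add(new Locale.Builder().setRegion(code).build());
        }
        return countries;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Wrap stdout so non-ASCII names are written as UTF-8 regardless of the platform default.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");

        List<Locale> locales = sortedLocales();
        List<Locale> countries = sortedCountries();

        for (Locale locale : locales) {
            out.println("Language: " + locale.getDisplayName(Locale.ENGLISH));
        }
        for (Locale country : countries) {
            out.println("Country Code: " + country.getCountry()
                    + ", Country Name:" + country.getDisplayCountry(Locale.ENGLISH));
        }
        out.println("Locale count: " + locales.size());
        out.println("Country count: " + countries.size());
    }
}
```

Passing Locale.ENGLISH explicitly keeps the names independent of the machine’s default locale, which matters if you want to compare output from different machines.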

The raw results are here:

I compiled that with OpenJDK’s Java 8 version, then ran it in OpenJDK’s JVMs for Java 8, 9 and 10. I took the outputs from each and diffed them to see what has changed between Java versions.
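Any diff tool will do for comparing the outputs. If you would rather stay in Java, a minimal sketch of the same idea is below. The class name OutputDiff and the two file arguments are hypothetical, and it only reports lines that appear in one file but not the other, which is cruder than a real diff.

```java
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: compares two saved output files line by line.
class OutputDiff {

    /** Lines present in 'left' but absent from 'right', in their original order. */
    static List<String> onlyIn(List<String> left, List<String> right) {
        Set<String> seen = new HashSet<>(right);
        List<String> result = new ArrayList<>();
        for (String line : left) {
            if (!seen.contains(line)) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java OutputDiff <older-output> <newer-output>");
            return;
        }
        List<String> older = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> newer = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);

        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        for (String line : onlyIn(older, newer)) {
            out.println("- " + line); // only in the older output
        }
        for (String line : onlyIn(newer, older)) {
            out.println("+ " + line); // only in the newer output
        }
    }
}
```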

Note that these results are from the OpenJDK implementations of the JVM, running on a Debian PC. Different JVMs for different OSs on different hardware may have different locales information. Do your own tests if you need to know for sure!

Here are the main interesting bits I saw in the results:

1. My word, Java 9 added a lot of Locales!

  • Java 8 Locale count: 160
  • Java 9 Locale count: 736

That’s 4.6 times as many Locales in Java 9 as in Java 8 (736 / 160). They did a lot of work on i18n for Java 9 so this isn’t a massive surprise; but still, that’s a lot of Locales.

2. Many more language variants

Although a lot of that huge pile of new locales in Java 9 is new languages, a portion of the new stuff is actually new language variants, added either to support less widely spoken/written variants or to be more precise about the language naming.

For example Chinese in Java 8 had 5 variants:

Language: Chinese
Language: Chinese (China)
Language: Chinese (Hong Kong)
Language: Chinese (Singapore)
Language: Chinese (Taiwan)

In Java 9 we see 14 variants:

Language: Chinese
Language: Chinese (China)
Language: Chinese (Simplified,China)
Language: Chinese (Hong Kong SAR China)
Language: Chinese (Simplified,Hong Kong SAR China)
Language: Chinese (Traditional,Hong Kong SAR China)
Language: Chinese (Simplified,Macau SAR China)
Language: Chinese (Traditional,Macau SAR China)
Language: Chinese (Singapore)
Language: Chinese (Simplified,Singapore)
Language: Chinese (Taiwan)
Language: Chinese (Traditional,Taiwan)
Language: Chinese (Simplified)
Language: Chinese (Traditional)

Similarly Portuguese, French and English gained large numbers of extra variants.

In particular, English went from 12 variants to 106. My word, that’s a lot of different Englishes!
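If you want to check the variant counts for a language on your own JVM, something like the following should do it. It is only a sketch: the class name LanguageVariants and the method variantsOf are invented for this example, and the numbers you get will depend on the JVM you run it in.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper for counting the variants of one language.
class LanguageVariants {

    /** Sorted English display names of every available locale with the given language code. */
    static List<String> variantsOf(String languageCode) {
        // Let the JVM normalise the code first, so it matches what getLanguage() reports.
        String wanted = Locale.forLanguageTag(languageCode).getLanguage();
        return Arrays.stream(Locale.getAvailableLocales())
                .filter(locale -> locale.getLanguage().equals(wanted))
                .map(locale -> locale.getDisplayName(Locale.ENGLISH))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The languages mentioned in this section.
        for (String code : new String[] {"zh", "pt", "fr", "en"}) {
            System.out.println(code + ": " + variantsOf(code).size() + " variants");
        }
    }
}
```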

3. Every(?) language has a country variant

Another cause of the locale explosion in Java 9 is that every new language Locale added now also has a country-specific Locale. So adding a new language actually adds 2 new Locales. For example, Basque was added as a new language in Java 9:

Language: Basque
Language: Basque (Spain)

3. Unicode in the English names

In Java 8 and earlier, the display names of languages used for a display locale of English didn’t include any “weird” characters. In Java 9 some of the newly added languages, especially the country-specific variants, include Unicode characters (i.e. outside the ASCII range) in their display name, even with the display Locale set to English. For example:

Language: French (Côte d’Ivoire)
Language: Norwegian Bokmål
Language: Swedish (Åland Islands)

Yes, the apostrophe in that French variant is not the ASCII single quote character, it’s a true typographic apostrophe (U+2019); and those circly dotty things in the Scandinavian names are separate non-ASCII letters too, not a plain “a” or “A”.

Bizarrely, in Java 8 some of the country Locales did use Unicode characters in the English display name, but others didn’t. Then in Java 9, they appear to have fixed the odd cases where overly anglicised naming was used. For example:

Java 8 Country Code: RE, Country Name:Reunion
Java 9 Country Code: RE, Country Name:Réunion
Java 8 Country Code: CI, Country Name:Côte d'Ivoire
Java 9 Country Code: CI, Country Name:Côte d’Ivoire

Yes, in Java 8 that country apostrophe is a single quote, but the circumflex is indeed a circumflex, even when the name is displayed in English. But an acute on Reunion was apparently taking things too far. For Java 9 they did a lot of work to improve the sanity of this stuff.
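Differences like a single quote versus a true apostrophe are hard to spot by eye, so it can help to print the code points. The sketch below lists every English locale display name that contains a character outside the ASCII range and spells those characters out. The class name NonAsciiNames, its methods and the <U+XXXX> notation are all just choices made for this example.

```java
import java.util.Locale;

// Hypothetical helper for spotting non-ASCII characters in display names.
class NonAsciiNames {

    /** True if any character of the text lies outside the 7-bit ASCII range. */
    static boolean hasNonAscii(String text) {
        return text.chars().anyMatch(c -> c > 0x7F);
    }

    /** Replaces each non-ASCII character with a visible marker such as <U+2019>. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c > 0x7F) {
                sb.append(String.format("<U+%04X>", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (Locale locale : Locale.getAvailableLocales()) {
            String name = locale.getDisplayName(Locale.ENGLISH);
            if (hasNonAscii(name)) {
                // The escaped form is pure ASCII, so it prints safely on any console.
                System.out.println(locale.toLanguageTag() + ": " + escape(name));
            }
        }
    }
}
```

Run under each Java version, this gives a quick list of which names picked up accents or typographic punctuation.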

4. Java 9 abbreviates some Country names

In Java 8, countries with “and” or “saint” in their names include those words in full; in Java 9 the abbreviations “&” and “St.” are used instead. A good example is a country Locale which uses both:

Java 8 Country Code: KN, Country Name:Saint Kitts And Nevis
Java 9 Country Code: KN, Country Name:St. Kitts & Nevis

5. Java 9 lost a country!

Country Code: AN, Country Name:Netherlands Antilles

No such place in Java 9. Wikipedia has an article about what happened.
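To check individual country codes like these on a given JVM, you can ask for the English display name directly and see whether the code is still in the ISO list the JVM knows about. A minimal sketch, assuming a country-only Locale built with Locale.Builder is a fair stand-in for how the names above were produced; the class name CountryNames and its methods are invented for this example.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper for looking up single country codes.
class CountryNames {

    /** English display name for a two-letter country code, as this JVM reports it. */
    static String englishName(String code) {
        return new Locale.Builder().setRegion(code).build().getDisplayCountry(Locale.ENGLISH);
    }

    /** True if the code is in the list returned by Locale.getISOCountries(). */
    static boolean isIsoCountry(String code) {
        return Arrays.asList(Locale.getISOCountries()).contains(code);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // UTF-8 output, since some of these names contain non-ASCII characters.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        for (String code : new String[] {"KN", "AN", "RE", "CI"}) {
            out.println("Country Code: " + code
                    + ", Country Name:" + englishName(code)
                    + ", in ISO list: " + isIsoCountry(code));
        }
    }
}
```

What gets printed for a code the JVM no longer lists depends on the JVM, so treat the name in that case with suspicion and rely on the ISO list check instead.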

6. Java 9 - Minor name tidying

Java 9 also saw a lot of little changes to names. I assume some of this was due to politics happening and some people changing decisions on how their country or language name should be displayed. But clearly a number of the changes were basically fixing typos. A few of the tweaks I’ve seen that interested me are:

Java 8 Country Code: IM, Country Name:Isle Of Man
Java 9 Country Code: IM, Country Name:Isle of Man  >> lowercase 'o'
Java 8 Country Code: MM, Country Name:Myanmar
Java 9 Country Code: MM, Country Name:Myanmar (Burma) >> re-adding an old name
Java 8 Country Code: CD, Country Name:The Democratic Republic Of Congo
Java 8 Country Code: CG, Country Name:Congo
Java 9 Country Code: CD, Country Name:Congo - Kinshasa
Java 9 Country Code: CG, Country Name:Congo - Brazzaville >> naming capitals
Java 8 Country Code: VA, Country Name:Vatican
Java 9 Country Code: VA, Country Name:Vatican City >> oops!

7. Java 9 to Java 10 - ho hum

In contrast to the masses of work and changes done between Java 8 and Java 9, there were only 2 changes between Java 9 and Java 10 that my test scribble picked up.

These were the 2 new language variants added:

Language: Serbian (Kosovo)
Language: Chinese (Macau SAR China)

That takes the number of Locales in Java from 736 to 738. Yawn. (Apologies to those from Kosovo and Macau who cared about this.)