The Java `Locale` API is broken in a few ways that should be avoided. Some
examples of error-prone usage follow:
#### Constructors
The constructors don't validate their parameters at all; they simply trust them.
This is also true for the static method `Locale.of`, introduced in JDK 19.
For example:
```java
Locale locale = new Locale("en_AU"); // or Locale.of("en_AU")
locale.toString(); // "en_au"
locale.getLanguage(); // "en_au"
locale.getCountry(); // ""
locale = new Locale("somethingBad#!34, long, and clearly not a locale ID");
// or Locale.of("somethingBad#!34, long, and clearly not a locale ID")
locale.toString(); // "somethingbad#!34, long, and clearly not a locale id"
locale.getLanguage(); // "somethingbad#!34, long, and clearly not a locale id"
locale.getCountry(); // ""
```
As you can see, the full string is interpreted as language, and the country is
empty.
For `new Locale("zh", "tw", "#Hant")` and `Locale.of("zh", "tw", "#Hant")` you
get:
```
toString() : zh_TW_#Hant
getLanguage() : zh
getCountry() : TW
getScript() :
getVariant() : #Hant
```
And for `Locale.forLanguageTag("zh-hant-tw")` you get a different result:
```
toString() : zh_TW_#Hant
getLanguage() : zh
getCountry() : TW
getScript() : Hant
getVariant() :
```
We can see that while the `toString()` values of both locales are identical,
the individual parts are different. More specifically, the first locale is
incorrect, since `Hant` is supposed to be the script of the locale rather than
the variant. There's no reliable way of getting a correct result through a
`Locale` constructor, so we should prefer using `Locale.forLanguageTag()` (and
the IETF BCP 47 format) for correctness.
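The difference between the two locales above can be checked programmatically. The following is a minimal sketch; the class and method names are made up for illustration. It shows that the two objects are not equal even though their `toString()` values match.

```java
import java.util.Locale;

// Hypothetical demo class, not part of any library.
class LocaleScriptDemo {
  // Builds the locale through the unvalidated three-argument constructor.
  @SuppressWarnings("deprecation") // the constructors are deprecated in recent JDKs
  static Locale viaConstructor() {
    return new Locale("zh", "tw", "#Hant");
  }

  // Builds the locale by parsing an IETF BCP 47 language tag.
  static Locale viaLanguageTag() {
    return Locale.forLanguageTag("zh-hant-tw");
  }

  public static void main(String[] args) {
    Locale fromConstructor = viaConstructor();
    Locale fromTag = viaLanguageTag();
    // Same string form, but different objects: the script and variant fields differ.
    System.out.println(fromConstructor.toString().equals(fromTag.toString()));
    System.out.println(fromConstructor.equals(fromTag));
    System.out.println("script via tag: " + fromTag.getScript());
  }
}
```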
**Note:** You might see a `.replace('_', '-')` appended to a suggested fix from
the Error Prone checker for this bug pattern. This is a sanitization measure to
handle the fact that `Locale.forLanguageTag()` accepts the "minus form" of a tag
(`en-US`) but not the "underscore form" (`en_US`). It silently falls back to
`Locale.ROOT` if the latter form is passed in.
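As a small illustration of the note above, the sketch below contrasts parsing the underscore form directly with parsing it after sanitization. The class name and the `parse` helper are hypothetical names chosen for this example.

```java
import java.util.Locale;

// Hypothetical demo class, not part of any library.
class LanguageTagSanitizing {
  // Hypothetical helper: accepts "en_US" or "en-US" and parses it as a BCP 47 tag.
  static Locale parse(String localeId) {
    return Locale.forLanguageTag(localeId.replace('_', '-'));
  }

  public static void main(String[] args) {
    // The underscore form is not a well-formed tag, so the result is the root locale.
    Locale unsanitized = Locale.forLanguageTag("en_US");
    System.out.println(unsanitized.equals(Locale.ROOT));

    // After replacing '_' with '-', the language and country are parsed as expected.
    Locale sanitized = parse("en_US");
    System.out.println(sanitized.getLanguage() + " / " + sanitized.getCountry());
  }
}
```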
**Note:** This Error Prone rule cannot reliably fix calls to the constructors or
to the static method `Locale.of` that take two or three parameters, because a
proper fix requires more context.
If the initial code started with a `String` that was split at `'_'` or `'-'`
only to build a locale from the parts, the right fix is to use
`forLanguageTag()`.
```java
void someMethod(String localeId) {
  String[] parts = localeId.split("_");
  Locale locale = switch (parts.length) {
    case 1 -> new Locale(parts[0]); // or Locale.of
    case 2 -> new Locale(parts[0], parts[1]); // or Locale.of
    case 3 -> new Locale(parts[0], parts[1], parts[2]); // or Locale.of
    default -> throw new IllegalArgumentException("Unexpected locale ID: " + localeId);
  };
  // use the locale
}
```
```java
void someMethod(String localeId) {
  Locale locale = Locale.forLanguageTag(localeId.replace('_', '-'));
  // use the locale
}
```
If the initial code started with separate "pieces" (language, region, variant),
the right fix is to use a `Locale.Builder()`.
```java
void someMethod(@NotNull String langId, String regionId) {
  Locale locale = (regionId == null)
      ? new Locale(langId) // or Locale.of
      : new Locale(langId, regionId); // or Locale.of
  // use the locale
}
```
```java
void someMethod(@NotNull String langId, String regionId) {
  Locale.Builder builder = new Locale.Builder();
  builder.setLanguage(langId);
  // Only set the region when one was actually provided.
  if (regionId != null) {
    builder.setRegion(regionId);
  }
  Locale locale = builder.build();
  // use the locale
}
```
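Unlike the constructors, `Locale.Builder` validates each piece and throws `IllformedLocaleException` when a value is not well-formed. The sketch below shows one possible way to handle that; the class name, the `buildLocale` helper, and the choice to return an `Optional` are assumptions made for this example, not part of the suggested fix.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical demo class, not part of any library.
class LocaleBuilderDemo {
  // Hypothetical helper: returns the locale, or empty if a piece is ill-formed.
  static Optional<Locale> buildLocale(String language, String region) {
    try {
      Locale.Builder builder = new Locale.Builder().setLanguage(language);
      if (region != null) {
        builder.setRegion(region);
      }
      return Optional.of(builder.build());
    } catch (IllformedLocaleException e) {
      // Raised by the setters when a value is not a well-formed subtag.
      return Optional.empty();
    }
  }

  public static void main(String[] args) {
    // Well-formed pieces are accepted and normalized.
    System.out.println(buildLocale("en", "au").map(Locale::toLanguageTag));
    // A whole locale ID passed as the language is rejected instead of trusted.
    System.out.println(buildLocale("en_AU", null).isPresent());
  }
}
```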
#### `toString()`
This poses the inverse of the constructor problem.
```java
Locale myLocale = Locale.forLanguageTag("zh-hant-tw");
String myLocaleStr = myLocale.toString(); // zh_TW_#Hant
// There is no clean way to get a correct locale back from myLocaleStr:
// Locale derivedLocale = ???
```
The `toString()` implementation for `Locale` isn't necessarily incorrect in
itself. It is intended to be *"a concise but informative representation that is
easy for a person to read"* (see the documentation for
[Object.toString()](https://docs.oracle.com/javase/6/docs/api/java/lang/Object.html#toString\(\))).
It is therefore not intended to produce a value that can be turned back into a
`Locale`: it is not a serialization format. It often produces a value that
*looks* like a locale identifier, but is not one.
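When a locale does need to be stored as a string and read back, one option is to pair `Locale.toLanguageTag()` with `Locale.forLanguageTag()`. The following is a minimal sketch of that round trip; the class and method names are hypothetical.

```java
import java.util.Locale;

// Hypothetical demo class, not part of any library.
class LocaleRoundTrip {
  // Serializes a locale as an IETF BCP 47 language tag.
  static String serialize(Locale locale) {
    return locale.toLanguageTag();
  }

  // Parses a language tag produced by serialize().
  static Locale deserialize(String tag) {
    return Locale.forLanguageTag(tag);
  }

  public static void main(String[] args) {
    Locale original = Locale.forLanguageTag("zh-hant-tw");
    String tag = serialize(original);
    Locale restored = deserialize(tag);
    System.out.println(tag);
    // The restored locale keeps the script, so it equals the original.
    System.out.println(restored.equals(original));
  }
}
```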