Locale

public final class Locale extends Object
implements Cloneable, Serializable

A Locale object represents a specific geographical, political, or cultural region. An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user. For example, displaying a number is a locale-sensitive operation: the number should be formatted according to the customs and conventions of the user's native country, region, or culture.

The Locale class implements identifiers interchangeable with BCP 47 (IETF BCP 47, "Tags for Identifying Languages"), with support for the LDML (UTS#35, "Unicode Locale Data Markup Language") BCP 47-compatible extensions for locale data exchange.

A Locale object logically consists of the fields described below.

language
ISO 639 alpha-2 or alpha-3 language code, or registered language subtags up to 8 alpha letters (for future enhancements). When a language has both an alpha-2 code and an alpha-3 code, the alpha-2 code must be used. You can find a full list of valid language codes in the IANA Language Subtag Registry (search for "Type: language"). The language field is case insensitive, but Locale always canonicalizes to lower case.

Well-formed language values have the form [a-zA-Z]{2,8}. Note that this is not the full BCP 47 language production, since it excludes extlang. Extlang subtags are not needed, since modern three-letter language codes replace them.

Example: "en" (English), "ja" (Japanese), "kok" (Konkani)

script
ISO 15924 alpha-4 script code. You can find a full list of valid script codes in the IANA Language Subtag Registry (search for "Type: script"). The script field is case insensitive, but Locale always canonicalizes to title case (the first letter is upper case and the rest of the letters are lower case).

Well-formed script values have the form [a-zA-Z]{4}

Example: "Latn" (Latin), "Cyrl" (Cyrillic)

country (region)
ISO 3166 alpha-2 country code or UN M.49 numeric-3 area code. You can find a full list of valid country and region codes in the IANA Language Subtag Registry (search for "Type: region"). The country (region) field is case insensitive, but Locale always canonicalizes to upper case.

Well-formed country/region values have the form [a-zA-Z]{2} | [0-9]{3}

Example: "US" (United States), "FR" (France), "029" (Caribbean)

variant
Any arbitrary value used to indicate a variation of a Locale. Where there are two or more variant values, each indicating its own semantics, these values should be ordered by importance, with the most important first, separated by underscore ('_'). The variant field is case sensitive.

Note: IETF BCP 47 places syntactic restrictions on variant subtags. Also BCP 47 subtags are strictly used to indicate additional variations that define a language or its dialects that are not covered by any combinations of language, script and region subtags. You can find a full list of valid variant codes in the IANA Language Subtag Registry (search for "Type: variant").

However, the variant field in Locale has historically been used for any kind of variation, not just language variations. For example, some supported variants available in Java SE Runtime Environments indicate alternative cultural behaviors such as calendar type or number script. In BCP 47 this kind of information, which does not identify the language, is supported by extension subtags or private use subtags.


Well-formed variant values have the form SUBTAG (('_'|'-') SUBTAG)* where SUBTAG = [0-9][0-9a-zA-Z]{3} | [0-9a-zA-Z]{5,8}. (Note: BCP 47 only uses hyphen ('-') as a delimiter; Locale is more lenient.)

Example: "polyton" (Polytonic Greek), "POSIX"

extensions
A map from single character keys to string values, indicating extensions apart from language identification. The extensions in Locale implement the semantics and syntax of BCP 47 extension subtags and private use subtags. The extensions are case insensitive, but Locale canonicalizes all extension keys and values to lower case. Note that extensions cannot have empty values.

Well-formed keys are single characters from the set [0-9a-zA-Z]. Well-formed values have the form SUBTAG ('-' SUBTAG)* where for the key 'x' SUBTAG = [0-9a-zA-Z]{1,8} and for other keys SUBTAG = [0-9a-zA-Z]{2,8} (that is, 'x' allows single-character subtags).

Example: key="u"/value="ca-japanese" (Japanese Calendar), key="x"/value="java-1-7"

Note: Although BCP 47 requires field values to be registered in the IANA Language Subtag Registry, the Locale class does not provide any validation features. The Builder only checks if an individual field satisfies the syntactic requirement (is well-formed), but does not validate the value itself. See Locale.Builder for details.
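
The case canonicalization described for the fields above can be observed by parsing a deliberately mis-cased language tag. The following is a minimal sketch, assuming a standard Java SE runtime; the class name LocaleCaseDemo and its helper method are hypothetical and exist only for illustration.

```java
import java.util.Locale;

final class LocaleCaseDemo {
    // Parses a BCP 47 tag and returns it re-serialized with canonical casing.
    static String canonicalTag(String tag) {
        return Locale.forLanguageTag(tag).toLanguageTag();
    }

    public static void main(String[] args) {
        Locale locale = Locale.forLanguageTag("JA-jpan-jp-U-CA-JAPANESE");
        System.out.println(locale.getLanguage());      // ja (lower case)
        System.out.println(locale.getScript());        // Jpan (title case)
        System.out.println(locale.getCountry());       // JP (upper case)
        System.out.println(locale.getExtension('u'));  // ca-japanese (lower case)
        System.out.println(canonicalTag("EN-latn-us")); // en-Latn-US
    }
}
```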

Unicode locale/language extension

UTS#35, "Unicode Locale Data Markup Language" defines optional attributes and keywords to override or refine the default behavior associated with a locale. A keyword is represented by a pair of key and type. For example, "nu-thai" indicates that Thai local digits (value:"thai") should be used for formatting numbers (key:"nu").

The keywords are mapped to a BCP 47 extension value using the extension key 'u' (UNICODE_LOCALE_EXTENSION). The above example, "nu-thai", becomes the extension "u-nu-thai".

Thus, when a Locale object contains Unicode locale attributes and keywords, getExtension(UNICODE_LOCALE_EXTENSION) will return a String representing this information, for example, "nu-thai". The Locale class also provides getUnicodeLocaleAttributes(), getUnicodeLocaleKeys(), and getUnicodeLocaleType(String), which allow you to access Unicode locale attributes and key/type pairs directly. When represented as a string, the Unicode Locale Extension lists attributes alphabetically, followed by key/type sequences with keys listed alphabetically (the order of subtags comprising a key's type is fixed when the type is defined).

A well-formed locale key has the form [0-9a-zA-Z]{2}. A well-formed locale type has the form "" | [0-9a-zA-Z]{3,8} ('-' [0-9a-zA-Z]{3,8})* (it can be empty, or a series of subtags 3-8 alphanums in length). A well-formed locale attribute has the form [0-9a-zA-Z]{3,8} (it is a single subtag with the same form as a locale type subtag).
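
As an illustration of the accessors and the alphabetical key ordering described above, the sketch below sets two keywords in non-alphabetical order and reads them back. It assumes a standard Java SE runtime; the class name UnicodeExtensionDemo and its helper methods are hypothetical.

```java
import java.util.Locale;

final class UnicodeExtensionDemo {
    // Builds th-TH with two Unicode locale keywords, added out of order on purpose.
    static Locale thaiWithKeywords() {
        return new Locale.Builder()
                .setLanguage("th")
                .setRegion("TH")
                .setUnicodeLocaleKeyword("nu", "thai")
                .setUnicodeLocaleKeyword("ca", "buddhist")
                .build();
    }

    // Returns the value stored under the 'u' extension key.
    static String unicodeExtension() {
        return thaiWithKeywords().getExtension(Locale.UNICODE_LOCALE_EXTENSION);
    }

    // Returns the type associated with a two-character Unicode locale key.
    static String typeFor(String key) {
        return thaiWithKeywords().getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        System.out.println(unicodeExtension());                  // ca-buddhist-nu-thai
        System.out.println(typeFor("nu"));                       // thai
        System.out.println(thaiWithKeywords().getUnicodeLocaleKeys());
        System.out.println(thaiWithKeywords().toLanguageTag());  // th-TH-u-ca-buddhist-nu-thai
    }
}
```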

The Unicode locale extension specifies optional behavior in locale-sensitive services. Although the LDML specification defines various keys and values, actual locale-sensitive service implementations in a Java Runtime Environment might not support any particular Unicode locale attributes or key/type pairs.

Creating a Locale

There are several different ways to create a Locale object.

Builder

Using Locale.Builder you can construct a Locale object that conforms to BCP 47 syntax.
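
The sketch below shows the Builder enforcing syntax without validating values: a well-formed but unregistered language subtag is accepted, while an ill-formed one is rejected with IllformedLocaleException. It assumes a standard Java SE runtime; the class name BuilderSyntaxDemo and the helper acceptsLanguage are hypothetical.

```java
import java.util.IllformedLocaleException;
import java.util.Locale;

final class BuilderSyntaxDemo {
    // Returns true if Locale.Builder accepts the language subtag as well-formed.
    static boolean acceptsLanguage(String language) {
        try {
            new Locale.Builder().setLanguage(language).build();
            return true;
        } catch (IllformedLocaleException e) {
            return false;
        }
    }

    // Builds Serbian in Latin script for Serbia, which the constructors cannot express.
    static String serbianLatinTag() {
        return new Locale.Builder()
                .setLanguage("sr")
                .setScript("Latn")
                .setRegion("RS")
                .build()
                .toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(serbianLatinTag());         // sr-Latn-RS
        System.out.println(acceptsLanguage("zzzzz"));  // true: well-formed, value not validated
        System.out.println(acceptsLanguage("e"));      // false: shorter than two letters
        System.out.println(acceptsLanguage("en_US"));  // false: not purely alphabetic
    }
}
```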

Constructors

The Locale class provides three constructors:

     Locale(String)
     Locale(String, String)
     Locale(String, String, String)
 
These constructors allow you to create a Locale object with language, country and variant, but you cannot specify script or extensions.

Factory Methods

The method forLanguageTag(String) creates a Locale object for a well-formed BCP 47 language tag.

Locale Constants

The Locale class provides a number of convenient constants that you can use to create Locale objects for commonly used locales. For example, the following creates a Locale object for the United States:

     Locale.US
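
To compare the creation routes listed in this section, the following sketch builds the same locale four ways and checks that the results are equal. It assumes a standard Java SE runtime; the class name CreateLocaleDemo and its methods are hypothetical. Newer Java SE releases mark the constructors as deprecated, hence the suppression annotation.

```java
import java.util.Locale;

final class CreateLocaleDemo {
    static Locale viaBuilder() {
        return new Locale.Builder().setLanguage("en").setRegion("US").build();
    }

    @SuppressWarnings("deprecation") // constructors are deprecated on newer runtimes
    static Locale viaConstructor() {
        return new Locale("en", "US");
    }

    static Locale viaLanguageTag() {
        return Locale.forLanguageTag("en-US");
    }

    // True when every creation route yields a locale equal to the Locale.US constant.
    static boolean allEqualToConstant() {
        return viaBuilder().equals(Locale.US)
                && viaConstructor().equals(Locale.US)
                && viaLanguageTag().equals(Locale.US);
    }

    public static void main(String[] args) {
        System.out.println(allEqualToConstant()); // true
    }
}
```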
 

Use of Locale

Once you've created a Locale you can query it for information about itself. Use getCountry to get the country (or region) code and getLanguage to get the language code. You can use getDisplayCountry to get the name of the country suitable for displaying to the user. Similarly, you can use getDisplayLanguage to get the name of the language suitable for displaying to the user. Interestingly, the getDisplayXXX methods are themselves locale-sensitive and have two versions: one that uses the default locale and one that uses the locale specified as an argument.
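
A short sketch of these query methods follows, assuming a standard Java SE runtime; the class name QueryLocaleDemo and the helper describe are hypothetical. The display names depend on the locale data available on the runtime, so no exact output is claimed for them.

```java
import java.util.Locale;

final class QueryLocaleDemo {
    // Combines the raw codes with display names rendered for displayLocale.
    static String describe(Locale locale, Locale displayLocale) {
        return locale.getLanguage() + "/" + locale.getCountry()
                + " = " + locale.getDisplayLanguage(displayLocale)
                + " (" + locale.getDisplayCountry(displayLocale) + ")";
    }

    public static void main(String[] args) {
        // Same locale, described for an English reader and for a French reader.
        System.out.println(describe(Locale.FRANCE, Locale.ENGLISH));
        System.out.println(describe(Locale.FRANCE, Locale.FRENCH));
        // The one-argument-free overload uses the default locale instead.
        System.out.println(Locale.FRANCE.getDisplayCountry());
    }
}
```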

The Java Platform provides a number of classes that perform locale-sensitive operations. For example, the NumberFormat class formats numbers, currency, and percentages in a locale-sensitive manner. Classes such as NumberFormat have several convenience methods for creating a default object of that type. For example, the NumberFormat class provides these three convenience methods for creating a default NumberFormat object:

     NumberFormat.getInstance()
     NumberFormat.getCurrencyInstance()
     NumberFormat.getPercentInstance()
 
Each of these methods has two variants: one with an explicit locale and one without. The variant without a locale argument uses the default locale. The explicit-locale variants are:
     NumberFormat.getInstance(myLocale)
     NumberFormat.getCurrencyInstance(myLocale)
     NumberFormat.getPercentInstance(myLocale)
 
A Locale is the mechanism for identifying the kind of object (NumberFormat) that you would like to get. The locale is just a mechanism for identifying objects, not a container for the objects themselves.
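
For instance, passing different locales to the same factory method yields formatters with different conventions. This is a minimal sketch, assuming a standard Java SE runtime; the class name NumberFormatDemo and the helper format are hypothetical.

```java
import java.text.NumberFormat;
import java.util.Locale;

final class NumberFormatDemo {
    // Formats a value with the general-purpose number format for the given locale.
    static String format(double value, Locale locale) {
        return NumberFormat.getInstance(locale).format(value);
    }

    public static void main(String[] args) {
        System.out.println(format(1234.5, Locale.US));      // 1,234.5
        System.out.println(format(1234.5, Locale.GERMANY)); // 1.234,5
    }
}
```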

Compatibility

In order to maintain compatibility with existing usage, Locale's constructors retain their behavior prior to the Java Runtime Environment version 1.7. The same is largely true for the toString method. Thus Locale objects can continue to be used as they were. In particular, clients who parse the output of toString into language, country, and variant fields can continue to do so (although this is strongly discouraged), but the variant field will have additional information in it if script or extensions are present.

In addition, BCP 47 imposes syntax restrictions that are not imposed by Locale's constructors. This means that conversions between some Locales and BCP 47 language tags cannot be made without losing information. Thus toLanguageTag cannot represent the state of locales whose language, country, or variant do not conform to BCP 47.

Because of these issues, it is recommended that clients migrate away from constructing non-conforming locales and use the forLanguageTag and Locale.Builder APIs instead. Clients desiring a string representation of the complete locale can then always rely on toLanguageTag for this purpose.
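
The difference between the two string forms can be sketched as follows, assuming a standard Java SE runtime; the class name StringFormsDemo and its helpers are hypothetical. The example uses a locale with a script, which toString appends after a '#' marker, while toLanguageTag produces a tag that forLanguageTag can parse back.

```java
import java.util.Locale;

final class StringFormsDemo {
    static String legacyForm(String tag) {
        return Locale.forLanguageTag(tag).toString();
    }

    // True when a locale survives conversion to a language tag and back.
    static boolean roundTrips(String tag) {
        Locale original = Locale.forLanguageTag(tag);
        return Locale.forLanguageTag(original.toLanguageTag()).equals(original);
    }

    public static void main(String[] args) {
        System.out.println(legacyForm("zh-Hans-CN")); // zh_CN_#Hans
        System.out.println(Locale.forLanguageTag("zh-Hans-CN").toLanguageTag()); // zh-Hans-CN
        System.out.println(roundTrips("zh-Hans-CN")); // true
    }
}
```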

Special cases

For compatibility reasons, two non-conforming locales are treated as special cases. These are ja_JP_JP and th_TH_TH. These are ill-formed in BCP 47 since the variants are too short. To ease migration to BCP 47, these are treated specially during construction. These two cases (and only these) cause a constructor to generate an extension; all other values behave exactly as they did prior to Java 7.

Java has used ja_JP_JP to represent Japanese as used in Japan together with the Japanese Imperial calendar. This is now representable using a Unicode locale extension, by specifying the Unicode locale key ca (for "calendar") and type japanese. When the Locale constructor is called with the arguments "ja", "JP", "JP", the extension "u-ca-japanese" is automatically added.

Java has used th_TH_TH to represent Thai as used in Thailand together with Thai digits. This is also now representable using a Unicode locale extension, by specifying the Unicode locale key nu (for "number") and type thai. When the Locale constructor is called with the arguments "th", "TH", "TH", the extension "u-nu-thai" is automatically added.
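
The two special cases can be checked with a sketch like the one below, assuming a standard Java SE runtime; the class name SpecialCaseDemo and its helpers are hypothetical. Newer Java SE releases mark the constructors as deprecated, hence the suppression annotation.

```java
import java.util.Locale;

final class SpecialCaseDemo {
    @SuppressWarnings("deprecation") // constructors are deprecated on newer runtimes
    static Locale legacy(String language, String country, String variant) {
        return new Locale(language, country, variant);
    }

    // Returns the Unicode locale type for the key, or null if the locale has none.
    static String typeOf(Locale locale, String key) {
        return locale.getUnicodeLocaleType(key);
    }

    public static void main(String[] args) {
        System.out.println(typeOf(legacy("ja", "JP", "JP"), "ca")); // japanese
        System.out.println(typeOf(legacy("th", "TH", "TH"), "nu")); // thai
        System.out.println(typeOf(legacy("ja", "JP", ""), "ca"));   // null: no extension added
    }
}
```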

Serialization

During serialization, writeObject writes all fields to the output stream, including extensions.

During deserialization, readResolve adds extensions as described in Special Cases, only for the two cases th_TH_TH and ja_JP_JP.

Legacy language codes

Locale's constructor has always converted three language codes to their earlier, obsoleted forms: he maps to iw, yi maps to ji, and id maps to in. This continues to be the case, in order to not break backwards compatibility.

The APIs added in 1.7 map between the old and new language codes, maintaining the old codes internal to Locale (so that getLanguage and toString reflect the old code), but using the new codes in the BCP 47 language tag APIs (so that toLanguageTag reflects the new one). This preserves the equivalence between Locales no matter which code or API is used to construct them. Java's default resource bundle lookup mechanism also implements this mapping, so that resources can be named using either convention; see ResourceBundle.Control.
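
The equivalence between old and new codes can be sketched as follows; the class name LegacyCodeDemo and its helpers are hypothetical. The example deliberately avoids printing getLanguage, since which form it returns can depend on the runtime version, and checks only equality and the language tag.

```java
import java.util.Locale;

final class LegacyCodeDemo {
    // True when locales constructed from the old and the new code compare equal.
    @SuppressWarnings("deprecation") // constructors are deprecated on newer runtimes
    static boolean sameLocale(String oldCode, String newCode) {
        return new Locale(oldCode).equals(new Locale(newCode));
    }

    // Returns the BCP 47 tag for a locale built from a bare language code.
    @SuppressWarnings("deprecation")
    static String tagFor(String code) {
        return new Locale(code).toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(sameLocale("iw", "he")); // true
        System.out.println(tagFor("iw"));           // he
        System.out.println(tagFor("ji"));           // yi
        System.out.println(tagFor("in"));           // id
    }
}
```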

Three-letter language/country(region) codes

The Locale constructors have always specified that the language and country parameters be two characters in length, although in practice they have accepted any length. The specification has now been relaxed to allow language codes of two to eight characters and country (region) codes of two to three characters, and in particular, three-letter language codes and three-digit region codes as specified in the IANA Language Subtag Registry. For compatibility, the implementation still does not impose a length constraint.
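
A brief sketch with a three-letter language code and a three-digit region code, reusing the "kok" and "029" examples from the field descriptions; the class name ThreeLetterCodeDemo and the helper tag are hypothetical, and a standard Java SE runtime is assumed.

```java
import java.util.Locale;

final class ThreeLetterCodeDemo {
    // Builds a locale with the two-argument constructor and returns its BCP 47 tag.
    @SuppressWarnings("deprecation") // constructors are deprecated on newer runtimes
    static String tag(String language, String region) {
        return new Locale(language, region).toLanguageTag();
    }

    public static void main(String[] args) {
        System.out.println(tag("kok", "IN")); // kok-IN
        System.out.println(tag("en", "029")); // en-029
    }
}
```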

Locale data

Note that locale data comes solely from ICU. User-supplied locale service providers (using the java.text.spi or java.util.spi mechanisms) are not supported.

Here are the versions of ICU (and the corresponding CLDR and Unicode versions) used in various Android releases:

Android release                                                  ICU    CLDR    Unicode
Android 1.5 (Cupcake)/Android 1.6 (Donut)/Android 2.0 (Eclair)   3.8    1.5     5.0
Android 2.2 (Froyo)                                              4.2    1.7     5.1
Android 2.3 (Gingerbread)/Android 3.0 (Honeycomb)                4.4    1.8     5.2
Android 4.0 (Ice Cream Sandwich)                                 4.6    1.9     6.0
Android 4.1 (Jelly Bean)                                         4.8    2.0     6.0
Android 4.3 (Jelly Bean MR2)                                     50     22.1    6.2
Android 4.4 (KitKat)                                             51     23      6.2
Android 5.0 (Lollipop)                                           53     25      6.3
Android 6.0 (Marshmallow)                                        55.1   27.0.1  7.0

Be wary of the default locale

Note that there are many convenience methods that automatically use the default locale, but using them may lead to subtle bugs.

The default locale is appropriate for tasks that involve presenting data to the user. In this case, you want to use the user's date/time formats, number formats, rules for conversion to lowercase, and so on, and it's safe to use the convenience methods.

The default locale is not appropriate for machine-readable output. The best choice there is usually Locale.US – this locale is guaranteed to be available on all devices, and the fact that it has no surprising special cases and is frequently used (especially for computer-computer communication) means that it tends to be the most efficient choice too.

A common mistake is to implicitly use the default locale when producing output meant to be machine-readable. This tends to work on the developer's test devices (especially because so many developers use en_US), but fails when run on a device whose user is in a more complex locale.

For example, if you're formatting integers some locales will use non-ASCII decimal digits. As another example, if you're formatting floating-point numbers some locales will use ',' as the decimal point and '.' for digit grouping. That's correct for human-readable output, but likely to cause problems if presented to another computer (parseDouble(String) can't parse such a number, for example). You should also be wary of the toLowerCase() and toUpperCase() overloads that don't take a Locale: in Turkey, for example, the characters 'i' and 'I' won't be converted to 'I' and 'i'. This is the correct behavior for Turkish text (such as user input), but inappropriate for, say, HTTP headers.
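
The pitfalls above can be reproduced with explicit locales, which keeps the demonstration independent of the device's default. This is a minimal sketch, assuming a standard Java SE runtime; the class name MachineReadableDemo and its helpers are hypothetical.

```java
import java.util.Locale;

final class MachineReadableDemo {
    // Formats with two decimals using the conventions of the given locale.
    static String formatFor(Locale locale, double value) {
        return String.format(locale, "%.2f", value);
    }

    // Lower-cases text using the case rules of the given locale.
    static String lowerIn(Locale locale, String text) {
        return text.toLowerCase(locale);
    }

    // True when Double.parseDouble can read the text back.
    static boolean parsesAsDouble(String text) {
        try {
            Double.parseDouble(text);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String machine = formatFor(Locale.US, 1234.5);      // 1234.50
        String german = formatFor(Locale.GERMANY, 1234.5);  // 1234,50
        System.out.println(machine + " parses: " + parsesAsDouble(machine)); // true
        System.out.println(german + " parses: " + parsesAsDouble(german));   // false

        // Turkish rules map 'I' to dotless i (U+0131), which breaks protocol keywords.
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(lowerIn(turkish, "TITLE").equals("title"));   // false
        System.out.println(lowerIn(Locale.US, "TITLE").equals("title")); // true
    }
}
```

Passing the locale explicitly at every call site, as in formatFor and lowerIn, makes the choice between human-readable and machine-readable output visible in the code rather than dependent on the device's settings.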

Nested Class Summary

class Locale.Builder Builder is used to build instances of Locale from values configured by the setters. 
enum Locale.Category Enum for locale categories. 

Constant Summary

char PRIVATE_USE_EXTENSION The key for the private use extension ('x').
char UNICODE_LOCALE_EXTENSION The key for Unicode locale extension ('u').

Field Summary

public static final Locale CANADA Useful constant for country.
public static final Locale CANADA_FRENCH Useful constant for country.
public static final Locale CHINA Useful constant for country.
public static final Locale CHINESE Useful constant for language.
public static final Locale ENGLISH Useful constant for language.
public static final Locale FRANCE Useful constant for country.
public static final Locale FRENCH Useful constant for language.
public static final Locale GERMAN Useful constant for language.
public static final Locale GERMANY Useful constant for country.
public static final Locale ITALIAN Useful constant for language.
public static final Locale ITALY Useful constant for country.
public static final Locale