Locale

public final class Locale extends Object
implements Cloneable Serializable

A Locale object represents a specific geographical, political, or cultural region. An operation that requires a Locale to perform its task is called locale-sensitive and uses the Locale to tailor information for the user. For example, displaying a number is a locale-sensitive operation— the number should be formatted according to the customs and conventions of the user's native country, region, or culture.

The Locale class implements identifiers interchangeable with BCP 47 (IETF BCP 47, "Tags for Identifying Languages"), with support for the LDML (UTS#35, "Unicode Locale Data Markup Language") BCP 47-compatible extensions for locale data exchange.

A Locale object logically consists of the fields described below.

language
ISO 639 alpha-2 or alpha-3 language code, or registered language subtags up to 8 alpha letters (for future enhancements). When a language has both an alpha-2 code and an alpha-3 code, the alpha-2 code must be used. You can find a full list of valid language codes in the IANA Language Subtag Registry (search for "Type: language"). The language field is case insensitive, but Locale always canonicalizes to lower case.

Well-formed language values have the form [a-zA-Z]{2,8}. Note that this is not the the full BCP47 language production, since it excludes extlang. They are not needed since modern three-letter language codes replace them.

Example: "en" (English), "ja" (Japanese), "kok" (Konkani)

script
ISO 15924 alpha-4 script code. You can find a full list of valid script codes in the IANA Language Subtag Registry (search for "Type: script"). The script field is case insensitive, but Locale always canonicalizes to title case (the first letter is upper case and the rest of the letters are lower case).

Well-formed script values have the form [a-zA-Z]{4}

Example: "Latn" (Latin), "Cyrl" (Cyrillic)

country (region)
ISO 3166 alpha-2 country code or UN M.49 numeric-3 area code. You can find a full list of valid country and region codes in the IANA Language Subtag Registry (search for "Type: region"). The country (region) field is case insensitive, but Locale always canonicalizes to upper case.

Well-formed country/region values have the form [a-zA-Z]{2} | [0-9]{3}

Example: "US" (United States), "FR" (France), "029" (Caribbean)

variant
Any arbitrary value used to indicate a variation of a Locale. Where there are two or more variant values each indicating its own semantics, these values should be ordered by importance, with most important first, separated by underscore('_'). The variant field is case sensitive.

Note: IETF BCP 47 places syntactic restrictions on variant subtags. Also BCP 47 subtags are strictly used to indicate additional variations that define a language or its dialects that are not covered by any combinations of language, script and region subtags. You can find a full list of valid variant codes in the IANA Language Subtag Registry (search for "Type: variant").

However, the variant field in Locale has historically been used for any kind of variation, not just language variations. For example, some supported variants available in Java SE Runtime Environments indicate alternative cultural behaviors such as calendar type or number script. In BCP 47 this kind of information, which does not identify the language, is supported by extension subtags or private use subtags.


Well-formed variant values have the form SUBTAG (('_'|'-') SUBTAG)* where SUBTAG = [0-9][0-9a-zA-Z]{3} | [0-9a-zA-Z]{5,8}. (Note: BCP 47 only uses hyphen ('-') as a delimiter, this is more lenient).

Example: "polyton" (Polytonic Greek), "POSIX"

extensions
A map from single character keys to string values, indicating extensions apart from language identification. The extensions in Locale implement the semantics and syntax of BCP 47 extension subtags and private use subtags. The extensions are case insensitive, but Locale canonicalizes all extension keys and values to lower case. Note that extensions cannot have empty values.

Well-formed keys are single characters from the set [0-9a-zA-Z]. Well-formed values have the form SUBTAG ('-' SUBTAG)* where for the key 'x' SUBTAG = [0-9a-zA-Z]{1,8} and for other keys SUBTAG = [0-9a-zA-Z]{2,8} (that is, 'x' allows single-character subtags).

Example: key="u"/value="ca-japanese" (Japanese Calendar), key="x"/value="java-1-7"
Note: Although BCP 47 requires field values to be registered in the IANA Language Subtag Registry, the Locale class does not provide any validation features. The Builder only checks if an individual field satisfies the syntactic requirement (is well-formed), but does not validate the value itself. See Locale.Builder for details.

Unicode locale/language extension

UTS#35, "Unicode Locale Data Markup Language" defines optional attributes and keywords to override or refine the default behavior associated with a locale. A keyword is represented by a pair of key and type. For example, "nu-thai" indicates that Thai local digits (value:"thai") should be used for formatting numbers (key:"nu").

The keywords are mapped to a BCP 47 extension value using the extension key 'u' (UNICODE_LOCALE_EXTENSION). The above example, "nu-thai", becomes the extension "u-nu-thai".code

Thus, when a Locale object contains Unicode locale attributes and keywords, getExtension(UNICODE_LOCALE_EXTENSION) will return a String representing this information, for example, "nu-thai". The Locale class also provides getUnicodeLocaleAttributes(), getUnicodeLocaleKeys(), and getUnicodeLocaleType(String) which allow you to access Unicode locale attributes and key/type pairs directly. When represented as a string, the Unicode Locale Extension lists attributes alphabetically, followed by key/type sequences with k