Collator

public abstract class Collator extends Object
implements Comparator<Object>, Cloneable
Known Direct Subclasses

Performs locale-sensitive string comparison.

Following the Unicode Consortium's specifications for the Unicode Collation Algorithm (UCA), there are 4 different levels of strength used in comparisons:

  • PRIMARY strength: Typically, this is used to denote differences between base characters (for example, "a" < "b"). It is the strongest difference. For example, dictionaries are divided into different sections by base character.
  • SECONDARY strength: Accents in the characters are considered secondary differences (for example, "as" < "às" < "at"). Other differences between letters can also be considered secondary differences, depending on the language. A secondary difference is ignored when there is a primary difference anywhere in the strings.
  • TERTIARY strength: Upper and lower case differences in characters are distinguished at tertiary strength (for example, "ao" < "Ao" < "aò"). In addition, a variant of a letter differs from the base form at tertiary strength (such as "A" and "Ⓐ"). Another example is the difference between large and small Kana. A tertiary difference is ignored when there is a primary or secondary difference anywhere in the strings.
  • IDENTICAL strength: When all other strengths are equal, the IDENTICAL strength is used as a tiebreaker. The Unicode code point values of the NFD form of each string are compared only if no other difference is found. For example, Hebrew cantillation marks are only distinguished at this strength. This strength should be used sparingly, because strings that differ only in code point values are extremely rare. Using this strength substantially decreases the performance of both comparison and collation key generation, and it also increases the size of the collation key.
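
The first three strengths can be illustrated with a short sketch. It assumes the US English collator returned by Collator.getInstance(Locale.US); the class name StrengthDemo and the helper equalAt are hypothetical names used only for illustration, not part of the API.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {
    // Hypothetical helper: true if the two strings compare as equal
    // under the US English collator at the given strength.
    static boolean equalAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.US);
        collator.setStrength(strength);
        return collator.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String accented = "\u00e0s"; // "a" with grave accent, followed by "s"

        // PRIMARY looks only at base characters, so accent and case are ignored.
        System.out.println(equalAt(Collator.PRIMARY, "as", accented));
        System.out.println(equalAt(Collator.PRIMARY, "as", "AS"));

        // SECONDARY also distinguishes accents, but still ignores case.
        System.out.println(equalAt(Collator.SECONDARY, "as", accented));
        System.out.println(equalAt(Collator.SECONDARY, "as", "AS"));

        // TERTIARY additionally distinguishes upper and lower case.
        System.out.println(equalAt(Collator.TERTIARY, "as", "AS"));
    }
}
```

Choosing the weakest strength that still separates the strings you care about keeps comparisons tolerant of case or accent variations in user input.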

This Collator supports only two decomposition modes: canonical decomposition and no decomposition. The compatibility decomposition mode java.text.Collator.FULL_DECOMPOSITION is not supported here. If the canonical decomposition mode is set, Collator handles un-normalized text properly, producing the same results as if the text were normalized in NFD. If canonical decomposition is turned off, it is the user's responsibility to ensure that all text is already in the appropriate form before performing a comparison or before getting a CollationKey.
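
When decomposition is turned off, one way to meet that responsibility is to normalize the text yourself before comparing. The sketch below assumes java.text.Normalizer is available and that NFD is the appropriate form; the class name PreNormalizedCompare and the helper compareNormalized are hypothetical.

```java
import java.text.Collator;
import java.text.Normalizer;

class PreNormalizedCompare {
    // Hypothetical helper: normalizes both inputs to NFD, then lets the
    // collator compare the normalized strings.
    static int compareNormalized(Collator collator, String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFD);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFD);
        return collator.compare(na, nb);
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance();
        collator.setDecomposition(Collator.NO_DECOMPOSITION);

        // Two canonically equivalent spellings of the same text.
        String precomposed = "\u00e0\u0325";
        String decomposed = "a\u0325\u0300";

        // After NFD normalization both inputs are the same code point sequence.
        System.out.println(compareNormalized(collator, precomposed, decomposed) == 0);
    }
}
```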

Examples:

 // Get the Collator for US English and set its strength to PRIMARY
 Collator usCollator = Collator.getInstance(Locale.US);
 usCollator.setStrength(Collator.PRIMARY);
 if (usCollator.compare("abc", "ABC") == 0) {
     System.out.println("Strings are equivalent");
 }
 

The following example shows how to compare two strings using the collator for the default locale.

 // Compare two strings in the default locale.
 // The two literals are canonically equivalent but use different code point
 // sequences, so they are written with Unicode escapes to keep them distinct.
 Collator myCollator = Collator.getInstance();
 myCollator.setDecomposition(Collator.NO_DECOMPOSITION);
 if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
     System.out.println("\u00e0\u0325 is not equal to a\u0325\u0300 without decomposition");
     myCollator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
     if (myCollator.compare("\u00e0\u0325", "a\u0325\u0300") != 0) {
         System.out.println("Error: \u00e0\u0325 should be equal to a\u0325\u0300 with decomposition");
     } else {
         System.out.println("\u00e0\u0325 is equal to a\u0325\u0300 with decomposition");
     }
 } else {
     System.out.println("Error: \u00e0\u0325 should not be equal to a\u0325\u0300 without decomposition");
 }
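
Because Collator implements Comparator, an instance can also be passed directly to a sort. The following is a minimal sketch under that assumption; the class name CollatorSort and the helper sorted are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class CollatorSort {
    // Hypothetical helper: returns a copy of the input sorted with the
    // collator for the given locale.
    static List<String> sorted(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        // Collator is a Comparator<Object>, so it can order a List<String>.
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        // Base letters decide the order here, so case does not push
        // "Banana" ahead of "apple" as plain code point ordering would.
        System.out.println(sorted(words, Locale.US));
    }
}
```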
 

See Also

Constant Summary

int CANONICAL_DECOMPOSITION Constant used to specify the decomposition rule.
int FULL_DECOMPOSITION Constant used to specify the decomposition rule.
int IDENTICAL Constant used to specify the collation strength.
int NO_DECOMPOSITION Constant used to specify the decomposition rule.
int PRIMARY Constant used to specify the collation strength.
int SECONDARY Constant used to specify the collation strength.
int TERTIARY Constant used to specify the collation strength.

Public Constructor Summary

Public Method Summary

Object clone()
    Creates and returns a copy of this Object.
int compare(Object object1, Object object2)
    Compares two objects to determine their relative order.
abstract int compare(String string1, String string2)
    Compares two strings to determine their relative order.
boolean equals(String string1, String string2)
    Compares two strings using the collation rules to determine if they are equal.
static Locale[] getAvailableLocales()
    Returns an array of locales for which custom Collator instances are available.
abstract CollationKey getCollationKey(String string)
    Returns a CollationKey for the specified string for this collator with the current decomposition rule and strength value.
abstract int getDecomposition()
    Returns the decomposition rule for this collator.
static Collator getInstance()
    Returns a Collator instance which is appropriate for the user's default Locale.
static Collator getInstance(Locale locale)
    Returns a Collator instance which is appropriate for locale.
abstract int getStrength()
    Returns the strength value for this collator.
abstract void setDecomposition(int value)
    Sets the decomposition rule for this collator.
abstract void setStrength(int value)
    Sets the strength value for this collator.
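
As an illustration of getCollationKey from the summary above, the sketch below sorts strings by their precomputed CollationKey objects, which may be worthwhile when the same strings are compared many times. The class name CollationKeySort and the helper sortWithKeys are hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class CollationKeySort {
    // Hypothetical helper: builds one CollationKey per string, sorts the
    // keys, and returns the source strings in key order.
    static String[] sortWithKeys(String[] words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        CollationKey[] keys = new CollationKey[words.length];
        for (int i = 0; i < words.length; i++) {
            keys[i] = collator.getCollationKey(words[i]);
        }
        // CollationKey implements Comparable<CollationKey>.
        Arrays.sort(keys);
        String[] result = new String[keys.length];
        for (int i = 0; i < keys.length; i++) {
            result[i] = keys[i].getSourceString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] words = {"cherry", "Banana", "apple"};
        System.out.println(Arrays.toString(sortWithKeys(words, Locale.US)));
    }
}
```

Keys are only comparable with keys produced by the same collator, so the sketch creates all of them from a single instance.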

Inherited Method Summary