LanguageIdentifier

public interface LanguageIdentifier extends Closeable, LifecycleObserver, OptionalModuleApi

A LanguageIdentification client for identifying the language of a piece of text.

Create a LanguageIdentifier with LanguageIdentification.getClient(LanguageIdentificationOptions), or with LanguageIdentification.getClient() to use the default options. The code below creates a LanguageIdentifier with default options:

LanguageIdentifier languageIdentifier = LanguageIdentification.getClient();
 

This class can be used from any thread.

Constant Summary

float DEFAULT_IDENTIFY_LANGUAGE_CONFIDENCE_THRESHOLD The default confidence threshold for the identifyLanguage(String) call.
float DEFAULT_IDENTIFY_POSSIBLE_LANGUAGES_CONFIDENCE_THRESHOLD The default confidence threshold for the identifyPossibleLanguages(String) call.
String UNDETERMINED_LANGUAGE_TAG The BCP 47 language tag for "undetermined language".

Public Method Summary

abstract void
close()
abstract Task<String>
identifyLanguage(String text)
Identifies the language in a supplied String and returns the most likely language.
abstract Task<List<IdentifiedLanguage>>
identifyPossibleLanguages(String text)
Identifies the language in a supplied String and returns a list of possible languages, omitting any language whose confidence score falls below the threshold set via LanguageIdentificationOptions.Builder.setConfidenceThreshold(float).

Constants

public static final float DEFAULT_IDENTIFY_LANGUAGE_CONFIDENCE_THRESHOLD

The default confidence threshold for the identifyLanguage(String) call.

Constant Value: 0.5

public static final float DEFAULT_IDENTIFY_POSSIBLE_LANGUAGES_CONFIDENCE_THRESHOLD

The default confidence threshold for the identifyPossibleLanguages(String) call.

Constant Value: 0.01

public static final String UNDETERMINED_LANGUAGE_TAG

The BCP 47 language tag for "undetermined language".

Constant Value: "und"

Public Methods

public abstract void close ()

Closes the language identifier and releases its resources.

public abstract Task<String> identifyLanguage (String text)

Identifies the language in a supplied String and returns the most likely language.

Parameters
text the text for which to identify the language. Inputs longer than 200 characters are truncated to 200 characters, as longer input does not improve the detection accuracy.
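Returns
a Task that asynchronously returns the BCP 47 language tag of the most likely language, or UNDETERMINED_LANGUAGE_TAG if no language could be determined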
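
As a rough illustration of how DEFAULT_IDENTIFY_LANGUAGE_CONFIDENCE_THRESHOLD and UNDETERMINED_LANGUAGE_TAG appear to fit together, the plain-Java sketch below picks the highest-scoring language and falls back to "und" when that score is below the threshold. It does not call ML Kit: the class MostLikelyLanguageSketch, the method mostLikely, and the map of scores are hypothetical stand-ins, and the fallback rule is an assumption about the behavior rather than something stated on this page.

```java
import java.util.Map;

// Illustrative stand-in only: plain Java, not the ML Kit implementation.
final class MostLikelyLanguageSketch {
    // Values copied from the constants documented on this page.
    static final float DEFAULT_THRESHOLD = 0.5f;
    static final String UNDETERMINED_LANGUAGE_TAG = "und";

    // Returns the tag with the highest confidence, or "und" when that
    // confidence is below the threshold (assumed fallback rule).
    static String mostLikely(Map<String, Float> confidences, float threshold) {
        String bestTag = UNDETERMINED_LANGUAGE_TAG;
        float bestConfidence = -1f;
        for (Map.Entry<String, Float> entry : confidences.entrySet()) {
            if (entry.getValue() > bestConfidence) {
                bestTag = entry.getKey();
                bestConfidence = entry.getValue();
            }
        }
        return bestConfidence >= threshold ? bestTag : UNDETERMINED_LANGUAGE_TAG;
    }
}
```

With hypothetical scores of 0.72 for "en" and 0.20 for "nl", mostLikely returns "en" at the default threshold of 0.5; raising the threshold above 0.72 makes it return "und".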

public abstract Task<List<IdentifiedLanguage>> identifyPossibleLanguages (String text)

Identifies the language in a supplied String and returns a list of possible languages, omitting any language whose confidence score falls below the threshold set via LanguageIdentificationOptions.Builder.setConfidenceThreshold(float).

Note that this API assumes the text is in a single language; the returned list contains all estimations for what that language could be, along with a confidence score for each possible language. The API does not detect multiple languages in a single text.

Parameters
text the text for which to identify the language. Inputs longer than 200 characters are truncated to 200 characters, as longer input does not improve the detection accuracy.
Returns
a Task that asynchronously returns the list of IdentifiedLanguage candidates
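
The threshold cut-off described for identifyPossibleLanguages(String) can be pictured with the minimal plain-Java sketch below. It does not use ML Kit: Candidate is a hypothetical stand-in for IdentifiedLanguage (a language tag plus a confidence score), and aboveThreshold is an illustrative helper. The sketch only models the filtering step and makes no claim about ordering or about what the API returns when no language passes the threshold.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in only: plain Java, not the ML Kit implementation.
final class PossibleLanguagesSketch {
    // Hypothetical stand-in for IdentifiedLanguage.
    static final class Candidate {
        final String languageTag;
        final float confidence;

        Candidate(String languageTag, float confidence) {
            this.languageTag = languageTag;
            this.confidence = confidence;
        }
    }

    // Keeps only candidates whose confidence is at or above the threshold,
    // preserving the input order.
    static List<Candidate> aboveThreshold(List<Candidate> candidates, float threshold) {
        List<Candidate> kept = new ArrayList<>();
        for (Candidate candidate : candidates) {
            if (candidate.confidence >= threshold) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

For example, with hypothetical candidates en (0.62), nl (0.30), and af (0.005), a threshold of 0.01 keeps en and nl and drops af.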