Text

public interface Text
Known Indirect Subclasses: Element, Line, TextBlock

Common interface for every entity across the hierarchy of recognized text. An entity may contain other smaller entities, or may be an atom.

Public Method Summary

abstract Rect
getBoundingBox()
Axis-aligned bounding box containing the text.
abstract List<? extends Text>
getComponents()
Smaller components that comprise this entity, if any.
abstract Point[]
getCornerPoints()
Four corner points in clockwise direction starting with top-left.
abstract String
getLanguage()
Prevailing language in the text, if any.
abstract String
getValue()
Retrieve the recognized text as a string.

Public Methods

public abstract Rect getBoundingBox ()

Axis-aligned bounding box containing the text. The bounding box may extend past the image boundary.
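Because the box may extend past the image boundary, a client that crops or draws with it may want to clamp it first. The following is a minimal sketch, not part of this API: the class `BoxClamp` and its method are hypothetical, and plain `int` edges stand in for the `Rect` fields so the example compiles without the Android SDK.

```java
// Hypothetical helper, not part of the Text API.
final class BoxClamp {
    // Returns {left, top, right, bottom} limited to the range
    // [0, imageWidth] horizontally and [0, imageHeight] vertically.
    static int[] clampToImage(int left, int top, int right, int bottom,
                              int imageWidth, int imageHeight) {
        int l = Math.max(0, Math.min(left, imageWidth));
        int t = Math.max(0, Math.min(top, imageHeight));
        int r = Math.max(0, Math.min(right, imageWidth));
        int b = Math.max(0, Math.min(bottom, imageHeight));
        return new int[] {l, t, r, b};
    }
}
```

With a `Rect` from getBoundingBox(), the four arguments would come from its `left`, `top`, `right`, and `bottom` fields.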

public abstract List<? extends Text> getComponents ()

Smaller components that comprise this entity, if any. If this entity is an atom, an empty list is returned. TextBlock is at the top of the Text hierarchy: a TextBlock contains Line objects, each of which contains Element objects. Elements are atoms. We may decide to add character-level objects in later versions.

For example, a client could draw bounding boxes in different colors for paragraphs, lines, and words by recursively traversing down the tree with this method.
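The traversal described above can be sketched as a depth-first walk. This is an illustration under stated assumptions: `TextNode` is a hypothetical stand-in that mirrors only getComponents() and getValue(), so the code runs without the Android SDK, and `SimpleNode` and `TextTreeWalker` are demo names, not part of the API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Text interface, reduced to the two
// methods used here.
interface TextNode {
    List<? extends TextNode> getComponents();
    String getValue();
}

// Simple in-memory node used only for the demo.
final class SimpleNode implements TextNode {
    private final String value;
    private final List<SimpleNode> children;

    SimpleNode(String value, List<SimpleNode> children) {
        this.value = value;
        this.children = children;
    }

    @Override public List<SimpleNode> getComponents() { return children; }
    @Override public String getValue() { return value; }
}

final class TextTreeWalker {
    // Visits the node and all of its descendants depth-first and records
    // "depth:value" for each. In the hierarchy described above, depth 0
    // would be a TextBlock, depth 1 a Line, and depth 2 an Element.
    static void walk(TextNode node, int depth, List<String> out) {
        out.add(depth + ":" + node.getValue());
        for (TextNode child : node.getComponents()) {
            walk(child, depth + 1, out);
        }
    }
}
```

A drawing client would replace the `out.add(...)` call with its own rendering step and pick a color from `depth`. The recursion ends naturally at atoms because they return an empty list.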

public abstract Point[] getCornerPoints ()

Four corner points in clockwise direction starting with top-left. Because of possible perspective distortion, the region is not necessarily a rectangle. Parts of the region could be outside of the image.
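One use of the corner points is to estimate how far the text is rotated, using the top edge (top-left to top-right). The sketch below is hypothetical and not part of this API: `CornerGeometry` is an invented name, and each point is passed as an `{x, y}` pair so the code compiles without `android.graphics.Point`.

```java
// Hypothetical helper, not part of the Text API.
final class CornerGeometry {
    // corners: four {x, y} pairs in clockwise order starting with top-left,
    // matching the order documented for getCornerPoints().
    // Returns the angle of the top edge in degrees. In image coordinates
    // (y grows downward) a positive value means the text appears rotated
    // clockwise.
    static double topEdgeAngleDegrees(int[][] corners) {
        int dx = corners[1][0] - corners[0][0];
        int dy = corners[1][1] - corners[0][1];
        return Math.toDegrees(Math.atan2(dy, dx));
    }
}
```

With the real API, each pair would be filled from the `x` and `y` fields of the corresponding `Point`.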

public abstract String getLanguage ()

Prevailing language in the text, if any. The language is given as a BCP 47 tag (e.g. "en" or "sr-Latn-BA"), or "und" if the language could not be determined.
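A client that needs a `java.util.Locale` can convert the returned tag with the standard `Locale.forLanguageTag` method. The sketch below is one possible approach, with a hypothetical helper class `LanguageTags`. Treating null and the empty string as "no language" is an assumption made here, not documented behavior.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of the Text API.
final class LanguageTags {
    // Converts a BCP 47 tag to a Locale. Returns an empty Optional for
    // "und" (undetermined) and, as an assumption, for null or "".
    static Optional<Locale> toLocale(String tag) {
        if (tag == null || tag.isEmpty() || "und".equalsIgnoreCase(tag)) {
            return Optional.empty();
        }
        return Optional.of(Locale.forLanguageTag(tag));
    }
}
```

For "sr-Latn-BA" the resulting Locale exposes the language, script, and region subtags through getLanguage(), getScript(), and getCountry().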

public abstract String getValue ()

Retrieve the recognized text as a string. The text is returned in reading order for the language. For Latin-script languages, this is top to bottom within a TextBlock and left to right within a Line.