The Mobile Vision API is now a part of ML Kit. We strongly encourage you to try it out, as it comes with new capabilities like on-device image labeling! Also, note that we ultimately plan to wind down the Mobile Vision API, with all new on-device ML capabilities released via ML Kit. Feel free to reach out to Firebase support for help.

Text Recognition API Overview

Text recognition is the process of detecting text in images and video streams and recognizing the text contained therein. Once text is detected, the recognizer determines the actual text in each block and segments it into lines and words. The Text API detects text in Latin-based languages (French, German, English, etc.) in real time, on device.

Try out the Text API codelab to learn how to integrate the Text API into your application.

Recognized Languages

The Text API can recognize text in any Latin-based language. This includes, but is not limited to:

  • Catalan
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Hungarian
  • Italian
  • Latin
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Spanish
  • Swedish
  • Tagalog
  • Turkish
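Because support is defined by script ("any Latin-based language") rather than by a closed list, one rough way to predict whether a string is in scope is to check that its letters are Latin-script characters. The sketch below is a hypothetical client-side helper (`is_latin_script` is not part of the Text API) that relies only on Python's standard `unicodedata` module; it says nothing about recognition accuracy for any particular language.

```python
import unicodedata


def is_latin_script(text: str) -> bool:
    """Hypothetical helper: True if every letter in `text` is a Latin-script letter.

    Digits, spaces, and punctuation are ignored; only alphabetic characters
    are checked, by looking at their Unicode character names.
    """
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False  # no letters at all, so nothing Latin-based to recognize
    return all(unicodedata.name(ch, "").startswith("LATIN") for ch in letters)


print(is_latin_script("Smørrebrød, 42 kr."))  # True  (Danish)
print(is_latin_script("Привет"))              # False (Russian, Cyrillic script)
```

Accented and extended letters such as `ø`, `ß`, or Turkish `ı` have Unicode names starting with `LATIN`, so they pass the check.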

Text Structure

The Text Recognizer segments text into blocks, lines, and words. Roughly speaking:

  • a Block is a contiguous set of text lines, such as a paragraph or column,

  • a Line is a contiguous set of words aligned along the same axis (for horizontal text, a single row), and

  • a Word is a contiguous set of alphanumeric characters aligned along the same axis.

The image below highlights an example of each level, from largest to smallest. The first highlighted region, in cyan, is a Block of text. The second set of highlighted regions, in blue, are Lines of text. Finally, the third set of highlighted regions, in dark blue, are Words.
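The nesting in the figure can be modeled as a small tree: a Block owns Lines, a Line owns Words, and each level's highlighted region encloses the regions of its children. The sketch below is hypothetical: `Box`, `Word`, `Line`, `Block`, and `union` are illustrative names, not the Text API's own classes, and it assumes axis-aligned rectangles in pixel coordinates. The real recognizer reports its own boxes and may handle rotated text differently.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical types for illustration; not the Text API's own classes.


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in image pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def union(boxes: List[Box]) -> Box:
    """Smallest box that encloses every box in `boxes`."""
    if not boxes:
        raise ValueError("union() needs at least one box")
    return Box(
        min(b.left for b in boxes),
        min(b.top for b in boxes),
        max(b.right for b in boxes),
        max(b.bottom for b in boxes),
    )


@dataclass
class Word:
    text: str
    box: Box


@dataclass
class Line:
    words: List[Word]

    @property
    def text(self) -> str:
        # Words on a line are joined with single spaces.
        return " ".join(w.text for w in self.words)

    @property
    def box(self) -> Box:
        # A Line's region encloses all of its Words.
        return union([w.box for w in self.words])


@dataclass
class Block:
    lines: List[Line]

    @property
    def text(self) -> str:
        # Lines in a block are joined with newlines.
        return "\n".join(line.text for line in self.lines)

    @property
    def box(self) -> Box:
        # A Block's region encloses all of its Lines.
        return union([line.box for line in self.lines])


block = Block(lines=[
    Line(words=[Word("Hello", Box(10, 20, 60, 40)),
                Word("world", Box(70, 22, 130, 41))]),
    Line(words=[Word("again", Box(10, 50, 110, 70))]),
])
print(block.text)          # "Hello world" and "again" on separate lines
print(block.lines[0].box)  # Box(left=10, top=20, right=130, bottom=41)
print(block.box)           # Box(left=10, top=20, right=130, bottom=70)
```

Deriving each parent's box from its children keeps the containment shown in the figure true by construction; a client that receives boxes from the recognizer would instead store them as reported.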