The Mobile Vision API is now a part of ML Kit. We strongly encourage you to try it out, as it comes with new capabilities like on-device image labeling! Also, note that we ultimately plan to wind down the Mobile Vision API, with all new on-device ML capabilities released via ML Kit. Feel free to reach out to Firebase support for help.

Text recognition for iOS

The TextDetector API detects and recognizes printed text in images and video streams in real time, on device. Internally, the API runs text detection first. Once any text is detected, the API determines the actual text and segments it into lines and words.

Recognized Languages

The Text API can recognize text in any Latin-based language. This includes, but is not limited to:

  • Catalan
  • Danish
  • Dutch
  • English
  • Finnish
  • French
  • German
  • Hungarian
  • Italian
  • Latin
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Spanish
  • Swedish
  • Tagalog
  • Turkish

Detect Text Features in Photos

This tutorial will discuss:

  1. Creating a text detector.
  2. Detecting text in a static image.

Creating the text detector

  • Add a file named Podfile to your Xcode project folder, if you don't have one already.

  • Add pod 'GoogleMobileVision/TextDetector' to your Podfile.

  • Run the command pod update from Terminal in the Xcode project folder. This will download and add the TextDetector CocoaPod to your project.

Then, import the GoogleMobileVision framework to use the detector API.

@import GoogleMobileVision;

Typically, the text detector is created in the viewDidLoad method.

  self.textDetector = [GMVDetector detectorOfType:GMVDetectorTypeText options:nil];
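
As a minimal sketch of where that line might live, the view controller below declares the detector as a property and creates it once in viewDidLoad. The class name TextViewController is hypothetical and used only for illustration; the textDetector property matches the name used above.

```objc
@import UIKit;
@import GoogleMobileVision;

// Hypothetical view controller; the class name is an assumption for illustration.
@interface TextViewController : UIViewController
// Held as a strong property so the same detector can be reused across images.
@property(nonatomic, strong) GMVDetector *textDetector;
@end

@implementation TextViewController

- (void)viewDidLoad {
  [super viewDidLoad];
  // Create the text detector once, with default options.
  self.textDetector = [GMVDetector detectorOfType:GMVDetectorTypeText options:nil];
}

@end
```

Creating the detector once and keeping it on the controller avoids rebuilding it for every image.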

Detecting and recognizing text

Use the text detector to find text blocks in a UIImage. Each text block is made up of text lines, and each line is made up of words (called "elements" in the API).

// Load the image to scan, then run the detector on it.
UIImage *image = [UIImage imageNamed:@"text.jpg"];
NSArray<GMVTextBlockFeature *> *features = [self.textDetector featuresInImage:image
                                                                      options:nil];

Getting text results

The Text API segments text into blocks, lines, and words. Roughly speaking:

  • a Block is a contiguous set of text lines, such as a paragraph or column
  • a Line is a contiguous set of words on the same vertical axis, and
  • a Word is a contiguous set of alphanumeric characters on the same vertical axis.

The detector returns a collection of blocks. To use it, simply iterate over the collection. The bounding box, prevailing language, and detected text value are available as properties on each text block, line, and element.

// Iterate over each text block.
// Note: on iOS, CGRect values are formatted with NSStringFromCGRect.
for (GMVTextBlockFeature *textBlock in features) {
    NSLog(@"Text Block: %@", NSStringFromCGRect(textBlock.bounds));
    NSLog(@"lang: %@ value: %@", textBlock.language, textBlock.value);

    // For each text block, iterate over each line.
    for (GMVTextLineFeature *textLine in textBlock.lines) {
        NSLog(@"Text Line: %@", NSStringFromCGRect(textLine.bounds));
        NSLog(@"lang: %@ value: %@", textLine.language, textLine.value);

        // For each line, iterate over each word.
        for (GMVTextElementFeature *textElement in textLine.elements) {
            NSLog(@"Text Element: %@", NSStringFromCGRect(textElement.bounds));
            NSLog(@"value: %@", textElement.value);
        }
    }
}
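
Beyond logging, a common next step is to gather the recognized text into a single string. The sketch below assumes that is all you need. JoinedTextFromBlocks is a hypothetical helper, not part of the API, and it relies only on the value property shown above.

```objc
@import Foundation;
@import GoogleMobileVision;

// Hypothetical helper (not part of the API): joins the text of every block
// with newlines, skipping blocks whose value is nil or empty.
static NSString *JoinedTextFromBlocks(NSArray<GMVTextBlockFeature *> *features) {
  NSMutableArray<NSString *> *values = [NSMutableArray array];
  for (GMVTextBlockFeature *textBlock in features) {
    if (textBlock.value.length > 0) {
      [values addObject:textBlock.value];
    }
  }
  return [values componentsJoinedByString:@"\n"];
}
```

Calling JoinedTextFromBlocks(features) with the array returned by the detector would give one string with one block per line; if no text was detected, the result is an empty string.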