Vision is a category of machine learning that deals with the analysis and interpretation of images and video streams. Cameras are increasingly used as an input method, letting users convey visually what is hard to describe in text. ML Kit's vision APIs let you offer core capabilities such as visual search and text extraction as building blocks for camera-first experiences.

Base APIs

Barcode scanning

Scan and process barcodes.

Face detection

Detect faces and facial landmarks.

Image labeling

Identify objects, locations, activities, animal species, products, and more.

Landmark detection

Identify popular landmarks in an image.

Object detection and tracking

Localize and track the most prominent object in a live camera feed in real time.

Text recognition

Recognize and extract text from images.


AutoML Vision Edge

Generate custom image classification models from your own library of images for use on device.