Vision is the area of machine learning concerned with analyzing and interpreting images and video streams. Cameras are increasingly used as an input method, letting users convey visually what is hard to describe in text. ML Kit's vision APIs let you build camera-first experiences around core features such as visual search and text extraction.

Base APIs

Barcode scanning: Scan and process barcodes.
Face detection: Detect faces and facial landmarks.
Image labeling: Identify objects, locations, activities, animal species, products, and more.
Landmark recognition: Identify popular landmarks in an image.
Object detection and tracking: Localize and track, in real time, the most prominent object in the live camera feed.
Text recognition: Recognize and extract text from images.

AutoML Vision Edge: Generate custom image classification models from your own library of images and run them on device.