Object detection and tracking

Page Summary

ML Kit's on-device API enables detection and tracking of objects within images or live camera feeds, working efficiently even on lower-end mobile devices.
It offers optional object classification using a built-in coarse classifier or your own custom TensorFlow Lite model for more specialized categorization.
The API can identify the most prominent object in an image and track it across frames, making it suitable for visual search applications.
Custom models can be integrated to classify objects into specific categories, enhancing the functionality for tailored use cases.
Input images are automatically preprocessed to fit model requirements, using bilinear scaling and stretching if necessary.

With ML Kit's on-device object detection and tracking API, you can detect and track objects in an image or live camera feed.

Optionally, you can classify detected objects, either with the coarse classifier built into the API or with your own custom image classification model. See Using a custom LiteRT model for more information.

Because object detection and tracking happens on the device, it works well as the front end of a visual search pipeline. After you detect and filter objects, you can pass them to a cloud backend, such as Cloud Vision Product Search.


Key capabilities

Fast object detection and tracking	Detect objects and get their locations in the image. Track objects across successive image frames.
Optimized on-device model	The object detection and tracking model is optimized for mobile devices and intended for use in real-time applications, even on lower-end devices.
Prominent object detection	Automatically determine the most prominent object in an image.
Coarse classification	Classify objects into broad categories, which you can use to filter out objects you're not interested in. The following categories are supported: home goods, fashion goods, food, plants, and places.
Classification with a custom model	Use your own custom image classification model to identify or filter specific object categories. Your custom model can perform better when the background of the image is left out.
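As a rough illustration of using coarse categories to filter out objects you're not interested in, the sketch below keeps only detections in a wanted category that also meet a confidence threshold. It is plain Python, not the ML Kit API: the `Detection` record, the `keep_interesting` helper, the `name` field, and the 0.9 threshold are all hypothetical, and the category strings and confidences are borrowed from the example results on this page.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one detected object (not an ML Kit type)."""
    name: str          # illustrative label used only in this sketch
    category: str      # coarse category, e.g. "FASHION_GOOD" or "PLACE"
    confidence: float  # classification confidence in [0, 1]


def keep_interesting(
    detections: Iterable[Detection],
    wanted_categories: Set[str],
    min_confidence: float,
) -> List[Detection]:
    """Keep detections in a wanted category with enough confidence."""
    return [
        d
        for d in detections
        if d.category in wanted_categories and d.confidence >= min_confidence
    ]


detections = [
    Detection("shoe-0", "FASHION_GOOD", 0.95703125),
    Detection("shoe-1", "FASHION_GOOD", 0.84375),
    Detection("shoe-2", "FASHION_GOOD", 0.94921875),
    Detection("shoe-3", "FASHION_GOOD", 0.9375),
    Detection("building", "PLACE", 0.9296875),
]

# Assumed policy: only fashion goods, and only confident classifications.
kept = keep_interesting(detections, {"FASHION_GOOD"}, min_confidence=0.9)
kept_names = [d.name for d in kept]
print(kept_names)
```

A filter like this would typically run before any further processing, so that only objects relevant to the app move on to the next stage.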

Example results

Tracking the most prominent object across images

The following example shows the tracking data from three successive frames with the default coarse classifier provided by ML Kit.

Tracking ID	0
Bounds	(95, 45), (496, 45), (496, 240), (95, 240)
Category	PLACE
Classification confidence	0.9296875

Tracking ID	0
Bounds	(84, 46), (478, 46), (478, 247), (84, 247)
Category	PLACE
Classification confidence	0.8710938

Tracking ID	0
Bounds	(53, 45), (519, 45), (519, 240), (53, 240)
Category	PLACE
Classification confidence	0.8828125

Photo: Christian Ferrer [CC BY-SA 4.0]
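To make the tracking data above more concrete, here is a small sketch that turns the four corner points of each frame into a rectangle and measures how the center of the tracked object moves from frame to frame. It uses plain Python with the values from the three tables; the dictionary layout and the helper names `corners_to_rect`, `center`, and `center_shifts` are assumptions made for this example and are not part of ML Kit.

```python
def corners_to_rect(corners):
    """Convert four (x, y) corner points to (left, top, right, bottom)."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    return min(xs), min(ys), max(xs), max(ys)


def center(rect):
    """Return the (x, y) center of a (left, top, right, bottom) rectangle."""
    left, top, right, bottom = rect
    return (left + right) / 2, (top + bottom) / 2


def center_shifts(frames, tracking_id):
    """Return the (dx, dy) center movement between consecutive frames
    for the object with the given tracking ID."""
    centers = [
        center(corners_to_rect(f["corners"]))
        for f in frames
        if f["tracking_id"] == tracking_id
    ]
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(centers, centers[1:])]


# Bounds copied from the three successive frames shown above.
frames = [
    {"tracking_id": 0, "corners": [(95, 45), (496, 45), (496, 240), (95, 240)]},
    {"tracking_id": 0, "corners": [(84, 46), (478, 46), (478, 247), (84, 247)]},
    {"tracking_id": 0, "corners": [(53, 45), (519, 45), (519, 240), (53, 240)]},
]

shifts = center_shifts(frames, tracking_id=0)
print(shifts)  # [(-14.5, 4.0), (5.0, -4.0)]
```

Because the tracking ID stays at 0 in all three frames, the detections can be grouped as the same object even though its bounds change slightly.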

Multiple objects in a static image

The following example shows the data for the four objects detected in the image with the default coarse classifier provided by ML Kit.

Shoes

Object 0
Bounds	(1, 97), (332, 97), (332, 332), (1, 332)
Category	FASHION_GOOD
Classification confidence	0.95703125

Object 1
Bounds	(186, 80), (337, 80), (337, 226), (186, 226)
Category	FASHION_GOOD
Classification confidence	0.84375

Object 2
Bounds	(296, 80), (472, 80), (472, 388), (296, 388)
Category	FASHION_GOOD
Classification confidence	0.94921875

Object 3
Bounds	(439, 83), (615, 83), (615, 306), (439, 306)
Category	FASHION_GOOD
Classification confidence	0.9375
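When several objects come back for one image, an app may want to order them or compare their sizes. The following sketch, using the four results above, computes the width, height, and area of each bounding box and ranks the objects by classification confidence. The `box_size` and `rank_by_confidence` helpers and the dictionary layout are hypothetical; ranking by confidence is just one possible policy, not something the API prescribes.

```python
def box_size(corners):
    """Return (width, height, area) for a box given as four (x, y) corners."""
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    return width, height, width * height


def rank_by_confidence(objects):
    """Return object IDs ordered from highest to lowest confidence."""
    ordered = sorted(objects, key=lambda o: o["confidence"], reverse=True)
    return [o["id"] for o in ordered]


# Bounds and confidences copied from the example results above.
objects = [
    {"id": 0, "corners": [(1, 97), (332, 97), (332, 332), (1, 332)],
     "confidence": 0.95703125},
    {"id": 1, "corners": [(186, 80), (337, 80), (337, 226), (186, 226)],
     "confidence": 0.84375},
    {"id": 2, "corners": [(296, 80), (472, 80), (472, 388), (296, 388)],
     "confidence": 0.94921875},
    {"id": 3, "corners": [(439, 83), (615, 83), (615, 306), (439, 306)],
     "confidence": 0.9375},
]

areas = {o["id"]: box_size(o["corners"])[2] for o in objects}
ranking = rank_by_confidence(objects)
print(areas)    # {0: 77785, 1: 22046, 2: 54208, 3: 39248}
print(ranking)  # [0, 2, 3, 1]
```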

Using a custom LiteRT model

The default coarse classifier is built for five categories, providing limited information about the detected objects. You might need a more specialized classifier model that covers a narrower domain of concepts in more detail; for example, a model to distinguish between species of flowers or types of food.

This API lets you tailor object classification to a particular use case by supporting custom image classification models from a wide range of sources. Custom models can be bundled with your app or dynamically downloaded from Cloud Storage. Refer to Custom models with ML Kit to learn more.
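The page summary notes that input images are preprocessed with bilinear scaling and stretching so that they fit the model's requirements. The following is a minimal pure-Python sketch of what that means for a tiny grayscale image: the image is resized to the target size without preserving its aspect ratio, and each output pixel is interpolated from its nearest source pixels. The function name `resize_bilinear`, the corner-aligned coordinate mapping, and the list-of-rows image format are assumptions made for illustration; ML Kit performs this step internally, and its exact implementation may differ.

```python
def resize_bilinear(image, out_h, out_w):
    """Stretch a 2D grayscale image (a list of rows) to out_h x out_w
    using bilinear interpolation. Aspect ratio is not preserved."""
    in_h, in_w = len(image), len(image[0])
    out = []
    for y in range(out_h):
        # Map the output row to a fractional source row (corner-aligned).
        src_y = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(src_y)
        y1 = min(y0 + 1, in_h - 1)
        fy = src_y - y0
        row = []
        for x in range(out_w):
            # Map the output column to a fractional source column.
            src_x = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(src_x)
            x1 = min(x0 + 1, in_w - 1)
            fx = src_x - x0
            # Blend horizontally on the two neighboring rows, then vertically.
            top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
            bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


# A 2x2 image stretched to 3x5: the width grows more than the height,
# so the aspect ratio changes (this is the "stretching" part).
source = [[0.0, 10.0],
          [20.0, 30.0]]
stretched = resize_bilinear(source, 3, 5)
print(stretched[0])  # [0.0, 2.5, 5.0, 7.5, 10.0]
```

A custom model with a fixed input size would receive an image adjusted in roughly this way, so strongly non-square inputs may appear distorted to the model.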