Use ARCore as input for Machine Learning models

You can use the camera feed that ARCore captures in a machine learning pipeline to create an intelligent augmented reality experience. The ARCore ML Kit sample demonstrates how to use ML Kit and the Google Cloud Vision API to identify real-world objects. The sample uses a machine learning model to classify objects in the camera's view and attaches a label to the object in the virtual scene.

The ARCore ML Kit sample is written in Kotlin. It is also available as the ml_kotlin sample app in the ARCore SDK GitHub repository.

Use ARCore's CPU image

By default, ARCore captures at least two image streams:

  • A CPU image stream used for feature recognition and image processing. By default, the CPU image has a resolution of VGA (640x480). ARCore can be configured to use an additional higher resolution image stream, if required.
  • A GPU texture stream, which contains a high-resolution texture, usually at a resolution of 1080p. This stream is typically used as a user-facing camera preview and is stored in the OpenGL texture specified by Session.setCameraTextureName().
  • Any additional streams specified by SharedCamera.setAppSurfaces().

CPU image size considerations

No additional cost is incurred when you use the default VGA-sized CPU stream, because ARCore already uses this stream for world comprehension. Requesting a stream with a different resolution can be expensive, because an additional stream must be captured. Keep in mind that a higher resolution may quickly become expensive for your model: doubling the width and height of the image quadruples the number of pixels in it.
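To make that cost concrete, here is the arithmetic for the default VGA stream versus a stream with both dimensions doubled:

```kotlin
// Doubling both dimensions of the default VGA (640x480) stream
// quadruples the pixel count your model has to process.
val vgaPixels = 640 * 480        // 307,200 pixels
val doubledPixels = 1280 * 960   // 1,228,800 pixels
println(doubledPixels / vgaPixels) // prints 4
```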

If your model can still perform well on a lower-resolution image, it may be advantageous to downscale the image before running inference.
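A minimal sketch of that downscaling decision: compute a target size that preserves the stream's aspect ratio. `targetWidth` here is a hypothetical input width your model expects, not an ARCore API parameter.

```kotlin
// Compute a downscaled size that preserves the source aspect ratio.
// `targetWidth` is a hypothetical input width your model expects.
fun downscaledSize(width: Int, height: Int, targetWidth: Int): Pair<Int, Int> {
  require(targetWidth in 1..width) { "targetWidth must not upscale the image" }
  val targetHeight = height * targetWidth / width
  return targetWidth to targetHeight
}
```

For example, downscaling the default 640x480 stream to a target width of 320 yields 320x240, a quarter of the original pixel count.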

Configure an additional high resolution CPU image stream

The performance of your ML model may depend on the resolution of the image used as input. The resolution of these streams can be adjusted by changing the current CameraConfig using Session.setCameraConfig(), selecting a valid configuration from Session.getSupportedCameraConfigs().

Java

CameraConfigFilter cameraConfigFilter =
    new CameraConfigFilter(session)
        // World-facing cameras only.
        .setFacingDirection(CameraConfig.FacingDirection.BACK);
List<CameraConfig> supportedCameraConfigs =
    session.getSupportedCameraConfigs(cameraConfigFilter);

// Select an acceptable configuration from supportedCameraConfigs.
CameraConfig cameraConfig = selectCameraConfig(supportedCameraConfigs);
session.setCameraConfig(cameraConfig);

Kotlin

val cameraConfigFilter =
  CameraConfigFilter(session)
    // World-facing cameras only.
    .setFacingDirection(CameraConfig.FacingDirection.BACK)
val supportedCameraConfigs = session.getSupportedCameraConfigs(cameraConfigFilter)

// Select an acceptable configuration from supportedCameraConfigs.
val cameraConfig = selectCameraConfig(supportedCameraConfigs)
session.setCameraConfig(cameraConfig)
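`selectCameraConfig` in the snippets above stands in for your own selection logic. One possible strategy, sketched here over plain (width, height) pairs so the comparison is easy to test, is to pick the candidate with the largest CPU image area; with ARCore you would apply the same comparison to each config's `CameraConfig.getImageSize()`.

```kotlin
// Illustrative sketch: choose the candidate with the largest image area.
// With ARCore, apply the same comparison to CameraConfig.getImageSize().
fun selectLargest(sizes: List<Pair<Int, Int>>): Pair<Int, Int> =
  sizes.maxByOrNull { (w, h) -> w * h }
    ?: error("No candidate sizes")
```

Other strategies are equally valid, such as picking the smallest resolution that still meets your model's input requirements.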

Retrieve the CPU image

Retrieve the CPU image using Frame.acquireCameraImage(). These images should be disposed of as soon as they're no longer needed.

Java

Image cameraImage = null;
try {
  cameraImage = frame.acquireCameraImage();
  // Process `cameraImage` using your ML inference model.
} catch (NotYetAvailableException e) {
  // NotYetAvailableException is an exception that can be expected when the camera is not ready
  // yet. The image may become available on a next frame.
} catch (RuntimeException e) {
  // A different exception occurred, e.g. DeadlineExceededException, ResourceExhaustedException.
  // Handle this error appropriately.
  handleAcquireCameraImageFailure(e);
} finally {
  if (cameraImage != null) {
    cameraImage.close();
  }
}

Kotlin

// NotYetAvailableException is an exception that can be expected when the camera is not ready yet.
// Map it to `null` instead, but continue to propagate other errors.
fun Frame.tryAcquireCameraImage() =
  try {
    acquireCameraImage()
  } catch (e: NotYetAvailableException) {
    null
  } catch (e: RuntimeException) {
    // A different exception occurred, e.g. DeadlineExceededException, ResourceExhaustedException.
    // Handle this error appropriately.
    handleAcquireCameraImageFailure(e)
  }

// The `use` block ensures the camera image is disposed of after use.
frame.tryAcquireCameraImage()?.use { image ->
  // Process `image` using your ML inference model.
}

Process the CPU image

You can use various machine learning libraries to process the CPU image. The ARCore ML Kit sample, for example, passes the image to ML Kit and the Google Cloud Vision API for object classification.
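Whatever library you choose, most models share a preprocessing step. As an illustrative sketch (not part of any particular library's API), here is the common normalization of unsigned 8-bit pixel values into the [0, 1] floats many models expect as input:

```kotlin
// Normalize unsigned 8-bit pixel values (stored in a signed ByteArray)
// into [0, 1] floats, a common model input format.
fun normalizePixels(pixels: ByteArray): FloatArray =
  FloatArray(pixels.size) { i -> (pixels[i].toInt() and 0xFF) / 255f }
```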

Display results in your AR scene

Image recognition models often output detected objects by indicating a center point or a bounding polygon representing the detected object.

Using the center point (or the center of the bounding polygon) output by the model, you can attach an anchor to the detected object. Use Frame.hitTest() to estimate the pose of the object in the virtual scene.

Convert IMAGE_PIXELS coordinates to VIEW coordinates:

Java

// Suppose `mlResult` contains an (x, y) of a given point on the CPU image.
float[] cpuCoordinates = new float[] {mlResult.getX(), mlResult.getY()};
float[] viewCoordinates = new float[2];
frame.transformCoordinates2d(
    Coordinates2d.IMAGE_PIXELS, cpuCoordinates, Coordinates2d.VIEW, viewCoordinates);
// `viewCoordinates` now contains coordinates suitable for hit testing.

Kotlin

// Suppose `mlResult` contains an (x, y) of a given point on the CPU image.
val cpuCoordinates = floatArrayOf(mlResult.x, mlResult.y)
val viewCoordinates = FloatArray(2)
frame.transformCoordinates2d(
  Coordinates2d.IMAGE_PIXELS,
  cpuCoordinates,
  Coordinates2d.VIEW,
  viewCoordinates
)
// `viewCoordinates` now contains coordinates suitable for hit testing.

Use these VIEW coordinates to conduct a hit test and create an anchor from the result:

Java

List<HitResult> hits = frame.hitTest(viewCoordinates[0], viewCoordinates[1]);
HitResult depthPointResult = null;
for (HitResult hit : hits) {
  if (hit.getTrackable() instanceof DepthPoint) {
    depthPointResult = hit;
    break;
  }
}
if (depthPointResult != null) {
  Anchor anchor = depthPointResult.getTrackable().createAnchor(depthPointResult.getHitPose());
  // This anchor will be attached to the scene with stable tracking.
  // It can be used as a position for a virtual object, with a rotation perpendicular to the
  // estimated surface normal.
}

Kotlin

val hits = frame.hitTest(viewCoordinates[0], viewCoordinates[1])
val depthPointResult = hits.firstOrNull { it.trackable is DepthPoint }
if (depthPointResult != null) {
  val anchor = depthPointResult.trackable.createAnchor(depthPointResult.hitPose)
  // This anchor will be attached to the scene with stable tracking.
  // It can be used as a position for a virtual object, with a rotation perpendicular to the
  // estimated surface normal.
}

Performance considerations

Follow these recommendations to save processing power and consume less energy:

  • Do not run your ML model on every incoming frame. Consider running object detection at a low framerate instead.
  • Consider offloading inference to an online (cloud-hosted) ML model to reduce on-device computational complexity.
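The first recommendation can be sketched as a simple throttle. `intervalMs` here is a hypothetical budget you would tune for your model; feed `shouldRun` a monotonic clock reading from your frame loop.

```kotlin
// Run inference at most once per `intervalMs`; skip the frames in between.
class InferenceThrottle(private val intervalMs: Long) {
  private var lastRunMs: Long? = null

  fun shouldRun(nowMs: Long): Boolean {
    val last = lastRunMs
    if (last != null && nowMs - last < intervalMs) return false
    lastRunMs = nowMs
    return true
  }
}
```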

Next steps