The Mobile Vision API is now a part of ML Kit. We strongly encourage you to try it out, as it comes with new capabilities like on-device image labeling! Also, note that we ultimately plan to wind down the Mobile Vision API, with all new on-device ML capabilities released via ML Kit. Feel free to reach out to Firebase support for help.

Add Face Tracking with GoogleMVDataOutput to Your App

This page walks through how to use the Face API with GoogleMVDataOutput and an AVFoundation pipeline to detect eye positions within faces in a camera feed. The image below illustrates the result. We'll show you how to track several faces simultaneously and draw cartoon eyes for each face.

face demo

If you want to follow along with the code, or just want to build and try out the app, build the sample GooglyEyesDemo application by following the instructions on the Getting Started page.

This tutorial will show you how to:

  1. Specify custom camera settings.
  2. Track multiple faces.
  3. Make performance/feature trade-offs.

Creating the Face Detector Pipeline

GoogleMVDataOutput provides several subclasses of AVCaptureVideoDataOutput that let you integrate face tracking with your AVFoundation video pipeline.

Import the GoogleMobileVision framework to use the detector API and the GoogleMVDataOutput framework to use the video tracking pipeline.

@import GoogleMobileVision;
@import GoogleMVDataOutput;

The code for setting up and executing face tracking is in ViewController.m, which is the main view controller for this app. Typically, the video pipeline and face detector are specified in the viewDidLoad method as shown here:

- (void)viewDidLoad {
  [super viewDidLoad];

  // Setup default camera settings.
  self.session = [[AVCaptureSession alloc] init];
  self.session.sessionPreset = AVCaptureSessionPresetMedium;
  [self setCameraSelection];

Instantiate an AVCaptureSession to coordinate the data flow from the input to the output.

  // Setup the face detector.
  NSDictionary *options = @{
      GMVDetectorFaceTrackingEnabled : @(YES),
      GMVDetectorFaceMode : @(GMVDetectorFaceFastMode),
      GMVDetectorFaceLandmarkType : @(GMVDetectorFaceLandmarkAll),
      GMVDetectorFaceMinSize : @(0.15)
  };
  GMVDetector *faceDetector = [GMVDetector detectorOfType:GMVDetectorTypeFace
                                                  options:options];

Create an associated processor pipeline to receive detection results.

  // Setup the GMVDataOutput with the session.
  self.dataOutput = [[GMVMultiDataOutput alloc] initWithDetector:faceDetector];
  ((GMVMultiDataOutput *)self.dataOutput).multiDataDelegate = self;
  [self.session addOutput:self.dataOutput];

Instantiate an AVCaptureVideoPreviewLayer with the session to display the camera feed. In this example code, an overlay UIView sits on top of the main view so the eyes can be replaced with cartoons.

  // Setup camera preview.
  self.previewLayer = [[AVCaptureVideoPreviewLayer alloc] initWithSession:self.session];
  [[self.view layer] addSublayer:self.previewLayer];
}
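
The -setCameraSelection helper called in viewDidLoad is not shown in this walkthrough. A minimal sketch of what it might do, assuming it simply attaches the front-facing camera as the session's input (the method name comes from the sample; the body below is an illustration, not the sample's actual code):

```objc
// Sketch: attach the front-facing camera to the capture session.
// Illustrative implementation only, not the sample's actual code.
- (void)setCameraSelection {
  [self.session beginConfiguration];

  // Remove any existing inputs before adding a new one.
  for (AVCaptureInput *oldInput in self.session.inputs) {
    [self.session removeInput:oldInput];
  }

  // Face-tracking demos typically use the front-facing camera.
  AVCaptureDevice *camera = nil;
  for (AVCaptureDevice *device in
       [AVCaptureDevice devicesWithMediaType:AVMediaTypeVideo]) {
    if (device.position == AVCaptureDevicePositionFront) {
      camera = device;
      break;
    }
  }

  NSError *error = nil;
  AVCaptureDeviceInput *input =
      [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];
  if (input != nil && [self.session canAddInput:input]) {
    [self.session addInput:input];
  }

  [self.session commitConfiguration];
}
```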

The resulting pipeline looks like this:

Once started, the session continuously sends preview images through the pipeline. The DataOutput class limits the incoming frame rate to the rate at which the detector can process frames, dropping frames if necessary.
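
The walkthrough sets up the pipeline but does not show starting it. One reasonable place to start and stop the session is in the view lifecycle methods; the placement below is an assumption, and the sample app may do this differently:

```objc
- (void)viewDidAppear:(BOOL)animated {
  [super viewDidAppear:animated];
  // Size the preview layer to fill the view, then start the capture session.
  self.previewLayer.frame = self.view.layer.bounds;
  [self.session startRunning];
}

- (void)viewDidDisappear:(BOOL)animated {
  [super viewDidDisappear:animated];
  // Stop the session to release the camera when the view goes away.
  [self.session stopRunning];
}
```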

Detector Settings

The Detector component runs face detection and tracking on the series of images it receives. In our example, we created the detector with the following properties:

  • GMVDetectorFaceMode = GMVDetectorFaceFastMode: This indicates that the face detector can use optimizations that favor speed over accuracy. For example, it may skip faces that aren’t facing the camera.

  • GMVDetectorFaceLandmarkType = GMVDetectorFaceLandmarkAll: Returns all landmarks for each face. Each face is processed separately.

  • GMVDetectorFaceClassificationType = GMVDetectorFaceClassificationNone: This is the default, so it is omitted from the options dictionary above. The demo does not use smile or eyes-open classification, and turning classifications off speeds up detection.

  • GMVDetectorFaceTrackingEnabled = YES: In this app, tracking is used to maintain a consistent ID for each face. As the face moves, this identity is generally maintained. However, there are a couple of reasons why the ID may change:

    • Sometimes a detection is near the limits of what can be detected for the given settings (face is too small, too close to the edge). In this situation, the ID may change from frame to frame. In the sample application, this will appear as flickering and color changes on face markers.

    • The face becomes obstructed and/or disappears and re-enters the view. Tracking works on a continuous basis, so any period of time in which the detector is not seeing the face will reset the tracking information.
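
If your app favors accuracy over frame rate, the same options dictionary can be tuned the other way. A sketch, assuming GoogleMobileVision's accurate-mode and classification constants (GMVDetectorFaceAccurateMode, GMVDetectorFaceClassificationAll):

```objc
// Accuracy-oriented settings: slower, but finds more faces and adds
// smile and eyes-open classification.
NSDictionary *accurateOptions = @{
    GMVDetectorFaceTrackingEnabled : @(YES),
    GMVDetectorFaceMode : @(GMVDetectorFaceAccurateMode),
    GMVDetectorFaceLandmarkType : @(GMVDetectorFaceLandmarkAll),
    GMVDetectorFaceClassificationType : @(GMVDetectorFaceClassificationAll)
};
GMVDetector *accurateDetector =
    [GMVDetector detectorOfType:GMVDetectorTypeFace options:accurateOptions];
```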

GMVMultiDataOutput

The GMVMultiDataOutput is a component for working with an arbitrary number of detected items (in this case, faces). Its use is shown in the right portion of the earlier diagram.

The face detector may detect multiple faces in each frame. Because tracking is enabled (see the setting above), each face keeps a distinct identity from frame to frame. The GMVMultiDataOutput will call its delegate to create a FaceTracker instance for every face detection that it sees.

#pragma mark - GMVMultiDataOutputDelegate
- (id<GMVOutputTrackerDelegate>)dataOutput:(GMVDataOutput *)dataOutput
                         trackerForFeature:(GMVFeature *)feature {
  FaceTracker *tracker = [[FaceTracker alloc] init];
  tracker.delegate = self;
  return tracker;
}

As new faces are encountered, the GMVMultiDataOutput will call its delegate to create a FaceTracker instance that conforms to GMVOutputTrackerDelegate for each face. As those faces move over time, updates are routed to the appropriate face tracker instances. When a face is no longer visible, the GMVMultiDataOutput will dispose of its associated face tracker instance. In this way, we dynamically create/track/destroy an individual face tracker for each face that we encounter in the app.

Below is the implementation of FaceTracker, which holds the state associated with an individual face:

@implementation FaceTracker

#pragma mark - GMVOutputTrackerDelegate

- (void)dataOutput:(GMVDataOutput *)dataOutput detectedFeature:(GMVFeature *)feature {
  self.leftEyeView = [[GooglyEyeView alloc] init];
  self.rightEyeView = [[GooglyEyeView alloc] init];
  [[self.delegate overlayView] addSubview:self.leftEyeView];
  [[self.delegate overlayView] addSubview:self.rightEyeView];
}

- (void)dataOutput:(GMVDataOutput *)dataOutput
  updateFocusingFeature:(GMVFaceFeature *)face
           forResultSet:(NSArray<GMVFaceFeature *> *)features {
  self.leftEyeView.hidden = NO;
  self.rightEyeView.hidden = NO;

  // Update the eye rects.
  [self.leftEyeView updateEyeRect:[self eyeRect:face.leftEyePosition]];
  [self.rightEyeView updateEyeRect:[self eyeRect:face.rightEyePosition]];
}

- (void)dataOutput:(GMVDataOutput *)dataOutput
  updateMissingFeatures:(NSArray<GMVFaceFeature *> *)features {
  self.leftEyeView.hidden = YES;
  self.rightEyeView.hidden = YES;
}

- (void)dataOutputCompletedWithFocusingFeature:(GMVDataOutput *)dataOutput {
  [self.leftEyeView removeFromSuperview];
  [self.rightEyeView removeFromSuperview];
}

@end

Each FaceTracker instance maintains associated GooglyEyeView instances, which are graphics objects created initially when the face is first encountered, updated as the face changes, hidden when the face is temporarily missing, and removed when the face is no longer visible.
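
The -eyeRect: helper used above to convert an eye landmark into a frame for a GooglyEyeView is not shown in this walkthrough. A minimal sketch, assuming a fixed-size square centered on the landmark (the real sample also scales the rect with the detected face size and converts from image coordinates to view coordinates using the data output's transform):

```objc
// Hypothetical helper: a fixed-size square centered on an eye landmark.
// The hard-coded size is an assumption for illustration.
- (CGRect)eyeRect:(CGPoint)eyePosition {
  CGFloat eyeSize = 40;  // Width and height of the cartoon eye, in points.
  return CGRectMake(eyePosition.x - eyeSize / 2,
                    eyePosition.y - eyeSize / 2,
                    eyeSize, eyeSize);
}
```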