Face detection concepts

Face detection locates human faces in visual media such as digital images or video. A detected face has an associated position, size, and orientation, and it can be searched for landmarks such as the eyes and nose.

Here are some of the terms that we use regarding the face detection feature of ML Kit:

  • Face tracking extends face detection to video sequences. Any face that appears in a video for any length of time can be tracked from frame to frame. This means a face detected in consecutive video frames can be identified as being the same person. Note that this isn't a form of face recognition; face tracking only makes inferences based on the position and motion of the faces in a video sequence.

  • A landmark is a point of interest within a face. The left eye, right eye, and base of the nose are all examples of landmarks. ML Kit provides the ability to find landmarks on a detected face.

  • A contour is a set of points that follow the shape of a facial feature. ML Kit provides the ability to find the contours of a face.

  • Classification determines whether a certain facial characteristic is present. For example, a face can be classified by whether its eyes are open or closed, or by whether it is smiling.

Face orientation

The following terms describe the angle at which a face is oriented with respect to the camera:

  • Euler X: A face with a positive Euler X angle is facing upward.
  • Euler Y: A face with a positive Euler Y angle is looking to the right of the camera, or looking to the left if negative.
  • Euler Z: A face with a positive Euler Z angle is rotated counter-clockwise relative to the camera.

ML Kit doesn't report the Euler X, Euler Y, or Euler Z angle of a detected face when LANDMARK_MODE_NONE, CONTOUR_MODE_ALL, CLASSIFICATION_MODE_NONE, and PERFORMANCE_MODE_FAST are set together.
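As a rough illustration of the sign conventions above, the following plain-Python sketch turns three Euler angles into a readable description. It does not use the ML Kit API: the function name `describe_orientation` is hypothetical, `None` is an assumed stand-in for "angle not reported", and the meanings of negative Euler X and Euler Z are assumed to be the opposites of the positive cases.

```python
from typing import Optional


def describe_orientation(euler_x: Optional[float],
                         euler_y: Optional[float],
                         euler_z: Optional[float]) -> str:
    """Describe a face's orientation from its Euler angles, in degrees."""
    # Angles may be missing, for example under the option combination
    # described above; None stands in for "not reported" in this sketch.
    if euler_x is None or euler_y is None or euler_z is None:
        return "orientation not reported"

    parts = []
    if euler_x > 0:
        parts.append("facing upward")
    elif euler_x < 0:
        parts.append("facing downward")  # assumed opposite of positive X
    if euler_y > 0:
        parts.append("looking right of the camera")
    elif euler_y < 0:
        parts.append("looking left of the camera")
    if euler_z > 0:
        parts.append("rotated counter-clockwise")
    elif euler_z < 0:
        parts.append("rotated clockwise")  # assumed opposite of positive Z
    return ", ".join(parts) if parts else "facing the camera directly"
```

Checking for missing angles first keeps callers from mistaking an unreported orientation for a perfectly frontal face.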

Landmarks

A landmark is a point of interest within a face. The left eye, right eye, and nose base are all examples of landmarks.

ML Kit detects faces without looking for landmarks. Landmark detection is an optional step that is disabled by default.

The following table summarizes all of the landmarks that can be detected given the Euler Y angle of an associated face:

Euler Y angle                 Detectable landmarks
< -36 degrees                 left eye, left mouth, left ear, nose base, left cheek
-36 degrees to -12 degrees    left mouth, nose base, bottom mouth, right eye, left eye, left cheek, left ear tip
-12 degrees to 12 degrees     right eye, left eye, nose base, left cheek, right cheek, left mouth, right mouth, bottom mouth
12 degrees to 36 degrees      right mouth, nose base, bottom mouth, left eye, right eye, right cheek, right ear tip
> 36 degrees                  right eye, right mouth, right ear, nose base, right cheek

Each detected landmark includes its associated position in the image.
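The table above can be read as a lookup from Euler Y angle to a set of landmarks. The sketch below encodes it in plain Python, independent of the ML Kit API. The function name `detectable_landmarks` is hypothetical, and the handling of the exact boundary values is an assumption: the table does not say which range -12 and 12 degrees belong to, so this sketch places them in the frontal range.

```python
from typing import List


def detectable_landmarks(euler_y: float) -> List[str]:
    """Return the landmarks the table lists for a given Euler Y angle (degrees)."""
    if euler_y < -36:
        return ["left eye", "left mouth", "left ear", "nose base", "left cheek"]
    if euler_y < -12:  # -36 up to, but not including, -12 (boundary assumed)
        return ["left mouth", "nose base", "bottom mouth", "right eye",
                "left eye", "left cheek", "left ear tip"]
    if euler_y <= 12:  # -12 to 12 inclusive (boundary assumed)
        return ["right eye", "left eye", "nose base", "left cheek",
                "right cheek", "left mouth", "right mouth", "bottom mouth"]
    if euler_y <= 36:  # above 12, up to and including 36
        return ["right mouth", "nose base", "bottom mouth", "left eye",
                "right eye", "right cheek", "right ear tip"]
    return ["right eye", "right mouth", "right ear", "nose base", "right cheek"]
```

A lookup like this could help an app decide which landmarks are worth requesting or drawing for a face at a given angle.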

Contours

A contour is a set of points that represent the shape of a facial feature. The following image illustrates how these points map to a face:

Each feature contour that ML Kit detects is represented by a fixed number of points:

Feature contour           Points
Face oval                 36
Left eyebrow (top)        5
Left eyebrow (bottom)     5
Right eyebrow (top)       5
Right eyebrow (bottom)    5
Left eye                  16
Right eye                 16
Left cheek (center)       1
Right cheek (center)      1
Upper lip (top)           11
Upper lip (bottom)        9
Lower lip (top)           9
Lower lip (bottom)        9
Nose bridge               2
Nose bottom               3

When you get all of a face's contours at once, you get an array of 133 points, which map to feature contours as shown below:

Indexes     Feature contour
0-35        Face oval
36-40       Left eyebrow (top)
41-45       Left eyebrow (bottom)
46-50       Right eyebrow (top)
51-55       Right eyebrow (bottom)
56-71       Left eye
72-87       Right eye
88-96       Upper lip (bottom)
97-105      Lower lip (top)
106-116     Upper lip (top)
117-125     Lower lip (bottom)
126, 127    Nose bridge
128-130     Nose bottom (note that the center point is at index 128)
131         Left cheek (center)
132         Right cheek (center)
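The index ranges above are enough to split a full 133-point array back into named feature contours. The following plain-Python sketch shows one way to do that. It does not use the ML Kit API: the names `CONTOUR_RANGES` and `split_contours` are hypothetical, and points are assumed to be simple `(x, y)` tuples.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Inclusive (first, last) index ranges, copied from the table above.
CONTOUR_RANGES: Dict[str, Tuple[int, int]] = {
    "face_oval": (0, 35),
    "left_eyebrow_top": (36, 40),
    "left_eyebrow_bottom": (41, 45),
    "right_eyebrow_top": (46, 50),
    "right_eyebrow_bottom": (51, 55),
    "left_eye": (56, 71),
    "right_eye": (72, 87),
    "upper_lip_bottom": (88, 96),
    "lower_lip_top": (97, 105),
    "upper_lip_top": (106, 116),
    "lower_lip_bottom": (117, 125),
    "nose_bridge": (126, 127),
    "nose_bottom": (128, 130),
    "left_cheek": (131, 131),
    "right_cheek": (132, 132),
}


def split_contours(points: List[Point]) -> Dict[str, List[Point]]:
    """Split a 133-point contour array into named feature contours."""
    if len(points) != 133:
        raise ValueError(f"expected 133 points, got {len(points)}")
    # Python slices exclude the end index, so add 1 to the inclusive bound.
    return {name: points[first:last + 1]
            for name, (first, last) in CONTOUR_RANGES.items()}
```

Keeping the ranges in one table makes it easy to check them against the per-feature point counts, which add up to 133.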

Classification

Classification determines whether a certain facial characteristic is present. ML Kit currently supports two classifications: eyes open and smiling.

Each classification is reported as a certainty value that indicates the confidence that a facial characteristic is present. For example, a value of 0.7 or more for the smiling classification indicates that it's likely that a person is smiling.

Both of these classifications rely upon landmark detection.

Also note that the classifications "eyes open" and "smiling" only work for frontal faces, i.e., faces with a small Euler Y angle (between -18 and 18 degrees).
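To make the two conditions above concrete, here is a small plain-Python sketch that combines the 0.7 certainty example with the frontal-face restriction. It is not part of the ML Kit API: the function name `is_probably_smiling` and its parameters are hypothetical, `None` is an assumed stand-in for a missing value, and treating exactly -18 and 18 degrees as frontal is an assumption.

```python
from typing import Optional


def is_probably_smiling(smiling_probability: Optional[float],
                        euler_y: float,
                        threshold: float = 0.7,
                        max_abs_euler_y: float = 18.0) -> Optional[bool]:
    """Interpret a smiling certainty value for a face at a given Euler Y angle.

    Returns True or False for frontal faces, and None when the value is
    missing or the face is turned too far for the classification to apply.
    """
    if smiling_probability is None:
        return None  # no classification result available
    if abs(euler_y) > max_abs_euler_y:
        return None  # non-frontal face: classification is not reliable
    return smiling_probability >= threshold
```

Returning `None` rather than `False` for a turned face keeps "not smiling" distinct from "cannot tell". The threshold is left as a parameter because 0.7 is only the example value given above.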

Minimum face size

The minimum face size is the smallest face size to search for, expressed as the ratio of the width of the head to the width of the image. For example, a value of 0.1 means that the smallest face to search for is roughly 10% of the width of the image being searched.

The minimum face size is a performance vs. accuracy trade-off: setting the minimum size smaller lets the detector find smaller faces but detection will take longer; setting it larger might exclude smaller faces but will run faster.

The minimum face size is not a hard limit; the detector may find faces slightly smaller than specified.
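The ratio translates to pixels by simple multiplication. The sketch below works through that arithmetic in plain Python. The function name `min_face_width_px` is hypothetical, and restricting the ratio to values above 0 and at most 1 is an assumption of this sketch, not a documented constraint.

```python
def min_face_width_px(image_width_px: int, min_face_size: float) -> float:
    """Approximate width, in pixels, of the smallest face to search for."""
    if image_width_px <= 0:
        raise ValueError("image width must be positive")
    if not 0.0 < min_face_size <= 1.0:
        # Assumed valid range: a ratio of head width to image width.
        raise ValueError("min_face_size must be in (0, 1]")
    return image_width_px * min_face_size
```

For a 1280-pixel-wide image and a minimum face size of 0.1, this gives roughly 128 pixels. Because the setting is not a hard limit, the result is a guide and not a guarantee about which faces are found.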

Next steps

Use face detection in your iOS or Android app: