Pose Detection

The ML Kit Pose Detection API is a lightweight, versatile solution that lets app developers detect the pose of a subject's body in real time from a continuous video stream or a static image. A pose describes the body's position at one moment in time with a set of x,y skeletal landmark points. The landmarks correspond to different body parts such as the shoulders and hips. The relative positions of landmarks can be used to distinguish one pose from another.


ML Kit Pose Detection produces a full-body, 33-point skeletal landmark match that includes facial landmarks (ears, eyes, mouth, and nose) and points on the hands and feet. Figure 1 below shows the landmarks as seen through the camera looking at the user, so it is a mirror image: the user's right side appears on the left of the image.

Figure 1. Landmarks

ML Kit Pose Detection doesn't require specialized equipment or ML expertise to achieve great results. With this technology, developers can create one-of-a-kind experiences for their users with only a few lines of code.
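
For example, on Android a basic detection call can be set up in a few lines. The following Kotlin sketch is illustrative, assuming the `com.google.mlkit:pose-detection` dependency and an existing `Bitmap`; the function and variable names are placeholders:

```kotlin
import android.graphics.Bitmap
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.pose.PoseDetection
import com.google.mlkit.vision.pose.defaults.PoseDetectorOptions

// Illustrative helper: detect the pose in a single still image.
fun detectPose(bitmap: Bitmap) {
    // Configure the fast (base) detector for one-off images;
    // use STREAM_MODE instead when processing live video frames.
    val options = PoseDetectorOptions.Builder()
        .setDetectorMode(PoseDetectorOptions.SINGLE_IMAGE_MODE)
        .build()
    val detector = PoseDetection.getClient(options)

    // Wrap the bitmap; the second argument is the image rotation in degrees.
    val image = InputImage.fromBitmap(bitmap, 0)

    detector.process(image)
        .addOnSuccessListener { pose ->
            // One Pose with up to 33 landmarks; the list is empty
            // if no person was detected.
            println("Detected ${pose.allPoseLandmarks.size} landmarks")
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}
```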

The subject's face must be visible in order to detect a pose. Pose detection works best when the subject's entire body is visible in the frame, but it also detects a partial body pose. In that case, the landmarks that are not recognized are assigned coordinates outside of the image.

Key capabilities

  • Cross-platform support: Enjoy the same experience on both Android and iOS.
  • Full-body tracking: The model returns 33 key skeletal points, including the positions of the hands and feet.
  • Per-point InFrameLikelihood: A per-point confidence that a landmark is within the frame of the image. A low likelihood suggests that a landmark is outside the image frame, even if its coordinates fall within the image bounds.
  • Two operating models: The fast model runs in real time on modern phones like the Pixel 4 and iPhone XS, returning results at roughly 30 fps and 45+ fps respectively, though the precision of the x,y coordinates may vary. The accurate model returns results at a slower frame rate but produces more accurate x,y values. Both models are bundled together and take up about 27 MB; a smaller model will be available in a forthcoming release. (See the configuration sketch after this list.)

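On Android, choosing between the two models happens when the detector is constructed. A minimal sketch, assuming the `pose-detection` (fast) and `pose-detection-accurate` (accurate) ML Kit artifacts:

```kotlin
import com.google.mlkit.vision.pose.PoseDetection
import com.google.mlkit.vision.pose.accurate.AccuratePoseDetectorOptions
import com.google.mlkit.vision.pose.defaults.PoseDetectorOptions

// Fast model: lower latency, suited to live video (STREAM_MODE).
val fastOptions = PoseDetectorOptions.Builder()
    .setDetectorMode(PoseDetectorOptions.STREAM_MODE)
    .build()
val fastDetector = PoseDetection.getClient(fastOptions)

// Accurate model: slower, but more precise x,y values
// (SINGLE_IMAGE_MODE here, for still images).
val accurateOptions = AccuratePoseDetectorOptions.Builder()
    .setDetectorMode(AccuratePoseDetectorOptions.SINGLE_IMAGE_MODE)
    .build()
val accurateDetector = PoseDetection.getClient(accurateOptions)
```
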
The Pose Detection API is similar to the Face Detection API in that it returns a set of landmarks and their locations. However, while Face Detection also tries to recognize features such as a smiling mouth or open eyes, Pose Detection does not attach any meaning to the landmarks in a pose, or to the pose itself. You can create your own algorithms to interpret a pose. See Recognizing poses for some examples.
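
As one illustration of interpreting a pose yourself, the angle at a joint can be derived from three landmark positions. The Kotlin sketch below is hypothetical: `jointAngle`, `isLeftArmStraight`, and the 160° threshold are illustrative choices, not part of the API:

```kotlin
import com.google.mlkit.vision.pose.Pose
import com.google.mlkit.vision.pose.PoseLandmark
import kotlin.math.abs
import kotlin.math.atan2

// Returns the inner angle (0..180 degrees) at `mid`, formed by first-mid-last.
fun jointAngle(first: PoseLandmark, mid: PoseLandmark, last: PoseLandmark): Double {
    var angle = Math.toDegrees(
        (atan2(last.position.y - mid.position.y, last.position.x - mid.position.x) -
         atan2(first.position.y - mid.position.y, first.position.x - mid.position.x)).toDouble()
    )
    angle = abs(angle)
    if (angle > 180) angle = 360 - angle
    return angle
}

// Illustrative use: treat a nearly straight left arm as one feature of a pose.
fun isLeftArmStraight(pose: Pose): Boolean {
    val shoulder = pose.getPoseLandmark(PoseLandmark.LEFT_SHOULDER) ?: return false
    val elbow = pose.getPoseLandmark(PoseLandmark.LEFT_ELBOW) ?: return false
    val wrist = pose.getPoseLandmark(PoseLandmark.LEFT_WRIST) ?: return false
    return jointAngle(shoulder, elbow, wrist) > 160.0 // illustrative threshold
}
```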

Pose detection can only detect one person in an image. If two people are in the image, the model will assign landmarks to the person detected with the highest confidence.

Sample results

The following table shows the coordinates and InFrameLikelihood values for the first few landmarks of a sample detected pose.

Landmark   Type              Position                  InFrameLikelihood
0          NOSE              (506.89526, 232.98584)    0.9999994
1          LEFT_EYE_INNER    (524.4051, 212.25941)     0.9999995
2          LEFT_EYE          (535.0669, 212.34688)     0.99999946
3          LEFT_EYE_OUTER    (545.6083, 212.80696)     0.9999991
4          RIGHT_EYE_INNER   (495.92175, 210.01476)    0.9999995
5          RIGHT_EYE         (485.4411, 209.35492)     0.99999946
6          RIGHT_EYE_OUTER   (475.10806, 208.29173)    0.9999988
7          LEFT_EAR          (560.1211, 223.12553)     0.9999993
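
On Android, these values correspond to the `PoseLandmark` accessors `getLandmarkType()`, `getPosition()`, and `getInFrameLikelihood()`. A small illustrative Kotlin sketch that prints one row per landmark, in the same order as the table above:

```kotlin
import com.google.mlkit.vision.pose.Pose

// Prints one row per detected landmark: type constant, position, in-frame likelihood.
fun dumpLandmarks(pose: Pose) {
    for (landmark in pose.allPoseLandmarks) {
        val p = landmark.position
        println("${landmark.landmarkType} (${p.x}, ${p.y}) ${landmark.inFrameLikelihood}")
    }
}
```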

Under the hood

For more implementation details on the underlying ML models for this API, check out our Google AI blog post.

For more information on how the models were trained and our ML fairness practices, check out our model cards: