tasks-vision package

Classes

Class Description
DrawingUtils Helper class to visualize the result of a MediaPipe Vision task.
FaceDetector Performs face detection on images.
FaceLandmarker Performs face landmarks detection on images. This API expects a pre-trained face landmarker model asset bundle.
FaceStylizer Performs face stylization on images.
FilesetResolver Resolves the files required for the MediaPipe Task APIs. This class verifies whether SIMD is supported in the current environment and loads the SIMD files only if support is detected. The returned filesets require that the Wasm files are published without renaming. If this is not possible, you can invoke the MediaPipe Tasks APIs using a manually created WasmFileset.
GestureRecognizer Performs hand gesture recognition on images.
HandLandmarker Performs hand landmarks detection on images.
HolisticLandmarker Performs holistic landmarks detection on images.
ImageClassifier Performs classification on images.
ImageEmbedder Performs embedding extraction on images.
ImageSegmenter Performs image segmentation on images.
ImageSegmenterResult The output result of ImageSegmenter.
InteractiveSegmenter Performs interactive segmentation on images. Users can represent user interaction through RegionOfInterest, which hints InteractiveSegmenter to focus segmentation on the given region of interest. The API expects a TFLite model with mandatory TFLite Model Metadata. Input tensor (kTfLiteUInt8/kTfLiteFloat32): image input of size [batch x height x width x channels]; batch inference is not supported (batch is required to be 1); only RGB inputs are supported (channels is required to be 3); if type is kTfLiteFloat32, NormalizationOptions are required to be attached to the metadata for input normalization. Output tensors (kTfLiteUInt8/kTfLiteFloat32): list of segmented masks; if output_type is CATEGORY_MASK, a uint8 Image, as an Image vector of size 1; if output_type is CONFIDENCE_MASK, a float32 Image list of size channels; batch is always 1.
InteractiveSegmenterResult The output result of InteractiveSegmenter.
MPImage The wrapper class for MediaPipe Image objects. Images are stored as ImageData, ImageBitmap or WebGLTexture objects. You can convert the underlying type to any other type by passing the desired type to getAs...(). As type conversions can be expensive, it is recommended to limit these conversions. You can verify what underlying types are already available by invoking has...(). Images that are returned from a MediaPipe Task are owned by the underlying C++ Task. If you need to extend the lifetime of these objects, you can invoke the clone() method. To free up the resources obtained during any clone or type conversion operation, it is important to invoke close() on the MPImage instance. Converting to and from ImageBitmap requires that the MediaPipe task is initialized with an OffscreenCanvas. As we require WebGL2 support, this places some limitations on browser support as outlined here: https://developer.mozilla.org/en-US/docs/Web/API/OffscreenCanvas/getContext
MPMask The wrapper class for MediaPipe segmentation masks. Masks are stored as Uint8Array, Float32Array or WebGLTexture objects. You can convert the underlying type to any other type by passing the desired type to getAs...(). As type conversions can be expensive, it is recommended to limit these conversions. You can verify what underlying types are already available by invoking has...(). Masks that are returned from a MediaPipe Task are owned by the underlying C++ Task. If you need to extend the lifetime of these objects, you can invoke the clone() method. To free up the resources obtained during any clone or type conversion operation, it is important to invoke close() on the MPMask instance.
ObjectDetector Performs object detection on images.
PoseLandmarker Performs pose landmarks detection on images.
PoseLandmarkerResult Represents the pose landmarks detection results generated by PoseLandmarker. Each vector element represents a single pose detected in the image.

Interfaces

Interface Description
BoundingBox An integer bounding box, axis aligned.
Category A classification category.
Classifications Classification results for a given classifier head.
Detection Represents one detection by a detection task.
DetectionResult Detection results of a model.
DrawingOptions Options for customizing the drawing routines
Embedding List of embeddings with an optional timestamp. One and only one of the two 'floatEmbedding' and 'quantizedEmbedding' will contain data, based on whether or not the embedder was configured to perform scalar quantization.
FaceDetectorOptions Options to configure the MediaPipe Face Detector Task
FaceLandmarkerOptions Options to configure the MediaPipe FaceLandmarker Task
FaceLandmarkerResult Represents the face landmarks detection results generated by FaceLandmarker.
FaceStylizerOptions Options to configure the MediaPipe Face Stylizer Task
GestureRecognizerOptions Options to configure the MediaPipe Gesture Recognizer Task
GestureRecognizerResult Represents the gesture recognition results generated by GestureRecognizer.
HandLandmarkerOptions Options to configure the MediaPipe HandLandmarker Task
HandLandmarkerResult Represents the hand landmarks detection results generated by HandLandmarker.
HolisticLandmarkerOptions Options to configure the MediaPipe HolisticLandmarker Task
HolisticLandmarkerResult Represents the holistic landmarks detection results generated by HolisticLandmarker.
ImageClassifierOptions Options to configure the MediaPipe Image Classifier Task.
ImageClassifierResult Classification results of a model.
ImageEmbedderOptions Options for configuring a MediaPipe Image Embedder task.
ImageEmbedderResult Embedding results for a given embedder model.
ImageSegmenterOptions Options to configure the MediaPipe Image Segmenter Task
InteractiveSegmenterOptions Options to configure the MediaPipe Interactive Segmenter Task
Landmark Landmark represents a point in 3D space with x, y, z coordinates. The landmark coordinates are in meters. z represents the landmark depth, and the smaller the value the closer the world landmark is to the camera.
LandmarkData Data that a user can use to specialize drawing options.
NormalizedLandmark Normalized Landmark represents a point in 3D space with x, y, z coordinates. x and y are normalized to [0.0, 1.0] by the image width and height respectively. z represents the landmark depth, and the smaller the value the closer the landmark is to the camera. The magnitude of z uses roughly the same scale as x.
ObjectDetectorOptions Options to configure the MediaPipe Object Detector Task
PoseLandmarkerOptions Options to configure the MediaPipe PoseLandmarker Task
RegionOfInterest A Region-Of-Interest (ROI) to represent a region within an image.
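
The NormalizedLandmark row above says that x and y are normalized to [0.0, 1.0] by the image width and height. A minimal sketch of mapping such a landmark back to pixel coordinates follows. The `NormalizedLandmarkLike` interface and the `toPixelCoordinates` helper are hypothetical names introduced for illustration; only the x, y, z fields come from the description.

```typescript
// Stand-in with only the fields named in the NormalizedLandmark description.
interface NormalizedLandmarkLike {
  x: number;
  y: number;
  z: number;
}

// Hypothetical helper: scales normalized x and y by the image dimensions.
// z is left out because the description only relates its scale to x loosely.
function toPixelCoordinates(
  landmark: NormalizedLandmarkLike,
  imageWidth: number,
  imageHeight: number
): { x: number; y: number } {
  return {
    x: landmark.x * imageWidth,
    y: landmark.y * imageHeight,
  };
}

// A landmark at the horizontal center, a quarter of the way down a 640x480 image.
const pixel = toPixelCoordinates({ x: 0.5, y: 0.25, z: 0 }, 640, 480);
```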

Variables

Variable Description
DEFAULT_CATEGORY_TO_COLOR_MAP A color map with 22 classes. Used in our demos.

Type Aliases

Type Alias Description
Callback A user-defined callback to take input data and map it to a custom output value.
CategoryToColorMap A category to color mapping that uses either a map or an array to assign category indexes to RGBA colors.
FaceStylizerCallback A callback that receives an MPImage object from the face stylizer, or null if no face was detected. The lifetime of the underlying data is limited to the duration of the callback. If asynchronous processing is needed, all data needs to be copied before the callback returns (via image.clone()).
HolisticLandmarkerCallback A callback that receives the result from the holistic landmarker detection. The returned results are only valid for the duration of the callback. If asynchronous processing is needed, the masks need to be copied before the callback returns.
ImageSegmenterCallback A callback that receives the computed masks from the image segmenter. The returned data is only valid for the duration of the callback. If asynchronous processing is needed, all data needs to be copied before the callback returns.
ImageSource Valid types of image sources which we can run our GraphRunner over.
InteractiveSegmenterCallback A callback that receives the computed masks from the interactive segmenter. The returned data is only valid for the duration of the callback. If asynchronous processing is needed, all data needs to be copied before the callback returns.
PoseLandmarkerCallback A callback that receives the result from the pose detector. The returned masks are only valid for the duration of the callback. If asynchronous processing is needed, the masks need to be copied before the callback returns.
RGBAColor A four channel color with values for red, green, blue and alpha respectively.

DEFAULT_CATEGORY_TO_COLOR_MAP

A color map with 22 classes. Used in our demos.

Signature:

DEFAULT_CATEGORY_TO_COLOR_MAP: number[][]
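
A color map of this `number[][]` shape can be used to turn a category mask into RGBA pixels. The sketch below assumes that each mask entry is a category index stored in a Uint8Array and that each color is listed as [r, g, b, a] with channels in 0..255. The two-entry `samplePalette` is a made-up stand-in, not the real 22-class map, and `colorizeCategoryMask` is a hypothetical helper.

```typescript
// Hypothetical helper: looks up each category index in the color map and
// writes its RGBA channels into a flat pixel buffer.
function colorizeCategoryMask(
  mask: Uint8Array,
  colorMap: number[][]
): Uint8ClampedArray {
  const rgba = new Uint8ClampedArray(mask.length * 4);
  mask.forEach((category, pixelIndex) => {
    const color = colorMap[category];
    // Categories without a color map entry stay transparent black.
    if (color === undefined) return;
    rgba.set(color.slice(0, 4), pixelIndex * 4);
  });
  return rgba;
}

// Made-up palette for illustration: category 0 is transparent, category 1 is red.
const samplePalette: number[][] = [
  [0, 0, 0, 0],
  [255, 0, 0, 255],
];

const coloredPixels = colorizeCategoryMask(
  new Uint8Array([0, 1, 1, 7]),
  samplePalette
);
```

In a browser, a buffer like this could be wrapped in an ImageData object and drawn to a canvas.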

Callback

A user-defined callback to take input data and map it to a custom output value.

Signature:

export declare type Callback<I, O> = (input: I) => O;
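
As an illustration of how a `Callback<I, O>` might be consumed, the sketch below resolves an option that is either a fixed value or a callback computing the value from some input. The alias is re-declared locally so the snippet stands alone. `ExampleInput` and `resolveOption` are hypothetical names, not part of the package.

```typescript
// Local copy of the alias from the signature above.
type Callback<I, O> = (input: I) => O;

// Hypothetical input shape, for illustration only.
interface ExampleInput {
  index: number;
}

// Hypothetical helper: returns a fixed value as-is, or calls the callback
// with the given input to compute the value.
function resolveOption<I, O>(option: O | Callback<I, O>, input: I): O {
  return typeof option === "function"
    ? (option as Callback<I, O>)(input)
    : (option as O);
}

// A callback that picks a larger radius for even indexes.
const radiusByIndex: Callback<ExampleInput, number> = (data) =>
  data.index % 2 === 0 ? 4 : 2;

const evenRadius = resolveOption<ExampleInput, number>(radiusByIndex, { index: 0 });
const oddRadius = resolveOption<ExampleInput, number>(radiusByIndex, { index: 3 });
const fixedRadius = resolveOption<ExampleInput, number>(6, { index: 3 });
```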

CategoryToColorMap

A category to color mapping that uses either a map or an array to assign category indexes to RGBA colors.

Signature:

export declare type CategoryToColorMap = Map<number, RGBAColor> | RGBAColor[];
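
Because the alias is a union of a Map and an array, code that reads from it has to handle both forms. A minimal sketch of such a lookup follows, with the two aliases re-declared locally so it compiles on its own. `colorForCategory` is a hypothetical helper name.

```typescript
// Local copies of the aliases documented on this page.
type RGBAColor = [number, number, number, number] | number[];
type CategoryToColorMap = Map<number, RGBAColor> | RGBAColor[];

// Hypothetical helper: returns the color for a category index, or undefined
// when the map has no entry for it.
function colorForCategory(
  colors: CategoryToColorMap,
  category: number
): RGBAColor | undefined {
  return colors instanceof Map ? colors.get(category) : colors[category];
}

// The same mapping expressed in both supported forms.
const asArray: CategoryToColorMap = [
  [0, 0, 0, 0],
  [255, 0, 0, 255],
];
const asMap: CategoryToColorMap = new Map<number, RGBAColor>([
  [1, [255, 0, 0, 255]],
]);

const fromArray = colorForCategory(asArray, 1);
const fromMap = colorForCategory(asMap, 1);
const missing = colorForCategory(asMap, 5);
```

The Map form suits sparse category indexes, while the array form is compact when categories are numbered contiguously from 0.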

FaceStylizerCallback

A callback that receives an MPImage object from the face stylizer, or null if no face was detected. The lifetime of the underlying data is limited to the duration of the callback. If asynchronous processing is needed, all data needs to be copied before the callback returns (via image.clone()).

Signature:

export declare type FaceStylizerCallback = (image: MPImage | null) => void;
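
The lifetime rule above means an image that should outlive the callback has to be cloned inside it, and the clone closed later. The sketch below shows one way to do that. It is written against a small stand-in interface exposing only clone() and close(), the two methods named in the MPImage description, rather than the real MPImage class. `ImageLike`, `StylizerCallbackLike` and `LatestImageHolder` are hypothetical names.

```typescript
// Minimal stand-in for the two MPImage methods this sketch relies on.
interface ImageLike {
  clone(): ImageLike;
  close(): void;
}

// Same shape as FaceStylizerCallback, written against the stand-in type.
type StylizerCallbackLike = (image: ImageLike | null) => void;

// Hypothetical helper that keeps the most recent stylized image alive
// after the callback has returned.
class LatestImageHolder {
  private latest: ImageLike | null = null;

  // Arrow property so it can be passed directly as the callback.
  readonly onImage: StylizerCallbackLike = (image) => {
    // Free the previously retained clone before replacing it.
    this.release();
    // Clone inside the callback; the original is only valid until it returns.
    this.latest = image ? image.clone() : null;
  };

  // The retained clone, or null if no face was detected last time.
  get current(): ImageLike | null {
    return this.latest;
  }

  // Frees the retained clone, if any.
  release(): void {
    this.latest?.close();
    this.latest = null;
  }
}
```

An instance's `onImage` property could then be handed to the stylizer wherever a FaceStylizerCallback is expected, with `release()` called once the retained image is no longer needed.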

HolisticLandmarkerCallback

A callback that receives the result from the holistic landmarker detection. The returned results are only valid for the duration of the callback. If asynchronous processing is needed, the masks need to be copied before the callback returns.

Signature:

export declare type HolisticLandmarkerCallback = (result: HolisticLandmarkerResult) => void;

ImageSegmenterCallback

A callback that receives the computed masks from the image segmenter. The returned data is only valid for the duration of the callback. If asynchronous processing is needed, all data needs to be copied before the callback returns.

Signature:

export declare type ImageSegmenterCallback = (result: ImageSegmenterResult) => void;
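
When a result carries several masks, each one needs its own copy before the callback returns. The sketch below copies every mask of a result and provides a matching cleanup function. It uses stand-in types: the `categoryMask` and `confidenceMasks` field names are an assumption about the shape of ImageSegmenterResult that is not stated on this page, and `MaskLike`, `SegmenterResultLike`, `copyMasks` and `closeMasks` are hypothetical names.

```typescript
// Minimal stand-in for the two MPMask methods this sketch relies on.
interface MaskLike {
  clone(): MaskLike;
  close(): void;
}

// Assumed result shape; the field names are not taken from this page.
interface SegmenterResultLike {
  categoryMask?: MaskLike;
  confidenceMasks?: MaskLike[];
}

// Clones every mask so the copies stay valid after the callback returns.
function copyMasks(result: SegmenterResultLike): SegmenterResultLike {
  const copy: SegmenterResultLike = {};
  if (result.categoryMask) {
    copy.categoryMask = result.categoryMask.clone();
  }
  if (result.confidenceMasks) {
    copy.confidenceMasks = result.confidenceMasks.map((mask) => mask.clone());
  }
  return copy;
}

// Releases the resources held by a copy made with copyMasks().
function closeMasks(result: SegmenterResultLike): void {
  result.categoryMask?.close();
  result.confidenceMasks?.forEach((mask) => mask.close());
}
```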

ImageSource

Valid types of image sources which we can run our GraphRunner over.

Signature:

export declare type ImageSource = HTMLCanvasElement | HTMLVideoElement | HTMLImageElement | ImageData | ImageBitmap | VideoFrame;

InteractiveSegmenterCallback

A callback that receives the computed masks from the interactive segmenter. The returned data is only valid for the duration of the callback. If asynchronous processing is needed, all data needs to be copied before the callback returns.

Signature:

export declare type InteractiveSegmenterCallback = (result: InteractiveSegmenterResult) => void;

PoseLandmarkerCallback

A callback that receives the result from the pose detector. The returned masks are only valid for the duration of the callback. If asynchronous processing is needed, the masks need to be copied before the callback returns.

Signature:

export declare type PoseLandmarkerCallback = (result: PoseLandmarkerResult) => void;

RGBAColor

A four channel color with values for red, green, blue and alpha respectively.

Signature:

export declare type RGBAColor = [
    number,
    number,
    number,
    number
] | number[];
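
Since the alias also admits a plain `number[]`, consumers may want to check the length before using a color. The sketch below converts an RGBAColor to a CSS color string. It assumes all four channels are in the range 0..255, which this page does not state, and scales alpha to the 0..1 range that CSS expects. `toCssColor` is a hypothetical helper name.

```typescript
// Local copy of the alias from the signature above.
type RGBAColor = [number, number, number, number] | number[];

// Hypothetical helper: formats a color as a CSS rgba() string.
// Assumes every channel, including alpha, is given in 0..255.
function toCssColor(color: RGBAColor): string {
  if (color.length !== 4) {
    throw new RangeError(`Expected 4 channels, got ${color.length}`);
  }
  const [r = 0, g = 0, b = 0, a = 255] = color;
  return `rgba(${r}, ${g}, ${b}, ${a / 255})`;
}

const opaqueRed = toCssColor([255, 0, 0, 255]);
const transparentBlack = toCssColor([0, 0, 0, 0]);
```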