Image segmentation guide for web

The MediaPipe Image Segmenter task lets you divide images into regions based on predefined categories for applying visual effects such as background blurring. These instructions show you how to use the Image Segmenter for Node and web apps. For more information about the capabilities, models, and configuration options of this task, see the Overview.

Code example

The example code for Image Segmenter provides a complete implementation of this task in JavaScript for your reference. This code helps you test this task and get started on building your own image segmentation app. You can view, run, and edit the Image Segmenter example code using just your web browser. You can also review the code for this example on GitHub.

Setup

This section describes key steps for setting up your development environment and code projects specifically to use Image Segmenter. For general information on setting up your development environment for using MediaPipe tasks, including platform version requirements, see the Setup guide for web.

JavaScript packages

Image Segmenter code is available through the MediaPipe @mediapipe/tasks-vision NPM package. You can find and download these libraries from links provided in the platform Setup guide.

To install the required packages for local staging, run the following command:

npm install --save @mediapipe/tasks-vision

If you want to import the task code via a content delivery network (CDN) service, add the following code inside the <head> tag of your HTML file:

<head>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/vision_bundle.js"
    crossorigin="anonymous"></script>
</head>

Model

The MediaPipe Image Segmenter task requires a trained model that is compatible with this task. For more information on available trained models for Image Segmenter, see the task overview Models section.

Select and download a model, and then store it within your project directory:

<dev-project-root>/app/shared/models/

Create the task

Use one of the Image Segmenter createFrom...() functions to prepare the task for running inferences. Use the createFromModelPath() function with a relative or absolute path to the trained model file. If your model is already loaded into memory, you can use the createFromModelBuffer() method.

The following code demonstrates how to build and configure the task with the createFromOptions() function, which lets you customize the Image Segmenter with configuration options. For more information on task configuration, see Configuration options.

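// With the NPM package, import the classes used below. With the CDN bundle,
// use the classes exposed by the script instead.
import { FilesetResolver, ImageSegmenter } from "@mediapipe/tasks-vision";

// The running mode must match the method you call later:
// segment() for "IMAGE", segmentForVideo() for "VIDEO".
let runningMode = "IMAGE";
let imageSegmenter;

async function createImageSegmenter() {
  // Load the WebAssembly files that the vision tasks depend on.
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
  );

  imageSegmenter = await ImageSegmenter.createFromOptions(vision, {
    baseOptions: {
      modelAssetPath:
        "https://storage.googleapis.com/mediapipe-assets/deeplabv3.tflite?generation=1661875711618421",
    },
    outputCategoryMask: true,
    outputConfidenceMasks: false,
    runningMode: runningMode
  });
}
createImageSegmenter();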

For a more complete implementation of creating an Image Segmenter task, see the code example.

Configuration options

This task has the following configuration options for Web applications:

Option Name: runningMode
Description: Sets the running mode for the task. There are two modes:
  IMAGE: The mode for single image inputs.
  VIDEO: The mode for decoded frames of a video or of a livestream of input data, such as from a camera.
Value Range: {IMAGE, VIDEO}
Default Value: IMAGE

Option Name: outputCategoryMask
Description: If set to true, the output includes a segmentation mask as a uint8 image, where each pixel value indicates the winning category value.
Value Range: {true, false}
Default Value: false

Option Name: outputConfidenceMasks
Description: If set to true, the output includes a segmentation mask as a float value image, where each float value represents the confidence score map of the category.
Value Range: {true, false}
Default Value: true

Option Name: displayNamesLocale
Description: Sets the language of labels to use for display names provided in the metadata of the task's model, if available. Default is en for English. You can add localized labels to the metadata of a custom model using the TensorFlow Lite Metadata Writer API.
Value Range: Locale code
Default Value: en
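
As a rough illustration of how these options fit together in JavaScript, the sketch below assembles an options object for either running mode. The helper name buildSegmenterOptions, its modelPath parameter, and the model.tflite file name are hypothetical; only the option names are taken from the list above.

```javascript
// Hypothetical helper: builds an options object that could be passed to
// ImageSegmenter.createFromOptions(). Only the option names come from the guide.
function buildSegmenterOptions(modelPath, runningMode = "IMAGE") {
  if (runningMode !== "IMAGE" && runningMode !== "VIDEO") {
    throw new Error(`Unsupported runningMode: ${runningMode}`);
  }
  return {
    baseOptions: { modelAssetPath: modelPath },
    runningMode,
    outputCategoryMask: true,     // request the uint8 category mask
    outputConfidenceMasks: false, // skip the float confidence masks
    displayNamesLocale: "en",
  };
}

// Placeholder model path; substitute the model you downloaded.
const videoOptions = buildSegmenterOptions("app/shared/models/model.tflite", "VIDEO");
```

The returned object is intended as the second argument to createFromOptions(). Rejecting unknown modes early keeps a typo in the mode string from surfacing later as a harder-to-trace task error.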

Prepare data

Image Segmenter can segment objects in images in any format supported by the host browser. The task also handles data input preprocessing, including resizing, rotation and value normalization.

Calls to the Image Segmenter segment() and segmentForVideo() methods run synchronously and block the user interface thread. If you segment objects in video frames from a device's camera, each segmentation task blocks the main thread. You can prevent this by implementing web workers to run segment() and segmentForVideo() on another thread.

Run the task

The Image Segmenter uses the segment() method with image mode and the segmentForVideo() method with video mode to trigger inferences. The Image Segmenter returns the detected segments as image data to a callback function you set when running an inference for the task.

The following code demonstrates how to execute processing with the task model:

Image

// Look up the <img> element to segment.
const image = document.getElementById("image");
imageSegmenter.segment(image, callback);

Video

function renderLoop() {
  const video = document.getElementById("video");
  // Use the current time, in milliseconds, as the frame timestamp.
  const startTimeMs = performance.now();

  imageSegmenter.segmentForVideo(video, startTimeMs, callbackForVideo);

  // Schedule segmentation of the next frame.
  requestAnimationFrame(renderLoop);
}
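
The callback passed to segment() or segmentForVideo() is where the mask data becomes available. The sketch below shows one possible callback. It assumes that the result object exposes a categoryMask with width, height, and a getAsUint8Array() accessor, as in recent @mediapipe/tasks-vision releases; check that shape against the version you install. The helper name describeCategoryMask is hypothetical.

```javascript
// Sketch of a callback for segment() / segmentForVideo().
// Assumption: result.categoryMask offers width, height and getAsUint8Array().
function describeCategoryMask(result) {
  if (!result.categoryMask) {
    return null; // outputCategoryMask was not enabled
  }
  const mask = result.categoryMask;
  const pixels = mask.getAsUint8Array();
  // Collect the distinct category indices present in the mask, in ascending order.
  const categories = [...new Set(pixels)].sort((a, b) => a - b);
  return { width: mask.width, height: mask.height, categories };
}

function callback(result) {
  console.log(describeCategoryMask(result));
}
```

Because the mask data may not remain valid after the callback returns, it is safest to copy or summarize anything you need inside the callback, as the helper does here.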

For a more complete implementation of running an Image Segmenter task, see the code example.

Handle and display results

Upon running inference, the Image Segmenter task returns segment image data to a callback function. The content of the output depends on the outputCategoryMask and outputConfidenceMasks options you set when you configured the task.

The following sections show examples of the output data from this task:

Category confidence

The following images show a visualization of the task output for a category confidence mask. The confidence mask output contains float values in the range [0, 1].

Original image and category confidence mask output. Source image from the Pascal VOC 2012 dataset.
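
One way to visualize a confidence mask is to use each confidence value as the opacity of an overlay color. The following is a minimal sketch under that assumption; the function name confidenceToRgba and the default color are hypothetical, and the input is assumed to be a flat array with one float per pixel.

```javascript
// Converts a confidence mask (floats in [0, 1], one per pixel) into RGBA bytes
// whose alpha channel tracks the confidence. The overlay color is arbitrary.
function confidenceToRgba(confidences, color = [0, 128, 255]) {
  const rgba = new Uint8ClampedArray(confidences.length * 4);
  for (let i = 0; i < confidences.length; i++) {
    rgba[i * 4] = color[0];
    rgba[i * 4 + 1] = color[1];
    rgba[i * 4 + 2] = color[2];
    // Scale confidence to the 0-255 alpha range.
    rgba[i * 4 + 3] = Math.round(confidences[i] * 255);
  }
  return rgba;
}
```

In a browser, the returned array can be wrapped with new ImageData(rgba, width, height) and drawn onto a canvas with putImageData().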

Category value

The following images show a visualization of the task output for a category value mask. The category mask range is [0, 255] and each pixel value represents the winning category index of the model output. The winning category index is the one with the highest score among the categories the model can recognize.

Original image and category mask output. Source image from the Pascal VOC 2012 dataset.
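
A category mask can be displayed by mapping each category index to a color. The sketch below assumes the mask is a flat Uint8Array with one index per pixel; the function name colorizeCategoryMask and the palette entries are made up for illustration and do not correspond to the labels of any particular model.

```javascript
// Maps each category index in a uint8 category mask to an RGBA color.
// Indices without a palette entry stay fully transparent.
function colorizeCategoryMask(mask, palette) {
  const rgba = new Uint8ClampedArray(mask.length * 4); // zero-filled
  for (let i = 0; i < mask.length; i++) {
    const color = palette[mask[i]];
    if (color) {
      rgba.set(color, i * 4);
    }
  }
  return rgba;
}

// Hypothetical palette: index -> [red, green, blue, alpha].
const examplePalette = {
  0: [0, 0, 0, 0],       // index 0 left transparent
  15: [255, 0, 0, 160],  // index 15 drawn as semi-transparent red
};
```

Keeping the palette as a separate lookup object means the colors can be changed, or extended to more categories, without touching the loop.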

The Image Segmenter example code demonstrates how to display the segmentation results returned from the task. See the code example for details.