Interactive image segmentation guide for web

The MediaPipe Interactive Image Segmenter task takes a location in an image, estimates the boundaries of an object at that location, and returns the segmentation for the object as image data. These instructions show you how to use the Interactive Image Segmenter for Node and web apps. For more information about the capabilities, models, and configuration options of this task, see the Overview.

Code example

The example code for Interactive Image Segmenter provides a complete implementation of this task in JavaScript for your reference. This code helps you test this task and get started on building your own interactive image segmentation app. You can view, run, and edit the Interactive Image Segmenter example code using just your web browser. You can also review the code for this example on GitHub.

Setup

This section describes key steps for setting up your development environment and code projects specifically to use Interactive Image Segmenter. For general information on setting up your development environment for using MediaPipe tasks, including platform version requirements, see the Setup guide for web.

JavaScript packages

Interactive Image Segmenter code is available through the MediaPipe @mediapipe/tasks-vision NPM package. You can find and download these libraries from links provided in the platform Setup guide.

You can install the required packages with the following code for local staging using the following command:

npm install --save @mediapipe/tasks-vision

If you want to import the task code via a content delivery network (CDN) service, add the following code in the tag in your HTML file:

<head>
  <script src="https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision/vision_bundle.js"
    crossorigin="anonymous"></script>
</head>

Model

The MediaPipe Interactive Image Segmenter task requires a trained model that is compatible with this task. For more information on available trained models for Interactive Image Segmenter, see the task overview Models section.

Select and download a model, and then store it within your project directory:

<dev-project-root>/app/shared/models/

Create the task

Use one of the Interactive Image Segmenter createFrom...() functions to prepare the task for running inferences. Use the createFromModelPath() function with a relative or absolute path to the trained model file. If your model is already loaded into memory, you can use the createFromModelBuffer() method.

The code example below demonstrates using the createFromOptions() function to set up the task. The createFromOptions function allows you to customize the Interactive Image Segmenter with configuration options. For more information on configuration options, see Configuration options.

The following code demonstrates how to build and configure the task with custom options:

async function createSegmenter() {
  const vision = await FilesetResolver.forVisionTasks(
    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
  );

  interactiveSegmenter = await InteractiveSegmenter.createFromOptions(vision, {
    baseOptions: {
      modelAssetPath:
        "https://storage.googleapis.com/mediapipe-tasks/interactive_segmenter/ptm_512_hdt_ptm_woid.tflite"
    },
  });
}
createSegmenter();

Configuration options

This task has the following configuration options for Web applications:

Option Name Description Value Range Default Value
outputCategoryMask If set to True, the output includes a segmentation mask as a uint8 image, where each pixel value indicates if the pixel is part of the object located at the area of interest. {True, False} False
outputConfidenceMasks If set to True, the output includes a segmentation mask as a float value image, where each float value represents the confidence that the pixel is part of the object located at the area of interest. {True, False} True
displayNamesLocale Sets the language of labels to use for display names provided in the metadata of the task's model, if available. Default is en for English. You can add localized labels to the metadata of a custom model using the TensorFlow Lite Metadata Writer API Locale code en

Prepare data

Interactive Image Segmenter can segment objects in images in any format supported by the host browser. The task also handles data input preprocessing, including resizing, rotation and value normalization.

Calls to the Interactive Image Segmenter segment() and segmentForVideo() methods run synchronously and block the user interface thread. If you segment objects in video frames from a device's camera, each segmentation task blocks the main thread. You can prevent this by implementing web workers to run segment() and segmentForVideo() on another thread.

Run the task

The Interactive Image Segmenter uses the segment() method to trigger inferences. The Interactive Image Segmenter returns the detected segments as image data to a callback function you set when running an inference for the task.

The following code demonstrates how to execute processing with the task model:

const image = document.getElementById("image") as HTMLImageElement;
interactiveSegmenter.segment(
  image,
  {
    keypoint: {
      x: event.offsetX / event.target.width,
      y: event.offsetY / event.target.height
    }
  },
  callback);

For a more complete implementation of running an Interactive Image Segmenter task, see the code example.

Handle and display results

Upon running inference, the Interactive Image Segmenter task returns segment image data to a callback function. The content of the output is image data and may include a category mask, confidence masks, or both, depending on what you set when you configured the task.

The following sections further explain the output data from this task:

Category mask

The following images show a visualization of the task output for a category value mask with a point area of interest indicated. Each pixel is a uint8 value indicating if the pixel is part of the object located at the area of interest. The black and white circle on the second image indicates the selected area of interest.

Original image and category mask output. Source image from the Pascal VOC 2012 dataset.

Confidence mask

The output for a confidence mask contains float values between [0, 1] for each image input channel. Higher values indicate a higher confidence that the image pixel is part of the object located at the area of interest.

The Interactive Image Segmenter example code demonstrates how to display the classification results returned from the task, see the code example for details.