Face stylizer customization guide


The MediaPipe face stylizer solution provides several models you can use immediately to transform faces into styles such as cartoon or oil painting in your application. However, if you need to transfer a face to a style not covered by the provided models, you can customize the pretrained model with your own data and MediaPipe Model Maker. This model modification tool fine-tunes a portion of the model using data you provide. This method is faster than training a new model from scratch and can produce a model adapted to your specific application.

The following sections show you how to use Model Maker to retrain a pre-built model for face stylization with your own data, which you can then use with the MediaPipe Face Stylizer. The example retrains the face stylizer model to adapt to the provided color sketch style.

The MediaPipe Model Maker package is a low-code solution for customizing on-device machine learning (ML) Models.

This notebook shows the end-to-end process of customizing a face stylizer model for transferring the style presented by a stylized face to a raw human face.

Setup

This section describes key steps for setting up your development environment to retrain a model. These instructions describe how to update a model using Google Colab, but you can also use Python in your own development environment. For general information on setting up your development environment for using MediaPipe, including platform version requirements, see the Setup guide for Python.

To install the libraries for customizing a model, run the following command:

pip install mediapipe-model-maker

Use the following code to import the required Python classes:

from google.colab import files
import os
import tensorflow as tf
assert tf.__version__.startswith('2')

from mediapipe_model_maker import face_stylizer
from mediapipe_model_maker import image_utils

import matplotlib.pyplot as plt

Prepare and review data

Retraining the face stylizer model requires you to provide a single stylized face image. The stylized face is expected to be forward facing, with the left and right eyes and the mouth visible. The face should have only minor rotation, i.e. less than 30 degrees around the yaw, pitch, and roll axes.

The following example shows how to provide a stylized face image and visualize the image.

style_image_path = 'color_sketch.jpg'
!wget -q -O {style_image_path} https://storage.googleapis.com/mediapipe-assets/face_stylizer_style_color_sketch.jpg
# @title Visualize the training image
import cv2
from google.colab.patches import cv2_imshow

style_image_tensor = image_utils.load_image(style_image_path)
style_cv_image = cv2.cvtColor(style_image_tensor.numpy(), cv2.COLOR_RGB2BGR)
cv2_imshow(style_cv_image)

Create dataset

We need to create a dataset that wraps the single training image used to train the face stylizer model.

To create the dataset, use the Dataset.from_image method to load the image located at style_image_path. The face stylizer only supports a single style image.

data = face_stylizer.Dataset.from_image(filename=style_image_path)

Retrain model

Once you have completed preparing your data, you can begin retraining the face stylizer model to adapt to the new style. This type of model modification is called transfer learning. The instructions below use the data prepared in the previous section to retrain a face stylizer model to apply the color sketch style to a raw human face.

Set retraining options

Aside from your training dataset, there are a few required settings for running retraining:

  1. Model architecture: use the SupportedModels class to specify the model architecture. The only supported architecture is face_stylizer.SupportedModels.BLAZE_FACE_STYLIZER_256.
  2. Swap layers: this parameter determines how to mix the latent code layers between the learned style and the raw face images. The latent code is represented as a tensor of shape [1, 12, 512]. The second dimension of the latent code tensor is called the layer. The face stylizer mixes the learned style and the raw face images by generating a weighted sum of the two latent codes on the swap layers. The swap layers are therefore integers within [1, 12]. The more layers you set, the more style is applied to the output image. Although there is no explicit mapping between style semantics and layer index, the shallow layers, e.g. 8, 9, represent the global features of the face, while the deep layers, e.g. 10, 11, represent the fine-grained features of the face. The output stylized image is sensitive to the swap layers setting. By default, it is set to [8, 9, 10, 11]. You can set it through ModelOptions. A conceptual sketch of this mixing follows the list.
  3. Learning rate and epochs: Use the HParams object's learning_rate and epochs to specify these two hyperparameters. learning_rate is set to 4e-4 by default. epochs defines the number of iterations used to fine-tune the BlazeStyleGAN model and is set to 100 by default. The lower the learning rate, the more epochs you can expect to need for the retrained model to converge.
  4. Batch size: Use the HParams object's batch_size to specify it. The batch size defines the number of latent code samples drawn around the latent code the encoder extracts from the input image. This batch of latent codes is used to fine-tune the decoder. A greater batch size usually yields better performance, but it is limited by the hardware memory. For an A100 GPU, the maximum batch size is 8. For P100 and T4 GPUs, the maximum batch size is 2.
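
The following is a conceptual sketch, not the Model Maker internals, of how such a weighted sum could mix two [1, 12, 512] latent codes on the swap layers; the mix_latent_codes helper and the alpha weight shown here are illustrative assumptions.

import tensorflow as tf

def mix_latent_codes(raw_code, style_code, swap_layers, alpha=0.1):
  # raw_code and style_code are latent tensors of shape [1, 12, 512].
  # Layers whose index appears in swap_layers receive a weighted sum of the
  # two codes; all other layers keep the raw-face latent code unchanged.
  layer_mask = tf.constant(
      [1.0 if layer in swap_layers else 0.0 for layer in range(12)],
      shape=[1, 12, 1])
  return (1 - alpha * layer_mask) * raw_code + alpha * layer_mask * style_code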

To set the required parameters, use the following code:

face_stylizer_options = face_stylizer.FaceStylizerOptions(
  model=face_stylizer.SupportedModels.BLAZE_FACE_STYLIZER_256,
  model_options=face_stylizer.ModelOptions(swap_layers=[10,11]),
  hparams=face_stylizer.HParams(
      learning_rate=8e-4, epochs=200, batch_size=2, export_dir="exported_model"
  )
)

Run retraining

With your training dataset and retraining options prepared, you are ready to start the retraining process. This process requires a GPU and can take from a few minutes to a few hours, depending on your available compute resources. Using a Google Colab environment with a GPU runtime, the example retraining below takes about 2 minutes.

To begin the retraining process, use the create() method with dataset and options you previously defined:

face_stylizer_model = face_stylizer.FaceStylizer.create(
  train_data=data, options=face_stylizer_options
)

Evaluate performance

After retraining the model, you can do a subjective evaluation of the reconstruction results for the input style image. If the main style features and the human face in the style image are well reconstructed, the model has converged to the style and can be applied to other raw face images. If the style image cannot be reconstructed, or significant artifacts are observed in the reconstructed style image, the input style might not be suitable for the face stylizer model.

To run an evaluation of the example model, run it against the input style image as below:

print('Input style image')
resized_style_cv_image = cv2.resize(style_cv_image, (256, 256))
cv2_imshow(resized_style_cv_image)

eval_output = face_stylizer_model.stylize(data)
eval_output_data = eval_output.gen_tf_dataset()
iterator = iter(eval_output_data)

reconstruct_style_image = tf.squeeze(iterator.get_next()).numpy()
test_output_image = cv2.cvtColor(reconstruct_style_image, cv2.COLOR_RGB2BGR)
print('\nReconstructed style image')
cv2_imshow(test_output_image)

Export model

After retraining a model, you must export it to the TensorFlow Lite model format to use it with MediaPipe in your application. The export process generates the required model metadata as well as the model bundle file.

To export the retrained model for use in your application, use the following command:

face_stylizer_model.export_model()

Use the following commands in Google Colab to list the exported model files and download the model bundle to your development environment. The face_stylizer.task file is a bundle of three TFLite models that are required to run the face stylizer task library.

!ls exported_model
files.download('exported_model/face_stylizer.task')
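
As a minimal sketch of loading the exported bundle with the MediaPipe Tasks Python API (my_face.jpg is a hypothetical local test image; check the Face Stylizer task guide for the exact API of your MediaPipe version):

import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Point the task at the retrained bundle exported above.
base_options = python.BaseOptions(
    model_asset_path='exported_model/face_stylizer.task')
options = vision.FaceStylizerOptions(base_options=base_options)

with vision.FaceStylizer.create_from_options(options) as stylizer:
  # my_face.jpg is a placeholder for your own raw face photo.
  input_image = mp.Image.create_from_file('my_face.jpg')
  stylized_image = stylizer.stylize(input_image)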

Model tuning

You can use the MediaPipe Model Maker tool to further improve and adjust the model retraining with configuration options and performance techniques such as data quantization. These steps are optional. Model Maker uses reasonable default settings for all of the training configuration parameters, but if you want to further tune the model retraining, the instructions below describe the available options.

Retraining parameters

You can further customize how the retraining process runs to adjust training time and potentially increase the retrained model's performance. These parameters are optional. Use the FaceStylizerOptions class and the HParams class to set these additional options.

Use the FaceStylizerModelOptions class parameters to customize the existing model. It has the following customizable parameters that affect model accuracy; a usage sketch follows the list:

  • swap_layers: A set of layer indices between 0 and the maximum layer index of the latent code layers. Since the latent code dimension of the default BLAZE_FACE_STYLIZER_256 is 12, the swap_layers values must be between 0 and 11. This hyperparameter defines the layers at which the style latent code is interpolated with the raw face features. The more layers you choose, the more style features are applied. There is no explicit mapping between layer index and semantic features, so you are expected to tune the swap layers and evaluate the results subjectively to determine the best option. The order of the layers does not matter.
  • alpha: Weighting coefficient of the style latent code for the swap layer interpolation. Its valid range is [0, 1]. A greater weight means a stronger style is applied to the output image. Expect to set it to a small value, i.e. < 0.1.
  • perception_loss_weight: Weighting coefficients of the image perception quality loss. It contains three coefficients, l1, content, and style, which control the difference between the generated image and the raw input image, the content difference between the generated face and the raw input face, and how similar the style is between the generated image and the raw input image. You can increase the style weight to enforce a stronger style, or the content weight to preserve more details of the raw input face.
  • adv_loss_weight: Weighting coefficient of the adversarial loss versus the image perceptual quality loss. This hyperparameter controls the realism of the generated image. It expects a small value, i.e. < 0.2.
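
The following is a minimal sketch of adjusting these model options; the field names mirror the descriptions above, so verify them against your installed mediapipe-model-maker version before relying on them.

tuned_model_options = face_stylizer.ModelOptions(
    swap_layers=[8, 9, 10, 11],  # apply the style to more layers than the earlier example
    alpha=0.1,                   # upper end of the suggested style weight range
    adv_loss_weight=0.2,         # upper end of the suggested realism weight range
)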

Use the HParams class to customize other parameters related to training and saving the model (a sketch follows the list):

  • learning_rate: The learning rate to use for gradient descent training. Defaults to 8e-4.
  • batch_size: Batch size for training. Defaults to 4.
  • epochs: Number of training iterations over the dataset. Defaults to 100.
  • beta_1: beta_1 used in tf.keras.optimizers.Adam. Defaults to 0.0.
  • beta_2: beta_2 used in tf.keras.optimizers.Adam. Defaults to 0.99.
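
The following is a minimal sketch of overriding these defaults; the field names mirror the list above, so confirm them against your installed mediapipe-model-maker version.

tuned_hparams = face_stylizer.HParams(
    learning_rate=4e-4,   # a lower learning rate typically needs more epochs to converge
    epochs=200,
    batch_size=2,         # within the limit noted earlier for P100 and T4 GPUs
    beta_1=0.0,
    beta_2=0.99,
    export_dir="exported_model",
)

Pass tuned_model_options and tuned_hparams to face_stylizer.FaceStylizerOptions as shown in the retraining section above.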