This is an end-to-end LiteRT.js guide covering the process of converting a PyTorch model to run in the browser, leveraging backends including WebGPU and WebNN. This example uses ResNet18 for the vision model, and TensorFlow.js for pre- and post-processing.
The guide will cover the following steps:
- Convert your PyTorch model to LiteRT using LiteRT Torch.
- Add the LiteRT package to your web app.
- Load the model.
- Write pre- and post-processing logic.
Convert to LiteRT
Use the PyTorch Converter
notebook
to convert a PyTorch model to the appropriate .tflite format. For an in-depth
guide on the types of errors you may encounter and how to fix them, see the AI
Edge Torch Converter
README.
Your model must be compatible with
torch.export.export, which
means it must be exportable with TorchDynamo. Therefore, it must not have any
Python conditional branches that depend on the runtime values within tensors. If
torch.export.export
raises errors, your model is not exportable as written. Your model also must
not have any dynamic input or output dimensions on its tensors. This includes
the batch dimension.
You can also start with a TensorRT-compatible or ONNX-exportable PyTorch model:
- A TensorRT-compatible version of a model can be a good starting point, since some types of TensorRT conversions also require models to be TorchDynamo exportable. If you use any NVIDIA / CUDA ops in the model, you will need to replace them with standard PyTorch ops.
- An ONNX-exportable PyTorch model can be a good starting point, though some ONNX models use TorchScript instead of TorchDynamo to export, in which case the model may not be TorchDynamo-exportable (although it's likely closer than the original model code).
For more information, see Convert PyTorch models to LiteRT.
Add the LiteRT package
You can add LiteRT.js to your project using NPM (npm install @litertjs/core),
or by referencing it directly from a CDN like JSDelivr. The following
examples use JSDelivr for ease of copy-pasting.
To get started, import LiteRT.js and load its Wasm files:
import {loadLiteRt} from 'https://cdn.jsdelivr.net/npm/@litertjs/core/+esm';
// Load the LiteRT.js Wasm files from a CDN.
await loadLiteRt('https://cdn.jsdelivr.net/npm/@litertjs/core/wasm/');
// Alternatively, host them from your server instead of the CDN.
// They are located in node_modules/@litertjs/core/wasm/
// await loadLiteRt('your/path/to/wasm/');
The loadLiteRt function loads the LiteRT WebAssembly (Wasm) module and
supporting files. Based on your browser environment and the options you
provide, loadLiteRt loads one of several different builds to support
different features. For example, to use WebNN you must load LiteRT with JSPI
enabled, as described in Load with WebNN acceleration.
Platform Requirements
Depending on the hardware accelerator you plan to use, your browser environment must meet specific conditions.
WebGPU Requirements
WebGPU enables generic graphics acceleration on any system with a GPU.
- Browser Support:
- Chrome and Microsoft Edge (113+)
- Safari (17.4+)
- Firefox (121+, partial support)
- Hardware: A system with a discrete or integrated GPU.
WebNN Requirements
WebNN targets dedicated Neural Processing Units (NPUs) or system-level ML frameworks. Note that WebNN is still experimental, requires strict browser configuration, and has not yet been made generally available by any browser.
- Browser Support:
- Experimental support in Chromium-based (121+) browsers (Chrome, Edge).
- Browser Activation (Flags):
- Enable #web-machine-learning-neural-network in chrome://flags.
- Enable JavaScript Promise Integration (JSPI) using #enable-experimental-webassembly-features or similar V8 flags.
- OS Architecture Support:
- Windows: Requires DirectML-supported DirectX 12 hardware.
- macOS: Requires Apple Silicon.
- Linux: Requires OpenVINO configurations.
- Architecture Driver Limits: Ensure vendor-specific NPU drivers are installed.
- JSPI: LiteRT.js requires JSPI to bridge synchronous kernel scheduling
with asynchronous WebNN device polling. You must load LiteRT.js with JSPI when
using WebNN (await loadLiteRt('...', {jspi: true});).
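The requirements above can be checked at runtime before choosing an accelerator. The following is a minimal sketch, not part of LiteRT.js: detectAccelerators is a hypothetical helper that looks for the standard navigator.gpu (WebGPU) and navigator.ml (WebNN) entry points. It also assumes that a JSPI-enabled engine exposes WebAssembly.Suspending, which may not hold for every browser version.

```javascript
// Hypothetical helper (not a LiteRT.js API): decide which accelerators to try,
// in order of preference. `env` stands in for globalThis so it can be tested.
function detectAccelerators(env) {
  const nav = env.navigator ?? {};
  // WebGPU exposes its entry point as navigator.gpu.
  const hasWebGpu = nav.gpu != null;
  // WebNN exposes its entry point as navigator.ml.
  const hasWebNn = nav.ml != null;
  // Assumption: JSPI-enabled engines expose WebAssembly.Suspending.
  const hasJspi = env.WebAssembly != null && 'Suspending' in env.WebAssembly;

  const accelerators = [];
  // WebNN is only usable when LiteRT.js can be loaded with JSPI.
  if (hasWebNn && hasJspi) accelerators.push('webnn');
  if (hasWebGpu) accelerators.push('webgpu');
  accelerators.push('wasm');  // The CPU path is always the last resort.
  return {accelerators, jspi: hasWebNn && hasJspi};
}
```

In a browser you would call detectAccelerators(globalThis). The presence of an entry point does not guarantee a usable device (for example, no compatible GPU or NPU driver), so loading the model can still fail and should be handled.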
Load the model
Import and initialize LiteRT.js and the LiteRT-TFJS conversion utilities. You may also want to import TensorFlow.js to perform pre- and post-processing of tensors passed to or from LiteRT.js.
import {loadLiteRt, getWebGpuDevice} from 'https://cdn.jsdelivr.net/npm/@litertjs/core@2.5.0/+esm';
import {runWithTfjsTensors} from 'https://cdn.jsdelivr.net/npm/@litertjs/tfjs-interop@2.5.0/+esm';
// TensorFlow.js imports
import * as tf from 'https://cdn.jsdelivr.net/npm/@tensorflow/tfjs/+esm';
import {WebGPUBackend} from 'https://cdn.jsdelivr.net/npm/@tensorflow/tfjs-backend-webgpu/+esm';
async function main() {
  // Initialize TensorFlow.js WebGPU backend
  await tf.setBackend('webgpu');
  // Initialize LiteRT.js's Wasm files
  await loadLiteRt('https://cdn.jsdelivr.net/npm/@litertjs/core/wasm/');
  // Make TFJS use the same GPU device as LiteRT.js (for tensor conversion)
  const device = getWebGpuDevice();
  tf.removeBackend('webgpu');
  tf.registerBackend('webgpu', () => new WebGPUBackend(device, device.adapterInfo));
  await tf.setBackend('webgpu');
  // ...
}
main();
Load with WebGPU acceleration
Load the converted LiteRT model with the WebGPU accelerator, the standard choice for fast inference on systems with a GPU:
import {loadLiteRt, loadAndCompile} from 'https://cdn.jsdelivr.net/npm/@litertjs/core/+esm';
await loadLiteRt('https://cdn.jsdelivr.net/npm/@litertjs/core/wasm/');
const model = await loadAndCompile('path_to_model.tflite', {
  accelerator: 'webgpu',
});
Load with WebNN acceleration (dedicated NPUs)
Load the converted LiteRT model leveraging dedicated hardware.
Ensure JSPI is enabled in loadLiteRt:
import {loadLiteRt, loadAndCompile} from 'https://cdn.jsdelivr.net/npm/@litertjs/core/+esm';
// Pass {jspi: true} so LiteRT.js can bridge to the asynchronous WebNN API
await loadLiteRt('https://cdn.jsdelivr.net/npm/@litertjs/core/wasm/', {jspi: true});
const model = await loadAndCompile('path_to_model.tflite', {
  accelerator: 'webnn', // Or ['webnn', 'wasm'] for CPU fallback
  webNNOptions: {devicePreference: 'npu'} // Targets dedicated neural silicon
});
Write the model pipeline
Write the pre- and post-processing logic that connects the model to your app.
Using TensorFlow.js for pre- and post-processing is recommended, but if the rest
of your pipeline is not written in TensorFlow.js, you can call await
tensor.data() to get the
values as a flat typed array or await
tensor.array() to get a
structured JS array.
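If you preprocess without TensorFlow.js, the normalization and layout change can be written over plain typed arrays. The following is a minimal sketch under stated assumptions: rgbaToNormalizedNchw is a hypothetical helper, the input is RGBA pixel data such as the data field of a canvas ImageData that has already been resized to the model's input size, and the mean and standard deviation are the torchvision values used in the ResNet18 pipeline in this guide.

```javascript
// Per-channel normalization constants (RGB order).
const IMAGENET_MEAN = [0.485, 0.456, 0.406];
const IMAGENET_STD = [0.229, 0.224, 0.225];

// Hypothetical helper: convert interleaved RGBA bytes (height x width x 4)
// into a normalized float32 buffer in channel-first order (3 x height x width).
function rgbaToNormalizedNchw(rgba, width, height) {
  const planeSize = width * height;
  if (rgba.length !== planeSize * 4) {
    throw new Error(`Expected ${planeSize * 4} RGBA values, got ${rgba.length}`);
  }
  const out = new Float32Array(3 * planeSize);
  for (let i = 0; i < planeSize; ++i) {
    for (let c = 0; c < 3; ++c) {  // The alpha channel is skipped.
      const value = rgba[i * 4 + c] / 255;  // Scale to [0, 1].
      out[c * planeSize + i] = (value - IMAGENET_MEAN[c]) / IMAGENET_STD[c];
    }
  }
  return out;
}
```

The returned buffer has 3 * width * height elements and corresponds to a tensor of shape [1, 3, height, width]. If you still hand the data to TensorFlow.js, tf.tensor(buffer, [1, 3, height, width]) wraps it without further processing.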
The following is an example end-to-end pipeline for ResNet18:
// Wrap in a tf.tidy call to automatically clean up intermediate TensorFlow.js tensors.
// (Note: tidy only supports synchronous functions).
const imageData = tf.tidy(() => {
  // Get RGB data values from an image element (here, `dogs`) and scale them to the range [0, 1].
  const image = tf.browser.fromPixels(dogs, 3).div(255);
  // These preprocessing steps come from https://github.com/pytorch/vision/blob/main/torchvision/models/resnet.py#L315
  // The mean and standard deviation for the image normalization come from https://github.com/pytorch/vision/blob/main/torchvision/transforms/_presets.py#L38
  return image.resizeBilinear([224, 224])
      .sub([0.485, 0.456, 0.406])
      .div([0.229, 0.224, 0.225])
      .reshape([1, 224, 224, 3])
      .transpose([0, 3, 1, 2]);  // NHWC -> NCHW, the layout PyTorch models expect.
});
// Run the model
const outputs = await runWithTfjsTensors(model, [imageData]);
const probabilities = outputs[0];
// Get the top five classes.
const top5 = tf.topk(probabilities, 5);
const values = await top5.values.data();
const indices = await top5.indices.data();
// Clean up TFJS tensors
tf.dispose(outputs);
tf.dispose(top5);
tf.dispose(imageData);
// Print the top five classes.
const classes = ... // Class names are loaded from a JSON file in the demo.
for (let i = 0; i < 5; ++i) {
  const text = `${classes[indices[i]]}: ${values[i]}`;
  console.log(text);
}
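When post-processing is not written in TensorFlow.js, the top-k step can also be done on the typed array returned by await tensor.data(). The following is a minimal sketch: topK is a hypothetical helper, not a LiteRT.js or TensorFlow.js function, and it sorts the whole array, which is fine for a 1,000-class output but not tuned for very large outputs.

```javascript
// Hypothetical helper: return the k highest-scoring entries of a flat score
// array as {index, value} pairs, sorted from highest to lowest score.
function topK(scores, k) {
  return Array.from(scores, (value, index) => ({index, value}))
      .sort((a, b) => b.value - a.value)
      .slice(0, k);
}

// Possible usage with the pipeline above (requires the model output tensor):
//   const scores = await probabilities.data();
//   for (const {index, value} of topK(scores, 5)) {
//     console.log(`${classes[index]}: ${value}`);
//   }
```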
Testing and troubleshooting
Refer to the following sections for ways to test your application and handle errors.
Testing with fake inputs
After loading a model, it's a good idea to test the model with fake inputs first. This will catch any runtime errors before you spend the time writing the pre- and post-processing logic for your model pipeline. To do this, you can use the LiteRT.js Model Tester or test the model manually.
LiteRT.js Model Tester
The LiteRT.js Model Tester runs your model on WebNN, WebGPU, and CPU using random inputs to verify that the model runs correctly. It checks whether the graph can be executed on the specialized WebGPU or WebNN backends, and if so, benchmarks the model over a configurable number of runs.
To run the LiteRT.js Model Tester, run npm i @litertjs/model-tester and then
npx model-tester. It will open a browser tab for you to run your model.
Manual model testing
If you prefer to manually test the model instead of using the LiteRT.js model
tester (@litertjs/model-tester), you can generate fake inputs and run the
model with runWithTfjsTensors.
To generate fake inputs, you need to know the names and shapes of the input
tensors. These can be found with LiteRT.js by calling model.getInputDetails or
model.getOutputDetails. Alternatively, use Model
Explorer.
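To read the details at a glance, you can print a one-line summary per tensor. The following is a minimal sketch: describeTensors is a hypothetical helper, and it assumes each entry returned by model.getInputDetails() or model.getOutputDetails() carries name, shape, and dtype fields (shape and dtype are used that way in this guide; name is an assumption).

```javascript
// Hypothetical helper: summarize tensor details as readable strings.
// Each entry is assumed to look like {name, shape, dtype}.
function describeTensors(details) {
  return details.map(({name, shape, dtype}) => {
    const dims = Array.from(shape);
    // Total number of elements is the product of all dimensions.
    const elementCount = dims.reduce((count, dim) => count * dim, 1);
    return `${name}: ${dtype}[${dims.join(', ')}], ${elementCount} elements`;
  });
}

// Possible usage once a model is loaded:
//   console.log(describeTensors(model.getInputDetails()).join('\n'));
//   console.log(describeTensors(model.getOutputDetails()).join('\n'));
```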
Once you know the input and output shapes and names, you can test the model with a fake input. A successful run confirms that all model operations are supported before you write the rest of the machine learning pipeline. For example:
// Imports, initialization, and model loading...
// Create fake inputs for the model
const fakeInputs = model.getInputDetails().map(
    ({shape, dtype}) => tf.ones(shape, dtype));
// Run the model
const outputs = await runWithTfjsTensors(model, fakeInputs);
console.log(outputs);
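A manual test can also give a rough latency figure, similar in spirit to the benchmark the Model Tester reports. The following is a minimal sketch: benchmark is a hypothetical helper that times any async function you pass in, and the warmupRuns and runs option names are made up for this example.

```javascript
// Hypothetical helper: time an async function over several runs.
// Warm-up runs are executed first and excluded from the statistics.
async function benchmark(runOnce, {warmupRuns = 2, runs = 10} = {}) {
  for (let i = 0; i < warmupRuns; ++i) {
    await runOnce();
  }
  const timesMs = [];
  for (let i = 0; i < runs; ++i) {
    const start = performance.now();
    await runOnce();
    timesMs.push(performance.now() - start);
  }
  const meanMs = timesMs.reduce((sum, t) => sum + t, 0) / timesMs.length;
  return {meanMs, minMs: Math.min(...timesMs), maxMs: Math.max(...timesMs)};
}

// Possible usage with the fake inputs above:
//   const stats = await benchmark(async () => {
//     const outputs = await runWithTfjsTensors(model, fakeInputs);
//     await outputs[0].data();  // Wait until the result is actually available.
//     tf.dispose(outputs);
//   });
```

Reading the output back inside the timed function matters on GPU backends, where work may otherwise still be in flight when the timer stops.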
Error types
Some LiteRT models may not be supported by LiteRT.js. Errors usually fall into these categories:
- Shape Mismatch: A known bug that only affects GPU.
- Operation Not Supported: The runtime has no implementation for an operation in the model. Hardware backends (such as WebGPU or WebNN) have different operation coverage than CPU, so falling back to 'wasm' is often an adequate solution.
- Unsupported Tensor Type: LiteRT.js only supports int32 and float32 tensors for model inputs and outputs (that is, model I/O must use 4-byte element types).
- Model Too Large: Inefficient buffer alignment or the raw size of the model file can exceed the Wasm memory limit in some environments. Solutions for WebGPU and WebNN are underway.
Operation Not Supported
This indicates that the backend being used does not support one of the operations in the model. You will need to rewrite the original PyTorch model to avoid this op and re-convert it, or you may be able to run the model on CPU.
In the case of BROADCAST_TO, this may be solved by making the batch dimension
the same for every input tensor to the model. Other cases may be more
complicated.
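Falling back to CPU can be automated by trying accelerators in order. The following is a minimal sketch: compileWithFallback is a hypothetical helper that takes your own compile function, so it makes no assumptions about LiteRT.js beyond the accelerator names used in this guide. Whether an unsupported operation is reported at load time or on the first run is not covered here, so the compile function may need to include a test run with fake inputs.

```javascript
// Hypothetical helper: try each accelerator in order and return the first
// model that loads, along with the accelerator that worked.
async function compileWithFallback(compile, accelerators) {
  const errors = [];
  for (const accelerator of accelerators) {
    try {
      const model = await compile(accelerator);
      return {model, accelerator};
    } catch (error) {
      // Remember why this accelerator failed, then try the next one.
      errors.push(`${accelerator}: ${error.message}`);
    }
  }
  throw new Error(`No accelerator could load the model:\n${errors.join('\n')}`);
}

// Possible usage:
//   const {model, accelerator} = await compileWithFallback(
//       (accelerator) => loadAndCompile('path_to_model.tflite', {accelerator}),
//       ['webgpu', 'wasm']);
```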
Unsupported Tensor Type
LiteRT.js only supports int32 and float32 tensors for the model's inputs and outputs.
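You can check for this before running the model by inspecting the tensor details. The following is a minimal sketch: findUnsupportedTensors is a hypothetical helper, and it assumes that the entries from model.getInputDetails() and model.getOutputDetails() expose a dtype string such as 'float32' and a name field.

```javascript
// The only dtypes supported for model inputs and outputs.
const SUPPORTED_DTYPES = new Set(['int32', 'float32']);

// Hypothetical helper: list tensors whose dtype is not supported for model I/O.
function findUnsupportedTensors(details) {
  return details
      .filter(({dtype}) => !SUPPORTED_DTYPES.has(dtype))
      .map(({name, dtype}) => ({name, dtype}));
}

// Possible usage once a model is loaded:
//   const problems = findUnsupportedTensors(
//       [...model.getInputDetails(), ...model.getOutputDetails()]);
//   if (problems.length > 0) console.warn('Unsupported I/O tensors:', problems);
```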
Model Too Large
This usually appears as a call to Aborted() or a memory allocation failure at
model-loading time. LiteRT.js is limited in the size of models it can load, so
if you're seeing this, your model may be too large. You can try quantizing the
weights with the
ai-edge-quantizer, but
keep computations at float32 or float16, and model inputs and outputs as float32
or int32.
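A quick estimate shows how much weight quantization can help. The following is a rough sketch: estimateWeightMiB is a hypothetical helper that counts weight storage only and ignores file overhead and runtime buffers, and the ResNet18 parameter count is the figure for the torchvision model.

```javascript
// Hypothetical helper: approximate weight storage in MiB.
function estimateWeightMiB(parameterCount, bytesPerWeight) {
  return (parameterCount * bytesPerWeight) / (1024 * 1024);
}

// torchvision's ResNet18 has about 11.7 million parameters.
const RESNET18_PARAMETERS = 11689512;

const float32MiB = estimateWeightMiB(RESNET18_PARAMETERS, 4);  // Roughly 45 MiB.
const int8MiB = estimateWeightMiB(RESNET18_PARAMETERS, 1);     // Roughly 11 MiB.
console.log(`float32: ${float32MiB.toFixed(1)} MiB, int8: ${int8MiB.toFixed(1)} MiB`);
```

The actual .tflite file will differ somewhat, but the ratio illustrates why 8-bit weights cut the size to about a quarter of the float32 model.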