Attention: This MediaPipe Solutions Preview is an early release.

LLM Inference guide for iOS

The LLM Inference API lets you run large language models (LLMs) completely on-device in iOS applications. You can use it to perform a wide range of tasks, such as generating text, retrieving information in natural language form, and summarizing documents. The task provides built-in support for multiple text-to-text large language models, so you can apply the latest on-device generative AI models to your iOS apps.

You can see this task in action with the MediaPipe Studio demo. For more information about the capabilities, models, and configuration options of this task, see the Overview.

Code example

The MediaPipe Tasks example code is a basic implementation of an LLM Inference API app for iOS. You can use the app as a starting point for your own iOS app, or refer to it when modifying an existing app. The LLM Inference API example code is hosted on GitHub.

Download the code

The following instructions show you how to create a local copy of the example code using the git command line tool.

To download the example code:

Clone the git repository using the following command:

git clone https://github.com/googlesamples/mediapipe

Optionally, configure your git instance to use sparse checkout, so you have only the files for the LLM Inference API example app:
```
cd mediapipe
git sparse-checkout init --cone
git sparse-checkout set examples/llm_inference/ios/
```

After creating a local version of the example code, you can install the MediaPipe task library, open the project using Xcode and run the app. For instructions, see the Setup Guide for iOS.

Setup

This section describes key steps for setting up your development environment and code projects to use the LLM Inference API. For general information on setting up your development environment for using MediaPipe tasks, including platform version requirements, see the Setup Guide for iOS.

Dependencies

The LLM Inference API uses the MediaPipeTasksGenAI library, which must be installed using CocoaPods. The library is compatible with both Swift and Objective-C apps and does not require any additional language-specific setup.

For instructions to install CocoaPods on macOS, refer to the CocoaPods installation guide. For instructions on how to create a Podfile with the necessary pods for your app, refer to Using CocoaPods.

Add the MediaPipeTasksGenAI and MediaPipeTasksGenAIC pods to the Podfile using the following code:

target 'MyLlmInferenceApp' do
  use_frameworks!
  pod 'MediaPipeTasksGenAI'
  pod 'MediaPipeTasksGenAIC'
end

If your app includes unit test targets, refer to the Setup Guide for iOS for additional information on setting up your Podfile.

Model

The MediaPipe LLM Inference API task requires a trained model that is compatible with this task. For more information on available trained models for the LLM Inference API, see the task overview Models section.

Download a model

Download a model and add it to your project directory using Xcode. For instructions on how to add files to your Xcode project, refer to Managing files and folders in your Xcode project.

Download Gemma 2B

When building iOS apps, use one of the following variants:

gemma-2b-it-cpu-int4: Gemma 4-bit model with CPU compatibility.
gemma-2b-it-gpu-int4: Gemma 4-bit model with GPU compatibility.

For more information on other models, see the task overview Models section.

Create the task

You can create the LLM Inference API task by calling one of its initializers. The LlmInference(options:) initializer sets values for the configuration options.

If you don't need customized configuration options, you can use the LlmInference(modelPath:) initializer to create the task with the default options. For more information about configuration options, see Configuration Overview.

The following code demonstrates how to build and configure this task.

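import MediaPipeTasksGenAI

// Locate the model file that was added to the Xcode project.
let modelPath = Bundle.main.path(forResource: "model",
                                      ofType: "bin")

// Configure the task.
let options = LlmInferenceOptions()
options.baseOptions.modelPath = modelPath
options.maxTokens = 1000
options.topk = 40
options.temperature = 0.8
options.randomSeed = 101

// Use a lowercase name so the instance does not shadow the LlmInference type.
let llmInference = try LlmInference(options: options)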
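Because `Bundle.main.path(forResource:ofType:)` returns an optional, it can help to fail early with a clear error when the model file is missing from the bundle. The following is a minimal sketch of that check. The `ModelLookupError` type and the `bundledModelPath` helper are hypothetical names introduced for illustration; they are not part of the MediaPipe API.

```swift
import Foundation

// Hypothetical error type for this sketch; not part of MediaPipe.
enum ModelLookupError: Error {
    case notFound(name: String, ext: String)
}

// Returns the path of a bundled model file, or throws if it is missing.
func bundledModelPath(named name: String,
                      ofType ext: String,
                      in bundle: Bundle = .main) throws -> String {
    guard let path = bundle.path(forResource: name, ofType: ext) else {
        throw ModelLookupError.notFound(name: name, ext: ext)
    }
    return path
}
```

The returned path is a non-optional `String`, so a call such as `try bundledModelPath(named: "model", ofType: "bin")` could replace the direct `Bundle.main.path` lookup before the options are configured.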

Configuration options

This task has the following configuration options for iOS apps:

| Option Name | Description | Value Range | Default Value |
|---|---|---|---|
| `modelPath` | The path to where the model is stored within the project directory. | PATH | N/A |
| `maxTokens` | The maximum number of tokens (input tokens + output tokens) the model handles. | Integer | 512 |
| `topk` | The number of tokens the model considers at each step of generation. Limits predictions to the top k most-probable tokens. When setting `topk`, you must also set a value for `randomSeed`. | Integer | 40 |
| `temperature` | The amount of randomness introduced during generation. A higher temperature results in more creativity in the generated text, while a lower temperature produces more predictable generation. When setting `temperature`, you must also set a value for `randomSeed`. | Float | 0.8 |
| `randomSeed` | The random seed used during text generation. | Integer | 0 |
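To make the roles of `topk`, `temperature`, and `randomSeed` more concrete, the following sketch shows top-k sampling with temperature in its general textbook form. It illustrates the idea only and is not MediaPipe's internal implementation. The `SeededGenerator` type (a SplitMix64 generator) and the `sampleToken` function are hypothetical names introduced for this example.

```swift
import Foundation

// Small deterministic generator (SplitMix64) so that a fixed seed
// reproduces the same choices. Hypothetical helper for this sketch.
struct SeededGenerator: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E3779B97F4A7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58476D1CE4E5B9
        z = (z ^ (z >> 27)) &* 0x94D049BB133111EB
        return z ^ (z >> 31)
    }
}

// Picks one token index from `logits` using top-k sampling with temperature.
func sampleToken(logits: [Double], topK: Int, temperature: Double,
                 using rng: inout SeededGenerator) -> Int {
    precondition(!logits.isEmpty && topK > 0 && temperature > 0)
    // Keep the indices of the k highest logits.
    let candidates = logits.indices
        .sorted { logits[$0] > logits[$1] }
        .prefix(topK)
    // Softmax with temperature over the surviving candidates.
    // Subtracting the largest logit keeps exp() numerically stable.
    let maxLogit = logits[candidates.first!]
    let weights = candidates.map { exp((logits[$0] - maxLogit) / temperature) }
    let total = weights.reduce(0, +)
    // Draw one candidate in proportion to its weight.
    var threshold = Double.random(in: 0..<total, using: &rng)
    for (index, weight) in zip(candidates, weights) {
        if threshold < weight { return index }
        threshold -= weight
    }
    return candidates.last!
}
```

In this sketch, a smaller `topK` narrows the set of candidates, a lower `temperature` concentrates probability on the highest-scoring candidate, and reusing the same seed reproduces the same sequence of choices.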

Prepare data

The LLM Inference API works with text data. The task handles the data input preprocessing, including tokenization and tensor preprocessing.

All preprocessing is handled within the generateResponse(inputText:) function. There is no need for additional preprocessing of the input text beforehand.

let inputPrompt = "Compose an email to remind Brett of lunch plans at noon on Saturday."
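Because `maxTokens` counts input tokens and output tokens together, a longer prompt leaves less room for the response. The arithmetic can be sketched as follows. The `outputTokenBudget` helper is a hypothetical name used for illustration, and the input token count is assumed to be known; obtaining it depends on the model's tokenizer and is not shown here.

```swift
// Hypothetical helper, not part of the MediaPipe API.
// Returns how many tokens remain for the response once the prompt's
// tokens are subtracted from the combined limit.
func outputTokenBudget(maxTokens: Int, inputTokens: Int) -> Int {
    return max(0, maxTokens - inputTokens)
}
```

For example, with `maxTokens` set to 1000 and a prompt that tokenizes to 200 tokens, at most 800 tokens remain for the generated response.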

Run the task

To run the LLM Inference API, use the generateResponse(inputText:) method. The method returns the generated response text for the input prompt.

// `llmInference` is the LlmInference instance created in "Create the task".
let result = try llmInference.generateResponse(inputText: inputPrompt)

To stream the response, use the generateResponseAsync(inputText:) method.

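// `llmInference` is the LlmInference instance created in "Create the task".
let resultStream = llmInference.generateResponseAsync(inputText: inputPrompt)

do {
  // Print each partial result as soon as it arrives.
  for try await partialResult in resultStream {
    print("\(partialResult)")
  }
  print("Done")
}
catch {
  print("Response error: '\(error)'")
}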
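If the app needs the complete text once streaming finishes, the partial results can be accumulated as they arrive. The following is a minimal sketch, assuming each partial result is a new chunk of text rather than the cumulative response. The `collectResponse` and `makeDemoStream` functions are hypothetical names; `makeDemoStream` only stands in for the stream that `generateResponseAsync(inputText:)` would provide in an app.

```swift
// Concatenates every chunk of an async sequence of strings into one string.
func collectResponse<S: AsyncSequence>(from stream: S) async throws -> String
where S.Element == String {
    var fullText = ""
    for try await chunk in stream {
        fullText += chunk
    }
    return fullText
}

// Stand-in stream for this sketch. It yields the given chunks in order
// and then finishes.
func makeDemoStream(_ chunks: [String]) -> AsyncThrowingStream<String, Error> {
    AsyncThrowingStream { continuation in
        for chunk in chunks { continuation.yield(chunk) }
        continuation.finish()
    }
}
```

Writing the helper against `AsyncSequence` rather than a concrete stream type keeps it usable with any asynchronous sequence of strings, including the demo stream above.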

Handle and display results

The LLM Inference API returns an LlmInferenceResult, which includes the generated response text. For the example prompt, the response might look like the following:

Here's a draft you can use:

Subject: Lunch on Saturday Reminder

Hi Brett,

Just a quick reminder about our lunch plans this Saturday at noon.
Let me know if that still works for you.

Looking forward to it!

Best,
[Your Name]