Introducing Google AI Edge Portal: Benchmark Edge AI at scale. Sign-up to request access during private preview.

LiteRT-LM API Overview

LiteRT-LM provides APIs for multiple platforms, allowing you to integrate large language models into your applications with ease.

Installation & Dependency Setup

Before writing code, you need to add the LiteRT-LM SDK to your project. Use the following tabs to see how to configure the dependency for your target language.

Python

Install the official API package from PyPI:

# Using pip
pip install litert-lm-api

Kotlin

Add the Maven dependency to your build.gradle file:

dependencies {
    // For Android apps
    implementation("com.google.ai.edge.litertlm:litertlm-android:latest.release")

    // For JVM-based desktop apps (Linux, macOS, Windows)
    implementation("com.google.ai.edge.litertlm:litertlm-jvm:latest.release")
}

Swift

Integrate LiteRT-LM natively in Xcode using Swift Package Manager (SPM):

In Xcode, select File > Add Package Dependencies...
Enter the repository URL: https://github.com/google-ai-edge/LiteRT-LM
Select LiteRTLM and add it to your target.

Or add it to your Package.swift dependencies:

dependencies: [
  .package(url: "https://github.com/google-ai-edge/LiteRT-LM", from: "0.12.0")
]

JavaScript

Install the package from npm,

# Using npm
npm install --save @litert-lm/core

or import it directly from a CDN in your HTML:

<!-- Direct HTML CDN import -->
<script type="module">
  import { Engine } from 'https://cdn.jsdelivr.net/npm/@litert-lm/core/+esm';
</script>

Quick Start Example

All official LiteRT-LM SDKs share a similar workflow:

Initialize the Engine: Load the model weights and configure execution settings (such as hardware acceleration).
Create a Conversation: Manage the history and state of a chat session.
Send Messages: Send prompts to the model and receive responses (either blocking or streaming).

The following is a comparison of how to perform a basic text-generation request across different platforms using the provided SDKs.

Python

import litert_lm

# 1. Initialize the Engine
with litert_lm.Engine("path/to/model.litertlm") as engine:

  # 2. Create a Conversation
  with engine.create_conversation() as conversation:

    # 3. Send a message and get the response
    response = conversation.send_message("What is the capital of France?")
    print(response["content"][0]["text"])

Kotlin

import com.google.ai.edge.litertlm.*

suspend fun main() {
  // 1. Initialize the Engine
  val config = EngineConfig(modelPath = "path/to/model.litertlm")
  Engine(config).use { engine ->
    engine.initialize()

    // 2. Create a Conversation
    engine.createConversation().use { conversation ->

      // 3. Send a message and get the response
      val response = conversation.sendMessage("What is the capital of France?")
      println(response)
    }
  }
}

Swift

import LiteRTLM

// 1. Initialize the Engine
let config = try EngineConfig(
  modelPath: "path/to/model.litertlm",
  backend: .gpu,
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: config)
try await engine.initialize()

// 2. Create a Conversation
let conversation = try await engine.createConversation()

// 3. Send a message and get the response
let response = try await conversation.sendMessage(Message("What is the capital of France?"))
print(response.toString)

JavaScript

import { Engine } from '@litert-lm/core';

// 1. Initialize the Engine
const engine = await Engine.create({
  model: 'url/path/to/model.litertlm'
});

// 2. Create a Conversation
const conversation = await engine.createConversation();

// 3. Send a message and get the response
const response = await conversation.sendMessage("What is the capital of France?");
console.log(response.content[0].text);

// Clean up
await engine.delete();

Hardware Acceleration (GPU)

You can run LiteRT-LM models on your device's GPU to accelerate inference speeds. The following is how to configure the engine to run on the GPU.

Python

import litert_lm

# Set backend to GPU
with litert_lm.Engine("path/to/model.litertlm", backend=litert_lm.Backend.GPU()) as engine:
    with engine.create_conversation() as conversation:
        response = conversation.send_message("Hello!")
        print(response["content"][0]["text"])

Kotlin

import com.google.ai.edge.litertlm.*

suspend fun main() {
  // Set backend to GPU in EngineConfig
  val config = EngineConfig(
      modelPath = "path/to/model.litertlm",
      backend = Backend.GPU()
  )
  Engine(config).use { engine ->
    engine.initialize()
    engine.createConversation().use { conversation ->
      val response = conversation.sendMessage("Hello!")
      println(response)
    }
  }
}

Swift

import LiteRTLM

// Set backend to .gpu in EngineConfig
let config = try EngineConfig(
  modelPath: "path/to/model.litertlm",
  backend: .gpu(),
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: config)
try await engine.initialize()

let conversation = try await engine.createConversation()
let response = try await conversation.sendMessage(Message("Hello!"))
print(response.toString)

JavaScript

WebGPU acceleration is enabled by default in the Web SDK and does not require any manual backend configuration:

import { Engine } from '@litert-lm/core';

// WebGPU is used automatically
const engine = await Engine.create({
  model: 'url/path/to/model.litertlm'
});

const conversation = await engine.createConversation();
const response = await conversation.sendMessage("Hello!");
console.log(response.content[0].text);

Multi-Modality (Images)

LiteRT-LM supports multimodal input (such as image attachments) when using compatible models.

Python

import litert_lm

# Initialize engine with a vision backend
with litert_lm.Engine(
    "path/to/multimodal_model.litertlm",
    vision_backend=litert_lm.Backend.GPU()
) as engine:
    with engine.create_conversation() as conversation:
        # Send image attachment alongside text prompt
        response = conversation.send_message(
            litert_lm.Contents.of(
                "Describe this image.",
                litert_lm.Content.ImageFile(absolute_path="path/to/image.jpg")
            )
        )
        print(response["content"][0]["text"])

Kotlin

import com.google.ai.edge.litertlm.*

suspend fun main() {
  // Initialize engine with a vision backend
  val config = EngineConfig(
      modelPath = "path/to/multimodal_model.litertlm",
      visionBackend = Backend.GPU()
  )
  Engine(config).use { engine ->
    engine.initialize()
    engine.createConversation().use { conversation ->
      // Send image attachment alongside text prompt
      val response = conversation.sendMessage(Contents.of(
          Content.Text("Describe this image."),
          Content.ImageFile("path/to/image.jpg"),
      ))
      println(response)
    }
  }
}

Swift

import LiteRTLM

// Initialize engine with a vision backend
let config = try EngineConfig(
  modelPath: "path/to/multimodal_model.litertlm",
  visionBackend: .cpu(), // Or .gpu() if supported
  cacheDir: NSTemporaryDirectory()
)
let engine = Engine(engineConfig: config)
try await engine.initialize()

let conversation = try await engine.createConversation()

// Send image attachment alongside text prompt
let message = Message(contents: [
  Content.text("Describe this image."),
  Content.imageFile("path/to/image.jpg")
])
let response = try await conversation.sendMessage(message)
print(response.toString)

JavaScript

Function Calling (Tool Use)

Function calling allows the language model to request the execution of client-side code tools to perform actions or retrieve information.

Python

import litert_lm

# 1. Define a tool function
def get_current_time() -> str:
    """Returns the current time."""
    return "12:00 PM"

# 2. Register the tool with the conversation
tools = [get_current_time]
with engine.create_conversation(tools=tools) as conversation:
    # The model will invoke the tool automatically if it needs it
    response = conversation.send_message("What time is it?")
    print(response["content"][0]["text"])

Kotlin

import com.google.ai.edge.litertlm.*

// 1. Define a ToolSet
class MyToolSet: ToolSet {
    @Tool(description = "Get the current time")
    fun getCurrentTime(): String {
        return "12:00 PM"
    }
}

suspend fun main() {
  // 2. Register the tool with ConversationConfig
  val config = ConversationConfig(
      tools = listOf(tool(MyToolSet()))
  )
  val conversation = engine.createConversation(config)

  // The model will invoke the tool automatically if it needs it
  val response = conversation.sendMessage("What time is it?")
  println(response)
}

Swift

import LiteRTLM

// 1. Define a Tool conforming to the Tool protocol
struct GetCurrentTimeTool: Tool {
  static let name = "get_current_time"
  static let description = "Get the current time."

  func run() async throws -> Any {
    return ["time": "12:00 PM"]
  }
}

// 2. Register the tool in ConversationConfig
let config = ConversationConfig(
  tools: [GetCurrentTimeTool()]
)
let conversation = try await engine.createConversation(with: config)

// The model will invoke the tool automatically if it needs it
let response = try await conversation.sendMessage(Message("What time is it?"))
print(response.toString)

JavaScript

Next Steps

Explore the detailed integration guides for your target platform:

For Flutter development, check out the community-maintained flutter_gemma package.

LiteRT-LM API Overview Stay organized with collections Save and categorize content based on your preferences.

Installation & Dependency Setup

Python

Kotlin

Swift

JavaScript

Quick Start Example

Python

Kotlin

Swift

JavaScript

Hardware Acceleration (GPU)

Python

Kotlin

Swift

JavaScript

Multi-Modality (Images)

Python

Kotlin

Swift

JavaScript

Function Calling (Tool Use)

Python

Kotlin

Swift

JavaScript

Next Steps

LiteRT-LM API Overview