Within ML Kit's GenAI Speech Recognition API, you can transcribe audio content to text. This API supports the following modes:
- Basic: The Speech Recognition API uses the traditional on-device speech recognition model, similar to the SpeechRecognizer API.
  - Generally available on most Android devices with API level 31 and higher.
- Advanced: The Speech Recognition API uses the GenAI model, which provides broader language coverage and better overall quality.
  - Available on Pixel 10 devices, with support for more devices in development.
Key capabilities
- Captures streaming input from a microphone or an audio file
- Provides transcribed text as a continuous stream; results may initially be partial (and subject to change) before becoming final
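The partial-then-final streaming pattern can be sketched in plain Kotlin. Note that `TranscriptionUpdate` and `buildTranscript` below are hypothetical stand-ins, not ML Kit types; the real API delivers its own response objects through a Kotlin flow.

```kotlin
// Hypothetical stand-in for a streaming response: each emission carries
// the transcript so far and whether it is final.
data class TranscriptionUpdate(val text: String, val isFinal: Boolean)

// Show the latest partial text as tentative UI output, but only append
// final segments to the saved transcript.
fun buildTranscript(updates: List<TranscriptionUpdate>): String {
    val finalSegments = mutableListOf<String>()
    for (update in updates) {
        if (update.isFinal) {
            finalSegments.add(update.text)
        }
        // else: display update.text as provisional text that may change
    }
    return finalSegments.joinToString(" ")
}
```

A partial emission such as "this is" followed by a final "this is a short message" would yield only the final text in the saved transcript.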
Example results
| Audio | Mode | Locale | Transcription |
|---|---|---|---|
| audio_1 | Basic | en-US | "This is a short message" |
| audio_2 | Advanced | es-ES | "Este es un mensaje corto." |
Comparison with the platform Speech Recognition API
When using Basic mode, the ML Kit Speech Recognition API offers core functionality similar to the platform SpeechRecognizer API. A key advantage of ML Kit is its broader platform coverage: Basic mode requires only API level 31 or higher, which is wider support than some platform APIs offer.
In addition, the ML Kit Speech Recognition API can use the on-device Gemini model in Advanced mode, providing broader language coverage.
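One way to act on this difference is a simple fallback: prefer Advanced where the device supports it, and use Basic otherwise. The enum and boolean check below are placeholders for illustration, not ML Kit types; in practice the API's own status check determines what is available.

```kotlin
// Placeholder for the two modes described above.
enum class RecognitionMode { BASIC, ADVANCED }

// Prefer the Advanced (GenAI) mode where supported (currently Pixel 10),
// otherwise fall back to the widely available Basic on-device model.
fun chooseMode(advancedSupported: Boolean): RecognitionMode =
    if (advancedSupported) RecognitionMode.ADVANCED else RecognitionMode.BASIC
```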
Get started
Add the ML Kit Speech Recognition API as a dependency in your build.gradle configuration:

```kotlin
implementation("com.google.mlkit:genai-speech-recognition:1.0.0-alpha1")
```
To integrate the Speech Recognition API into your app, create a
SpeechRecognizer client. Check the status of the necessary on-device model
features and download the model if it isn't already on the device. After
preparing your audio input in a SpeechRecognizerRequest, run inference using
the client to receive streaming output from the Kotlin flow. Finally,
remember to close the client to release resources.
```kotlin
// 1. Create a SpeechRecognizer with desired options.
val options: SpeechRecognizerOptions =
    speechRecognizerOptions {
        locale = Locale.US
        preferredMode = SpeechRecognizerOptions.Mode.MODE_ADVANCED
    }
val speechRecognizer: SpeechRecognizer = SpeechRecognition.getClient(options)

// 2. Check if the recognition model is available or needs downloading.
launch {
    val status: Int = speechRecognizer.checkStatus()
    if (status == FeatureStatus.DOWNLOADABLE) {
        // 3. If needed, download the model and monitor progress.
        speechRecognizer.download.collect { downloadStatus ->
            when (downloadStatus) {
                is DownloadStatus.DownloadCompleted -> {
                    // Model is ready, start recognition.
                    startMyRecognition(speechRecognizer)
                }
                is DownloadStatus.DownloadFailed -> {
                    // Handle download failure (e.g., inform the user).
                }
                is DownloadStatus.DownloadProgress -> {
                    // Handle download progress (e.g., update a progress bar).
                }
            }
        }
    } else if (status == FeatureStatus.AVAILABLE) {
        // Model is already ready, start recognition immediately.
        startMyRecognition(speechRecognizer)
    } else {
        // Handle other statuses (e.g., DOWNLOADING, UNAVAILABLE).
    }
}

// 4. Define your recognition logic using a suspend function.
suspend fun startMyRecognition(recognizer: SpeechRecognizer) {
    // Create a request (e.g., specifying audio source).
    val request: SpeechRecognizerRequest =
        speechRecognizerRequest { audioSource = AudioSource.fromMic() }
    // Start recognition and process the continuous stream of responses.
    recognizer.startRecognition(request).collect {
        // Process the SpeechRecognitionResponse data here.
    }
}

// 5. Stop recognition and clean up resources when the session is complete.
launch {
    speechRecognizer.stopRecognition()
    speechRecognizer.close()
}
```
Supported languages and devices
| Mode | Locales |
|---|---|
| Basic | en-US, fr-FR (beta), it-IT (beta), de-DE (beta), es-ES (beta), hi-IN (beta), ja-JP (beta), pt-BR (beta), tr-TR (beta), pl-PL (beta), cmn-Hans-CN (beta), ko-KR (beta), cmn-Hant-TW (beta), ru-RU (beta), vi-VN (beta) |
| Advanced | Locales that typically have high accuracy: en-US, ko-KR, es-ES, fr-FR, de-DE, it-IT, pt-PT, cmn-Hans-CN, cmn-Hant-TW, ja-JP, th-TH, ru-RU, nl-NL (beta), da-DK (beta), sv-SE (beta), pl-PL (beta), hi-IN (beta), vi-VN (beta), id-ID (beta), ar-SA (beta), tr-TR (beta) |
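For a quick client-side sanity check, the table above can be turned into a lookup. The map below holds only an illustrative subset of the listed locales; the authoritative check is always the API's own availability query at runtime.

```kotlin
// Illustrative subset of the supported-locales table, keyed by mode.
// Not an ML Kit API; the runtime status check is the source of truth.
val supportedLocales: Map<String, Set<String>> = mapOf(
    "Basic" to setOf("en-US", "fr-FR", "it-IT", "de-DE", "es-ES"),
    "Advanced" to setOf("en-US", "ko-KR", "es-ES", "fr-FR", "de-DE", "it-IT")
)

// Returns true if the given BCP-47 tag appears in the table for the mode.
fun isSupported(mode: String, localeTag: String): Boolean =
    supportedLocales[mode]?.contains(localeTag) ?: false
```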
Supported devices
| Mode | Supported Devices |
|---|---|
| Basic | Android devices using API level 31 and higher. |
| Advanced | Pixel 10 |
Common setup issues
ML Kit GenAI APIs rely on the Android AICore app to access Gemini Nano. When a device has just been set up (or reset), or the AICore app has just been reset (e.g., its data cleared, or it was uninstalled and reinstalled), the AICore app may not have had enough time to finish initialization (including downloading the latest configurations from the server). As a result, the ML Kit GenAI APIs may not function as expected. Here are the common setup error messages you may see and how to handle them:
| Example error message | How to handle |
|---|---|
| AICore failed with error type 4-CONNECTION_ERROR and error code 601-BINDING_FAILURE: AICore service failed to bind. | This can happen when you install an app that uses ML Kit GenAI APIs immediately after device setup, or when AICore is uninstalled after your app is installed. Updating the AICore app and then reinstalling your app should fix it. |
| AICore failed with error type 3-PREPARATION_ERROR and error code 606-FEATURE_NOT_FOUND: Feature ... is not available. | This can happen when AICore hasn't finished downloading the latest configurations. When the device is connected to the internet, the update usually takes a few minutes to a few hours; restarting the device can speed it up. Note that this error also appears if the device's bootloader is unlocked; this API does not support devices with unlocked bootloaders. |
| AICore failed with error type 1-DOWNLOAD_ERROR and error code 0-UNKNOWN: Feature ... failed with failure status 0 and error esz: UNAVAILABLE: Unable to resolve host ... | Keep the network connection, wait a few minutes, and retry. |
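For transient errors like the UNAVAILABLE network failure above, the "wait and retry" advice can be wrapped in a small helper. This is a generic sketch, not an ML Kit API, and the attempt count is illustrative:

```kotlin
// Generic retry helper for transient setup errors: run the block up to
// `attempts` times, returning the first non-null result, or null if all
// attempts fail. The attempt count is a placeholder, not from the docs.
fun <T> retry(attempts: Int, block: (attempt: Int) -> T?): T? {
    for (attempt in 1..attempts) {
        val result = block(attempt)
        if (result != null) return result
        // In a real app, back off between attempts (e.g., a coroutine
        // delay) before retrying the model status check or download.
    }
    return null
}
```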
Sample code
- Explore the ML Kit Speech Recognition API code sample on GitHub