Speech to text transcription with the Cloud Speech-to-Text API

The Cloud Speech API lets you do speech to text transcription from audio files in over 80 languages.

In this lab, we will send a pre-recorded audio file to the Cloud Speech API for transcription.

What you'll learn

  • Creating a Speech API request and calling the API with curl
  • Calling the Speech API with audio files in different languages

What you'll need

  • A Google Cloud Platform Project
  • A browser, such as Chrome or Firefox

Self-paced environment setup

If you don't already have a Google Account (Gmail or Google Apps), you must create one. Sign in to the Google Cloud Platform console (console.cloud.google.com) and create a new project.

Remember the project ID, a name that is unique across all Google Cloud projects, so a name already taken by someone else will not work for you. It will be referred to later in this codelab as PROJECT_ID.

Next, you'll need to enable billing in the Cloud Console in order to use Google Cloud resources.

Running through this codelab shouldn't cost you more than a few dollars, but it could be more if you decide to use more resources or if you leave them running (see "cleanup" section at the end of this document).

New users of Google Cloud Platform are eligible for a $300 free trial.

Click on the menu icon in the top left of the screen.

Select the APIs and Services dashboard from the drop down.

Click on Enable APIs and Services.

Then, search for "speech" in the search box and click on Google Cloud Speech API.

Click Enable to enable the Cloud Speech API.

Wait a few seconds for the API to be enabled.

Google Cloud Shell is a command line environment running in the Cloud. This Debian-based virtual machine is loaded with all the development tools you'll need (gcloud, bq, git and others) and offers a persistent 5GB home directory. We'll use Cloud Shell to create our request to the Speech API.

To get started with Cloud Shell, click the "Activate Google Cloud Shell" icon in the top right-hand corner of the header bar.

A Cloud Shell session opens inside a new frame at the bottom of the console and displays a command-line prompt. Wait until the user@project:~$ prompt appears.

Since we'll be using curl to send a request to the Speech API, we'll need to generate an API key to pass in our request URL. To create an API key, navigate to the APIs & Services > Credentials section of your project dashboard:

Then click Create credentials:

In the drop down menu, select API key:

Next, copy the key you just generated and select Close (don't restrict the key).

Now that you have an API key, save it to an environment variable so that you don't have to insert the key's value in each request. You can do this in Cloud Shell. Be sure to replace <YOUR_API_KEY> with the key you just copied.

export API_KEY=<YOUR_API_KEY>
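
If you later script these calls rather than typing them, the same environment variable can be read from code. The following is a minimal Python sketch and not part of the original lab: the helper name recognize_url is hypothetical, and the endpoint is the v1 recognize URL used by the curl command in this lab.

```python
from urllib.parse import urlencode

# v1 recognize endpoint, as used by the curl command in this lab.
SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def recognize_url(api_key: str) -> str:
    """Return the request URL with the API key as a query parameter.

    Hypothetical helper for illustration; it is not part of any Google library.
    """
    if not api_key:
        raise ValueError("API key is empty; run `export API_KEY=...` first")
    # urlencode escapes any characters that are not safe in a query string.
    return f"{SPEECH_ENDPOINT}?{urlencode({'key': api_key})}"
```

In a script you would typically call recognize_url(os.environ["API_KEY"]) after importing os, so the key itself never appears in the source file.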

You can build your request to the Speech API in a request.json file. To create and edit this file, you can use your preferred command-line editor (nano, vim, emacs) or the built-in web editor in Cloud Shell.

Create the file in your home directory so that you can easily reference it, and add the following to it:

request.json

{
  "config": {
      "encoding":"FLAC",
      "languageCode": "en-US"
  },
  "audio": {
      "uri":"gs://cloud-samples-tests/speech/brooklyn.flac"
  }
}

The request body has a config object and an audio object. In config, we tell the Speech API how to process the request. The encoding parameter tells the API which audio encoding is used by the file you're sending. FLAC is the encoding of the .flac sample file used here (see the documentation for encoding type for more details). languageCode is a BCP-47 language tag such as en-US, and the v1 API used in this lab requires it. For FLAC and WAV files the encoding can be omitted, because the file header identifies it, but it is required for other audio formats. There are other parameters you can add to your config object as well.

In the audio object, you can pass the API either the URI of your audio file in Cloud Storage or the base64-encoded audio as a string. Here we're using a Cloud Storage URI. The next step is calling the Speech API!
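
The two ways of supplying audio can be sketched in Python as follows. This is an illustrative sketch, not code from the lab: build_request is a hypothetical helper, and it assumes the v1 field names uri and content inside the audio object.

```python
import base64
from typing import Optional


def build_request(language_code: str, encoding: str = "FLAC",
                  uri: Optional[str] = None,
                  content: Optional[bytes] = None) -> dict:
    """Build a recognize request body from a Cloud Storage URI or raw bytes.

    Hypothetical helper for illustration only.
    """
    # Exactly one audio source must be given.
    if (uri is None) == (content is None):
        raise ValueError("pass exactly one of uri or content")
    if uri is not None:
        if not uri.startswith("gs://"):
            raise ValueError("uri must be a Cloud Storage URI (gs://...)")
        audio = {"uri": uri}
    else:
        # JSON cannot carry raw bytes, so inline audio is sent as base64 text.
        audio = {"content": base64.b64encode(content).decode("ascii")}
    return {
        "config": {"encoding": encoding, "languageCode": language_code},
        "audio": audio,
    }
```

Calling build_request("en-US", uri="gs://cloud-samples-tests/speech/brooklyn.flac") and passing the result to json.dumps gives the same body as the request.json file above.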

You can now pass your request body, along with the API key environment variable you saved earlier, to the Speech API with the following curl command (all in one single command line):

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json "https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}"
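
For readers who would rather send the request from a script, here is a rough Python equivalent of the curl command using only the standard library. It is a sketch under assumptions: the function names are hypothetical, and error handling (for example an invalid key, which makes urlopen raise an HTTPError) is left out.

```python
import json
import urllib.request
from urllib.parse import urlencode

SPEECH_ENDPOINT = "https://speech.googleapis.com/v1/speech:recognize"


def make_recognize_request(body: dict, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command sends."""
    url = f"{SPEECH_ENDPOINT}?{urlencode({'key': api_key})}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),          # like --data-binary @request.json
        headers={"Content-Type": "application/json"},   # like -H "Content-Type: ..."
        method="POST",                                  # like -X POST
    )


def recognize(body: dict, api_key: str) -> dict:
    """Send the request and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(make_recognize_request(body, api_key)) as resp:
        return json.load(resp)
```

Keeping request construction separate from sending makes it possible to inspect the URL, headers and body without making a billable API call.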

The response returned by this curl command should look something like the following:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "how old is the Brooklyn Bridge",
          "confidence": 0.98267895
        }
      ]
    }
  ]
}

The transcript value contains the Speech API's text transcription of your audio file, and the confidence value, a number between 0 and 1, indicates how sure the API is that it has accurately transcribed your audio.
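
To pull these two values out of a response programmatically, something like the following Python sketch would do. The helper name top_alternatives is hypothetical. It assumes the first alternative of each result is the most probable one, and that a response may lack the results key entirely when no speech was recognized.

```python
def top_alternatives(response: dict) -> list:
    """Return a (transcript, confidence) pair for the first alternative of each result."""
    pairs = []
    # .get() with a default tolerates a response that has no "results" key.
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            best = alternatives[0]
            pairs.append((best.get("transcript", ""), best.get("confidence")))
    return pairs
```

Given the response shown above parsed with json.loads, this returns a single pair holding the Brooklyn Bridge transcript and its confidence.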

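You'll notice that we called the recognize method in our request above. The Speech API supports both synchronous and asynchronous speech to text transcription. In this example we sent a complete audio file and received the transcript in the same response, which is the synchronous case. For longer audio you can use the longrunningrecognize method, which runs asynchronously and returns an operation that you poll for the result. Streaming transcription while the user is still speaking is also supported, but through the API's gRPC interface rather than a REST method.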
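
The longrunningrecognize method returns an operation rather than a transcript, and the caller checks that operation until it is done. The Python sketch below illustrates that polling loop and is not code from the lab. wait_for_operation and its fetch parameter are hypothetical: fetch stands for any function that performs an authenticated GET on a URL and returns the parsed JSON. The operations URL and the done, error and response fields are assumed to follow the standard long-running operation format.

```python
import time
from typing import Callable

# Assumed v1 URL prefix for checking the status of a long-running operation.
OPERATIONS_URL = "https://speech.googleapis.com/v1/operations/"


def wait_for_operation(name: str, fetch: Callable[[str], dict],
                       interval: float = 1.0, max_polls: int = 60) -> dict:
    """Poll an operation until it is done and return its response payload."""
    url = OPERATIONS_URL + name
    for _ in range(max_polls):
        operation = fetch(url)
        if operation.get("done"):
            # A finished operation carries either an error or a response.
            if "error" in operation:
                raise RuntimeError(f"operation failed: {operation['error']}")
            return operation.get("response", {})
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"operation {name} not done after {max_polls} polls")
```

Passing fetch in as a parameter keeps the loop independent of how the HTTP call and the API key are handled.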

Are you multilingual? The Speech API supports speech to text transcription in over 80 languages! You can change the languageCode parameter in request.json. The Speech API documentation lists the supported languages.

Let's try a French audio file. Change your request.json to the following:

request.json

{
  "config": {
      "encoding":"FLAC",
      "languageCode": "fr"
  },
  "audio": {
      "uri":"gs://speech-language-samples/fr-sample.flac"
  }
}

Run the same curl command again. You should see the following response:

{
  "results": [
    {
      "alternatives": [
        {
          "transcript": "maître corbeau sur un arbre perché tenait en son bec un fromage",
          "confidence": 0.9710122
        }
      ]
    }
  ]
}

This is a sentence from a popular French children's tale. If you've got audio files in another language, you can try adding them to Cloud Storage and changing the languageCode parameter in your request.
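
Switching languages only changes two fields of the request, which the following Python sketch makes explicit. The helper name with_language is hypothetical. It works on a copy, so the original request stays available for reuse.

```python
import copy


def with_language(request: dict, language_code: str, uri: str) -> dict:
    """Return a copy of a request body pointing at other audio in another language."""
    updated = copy.deepcopy(request)  # leave the caller's dict untouched
    updated["config"]["languageCode"] = language_code
    updated["audio"] = {"uri": uri}
    return updated
```

For example, with_language(english_request, "fr", "gs://speech-language-samples/fr-sample.flac") turns the English request from earlier in this lab into the French one, where english_request is the parsed content of the first request.json.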

You've learned how to perform speech to text transcription with the Speech API. In this example you passed the API the Google Cloud Storage URI of your audio file. Alternatively, you can pass a base64 encoded string of your audio content.

What we've covered

  • Passing the Speech API a Google Cloud Storage URI of an audio file
  • Creating a Speech API request and calling the API with curl
  • Calling the Speech API with audio files in different languages

Next steps