The CREATE MODEL statement for remote models over Cloud AI services

This document describes the CREATE MODEL statement for creating remote models in BigQuery over Cloud AI services, such as the Cloud Natural Language API.

CREATE MODEL syntax

{CREATE MODEL | CREATE MODEL IF NOT EXISTS | CREATE OR REPLACE MODEL}
`project_id.dataset.model_name`
REMOTE WITH CONNECTION `project_id.region.connection_id`
OPTIONS(REMOTE_SERVICE_TYPE = remote_service_type
[, DOCUMENT_PROCESSOR = document_processor]
[, SPEECH_RECOGNIZER = speech_recognizer]
);

CREATE MODEL

Creates and trains a new model in the specified dataset. If the model name exists, CREATE MODEL returns an error.

CREATE MODEL IF NOT EXISTS

Creates and trains a new model only if the model doesn't exist in the specified dataset.

CREATE OR REPLACE MODEL

Creates and trains a model and replaces an existing model with the same name in the specified dataset.
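
The three forms differ only in how they treat an existing model with the same name. The following is a minimal sketch rather than a definitive recipe; the project, dataset, model, and connection names are hypothetical placeholders:

```sql
-- Returns an error if myproject.mydataset.mymodel already exists.
CREATE MODEL `myproject.mydataset.mymodel`
REMOTE WITH CONNECTION `myproject.us.my_connection`
OPTIONS(REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');

-- Leaves an existing model untouched; creates the model only if it is missing.
CREATE MODEL IF NOT EXISTS `myproject.mydataset.mymodel`
REMOTE WITH CONNECTION `myproject.us.my_connection`
OPTIONS(REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');

-- Replaces any existing model that has the same name.
CREATE OR REPLACE MODEL `myproject.mydataset.mymodel`
REMOTE WITH CONNECTION `myproject.us.my_connection`
OPTIONS(REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');
```

CREATE MODEL IF NOT EXISTS suits scripts that may run more than once, while CREATE OR REPLACE MODEL suits cases where the model definition itself has changed.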

model_name

The name of the model you're creating or replacing. The model name must be unique in the dataset: no other model or table can have the same name. The model name must follow the same naming rules as a BigQuery table. A model name can:

  • Contain up to 1,024 characters
  • Contain letters (upper or lower case), numbers, and underscores

model_name is not case-sensitive.

If you don't have a default project configured, then you must prepend the project ID to the model name in the following format, including backticks:

`[PROJECT_ID].[DATASET].[MODEL]`

For example, `myproject.mydataset.mymodel`.

REMOTE WITH CONNECTION

Syntax

`[PROJECT_ID].[LOCATION].[CONNECTION_ID]`

BigQuery uses a Cloud resource connection to interact with the Cloud AI service.

The connection elements are as follows:

  • PROJECT_ID: the project ID of the project that contains the connection.
  • LOCATION: the location used by the connection. The connection must be in the same location as the dataset that contains the model.
  • CONNECTION_ID: the connection ID—for example, myconnection.

    To find your connection ID, view the connection details in the Google Cloud console. The connection ID is the value in the last section of the fully qualified connection ID that is shown in Connection ID—for example projects/myproject/locations/connection_location/connections/myconnection.

You need to grant the Vertex AI User role to the connection's service account in the project where you create the model.

Example

`myproject.us.my_connection`

REMOTE_SERVICE_TYPE

Syntax

REMOTE_SERVICE_TYPE = { 'CLOUD_AI_NATURAL_LANGUAGE_V1' | 'CLOUD_AI_TRANSLATE_V3' | 'CLOUD_AI_VISION_V1' | 'CLOUD_AI_DOCUMENT_V1' | 'CLOUD_AI_SPEECH_TO_TEXT_V2' }

Description

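Specifies the service to use to create the model:

  • CLOUD_AI_NATURAL_LANGUAGE_V1: the Cloud Natural Language API
  • CLOUD_AI_TRANSLATE_V3: the Cloud Translation API
  • CLOUD_AI_VISION_V1: the Cloud Vision API
  • CLOUD_AI_DOCUMENT_V1: the Document AI API
  • CLOUD_AI_SPEECH_TO_TEXT_V2: the Speech-to-Text API

After you create a remote model based on a Cloud AI service, you can use the model with one of the following BigQuery ML functions to analyze your BigQuery data:

  • ML.UNDERSTAND_TEXT, for models over the Cloud Natural Language API
  • ML.TRANSLATE, for models over the Cloud Translation API
  • ML.ANNOTATE_IMAGE, for models over the Cloud Vision API
  • ML.PROCESS_DOCUMENT, for models over the Document AI API
  • ML.TRANSCRIBE, for models over the Speech-to-Text API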

Example

REMOTE_SERVICE_TYPE = 'CLOUD_AI_VISION_V1'
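
To illustrate how a remote model is used after it is created, the following sketch builds a model over the Cloud Natural Language API and then calls it with the ML.UNDERSTAND_TEXT function. The model name, the connection name, the table mydataset.reviews, and its review column are hypothetical. The sketch assumes that ML.UNDERSTAND_TEXT expects the input text in a column named text_content; check the function's reference page for the full argument list.

```sql
-- Create a remote model over the Cloud Natural Language API.
CREATE OR REPLACE MODEL `myproject.mydataset.nlp_model`
REMOTE WITH CONNECTION `myproject.us.my_connection`
OPTIONS(REMOTE_SERVICE_TYPE = 'CLOUD_AI_NATURAL_LANGUAGE_V1');

-- Run sentiment analysis over a (hypothetical) table of reviews.
-- The input column is aliased to text_content, the name assumed above.
SELECT *
FROM ML.UNDERSTAND_TEXT(
  MODEL `myproject.mydataset.nlp_model`,
  (SELECT review AS text_content FROM `myproject.mydataset.reviews`),
  STRUCT('analyze_sentiment' AS nlu_option)
);
```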

DOCUMENT_PROCESSOR

This option identifies the document processor to use when the REMOTE_SERVICE_TYPE value is CLOUD_AI_DOCUMENT_V1. You must use this option when creating a remote model over the Document AI API. You can't use this option with any other type of remote model.

A document processor must already exist in Document AI when you specify this option to create the model in BigQuery. You can create a document processor that BigQuery supports in two ways:

  • Select a prebuilt processor from the Specialized section of the Processor Gallery. Supported processors have a description that starts with the word Extract. For example, Invoice Parser, Utility Parser, and W2 Parser are all supported processors. These types of processors extract predefined, domain-specific fields from the documents as output columns.
  • Use Workbench to build a Custom Extractor processor based on a Vertex AI foundation model. You can specify the fields for extraction, and then tune the model with custom documents that contain those fields.

The CREATE MODEL statement fails if you specify an unsupported processor, or if the processor isn't enabled.

The DOCUMENT_PROCESSOR value must be a string in the following format:

projects/PROJECT_NUMBER/locations/LOCATION/processors/PROCESSOR_ID/processorVersions/PROCESSOR_VERSION

Replace the following:

  • PROJECT_NUMBER: the project number of the project that contains the document processor. To find this value, open the processor details and, in the Prediction endpoint field, take the value that follows the projects element. For example, https://us-documentai.googleapis.com/v1/projects/project_number/locations/processor_location/processors/processor_id:process.
  • LOCATION: the location used by the document processor. To find this value, open the processor details and, in the Prediction endpoint field, take the value that follows the locations element. For example, https://us-documentai.googleapis.com/v1/projects/project_number/locations/processor_location/processors/processor_id:process.
  • PROCESSOR_ID: the document processor ID. To find this value, open the processor details and, in the Prediction endpoint field, take the value that follows the processors element. For example, https://us-documentai.googleapis.com/v1/projects/project_number/locations/processor_location/processors/processor_id:process.
  • PROCESSOR_VERSION: the document processor version. You can find this value by looking at the processor details, selecting the Manage Versions tab, and copying the Version ID value of the version you want to use.
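
Putting the pieces of the DOCUMENT_PROCESSOR value together, a statement might look like the following sketch. The project number, processor ID, model name, and connection name are hypothetical, and PROCESSOR_VERSION is left as a placeholder that you replace with the Version ID from the Manage Versions tab:

```sql
-- Remote model over the Document AI API.
-- 123456789012 and abc123def456 are made-up values for illustration only.
-- Replace PROCESSOR_VERSION with the Version ID of the version you want to use.
CREATE OR REPLACE MODEL `myproject.mydataset.invoice_parser`
REMOTE WITH CONNECTION `myproject.us.my_connection`
OPTIONS(
  REMOTE_SERVICE_TYPE = 'CLOUD_AI_DOCUMENT_V1',
  DOCUMENT_PROCESSOR = 'projects/123456789012/locations/us/processors/abc123def456/processorVersions/PROCESSOR_VERSION'
);
```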

SPEECH_RECOGNIZER

This optional setting identifies the speech recognizer to use when the REMOTE_SERVICE_TYPE value is CLOUD_AI_SPEECH_TO_TEXT_V2. If you don't specify this option, you must specify a value for the recognition_config argument of the ML.TRANSCRIBE function when you reference the remote model. You can't use this option with any other type of remote model.

The SPEECH_RECOGNIZER value must be a string in the following format:

projects/PROJECT_NUMBER/locations/LOCATION/recognizers/RECOGNIZER_ID

Replace the following:

  • PROJECT_NUMBER: the project number of the project that contains the speech recognizer. You can find this value on the Project info card in the Dashboard page of the Google Cloud console.
  • LOCATION: the location used by the speech recognizer. You can find this value in the Location field on the List recognizers page of the Google Cloud console.
  • RECOGNIZER_ID: the speech recognizer ID. You can find this value in the ID field on the List recognizers page of the Google Cloud console.
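
As a sketch of how the SPEECH_RECOGNIZER value fits into the statement, the following creates a remote model over the Speech-to-Text API. The project number, recognizer location, recognizer ID, model name, and connection name are all hypothetical:

```sql
-- Remote model over the Speech-to-Text API with a recognizer attached.
-- 123456789012 and my_recognizer are made-up values for illustration only.
CREATE OR REPLACE MODEL `myproject.mydataset.transcriber`
REMOTE WITH CONNECTION `myproject.us.my_connection`
OPTIONS(
  REMOTE_SERVICE_TYPE = 'CLOUD_AI_SPEECH_TO_TEXT_V2',
  SPEECH_RECOGNIZER = 'projects/123456789012/locations/us/recognizers/my_recognizer'
);
```

If you omit SPEECH_RECOGNIZER, the statement keeps only the REMOTE_SERVICE_TYPE option, and the recognition settings are supplied through the recognition_config argument of ML.TRANSCRIBE instead.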

Example

The following example creates a BigQuery ML remote model that uses the Cloud Vision API:

CREATE MODEL `myproject.mydataset.mymodel`
REMOTE WITH CONNECTION `myproject.us.test_connection`
OPTIONS(REMOTE_SERVICE_TYPE = 'CLOUD_AI_VISION_V1');

What's next

For more information about Generative AI in BigQuery ML, see Generative AI overview.