Google Prediction API

Prediction V1.4 / V1.4.1 Reference

Contents

Data Method REST URI (relative to https://www.googleapis.com/prediction/v1.4)
hostedmodels Collection
Predict against a hosted model prediction.hostedmodels.predict POST /hostedmodels/{hostedModelName}/predict
trainedmodels Collection
↳ trainedmodels Resource
Get model information prediction.trainedmodels.get GET /trainedmodels/{id}
Train a new model prediction.trainedmodels.insert POST /trainedmodels
Add streaming training prediction.trainedmodels.update PUT /trainedmodels/{id}
Delete a trained model prediction.trainedmodels.delete DELETE /trainedmodels/{id}
Predict against your own model prediction.trainedmodels.predict POST /trainedmodels/{id}/predict

Standard query parameters

Query parameters that apply to all Google Prediction API operations are shown in the table below.

Notes (on API keys and auth tokens):

  1. The key parameter is required with every request, unless you provide an OAuth 2.0 token with the request.
  2. You must send an authorization token with every request that is marked (AUTHENTICATED). OAuth 2.0 is the preferred authorization protocol.
  3. You can provide an OAuth 2.0 token with any request in one of two ways:
    • Using the access_token query parameter like this: ?access_token=oauth2-token
    • Using the HTTP Authorization header like this: Authorization: Bearer oauth2-token
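The options in the notes above can be sketched in Python with the standard library. This is a minimal illustration, not an official client: the helper name build_request and the placeholder key and token values are assumptions, and the request is only constructed, never sent.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://www.googleapis.com/prediction/v1.4"


def build_request(path, api_key=None, oauth_token=None, token_in_header=True):
    """Build (but do not send) a GET request authorized by API key or OAuth 2.0 token."""
    params = {}
    headers = {}
    if oauth_token is not None:
        if token_in_header:
            # Token passed in the HTTP Authorization header.
            headers["Authorization"] = "Bearer " + oauth_token
        else:
            # Token passed in the access_token query parameter.
            params["access_token"] = oauth_token
    elif api_key is not None:
        # Without a token, the key parameter is required.
        params["key"] = api_key
    else:
        raise ValueError("provide an API key or an OAuth 2.0 token")
    url = BASE_URL + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers=headers)
```

For example, build_request("/trainedmodels/my-model", oauth_token="oauth2-token") yields a request whose Authorization header is "Bearer oauth2-token" and whose URL carries no query string.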

All parameters are optional except where noted.

Parameter Meaning Notes
access_token OAuth 2.0 token for the current user.
callback Callback function.
  • Name of the JavaScript callback function that handles the response.
  • Used in JavaScript JSON-P requests.
fields Selector specifying a subset of fields to include in the response.
  • For more information, see the partial response section in the Performance Tips document.
  • Use for better performance.
key API key. (REQUIRED*)
  • *Required unless you provide an OAuth 2.0 token.
  • Your API key identifies your project and provides you with API access, quota, and reports.
  • Obtain your project's API key from the Google Developers Console.
prettyPrint Returns response with indentations and line breaks.
  • Returns the response in a human-readable format if true.
  • Default value: true.
  • When this is false, it can reduce the response payload size, which might lead to better performance in some environments.
quotaUser Alternative to userIp.
  • Lets you enforce per-user quotas from a server-side application even in cases when the user's IP address is unknown. This can occur, for example, with applications that run cron jobs on App Engine on a user's behalf.
  • You can choose any arbitrary string that uniquely identifies a user, but it is limited to 40 characters.
  • Overrides userIp if both are provided.
  • Learn more about capping usage.
userIp IP address of the end user for whom the API call is being made.
  • Lets you enforce per-user quotas when calling the API from a server-side application.
  • Learn more about capping usage.
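As a rough sketch of how the parameters in this table combine into a query string, the Python helper below assembles only the parameters that are supplied. The function name standard_query is hypothetical, and sending quotaUser alone when both quotaUser and userIp are given is a simplification based on the note that quotaUser overrides userIp.

```python
from urllib.parse import urlencode

MAX_QUOTA_USER_LENGTH = 40  # limit stated in the quotaUser notes


def standard_query(key=None, fields=None, pretty_print=None,
                   quota_user=None, user_ip=None):
    """Return a query string containing only the parameters that were supplied."""
    if quota_user is not None and len(quota_user) > MAX_QUOTA_USER_LENGTH:
        raise ValueError("quotaUser is limited to 40 characters")
    params = {}
    if key is not None:
        params["key"] = key
    if fields is not None:
        params["fields"] = fields
    if pretty_print is not None:
        # The API default is true; false trims whitespace from the payload.
        params["prettyPrint"] = "true" if pretty_print else "false"
    if quota_user is not None:
        # quotaUser overrides userIp, so userIp is dropped when both are given.
        params["quotaUser"] = quota_user
    elif user_ip is not None:
        params["userIp"] = user_ip
    return urlencode(params)
```

Calling standard_query(key="KEY", pretty_print=False) returns "key=KEY&prettyPrint=false".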

hostedmodels Collection

hostedmodels

The hostedmodels collection is a collection of publicly available trained models. These models can be free, but most have a usage fee associated with them, as described in their documentation. See a list of hosted models in the hosted model gallery.

Using hosted models is convenient when you don't have the time, resources, or expertise to build a model for a specific topic. If you have a model that you'd like to make public, follow the submission links in the hosted model gallery.

Sending a prediction request against a hosted model is nearly the same as sending a prediction against any other model; the only difference is the request URL.

prediction.hostedmodels.predict

Run a prediction request against a hosted model.

Request

POST https://www.googleapis.com/prediction/v1.4/hostedmodels/{hostedModelName}/predict
{
  "input":{
    "csvInstance":[ col1_value, col2_value, ... ]
  }
}
Property Name Value Description
col1_value, col2_value, ... Array of string or number An array of entity features, as described by the hosted model's documentation. Note that string fields must be surrounded by escaped quotes. The array can be a mix of string and number columns.
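One way to produce this request in Python is to let the json module serialize the feature array, which quotes string columns and leaves numbers bare. The sketch below only builds the URL and body; the helper name hosted_predict_request and the model name in the usage note are hypothetical.

```python
import json
from urllib.parse import quote

BASE_URL = "https://www.googleapis.com/prediction/v1.4"


def hosted_predict_request(hosted_model_name, features):
    """Return the (url, body) pair for prediction.hostedmodels.predict."""
    url = "{}/hostedmodels/{}/predict".format(
        BASE_URL, quote(hosted_model_name, safe=""))
    # csvInstance may mix string and number columns.
    body = json.dumps({"input": {"csvInstance": list(features)}})
    return url, body
```

For a hypothetical model named "sample.sentiment", hosted_predict_request("sample.sentiment", ["good day", 3]) returns the body {"input": {"csvInstance": ["good day", 3]}}.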


Response

{
  "kind": "prediction#output",
  "id": string,
  "selfLink": string,
  "outputLabel": string,
  "outputMulti": [
    {
      "label": string,
      "score": double
    }
  ],
  "outputValue": double
}
Property Name Value Description
kind string What kind of resource this is.
id string The unique name for the predictive model.
selfLink string A URL to re-request this resource.
outputLabel string [Categorical models only] The most likely class label.
outputMulti[] list [Categorical models only] A list of class labels with their estimated scores.
outputMulti[].label string The class label.
outputMulti[].score double A score for this class label. A few notes on the scores:
  • Score values range from 0.0–1.0, with 1.0 being the highest. All values should add up to 1.0. Note: if you used an earlier version of the API that used a different range, you must retrain your data model in order for scores to be scaled to 0.0–1.0.
  • Consider having a cutoff value above which the categorization is useful and below which you might ignore it. We can't advise a hard cutoff value; instead, try running a few queries for borderline items, and use that as an approximate cutoff value for your categories.
  • These values are not probabilities; that is, they are not the confidence that a rating is correct. They are a measure of how closely a category seems to conform to the query item.
  • It is hard to say absolutely what is a significant difference in scores. For example, is 0.42 "significantly" better than 0.33? Is 0.25 "twice as good" as 0.125? Instead, assume that the highest value is the best fit, and have a cutoff value below which you won't use the data. You'll have to experiment with the system to determine what is a meaningful cutoff value for your data.
outputValue double [Regression models only] The estimated regression value.
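The cutoff advice in the score notes can be sketched as follows. This is an illustrative helper, not part of the API: the name pick_label is hypothetical, and the cutoff is whatever value your own experiments suggest.

```python
def pick_label(prediction, cutoff):
    """Return the best-scoring label, or None if its score is below the cutoff.

    prediction is the parsed JSON response of a categorical model.
    """
    scores = prediction.get("outputMulti", [])
    if not scores:
        return None
    # Treat the highest score as the best fit.
    best = max(scores, key=lambda entry: entry["score"])
    return best["label"] if best["score"] >= cutoff else None
```

With scores of 0.6 for "French", 0.3 for "English", and 0.1 for "Spanish", a cutoff of 0.5 returns "French" and a cutoff of 0.7 returns None.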

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

trainedmodels Collection

trainedmodels

The trainedmodels collection contains the models that you have trained and own.

trainedmodels Resource

Represents a trained model.

{
  "kind": "prediction#training",
  "id": string,
  "storageDataLocation": string,
  "storagePMMLLocation": string,
  "selfLink": string,
  "utility": [
    {
      any value: double
    }
  ],
  "modelInfo": {
    "numberInstances": long,
    "modelType": string,
    "numberLabels": long,
    "classificationAccuracy": double,
    "classWeightedAccuracy": double,
    "confusionMatrix": {
        any value: {
            any value: double
        }
    },
    "confusionMatrixRowTotals": {
        any value: double
    },
    "meanSquaredError": double
  },
  "trainingStatus": string,
  "dataAnalysis": {         // Present only in GET requests for models with warnings or errors
    "warnings": [
      string array
    ]
  }
}
Property Name Value Description
kind string What kind of resource this is.
id string A name for the predictive model, unique within this user account. Naming restrictions are 1-255 characters long, any mix of digits, lowercase letters, dashes, and underscores: [0-9a-z_\-]
storageDataLocation string Google Cloud Storage location of the training data file.
storagePMMLLocation string Google Cloud Storage location of the preprocessing PMML file. See Importing PMML Models for details.
selfLink string A URL to re-request this resource.
utility[] list [Categorical models only] A class label weighting function, which allows the importance weights for class labels to be specified. See prediction.trainedmodels.insert() for details. The format of this array is: [{'label1':val_1},{'label2':val_2}] where each value is a positive double precision value. Not all labels must be specified; the default value for unspecified labels is 1.0. Labels must match example labels exactly. Example: 'utility': [ {'not_spam' : 5}, {'spam' : 1} ]

modelInfo object Model metadata.
modelInfo.numberInstances long Number of valid data instances used in the trained model.
modelInfo.modelType string Type of predictive model: either CLASSIFICATION or REGRESSION.
modelInfo.numberLabels long [Categorical models only] Number of class labels in the trained model.
modelInfo.classificationAccuracy double [Categorical models only] A number between 0.0 and 1.0, where 1.0 is 100% accurate. This is an estimate of the prediction accuracy, based on the amount and quality of the training data. You can use this as a guide to decide whether the results are accurate enough for your needs. This estimate will be more reliable if your real input data is similar to your training data.

If you are retraining an existing model, the modelInfo field will show an accuracy value even if the new training is not complete. This number is the accuracy of the previously trained model, which remains usable until the new model has finished training.

modelInfo.classWeightedAccuracy double [Categorical models only] Estimated accuracy of the model, taking utility weights into account.
modelInfo.confusionMatrix object [Categorical models only] An output confusion matrix. This shows an estimate for how accurate this model will be in actual use. See prediction.trainedmodels.get() for information.
modelInfo.confusionMatrixRowTotals object A list of the confusion matrix row totals. See prediction.trainedmodels.get() for more information.
modelInfo.meanSquaredError double [Regression models only] An estimated mean squared error. This can be used to measure the quality of the predictive model.
trainingStatus string The current status of the training job. This can be one of the following:
  • RUNNING - Only returned when retraining a model; for a new model, a trainedmodels.get call will return HTTP 404 before training is complete.
  • DONE
  • ERROR
  • ERROR: NO VALID DATA INSTANCES
  • ERROR: TRAINING JOB NOT FOUND
  • ERROR: TRAINING TIME LIMIT EXCEEDED
  • ERROR: TRAINING SYSTEM CAPACITY EXCEEDED
  • ERROR: TRAINING DATA FILE SIZE LIMIT ERROR
  • ERROR: STORAGE LOCATION IS INVALID
dataAnalysis object An object that is only present if there are problems training the model.
dataAnalysis.warnings string array An array of strings describing recommendations, warnings, or errors in model data, training, or other aspects of the model.

prediction.trainedmodels.get

Returns information about a trained model, including training status, confusion matrix, and estimated error values.

Note that this will not return successfully for a new model until training has completed successfully.

  • If this is an attempt to train a new model, trainedmodels.get will return HTTP 404 "No model found. Training running." until training finishes, whether or not it succeeds. If training completes successfully, the method will return the trainedmodels resource; if training fails, the method will return HTTP 404 "No model found. Model must first be trained."
  • If this is an attempt to retrain an existing model, trainedmodels.get will always return a trainedmodels resource. If the retraining succeeds, the resource will be for the new model. If the retraining fails, the resource will be for the previous model, but the trainingStatus property value will be ERROR.
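The two cases above could be told apart in client code roughly as shown below. This is a sketch under assumptions: the function name model_state and its return values are hypothetical, the 404 body is assumed to carry the message under error.message, and matching on the message text is fragile and only reflects the strings quoted in this section.

```python
def model_state(http_status, body):
    """Classify a trainedmodels.get result.

    http_status is the response status code; body is the parsed JSON response.
    """
    if http_status == 404:
        message = body.get("error", {}).get("message", "")
        # New model: no resource exists until training succeeds.
        if "Training running" in message:
            return "training"
        return "not-trained"
    if http_status == 200:
        status = body.get("trainingStatus", "")
        if status == "DONE":
            return "ready"
        if status == "RUNNING":
            # Only reported while retraining; the previous model is still usable.
            return "retraining"
        if status.startswith("ERROR"):
            # Retraining failed; the resource describes the previous model.
            return "retrain-failed"
    return "unknown"
```

A polling loop would typically keep calling trainedmodels.get while this returns "training" or "retraining", and stop on any other value.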

Important: Only the user who trained a model can call this method.

Confusion Matrices

This method returns a modelInfo.confusionMatrix property that describes a confusion matrix. This matrix describes how many labels were properly and improperly guessed for each training entry during training. This is useful for evaluating the accuracy of training over your data; if the matrix indicates that specific values are often confused, you might want to change your training data structure.

Here is an example confusion matrix for a language identification model. In this model, for all entries with the label "French", 12 were properly identified as French and 0.5 were improperly identified as English. You can see the values for items labeled "Spanish" and "English" as well. Numbers can be fractions because they are averaged across multiple training runs. confusionMatrixRowTotals describes the total number of each label applied.

"confusionMatrix": {
  "French": {
    "French": 12.0,
    "English": 0.5
  },
  "Spanish": {
    "Spanish": 6.0,
    "English": 1.0
  },
  "English": {
    "French": 0.5,
    "Spanish": 2.0,
    "English": 20.0
  }
},
"confusionMatrixRowTotals": {
  "French": 12.5,
  "Spanish": 7.0,
  "English": 22.5
}
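To make the relationship between the matrix and its row totals concrete, the Python sketch below recomputes the totals from the example and derives a per-label fraction of correctly identified entries. The function names are illustrative, and the overall figure is a plain ratio computed from this matrix; it is not claimed to equal the classificationAccuracy value the API reports.

```python
# Example matrix from this section: outer key = actual label, inner key = guessed label.
confusion_matrix = {
    "French": {"French": 12.0, "English": 0.5},
    "Spanish": {"Spanish": 6.0, "English": 1.0},
    "English": {"French": 0.5, "Spanish": 2.0, "English": 20.0},
}


def row_totals(matrix):
    """Total number of entries carrying each actual label."""
    return {actual: sum(guesses.values()) for actual, guesses in matrix.items()}


def per_label_accuracy(matrix):
    """Fraction of each label's entries that were guessed correctly."""
    totals = row_totals(matrix)
    return {label: matrix[label].get(label, 0.0) / totals[label] for label in matrix}


def overall_accuracy(matrix):
    """Correct guesses (the diagonal) divided by all entries."""
    correct = sum(matrix[label].get(label, 0.0) for label in matrix)
    return correct / sum(row_totals(matrix).values())
```

For the example, row_totals gives 12.5, 7.0, and 22.5, matching confusionMatrixRowTotals, and the diagonal sums to 38 out of 42 entries. A label with a noticeably lower fraction than the others is a candidate for more or cleaner training examples.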

Request

GET https://www.googleapis.com/prediction/v1.4/trainedmodels/{id}


Response

Returns a trainedmodels resource.

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

prediction.trainedmodels.insert

Asynchronous call to start training a model. When training has completed successfully, prediction.trainedmodels.get() will return information about the trained model. Training can take up to several hours, depending on the complexity and size of the data, but will typically take less time.

Note that each time you call this method, it will overwrite any existing model with the same ID if training succeeds. If training fails, the existing model will not be replaced. You can continue to run queries against an existing model during the training of a new model.

To train against data stored in Google Cloud Storage, you must have read or owner rights on the Google Cloud Storage object that holds your training data. By default, when you upload a file, you are assigned owner rights. When you train against that data, your training method call must be authenticated as a user with read rights.

Note that a model can be used only by the user who calls trainedmodels.insert to create that model.

Learn more about modifying Google Cloud Storage object ACLs.

Authentication is required.

Request

POST https://www.googleapis.com/prediction/v1.4/trainedmodels
{
  "id": string,
  "storageDataLocation": string,
  "storagePMMLLocation": string,  // Only used for PMML preprocessing
  "utility": [                    // Optional, categorical models only
    {
      any value: double
    }
  ]
}
Property Name Value Description
id string A name for the model. The ID must be unique within this user account. Naming restrictions are 1-255 characters long, any mix of digits, lowercase letters, dashes, and underscores: [0-9a-z_\-]
storageDataLocation string [Optional] The Google Cloud Storage path to your training data, without the gs:// prefix. Your training data must have at least six examples. See the full training file format. Example: mybucket/languages/languagedata.txt

If not specified, you can add examples to the empty model by calling update(). However, you will not be able to run any predictions until you add at least one example.

storagePMMLLocation string [Optional] If you want to preprocess your data using a PMML transform, this is the location of your PMML file in Google Cloud Storage, without the gs:// prefix.
utility array of values [Optional, categorical models only] Assigns a numeric weight to one or more categories in the training data. The purpose of this property is to prevent false positives by assigning a relative weight to specific categories, where the higher the value, the higher the associated cost with mislabeling something that is actually in that category as something else.

Example: In a spam identification model, identifying some spam as non-spam is relatively lower cost than identifying some non-spam as spam. Therefore you would include a utility property with the following value (assuming your non-spam examples have the label 'not_spam'):

'utility':[{'not_spam':5.0}]

Unlisted labels receive a default weight of 1.0, so the previous example would assign 'spam' a utility value of 1.0.
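The request properties above can be assembled and checked client-side before calling the API. The Python sketch below is one possible approach: the helper name insert_request_body is hypothetical, and the checks simply restate the naming rule, the gs:// note, and the positive-weight requirement from this reference.

```python
import json
import re

# Naming rule from the id description: 1-255 characters from [0-9a-z_\-].
MODEL_ID_PATTERN = re.compile(r"[0-9a-z_\-]{1,255}")


def insert_request_body(model_id, storage_data_location=None, utility=None):
    """Return the JSON body for prediction.trainedmodels.insert."""
    if not MODEL_ID_PATTERN.fullmatch(model_id):
        raise ValueError("id must be 1-255 characters from [0-9a-z_-]")
    body = {"id": model_id}
    if storage_data_location is not None:
        if storage_data_location.startswith("gs://"):
            raise ValueError("give the path without the gs:// prefix")
        body["storageDataLocation"] = storage_data_location
    if utility:
        if any(weight <= 0 for weight in utility.values()):
            raise ValueError("utility weights must be positive")
        # The API expects a list of single-entry objects, one per label.
        body["utility"] = [{label: float(weight)} for label, weight in utility.items()]
    return json.dumps(body)
```

For instance, insert_request_body("spam-filter", "mybucket/spam.txt", {"not_spam": 5}) produces a body whose utility property is [{"not_spam": 5.0}]; the model ID and bucket path here are made up for illustration.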

 



Response

Returns a trainedmodels resource if training has completed successfully, or an HTTP 404 error if training is not yet complete, or failed because of an error. Here is an example error message when training has not yet completed:

{
  "error": {
    "errors": [{
      "domain": "global",
      "reason": "notFound",
      "message": "No model found. Model must first be trained."
    }],
    "code": 404,
    "message": "No model found. Model must first be trained."
  }
}

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

prediction.trainedmodels.update

[Categorical models only] Add new data to a trained model.

Adding new data to a trained model is called streaming training. Streaming training trains a previously trained model against a new example. This is useful if you have a regular stream of new information that you'd like to add to your model as it becomes available, rather than having to recompile, re-upload, and retrain the data with batches of new data. The model is not retrained each time it receives a new example; rather, it retrains after every N new examples have been added, where N is a small number.

Note that the system may weight newer streamed examples more than earlier examples. If you do not want this, you should add the examples to your training data and retrain the system against all the data by calling prediction.trainedmodels.insert().

Note: If you retrain a model against its original training data file, all the streamed data will be lost. If you want to retain the streamed data, you must store it and update the model data yourself.

Authentication is required.

Request

PUT https://www.googleapis.com/prediction/v1.4/trainedmodels/{id}
{
  "label": my_label,
  "csvInstance": [ col1, col2, ..., colN ]
}
Property Name Value Description
label string The category label to assign to this example. Only category examples can be streamed to an existing model.
csvInstance Array of string or number The example data as an array of columns, in the same format as the CSV file.
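A streaming update could be prepared in Python as sketched below. The helper name streaming_update_request is hypothetical, the request is built but not sent, and the Content-Type header is an assumption that this reference does not state explicitly.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://www.googleapis.com/prediction/v1.4"


def streaming_update_request(model_id, label, csv_instance, oauth_token):
    """Build (but do not send) a PUT request that streams one labeled example."""
    body = json.dumps({"label": label, "csvInstance": list(csv_instance)})
    return urllib.request.Request(
        "{}/trainedmodels/{}".format(BASE_URL, quote(model_id, safe="")),
        data=body.encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": "Bearer " + oauth_token,
            "Content-Type": "application/json",  # assumed, not stated in this reference
        },
    )
```

Passing the returned object to urllib.request.urlopen would send it. Because retraining against the original data file discards streamed examples, it may be worth logging each (label, csv_instance) pair locally as well.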


Response

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

prediction.trainedmodels.delete

Delete a trained model.

Authentication is required.

Request

DELETE https://www.googleapis.com/prediction/v1.4/trainedmodels/{id}


Response

Returns an HTTP 200 and an empty JSON object: {}

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

prediction.trainedmodels.predict

Run a prediction against your model.

Here's an example request to a model that predicts a person's height, assuming the model expects a string gender ("M" or "F"), two height numbers, and a string country name:

{
  "input":{
    "csvInstance":["M", 1.59, 1.51,"France"]
  }
}

Authentication is required.

Request

POST https://www.googleapis.com/prediction/v1.4/trainedmodels/{id}/predict

{
  "input":{
    "csvInstance":[ col1_value, col2_value, ... ]
  }
}
Property Name Value Description
col1_value, col2_value, ... Array of string or number An array of entity features, as described by the model's schema. Note that string fields must be surrounded by escaped quotes. The array can be a mix of string and number columns.


Response

{
  "kind": "prediction#output",
  "id": string,
  "selfLink": string,
  "outputLabel": string,
  "outputMulti": [
    {
      "label": string,
      "score": double
    }
  ],
  "outputValue": double
}
Property Name Value Description
kind string What kind of resource this is.
id string The name of the predictive model.
selfLink string A URL to re-request this resource.
outputLabel string [Categorical models only] The most likely class label.
outputMulti[] list [Categorical models only] A list of class labels with their estimated scores.
outputMulti[].label string The class label.
outputMulti[].score double A score for this label. Some notes on the score:
  • Score values range from 0.0–1.0, with 1.0 being the highest. All values should add up to 1.0. Note: if you used an earlier version of the API that used a different range, you must retrain your data model in order for scores to be scaled to 0.0–1.0.
  • Consider having a cutoff value above which the categorization is useful and below which you might ignore it. We can't advise a hard cutoff value; instead, try running a few queries for borderline items, and use that as an approximate cutoff value for your categories.
  • These values are not necessarily probabilities; that is, they are not the confidence that a rating is correct. They are a measure of how closely a category seems to conform to the query item.
  • Scores are relative to each other, and do not need to add up to a specific value (for example, to 1.0).
  • It is hard to say absolutely what is a significant difference in scores. For example, is 0.42 "significantly" better than 0.33? Is 0.25 "twice as good" as 0.125? Instead, assume that the highest value is the best fit, and have a cutoff value below which you won't use the best fit. You'll have to experiment with the system to determine what is a meaningful cutoff value for your data.
outputValue double [Regression models only] The estimated regression value.
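Because the response carries different properties for categorical and regression models, client code usually branches on which one is present. The sketch below assumes that a response includes outputValue only for regression models and outputLabel only for categorical models, as the property table suggests; the function name summarize_prediction is hypothetical.

```python
def summarize_prediction(prediction):
    """Return (model_kind, result) for a parsed prediction#output response."""
    if "outputValue" in prediction:
        # Regression models report a single estimated value.
        return ("regression", prediction["outputValue"])
    if "outputLabel" in prediction:
        # Categorical models report the most likely class label.
        return ("categorical", prediction["outputLabel"])
    raise ValueError("response has neither outputValue nor outputLabel")
```

For the height example above, a regression response with an outputValue of 1.62 would be summarized as ("regression", 1.62); the value is made up for illustration.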

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

