Google Prediction API

Prediction V1.3 Reference

This Reference Guide is a detailed technical reference of the collections, resources, methods, and authentication requirements for the Prediction API v1.3.

Introduction

Contents and Overview

REST URIs are relative to https://www.googleapis.com/prediction/v1.3

| Data | Method | REST URI | Access |
| --- | --- | --- | --- |
| Hostedmodels Collection ↳ Hostedmodels Resource | | | |
| Hosted model prediction | prediction.hostedmodels.predict | POST /hostedmodels/hostedModelName/predict | AUTHENTICATED |
| Training Collection ↳ Training Resource | | | |
| Train | prediction.training.insert | POST /training | AUTHENTICATED |
| Streaming training | prediction.training.update | PUT /training/bucket%2Fobject | AUTHENTICATED |
| Get training status / model info | prediction.training.get | GET /training/bucket%2Fobject | AUTHENTICATED |
| Predict | prediction.training.predict | POST /training/bucket%2Fobject/predict | AUTHENTICATED |
| Delete a model | prediction.training.delete | DELETE /training/bucket%2Fobject | AUTHENTICATED |

Hostedmodels Collection

Hosted Models

Hosted models are trained models that anyone can call. These models can be free, but most have a usage fee associated with them, as described in their documentation. See a list of available hosted models in the hosted model gallery. Sending a prediction request to a hosted model is nearly the same as sending one to any other model; the only difference is the request URL. Using hosted models is convenient when you don't have the time, resources, or expertise to build a model for a specific topic. If you have a model that you'd like to make public, follow the submission links in the hosted model gallery.

Hostedmodels Resource

Hostedmodel resources are not needed or returned by any method calls.

prediction.hostedmodels.predict (AUTHENTICATED)

Run a prediction request against a hosted model. The hosted model name is part of the model URL in this format: 

https://www.googleapis.com/prediction/vx.x/hostedmodels/{model_name}/predict

For example, if the access URL were this:

https://www.googleapis.com/prediction/v1.3/hostedmodels/sample.languageid/predict

then the model name would be "sample.languageid".

Input data is an object with the following syntax:

{
  "input":{
    "csvInstance":[ col1_value, col2_value, ... ]
  }
}

Where col1_value, col2_value, and so on are entity features, as described by the hosted model's documentation. Note that string fields must be surrounded by escaped quotes.

Here's an example request to a hosted model that predicts a person's height, if the model expects a string gender ("M" or "F"), two height numbers, and a string country name:

{
  "input":{
    "csvInstance":["M", 1.59, 1.51,"France"]
  }
}
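The request above can be assembled programmatically. The following is a minimal sketch, not an official client: the helper name `hosted_predict_request` is hypothetical, `hostedModelName` is the placeholder model name used in this reference, and authentication and the actual HTTP call are deliberately left out. It only builds the URL and JSON body.

```python
import json
from urllib.parse import quote

BASE_URL = "https://www.googleapis.com/prediction/v1.3"

def hosted_predict_request(model_name, features):
    """Build (url, body) for prediction.hostedmodels.predict; nothing is sent."""
    # The hosted model name is a single path segment of the request URL.
    url = f"{BASE_URL}/hostedmodels/{quote(model_name, safe='')}/predict"
    # json.dumps takes care of quoting string features in the JSON body.
    body = json.dumps({"input": {"csvInstance": list(features)}})
    return url, body

# Placeholder model name; the features come from the height example above.
url, body = hosted_predict_request("hostedModelName", ["M", 1.59, 1.51, "France"])
print(url)
print(body)
```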

Notes on categorical model scores in the response:

  • Score values range from 0.0–1.0, with 1.0 being the highest. All values should add up to 1.0. Note: if you used an earlier version of the API that used a different range, you must retrain your data model in order for scores to be scaled to 0.0–1.0.
  • Consider having a cutoff value above which the categorization is useful and below which you might ignore it. We can't advise a hard cutoff value; instead, try running a few queries for borderline items, and use that as an approximate cutoff value for your categories.
  • These values are not probabilities; that is, they are not the confidence that a rating is correct. They are a measure of how closely a category seems to conform to the query item.
  • Scores are relative to each other, and do not need to add up to a specific value (for example, to 1.0).
  • It is hard to say absolutely what is a significant difference in scores. For example, is 0.42 "significantly" better than 0.33? Is 0.25 "twice as good" as 0.125? Instead, assume that the highest value is the best fit, and have a cutoff value that, if the best fit is below it, you won't use the data. You'll have to experiment with the system to determine what is a meaningful cutoff value for your data.
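The cutoff advice in these notes can be sketched as follows. This is an illustration only: the function name `best_label`, the sample scores, and the cutoff of 0.5 are all made up, and a useful cutoff has to be found by experiment on your own data.

```python
def best_label(output, cutoff):
    """Return the highest-scoring label, or None if its score is below cutoff."""
    scores = output.get("outputMulti", [])
    if not scores:
        return None
    # The largest score is treated as the best fit.
    top = max(scores, key=lambda entry: entry["score"])
    return top["label"] if top["score"] >= cutoff else None

# Hypothetical response fragment; these scores are not real API output.
sample = {
    "outputMulti": [
        {"label": "French", "score": 0.62},
        {"label": "English", "score": 0.30},
        {"label": "Spanish", "score": 0.08},
    ]
}
print(best_label(sample, cutoff=0.5))
print(best_label(sample, cutoff=0.7))
```

With the first cutoff the top category is accepted; with the stricter one the result is discarded and `None` is returned.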
POST https://www.googleapis.com/prediction/v1.3/hostedmodels/hostedModelName/predict

{
  "kind": "prediction#output",
  "id": string,
  "selfLink": string,
  "outputLabel": string,
  "outputMulti": [
    {
      "label": string,
      "score": double
    }
  ],
  "outputValue": double
}
| Property Name | Value | Description |
| --- | --- | --- |
| kind | string | What kind of resource this is. |
| id | string | The name of the hosted model. |
| selfLink | string | A URL to re-request this resource. |
| outputLabel | string | [Present in categorical models only] The category that best fits the submitted item. |
| outputMulti[] | list | [Present in categorical models only] The results, with one entry for every category in the training table, along with a score assigned to that category. The largest, most positive score is the most likely match. A value will be returned for every category present in the training data; you cannot currently specify how many categories to return. See the notes above in the method description. |
| outputMulti[].label | string | The category being described. |
| outputMulti[].score | double | A score associated with this category; the largest score is the most likely. See the notes above. |
| outputValue | double | [Present in regression models only] A predicted value for the submitted item, calculated based on given values in the training data. |

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

Training Collection

Training Resource

A Training resource describes a trained prediction model.

{
  "kind": "prediction#training",
  "id": string,
  "selfLink": string,
  "utility": [
    {
      label_n: double
    },
    ...
  ],
  "modelInfo": {
    "numberInstances": long,
    "modelType": string,
    "numberClasses": long,
    "classificationAccuracy": double,
    "classWeightedAccuracy": double,
    "confusionMatrix": { 
        actual_label_name: {
            predicted_label_name: double,
            ...
        },
        ...
    },
    "confusionMatrixRowTotals": {
        label_name: double
    },
    "meanSquaredError": double
  },
  "trainingStatus": string
}
| Property Name | Value | Description |
| --- | --- | --- |
| kind | string | What kind of resource this is. |
| id | string | The name of the model. This is the bucket/object path of the training data in Google Storage. |
| selfLink | string | A URL to re-request this resource. |
| utility[] | list | [Categorical models only] Input only, for training requests. See prediction.training.insert() for details. Format is: [{'label1':val_1},{'label2':val_2}] where each value is a positive double precision value. Not all labels must be specified; the default value for unspecified labels is 1.0. Labels must match example labels exactly. Example: 'utility': [ {'not_spam' : 5}, {'spam' : 1} ] |
| modelInfo | object | An object containing information about the model. Present on replies; do not include this member in requests. |
| modelInfo.numberInstances | long | Describes how many training entries are present in the training data. This is less than or equal to the number of entries in the training data plus any streaming training entries. If an entry could not be imported or parsed, it will not be included in this value. This number can be used to check for import errors or to count how many training examples comprise the model. |
| modelInfo.modelType | string | The type of model. This will be either "classification" or "regression". |
| modelInfo.numberClasses | long | [Categorical models only] Describes the number of categories in the training data and any streaming updates. |
| modelInfo.classificationAccuracy | double | [Categorical models only] A number between 0.0 and 1.0, where 1.0 is 100% accurate. This is an estimate of the prediction accuracy, based on the amount and quality of the training data. You can use this as a guide to decide whether the results are accurate enough for your needs. This estimate will be more reliable if your real input data is similar to your training data. |
| modelInfo.classWeightedAccuracy | double | [Categorical models only] Similar to modelInfo.classificationAccuracy, but takes any utility weights into account. |
| modelInfo.confusionMatrix | object | [Categorical models only] Describes a confusion matrix of labels that the Prediction engine properly and improperly categorized during training, as assessed during a post-training self-assessment. See prediction.training.get() for details. |
| modelInfo.confusionMatrixRowTotals | object | The total number of labels assigned to each category. |
| modelInfo.meanSquaredError | double | [Regression models only] A number 0.0 or greater, representing the mean squared error. The mean squared error is the average of the square of the difference between the predicted and actual values. This is an estimate of the prediction accuracy, based on the amount and quality of the training data. You can use this as a guide to decide whether the results are accurate enough for your needs. This estimate will be more reliable if your real input data is similar to your training data. |
| trainingStatus | string | The status of the training request. It will be one of the following values: RUNNING; DONE; ERROR; ERROR: TRAINING JOB NOT FOUND |

prediction.training.get (AUTHENTICATED)

Returns information about a trained model; most often used to request the training status of a model. Training is an asynchronous process; after invoking training by calling prediction.training.insert(), you must call get() and examine the trainingStatus member of the returned resource to learn the training status.
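Because training is asynchronous, callers typically poll. The loop below is a rough sketch under stated assumptions: `fetch_training` is a hypothetical callable that performs the authenticated GET and returns the parsed Training resource as a dict, and the polling interval and retry budget are arbitrary choices, not values from this reference.

```python
import time

def wait_for_training(fetch_training, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll until trainingStatus is DONE; raise on an ERROR status or timeout."""
    for _ in range(max_polls):
        status = fetch_training().get("trainingStatus", "")
        if status == "DONE":
            return status
        # Covers both "ERROR" and "ERROR: TRAINING JOB NOT FOUND".
        if status.startswith("ERROR"):
            raise RuntimeError(f"training failed: {status}")
        sleep(poll_seconds)
    raise TimeoutError("training did not finish within the polling budget")

# Stand-in for real GET calls: two canned replies, with sleeping disabled.
replies = iter([{"trainingStatus": "RUNNING"}, {"trainingStatus": "DONE"}])
print(wait_for_training(lambda: next(replies), sleep=lambda seconds: None))
```

Passing the fetch function in keeps the loop independent of any particular HTTP or authentication library.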

Important: Only the user who trained a model can call this method.

This method returns a modelInfo.confusionMatrix property that describes a confusion matrix of labels properly and improperly applied to each training entry during training. This is useful for evaluating the accuracy of training over your data; if the matrix indicates that specific values are often confused, you might want to change your training data structure.

Here is an example confusion matrix for a language identification model. In this model, for all entries with the label "French", 12 were properly identified as French and 0.5 were improperly identified as English. You can see the values for items labeled "Spanish" and "English" as well. Numbers can be fractions because they are averaged across multiple training runs. confusionMatrixRowTotals describes the total number of each label applied.

"confusionMatrix": {
  "French": {
    "French": 12.0,
    "English": 0.5
  },
  "Spanish": {
    "Spanish": 6.0,
    "English": 1.0
  },
  "English": {
    "French": 0.5,
    "Spanish": 2.0,
    "English": 20.0
  }
},
"confusionMatrixRowTotals": {
  "French": 12.5,
  "Spanish": 7.0,
  "English": 22.5
}
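To make the matrix concrete, the sketch below recomputes the row totals from the example and derives, for each actual label, the fraction of entries that were labeled correctly. The helper names are hypothetical, and the per-label fraction is an illustrative calculation rather than a value the API returns.

```python
confusion = {
    "French": {"French": 12.0, "English": 0.5},
    "Spanish": {"Spanish": 6.0, "English": 1.0},
    "English": {"French": 0.5, "Spanish": 2.0, "English": 20.0},
}

def row_totals(matrix):
    """Sum each actual-label row, mirroring confusionMatrixRowTotals."""
    return {actual: sum(row.values()) for actual, row in matrix.items()}

def per_label_accuracy(matrix):
    """Fraction of each actual label that was predicted as that same label."""
    totals = row_totals(matrix)
    return {
        actual: row.get(actual, 0.0) / totals[actual]
        for actual, row in matrix.items()
        if totals[actual] > 0
    }

print(row_totals(confusion))
print(per_label_accuracy(confusion))
```

A low fraction for one label, together with a large off-diagonal entry in its row, points at the pair of categories that the model tends to confuse.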

Note: If you are retraining an existing model, the modelInfo field will show an accuracy value even if the new training is not complete. This number will be the accuracy of the previously trained model, which is still usable, until the new model has finished training.

GET https://www.googleapis.com/prediction/v1.3/training/bucket%2Fobject
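In the URL, the slash between bucket and object is percent-encoded as %2F so that the model name stays a single path segment. A small sketch of building such URLs follows; the helper name `training_url` and the `mybucket`/`myobject` names are illustrative assumptions.

```python
from urllib.parse import quote

BASE_URL = "https://www.googleapis.com/prediction/v1.3"

def training_url(bucket, obj, action=""):
    """Build a training resource URL, encoding bucket/object as one segment."""
    # safe="" makes quote() encode "/" as %2F instead of leaving it alone.
    model_id = quote(f"{bucket}/{obj}", safe="")
    url = f"{BASE_URL}/training/{model_id}"
    return f"{url}/{action}" if action else url

print(training_url("mybucket", "myobject"))
print(training_url("mybucket", "myobject", "predict"))
```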

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

prediction.training.insert (AUTHENTICATED)

Asynchronous request to train your model.

Invoke training on your data by sending a POST request as described below. Note that each time you call this method, it will clear out any existing model with the same name. After making this request, you must call prediction.training.get() to check training status to determine when training is complete.

You must have read permission on the Google Storage object that holds your training data. By default, a Google Storage object only supports read access to the object creator. See the Google Storage documentation to learn how to read or modify Google Storage object ACLs.

Request data is a Training resource with the following properties:

  • id - The bucket/object name of the model.
  • utility [Optional, categorical models only] - Assigns a numeric weight to one or more categories in the training data. The purpose of this property is to prevent false positives by assigning a relative weight to specific categories, where the higher the value, the higher the associated cost with mislabeling something that is actually in that category as something else. For example, in a spam identification model, identifying some spam as non-spam is relatively lower cost than identifying some non-spam as spam. Therefore you would include a utility property with the following value (assuming your non-spam examples have the label 'not_spam'): 'utility':[{'not_spam':5.0}] . Unlisted labels receive a default weight of 1.0, so the previous example would assign 'spam' a utility value of 1.0.
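A training request body with these two properties could be assembled as in the sketch below. The function name `training_request_body` is hypothetical, the `mybucket/myobject` id is a placeholder, and the check for positive weights reflects the Training resource description rather than confirmed server-side behavior.

```python
import json

def training_request_body(model_id, utility=None):
    """Build the JSON body for prediction.training.insert; nothing is sent."""
    body = {"id": model_id}
    if utility:
        for label, weight in utility.items():
            if weight <= 0:
                raise ValueError(f"utility for {label!r} must be positive")
        # One single-key object per label, as in [{'not_spam': 5.0}].
        body["utility"] = [{label: float(weight)} for label, weight in utility.items()]
    return json.dumps(body)

# 'spam' is left out, so it would receive the default weight of 1.0.
print(training_request_body("mybucket/myobject", {"not_spam": 5}))
```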

Training requests are asynchronous; if successful, the request returns immediately with the following reply, indicating that training has begun. Check training status to learn when training is complete. Training can take up to 10 minutes, depending on the complexity and size of the data, but will typically take less time. A successful response is a simple echo of the data location as shown here:

{
  "kind":"prediction#training",
  "id":"bucket/object",
  "selfLink":"https://www.googleapis.com/prediction/v1.3/URL_of_resource"
}
POST https://www.googleapis.com/prediction/v1.3/training

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

prediction.training.update (AUTHENTICATED)

[Categorical models only] Streaming training: trains a previously trained model against a new example. This is useful if you have a regular stream of new information that you'd like to add to your model as it becomes available, rather than having to recompile, re-upload, and retrain the data with batches of new data. The model is not retrained each time it receives a new example; rather, it retrains after every N new examples have been added, where N is a small number.

Note that the system may weight newer streamed examples more than earlier examples. If you do not want this, you should add the examples to your training data and retrain the system against all the data by calling prediction.training.insert().

Note: If you retrain a model against its original training data file, all the streamed data will be lost.

The request takes a JSON object with the following parameters:

{
  "classLabel": my_label,
  "csvInstance": [ col1, col2, ..., colN ]
}
classLabel
The category label to assign to this example. Only category examples can be streamed to an existing model.
csvInstance
The example data as an array of columns, in the same format as the CSV file.
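As a brief illustration, the update body can be serialized like this. The helper name and the sample label and text are made up, and the exact columns to send depend on how your own training CSV is laid out.

```python
import json

def streaming_update_body(class_label, columns):
    """Build the JSON body for prediction.training.update; nothing is sent."""
    return json.dumps({"classLabel": class_label, "csvInstance": list(columns)})

# Hypothetical single-column text example for a language model.
print(streaming_update_body("French", ["Bonjour tout le monde"]))
```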
PUT https://www.googleapis.com/prediction/v1.3/training/bucket%2Fobject

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

prediction.training.delete (AUTHENTICATED)

Deletes a trained model. Only the user who inserted (trained) a model can delete it.

If successful, an empty response is returned. Otherwise, an appropriate HTTP or Prediction API error will be returned. 

DELETE https://www.googleapis.com/prediction/v1.3/training/bucket%2Fobject

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction

prediction.training.predict (AUTHENTICATED)

Run a prediction request against your model.

Input data is an object with the following syntax:

{
  "input":{
    "csvInstance":[ col1_value, col2_value, ... ]
  }
}

Where col1_value, col2_value, and so on are entity features, in the same format as the columns of your training data. Note that string fields must be surrounded by escaped quotes.

Here's an example request to a model that predicts a person's height, if the model expects a string gender ("M" or "F"), two height numbers, and a string country name:

{
  "input":{
    "csvInstance":["M", 1.59, 1.51,"France"]
  }
}

Notes on categorical model response scores:

  • Score values range from 0.0–1.0, with 1.0 being the highest. All values should add up to 1.0. Note: if you used an earlier version of the API that used a different range, you must retrain your data model in order for scores to be scaled to 0.0–1.0.
  • Consider having a cutoff value above which the categorization is useful and below which you might ignore it. We can't advise a hard cutoff value; instead, try running a few queries for borderline items, and use that as an approximate cutoff value for your categories.
  • These values are not probabilities; that is, they are not the confidence that a rating is correct. They are a measure of how closely a category seems to conform to the query item.
  • Scores are relative to each other, and do not need to add up to a specific value (for example, to 1.0).
  • It is hard to say absolutely what is a significant difference in scores. For example, is 0.42 "significantly" better than 0.33? Is 0.25 "twice as good" as 0.125? Instead, assume that the highest value is the best fit, and have a cutoff value that, if the best fit is below it, you won't use the data. You'll have to experiment with the system to determine what is a meaningful cutoff value for your data.
POST https://www.googleapis.com/prediction/v1.3/training/mybucket%2Fmyobject/predict

{
  "kind": "prediction#output",
  "id": string,
  "selfLink": string,
  "outputLabel": string,
  "outputMulti": [
    {
      "label": string,
      "score": double
    }
  ],
  "outputValue": double
}
| Property Name | Value | Description |
| --- | --- | --- |
| kind | string | What kind of resource this is. |
| id | string | The name of the model. |
| selfLink | string | A URL to re-request this resource. |
| outputLabel | string | [Present in categorical models only] The category that best fits the submitted data. |
| outputMulti[] | list | [Present in categorical models only] The results, with one entry for every category in the training table, along with a score assigned to that category. The largest, most positive score is the most likely match. A value will be returned for every category present in the training data; you cannot currently specify how many categories to return. See the notes above in the method description. |
| outputMulti[].label | string | The category being described. |
| outputMulti[].score | double | A score associated with this category; the largest score is the most likely. See the notes above. |
| outputValue | double | [Present in regression models only] A predicted value for the submitted item, calculated based on given values in the training data. |
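Since outputLabel and outputValue are each present for only one model type, a caller can branch on which key appears. The sketch below assumes the other key is simply absent from the reply; the function name and the sample values are hypothetical.

```python
def read_prediction(output):
    """Return (model_type, result) from a parsed prediction#output reply."""
    if "outputValue" in output:
        return ("regression", output["outputValue"])
    if "outputLabel" in output:
        return ("categorical", output["outputLabel"])
    raise ValueError("reply contains neither outputLabel nor outputValue")

# Hypothetical replies; the values are invented for illustration.
print(read_prediction({"kind": "prediction#output", "outputValue": 1.72}))
print(read_prediction({"kind": "prediction#output", "outputLabel": "French"}))
```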

Invoking this method requires the use of a token with access to:
https://www.googleapis.com/auth/prediction
