Connected Sheets allows you to analyze petabytes of data directly within Sheets. You can connect your spreadsheets with a BigQuery data warehouse and do the analysis by using familiar Sheets tools like pivot tables, charts and formulas.
This guide uses the public dataset shakespeare to show how to use Connected Sheets. The dataset contains the following information:
Field | Type | Description |
---|---|---|
word | STRING | A single unique word (where whitespace is the delimiter) extracted from a corpus |
word_count | INTEGER | The number of times this word appears in this corpus |
corpus | STRING | The work from which this word was extracted |
corpus_date | INTEGER | The year in which this corpus was published |
Working with a DataSource
Add a BigQuery data source
To add a data source, supply an AddDataSourceRequest in batchUpdate method. The request body should specify a DataSource field.
Replace <YOUR_PROJECT_ID>
below with a valid Google Cloud project ID.
"addDataSource":{
"dataSource":{
"spec":{
"bigQuery":{
"projectId":"<YOUR_PROJECT_ID>",
"tableSpec":{
"tableProjectId":"bigquery-public-data",
"datasetId":"samples",
"tableId":"shakespeare"
}
}
}
}
}
After a data source is created successfully, an associated DATA_SOURCE
sheet is created to provide a preview of up to 500 rows. The preview isn’t available immediately. An execution is triggered asynchronously to import BigQuery data.
The AddDataSourceResponse contains the following fields:
dataSource: The newly created DataSource. The dataSourceId is also populated and will be referenced to create each DataSource object from the data source.
dataExecutionStatus: As mentioned above, this is the status of an execution that imports BigQuery data into the preview sheet. See the DataExecutionStatus for more information.
Update or delete a data source
Use the batchUpdate method and supply an UpdateDataSourceRequest or DeleteDataSourceRequest request accordingly.
Working with DataSource Objects
Once a data source is added to the spreadsheet, a data source object can be created from the data source. A data source object is a regular Sheets tool like pivot tables, charts and formulas, but integrated with Connected Sheets, to power your analysis.
There are four types of objects:
- DataSource Table
- DataSource PivotTable
- DataSource Chart
- DataSource Formula
Add a DataSource Table
Also known as “Extract” in the Sheets editor, the object imports a static dump of the data from the data source into Sheets. Similar to a pivot table, the table is specified and anchored at the top left cell.
In this example, we use the batchUpdate method and supply an updateCells request to create a data source table of two columns word
and word_count
, up to 1000 rows
"updateCells":{
"rows":{
"values":[
{
"dataSourceTable":{
"dataSourceId":"<YOUR_DATA_SOURCE_ID>",
"columns":[
{
"name":"word"
},
{
"name":"word_count"
}
],
"rowLimit":{
"value":1000
},
"columnSelectionType":"SELECTED"
}
}
]
},
"fields":"dataSourceTable"
}
After a data source table is created successfully, the data isn’t available immediately. In the Sheets editor, it is shown as a
preview. You need to refresh the data source table to fetch BigQuery data. You can specify a RefreshDataSourceRequest within the same batchUpdate
. See Refresh a DataSource object below for details. Note that all data source objects work in a similar way.
After a refresh completes (BigQuery data is fetched), the data source table is populated as shown:
Add a DataSource PivotTable
Unlike a conventional pivot table, a data source pivot table is backed by a data source and references the data by column name. This example creates a pivot table showing total word count by corpus.
"updateCells":{
"rows":{
"values":[
{
"pivotTable":{
"dataSourceId":"<YOUR_DATA_SOURCE_ID>",
"rows":{
"dataSourceColumnReference":{
"name":"corpus"
},
"sortOrder":"ASCENDING"
},
"values":{
"summarizeFunction":"SUM",
"dataSourceColumnReference":{
"name":"word_count"
}
}
}
}
]
},
"fields":"pivotTable"
}
After BigQuery data is fetched, the data source pivot table is populated as shown:
Add a DataSource Chart
This example creates a data source chart, with COLUMN type, showing total word count by corpus.
"addChart":{
"chart":{
"spec":{
"title":"Corpus by word count",
"basicChart":{
"chartType":"COLUMN",
"domains":[
{
"domain":{
"columnReference":{
"name":"corpus"
}
}
}
],
"series":[
{
"series":{
"columnReference":{
"name":"word_count"
},
"aggregateType":"SUM"
}
}
]
}
},
"dataSourceChartProperties":{
"dataSourceId":"<YOUR_DATA_SOURCE_ID>"
}
}
}
After BigQuery data is fetched, the data source chart is rendered as shown:
Add a DataSource Formula
This example creates a data source formula to compute the average word count.
"updateCells":{
"rows":[
{
"values":[
{
"userEnteredValue":{
"formulaValue":"=AVERAGE(shakespeare!word_count)"
}
}
]
}
],
"fields":"userEnteredValue"
}
After BigQuery data is fetched, the data source formula is populated as shown:
Refresh a DataSource object
You can refresh a data source object to fetch the latest data from BigQuery based on the current data source specs and object configurations. You can use batchUpdate method, supply a RefreshDataSourceRequest and specify one or multiple objects to refresh via DataSourceObjectReferences.
Note that you can both create data source objects and refresh them within one batchUpdate
request.
DataExecutionStatus
When you create new data sources or refresh data source objects, an execution is created in the background to fetch the data from BigQuery and return a response containing the execution status. If the execution starts successfully, the execution status is
usually in the RUNNING
state.
Because the process is asynchronous, your application should implement a polling model to retrieve the status for data source objects periodically, using spreadsheets.get), until the status stores either SUCCEEDED
or FAILED
state. The execution completes quickly in most cases, but this depends on the complexity of your data source spec. In any case, the execution should take no longer than 10 minutes.