Create a dataset

Creating a dataset is a two-step process:

  1. Make a request to create the dataset.

  2. Make a request to upload data to the dataset.

After the initial data upload, you can upload new data to the dataset to create a new version of the dataset.

Prerequisites

When creating a dataset:

  • Display names must be unique within your Google Cloud project.
  • Display names must be less than 64 bytes. Because display names are encoded in UTF-8, a single character can take more than one byte in some languages.
  • Descriptions must be less than 1000 bytes.
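Because both limits are expressed in bytes rather than characters, a local check has to measure the UTF-8 encoding. The following is a minimal sketch; the function name check_dataset_metadata is hypothetical, and uniqueness within the project cannot be verified locally.

```python
def check_dataset_metadata(display_name: str, description: str = "") -> list[str]:
    """Return a list of problems; an empty list means both values fit the limits."""
    problems = []
    # The limits are in bytes, so measure the UTF-8 encoding, not len(str).
    if len(display_name.encode("utf-8")) >= 64:
        problems.append("displayName must be less than 64 bytes")
    if len(description.encode("utf-8")) >= 1000:
        problems.append("description must be less than 1000 bytes")
    return problems
```

For example, a display name made of 32 two-byte characters is only 32 characters long but occupies 64 bytes, so it is reported as too long.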

When uploading data:

  • The supported file types are CSV, GeoJSON, and KML.
  • The maximum supported file size is 350 MB.
  • Attribute column names cannot begin with the string "?_".
  • Three-dimensional geometries are not supported. This includes the "Z" suffix in the WKT format, and the altitude coordinate in the GeoJSON format.
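The upload rules above can be checked before sending a file. This sketch rests on two assumptions that the text does not confirm: that 1 MB means 10**6 bytes (the stricter reading), and that the file type can be inferred from the extensions .csv, .geojson, and .kml. The function name check_upload is hypothetical.

```python
import os

MAX_UPLOAD_BYTES = 350 * 1000 * 1000  # assumption: 1 MB = 10**6 bytes
SUPPORTED_EXTENSIONS = {".csv", ".geojson", ".kml"}  # assumption: type inferred from extension


def check_upload(filename: str, size_bytes: int, attribute_names: list[str]) -> list[str]:
    """Return a list of problems found against the documented upload limits."""
    problems = []
    ext = os.path.splitext(filename)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file is larger than 350 MB")
    for name in attribute_names:
        # Attribute column names cannot begin with the string "?_".
        if name.startswith("?_"):
            problems.append(f"attribute column name begins with '?_': {name}")
    return problems
```

Pass os.path.getsize(path) as size_bytes to check a file on disk.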

Data preparation best practices

If your source data is complex or large, for example dense points, long linestrings, or polygons, consider simplifying it before uploading to get the best performance on the rendered map. Source files larger than 50 MB often fall into this category.

Here are some best practices for preparing your data:

  1. Minimize feature properties. Keep only the feature properties needed to style your map, for example "id" and "category". You can join additional properties to a feature in a client application using data-driven styles on a unique identifier key. For an example, see "See your data in real time with Data-driven styling".
  2. Use simple data types for property objects where possible, such as integers, to minimize tile size and improve map performance.
  3. Simplify complex geometries prior to uploading a file. You can do this in a geospatial tool of your choice, such as the open source Mapshaper.org utility, or in BigQuery using ST_Simplify on complex polygon geometries.
  4. Cluster very dense points prior to uploading a file. You can do this in a geospatial tool of your choice, such as the open source turf.js cluster functions, or in BigQuery using ST_CLUSTERDBSCAN on dense point geometries.
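As an illustration of the first practice, the following sketch removes every property except the ones needed for styling from a GeoJSON feature collection that is already loaded as a Python dictionary. The function name keep_properties is hypothetical. Geometry simplification and clustering are better left to the tools named in the list.

```python
def keep_properties(feature_collection: dict, keep: set[str]) -> dict:
    """Return a copy of a GeoJSON FeatureCollection with only the listed properties."""
    slimmed = []
    for feature in feature_collection.get("features", []):
        props = feature.get("properties") or {}
        slimmed.append({
            **feature,
            # Drop every property whose key is not needed for styling.
            "properties": {k: v for k, v in props.items() if k in keep},
        })
    return {**feature_collection, "features": slimmed}
```

The input dictionary is not modified, so the full set of properties remains available for joining in the client application.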

See additional guidance about datasets best practices in Visualize your data with Datasets and BigQuery.

GeoJSON requirements

Maps Datasets API supports the current GeoJSON specification. Maps Datasets API also supports GeoJSON files that contain any of the following object types:

  • Geometry objects. A geometry object is a spatial shape, described as a union of points, lines, and polygons with optional holes.
  • Feature objects. A feature object contains a geometry plus additional name/value pairs, whose meaning is application-specific.
  • Feature collections. A feature collection is a set of feature objects.

Maps Datasets API does not support GeoJSON files that have data in a coordinate reference system (CRS) other than WGS84.

For more information on GeoJSON, see RFC 7946.
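Because the altitude coordinate is not supported in GeoJSON uploads, it can help to scan a file for positions with more than two values before uploading. The sketch below walks geometry objects, features, and feature collections. The function names are hypothetical, and the handling of GeometryCollection follows RFC 7946 rather than anything stated on this page.

```python
def has_altitude(coordinates) -> bool:
    """True if any position in a GeoJSON coordinates array has more than two values."""
    if not coordinates:
        return False
    if isinstance(coordinates[0], (int, float)):
        # A position is a flat list of numbers: [longitude, latitude, (altitude)].
        return len(coordinates) > 2
    return any(has_altitude(part) for part in coordinates)


def geometries(obj: dict):
    """Yield every geometry object nested inside a GeoJSON object."""
    kind = obj.get("type")
    if kind == "FeatureCollection":
        for feature in obj.get("features", []):
            yield from geometries(feature)
    elif kind == "Feature":
        if obj.get("geometry"):
            yield from geometries(obj["geometry"])
    elif kind == "GeometryCollection":
        for geometry in obj.get("geometries", []):
            yield from geometries(geometry)
    else:
        yield obj


def uses_3d(obj: dict) -> bool:
    """True if any geometry in the GeoJSON object carries a third coordinate."""
    return any(has_altitude(g.get("coordinates")) for g in geometries(obj))
```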

KML requirements

Maps Datasets API has the following requirements:

  • All URLs must be local (or relative) to the file itself.
  • Point, line, and polygon geometries are supported.
  • All data attributes are considered strings.

The following KML features are not supported:

  • Icons or <styleUrl> defined outside of the file.
  • Network links, such as <NetworkLink>
  • Ground overlays, such as <GroundOverlay>
  • 3D geometries or any altitude-related tags such as <altitudeMode>
  • Camera specifications such as <LookAt>
  • Styles defined inside the KML file.

CSV requirements

For CSV files, the supported column names are listed below in order of priority:

  • latitude, longitude
  • lat, long
  • x, y
  • wkt (Well-Known Text)
  • address, city, state, zip
  • address
  • A single column containing all address information, such as 1600 Amphitheatre Parkway Mountain View, CA 94043

For example, suppose your file contains columns named x, y, and wkt. Because x and y have a higher priority, as determined by the order of supported column names in the list above, the values in the x and y columns are used and the wkt column is ignored.

In addition:

  • Each column name must belong to a single column. That is, you cannot have a column named xy that contains both x and y coordinate data. The x and y coordinates must be in separate columns.
  • Column names are case-insensitive.
  • The order of the column names does not matter. For example, if your CSV file contains lat and long columns, they can occur in any order.
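The priority rule can be sketched as a lookup over the documented column groups. This is only an illustration of the order described above, not the service's implementation; the function name pick_geometry_columns is hypothetical.

```python
# Column groups in the documented order of priority, highest first.
GEOMETRY_COLUMN_PRIORITY = [
    ("latitude", "longitude"),
    ("lat", "long"),
    ("x", "y"),
    ("wkt",),
    ("address", "city", "state", "zip"),
    ("address",),
]


def pick_geometry_columns(header: list[str]):
    """Return the highest-priority column group present in the header, or None."""
    # Column names are case-insensitive, and their order does not matter.
    names = {name.lower() for name in header}
    for group in GEOMETRY_COLUMN_PRIORITY:
        if all(column in names for column in group):
            return group
    return None
```

With a header of WKT, Y, name, X, the function returns ("x", "y"), which matches the example in which the wkt column is ignored.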

Handle data upload errors

When uploading data to a dataset, you might experience one of the common errors described in this section.

GeoJSON errors

Common GeoJSON errors include:

  • Missing type field, or the type is not a string. The uploaded GeoJSON data file must contain a string field named type as part of each Feature object and Geometry object definition.
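A quick way to locate this error before uploading is to walk the parsed file and report every object whose type is missing or is not a string. The following is a minimal sketch that assumes the file has already been loaded with json.load; the function name find_type_errors and the path notation are hypothetical.

```python
def find_type_errors(obj: dict, path: str = "$") -> list[str]:
    """Return the paths of GeoJSON objects whose "type" is missing or not a string."""
    errors = []
    if not isinstance(obj.get("type"), str):
        errors.append(path)
    # Descend into the features of a feature collection.
    for index, feature in enumerate(obj.get("features", [])):
        errors += find_type_errors(feature, f"{path}.features[{index}]")
    # Descend into the geometry of a feature.
    geometry = obj.get("geometry")
    if isinstance(geometry, dict):
        errors += find_type_errors(geometry, f"{path}.geometry")
    return errors
```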

KML errors

Common KML errors include:

  • The data file contains an unsupported KML feature. The data file must not contain any of the unsupported KML features listed above; otherwise, the data import might fail.

CSV errors

Common CSV errors include:

  • Some rows are missing values for a geometry column. All rows in a CSV file must contain non-empty values for the geometry columns. The geometry columns include:
    • latitude, longitude
    • lat, long
    • x, y
    • wkt
    • address, city, state, zip
    • address
    • A single column containing all address information, such as 1600 Amphitheatre Parkway Mountain View, CA 94043
  • If x and y are your geometry columns, ensure that the units are longitude and latitude. Some public datasets use different coordinate systems under the headers x and y. If the wrong units are used, the dataset might import successfully, but the rendered data can show the dataset points in unexpected locations.
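Both CSV problems can be caught locally for files that use x and y columns. The sketch below flags rows with empty geometry values and rows whose values fall outside the valid longitude and latitude ranges, which usually indicates a projected coordinate system. The function name check_xy_rows is hypothetical, and the column names passed in must match the header exactly.

```python
import csv
import io


def check_xy_rows(csv_text: str, x_col: str = "x", y_col: str = "y") -> list[tuple[int, str]]:
    """Return (data row number, problem) pairs for rows with bad geometry values."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        x = (row.get(x_col) or "").strip()
        y = (row.get(y_col) or "").strip()
        if not x or not y:
            problems.append((row_number, "missing geometry value"))
            continue
        try:
            longitude, latitude = float(x), float(y)
        except ValueError:
            problems.append((row_number, "non-numeric coordinate"))
            continue
        # x must be a longitude and y a latitude, both in degrees.
        if not (-180 <= longitude <= 180 and -90 <= latitude <= 90):
            problems.append((row_number, "outside longitude/latitude range"))
    return problems
```

A range check cannot detect every wrong coordinate system, so spot-checking a few points on the map is still worthwhile.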

Create the dataset

Create a dataset by sending a POST request to the datasets endpoint:

https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets

Pass a JSON body to the request defining the dataset. You must:

  • Specify the displayName of the dataset. The value of displayName must be unique across all datasets in your Google Cloud project.

  • Set usage to USAGE_DATA_DRIVEN_STYLING.

For example:

curl -X POST -d '{
    "displayName": "My Test Dataset", 
    "usage": "USAGE_DATA_DRIVEN_STYLING"
  }' \
  -H 'X-Goog-User-Project: PROJECT_NUMBER_OR_ID' \
  -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $TOKEN" \
  https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets

The response contains the ID of the dataset, in the form projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID, along with additional information. Use the dataset ID when making requests to update or modify the dataset.

{
  "name": "projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46",
  "displayName": "My Test Dataset",
  "usage": [
    "USAGE_DATA_DRIVEN_STYLING"
  ],
  "createTime": "2022-08-15T17:50:00.189682Z",
  "updateTime": "2022-08-15T17:50:00.189682Z" 
}
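For callers who prefer a script to curl, the same request can be assembled with the Python standard library. This is a sketch that mirrors the curl example above; the helper names are hypothetical, and obtaining the access token is outside its scope.

```python
import json
import urllib.request

BASE_URL = "https://mapsplatformdatasets.googleapis.com/v1"


def build_create_request(project: str, display_name: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates a dataset."""
    body = {"displayName": display_name, "usage": "USAGE_DATA_DRIVEN_STYLING"}
    return urllib.request.Request(
        f"{BASE_URL}/projects/{project}/datasets",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Goog-User-Project": project,
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )


def dataset_id_from_name(name: str) -> str:
    """Extract DATASET_ID from projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID."""
    return name.rsplit("/", 1)[-1]
```

To send the request, pass the returned object to urllib.request.urlopen and parse the response body with json.load.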

Upload data to the dataset

After you create the dataset, upload the data from Google Cloud Storage or from a local file to the dataset.

Upload data from Cloud Storage

To upload data from Cloud Storage to your dataset, send a POST request to the datasets endpoint that also includes the ID of the dataset:

https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID:import

In the JSON request body:

  • Use inputUri to specify the file path to the resource containing the data in Cloud Storage. This path is in the form gs://GCS_BUCKET/FILE.

    The user making the request requires the Storage Object Viewer role, or any other role that includes the storage.objects.get permission. For more information about managing access to Cloud Storage, see Overview of access control.

  • Use fileFormat to specify the file format of the data as either: FILE_FORMAT_GEOJSON (GeoJSON file), FILE_FORMAT_KML (KML file), or FILE_FORMAT_CSV (CSV file).

For example:

curl -X POST -d '{
    "gcs_source":{
      "inputUri": "gs://my_bucket/my_csv_file",
      "fileFormat": "FILE_FORMAT_CSV"
    }
  }' \
  -H 'X-Goog-User-Project: PROJECT_NUMBER_OR_ID' \
  -H "content-type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  https://mapsplatformdatasets.googleapis.com/v1/projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46:import

The response is in the form:

{
  "name": "projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID@VERSION_NUMBER"
}
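The request body and the returned name are both easy to handle in code. The sketch below builds the JSON body with the same field names as the curl example and splits the returned name at the @ separator. The helper names are hypothetical.

```python
import json

FILE_FORMATS = {"FILE_FORMAT_GEOJSON", "FILE_FORMAT_KML", "FILE_FORMAT_CSV"}


def gcs_import_body(bucket: str, object_path: str, file_format: str) -> str:
    """Return the JSON body for an import from Cloud Storage."""
    if file_format not in FILE_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    return json.dumps({
        "gcs_source": {
            "inputUri": f"gs://{bucket}/{object_path}",
            "fileFormat": file_format,
        }
    })


def split_version_name(name: str) -> tuple[str, str]:
    """Split projects/.../datasets/DATASET_ID@VERSION_NUMBER into (dataset name, version)."""
    dataset_name, _, version = name.partition("@")
    return dataset_name, version
```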

Upload data from a file

To upload data from a file, send an HTTP POST request to the datasets endpoint that also includes the ID of the dataset:

https://mapsplatformdatasets.googleapis.com/upload/v1/projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID:import

The request contains:

  • The X-Goog-Upload-Protocol header, set to multipart.

  • The metadata property specifying the path to a file that specifies the type of data to upload, as either: FILE_FORMAT_GEOJSON (GeoJSON file), FILE_FORMAT_KML (KML file), or FILE_FORMAT_CSV (CSV file).

    The contents of this file have the following format:

    {"local_file_source": {"file_format": "FILE_FORMAT_GEOJSON"}}

  • The rawdata property specifying the path to the GeoJSON, KML, or CSV file containing the data to upload.

The following request uses the curl -F option to specify the paths to the two files:

curl -X POST \
  -H 'X-Goog-User-Project: PROJECT_NUMBER_OR_ID' \
  -H "Authorization: Bearer $TOKEN" \
  -H "X-Goog-Upload-Protocol: multipart" \
  -F "metadata=@csv_metadata_file" \
  -F "rawdata=@csv_data_file" \
  https://mapsplatformdatasets.googleapis.com/upload/v1/projects/PROJECT_NUMBER_OR_ID/datasets/f57074a0-a8b6-403e-9df1-e9fc46:import

The response is in the form:

{
  "name": "projects/PROJECT_NUMBER_OR_ID/datasets/DATASET_ID@VERSION_NUMBER"
}
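The metadata file is small enough to generate on the fly. This sketch produces its contents for a given data file. The mapping from file extension to format value is an assumption made for illustration, and the function name metadata_for is hypothetical.

```python
import json
import os

# Assumption: the data file's extension identifies its format.
FORMAT_BY_EXTENSION = {
    ".geojson": "FILE_FORMAT_GEOJSON",
    ".kml": "FILE_FORMAT_KML",
    ".csv": "FILE_FORMAT_CSV",
}


def metadata_for(data_path: str) -> str:
    """Return the contents of the metadata file for a local data file."""
    extension = os.path.splitext(data_path)[1].lower()
    if extension not in FORMAT_BY_EXTENSION:
        raise ValueError(f"cannot infer file format from extension: {extension!r}")
    return json.dumps({"local_file_source": {"file_format": FORMAT_BY_EXTENSION[extension]}})
```

Write the returned string to a file and pass that file's path as the metadata property of the request.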

Upload new data to the dataset

After you create the dataset and upload the initial data successfully, the state of the dataset is set to STATE_COMPLETED. That means the dataset is ready to use in your app. To determine the state of the dataset, see Get a dataset.

You can also upload new data to the dataset to create a new version of the dataset. To upload new data, follow the same process described in Upload data from Cloud Storage or Upload data from a file, and specify the new data to upload.

If the new data uploads successfully:

  • The state of the new version of the dataset is set to STATE_COMPLETED.

  • The new version becomes the "active" version and is the version used by your app.

If there is an error in the upload:

  • The state of the new dataset version is set to one of the following states:

    • STATE_IMPORT_FAILED
    • STATE_PROCESSING_FAILED
    • STATE_PUBLISHING_FAILED
    • STATE_DELETION_FAILED

  • The previous successful version of the dataset remains the "active" version and is the version used by your app.
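A client that checks the state after an upload only needs to distinguish the states named in this section. The sketch below does that with a hypothetical helper, classify_state. How the state value is retrieved is covered in Get a dataset and is not shown here, and any state not named in this section is reported as "other" rather than interpreted.

```python
FAILED_STATES = {
    "STATE_IMPORT_FAILED",
    "STATE_PROCESSING_FAILED",
    "STATE_PUBLISHING_FAILED",
    "STATE_DELETION_FAILED",
}


def classify_state(state: str) -> str:
    """Map a dataset state to 'ready', 'failed', or 'other'."""
    if state == "STATE_COMPLETED":
        return "ready"  # The version is active and used by the app.
    if state in FAILED_STATES:
        return "failed"  # The previous successful version stays active.
    return "other"  # Assumption: states not named in this section need no action here.
```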