Data types and semantic types

When you build a community connector, each field that you define in the schema requires a data type. The data type defines the field's primitive type such as BOOLEAN, STRING, NUMBER, etc.

In addition to data types, Data Studio also uses semantic types. A semantic type describes the kind of information the data represents. For example, a field with a NUMBER data type may semantically represent a currency amount or a percentage, and a field with a STRING data type may semantically represent a city.

Community connector schema and Data Studio fields

When you define the schema for your community connector, there are various properties for each field that will determine how the field is represented and used in Data Studio. For example:

  • The concept type is defined in your connector schema using the conceptType property. This property determines whether the field is treated as a dimension or metric.
  • The semantic type is automatically detected by Data Studio based on the data type property defined in your connector and the data values returned by your connector. The semantic type of a field cannot be set directly in the schema. See Automatic semantic type detection for details on how Data Studio automatically sets semantic types for fields.
  • The aggregation type determines whether a metric's values can be reaggregated (it is ignored for dimensions). The only default you can set from the schema is SUM, by setting the semantics.isReaggregatable property to true; otherwise the aggregation type is set to Auto.
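
As an illustration of the properties above, here is a minimal sketch of a schema with one dimension and one reaggregatable metric. The field names and labels are hypothetical, and the sketch assumes that conceptType is nested under the same semantics object as isReaggregatable; check the connector API reference before relying on that layout.

```javascript
// Hypothetical field definitions, for illustration only.
// Assumption: conceptType lives under `semantics`, next to isReaggregatable.
var exampleSchema = [
  {
    name: 'country',
    label: 'Country',
    dataType: 'STRING',
    semantics: {
      conceptType: 'DIMENSION'
    }
  },
  {
    name: 'revenue',
    label: 'Revenue',
    dataType: 'NUMBER',
    semantics: {
      conceptType: 'METRIC',
      // Defaults the aggregation type to SUM instead of Auto.
      isReaggregatable: true
    }
  }
];

// Returns the schema in response to a getSchema request.
function getSchema(request) {
  return { schema: exampleSchema };
}
```

Note that no semantic type appears anywhere in this schema: it is detected by Data Studio rather than declared.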

When you configure a connector and connect with it in Data Studio, the fields editor shows the complete schema for the connector, based on the properties you defined above and on automatic semantic type detection.

[Screenshot: Fields screen]

Automatic semantic type detection

Data Studio attempts to automatically detect semantic types for your community connector schema based on the data type property and the format of the data values returned by your connector.

The steps of the automatic detection process are as follows:

  1. Request the schema by executing the getSchema function of your community connector.
  2. Iterate through batches of fields defined in the connector schema and issue getData requests for the fields. The getData requests are executed with the sampleExtraction parameter set to true to indicate that the data requests are for the purpose of semantic detection.
  3. Based on the field data type and the format of the value returned from the getData request, identify the semantic type of the field.
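
To recognize the detection requests from step 2 in your own code, a connector can inspect the incoming getData request. The sketch below is a minimal example under stated assumptions: it assumes the flag arrives as request.scriptParams.sampleExtraction and that requested fields arrive as request.fields entries with a name property. The request object and both helper functions are hypothetical.

```javascript
// Hypothetical getData request, shaped like one issued during semantic detection.
// Assumption: the flag is delivered under `scriptParams`.
var sampleRequest = {
  fields: [{ name: 'country' }, { name: 'revenue' }],
  scriptParams: { sampleExtraction: true }
};

// Returns true when the request is a semantic detection (sample) request.
function isSampleExtraction(request) {
  return Boolean(
    request && request.scriptParams && request.scriptParams.sampleExtraction
  );
}

// Returns the names of the fields requested in this batch, in request order.
function requestedFieldNames(request) {
  var fields = (request && request.fields) || [];
  return fields.map(function (field) {
    return field.name;
  });
}
```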

Options for handling semantic type detection

Automatic detection in Data Studio cannot correctly identify every semantic type, and the detection requests add load to your connector and its data source.

There are several ways to improve semantic type detection in your code:

  • Recommended: Pass predefined values
    Return a predefined value for each field that best represents the field's semantic type and is known to be detected properly by Data Studio. For example, if the semantic type for a field is Country, then return a value such as IT for Italy. This approach is also much quicker, since it does not require you to make HTTP requests to the third-party service for data.

  • Return only n records
    If the third-party service from which you're fetching data supports row limits, then return a small subset of rows to Data Studio instead of the full data set. This limits the amount of data you need to pass to Data Studio for each semantic detection request.

  • Request all columns and cache the response
    If the third-party service from which you're fetching data lets you request all columns at once, then fetch all columns on the first semantic detection request received from Data Studio and cache the results. For subsequent semantic detection requests, read column values from the cache instead of making additional HTTP requests to the third-party service.

  • Do nothing different
    You can choose not to implement any special handling for requests where sampleExtraction is set to true. Semantic detection will then be slow, because Data Studio fetches all data during the process. It will also increase the request rate to your external data source, since many semantic detection requests are executed in parallel.
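
The recommended option, passing predefined values, might be sketched as follows. This is an illustration rather than a definitive implementation: the schema, the sample values, and the helper names are hypothetical, and the sketch assumes the flag arrives as request.scriptParams.sampleExtraction and that a getData response has the shape { schema, rows: [{ values }] }, with values in the same order as the requested fields.

```javascript
// Hypothetical schema and one representative value per field.
var SCHEMA = [
  { name: 'country', label: 'Country', dataType: 'STRING' },
  { name: 'revenue', label: 'Revenue', dataType: 'NUMBER' },
  { name: 'day', label: 'Day', dataType: 'STRING' }
];
var SAMPLE_VALUES = { country: 'IT', revenue: 1234.56, day: '20080524' };

// Looks up a schema entry by field name.
function findField(name) {
  for (var i = 0; i < SCHEMA.length; i++) {
    if (SCHEMA[i].name === name) return SCHEMA[i];
  }
  throw new Error('Unknown field: ' + name);
}

// Builds a one-row response from predefined values, with no HTTP requests.
function buildSampleResponse(request) {
  var names = request.fields.map(function (field) { return field.name; });
  return {
    schema: names.map(findField),
    rows: [{ values: names.map(function (name) { return SAMPLE_VALUES[name]; }) }]
  };
}

// Placeholder for the real fetch from the third-party service.
function fetchRealData(request) {
  throw new Error('fetchRealData is not implemented in this sketch');
}

function getData(request) {
  // Assumption: the detection flag is delivered under scriptParams.
  var sampling = Boolean(
    request.scriptParams && request.scriptParams.sampleExtraction
  );
  return sampling ? buildSampleResponse(request) : fetchRealData(request);
}
```

The same branch is where a row limit or a cache lookup would go if you choose one of the other options instead.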

Recognized formats for automatic semantic type detection

Date & Time

  • YYYY-MM-DD [HH:MM:SS[.uuuuuu]]
  • YYYY/MM/DD [HH:MM:SS[.uuuuuu]]
  • YYYYMMDD [HH:MM:SS[.uuuuuu]]
  • Sat, 24 May 2008 20:09:47 GMT
  • 2008-05-24T20:09:47Z
  • Epoch time in seconds, milliseconds, microseconds, or nanoseconds
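
If your data source returns dates in some other form, you can convert them to one of the recognized formats before returning them. The helpers below are a small sketch with hypothetical names; they emit the YYYYMMDD and YYYY-MM-DD HH:MM:SS formats from the list above, and formatting in UTC is an assumption made here for simplicity.

```javascript
// Left-pads a number with zeros to the given width.
function pad(value, width) {
  var text = String(value);
  while (text.length < width) text = '0' + text;
  return text;
}

// Formats a Date as YYYYMMDD (UTC).
function toCompactDate(date) {
  return pad(date.getUTCFullYear(), 4) +
    pad(date.getUTCMonth() + 1, 2) +
    pad(date.getUTCDate(), 2);
}

// Formats a Date as YYYY-MM-DD HH:MM:SS (UTC).
function toDateTime(date) {
  return pad(date.getUTCFullYear(), 4) + '-' +
    pad(date.getUTCMonth() + 1, 2) + '-' +
    pad(date.getUTCDate(), 2) + ' ' +
    pad(date.getUTCHours(), 2) + ':' +
    pad(date.getUTCMinutes(), 2) + ':' +
    pad(date.getUTCSeconds(), 2);
}
```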