编写地点数据分析查询

如需访问 Places Insights 数据,您需要在 BigQuery 中编写 SQL 查询,以返回有关地点的汇总分析洞见。结果会根据查询中指定的搜索条件从数据集中返回。

查询要求

查询中的 SELECT 语句必须包含 WITH AGGREGATION_THRESHOLD 并指定数据集。例如:

SELECT WITH AGGREGATION_THRESHOLD
COUNT(*) AS count
FROM `places_insights___us___sample.places_sample`

在此示例中,您使用 FROM 指定美国所用的 places_insights___us___sample.places_sample 数据集。

指定位置限制

请注意,上一个查询未指定任何位置限制,这意味着数据汇总应用于整个数据集。通常,您会指定如下所示的位置限制:

SELECT WITH AGGREGATION_THRESHOLD
COUNT(*) AS count
FROM `places_insights___us___sample.places_sample`
WHERE
ST_DWITHIN(ST_GEOGPOINT(-73.9857, 40.7484), point, 1000)

在此示例中,您的查询指定了以纽约市帝国大厦为中心、半径为 1,000 米的目标限制。

您可以使用多边形指定搜索区域。使用多边形时,多边形的点必须定义一个闭环,其中多边形的第一个点与最后一个点相同:

SELECT WITH AGGREGATION_THRESHOLD
COUNT(*) AS count
FROM `places_insights___us___sample.places_sample`
WHERE
ST_CONTAINS(ST_GEOGFROMTEXT("""POLYGON((-73.985708 40.75773,-73.993324 40.750298,
                                      -73.9857 40.7484,-73.9785 40.7575,
                                      -73.985708 40.75773))"""), point)

在下一个示例中,您将使用一条由若干个相连的点组成的线来定义搜索区域。此线条类似于 Routes API 计算出的出行路线。路线可能适用于车辆、自行车或步行者:

DECLARE route GEOGRAPHY;

SET route = ST_GEOGFROMTEXT("""LINESTRING(-73.98903537033028 40.73655649223003,
                                          -73.93580216278471 40.80955538843361)""");

SELECT WITH AGGREGATION_THRESHOLD
  COUNT(*) AS count
FROM `places_insights___us___sample.places_sample`
WHERE
  ST_DWITHIN(route, point, 100)

在此示例中,您将搜索半径设置为线路周围 100 米。

按地点数据集字段过滤

根据数据集架构定义的字段优化搜索。根据数据集字段(例如地点 regular_opening_hoursprice_level 和客户 rating)过滤结果。

引用由您感兴趣的国家/地区的数据集架构定义的任何字段。每个国家/地区的数据集架构都包含两部分:

例如,您的查询可以包含一个 WHERE 子句,用于定义查询的过滤条件。在以下示例中,您将返回类型为 restaurantbusiness_statusOPERATIONALrating 大于或等于 4.0 且 allows_dogs 设置为 true 的地点的汇总数据:

SELECT WITH AGGREGATION_THRESHOLD
COUNT(*) AS count
FROM `places_insights___us___sample.places_sample`
WHERE
ST_DWITHIN(ST_GEOGPOINT(-73.9857, 40.7484), point, 1000)
AND 'restaurant' IN UNNEST(types)
AND business_status = "OPERATIONAL"
AND rating >= 4.0
AND allows_dogs = true

以下查询会返回至少有 8 个电动车辆充电站的地点:

SELECT WITH AGGREGATION_THRESHOLD
  COUNT(*) AS count
FROM
  `places_insights___us___sample.places_sample`
WHERE
  ev_charge_options.connector_count > 8;

按地点主要类型和地点类型进行过滤

数据集中的每个地点都可以有:

  • 表 A 定义的类型相关联的单个主要类型。例如,主要类型可能是 mexican_restaurantsteak_house。在查询中使用 primary_type 可按地点的主要类型过滤结果。

  • 表 A 定义的类型关联的多个类型值。例如,某餐厅可能具有以下类型:seafood_restaurantrestaurantfoodpoint_of_interestestablishment。在查询中使用 types 可根据与地点相关联的类型列表过滤结果。

以下查询会返回主要类型为 bar 但也可用作 restaurant 的所有地点的结果:

SELECT WITH AGGREGATION_THRESHOLD
  COUNT(*) AS count
FROM
  `places_insights___us___sample.places_sample`
WHERE
  'restaurant' IN UNNEST(types)
  AND 'bar' = primary_type

按预定义的数据值过滤

许多数据集字段都具有预定义的值。例如

  • price_level 字段支持以下预定义值:

    • PRICE_LEVEL_FREE
    • PRICE_LEVEL_INEXPENSIVE
    • PRICE_LEVEL_MODERATE
    • PRICE_LEVEL_EXPENSIVE
    • PRICE_LEVEL_VERY_EXPENSIVE
  • business_status 字段支持以下预定义值:

    • OPERATIONAL
    • CLOSED_TEMPORARILY
    • CLOSED_PERMANENTLY

在此示例中,查询会返回纽约市帝国大厦方圆 1, 000 米内 business_statusOPERATIONAL 的所有餐厅的数量:

SELECT WITH AGGREGATION_THRESHOLD
COUNT(*) AS count
FROM `places_insights___us___sample.places_sample`
WHERE
ST_DWITHIN(ST_GEOGPOINT(-73.9857, 40.7484), point, 1000)
AND business_status = "OPERATIONAL"
AND 'restaurant' IN UNNEST(types)

按营业时间过滤

在此示例中,返回地理区域内所有提供周五欢乐时光的场所的数量:

SELECT WITH AGGREGATION_THRESHOLD COUNT(*) AS count
FROM `places_insights___us___sample.places_sample`,
UNNEST(regular_opening_hours_happy_hour.friday) AS friday_hours
WHERE '17:00:00' BETWEEN friday_hours.start_time AND friday_hours.end_time
AND ST_DWITHIN(ST_GEOGPOINT(-73.9857, 40.7484), point, 1000);

使用聚合函数

以下示例展示了受支持的 BigQuery 聚合函数。 此查询会汇总纽约市帝国大厦方圆 1,000 米内的所有地点的评分,以生成评分统计信息:

SELECT WITH AGGREGATION_THRESHOLD
  COUNT(id) AS place_count,
  APPROX_COUNT_DISTINCT(rating) as distinct_ratings,
  COUNTIF(rating > 4.0) as good_rating_count,
  LOGICAL_AND(rating <= 5) as all_ratings_equal_or_below_five,
  LOGICAL_OR(rating = 5) as any_rating_exactly_five,
  AVG(rating) as avg_rating,
  SUM(user_rating_count) as rating_count,
  COVAR_POP(rating, user_rating_count) as rating_covar_pop,
  COVAR_SAMP(rating, user_rating_count) as rating_covar_samp,
  STDDEV_POP(rating) as rating_stddev_pop,
  STDDEV_SAMP(rating) as rating_stddev_samp,
  VAR_POP(rating) as rating_var_pop,
  VAR_SAMP(rating) as rating_var_samp,
FROM
  `places_insights___us___sample.places_sample`
WHERE
  ST_DWITHIN(ST_GEOGPOINT(-73.9857, 40.7484), point, 1000)
  AND business_status = "OPERATIONAL"

返回结果组

到目前为止,我们展示的查询会在结果中返回一行,其中包含查询的汇总计数。您还可以使用 GROUP BY 运算符根据分组条件在响应中返回多行。

例如,以下查询返回的结果按搜索区域内每个地点的主要类型分组:

SELECT WITH AGGREGATION_THRESHOLD
  primary_type,
  COUNT(*) AS count
FROM
  `places_insights___us___sample.places_sample`
WHERE
  ST_DWITHIN(ST_GEOGPOINT(-73.99992071622756, 40.71818785986936), point, 1000)
GROUP BY primary_type

下图显示了此查询的示例输出:

按主要类型对结果进行分组的查询结果。

在此示例中,您定义了一个位置表。然后,针对每个位置,计算附近餐厅的数量(即 1, 000 米范围内的餐厅):

WITH my_locations AS (
  SELECT 'Location 1' AS name, ST_GEOGPOINT(-74.00776440888504, 40.70932825380786) AS location
  UNION ALL
  SELECT 'Location 2' AS name, ST_GEOGPOINT(-73.98257192833559, 40.750738934863215) AS location
  UNION ALL
  SELECT 'Location 3' AS name, ST_GEOGPOINT(-73.94701794263223, 40.80792954838445)  AS location
)
SELECT WITH AGGREGATION_THRESHOLD
  l.name,
  COUNT(*) as count
FROM
  `places_insights___us___sample.places_sample` p
JOIN
   my_locations l
ON
  ST_DWITHIN(l.location, p.point, 1000)
WHERE
  primary_type = "restaurant"
  AND business_status = "OPERATIONAL"
GROUP BY
  l.name

下图显示了此查询的示例输出:

按位置对结果进行分组的查询结果。