Aggregation queries

An aggregation query processes the data from multiple indexed entities to return a single summary value. Firestore in Datastore mode supports the following aggregation queries:

  • count()
  • sum()
  • avg()

Aggregation queries simplify your application code and cost less than fetching each entity for processing. Read this page to learn how to use aggregation queries.

count() aggregation

Use the count() aggregation to return the total number of indexed entities that match a given query. For example, this count() aggregation returns the total number of entities in a kind.

Java
import static com.google.cloud.datastore.aggregation.Aggregation.count;

import com.google.cloud.datastore.AggregationQuery;
import com.google.cloud.datastore.AggregationResult;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.common.collect.Iterables;

public class CountAggregationOnKind {
  // Instantiates a client.
  private static final Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

  // The kind for the new entity.
  private static final String kind = "Task";

  // Setting up Tasks in database
  private static void setUpTasks() {
    Key task1Key = datastore.newKeyFactory().setKind(kind).newKey("task1");
    Key task2Key = datastore.newKeyFactory().setKind(kind).newKey("task2");
    Key task3Key = datastore.newKeyFactory().setKind(kind).newKey("task3");

    // Save all the tasks.
    datastore.put(
        Entity.newBuilder(task1Key).set("done", true).build(),
        Entity.newBuilder(task2Key).set("done", false).build(),
        Entity.newBuilder(task3Key).set("done", true).build());
  }

  // Accessing aggregation result by the generated alias.
  private static void usageWithGeneratedAlias() {
    EntityQuery selectAllTasks = Query.newEntityQueryBuilder().setKind(kind).build();
    // Creating an aggregation query to get the count of all tasks.
    AggregationQuery allTasksCountQuery =
        Query.newAggregationQueryBuilder().over(selectAllTasks).addAggregation(count()).build();
    // Executing aggregation query.
    AggregationResult aggregationResult =
        Iterables.getOnlyElement(datastore.runAggregation(allTasksCountQuery));

    System.out.printf(
        "Total tasks (accessible from default alias) is %d",
        aggregationResult.get("property_1")); // 3
  }

  // Accessing aggregation result by the provided custom alias.
  private static void usageWithCustomAlias() {
    EntityQuery selectAllTasks = Query.newEntityQueryBuilder().setKind(kind).build();
    // Creating an aggregation query to get the count of all tasks.
    AggregationQuery allTasksCountQuery =
        Query.newAggregationQueryBuilder()
            .over(selectAllTasks)
            // passing 'total_count' as alias in the aggregation query.
            .addAggregation(count().as("total_count"))
            .build();
    // Executing aggregation query.
    AggregationResult aggregationResult =
        Iterables.getOnlyElement(datastore.runAggregation(allTasksCountQuery));

    System.out.printf("Total tasks count is %d", aggregationResult.get("total_count")); // 3
  }

  public static void invoke() {
    setUpTasks();
    usageWithGeneratedAlias();
    usageWithCustomAlias();
  }
}
Python
task1 = datastore.Entity(client.key("Task", "task1"))
task2 = datastore.Entity(client.key("Task", "task2"))

tasks = [task1, task2]
client.put_multi(tasks)
all_tasks_query = client.query(kind="Task")
all_tasks_count_query = client.aggregation_query(all_tasks_query).count()
query_result = all_tasks_count_query.fetch()
for aggregation_results in query_result:
    for aggregation in aggregation_results:
        print(f"Total tasks (accessible from default alias) is {aggregation.value}")
Go
aggregationCountQuery := datastore.NewQuery("Task").
  NewAggregationQuery().
  WithCount("total_tasks")

countResults, err := client.RunAggregationQuery(ctx, aggregationCountQuery)

count := countResults["total_tasks"]
countValue := count.(*datastorepb.Value)
fmt.Printf("Number of results from query: %d\n", countValue.GetIntegerValue())
GQL
AGGREGATE COUNT(*) AS total OVER ( SELECT * AS total FROM tasks )

GQL supports a simplified form of count() queries:

SELECT COUNT(*) AS total FROM tasks

This example uses an optional alias of total.

The simplified form supports only FROM and WHERE clauses. See the GQL reference for more information.

The count() aggregation takes into account any filters on the query and any limit clauses. For example, the following aggregation returns a count of the number of entities that match the given filters.

Java

import static com.google.cloud.datastore.aggregation.Aggregation.count;

import com.google.cloud.datastore.AggregationQuery;
import com.google.cloud.datastore.AggregationResult;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;
import com.google.common.collect.Iterables;

public class CountAggregationWithPropertyFilter {

  public static void invoke() {
    // Instantiates a client.
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

    // The kind for the new entity.
    String kind = "Task";

    Key task1Key = datastore.newKeyFactory().setKind(kind).newKey("task1");
    Key task2Key = datastore.newKeyFactory().setKind(kind).newKey("task2");
    Key task3Key = datastore.newKeyFactory().setKind(kind).newKey("task3");

    // Save all the tasks.
    datastore.put(
        Entity.newBuilder(task1Key).set("done", true).build(),
        Entity.newBuilder(task2Key).set("done", false).build(),
        Entity.newBuilder(task3Key).set("done", true).build());

    EntityQuery completedTasks =
        Query.newEntityQueryBuilder()
            .setKind(kind)
            .setFilter(PropertyFilter.eq("done", true))
            .build();
    EntityQuery remainingTasks =
        Query.newEntityQueryBuilder()
            .setKind(kind)
            .setFilter(PropertyFilter.eq("done", false))
            .build();
    // Creating an aggregation query to get the count of all completed tasks.
    AggregationQuery completedTasksCountQuery =
        Query.newAggregationQueryBuilder()
            .over(completedTasks)
            .addAggregation(count().as("total_completed_count"))
            .build();
    // Creating an aggregation query to get the count of all remaining tasks.
    AggregationQuery remainingTasksCountQuery =
        Query.newAggregationQueryBuilder()
            .over(remainingTasks)
            .addAggregation(count().as("total_remaining_count"))
            .build();

    // Executing aggregation query.
    AggregationResult completedTasksCountQueryResult =
        Iterables.getOnlyElement(datastore.runAggregation(completedTasksCountQuery));
    AggregationResult remainingTasksCountQueryResult =
        Iterables.getOnlyElement(datastore.runAggregation(remainingTasksCountQuery));

    System.out.printf(
        "Total completed tasks count is %d",
        completedTasksCountQueryResult.get("total_completed_count")); // 2
    System.out.printf(
        "Total remaining tasks count is %d",
        remainingTasksCountQueryResult.get("total_remaining_count")); // 1
  }
}
Python
task1 = datastore.Entity(client.key("Task", "task1"))
task2 = datastore.Entity(client.key("Task", "task2"))
task3 = datastore.Entity(client.key("Task", "task3"))

task1["done"] = True
task2["done"] = False
task3["done"] = True

tasks = [task1, task2, task3]
client.put_multi(tasks)
completed_tasks = client.query(kind="Task").add_filter("done", "=", True)
remaining_tasks = client.query(kind="Task").add_filter("done", "=", False)

completed_tasks_query = client.aggregation_query(query=completed_tasks).count(
    alias="total_completed_count"
)
remaining_tasks_query = client.aggregation_query(query=remaining_tasks).count(
    alias="total_remaining_count"
)

completed_query_result = completed_tasks_query.fetch()
for aggregation_results in completed_query_result:
    for aggregation_result in aggregation_results:
        if aggregation_result.alias == "total_completed_count":
            print(f"Total completed tasks count is {aggregation_result.value}")

remaining_query_result = remaining_tasks_query.fetch()
for aggregation_results in remaining_query_result:
    for aggregation_result in aggregation_results:
        if aggregation_result.alias == "total_remaining_count":
            print(f"Total remaining tasks count is {aggregation_result.value}")
Go
aggregationCountQuery := datastore.NewQuery("Task").
  FilterField("done", "=", true).
  NewAggregationQuery().
  WithCount("total_tasks_done")

countResults, err := client.RunAggregationQuery(ctx, aggregationCountQuery)

count := countResults["total_tasks_done"]
countValue := count.(*datastorepb.Value)
fmt.Printf("Number of results from query: %d\n", countValue.GetIntegerValue())
GQL
AGGREGATE COUNT(*) OVER ( SELECT * FROM tasks WHERE is_done = false AND tag = 'house')

GQL supports a simplified form of count() queries:

SELECT COUNT(*) AS total
FROM tasks
WHERE is_done = false AND tag = 'house'

This example uses an optional alias of total.

The simplified form supports only FROM and WHERE clauses. See the GQL reference for more information.

This example shows how to count up to a certain value. You can use this to, for example, stop counting at a certain number and inform users that they exceeded that number.

Java

import static com.google.cloud.datastore.aggregation.Aggregation.count;

import com.google.cloud.datastore.AggregationQuery;
import com.google.cloud.datastore.AggregationResult;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.common.collect.Iterables;

public class CountAggregationWithLimit {
  public static void invoke() {
    // Instantiates a client.
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

    // The kind for the new entity.
    String kind = "Task";

    Key task1Key = datastore.newKeyFactory().setKind(kind).newKey("task1");
    Key task2Key = datastore.newKeyFactory().setKind(kind).newKey("task2");
    Key task3Key = datastore.newKeyFactory().setKind(kind).newKey("task3");

    // Save all the tasks.
    datastore.put(
        Entity.newBuilder(task1Key).set("done", true).build(),
        Entity.newBuilder(task2Key).set("done", false).build(),
        Entity.newBuilder(task3Key).set("done", true).build());

    EntityQuery selectAllTasks = Query.newEntityQueryBuilder().setKind(kind).setLimit(2).build();
    // Creating an aggregation query to get the count of all tasks.
    AggregationQuery allTasksCountQuery =
        Query.newAggregationQueryBuilder()
            .over(selectAllTasks)
            .addAggregation(count().as("at_least"))
            .build();
    // Executing aggregation query.
    AggregationResult limitQueryResult =
        Iterables.getOnlyElement(datastore.runAggregation(allTasksCountQuery));

    System.out.printf("We have at least %d tasks", limitQueryResult.get("at_least")); // 2
  }
}
Python
task1 = datastore.Entity(client.key("Task", "task1"))
task2 = datastore.Entity(client.key("Task", "task2"))
task3 = datastore.Entity(client.key("Task", "task3"))

tasks = [task1, task2, task3]
client.put_multi(tasks)
all_tasks_query = client.query(kind="Task")
all_tasks_count_query = client.aggregation_query(all_tasks_query).count()
query_result = all_tasks_count_query.fetch(limit=2)
for aggregation_results in query_result:
    for aggregation in aggregation_results:
        print(f"We have at least {aggregation.value} tasks")
Go
aggregationCountQuery := datastore.NewQuery("Task").
  Limit(2).
  NewAggregationQuery().
  WithCount("at_least")

countResults, err := client.RunAggregationQuery(ctx, aggregationCountQuery)

count := countResults["at_least"]
countValue := count.(*datastorepb.Value)
fmt.Printf("We have at least %d tasks\n", countValue.GetIntegerValue())
GQL
AGGREGATE COUNT_UP_TO(1000) OVER ( SELECT * FROM tasks WHERE is_done = false)

GQL supports a simplified form of count_up_to() queries:

SELECT COUNT_UP_TO(1000) AS total
FROM tasks
WHERE is_done = false AND tag = 'house'

This example uses an optional alias of total.

The simplified form supports only FROM and WHERE clauses. See the GQL reference for more information.

sum() aggregation

Use the sum() aggregation to return the total sum of numeric values that match a given query. For example, the following sum() aggregation returns the total sum of numeric values of the given property from entities of the given kind:

Java

import static com.google.cloud.datastore.aggregation.Aggregation.sum;

import com.google.cloud.datastore.AggregationQuery;
import com.google.cloud.datastore.AggregationResult;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.common.collect.Iterables;

public class SumAggregationOnKind {

  // Instantiates a client.
  private static final Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

  // The kind for the new entity.
  private static final String kind = "Sales";

  // Setting up Sales in database
  private static void setUpSales() {
    Key sales1Key = datastore.newKeyFactory().setKind(kind).newKey("sales1");
    Key sales2Key = datastore.newKeyFactory().setKind(kind).newKey("sales2");
    Key sales3Key = datastore.newKeyFactory().setKind(kind).newKey("sales3");

    // Save all the sales.
    datastore.put(
        Entity.newBuilder(sales1Key).set("amount", 89).build(),
        Entity.newBuilder(sales2Key).set("amount", 95).build(),
        Entity.newBuilder(sales3Key).set("amount", 55).build());
  }

  // Accessing aggregation result by the provided custom alias.
  private static void usageWithCustomAlias() {
    EntityQuery selectAllSales = Query.newEntityQueryBuilder().setKind(kind).build();
    // Creating an aggregation query to get the sum of all sales.
    AggregationQuery sumOfSalesQuery =
        Query.newAggregationQueryBuilder()
            .over(selectAllSales)
            // passing 'total_sales_amount' as alias in the aggregation query.
            .addAggregation(sum("amount").as("total_sales_amount"))
            .build();
    // Executing aggregation query.
    AggregationResult aggregationResult =
        Iterables.getOnlyElement(datastore.runAggregation(sumOfSalesQuery));

    System.out.printf("Total sales is %d", aggregationResult.getLong("total_sales_amount")); // 239
  }

  public static void invoke() {
    setUpSales();
    usageWithCustomAlias();
  }
}
Python
# Set up sample entities
# Use incomplete key to auto-generate ID
task1 = datastore.Entity(client.key("Task"))
task2 = datastore.Entity(client.key("Task"))
task3 = datastore.Entity(client.key("Task"))

task1["hours"] = 5
task2["hours"] = 3
task3["hours"] = 1

tasks = [task1, task2, task3]
client.put_multi(tasks)

# Execute sum aggregation query
all_tasks_query = client.query(kind="Task")
all_tasks_sum_query = client.aggregation_query(all_tasks_query).sum("hours")
query_result = all_tasks_sum_query.fetch()
for aggregation_results in query_result:
    for aggregation in aggregation_results:
        print(f"Total sum of hours in tasks is {aggregation.value}")
Go
aggregationSumQuery := datastore.NewQuery("Task").
  NewAggregationQuery().
  WithSum("hours", "total_hours")
sumResults, err := client.RunAggregationQuery(ctx, aggregationSumQuery)

sum := sumResults["total_hours"]
sumValue := sum.(*datastorepb.Value)
fmt.Printf("Sum of results from query: %d\n", sumValue.GetIntegerValue())
GQL
AGGREGATE
  SUM(hours) AS total_hours
OVER (
  SELECT *
  FROM tasks
)

GQL supports a simplified form of sum() queries:

SELECT SUM(hours) AS total_hours FROM tasks

This example uses an optional alias of total_hours.

The simplified form supports only FROM and WHERE clauses. See the GQL reference for more information.

The sum() aggregation takes into account any filters on the query and any limit clauses. For example, the following aggregation returns a sum of the specified property with a numeric value in entities that match the given filters.

Java

import static com.google.cloud.datastore.aggregation.Aggregation.sum;

import com.google.cloud.datastore.AggregationQuery;
import com.google.cloud.datastore.AggregationResult;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;
import com.google.common.collect.Iterables;

public class SumAggregationWithPropertyFilter {

  public static void invoke() {
    // Instantiates a client.
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

    // The kind for the new entity.
    String kind = "Sales";

    Key sales1Key = datastore.newKeyFactory().setKind(kind).newKey("sales1");
    Key sales2Key = datastore.newKeyFactory().setKind(kind).newKey("sales2");
    Key sales3Key = datastore.newKeyFactory().setKind(kind).newKey("sales3");

    // Save all the tasks.
    datastore.put(
        Entity.newBuilder(sales1Key).set("amount", 89).set("customerId", 1).build(),
        Entity.newBuilder(sales2Key).set("amount", 95).set("customerId", 1).build(),
        Entity.newBuilder(sales3Key).set("amount", 55).set("customerId", 2).build());

    EntityQuery customer1Sales =
        Query.newEntityQueryBuilder()
            .setKind(kind)
            .setFilter(PropertyFilter.eq("customerId", 1))
            .build();

    // Creating an aggregation query to get the sum of all sales for customerId 1.
    AggregationQuery customer1SalesSum =
        Query.newAggregationQueryBuilder()
            .over(customer1Sales)
            .addAggregation(sum("amount").as("total_sales"))
            .build();

    // Executing aggregation query.
    AggregationResult customer1SalesSumQueryResult =
        Iterables.getOnlyElement(datastore.runAggregation(customer1SalesSum));

    System.out.printf(
        "Customer 1 sales sum is %d", customer1SalesSumQueryResult.getLong("total_sales")); // 184
  }
}
Python
# Set up sample entities
# Use incomplete key to auto-generate ID
task1 = datastore.Entity(client.key("Task"))
task2 = datastore.Entity(client.key("Task"))
task3 = datastore.Entity(client.key("Task"))

task1["hours"] = 5
task2["hours"] = 3
task3["hours"] = 1

task1["done"] = True
task2["done"] = True
task3["done"] = False

tasks = [task1, task2, task3]
client.put_multi(tasks)

# Execute sum aggregation query with filters
completed_tasks = client.query(kind="Task").add_filter("done", "=", True)
completed_tasks_query = client.aggregation_query(query=completed_tasks).sum(
    property_ref="hours",
    alias="total_completed_sum_hours"
)

completed_query_result = completed_tasks_query.fetch()
for aggregation_results in completed_query_result:
    for aggregation_result in aggregation_results:
        if aggregation_result.alias == "total_completed_sum_hours":
            print(f"Total sum of hours in completed tasks is {aggregation_result.value}")

This query requires an index such as:

- kind: Task
  properties:
  - name: done
  - name: hours
Go
aggregationSumQuery := datastore.NewQuery("Task").
  FilterField("done", "=", false).
  FilterField("tag", "=", "house").
  NewAggregationQuery().
  WithSum("hours", "total_hours")
sumResults, err := client.RunAggregationQuery(ctx, aggregationSumQuery)

sum := sumResults["total_hours"]
sumValue := sum.(*datastorepb.Value)
fmt.Printf("Sum of results from query: %d\n", sumValue.GetIntegerValue())
GQL
AGGREGATE
  SUM(hours) AS total_hours
OVER (
  SELECT *
  FROM tasks
  WHERE is_done = false AND tag = 'house'
)

GQL supports a simplified form of sum() queries:

SELECT
  SUM(hours) AS total_hours
FROM tasks
WHERE is_done = false AND tag = 'house'

This example uses an optional alias of total_hours.

The simplified form supports only FROM and WHERE clauses. See the GQL reference for more information.

avg() aggregation

Use the avg() aggregation to return the average of numeric values that match a given query. For example, the following avg() aggregation returns the arithmetic mean of the specified property from the numeric property values of entities that match the query:

Java

import static com.google.cloud.datastore.aggregation.Aggregation.avg;

import com.google.cloud.datastore.AggregationQuery;
import com.google.cloud.datastore.AggregationResult;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.common.collect.Iterables;

public class AvgAggregationOnKind {

  // Instantiates a client.
  private static final Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

  // The kind for the new entity.
  private static final String kind = "Sales";

  // Setting up Sales in database
  private static void setUpSales() {
    Key sales1Key = datastore.newKeyFactory().setKind(kind).newKey("sales1");
    Key sales2Key = datastore.newKeyFactory().setKind(kind).newKey("sales2");
    Key sales3Key = datastore.newKeyFactory().setKind(kind).newKey("sales3");

    // Save all the sales.
    datastore.put(
        Entity.newBuilder(sales1Key).set("amount", 89).build(),
        Entity.newBuilder(sales2Key).set("amount", 95).build(),
        Entity.newBuilder(sales3Key).set("amount", 55).build());
  }

  // Accessing aggregation result by the provided custom alias.
  private static void usageWithCustomAlias() {
    EntityQuery selectAllSales = Query.newEntityQueryBuilder().setKind(kind).build();
    // Creating an aggregation query to get the avg of all sales.
    AggregationQuery avgOfSalesQuery =
        Query.newAggregationQueryBuilder()
            .over(selectAllSales)
            // passing 'avg_sales_amount' as alias in the aggregation query.
            .addAggregation(avg("amount").as("avg_sales_amount"))
            .build();
    // Executing aggregation query.
    AggregationResult aggregationResult =
        Iterables.getOnlyElement(datastore.runAggregation(avgOfSalesQuery));

    System.out.printf(
        "Average sales is %.8f", aggregationResult.getDouble("avg_sales_amount")); // 79.66666667
  }

  public static void invoke() {
    setUpSales();
    usageWithCustomAlias();
  }
}
Python
# Set up sample entities
# Use incomplete key to auto-generate ID
task1 = datastore.Entity(client.key("Task"))
task2 = datastore.Entity(client.key("Task"))
task3 = datastore.Entity(client.key("Task"))

task1["hours"] = 5
task2["hours"] = 3
task3["hours"] = 1

tasks = [task1, task2, task3]
client.put_multi(tasks)

# Execute average aggregation query
all_tasks_query = client.query(kind="Task")
all_tasks_avg_query = client.aggregation_query(all_tasks_query).avg("hours")
query_result = all_tasks_avg_query.fetch()
for aggregation_results in query_result:
    for aggregation in aggregation_results:
        print(f"Total average of hours in tasks is {aggregation.value}")
Go
aggregationAvgQuery := datastore.NewQuery("Task").
  NewAggregationQuery().
  WithAvg("hours", "avg_hours")
avgResults, err := client.RunAggregationQuery(ctx, aggregationAvgQuery)

avg := avgResults["avg_hours"]
avgValue := avg.(*datastorepb.Value)
fmt.Printf("average hours: %f\n", avgValue.GetDoubleValue())
GQL
AGGREGATE
  AVG(hours) as avg_hours
OVER (
  SELECT *
  FROM tasks
)

GQL supports a simplified form of avg() queries:

SELECT AVG(hours) as avg_hours

This example uses an optional alias of avg_hours.

The simplified form supports only FROM and WHERE clauses. See the GQL reference for more information.

The avg() aggregation takes into account any filters on the query and any limit clauses. For example, the following aggregation returns the arithmetic mean of the specified property from the numeric property values of entities that match the query filters.

Java

import static com.google.cloud.datastore.aggregation.Aggregation.avg;

import com.google.cloud.datastore.AggregationQuery;
import com.google.cloud.datastore.AggregationResult;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.StructuredQuery.PropertyFilter;
import com.google.common.collect.Iterables;

public class AvgAggregationWithPropertyFilter {

  public static void invoke() {
    // Instantiates a client.
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

    // The kind for the new entity.
    String kind = "Sales";

    Key sales1Key = datastore.newKeyFactory().setKind(kind).newKey("sales1");
    Key sales2Key = datastore.newKeyFactory().setKind(kind).newKey("sales2");
    Key sales3Key = datastore.newKeyFactory().setKind(kind).newKey("sales3");

    // Save all the tasks.
    datastore.put(
        Entity.newBuilder(sales1Key).set("amount", 89).set("customerId", 1).build(),
        Entity.newBuilder(sales2Key).set("amount", 95).set("customerId", 1).build(),
        Entity.newBuilder(sales3Key).set("amount", 55).set("customerId", 2).build());

    EntityQuery customer1Sales =
        Query.newEntityQueryBuilder()
            .setKind(kind)
            .setFilter(PropertyFilter.eq("customerId", 1))
            .build();

    // Creating an aggregation query to get the avg of all sales for customerId 1.
    AggregationQuery customer1SalesAvg =
        Query.newAggregationQueryBuilder()
            .over(customer1Sales)
            .addAggregation(avg("amount").as("total_sales"))
            .build();

    // Executing aggregation query.
    AggregationResult customer1SalesAvgQueryResult =
        Iterables.getOnlyElement(datastore.runAggregation(customer1SalesAvg));

    System.out.printf(
        "Customer 1 sales avg is %d", customer1SalesAvgQueryResult.getLong("total_sales")); // 92
  }
}
Python
# Set up sample entities
# Use incomplete key to auto-generate ID
task1 = datastore.Entity(client.key("Task"))
task2 = datastore.Entity(client.key("Task"))
task3 = datastore.Entity(client.key("Task"))

task1["hours"] = 5
task2["hours"] = 3
task3["hours"] = 1

task1["done"] = True
task2["done"] = True
task3["done"] = False

tasks = [task1, task2, task3]
client.put_multi(tasks)

# Execute average aggregation query with filters
completed_tasks = client.query(kind="Task").add_filter("done", "=", True)
completed_tasks_query = client.aggregation_query(query=completed_tasks).avg(
    property_ref="hours",
    alias="total_completed_avg_hours"
)

completed_query_result = completed_tasks_query.fetch()
for aggregation_results in completed_query_result:
    for aggregation_result in aggregation_results:
        if aggregation_result.alias == "total_completed_avg_hours":
            print(f"Total average of hours in completed tasks is {aggregation_result.value}")

This query requires an index such as:

- kind: Task
  properties:
  - name: done
  - name: hours
Go
aggregationAvgQuery := datastore.NewQuery("Task").
  FilterField("done", "=", false).
  FilterField("tag", "=", "house").
  NewAggregationQuery().
  WithAvg("hours", "avg_hours")
avgResults, err := client.RunAggregationQuery(ctx, aggregationAvgQuery)

avg := avgResults["avg_hours"]
avgValue := avg.(*datastorepb.Value)
fmt.Printf("average hours: %f\n", avgValue.GetDoubleValue())
GQL
AGGREGATE
  AVG(hours) as avg_hours
OVER (
  SELECT *
  FROM tasks
  WHERE is_done = false AND tag = 'house'
)

GQL supports a simplified form of avg() queries:

SELECT
  AVG(hours) as avg_hours
FROM tasks
WHERE is_done = false AND tag = 'house'

This example uses an optional alias of avg_hours.

The simplified form supports only FROM and WHERE clauses. See the GQL reference for more information.

Calculate multiple aggregations in a query

You can combine multiple aggregations in a single aggregation pipeline. This can reduce the number of index reads required. If the query includes aggregations on multiple fields, the query requires a composite index, and each aggregation calculation includes only those entities that contain all the fields used by each aggregation.

The following example performs multiple aggregations in a single aggregation query:

Java

import static com.google.cloud.datastore.aggregation.Aggregation.avg;
import static com.google.cloud.datastore.aggregation.Aggregation.count;
import static com.google.cloud.datastore.aggregation.Aggregation.sum;

import com.google.cloud.datastore.AggregationQuery;
import com.google.cloud.datastore.AggregationResult;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.EntityQuery;
import com.google.cloud.datastore.Key;
import com.google.cloud.datastore.Query;
import com.google.common.collect.Iterables;

public class MultipleAggregationsInStructuredQuery {

  public static void invoke() {
    // Instantiates a client.
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

    // The kind for the new entity.
    String kind = "Sales";

    Key sales1Key = datastore.newKeyFactory().setKind(kind).newKey("sales1");
    Key sales2Key = datastore.newKeyFactory().setKind(kind).newKey("sales2");
    Key sales3Key = datastore.newKeyFactory().setKind(kind).newKey("sales3");

    // Save all the sales.
    datastore.put(
        Entity.newBuilder(sales1Key).set("amount", 89).set("customerId", 1).build(),
        Entity.newBuilder(sales2Key).set("amount", 95).set("customerId", 1).build(),
        Entity.newBuilder(sales3Key).set("amount", 55).set("customerId", 2).build());

    EntityQuery baseQuery = Query.newEntityQueryBuilder().setKind(kind).build();

    // Creating an aggregation query with COUNT, SUM and AVG aggregations.
    AggregationQuery aggregationQuery =
        Query.newAggregationQueryBuilder()
            .over(baseQuery)
            .addAggregation(count().as("total_count"))
            .addAggregation(sum("amount").as("sales_sum"))
            .addAggregation(avg("amount").as("sales_avg"))
            .build();

    // Executing aggregation query.
    AggregationResult aggregationResult =
        Iterables.getOnlyElement(datastore.runAggregation(aggregationQuery));

    System.out.printf("Total sales count: %d", aggregationResult.getLong("total_count")); // 3
    System.out.printf("Sum of sales: %d", aggregationResult.getLong("sales_sum")); // 239
    System.out.printf(
        "Avg of sales: %.8f", aggregationResult.getDouble("sales_avg")); // 79.66666667
  }
}
Python
# Set up sample entities
# Use incomplete key to auto-generate ID
task1 = datastore.Entity(client.key("Task"))
task2 = datastore.Entity(client.key("Task"))
task3 = datastore.Entity(client.key("Task"))

task1["hours"] = 5
task2["hours"] = 3
task3["hours"] = 1

tasks = [task1, task2, task3]
client.put_multi(tasks)

# Execute query with multiple aggregations
all_tasks_query = client.query(kind="Task")
aggregation_query = client.aggregation_query(all_tasks_query)
# Add aggregations
aggregation_query.add_aggregations(
    [
        datastore.aggregation.CountAggregation(alias="count_aggregation"),
        datastore.aggregation.SumAggregation(
            property_ref="hours", alias="sum_aggregation"),
        datastore.aggregation.AvgAggregation(
            property_ref="hours", alias="avg_aggregation")
    ]
)

query_result = aggregation_query.fetch()
for aggregation_results in query_result:
    for aggregation in aggregation_results:
        print(f"{aggregation.alias} value is {aggregation.value}")
Go
aggregationQuery := datastore.NewQuery("Task").
  NewAggregationQuery().
  WithCount("total_tasks").
  WithSum("hours", "total_hours").
  WithAvg("hours", "avg_hours")
Results, err := client.RunAggregationQuery(ctx, aggregationQuery)

fmt.Printf("Number of results from query: %d\n", Results["total_tasks"].(*datastorepb.Value).GetIntegerValue())
fmt.Printf("Sum of results from query: %d\n", Results["total_hours"].(*datastorepb.Value).GetIntegerValue())
fmt.Printf("Avg of results from query: %f\n", Results["avg_hours"].(*datastorepb.Value).GetDoubleValue())
GQL
AGGREGATE 
  SUM(hours) AS total_hours, 
  COUNT(*) AS total_tasks
OVER (
  SELECT *
  FROM tasks
  WHERE is_done = false AND tag = 'house'
)

GQL supports a simplified form for aggregation queries:

SELECT
  SUM(hours) AS total_hours,
  COUNT(*) AS total_tasks
FROM tasks
WHERE is_done = false AND tag = 'house'

This example uses the optional aliases of total_hours and total_tasks.

The simplified form supports only FROM and WHERE clauses. See the GQL reference for more information.

Queries with multiple aggregation include only the entities that contain all the properties in each aggregation. This might lead to different results from performing each aggregation separately.

Behavior and limitations

As you work with aggregation queries, note the following behavior and limitations:

  • The query you provide to the aggregation must meet the restrictions on queries.
  • If an aggregation query cannot resolve within 60 seconds, it returns a DEADLINE_EXCEEDED error. Performance depends on your index configuration and on the size of the dataset.

    If the operation cannot be completed within the 60 second deadline, a possible workaround is to use cursors to merge multiple aggregations.

  • Aggregation queries read from index entries and include only indexed properties in the calculation.

  • Adding an OrderBy clause to the query limits the aggregation to the entities where the sorting property exists.

  • In GQL, the simplified form does not support ORDER BY, LIMIT, or OFFSET clauses.

  • In a projection query, you can aggregate data only from the properties in the projection. For example, in the GQL query SELECT a, b FROM k WHERE c = 1, you can aggregate data only from among a or b.

  • A count() aggregation does not de-duplicate entities with array properties. Each array value that matches the query adds one to the count.

  • For sum() and avg() aggregations, Non-numeric values are ignored. sum() and avg() aggregation take into account only integer values, floating-point number values, and timestamps. Timestamps are converted to microsecond integer values for sum(), avg(), and projections.

  • When combining multiple aggregations in a single query, note that sum() and avg() ignore non-numeric values while count() includes non-numeric values.

  • If you combine aggregations that are on different properties, the calculation includes only the entities that contain all those properties. This might lead to different results from performing each aggregation separately.

Pricing

Pricing for the count(), sum(), and avg() aggregation queries depends on the number of index entries scanned during the operation. You are billed one entity read for up to 1,000 index entries matched. Subsequent index entries matched cost additional read units. There is a minimum cost of one read unit for every query. For pricing information, see Firestore in Datastore mode pricing.

If you combine multiple aggregation in a single query, the query uses the same index for each aggregation and performs a single scan over the data. This can help reduce the number of index scans and reads billed when compared to performing each aggregation separately. However, queries with multiple aggregations include only the entities that contain all those properties. This might lead to different results from performing each aggregation separately.

What's next