User-defined aggregate functions
For support during the preview, email bigquery-sql-preview-support@google.com.
This document describes how to create, call, and delete user-defined aggregate functions (UDAFs) in BigQuery.
A UDAF lets you create an aggregate function by using an expression that contains code. A UDAF accepts columns of input, performs a calculation on a group of rows at a time, and then returns the result of that calculation as a single value.
Create a SQL UDAF
This section describes the various ways that you can create a persistent or temporary SQL UDAF in BigQuery.
Create a persistent SQL UDAF
You can create a SQL UDAF that is persistent, meaning that you can reuse the UDAF across multiple queries. Persistent UDAFs are safe to call when they are shared between owners. UDAFs can't mutate data, talk to external systems, or send logs to Google Cloud Observability or similar applications.
To create a persistent UDAF, use the
CREATE AGGREGATE FUNCTION
statement
without the TEMP
or TEMPORARY
keyword. You must include the dataset in the
function path.
For example, the following query creates a persistent UDAF that's called
ScaledAverage
:
CREATE AGGREGATE FUNCTION myproject.mydataset.ScaledAverage( dividend FLOAT64, divisor FLOAT64) RETURNS FLOAT64 AS ( AVG(dividend / divisor) );
Create a temporary SQL UDAF
You can create a SQL UDAF that is temporary, meaning that the UDAF only exists in the scope of a single query, script, session, or procedure.
To create a temporary UDAF, use the
CREATE AGGREGATE FUNCTION
statement with the TEMP
or TEMPORARY
keyword.
For example, the following query creates a temporary UDAF that's called
ScaledAverage
:
CREATE TEMP AGGREGATE FUNCTION ScaledAverage( dividend FLOAT64, divisor FLOAT64) RETURNS FLOAT64 AS ( AVG(dividend / divisor) );
Use aggregate and non-aggregate parameters
You can create a SQL UDAF that has both aggregate and non-aggregate parameters.
UDAFs normally aggregate function parameters across all rows in a
group.
However, you can specify a function parameter as non-aggregate with the
NOT AGGREGATE
keyword.
A non-aggregate function parameter is a scalar function parameter with a constant value for all rows in a group. A valid non-aggregate function parameter must be a literal. Inside the UDAF definition, aggregate function parameters can only appear as function arguments to aggregate function calls. References to non-aggregate function parameters can appear anywhere in the UDAF definition.
For example, the following function contains an aggregate parameter that's
called dividend
, and a non-aggregate parameter called divisor
:
-- Create the function. CREATE TEMP AGGREGATE FUNCTION ScaledSum( dividend FLOAT64, divisor FLOAT64 NOT AGGREGATE) RETURNS FLOAT64 AS ( SUM(dividend) / divisor );
Use the default project in the function body
In the body of a SQL UDAF, any references to BigQuery entities, such as tables or views, must include the project ID unless the entity resides in the same project that contains the UDAF.
For example, consider the following statement:
CREATE AGGREGATE FUNCTION project1.dataset_a.ScaledAverage( dividend FLOAT64, divisor FLOAT64) RETURNS FLOAT64 AS ( ( SELECT AVG(dividend / divisor) FROM dataset_a.my_table ) );
If you run the preceding statement in the project1
project, the statement
succeeds because my_table
exists in project1
. However, if you run
the preceding statement from a different project, the statement fails.
To correct the error, include the project ID in the table reference:
CREATE AGGREGATE FUNCTION project1.dataset_a.ScaledAverage( dividend FLOAT64, divisor FLOAT64) RETURNS FLOAT64 AS ( ( SELECT AVG(dividend / divisor) FROM project1.dataset_a.my_table ) );
You can also reference an entity in a different project or dataset from the one where you create the function:
CREATE AGGREGATE FUNCTION project1.dataset_a.ScaledAverage( dividend FLOAT64, divisor FLOAT64) RETURNS FLOAT64 AS ( ( SELECT AVG(dividend / divisor) FROM project2.dataset_c.my_table ) );
Call a UDAF
This section describes the various ways that you can call a persistent or temporary UDAF after you create it in BigQuery.
Call a persistent UDAF
You can call a persistent UDAF in the same way that you call a built-in aggregate function. For more information, see Aggregate function calls. You must include the dataset in the function path.
In the following example, the query calls a persistent UDAF that's
called WeightedAverage
:
SELECT my_project.my_dataset.WeightedAverage(item, weight, 2) AS weighted_average FROM ( SELECT 1 AS item, 2.45 AS weight UNION ALL SELECT 3 AS item, 0.11 AS weight UNION ALL SELECT 5 AS item, 7.02 AS weight );
A table with the following results is produced:
/*------------------*
| weighted_average |
+------------------+
| 4.5 |
*------------------*/
Call a temporary UDAF
You can call a temporary UDAF in the same way that you call a built-in aggregate function. For more information, see Aggregate function calls.
The temporary function must be included in a multi-statement query or procedure that contains the UDAF function call.
In the following example, the query calls a temporary UDAF that's
called WeightedAverage
:
CREATE TEMP AGGREGATE FUNCTION WeightedAverage(...) -- Temporary UDAF function call SELECT WeightedAverage(item, weight, 2) AS weighted_average FROM ( SELECT 1 AS item, 2.45 AS weight UNION ALL SELECT 3 AS item, 0.11 AS weight UNION ALL SELECT 5 AS item, 7.02 AS weight );
A table with the following results is produced:
/*------------------*
| weighted_average |
+------------------+
| 4.5 |
*------------------*/
Delete a UDAF
This section describes the various ways that you can delete a persistent or temporary UDAF after you created it in BigQuery.
Delete a persistent UDAF
To delete a persistent UDAF, use the
DROP FUNCTION
statement.
You must include the dataset in the function path.
In the following example, the query deletes a persistent UDAF that's
called WeightedAverage
:
DROP FUNCTION IF EXISTS my_project.my_dataset.WeightedAverage;
Delete a temporary UDAF
To delete a temporary UDAF, use the
DROP FUNCTION
statement.
In the following example, the query deletes a temporary UDAF that's
called WeightedAverage
:
DROP FUNCTION IF EXISTS WeightedAverage;
A temporary UDAF expires as soon as the query finishes. The UDAF doesn't need to be deleted unless you want to remove it early from a multi-statement query or procedure.
List UDAFs
UDAFs are a type of routine. To list all of the routines in a dataset, see List routines.
Limitations
UDAFs have the same limitations that apply to UDFs. For details, see UDF limitations.
Only literals, query parameters, and script variables can be passed in as non-aggregate arguments for a UDAF.
Pricing
UDAFs are billed using the standard BigQuery pricing model.
Quotas and limits
UDAFs have the same quotas and limits that apply to UDFs. For information about UDF quotas, see Quotas and limits.