The ML.DISTANCE function

This document describes the ML.DISTANCE scalar function, which lets you compute the distance between two vectors.

Syntax

ML.DISTANCE(vector1, vector2 [, type])

Arguments

ML.DISTANCE has the following arguments:

  • vector1: an ARRAY value that represents the first vector, in one of the following forms:

    • ARRAY<Numerical type>
    • ARRAY<STRUCT<STRING, Numerical type>>
    • ARRAY<STRUCT<INT64, Numerical type>>

    where Numerical type is BIGNUMERIC, FLOAT64, INT64, or NUMERIC. For example, ARRAY<STRUCT<INT64, BIGNUMERIC>>.

    When a vector is expressed as ARRAY<Numerical type>, each element of the array denotes one dimension of the vector. An example of a four-dimensional vector is [0.0, 1.0, 1.0, 0.0].

    When a vector is expressed as ARRAY<STRUCT<STRING, Numerical type>> or ARRAY<STRUCT<INT64, Numerical type>>, each STRUCT array item denotes one dimension of the vector. An example of a three-dimensional vector is [("a", 0.0), ("b", 1.0), ("c", 1.0)].

    The initial INT64 or STRING value in the STRUCT is used as an identifier to match the STRUCT values in vector2. The ordering of data in the array doesn't matter; the values are matched by the identifier rather than by their position in the array. If either vector has any STRUCT values with duplicate identifiers, running this function returns an error.

  • vector2: an ARRAY value that represents the second vector.

    vector2 must have the same type as vector1.

    For example, if vector1 is an ARRAY<STRUCT<STRING, FLOAT64>> column with three elements, like [("a", 0.0), ("b", 1.0), ("c", 1.0)], then vector2 must also be an ARRAY<STRUCT<STRING, FLOAT64>> column.

    When vector1 and vector2 are ARRAY<Numerical type> columns, they must have the same array length.

  • type: a STRING value that specifies the type of distance to calculate. Valid values are EUCLIDEAN, MANHATTAN, and COSINE. If this argument isn't specified, the default value is EUCLIDEAN.
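As a quick illustration of the arguments above, the following sketch calls ML.DISTANCE on literal arrays instead of table columns. The input values and column aliases are made up for this example. The expected values in the comments follow from the standard definitions of Euclidean and Manhattan distance. The COSINE result is left unstated because this document doesn't spell out the exact formula.

```sql
-- Dense form: ARRAY<FLOAT64> literals of equal length.
SELECT
  -- type omitted, so EUCLIDEAN is used: sqrt(3^2 + 4^2) = 5.0
  ML.DISTANCE([3.0, 0.0], [0.0, 4.0]) AS euclidean_default,
  -- |3 - 0| + |0 - 4| = 7.0
  ML.DISTANCE([3.0, 0.0], [0.0, 4.0], 'MANHATTAN') AS manhattan,
  ML.DISTANCE([3.0, 0.0], [0.0, 4.0], 'COSINE') AS cosine;

-- Keyed form: ARRAY<STRUCT<STRING, FLOAT64>> literals.
-- Both arguments hold the same identifier/value pairs in a different order,
-- so matching by identifier should give a distance of 0.0.
SELECT
  ML.DISTANCE(
    [('a', 0.0), ('b', 1.0), ('c', 1.0)],
    [('c', 1.0), ('a', 0.0), ('b', 1.0)]) AS reordered_distance;
```

The second query relies on the identifier matching described for vector1: positions in the array are ignored, so reordering the same pairs doesn't change the result.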

Output

ML.DISTANCE returns a FLOAT64 value that represents the distance between the vectors. The function returns NULL if either vector1 or vector2 is NULL.
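The NULL behavior can be checked with a one-line query. This is a minimal sketch: the literal vector and the alias null_distance are illustrative, and the CAST is there only to give the NULL argument an array type.

```sql
-- A NULL vector argument makes the whole result NULL.
SELECT
  ML.DISTANCE(CAST(NULL AS ARRAY<FLOAT64>), [1.0, 2.0]) AS null_distance;
```

When NULL vectors are possible in a table, one option is to filter them out first, for example with a WHERE clause that checks both columns with IS NOT NULL.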

Example

Get the Euclidean distance between two vectors of ARRAY<FLOAT64> values:

  1. Create the table t1:

    CREATE TABLE mydataset.t1
    (
      v1 ARRAY<FLOAT64>,
      v2 ARRAY<FLOAT64>
    );
    
  2. Populate t1:

    INSERT mydataset.t1 (v1,v2)
    VALUES ([4.1,0.5,1.0], [3.0,0.0,2.5])
    
  3. Calculate the Euclidean distance between v1 and v2:

    SELECT v1, v2, ML.DISTANCE(v1, v2, 'EUCLIDEAN') AS output FROM mydataset.t1
    

    This query produces the following output:

    +---------------+---------------+-------------------+
    | v1            | v2            | output            |
    +---------------+---------------+-------------------+
    | [4.1,0.5,1.0] | [3.0,0.0,2.5] | 1.926136028425822 |
    +---------------+---------------+-------------------+
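To see where this value comes from, the Euclidean distance can also be computed by hand with standard GoogleSQL array operations. The query below is a cross-check sketch, not a description of how ML.DISTANCE is implemented. The alias manual_distance is made up for this example, and the query assumes v1 and v2 have the same length.

```sql
-- Element-wise differences for the sample row: 1.1, 0.5, -1.5
-- Sum of squares: 1.21 + 0.25 + 2.25 = 3.71, and sqrt(3.71) is about 1.926136
SELECT
  v1,
  v2,
  (
    -- Pair each element of v1 with the element of v2 at the same offset.
    SELECT SQRT(SUM(POW(a - v2[OFFSET(i)], 2)))
    FROM UNNEST(v1) AS a WITH OFFSET AS i
  ) AS manual_distance
FROM mydataset.t1;
```

The manual result should agree with the ML.DISTANCE output to within floating-point rounding. The last few digits may differ.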
    
