bitmap_construct_agg aggregate function

Applies to: check marked yes Databricks SQL check marked yes Databricks Runtime 13.3 LTS and above

Returns the bitwise OR of all bit positions values in the group. between 0 and 32767 in a group as a BINARY.

The function is commonly used to count the number of a dense set of distinct integral numbers in combination with the bitmap_bucket_number(), bitmap_count(), and bitmap_bit_position() functions.

To aggregate bitmaps in form of integral numerics use the bit_or() aggregate function.

Syntax

bitmap_construct_agg(expr) [FILTER ( WHERE cond ) ]

This function can also be invoked as a window function using the OVER clause.

Arguments

  • expr: An integral numeric expression between 0 and 32767 representing a bit position in a BINARY string.

  • cond: An optional boolean expression filtering the rows used for aggregation.

Returns

A BINARY of length 4096.

Examples

> SELECT hex(trim(TRAILING X'00' FROM bitmap_construct_agg(val))) FROM VALUES(0) AS T(val);
 01

> SELECT hex(trim(TRAILING X'00' FROM bitmap_construct_agg(val))) FROM VALUES(15) AS T(val);
 0080

> SELECT hex(trim(TRAILING X'00' FROM bitmap_construct_agg(val)))
   FROM VALUES(1), (3), (7), (15), (31) AS T(val);
 8A800080

-- Count the number of distinct values
> SELECT sum(num_distinct) AS num_distinct
    FROM (SELECT bitmap_bucket_number(val),
                 bitmap_count(bitmap_construct_agg(bitmap_bit_position(val)))
            FROM VALUES(1), (2), (1), (-1), (5), (0), (5) AS t(val)
            GROUP BY ALL) AS distinct_vals_by_bucket(bucket, num_distinct)
  5