`ai_summarize` function

Applies to: Databricks SQL and Databricks Runtime

Preview

This functionality is in Public Preview and HIPAA compliant.

During the preview:

The underlying language model can handle several languages, but this AI Function is tuned for English.
See Features with limited regional availability for AI Functions region availability.

The ai_summarize() function lets you use SQL to invoke a state-of-the-art generative AI model that summarizes a given text. This function uses a chat model serving endpoint made available by Databricks Foundation Model APIs.

Requirements

Important

The underlying models that might be used at this time are licensed under the Apache 2.0 License, Copyright © The Apache Software Foundation or the LLAMA 3.3 Community License Copyright © Meta Platforms, Inc. All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.

Databricks recommends reviewing these licenses to ensure compliance with any applicable terms. If models emerge in the future that perform better according to Databricks's internal benchmarks, Databricks might change the model (and the list of applicable licenses provided on this page).

This function is only available on workspaces in regions that support AI Functions optimized for batch inference.
This function is not available on Databricks SQL Classic.
Check the Databricks SQL pricing page.
Batch inference workloads require Databricks Runtime 15.4 ML LTS for improved performance.

Syntax

ai_summarize(content[, max_words])

Arguments

content: A STRING expression, the text to be summarized.
max_words: An optional non-negative integral numeric expression representing the best-effort target number of words in the returned summary text. The default value is 50. If set to 0, there is no word limit.
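
As a rough illustration of the two arguments, the sketch below summarizes a text column once with an explicit word target and once with the limit disabled. The table name product_reviews and the columns review_id and review_text are hypothetical and not part of this page; substitute your own table.

```sql
-- Hypothetical table: product_reviews(review_id, review_text)
SELECT
  review_id,
  -- Best-effort target of about 30 words
  ai_summarize(review_text, 30) AS short_summary,
  -- Setting max_words to 0 removes the word limit
  ai_summarize(review_text, 0) AS unbounded_summary
FROM product_reviews
LIMIT 10;
```

Because max_words is a best-effort target, a returned summary can be somewhat longer or shorter than the requested number of words.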

Returns

A STRING.

If content is NULL, the result is NULL.
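
A minimal sketch of the NULL behavior described above, using an explicit cast so the argument is typed as STRING:

```sql
-- content is NULL, so the result is NULL
SELECT ai_summarize(CAST(NULL AS STRING)) AS summary;
```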

Examples

SQL
> SELECT ai_summarize(
    'Apache Spark is a unified analytics engine for large-scale data processing. ' ||
    'It provides high-level APIs in Java, Scala, Python and R, and an optimized ' ||
    'engine that supports general execution graphs. It also supports a rich set ' ||
    'of higher-level tools including Spark SQL for SQL and structured data ' ||
    'processing, pandas API on Spark for pandas workloads, MLlib for machine ' ||
    'learning, GraphX for graph processing, and Structured Streaming for incremental ' ||
    'computation and stream processing.',
    20
  );
 "Apache Spark is a unified, multi-language analytics engine for large-scale data processing
 with additional tools for SQL, machine learning, graph processing, and stream computing."
