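Perform batch LLM inference using AI Functions

Note

Preview

This article describes how to perform batch inference using AI Functions at scale. The examples in this article are recommended for production scenarios, such as deploying batch inference pipelines as scheduled workflows and using ai_query with Databricks-hosted foundation models for Structured Streaming.

To get started with AI Functions, Databricks recommends using one of the following: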

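Requirements

A workspace in a region where Foundation Model APIs are supported.
Databricks Runtime 15.4 LTS or above is required for batch inference workloads using AI Functions.
Query permission on the Delta table in Unity Catalog that contains the data you want to use.
To use ai_query(), set pipelines.channel in the table properties to 'preview'. See Requirements for a sample query.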

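Batch LLM inference using task-specific AI functions

You can run batch inference using task-specific AI functions. See "Deploy batch inference pipelines" for guidance on how to incorporate a task-specific AI function into a pipeline.

The following is an example of using the task-specific AI function ai_translate: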

SQL
SELECT
  writer_summary,
  ai_translate(writer_summary, "cn") AS cn_translation
FROM user.batch.news_summaries
LIMIT 500;

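Batch LLM inference using `ai_query`

You can use the general-purpose AI function ai_query to perform batch inference. See which model types and associated models ai_query supports.

The examples in this section focus on the flexibility of ai_query and how to use it in batch inference pipelines and workflows.

`ai_query` and Databricks-hosted foundation models

When you use a Databricks-hosted, pre-provisioned foundation model for batch inference, Databricks configures a provisioned throughput endpoint on your behalf that scales automatically based on the workload.

To use this method for batch inference, specify the following in your request:

The pre-provisioned LLM you want to use in ai_query. Select from the supported pre-provisioned LLMs. These pre-provisioned LLMs are subject to permissive licenses and usage policies (see Applicable model developer licenses and terms).
The Unity Catalog input table and output table.
The model prompt and any model parameters.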

SQL
SELECT text, ai_query(
    "databricks-meta-llama-3-1-8b-instruct",
    "Summarize the given text comprehensively, covering key points and main ideas concisely while retaining relevant details and examples. Ensure clarity and accuracy without unnecessary repetition or omissions: " || text
) AS summary
FROM uc_catalog.schema.table;
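
The model, the prompt, and the input table map directly onto the parts of an ai_query statement. The following is a minimal Python sketch that assembles such a statement from those parts. The helper names `sql_string_literal` and `build_ai_query_sql` and their parameters are hypothetical and not part of any Databricks API. The escaping assumes Spark's default string-literal parsing, in which a backslash escapes a quote; verify this against your own configuration.

```python
def sql_string_literal(text: str) -> str:
    """Wrap text in single quotes, escaping backslashes and single quotes.

    Assumption: the SQL parser treats a backslash as an escape character.
    """
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_ai_query_sql(endpoint: str, prompt: str, input_table: str,
                       text_column: str = "text") -> str:
    """Return a SELECT statement that applies ai_query to every row of input_table."""
    return (
        f"SELECT {text_column}, ai_query(\n"
        f"    {sql_string_literal(endpoint)},\n"
        f"    {sql_string_literal(prompt)} || {text_column}\n"
        f") AS summary\n"
        f"FROM {input_table}"
    )


query = build_ai_query_sql(
    endpoint="databricks-meta-llama-3-1-8b-instruct",
    prompt="Summarize the author's main points: ",
    input_table="uc_catalog.schema.table",
)
print(query)
```

In a Databricks notebook, the resulting statement could then be passed to `spark.sql(query)`. Keeping the prompt in a Python variable makes it easier to version and reuse across pipelines.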

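`ai_query` and custom or fine-tuned foundation models

The notebook examples in this section demonstrate batch inference workloads that use custom or fine-tuned foundation models to process multiple inputs. The examples require an existing model serving endpoint that uses Foundation Model APIs provisioned throughput.

LLM batch inference using an embeddings model

The following example notebook creates a provisioned throughput endpoint and runs batch LLM inference using Python and your choice of either the GTE Large (English) or BGE Large (English) embeddings model.

LLM batch inference embeddings with a provisioned throughput endpoint notebook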

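Batch inference and structured data extraction

The following example notebook demonstrates how to perform basic structured data extraction using ai_query, transforming raw, unstructured data into organized, usable information through automated extraction techniques. The notebook also shows how to use Mosaic AI Agent Evaluation to evaluate accuracy against ground truth data.

Batch inference and structured data extraction notebook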

Batch inference using BERT for named entity recognition

The following notebook shows a traditional ML model batch inference example using BERT.

Batch inference using BERT for named entity recognition notebook

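Deploy batch inference pipelines

This section shows how you can integrate AI Functions with other Databricks data and AI products to build complete batch inference pipelines. These pipelines can perform end-to-end workflows that include ingestion, preprocessing, inference, and post-processing. Pipelines can be authored in SQL or Python and deployed as:

Lakeflow Declarative Pipelines
Scheduled workflows using Databricks workflows
Streaming inference workflows using Structured Streaming

Perform incremental batch inference on Lakeflow Declarative Pipelines

The following example performs incremental batch inference using Lakeflow Declarative Pipelines when data is continuously updated.

Step 1: Ingest raw news data from a volume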

SQL

CREATE OR REFRESH STREAMING TABLE news_raw
COMMENT "Raw news articles ingested from volume."
AS SELECT *
FROM STREAM(read_files(
  '/Volumes/databricks_news_summarization_benchmarking_data/v01/csv',
  format => 'csv',
  header => true,
  mode => 'PERMISSIVE',
  multiLine => 'true'
));

Import the packages and define the JSON schema of the LLM response as a Python variable.

Python

import dlt
from pyspark.sql.functions import expr, get_json_object, concat

# JSON response format for ai_query. "strict" belongs to the "json_schema"
# object, matching the SQL version of this pipeline.
news_extraction_schema = (
    '{"type": "json_schema", "json_schema": {"name": "news_extraction", '
    '"schema": {"type": "object", "properties": {"title": {"type": "string"}, '
    '"category": {"type": "string", "enum": ["Politics", "Sports", "Technology", '
    '"Health", "Entertainment", "Business"]}}}, "strict": true}}'
)
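
Hand-writing the schema as a concatenated string makes it easy to misplace a brace. As an alternative sketch, the same response format can be built as a Python dictionary and serialized with `json.dumps`, which always produces well-formed JSON. The names `CATEGORIES` and `news_extraction_format` are assumptions for illustration; the structure mirrors the `responseFormat` used in the SQL version of this pipeline.

```python
import json

# Allowed categories, matching the enum used in the SQL version of the pipeline.
CATEGORIES = ["Politics", "Sports", "Technology", "Health", "Entertainment", "Business"]

# Build the response format as a dictionary so that the nesting is explicit.
news_extraction_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "news_extraction",
        "schema": {
            "type": "object",
            "properties": {
                "title": {"type": "string"},
                "category": {"type": "string", "enum": CATEGORIES},
            },
        },
        "strict": True,
    },
}

# Serialize to a JSON string suitable for the responseFormat argument.
news_extraction_schema = json.dumps(news_extraction_format)

# The serialized string contains no single quotes, so it can be wrapped in
# single quotes inside a SQL expression without further escaping.
assert "'" not in news_extraction_schema
print(news_extraction_schema)
```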

Ingest the data from a Unity Catalog volume.

Python
@dlt.table(
  comment="Raw news articles ingested from volume."
)
def news_raw():
  return (
    spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "csv")
      .option("header", True)
      .option("mode", "PERMISSIVE")
      .option("multiLine", "true")
      .load("/Volumes/databricks_news_summarization_benchmarking_data/v01/csv")
  )

Step 2: Apply LLM inference to extract the title and category

SQL

CREATE OR REFRESH MATERIALIZED VIEW news_categorized
COMMENT "Extract category and title from news articles using LLM inference."
AS
SELECT
  inputs,
  ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Extract the category of the following news article: " || inputs,
    responseFormat => '{
      "type": "json_schema",
      "json_schema": {
        "name": "news_extraction",
        "schema": {
          "type": "object",
          "properties": {
            "title": { "type": "string" },
            "category": {
              "type": "string",
              "enum": ["Politics", "Sports", "Technology", "Health", "Entertainment", "Business"]
            }
          }
        },
        "strict": true
      }
    }'
  ) AS meta_data
FROM news_raw
LIMIT 2;

Python
@dlt.table(
  comment="Extract category and title from news articles using LLM inference."
)
def news_categorized():
  # Limit the number of rows to 2 as in the SQL version
  df_raw = spark.read.table("news_raw").limit(2)
  # Inject the JSON schema variable into the ai_query call using an f-string.
  return df_raw.withColumn(
    "meta_data",
    expr(
      f"ai_query('databricks-meta-llama-3-3-70b-instruct', "
      f"concat('Extract the category of the following news article: ', inputs), "
      f"responseFormat => '{news_extraction_schema}')"
    )
  )

Step 3: Validate the LLM inference output before summarization

SQL
CREATE OR REFRESH MATERIALIZED VIEW news_validated (
  CONSTRAINT valid_title EXPECT (size(split(get_json_object(meta_data, '$.title'), ' ')) >= 3),
  CONSTRAINT valid_category EXPECT (get_json_object(meta_data, '$.category') IN ('Politics', 'Sports', 'Technology', 'Health', 'Entertainment', 'Business'))
)
COMMENT "Validated news articles ensuring the title has at least 3 words and the category is valid."
AS
SELECT *
FROM news_categorized;

Python
@dlt.table(
  comment="Validated news articles ensuring the title has at least 3 words and the category is valid."
)
@dlt.expect("valid_title", "size(split(get_json_object(meta_data, '$.title'), ' ')) >= 3")
@dlt.expect_or_fail("valid_category", "get_json_object(meta_data, '$.category') IN ('Politics', 'Sports', 'Technology', 'Health', 'Entertainment', 'Business')")
def news_validated():
  return spark.read.table("news_categorized")
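
To make the two expectations concrete, here is a minimal plain-Python sketch of roughly the same checks, which can be handy for unit-testing the rules outside a pipeline. The function name `check_meta_data` and the sample strings are hypothetical. In the pipeline itself, the checks are performed by the `valid_title` and `valid_category` expectations, and SQL NULL handling may differ from this approximation.

```python
import json

VALID_CATEGORIES = {"Politics", "Sports", "Technology", "Health", "Entertainment", "Business"}


def check_meta_data(meta_data: str) -> dict:
    """Evaluate the valid_title and valid_category rules on one JSON response string."""
    try:
        parsed = json.loads(meta_data)
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    title = parsed.get("title")
    category = parsed.get("category")
    return {
        # Approximates size(split(title, ' ')) >= 3: at least three space-separated tokens.
        "valid_title": isinstance(title, str) and len(title.split(" ")) >= 3,
        # The category must be one of the values listed in the schema enum.
        "valid_category": isinstance(category, str) and category in VALID_CATEGORIES,
    }


# Hypothetical sample responses, for illustration only.
good = '{"title": "Senate passes budget bill", "category": "Politics"}'
bad = '{"title": "Budget", "category": "Weather"}'

print(check_meta_data(good))
print(check_meta_data(bad))
```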

Step 4: Summarize news articles from the validated data

SQL
CREATE OR REFRESH MATERIALIZED VIEW news_summarized
COMMENT "Summarized political news articles after validation."
AS
SELECT
  get_json_object(meta_data, '$.category') as category,
  get_json_object(meta_data, '$.title') as title,
  ai_query(
    "databricks-meta-llama-3-3-70b-instruct",
    "Summarize the following political news article in 2-3 sentences: " || inputs
  ) AS summary
FROM news_validated;

Python

@dlt.table(
  comment="Summarized political news articles after validation."
)
def news_summarized():
  df = spark.read.table("news_validated")
  return df.select(
    get_json_object("meta_data", "$.category").alias("category"),
    get_json_object("meta_data", "$.title").alias("title"),
    expr(
      "ai_query('databricks-meta-llama-3-3-70b-instruct', "
      "concat('Summarize the following political news article in 2-3 sentences: ', inputs))"
    ).alias("summary")
  )

Automate batch inference jobs using Databricks workflows

Schedule batch inference jobs and automate AI pipelines.

SQL
SELECT
   *,
   ai_query('databricks-meta-llama-3-3-70b-instruct', request => concat("You are an opinion mining service. Given a piece of text, output an array of json results that extracts key user opinions, a classification, and a Positive, Negative, Neutral, or Mixed sentiment about that subject.


AVAILABLE CLASSIFICATIONS
Quality, Service, Design, Safety, Efficiency, Usability, Price


Examples below:


DOCUMENT
I got soup. It really did take only 20 minutes to make some pretty good soup. The noises it makes when it's blending are somewhat terrifying, but it gives a little beep to warn you before it does that. It made three or four large servings of soup. It's a single layer of steel, so the outside gets pretty hot. It can be hard to unplug the lid without knocking the blender against the side, which is not a nice sound. The soup was good and the recipes it comes with look delicious, but I'm not sure I'll use it often. 20 minutes of scary noises from the kitchen when I already need comfort food is not ideal for me. But if you aren't sensitive to loud sounds it does exactly what it says it does..


RESULT
[
 {'Classification': 'Efficiency', 'Comment': 'only 20 minutes','Sentiment': 'Positive'},
 {'Classification': 'Quality','Comment': 'pretty good soup','Sentiment': 'Positive'},
 {'Classification': 'Usability', 'Comment': 'noises it makes when it's blending are somewhat terrifying', 'Sentiment': 'Negative'},
 {'Classification': 'Safety','Comment': 'outside gets pretty hot','Sentiment': 'Negative'},
 {'Classification': 'Design','Comment': 'Hard to unplug the lid without knocking the blender against the side, which is not a nice sound', 'Sentiment': 'Negative'}
]


DOCUMENT
", REVIEW_TEXT, '\n\nRESULT\n')) as result
FROM catalog.schema.product_reviews
LIMIT 10

Python

import json
from pyspark.sql.functions import expr

# Define the opinion mining prompt as a multi-line string.
opinion_prompt = """You are an opinion mining service. Given a piece of text, output an array of json results that extracts key user opinions, a classification, and a Positive, Negative, Neutral, or Mixed sentiment about that subject.

AVAILABLE CLASSIFICATIONS
Quality, Service, Design, Safety, Efficiency, Usability, Price

Examples below:

DOCUMENT
I got soup. It really did take only 20 minutes to make some pretty good soup. The noises it makes when it's blending are somewhat terrifying, but it gives a little beep to warn you before it does that. It made three or four large servings of soup. It's a single layer of steel, so the outside gets pretty hot. It can be hard to unplug the lid without knocking the blender against the side, which is not a nice sound. The soup was good and the recipes it comes with look delicious, but I'm not sure I'll use it often. 20 minutes of scary noises from the kitchen when I already need comfort food is not ideal for me. But if you aren't sensitive to loud sounds it does exactly what it says it does.

RESULT
[
 {'Classification': 'Efficiency', 'Comment': 'only 20 minutes','Sentiment': 'Positive'},
 {'Classification': 'Quality','Comment': 'pretty good soup','Sentiment': 'Positive'},
 {'Classification': 'Usability', 'Comment': 'noises it makes when it's blending are somewhat terrifying', 'Sentiment': 'Negative'},
 {'Classification': 'Safety','Comment': 'outside gets pretty hot','Sentiment': 'Negative'},
 {'Classification': 'Design','Comment': 'Hard to unplug the lid without knocking the blender against the side, which is not a nice sound', 'Sentiment': 'Negative'}
]

DOCUMENT
"""

# Escape the prompt so it can be safely embedded in the SQL expression.
escaped_prompt = json.dumps(opinion_prompt)

# Read the source table and limit to 10 rows.
df = spark.table("catalog.schema.product_reviews").limit(10)

# Apply the LLM inference to each row, concatenating the prompt, the review text, and the tail string.
result_df = df.withColumn(
    "result",
    expr(f"ai_query('databricks-meta-llama-3-3-70b-instruct', request => concat({escaped_prompt}, REVIEW_TEXT, '\\n\\nRESULT\\n'))")
)

# Display the result DataFrame.
display(result_df)
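
The escaping step in this example relies on `json.dumps` turning a multi-line prompt into a single double-quoted literal in which newlines and embedded double quotes are backslash-escaped. The following minimal sketch shows that behavior on a short stand-in prompt (the prompt text here is hypothetical). It assumes that the SQL parser interprets backslash escapes inside double-quoted string literals the same way JSON does, which is expected under Spark's default settings but is worth verifying in your environment.

```python
import json

# A short stand-in for the opinion mining prompt, containing newlines,
# an apostrophe, and double quotes.
sample_prompt = 'Classify the text.\nIt\'s "good" or "bad".\n\nDOCUMENT\n'

# json.dumps returns one double-quoted literal with special characters escaped.
escaped_prompt = json.dumps(sample_prompt)
print(escaped_prompt)

# The literal is a single line, so it can be placed inside a SQL expression string.
assert "\n" not in escaped_prompt

# Decoding the literal recovers the original prompt exactly.
assert json.loads(escaped_prompt) == sample_prompt

# Assemble the expression the same way as in the example above.
expression = (
    "ai_query('databricks-meta-llama-3-3-70b-instruct', "
    f"request => concat({escaped_prompt}, REVIEW_TEXT, '\\n\\nRESULT\\n'))"
)
print(expression)
```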

AI Functions using Structured Streaming

Apply AI inference in near real-time or micro-batch scenarios using ai_query and Structured Streaming.

Step 1: Read your static Delta table

Read your static Delta table as if it were a stream.

Python

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()

# Spark processes all existing rows exactly once in the first micro-batch.
df = spark.table("enterprise.docs")  # Replace with your table name containing enterprise documents
df.repartition(50).write.format("delta").mode("overwrite").saveAsTable("enterprise.docs")
df_stream = spark.readStream.format("delta").option("maxBytesPerTrigger", "50K").table("enterprise.docs")

# Define the prompt outside the SQL expression.
prompt = (
    "You are provided with an enterprise document. Summarize the key points in a concise paragraph. "
    "Do not include extra commentary or suggestions. Document: "
)

Step 2: Apply `ai_query`

Spark processes the static data only once, unless new rows arrive in the table.

Python

df_transformed = df_stream.select(
    "document_text",
    F.expr(f"""
      ai_query(
        'databricks-meta-llama-3-1-8b-instruct',
        CONCAT('{prompt}', document_text)
      )
    """).alias("summary")
)

Step 3: Write the summarized output

Write the summarized output to another Delta table.

Python

# Time-based triggers apply, but only the first trigger processes all existing static data.
query = df_transformed.writeStream \
    .format("delta") \
    .option("checkpointLocation", "/tmp/checkpoints/_docs_summary") \
    .outputMode("append") \
    .toTable("enterprise.docs_summary")

query.awaitTermination()

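View costs for batch inference workloads

The following examples show how to filter batch inference workloads based on jobs, compute, SQL warehouses, and Lakeflow Declarative Pipelines.

For general examples of how to view costs for batch inference workloads that use AI Functions, see Monitor model serving costs.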

The following query shows which jobs are being used for batch inference, using the system.workflow.jobs system table. See Monitor job costs & performance with system tables.

SQL

SELECT *
FROM system.billing.usage u
  JOIN system.workflow.jobs x
    ON u.workspace_id = x.workspace_id
    AND u.usage_metadata.job_id = x.job_id
  WHERE u.usage_metadata.workspace_id = <workspace_id>
    AND u.billing_origin_product = "MODEL_SERVING"
    AND u.product_features.model_serving.offering_type = "BATCH_INFERENCE";

The following query shows which clusters are being used for batch inference, using the system.compute.clusters system table.

SQL
SELECT *
FROM system.billing.usage u
  JOIN system.compute.clusters x
    ON u.workspace_id = x.workspace_id
    AND u.usage_metadata.cluster_id = x.cluster_id
  WHERE u.usage_metadata.workspace_id = <workspace_id>
    AND u.billing_origin_product = "MODEL_SERVING"
    AND u.product_features.model_serving.offering_type = "BATCH_INFERENCE";

The following query shows which Lakeflow Declarative Pipelines are being used for batch inference, using the system.lakeflow.pipelines system table.

SQL
SELECT *
FROM system.billing.usage u
  JOIN system.lakeflow.pipelines x
    ON u.workspace_id = x.workspace_id
    AND u.usage_metadata.dlt_pipeline_id = x.pipeline_id
  WHERE u.usage_metadata.workspace_id = <workspace_id>
    AND u.billing_origin_product = "MODEL_SERVING"
    AND u.product_features.model_serving.offering_type = "BATCH_INFERENCE";

The following query shows which SQL warehouses are being used for batch inference, using the system.compute.warehouses system table.

SQL
SELECT *
FROM system.billing.usage u
  JOIN system.compute.warehouses x
    ON u.workspace_id = x.workspace_id
    AND u.usage_metadata.warehouse_id = x.warehouse_id
  WHERE u.usage_metadata.workspace_id = <workspace_id>
    AND u.billing_origin_product = "MODEL_SERVING"
    AND u.product_features.model_serving.offering_type = "BATCH_INFERENCE";
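
The four cost queries differ only in the system table that is joined and in the `usage_metadata` field used as the join key. As a sketch, a small Python helper can generate them from a lookup table. The names `COST_SOURCES` and `batch_inference_cost_query` and the dictionary keys are hypothetical; the table names, join keys, and filters are copied from the queries in this section.

```python
# Maps a workload type to (system table, usage_metadata field, key column in that table).
COST_SOURCES = {
    "jobs": ("system.workflow.jobs", "job_id", "job_id"),
    "compute": ("system.compute.clusters", "cluster_id", "cluster_id"),
    "pipelines": ("system.lakeflow.pipelines", "dlt_pipeline_id", "pipeline_id"),
    "warehouses": ("system.compute.warehouses", "warehouse_id", "warehouse_id"),
}


def batch_inference_cost_query(workload: str, workspace_id: str) -> str:
    """Return the billing query for one workload type, filtered to batch inference."""
    if not workspace_id.isdigit():
        # Reject anything that is not a plain numeric ID before inlining it into SQL.
        raise ValueError("workspace_id must contain digits only")
    table, usage_field, table_key = COST_SOURCES[workload]
    return (
        "SELECT *\n"
        "FROM system.billing.usage u\n"
        f"  JOIN {table} x\n"
        "    ON u.workspace_id = x.workspace_id\n"
        f"    AND u.usage_metadata.{usage_field} = x.{table_key}\n"
        f"  WHERE u.usage_metadata.workspace_id = {workspace_id}\n"
        '    AND u.billing_origin_product = "MODEL_SERVING"\n'
        '    AND u.product_features.model_serving.offering_type = "BATCH_INFERENCE"'
    )


# "1234567890" is a placeholder workspace ID.
print(batch_inference_cost_query("pipelines", "1234567890"))
```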
