ai_extract

Extracts structured data from a column of document content using a large language model (LLM).

For the corresponding Databricks SQL function, see ai_extract function.

Syntax

Python
from pyspark.sql import functions as dbf

dbf.ai_extract(col=<col>, schema=<schema>, options=<options>)

Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| col | pyspark.sql.Column or str | A column containing the document content to extract from. |
| schema | dict or list | A Python dict (field name to {"type": ..., "description": ...}) or list of field-name strings. Serialized to a JSON literal automatically. |
| options | dict, optional | A dictionary of options to control extraction behavior. |
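
The two schema shapes described above can be sketched in plain Python. This is only an illustration: the helper `schema_field_names` is hypothetical and not part of the API, and `json.dumps` is used here just to show what a JSON literal for each shape could look like. The exact serialization that ai_extract performs internally is an assumption.

```python
import json

# Dict form: each field name maps to a spec with "type" and "description".
schema_dict = {
    "name": {"type": "string", "description": "Name"},
}

# List form: field names only.
schema_list = ["name", "age"]


def schema_field_names(schema):
    """Hypothetical helper: return the field names of either schema shape."""
    if isinstance(schema, dict):
        # Iterating a dict yields its keys, i.e. the field names.
        return list(schema)
    if isinstance(schema, list) and all(isinstance(f, str) for f in schema):
        return list(schema)
    raise TypeError("schema must be a dict or a list of field-name strings")


# Illustrative JSON literals (assumed format, shown for orientation only).
dict_literal = json.dumps(schema_dict)
list_literal = json.dumps(schema_list)

print(schema_field_names(schema_dict))  # ['name']
print(schema_field_names(schema_list))  # ['name', 'age']
print(dict_literal)  # {"name": {"type": "string", "description": "Name"}}
print(list_literal)  # ["name", "age"]
```

The dict form is the one to reach for when a field needs a type or a description to guide extraction. The list form is shorter when the field names alone are descriptive enough.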

Returns

pyspark.sql.Column: A new column of VariantType containing the extracted fields.

Examples

Python
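from pyspark.sql import functions as dbf  # same import as in Syntax
# df is an existing DataFrame whose "text" column holds the document content
df.select(dbf.ai_extract("text", {"name": {"type": "string", "description": "Name"}}))
df.select(dbf.ai_extract("text", ["name", "age"]))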