This feature is in Public Preview.
The AI function ai_generate_text() is deprecated. Databricks recommends using ai_query with external models.
This article describes what to consider and what to set up before you start using the ai_generate_text() function, specifically how to retrieve authentication credentials and store them securely. It also covers functionality limitations and cost and performance considerations.
The ai_generate_text() function is a built-in Databricks SQL function that lets you access large language models (LLMs) directly from SQL. It currently supports OpenAI and Azure OpenAI models and enables customers to use them as building blocks in data pipelines and machine learning workloads. For syntax and design patterns, see the language manual content for the ai_generate_text function.
Possible use cases for ai_generate_text() include translation, summarization, recommended actions, topic or theme identification, and much more.
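As an illustration of one of these use cases, a translation query might look like the following sketch. The model identifier openai/gpt-3.5-turbo, the table and column names, and the secret scope and key names are assumptions for illustration; confirm the exact syntax in the language manual content for ai_generate_text.

```sql
-- Sketch: translate a hypothetical "products.product_description" column.
-- The secret scope "openai" and key "demo-key" are assumed to exist
-- (see the secret setup steps in this article).
SELECT
  product_description,
  ai_generate_text(
    CONCAT('Translate the following text to French: ', product_description),
    'openai/gpt-3.5-turbo',
    'apiKey', secret('openai', 'demo-key')
  ) AS translated_description
FROM products;
```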
The following are a few advantages of using LLMs on Databricks:
Unified access and management layer across open source and proprietary LLMs.
Serverless, auto-scaling, data-integrated LLM infrastructure.
Point-and-click simplicity to customize LLMs to your business requirements and use cases.
For advanced users, tools for rapid development and customization of open source LLMs.
Databricks SQL Pro or Serverless.
Understand that enabling and using this functionality sends data out of your SQL environment to third-party LLM providers: OpenAI and Azure OpenAI.
You have access to Azure OpenAI or OpenAI.
A GPT 3.5 Turbo model deployed.
To use the ai_generate_text() function, you need access to Azure OpenAI or OpenAI.
Retrieve authentication details for Azure OpenAI with the following steps. Your authentication details populate the resourceName and deploymentName parameters of the ai_generate_text() function.
Navigate to Cognitive Services on the Azure Portal and select Azure OpenAI.
Select the resource you want to use.
Select Keys and Endpoint under Resource Management.
Copy your key and the name of your resource.
Select Model Deployments under Resource Management.
Copy your model deployment name.
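The values you copy in these steps map onto an Azure OpenAI call roughly as follows. This is a sketch: the parameter names (apiKey, resourceName, deploymentName, apiVersion) and the apiVersion value are assumptions based on this article's naming, and the placeholders are illustrative; verify the exact parameters in the language manual content for ai_generate_text.

```sql
-- Sketch: plugging the copied Azure OpenAI details into a call.
SELECT ai_generate_text(
  'Suggest a follow-up action for this support ticket: customer cannot log in.',
  'azure_openai/gpt-35-turbo',
  'apiKey', secret('openai', 'demo-key'),        -- key from Keys and Endpoint, stored as a secret
  'resourceName', '<your-resource-name>',        -- resource name copied in the steps above
  'deploymentName', '<your-deployment-name>',    -- model deployment name copied above
  'apiVersion', '2023-03-15-preview'             -- assumed API version
) AS suggested_action;
```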
For OpenAI, you can navigate to OpenAI and select API keys to create your key.
You cannot copy keys for an existing key configuration.
You can either:
Retrieve the key from the person, also referred to as the principal, who created the configuration.
Create a new key and copy the key provided upon successful creation.
Do not include your token in plain text in your notebook, code, or git repo.
If you don’t already have a secret scope to keep your OpenAI keys in, create one:
databricks secrets create-scope openai
You need to give READ permissions or higher to users or groups that are allowed to connect to OpenAI. Databricks recommends creating a group openai-users and adding permitted users to that group.
databricks secrets put-acl openai openai-users READ
Create a secret for your API access token. Databricks recommends the following format:
databricks secrets put-secret openai demo-key --string-value yourkey123
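With the secret in place, you can reference it from SQL with the secret() function rather than pasting the key into your query, consistent with the guidance above about not including your token in plain text. The scope and key names below match the example commands in this section; the model identifier and column names are assumptions for illustration.

```sql
-- Sketch: topic identification on a hypothetical "comments.comment_text" column,
-- reading the API key from the "openai" secret scope created above.
SELECT
  comment_text,
  ai_generate_text(
    CONCAT('Identify the main topic of the following comment: ', comment_text),
    'openai/gpt-3.5-turbo',
    'apiKey', secret('openai', 'demo-key')
  ) AS topic
FROM comments;
```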
ai_generate_text() is not supported in interactive or jobs clusters.
The only models supported in the preview are GPT-3.5 Turbo models from OpenAI and Azure OpenAI. The token limit for azure_openai/gpt-35-turbo is 4096 tokens.
OpenAI and Azure OpenAI Services require subscriptions and charge separately from Databricks.
Within a given query, calls to the LLM APIs are made sequentially for the column(s) on which the functions are called.
Compared to most SQL functions, queries using ai_generate_text() tend to run slower.
The response time of a query that invokes AI Functions depends on both the task specified in the prompt and the number of tokens provided and requested.
Azure OpenAI Service is only available in a small number of Azure regions at the time of this preview.
See the language manual documentation for syntax and design patterns for the ai_generate_text function.
See Analyze customer reviews with ai_generate_text() and OpenAI for an example of how to use ai_generate_text() in a business scenario.