What is Databricks Assistant?

Preview

This feature is currently in Public Preview. Usage of the feature during the preview is free. Final pricing will be established and communicated prior to general availability (GA).

Databricks Assistant works as an AI-based pair programmer to make you more efficient as you create notebooks, queries, and files. It helps you rapidly answer questions by generating, optimizing, completing, explaining, and fixing code and queries.

This page provides general information about the Assistant in the form of frequently asked questions. For questions about privacy and security, see Privacy and security.

Enable or disable Databricks Assistant

Databricks Assistant is enabled by default.

To enable or disable Databricks Assistant for all workspaces in an account, follow these steps:

  1. As an account admin, log in to the account console.

  2. Click the Settings icon.

  3. Click the Feature enablement tab.

  4. In the Partner-powered AI assistive features section, select Enabled or Disabled and then click Save. You can prevent workspace setting overrides for this feature by setting the Enforce toggle to on.

If the account setting permits workspace-level overrides, workspace admins can enable or disable the Assistant for their workspace. To override the default account setting, follow these steps:

  1. Go to the workspace admin settings page.

  2. Click the Advanced tab.

  3. Use the Partner-powered AI assistive features drop-down menu to make your selection.

  4. Click Save.

Use Databricks Assistant for coding suggestions and help

To access Databricks Assistant, click the Databricks Assistant icon in the left sidebar of the notebook, the file editor, the SQL Editor, or the Lakeview Data tab.

Databricks assistant icon location

The Assistant panel opens on the left side of the screen.

Databricks assistant panel

Capabilities of Databricks Assistant include the following:

  • Generate: Use natural language to generate a SQL query.

  • Explain: Highlight a query or a block of code and have Databricks Assistant walk through the logic in clear, concise English.

  • Fix: Explain and fix syntax and runtime errors with a single click.

  • Transform and optimize: Convert Pandas code to PySpark for faster execution.
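
As a sketch of the Generate capability, suppose you have a users table with userID and State columns (the column names are illustrative, borrowed from the tips later on this page, and the exact query the Assistant produces may differ). A prompt like "generate a list of users who live in Washington" might yield a query of the shape below, shown here against an in-memory SQLite database so the example is self-contained:

```python
import sqlite3

# Hypothetical schema mirroring the example in the text: a users table
# with userID and State columns. The data is made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (userID INTEGER, State TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [(1, "Washington"), (2, "Oregon"), (3, "Washington")],
)

# The kind of query the Assistant might generate from the prompt
# "generate a list of users who live in Washington":
rows = conn.execute(
    "SELECT userID FROM users WHERE State = 'Washington'"
).fetchall()
print([r[0] for r in rows])  # [1, 3]
```

In a Databricks workspace the same query would run against a Unity Catalog table rather than SQLite, but the generated SQL follows the same shape.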

Any code generated by Databricks Assistant is intended for execution within a Databricks compute environment. It is optimized to create code in Databricks-supported programming languages, frameworks, and dialects, and is not intended as a general-purpose programming assistant. The Assistant often uses information from Databricks knowledge bases, such as documentation, to better answer user queries. It performs best when the question can be answered with knowledge from Databricks documentation, Unity Catalog, and user code in the workspace.

Always review any code generated by the Assistant before executing it, as the Assistant can sometimes make mistakes.

Create visualizations using the Databricks Assistant for Lakeview

You can use Databricks Assistant when drafting Lakeview dashboards. As you create visualizations on an existing Lakeview dataset, prompt the Assistant with questions to receive responses in the form of generated charts. To use the Assistant in Lakeview, first create one or more datasets, then add a visualization widget to the Canvas. The visualization widget includes a prompt to describe your new chart. Type a description of the chart you want to see, and the Assistant generates it. You can approve or reject the chart, or modify the description to generate something new. For details and examples of using the Assistant with Lakeview dashboards, see Create visualizations with Databricks Assistant for Lakeview.

Services used by Databricks Assistant

Databricks Assistant may use third-party services to provide responses, including Azure OpenAI operated by Microsoft. These services are subject to their respective data management policies. Data sent to these services is not used for any model training. For Azure OpenAI, Databricks has opted out of Abuse Monitoring, so no prompts or responses are stored by Azure OpenAI. For details, see Azure data management policy.

Tips for improving the accuracy of returned results

  • Be as specific as possible. Specify tables and examples of what the data looks like.

  • Databricks Assistant knows about your table and column schemas and metadata. This allows you to use natural language to generate fairly accurate queries. For example, if your table has columns userID and State, you can ask Databricks Assistant to generate a list of users who live in Washington.

  • Databricks Assistant has access only to table and column metadata, not to row-level data, so it may not write queries correctly if the actual data has an unusual shape. For example, if you have a column Price and each value has a country denomination appended (for example, $10.99 USD, $5.99 CAD), the generated query may have trouble summing that column because it is not a DECIMAL. In that case, give Databricks Assistant specific instructions for parsing the column. For example: “Sum the total revenue from crackers. Price is a string column that has a country denomination appended to each currency value, like ‘$10.99 USD’”.
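
Continuing the Price example above, the parsing you are asking the Assistant for amounts to stripping the currency symbol and the appended denomination before summing. The following is a minimal plain-Python sketch of that logic (a real Assistant answer would typically express this in SQL or PySpark, and the sample values here are made up):

```python
import re

def parse_price(value: str) -> float:
    """Extract the numeric amount from a string like '$10.99 USD'."""
    match = re.search(r"\d+(?:\.\d+)?", value)
    if match is None:
        raise ValueError(f"no numeric amount in {value!r}")
    return float(match.group())

# Example values with a country denomination appended, as described above.
prices = ["$10.99 USD", "$5.99 CAD", "$3.50 USD"]

# Note: a naive sum mixes currencies; in practice you would group by the
# trailing currency code before aggregating.
total = sum(parse_price(p) for p in prices)
print(round(total, 2))  # 20.48
```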

Databricks Assistant considers the history of the conversation so you can refine your questions as you go.

Give feedback

The best way to send feedback is to use the “Provide Feedback” links in the notebook and SQL editor. You can also send an email to assistant-feedback@databricks.com or to your account team.

We’re primarily interested in hearing about product improvement suggestions and user experience issues rather than prompt accuracy. If you receive an unhelpful suggestion from the Assistant, click the “Not useful” thumbs down button to let us capture that feedback.

Privacy and security

Q: What data is being sent to the models?

Databricks Assistant sends code and metadata to the models on each API request. This helps return more relevant results for your data. Examples include:

  • Code/queries in the current notebook cell or SQL Editor tab

  • Table and Column names and descriptions

  • Previous questions

  • Favorite tables

Q: Does the metadata sent to the models respect the user’s Unity Catalog permissions?

Yes. All data sent to the model respects the user’s Unity Catalog permissions, so the Assistant does not send metadata for tables that the user does not have permission to see.

Q: If I execute a query with results, and then ask a question, do the results of my query get sent to the model?

No. Only the code contents of cells, metadata about tables, and the user-entered text are shared with the model. For the “fix error” feature, Databricks also shares the stack trace from the error output.

Q: Will Databricks Assistant execute dangerous code?

No. Databricks Assistant does not automatically execute code on your behalf. AI models can make mistakes, misunderstand intent, and hallucinate or give incorrect answers. Be sure to review AI generated code prior to executing it.

Q: Has Databricks done any assessment to evaluate the accuracy and appropriateness of the Assistant responses?

Yes. Databricks has mitigations to prevent the Assistant from generating harmful responses such as hate speech, insecure code, and copyrighted third-party content, and to resist prompt jailbreaks. Databricks has done extensive testing of all of our AI assistive features with thousands of simulated user inputs to assess the robustness of these mitigations. These assessments focused on the expected use cases for the Assistant, such as code generation in Python, Databricks SQL, R, and Scala.

Q: Can I use Databricks Assistant with tables that process regulated data (PHI, PCI, IRAP, FedRAMP)?

Yes. To do so, you must comply with the relevant requirements, such as enabling the compliance security profile and adding the applicable compliance standard to the profile’s configuration. For HIPAA, it is your responsibility to ensure that you have a Business Associate Agreement (BAA) with Databricks in place before you process PHI data.