In general, Databricks personnel cannot access customer workspaces. A Databricks workspace is an environment for accessing all of your Databricks assets. The workspace organizes objects (notebooks, libraries, and experiments) into folders, and provides access to data and computational resources such as clusters and jobs. To resolve some types of technical issues, it may be necessary to grant personnel access to customer workspaces and underlying infrastructure.
To grant secure access, Databricks uses an internal application called Genie. There is a Genie instance for each cloud, such as AWS, Azure, and Google Cloud. Genie access for AWS and Google Cloud requires multi-factor authentication and requires users to be on the Databricks network or Databricks VPN. Genie for Azure requires multi-factor authentication and requires users to be on the Databricks network, Databricks VPN, or the Microsoft internal support network. Databricks limits the set of users who can access Genie and which types of access may be granted to each user. See the following sections for the types of access.
This document describes the general security processes in place for Genie and describes the customer approved workspace login feature.
This document generally refers to workspaces running Databricks on AWS. For Google Cloud, see Genie for Databricks on Google Cloud.
Databricks updates Genie security controls from time to time, and this document will be updated accordingly. See the revision history at the bottom of this document.
Almost no customer data is stored within the Databricks-owned account. For details about the Databricks architecture, see Databricks architecture overview.
There are two categories of access to Genie: workspace access and infrastructure access.
For both types of access, there is also a mechanism for emergency access (in the event that the ticketing system fails), which does not require a ticket. This access is rarely used and requires approval from a very small number of Databricks staff. Emergency access is still reflected in customer audit logs.
Databricks customer support personnel (or individuals in direct support roles such as Solution Architects) can use Genie to request HTTPS access to the Web application to provide support.
Databricks support personnel must enter the Databricks Web application ID of the customer and provide a valid Salesforce support ticket identification number, which must be associated with the customer’s workspace and remain in an Open status at the time that access is requested. Support personnel are required to gain and document your consent before using Genie to access your workspace. If Databricks customer support personnel require additional troubleshooting, they create an internal Engineering Support Ticket for the Databricks Engineering Team.
The Databricks Engineering Team can log into Genie to request access to your workspace in the web application for further troubleshooting or emergency support. The Databricks Engineering Team follows the same process as above except they enter the web application ID and an internal Engineering Support Ticket (not a Customer Support Ticket).
Genie grants web application access through a time-limited access token. After the session time expires, the request process must be repeated.
At that point, the user will have web browser access to the workspace as if they were a workspace admin. Certain other security controls are also applied (for example, Genie users cannot create long-lived personal access tokens). Databricks performs threat modeling to identify scenarios for abuse and to provide technical controls to mitigate risk.
Additionally, if you’ve configured audit log delivery, audit logs show the initial Genie event and Databricks staff actions. Actions taken within the system will be included in the audit logs, similar to auditable events from your own users. In the current implementation, the Databricks user email@example.com appears as firstname.lastname@example.org within the audit logs.
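Databricks notes that it can help customers build alerting pipelines for Genie activity, for example to flag new support staff using Genie. The sketch below shows one way such a check could work on delivered audit log records. It is a minimal illustration under stated assumptions: records are plain dicts, and the "action" and "user" keys, as well as the Genie action name passed in, are hypothetical placeholders, because this document does not specify the audit log schema. Map them to the field names your audit logs actually use.

```python
from typing import Iterable, List, Set


def first_time_genie_users(
    events: Iterable[dict], known_users: Set[str], genie_action: str
) -> List[str]:
    """Return staff identities seen in a Genie event for the first time, in log order.

    Each event is a dict with hypothetical "action" and "user" keys; adapt these
    to the field names of your real audit log schema.
    """
    seen = set(known_users)  # copy so the caller's baseline is not modified
    new_users = []
    for event in events:
        if event.get("action") != genie_action:
            continue  # ignore events that are not Genie events
        user = event.get("user")
        if user is not None and user not in seen:
            seen.add(user)
            new_users.append(user)
    return new_users


# Hypothetical sample records and action name, for illustration only.
events = [
    {"action": "genie_login", "user": "staff-a"},
    {"action": "other_action", "user": "staff-b"},
    {"action": "genie_login", "user": "staff-a"},
    {"action": "genie_login", "user": "staff-c"},
]
print(first_time_genie_users(events, {"staff-c"}, "genie_login"))  # ['staff-a']
```

Keeping the baseline of known identities outside the function lets a scheduled job persist it between runs and alert only on genuinely new names.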
Databricks has built a feature called Customer approved workspace login that can be enabled to allow you to control Genie access to the web application. With the customer approved workspace login feature enabled, Databricks staff cannot get workspace-level access unless you allow it.
Only personnel in the Databricks engineering organization who support internal infrastructure can log into Genie and access the Databricks core production infrastructure systems. If Databricks personnel outside infrastructure support roles request access to such systems, additional approvals are required. Genie grants access through a time-limited TLS client certificate. After the session time expires, the request process must be repeated.
The control plane is subdivided into microservices, and access is granted only to the required service. Infrastructure access does not provide UI access to any customer’s Databricks deployment, and service isolation limits which data can be accessed. Customers can further reduce the risk of data exposure by using capabilities such as customer-managed keys to encrypt certain data within the control plane (such as notebooks, secrets, Databricks SQL queries, and query history), which places additional technical barriers between infrastructure-level Genie access and that data.
Because the internal core production infrastructure systems are generally not specific to any one customer’s deployment, this Genie access does not create events in your audit logs and is not affected by enabling customer approved workspace login (CAWL).
Genie access via the Web UI (workspace-level access) requires either a support ticket or an engineering ticket tied expressly to your workspace. Technical controls require that the ticket be open and that your workspace be listed in a specific field of the ticket. Most Genie events originate from a support ticket. You must explicitly grant access, either by selecting a checkbox when submitting the ticket or by explicitly approving access in the text conversation with the support engineer.
Genie access is limited to a subset of employees who have a role in supporting customers.
The Genie system is accessible only over VPN (which requires a multi-factor prompt), and the authentication into Genie is also configured to always require an additional multi-factor prompt.
Genie access is specific to the given workspace. For example, if customer A authorizes the usage of Genie in a support ticket for a particular workspace, the support engineer cannot use that to access a workspace for customer B or to access a different workspace for customer A.
Each usage of Genie is also limited in time. For AWS and Google Cloud accounts, the maximum time is 24 hours and the default is 60 minutes.
If you have enabled audit log delivery, those logs will show the Genie event. Importantly, the initial access to your workspace is facilitated by Genie, but activities thereafter are bound by normal Databricks rules (as if the support staff were your employee). Any actions performed by Databricks staff once in the workspace generate audit log events just as they would for your staff.
Databricks retains Genie logs for at least one year internally and is happy to help customers build alerting pipelines for Genie activity (such as unusual API calls from support staff accessing the workspace with Genie, or new support staff using Genie). Databricks has strong technical controls for automatic termination of accounts (currently performed via automation when the Human Resources Information System processes a termination). Additionally, Databricks performs a quarterly account review as an additional check to guard against accounts that were not properly terminated.
Importantly, for customers with the customer approved workspace login feature enabled, the above security controls apply whenever the customer has allowed Genie access for a period of time. When Genie access is not allowed, Databricks staff can still go through the above process, but a technical control will not honor Genie access to the environment.
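The workspace-level controls described above can be summarized as a sequence of checks that must all pass before a session is granted. The following Python sketch models them under stated assumptions: the `Ticket` class, its field names, and the `grant_workspace_session` function are hypothetical and do not describe Genie's actual implementation. Only the rules themselves (open ticket, ticket tied to the exact workspace, customer consent, CAWL gating, and the 60-minute default and 24-hour maximum for AWS and Google Cloud) come from this document.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional

# Session limits stated in this document for AWS and Google Cloud.
DEFAULT_SESSION = timedelta(minutes=60)
MAX_SESSION = timedelta(hours=24)


@dataclass
class Ticket:
    """Hypothetical ticket model holding only the fields the controls refer to."""
    ticket_id: str
    status: str             # for example "Open" or "Closed"
    workspace_id: str       # workspace recorded in the ticket's dedicated field
    customer_consent: bool  # checkbox or explicit approval in the conversation


def grant_workspace_session(
    ticket: Ticket,
    workspace_id: str,
    cawl_enabled: bool,
    cawl_access_allowed: bool,
    requested: Optional[timedelta] = None,
) -> timedelta:
    """Return the session length that would be granted, or raise PermissionError."""
    if ticket.status != "Open":
        raise PermissionError("ticket must be open")
    if ticket.workspace_id != workspace_id:
        # A ticket for one workspace cannot be reused for any other workspace.
        raise PermissionError("ticket is not tied to this workspace")
    if not ticket.customer_consent:
        raise PermissionError("customer has not consented to access")
    if cawl_enabled and not cawl_access_allowed:
        # With CAWL on and access not allowed, the request is not honored.
        raise PermissionError("customer has not allowed Genie access")
    duration = DEFAULT_SESSION if requested is None else requested
    if not timedelta(0) < duration <= MAX_SESSION:
        raise PermissionError("session length must be positive and at most 24 hours")
    return duration


ticket = Ticket("T-1", "Open", "ws-a", customer_consent=True)
print(grant_workspace_session(ticket, "ws-a", cawl_enabled=False, cawl_access_allowed=False))
# 1:00:00
```

Emergency access, which bypasses the ticket requirement and needs separate approval, is deliberately left out of this sketch.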
Genie access via command-line for engineers (infrastructure-level access) requires an open engineering ticket in the back-end engineering ticketing system (or emergency access, as detailed above, in the event of a ticketing system failure).
Users must be a part of an engineering group that has a role in supporting customers.
The Genie system is only accessible over VPN (which requires a multi-factor prompt), and authentication into Genie is also configured to always require an additional multi-factor prompt.
Each usage of Genie is also limited in time. For AWS and Google Cloud environments, the maximum time is 24 hours, and the default is 60 minutes.
Databricks retains Genie logs for at least one year internally.
While Databricks has strong technical controls for automatic termination of accounts (currently performed via automation when the Human Resources Information System processes a termination), Databricks also performs a quarterly account review as an additional check to guard against accounts not properly terminated.
Enabling the customer approved workspace login (CAWL) feature for a Databricks on AWS account will, by default, disable workspace login access to all the workspaces in that account for Databricks engineers and support staff. Workspace admins can temporarily enable access to a restricted workspace within the account for up to 48 hours if required.
The CAWL time duration setting limits when a Genie request can start, not when the resulting session must end. A Genie session that has already started can remain open for up to 24 hours.
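Because the CAWL window gates only the start of a session, a session can outlive the window that allowed it. The following Python sketch works through that arithmetic using the limits in this document (an admin-enabled window of at most 48 hours, and a session of at most 24 hours). The function name is hypothetical. Treating the window as half-open (a request at exactly the window end is refused) is an assumption, and an admin disabling access early is not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional

MAX_CAWL_WINDOW = timedelta(hours=48)  # longest window a workspace admin can enable
MAX_SESSION = timedelta(hours=24)      # longest a started Genie session can stay open


def latest_session_end(
    window_start: datetime, window_length: timedelta, request_start: datetime
) -> Optional[datetime]:
    """Return the latest time a session started at request_start may stay open,
    or None if the request starts outside the customer-approved window."""
    if window_length > MAX_CAWL_WINDOW:
        raise ValueError("CAWL access can be enabled for at most 48 hours")
    window_end = window_start + window_length
    # Assumption: half-open window, so a request exactly at window_end is refused.
    if not window_start <= request_start < window_end:
        return None
    # The window gates only the start; the session itself is capped at 24 hours.
    return request_start + MAX_SESSION


start = datetime(2024, 1, 1, 9, 0)
print(latest_session_end(start, timedelta(hours=48), datetime(2024, 1, 3, 8, 0)))
# 2024-01-04 08:00:00
print(latest_session_end(start, timedelta(hours=48), datetime(2024, 1, 3, 10, 0)))
# None
```

In this example, a request made 47 hours into a 48-hour window is accepted, and the resulting session may stay open until 23 hours after the window itself closes.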
This feature is enabled at the account level, and only Databricks engineers can enable it. Work with your support staff or account team to enable this feature on your account if required.
Once the feature is enabled, workspace administrators can verify its status on the Manage Workspace page.
To enable access to the workspace by Databricks engineers, go to the Access Control tab in the Admin Console.
Workspace admins can enable access for a maximum of 48 hours whenever Databricks engineers and support staff need to access the relevant workspace.
After access is enabled, workspace admins can track when CAWL access expires and can disable access before it expires.
The majority of this document is focused on Databricks deployments on AWS. The same processes and rules apply for Databricks deployments on Google Cloud, except that the customer approved workspace login (CAWL) feature is not available for Google Cloud deployments.