Use Unity Catalog service credentials to connect to external cloud services
This article describes how to use a service credential in Unity Catalog to connect to external cloud services. A service credential object in Unity Catalog encapsulates a long-term cloud credential that provides access to an external cloud service that users need to connect to from Databricks.
Before you begin
Before you can use a service credential to connect to an external cloud service, you must have:
- A Databricks workspace that is enabled for Unity Catalog.
- A compute resource that is on Databricks Runtime 16.2 or above. SQL warehouses are not supported.
- A service credential created in your Unity Catalog metastore that gives access to the cloud service.
- The ACCESS privilege on the service credential or ownership of the service credential.
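As a quick sanity check of the runtime prerequisite, a notebook could compare the cluster's runtime version against the 16.2 minimum. The following is a minimal sketch, not an official API: the helper names `parse_runtime_version` and `meets_minimum_runtime` are hypothetical, and it assumes the runtime version is exposed in the `DATABRICKS_RUNTIME_VERSION` environment variable in a `major.minor` form.

```python
import os
import re

MINIMUM_RUNTIME = (16, 2)  # minimum version stated in the prerequisites above


def parse_runtime_version(version: str):
    """Return (major, minor) from a string such as '16.2' or '16.2.x-scala2.12'.

    Returns None when the string does not start with 'major.minor'.
    """
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def meets_minimum_runtime(version: str, minimum=MINIMUM_RUNTIME) -> bool:
    """True when the parsed version is at least the required minimum."""
    parsed = parse_runtime_version(version)
    return parsed is not None and parsed >= minimum


# Assumption: the compute resource exposes its runtime version in this variable.
runtime = os.environ.get("DATABRICKS_RUNTIME_VERSION", "")
print(f"Runtime {runtime!r} meets minimum: {meets_minimum_runtime(runtime)}")
```

A version string that does not start with digits is treated as unsupported rather than guessed at, so the check fails closed.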
Use a service credential in your code
This section provides examples of using service credentials in a notebook. Replace placeholder values. These examples don’t necessarily show the installation of required libraries, which depend on the client service you want to access.
Python example
This example uses a service credential to provide access to Google Pub/Sub using the Python SDK. The example does not include installing the packages that you would need to install on your compute to run it.
from google.cloud import pubsub_v1

project_id = 'your-project'
topic_id = 'your-topic'
cred_name = 'your-service-credential'  # name of the Unity Catalog service credential

credentials = dbutils.credentials.getServiceCredentialsProvider(cred_name)

# Publish some messages
publisher = pubsub_v1.PublisherClient(credentials=credentials)
with publisher:
    topic_path = publisher.topic_path(project_id, topic_id)
    data = "Oh, Hi, Mark!".encode('utf-8')
    future = publisher.publish(topic_path, data)
    print(f"Published {data} to {topic_path}")
    print(f"Future result: {future.result(timeout=5)}")

# Read them out
subscriber = pubsub_v1.SubscriberClient(credentials=credentials)
with subscriber:
    subscription_id = 'your-subscription'
    subscription_path = subscriber.subscription_path(project_id, subscription_id)

    # Print policy info
    policy = subscriber.get_iam_policy(request={"resource": subscription_path})
    print("\nPolicy for subscription {}:".format(subscription_path))
    for binding in policy.bindings:
        print("Role: {}, Members: {}".format(binding.role, binding.members))

    # Retrieve messages from the subscription (up to 3 messages)
    ack_ids = []
    response = subscriber.pull(request={"subscription": subscription_path, "max_messages": 3})
    for msg in response.received_messages:
        print(f"Received: {msg.message.data.decode('utf-8')}")
        ack_ids.append(msg.ack_id)

    # Acknowledge receipt if there were any messages
    if len(ack_ids) > 0:
        subscriber.acknowledge(request={"subscription": subscription_path, "ack_ids": ack_ids})
        print(f"Received {len(ack_ids)} messages from subscription {subscription_path}")
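The pull-then-acknowledge steps in the example above could be wrapped in a small helper so that they are easier to reuse and to test without a live subscription. This is only a sketch: `pull_and_ack` is a hypothetical name, and it assumes a subscriber object with the same `pull` and `acknowledge` request shapes that the example uses.

```python
def pull_and_ack(subscriber, subscription_path, max_messages=3):
    """Pull up to max_messages, acknowledge them, and return the decoded payloads."""
    response = subscriber.pull(
        request={"subscription": subscription_path, "max_messages": max_messages}
    )

    payloads = []
    ack_ids = []
    for msg in response.received_messages:
        payloads.append(msg.message.data.decode("utf-8"))
        ack_ids.append(msg.ack_id)

    # Acknowledge only when at least one message was received.
    if ack_ids:
        subscriber.acknowledge(
            request={"subscription": subscription_path, "ack_ids": ack_ids}
        )
    return payloads
```

Inside the `with subscriber:` block, it would be called as `pull_and_ack(subscriber, subscription_path)`.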
Scala example
This example uses a service credential to provide access to Google Pub/Sub using the Scala SDK. The example does not include installing the Maven libraries that you would need on your compute to run it. These include google-oauth-client, google-auth-library-oauth2-http, and google-cloud-pubsub.
For Google Cloud SDK Maven dependencies, you must use a shaded version of Guava.
import com.google.api.gax.core.FixedCredentialsProvider
import com.google.auth.oauth2.GoogleCredentials
import com.google.cloud.pubsub.v1.Publisher
import com.google.protobuf.ByteString
import com.google.pubsub.v1.{ProjectSubscriptionName, ProjectTopicName, PubsubMessage, PullRequest, PullResponse, ReceivedMessage, SubscriptionName, TopicName}
import java.util.concurrent.TimeUnit
import scala.collection.JavaConverters._

// Set up credentials
val gcpCredentials = dbutils.credentials.getServiceCredentialsProvider("your-credential-name").asInstanceOf[GoogleCredentials]

// Project and topic details
val projectId = "your-project"
val topicId = "your-topic"
val subscriptionId = "your-subscription"
val topicName = TopicName.of(projectId, topicId)

// Build the publisher, wrapping the credentials in a FixedCredentialsProvider
val publisher = Publisher
  .newBuilder(topicName)
  .setCredentialsProvider(FixedCredentialsProvider.create(gcpCredentials))
  .build()

try {
  val data = "Oh, Hi, Mark!".getBytes("UTF-8")
  val pubsubMessage = PubsubMessage.newBuilder().setData(ByteString.copyFrom(data)).build()
  val messageIdFuture = publisher.publish(pubsubMessage)
  // Assumed completion: the original sample is cut off after the publish call.
  println(s"Published message ID: ${messageIdFuture.get(5, TimeUnit.SECONDS)}")
} finally {
  // Release the publisher's resources
  publisher.shutdown()
}
Specify a default service credential for a compute resource
You can optionally specify a default service credential for an all-purpose or jobs compute cluster by setting an environment variable. By default, the SDK uses that service credential if no authentication is provided. Users still require ACCESS on that service credential to connect to the external cloud service. Databricks does not recommend this approach, because it makes your code less portable than naming the service credential in your code.
Serverless compute and SQL warehouses don’t support environment variables, and therefore they don’t support default service credentials.
- Open the edit page for the cluster. See Manage compute.
- Click Advanced at the bottom of the page and go to the Spark tab.
- Add the following entry in Environment variables, replacing <your-service-credential>:
  DATABRICKS_DEFAULT_SERVICE_CREDENTIAL_NAME=<your-service-credential>
The following code samples do not specify a service credential. Instead, they use the service credential specified in the DATABRICKS_DEFAULT_SERVICE_CREDENTIAL_NAME environment variable:
In Python, if you are using a default service credential, you don’t need to specify credentials as an argument:
publisher = pubsub_v1.PublisherClient()
Compare this to the example in Python example, which names the service credential explicitly and passes it to the client:
credentials = dbutils.credentials.getServiceCredentialsProvider(cred_name)
publisher = pubsub_v1.PublisherClient(credentials=credentials)
For Scala, you replace the service credential name with null.
val gcpCredentials = dbutils.credentials.getServiceCredentialsProvider(null).asInstanceOf[GoogleCredentials]
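To make the portability trade-off described in this section concrete, code could resolve the credential name explicitly before creating a client: prefer a name passed in code, and fall back to the cluster default only when none is given. The following is a minimal sketch under that assumption; `resolve_service_credential_name` is a hypothetical helper, not part of `dbutils` or any SDK.

```python
import os

DEFAULT_CREDENTIAL_ENV_VAR = "DATABRICKS_DEFAULT_SERVICE_CREDENTIAL_NAME"


def resolve_service_credential_name(explicit_name=None, env=None):
    """Return the service credential name the code will rely on.

    An explicitly passed name wins; otherwise fall back to the cluster default
    from the environment variable. Raises ValueError when neither is set.
    """
    env = os.environ if env is None else env
    if explicit_name:
        return explicit_name
    default_name = env.get(DEFAULT_CREDENTIAL_ENV_VAR, "").strip()
    if default_name:
        return default_name
    raise ValueError(
        f"No service credential name given and {DEFAULT_CREDENTIAL_ENV_VAR} is not set"
    )
```

The resolved name could then be passed to `dbutils.credentials.getServiceCredentialsProvider`, so the notebook fails with a clear message on compute where no default is configured.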