Google BigQuery

This article describes how to read from and write to Google BigQuery tables in Databricks.

Requirements

  • A Databricks cluster running Databricks Runtime 7.1 or above.
  • Databricks CLI installed and configured.

Step 1: Set up Google Cloud

Enable BigQuery Storage API

The BigQuery Storage API is enabled by default in any new project where BigQuery is enabled. However, if you are using an existing project and the API isn't enabled, follow the steps in this section to enable it.

Use the CLI

gcloud services enable bigquerystorage.googleapis.com
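
If the command fails because gcloud is pointed at a different project, set the active project first and rerun the command (assuming the Cloud SDK is installed and authenticated):

gcloud config set project <project-name>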

Use the Google Cloud Platform Console

  1. Click APIs & Services in the left navigation pane.

  2. Click the ENABLE APIS AND SERVICES button.

  3. Type bigquery storage api in the search bar and select the first result.

  4. Ensure that the BigQuery Storage API is enabled.

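To confirm from the CLI that the API is enabled (an optional check, assuming the Cloud SDK is configured for your project):

gcloud services list --enabled --filter="name:bigquerystorage.googleapis.com"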

Create service account for Databricks

Create a service account for the Databricks cluster. We recommend giving this service account the least privileges needed to perform its tasks. See BigQuery Roles and Permissions.

Important

The JSON key you generate for the service account is a private key that should only be shared with authorized users as it controls access to datasets and resources in your Google Cloud account.

Use the CLI

gcloud iam service-accounts create <service-account-name>

gcloud projects add-iam-policy-binding <project-name> \
--role roles/bigquery.user \
--member="serviceAccount:<service-account-name>@<project-name>.iam.gserviceaccount.com"

gcloud projects add-iam-policy-binding <project-name> \
--role roles/bigquery.dataEditor \
--member="serviceAccount:<service-account-name>@<project-name>.iam.gserviceaccount.com"

gcloud iam service-accounts keys create --iam-account \
"<service-account-name>@<project-name>.iam.gserviceaccount.com" \
<project-name>-deadbeefdead.json

Use the Console

  1. Click IAM & Admin in the left navigation pane.

  2. Click Service Accounts.

  3. Click + CREATE SERVICE ACCOUNT.

  4. Enter the service account name and description.

  5. Click CREATE.

  6. Specify roles for the service account. In the Select a role drop-down, type BigQuery and add the BigQuery User and BigQuery Data Editor roles (the same roles granted in the CLI example above).
  7. Click CONTINUE.

  8. Click DONE.

  9. In the service accounts list, click the newly created account.

  10. In the Keys section, click ADD KEY > Create new key.

  11. Accept the JSON key type.

  12. Click CREATE. The key file is downloaded to your computer.
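
To confirm which roles the service account holds, one option (a sketch using the Cloud SDK) is to inspect the project's IAM policy:

gcloud projects get-iam-policy <project-name> \
--flatten="bindings[].members" \
--filter="bindings.members:serviceAccount:<service-account-name>@<project-name>.iam.gserviceaccount.com" \
--format="table(bindings.role)"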

Create GCS bucket for temporary storage

To write data to BigQuery, the data source needs access to a GCS bucket.

  1. Click Storage in the left navigation pane.

  2. Click CREATE BUCKET.

  3. Configure the bucket details.

  4. Click CREATE.

  5. Click the Permissions tab and then click Add members.

  6. Grant the service account permissions on the bucket. For example, assign the Storage Admin role so the connector can read and write temporary data.
  7. Click SAVE.
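
The bucket setup can also be scripted. A minimal sketch using gsutil (assuming the Cloud SDK is installed, the service account from the previous step exists, and the Storage Admin role from the console step above):

gsutil mb -l <region> gs://<bucket-name>

gsutil iam ch \
"serviceAccount:<service-account-name>@<project-name>.iam.gserviceaccount.com:roles/storage.admin" \
gs://<bucket-name>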

Step 2: Set up Databricks

Upload the service credentials to DBFS and configure a cluster to read the credentials.

Upload Google credentials to DBFS

Upload the JSON key containing the Google Cloud service account credentials that you generated in Step 1 to DBFS.

dbfs cp <project-name>-deadbeefdead.json dbfs:/gcp/service-account-key.json
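
To verify the upload:

dbfs ls dbfs:/gcp/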

Configure cluster

  1. Set the Databricks Runtime Version to Databricks Runtime 7.1 or above.

  2. Set the GOOGLE_APPLICATION_CREDENTIALS environment variable to point to the JSON key path in DBFS, as in the example below.

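For example, if you uploaded the key to dbfs:/gcp/service-account-key.json as shown above, set the variable to the corresponding path under the cluster's /dbfs mount:

GOOGLE_APPLICATION_CREDENTIALS=/dbfs/gcp/service-account-key.json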

Usage

To read a BigQuery table, specify the table name:

df = spark.read.format("bigquery").option("table", "<table-name>").load()

To write to a BigQuery table, specify the table name and the GCS bucket for temporary storage:

df.write.format("bigquery").mode("<mode>").option("temporaryGcsBucket", "<bucket-name>").option("table", "<table-name>").save()

where <bucket-name> is the name of the bucket you created in Create GCS bucket for temporary storage.
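
Putting the two together, a minimal round trip in Python might look like this (a sketch; the project, dataset, table, and bucket names are placeholders to replace with your own):

# Read a table into a DataFrame.
df = spark.read.format("bigquery") \
  .option("table", "<project-name>.<dataset>.<source-table>") \
  .load()

# Write it back out, staging the data through the temporary GCS bucket.
df.write.format("bigquery") \
  .mode("overwrite") \
  .option("temporaryGcsBucket", "<bucket-name>") \
  .option("table", "<project-name>.<dataset>.<target-table>") \
  .save()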

Example notebooks

Google BigQuery Python sample notebook


Google BigQuery Scala sample notebook
