Ingest data from Google Ads

Note

Beta

This feature is in Beta. Workspace admins can control access to this feature from the Previews page. See "Manage Databricks previews".

Learn how to create a managed ingestion pipeline that ingests data from Google Ads into Databricks.

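Requirements

To create an ingestion pipeline, you must meet the following requirements:
- Your workspace must be enabled for Unity Catalog.
- Serverless compute must be enabled for your workspace. See "Serverless compute requirements".
- If you plan to create a new connection: you must have the CREATE CONNECTION privilege on the metastore. See "Manage privileges in Unity Catalog".
  
  If the connector supports UI-based pipeline authoring, an admin can create the connection and the pipeline at the same time by completing the steps on this page. However, if the users who create pipelines use API-based pipeline authoring or are non-admin users, an admin must first create the connection in Catalog Explorer. See "Connect to managed ingestion sources".
- If you plan to use an existing connection: you must have the USE CONNECTION privilege or ALL PRIVILEGES on the connection object.
- You must have the USE CATALOG privilege on the target catalog.
- You must have the USE SCHEMA and CREATE TABLE privileges on an existing schema, or the CREATE SCHEMA privilege on the target catalog.

To ingest from Google Ads, you must also complete the steps in "Configure OAuth for Google Ads ingestion".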
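
The privilege requirements above can be turned into SQL statements that an admin runs once per user. The following is a minimal sketch, assuming standard Unity Catalog `GRANT` syntax. The function name `grant_statements` and all argument values are hypothetical and are not part of the connector.

```python
def grant_statements(principal, catalog, schema, connection=None):
    """Return GRANT statements covering the privileges listed above.

    All names are illustrative. Pass `connection` to grant access to an
    existing connection; omit it to let the principal create a new one.
    """
    who = f"`{principal}`"
    statements = []
    if connection is None:
        # New connection: CREATE CONNECTION on the metastore.
        statements.append(f"GRANT CREATE CONNECTION ON METASTORE TO {who}")
    else:
        # Existing connection: USE CONNECTION on the connection object.
        statements.append(
            f"GRANT USE CONNECTION ON CONNECTION `{connection}` TO {who}"
        )
    # Target catalog and an existing target schema.
    statements.append(f"GRANT USE CATALOG ON CATALOG `{catalog}` TO {who}")
    statements.append(
        f"GRANT USE SCHEMA, CREATE TABLE ON SCHEMA `{catalog}`.`{schema}` TO {who}"
    )
    return statements


for statement in grant_statements("user@example.com", "main", "google_ads"):
    print(statement)
```

In a notebook, each returned string could be passed to `spark.sql(...)` by a user who is allowed to grant these privileges.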

Create an ingestion pipeline

Databricks Asset Bundles
Databricks notebook

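This tab describes how to deploy an ingestion pipeline using Declarative Automation Bundles. Bundles can contain YAML definitions of jobs and tasks, are managed using the Databricks CLI, and can be shared and run in different target workspaces (such as development, staging, and production). For more information, see "What are Declarative Automation Bundles?".

Create a new bundle using the Databricks CLI: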
Bash
```
databricks bundle init
```
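Add two new resource files to the bundle:
- A pipeline definition file (for example, resources/google_ads_pipeline.yml).
- A job definition file that controls the frequency of data ingestion (for example, resources/google_ads_job.yml).
See pipeline.ingestion_definition and the Examples section.
Deploy the pipeline using the Databricks CLI: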
Bash
```
databricks bundle deploy
```
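
The deploy step can also be scripted, for example from a CI job. The following is a minimal sketch, assuming the Databricks CLI is installed and authenticated, and that the bundle defines a target named `dev` (a hypothetical name). It runs `databricks bundle validate` before `databricks bundle deploy` so that configuration errors surface first. The helper names `bundle_commands` and `run_bundle` are illustrative.

```python
import subprocess


def bundle_commands(target=None):
    """Build the CLI invocations: validate first, then deploy."""
    commands = [
        ["databricks", "bundle", "validate"],
        ["databricks", "bundle", "deploy"],
    ]
    if target:
        # --target selects one of the targets defined in the bundle.
        commands = [cmd + ["--target", target] for cmd in commands]
    return commands


def run_bundle(target=None, runner=subprocess.run):
    """Run each command in order and stop at the first failure."""
    for cmd in bundle_commands(target):
        runner(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them; call run_bundle("dev")
    # from the bundle root to execute them.
    for cmd in bundle_commands("dev"):
        print(" ".join(cmd))
```

Passing `runner` as a parameter keeps the sketch easy to dry-run or test without invoking the CLI.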

Examples

Databricks Asset Bundles
Databricks notebook

The following pipeline definition file ingests all current and future tables from one account:

YAML
resources:
  pipelines:
    pipeline_google_ads:
      name: <pipeline>
      catalog: <destination-catalog>
      target: <destination-schema>
      ingestion_definition:
        connection_name: <connection>
        objects:
          - schema:
              source_schema: <account-id>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              google_ads_options:
                manager_account_id: <manager-account-id>
                lookback_window_days: <lookback-window-days>
                sync_start_date: <sync-start-date>

The following pipeline definition file selects specific tables from an account to ingest:

YAML
resources:
  pipelines:
    pipeline_google_ads:
      name: <pipeline-name>
      catalog: <destination-catalog>
      target: <destination-schema>
      ingestion_definition:
        connection_name: <connection-name>
        objects:
          - table:
              source_schema: <customer-account-id>
              source_table: <table1>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              destination_table: <destination-table>
              google_ads_options:
                manager_account_id: <manager-account-id>
                lookback_window_days: <lookback-window-days>
                sync_start_date: <sync-start-date>
          - table:
              source_schema: <customer-account-id>
              source_table: <table2>
              destination_catalog: <destination-catalog>
              destination_schema: <destination-schema>
              destination_table: <destination-table>
              google_ads_options:
                manager_account_id: <manager-account-id>
                lookback_window_days: <lookback-window-days>
                sync_start_date: <sync-start-date>

The following is an example job definition file:

YAML
resources:
  jobs:
    google_ads_dab_job:
      name: google_ads_dab_job
      trigger:
        # Run this job every day, exactly one day from the last run
        # See https://docs.databricks.com/api/workspace/jobs/create#trigger
        periodic:
          interval: 1
          unit: DAYS
      email_notifications:
        on_failure:
          - <email-address>
      tasks:
        - task_key: refresh_pipeline
          pipeline_task:
            pipeline_id: <pipeline-id>

The following pipeline specification, for use in a Databricks notebook, ingests all current and future tables from one account:

Python
import json

# Replace every <placeholder> before running this cell.
pipeline_spec = {
  "name": "<pipeline>",
  "catalog": "<destination-catalog>",
  "schema": "<destination-schema>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "schema": {
          "source_schema": "<account-id>",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>",
          "google_ads_options": {
            "manager_account_id": "<manager-account-id>",
            "lookback_window_days": <lookback-window-days>,  # unquoted placeholder: replace with a number of days
            "sync_start_date": "<sync-start-date>"
          }
        }
      }
    ]
  }
}

# create_pipeline is assumed to be a helper defined earlier in the notebook;
# it is not shown on this page.
json_payload = json.dumps(pipeline_spec, indent=2)
create_pipeline(json_payload)
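
The `create_pipeline` helper called above is not defined on this page. As a rough illustration of what such a helper might do, the following sketch sends the JSON payload to the Pipelines REST API endpoint `POST /api/2.0/pipelines`. The function names, the `host` and `token` parameters, and the use of a bearer token are assumptions for illustration, not the helper the notebook actually uses.

```python
import json
import urllib.request


def build_create_pipeline_request(host, token, json_payload):
    """Build (but do not send) a POST request that creates a pipeline."""
    url = f"{host.rstrip('/')}/api/2.0/pipelines"
    return urllib.request.Request(
        url,
        data=json_payload.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_pipeline_via_rest(json_payload, host, token):
    """Send the request and return the parsed JSON response."""
    request = build_create_pipeline_request(host, token, json_payload)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# Inspect the request without sending it. Host and token are placeholders.
request = build_create_pipeline_request(
    "https://<workspace-host>", "<token>", json.dumps({"name": "<pipeline>"})
)
print(request.get_method(), request.full_url)
```

Separating request construction from sending makes it possible to check the URL and payload before any call reaches the workspace.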

The following pipeline specification selects specific tables from an account to ingest:

Python
import json

# Replace every <placeholder> before running this cell.
pipeline_spec = {
  "name": "<pipeline>",
  "catalog": "<destination-catalog>",
  "schema": "<destination-schema>",
  "ingestion_definition": {
    "connection_name": "<connection>",
    "objects": [
      {
        "table": {
          "source_schema": "<customer-account-id>",
          "source_table": "<table1>",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>",
          "destination_table": "<destination-table>",
          "google_ads_options": {
            "manager_account_id": "<manager-account-id>",
            "lookback_window_days": <lookback-window-days>,  # unquoted placeholder: replace with a number of days
            "sync_start_date": "<sync-start-date>"
          }
        }
      },
      {
        "table": {
          "source_schema": "<customer-account-id>",
          "source_table": "<table2>",
          "destination_catalog": "<destination-catalog>",
          "destination_schema": "<destination-schema>",
          "destination_table": "<destination-table>",
          "google_ads_options": {
            "manager_account_id": "<manager-account-id>",
            "lookback_window_days": <lookback-window-days>,  # unquoted placeholder: replace with a number of days
            "sync_start_date": "<sync-start-date>"
          }
        }
      }
    ]
  }
}

# create_pipeline is assumed to be a helper defined earlier in the notebook;
# it is not shown on this page.
json_payload = json.dumps(pipeline_spec, indent=2)
create_pipeline(json_payload)
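
When many tables are selected, the repeated `table` entries can be generated instead of written by hand. The following is a minimal sketch: the helper name `table_objects`, the table names, and the option values (including the date format) are hypothetical, and reusing the source table name as the destination table name is an assumption made for brevity.

```python
import json


def table_objects(account_id, tables, catalog, schema, options):
    """Build one `table` entry per source table, sharing the same options."""
    return [
        {
            "table": {
                "source_schema": account_id,
                "source_table": table,
                "destination_catalog": catalog,
                "destination_schema": schema,
                # Assumption: the destination table keeps the source name.
                "destination_table": table,
                # Copy so that entries do not share one mutable dict.
                "google_ads_options": dict(options),
            }
        }
        for table in tables
    ]


# Illustrative values only.
options = {
    "manager_account_id": "1234567890",
    "lookback_window_days": 30,
    "sync_start_date": "2024-01-01",
}
objects = table_objects("9876543210", ["campaign", "ad_group"], "main", "google_ads", options)
print(json.dumps(objects, indent=2))
```

The resulting list can be assigned to the `objects` key of `ingestion_definition` in the pipeline specification.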

Common patterns

For advanced pipeline configurations, see "Common patterns for managed ingestion pipelines".

Next steps

Start, schedule, and set alerts on your pipeline. See "Common pipeline maintenance tasks".
