Google Analytics Raw Data connector limitations
Preview
The Google Analytics Raw Data connector is in Public Preview.
This page lists limitations and considerations for ingesting raw, event-level data from Google Analytics using Databricks Lakeflow Connect and Google BigQuery.
General SaaS connector limitations
The limitations in this section apply to all SaaS connectors in Lakeflow Connect.
- When you run a scheduled pipeline, alerts don't trigger immediately. Instead, they trigger when the next update runs.
- When a source table is deleted, the destination table is not automatically deleted; you must delete the destination table manually. This behavior is inconsistent with DLT behavior.
- During source maintenance periods, Databricks might not be able to access your data.
- If a source table name conflicts with an existing destination table name, the pipeline update fails.
- Multi-destination pipeline support is API-only.
- You can optionally rename a table that you ingest. Renaming a table in your pipeline makes it an API-only pipeline, and you can no longer edit the pipeline in the UI. For a configuration sketch, see the first example after this list.
- Column-level selection and deselection are API-only (see the first example after this list).
- If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table, as shown in the second example after this list.
- Managed ingestion pipelines aren't supported for the following:
- Workspaces in AWS GovCloud regions
- Workspaces in Azure Government regions
- FedRAMP-compliant workspaces
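Because multi-destination pipelines, table renames, and column selection are API-only, you configure them in the pipeline specification rather than in the UI. The following sketch, which assumes placeholder connection, catalog, schema, table, and column names, shows the general shape of a managed ingestion pipeline spec with a renamed destination table and an explicit column list. Confirm the exact payload fields for this connector against the Lakeflow Connect API reference.

```python
import os
import requests

# Hedged sketch: creates a managed ingestion pipeline with a renamed
# destination table and column-level selection. All names below
# (connection, catalog, schema, table, columns) are placeholders.
host = os.environ["DATABRICKS_HOST"]   # e.g. https://<workspace>.cloud.databricks.com
token = os.environ["DATABRICKS_TOKEN"]

payload = {
    "name": "ga4_ingest",  # placeholder pipeline name
    "ingestion_definition": {
        "connection_name": "ga4_raw_data_connection",  # placeholder connection
        "objects": [
            {
                "table": {
                    "source_catalog": "my-gcp-project",     # GCP project ID
                    "source_schema": "analytics_12345678",  # BigQuery export dataset
                    "source_table": "events",
                    "destination_catalog": "main",
                    "destination_schema": "ga4",
                    # Renaming the destination table makes the pipeline API-only.
                    "destination_table": "ga4_events",
                    "table_configuration": {
                        # Column-level selection is also API-only.
                        "include_columns": ["event_date", "event_name", "user_pseudo_id"],
                    },
                }
            }
        ],
    },
}

resp = requests.post(
    f"{host}/api/2.0/pipelines",
    json=payload,
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()
print(resp.json())
```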
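If you select a new column on an existing pipeline and need its historical data, trigger a full refresh of the affected table. A minimal sketch using the Databricks SDK for Python, assuming a placeholder pipeline ID and table name:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # reads credentials from the environment or a config profile

# Start a pipeline update that fully refreshes only the specified table,
# backfilling history for any newly selected columns. The pipeline ID and
# table name below are placeholders.
update = w.pipelines.start_update(
    pipeline_id="1234-567890-abcdefgh",
    full_refresh_selection=["ga4_events"],  # table name as defined in the pipeline
)
print(update.update_id)
```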
Connector-specific limitations
The limitations in this section are specific to the Google Analytics Raw Data (GA4) connector.
Authentication
- The connector only supports authentication using a GCP service account.
Pipelines
- Updates and deletes in GA4 are not ingested.
- The connector only supports one GA4 property per pipeline.
- Ingestion from Universal Analytics (UA) is not supported.
Tables
- The connector can't reliably ingest BigQuery date-partitioned tables that are larger than 50 GB.
- The connector only ingests raw data that you export from GA4 to BigQuery, and it inherits GA4 limits on the amount of historical data that you can export to BigQuery.
- The initial load fetches the data for all dates that are present in your GA4/BigQuery project.
- Databricks can't guarantee retention of `events_intraday` data for a given day after the data is available in the `events` table. This is because the `events_intraday` table is only intended for interim use until the `events` table is ready for that day.
- The connector assumes that each row is unique. Databricks can't guarantee correct behavior if there are unexpected duplicates.