Google Analytics Raw Data connector limitations
Preview: The Google Analytics Raw Data connector is in Public Preview.
This article lists limitations and considerations for connecting to and ingesting raw, event-level data from Google Analytics 4 (GA4) using Lakeflow Connect and Google BigQuery.
- When a source table is deleted, the destination table is not automatically deleted. You must delete the destination table manually. This is inconsistent with DLT behavior.
- During source maintenance periods, Databricks might not be able to access your data.
- If a source table name conflicts with an existing destination table name, the pipeline update fails.
- Multi-destination pipeline support is API-only.
- You can optionally rename a table that you ingest. If you rename a table in your pipeline, it becomes an API-only pipeline, and you can no longer edit the pipeline in the UI.
- Column-level selection and deselection are API-only. For an example of an API-defined pipeline that renames a table and selects specific columns, see the first sketch after this list.
- If you select a column after a pipeline has already started, the connector does not automatically backfill data for the new column. To ingest historical data, manually run a full refresh on the table (see the second sketch after this list).
- Updates and deletes in GA4 are not ingested.
- The connector only supports one GA4 property per pipeline.
- Ingestion from Universal Analytics (UA) is not supported.
- The connector only supports authentication using a GCP service account.
- The connector can’t reliably ingest BigQuery date-partitioned tables that are larger than 50 GB.
- The connector only ingests raw data that you export from GA4 to BigQuery, and it inherits GA4 limits on the amount of historical data that you can export to BigQuery.
- The initial load fetches the data for all dates that are present in your GA4/BigQuery project.
- Databricks can't guarantee retention of `events_intraday` data for a given day after the data is available in the `events` table. This is because the `events_intraday` table is only intended for interim use until the `events` table is ready for that day.
- The connector assumes that each row is unique. Databricks can't guarantee correct behavior if there are unexpected duplicates.
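The table renaming and column selection limitations above apply to API-defined pipelines. As a rough illustration only, the following Python sketch creates a GA4 ingestion pipeline through the Pipelines REST API, renaming the destination table and selecting a subset of columns. The connection name, catalog and schema names, and the `ingestion_definition` field names (such as `include_columns`) are assumptions made for illustration; confirm them against the Lakeflow Connect API reference for your workspace.

```python
# Sketch only: create a GA4 ingestion pipeline via the Pipelines REST API,
# renaming the destination table and selecting specific columns.
# Field names inside ingestion_definition (for example, include_columns) are
# assumptions; verify them against the Lakeflow Connect API reference.
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]    # for example, https://<workspace>.cloud.databricks.com
TOKEN = os.environ["DATABRICKS_TOKEN"]  # personal access token

payload = {
    "name": "ga4-raw-data-pipeline",          # hypothetical pipeline name
    "ingestion_definition": {
        "connection_name": "ga4_connection",  # hypothetical Unity Catalog connection
        "objects": [
            {
                "table": {
                    "source_catalog": "my_gcp_project",      # BigQuery project (assumption)
                    "source_schema": "analytics_123456789",  # GA4 export dataset (assumption)
                    "source_table": "events",
                    "destination_catalog": "main",
                    "destination_schema": "ga4_raw",
                    "destination_table": "ga4_events",       # renamed destination table
                    "table_configuration": {
                        # Column-level selection (API-only); field name is an assumption.
                        "include_columns": ["event_date", "event_name", "user_pseudo_id"]
                    },
                }
            }
        ],
    },
}

resp = requests.post(
    f"{HOST}/api/2.0/pipelines",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
)
resp.raise_for_status()
print(resp.json())
```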
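Similarly, the following sketch shows one way to trigger the full refresh mentioned above so that a newly selected column is backfilled with historical data. The endpoint path and the `full_refresh_selection` field reflect the Pipelines API as generally documented, but treat the exact payload as an assumption and verify it before relying on it.

```python
# Sketch only: trigger a full refresh of a single table in an existing pipeline.
# The full_refresh_selection field is an assumption based on the Pipelines API;
# confirm it against the API reference for your workspace.
import os
import requests

HOST = os.environ["DATABRICKS_HOST"]
TOKEN = os.environ["DATABRICKS_TOKEN"]
PIPELINE_ID = "<your-pipeline-id>"  # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"full_refresh_selection": ["ga4_events"]},  # table name from the sketch above
)
resp.raise_for_status()
print(resp.json())
```

Targeting a single table with a full refresh avoids re-ingesting every table in the pipeline.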