Zerobus Ingest connector overview
The Zerobus Ingest connector is in Public Preview. To try it, contact your Databricks account representative.
The Zerobus Ingest connector enables record-by-record data ingestion directly into Delta tables through a gRPC API. This serverless connector scales with your ingestion workload and simplifies ingestion workflows because it requires no message bus infrastructure and no Delta-specific dependencies.
The connector helps clients that find it difficult to integrate with intermediate systems, such as message buses, or to write directly in Delta Lake format. Any application that can communicate over gRPC and construct Protobuf messages can use Zerobus Ingest to push data efficiently into Delta tables.
Applications integrate against a single standard API, which simplifies the architecture by removing message bus dependencies. For example, clickstream data can flow directly from an application into a Delta table without an intermediate message bus.
The Zerobus Ingest API buffers transmitted data before adding it to a Delta table. This buffering makes ingestion efficient and durable, and it supports a large number of clients with variable throughput.
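Buffering of this kind usually follows a common pattern: make each record durable, acknowledge it, and write to the table in larger batches. The following minimal sketch illustrates that general pattern in plain Python. `IngestBuffer`, its `batch_size`, the in-memory log, and the list that stands in for a Delta table are all hypothetical; this is not the Zerobus Ingest implementation or API, and batch-based materialization is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class IngestBuffer:
    """Hypothetical buffer-then-materialize ingester (not the Zerobus implementation).

    Durable storage is simulated with an in-memory log, and the Delta table
    is simulated with a plain Python list.
    """

    batch_size: int = 3
    durable_log: list[dict[str, Any]] = field(default_factory=list)
    table: list[dict[str, Any]] = field(default_factory=list)
    materialized: int = 0  # count of log entries already written to `table`

    def append(self, record: dict[str, Any]) -> int:
        """Persist the record, then return its log offset as the acknowledgement."""
        self.durable_log.append(record)
        offset = len(self.durable_log) - 1
        # Write to the table in batches rather than once per record.
        if len(self.durable_log) - self.materialized >= self.batch_size:
            self.flush()
        return offset

    def flush(self) -> None:
        """Materialize every durable record that is not yet in the table."""
        self.table.extend(self.durable_log[self.materialized:])
        self.materialized = len(self.durable_log)


buffer = IngestBuffer(batch_size=3)
acks = [buffer.append({"event": "click", "n": i}) for i in range(4)]
```

After four appends, all four records are acknowledged (offsets 0 to 3), but only the first batch of three has been written to `table`; the fourth stays in the durable log until the next batch fills or `flush()` runs. In this sketch, acknowledging on durability rather than on materialization lets a producer keep sending without waiting for table writes.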
Once the data is materialized in Delta format, it is fully compatible with the Databricks Data Intelligence Platform, so users can analyze and process it with familiar tools and features.
Concepts
A data producer first opens a stream to a Delta table, constructs a message matching its schema, and then pushes the message to the Zerobus Ingest API. The service makes the data durable, acknowledges the client's message, and materializes the data in the Delta table.
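The producer lifecycle above (open a stream, construct a message, push it, receive an acknowledgement, materialize) can be sketched with an in-memory stand-in. Everything here is hypothetical: `FakeIngestService`, `FakeStream`, `push`, `materialize`, and the table name `main.web.clicks` are illustrative names, not the Zerobus Ingest API, and a plain `dict` stands in for the Protobuf message a real client would build.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FakeIngestService:
    """Hypothetical in-memory stand-in for the Zerobus Ingest service."""

    tables: dict[str, list[dict[str, Any]]] = field(default_factory=dict)

    def open_stream(self, table_name: str) -> "FakeStream":
        # The service does not create tables; the target must already exist.
        if table_name not in self.tables:
            raise LookupError(f"table {table_name!r} does not exist")
        return FakeStream(self, table_name)


@dataclass
class FakeStream:
    service: FakeIngestService
    table_name: str
    pending: list[dict[str, Any]] = field(default_factory=list)  # durable, not yet materialized
    next_offset: int = 0

    def push(self, message: dict[str, Any]) -> int:
        """Record the message as durable and return its acknowledgement offset."""
        self.pending.append(message)
        offset = self.next_offset
        self.next_offset += 1
        return offset

    def materialize(self) -> None:
        """Write acknowledged messages to the table (asynchronous in a real service)."""
        self.service.tables[self.table_name].extend(self.pending)
        self.pending.clear()


# 1. Open a stream to an existing table.
service = FakeIngestService(tables={"main.web.clicks": []})
stream = service.open_stream("main.web.clicks")
# 2. Construct a message that matches the table schema.
message = {"user_id": "u-42", "page": "/pricing"}
# 3. Push the message and keep its acknowledgement.
ack = stream.push(message)
# 4. The service materializes the data in the table.
stream.materialize()
```

The sketch raises `LookupError` when the target table is missing, mirroring the point that the service writes only to tables that already exist.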
Server
The Zerobus Ingest service does not create or manipulate tables automatically. The service takes data from clients, validates that it fits into the table schema, and then writes the data to the table.
The service's responsibilities include:
- Validating each message against the table schema.
- Materializing the data in the target table in a timely manner.
- Sending an acknowledgement to the client that the data is durable.
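Schema validation, the first responsibility above, amounts to checking each incoming message against the target table's columns and types. The following minimal sketch shows the idea with a hypothetical `CLICKS_SCHEMA` mapping column names to Python types and a hypothetical `validate` function. It assumes every column is required and that unknown fields are rejected; the real service validates Protobuf messages against the Delta table schema, and its exact rules may differ.

```python
from typing import Any

# Hypothetical table schema: column name -> expected Python type.
CLICKS_SCHEMA: dict[str, type] = {"user_id": str, "page": str, "ts_ms": int}


def validate(message: dict[str, Any], schema: dict[str, type]) -> list[str]:
    """Return a list of schema violations; an empty list means the message fits."""
    errors: list[str] = []
    for column, expected in schema.items():
        if column not in message:
            errors.append(f"missing column {column!r}")
            continue
        value = message[column]
        # Exact type check, so that for example True is not accepted as an int.
        if type(value) is not expected:
            errors.append(
                f"column {column!r}: expected {expected.__name__}, "
                f"got {type(value).__name__}"
            )
    # Fields the table does not define (sorted for a stable error order).
    for column in sorted(message.keys() - schema.keys()):
        errors.append(f"unknown column {column!r}")
    return errors
```

For example, `validate({"user_id": "u1", "page": "/", "ts_ms": "1"}, CLICKS_SCHEMA)` reports that `ts_ms` should be an `int` rather than a `str`. Returning all violations at once, instead of stopping at the first, gives a client enough detail to fix a schema mismatch in one pass.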
Client
Client integration involves:
- Selecting a target table.
- Establishing a stream with the Zerobus Ingest service.
- Constructing a schema-compatible message.
- Sending the message.
- Managing message acknowledgements.
- Implementing recovery mechanisms for client, stream, or server-side failures (for example, connection issues or schema mismatches).
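The last two client tasks, tracking acknowledgements and recovering from failures, can be combined: keep every record until it is acknowledged, reopen the stream on a retryable failure, and replay what is still unacknowledged. The following sketch shows that pattern under stated assumptions. `ResilientProducer`, `StreamClosedError`, `SchemaMismatchError`, and the `FlakyStream` test double are hypothetical names, not part of any Databricks SDK, and `push` is assumed to acknowledge synchronously for brevity; a real stream may acknowledge asynchronously with several records in flight.

```python
from typing import Any, Callable


class StreamClosedError(Exception):
    """Hypothetical retryable failure, such as a dropped connection."""


class SchemaMismatchError(Exception):
    """Hypothetical non-retryable failure: the record does not fit the table."""


class ResilientProducer:
    """Keeps records until acknowledged and replays them after reopening the stream.

    `open_stream` is any hypothetical callable returning an object whose
    `push(record)` acknowledges the record or raises one of the errors above.
    """

    def __init__(self, open_stream: Callable[[], Any], max_retries: int = 3) -> None:
        self._open_stream = open_stream
        self._max_retries = max_retries
        self._stream = open_stream()
        self._unacked: dict[int, dict[str, Any]] = {}  # insertion-ordered
        self._next_id = 0

    def send(self, record: dict[str, Any]) -> None:
        """Send a record, replaying any earlier unacknowledged records first."""
        self._unacked[self._next_id] = record
        self._next_id += 1
        attempts = 0
        while self._unacked:
            record_id, pending = next(iter(self._unacked.items()))  # oldest first
            try:
                self._stream.push(pending)
            except StreamClosedError:
                attempts += 1
                if attempts > self._max_retries:
                    raise  # give up; records stay buffered for the next send
                self._stream = self._open_stream()  # recover by reopening the stream
                continue
            except SchemaMismatchError:
                del self._unacked[record_id]  # retrying cannot fix a bad record
                raise
            del self._unacked[record_id]  # acknowledged, safe to forget


# Test double: fails the first push with a dropped connection, then accepts records.
sink: list[dict[str, Any]] = []
failures_left = [1]


class FlakyStream:
    def push(self, record: dict[str, Any]) -> int:
        if failures_left[0] > 0:
            failures_left[0] -= 1
            raise StreamClosedError("connection reset")
        if "user_id" not in record:
            raise SchemaMismatchError("missing user_id")
        sink.append(record)
        return len(sink) - 1


producer = ResilientProducer(FlakyStream)
producer.send({"user_id": "u-1"})
producer.send({"user_id": "u-2"})
```

The first `send` hits the simulated connection reset, reopens the stream, and replays the record, so both records reach `sink` in order. Separating retryable errors (connection loss) from non-retryable ones (schema mismatches) keeps the producer from looping forever on a record the table can never accept.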
The Databricks Python SDK provides convenient methods for these steps, and the documentation includes examples of different development patterns. For custom integrations, the SDK can serve as a reference for how to structure the integration and handle recovery.
Get started with Zerobus Ingest
- Get a Zerobus Ingest URL.
- Create or identify the table you want to ingest data into.
- Create a service principal and grant privileges to the table.
- Write a client to start sending data.
For full instructions, see Use the Zerobus Ingest connector.
Cost
At this time, you are not charged for Zerobus Ingest usage. However, Databricks intends to introduce charges in the future.