Azure Cosmos DB¶
Azure Cosmos DB is Microsoft’s globally distributed, multi-model database. It lets you elastically and independently scale throughput and storage across any number of Azure geographic regions, and it backs its throughput, latency, availability, and consistency guarantees with comprehensive service level agreements (SLAs). Azure Cosmos DB provides APIs for the following data models, with SDKs available in multiple languages:
- SQL API
- MongoDB API
- Cassandra API
- Graph (Gremlin) API
- Table API
This topic explains how to read data from and write data to Azure Cosmos DB.
Note: The Azure Cosmos DB Spark Connector, developed by Microsoft, is in preview and requires Databricks Runtime 3.4 or above.
Create and attach required libraries¶
- Download the following library. You do not need to download its dependencies.
- com.microsoft.azure:azure-cosmosdb-spark_2.2.0_2.10:0.0.4 or com.microsoft.azure:azure-cosmosdb-spark_2.2.0_2.11:0.0.4. Use the JAR built for the Scala version of your Databricks Runtime.
- Upload the downloaded JAR files to Databricks following the instructions in Upload a Java JAR or Scala JAR.
- Attach the uploaded libraries to your Databricks cluster.
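If you prefer to fetch the JAR from the command line rather than through a browser, a download along these lines should work, assuming the artifact is published to Maven Central under the standard repository layout (verify the URL resolves before relying on it):

```shell
# Sketch: download the Scala 2.11 build of the connector from Maven Central.
# The repository path is an assumption based on the standard Maven layout.
GROUP_PATH="com/microsoft/azure"
ARTIFACT="azure-cosmosdb-spark_2.2.0_2.11"   # use _2.10 for Scala 2.10 runtimes
VERSION="0.0.4"
curl -fLO "https://repo1.maven.org/maven2/${GROUP_PATH}/${ARTIFACT}/${VERSION}/${ARTIFACT}-${VERSION}.jar"
```

The downloaded file is the JAR you then upload and attach in the steps above.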
Use the Azure Cosmos DB Spark connector¶
The following Scala notebook provides a simple example of how to write data to and read data from Cosmos DB. See the Azure Cosmos DB Spark Connector project for detailed documentation. The Azure Cosmos DB Spark Connector User Guide, developed by Microsoft, also shows how to use this connector in Python.
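As a rough illustration of what such a notebook contains, the sketch below shows the connector's basic write/read pattern in Scala. It assumes the connector's `Config` class and the `cosmosDB` implicits on `DataFrameReader`/`DataFrameWriter` (provided by the `com.microsoft.azure.cosmosdb.spark.schema` package), and all connection values are placeholders you must replace with your own account's settings:

```scala
import org.apache.spark.sql.SaveMode
import com.microsoft.azure.cosmosdb.spark.schema._   // adds cosmosDB() to read/write
import com.microsoft.azure.cosmosdb.spark.config.Config

// Placeholder connection settings -- replace with your Cosmos DB account values.
val config = Config(Map(
  "Endpoint"   -> "https://<your-account>.documents.azure.com:443/",
  "Masterkey"  -> "<your-master-key>",
  "Database"   -> "<your-database>",
  "Collection" -> "<your-collection>"
))

// Write a small DataFrame to the collection.
val df = spark.range(5).withColumnRenamed("id", "value")
df.write.mode(SaveMode.Overwrite).cosmosDB(config)

// Read the collection back as a DataFrame.
val readDf = spark.read.cosmosDB(config)
readDf.show()
```

This runs inside a Databricks notebook, where `spark` is the preconfigured `SparkSession`; in a standalone application you would build the session yourself before using the connector.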