SQL Data Warehouse

In this example, we’ll connect to Azure SQL Data Warehouse using the SQL Server JDBC driver.

The Apache Spark upstream docs list the supported arguments. They also cover the DataFrame API syntax and how to control the parallelism of reads through the JDBC interface.
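As a sketch of the partitioned-read options those docs describe (the column name and bounds below are hypothetical), Spark splits a JDBC read into `numPartitions` parallel queries, each covering a slice of a numeric, date, or timestamp column between `lowerBound` and `upperBound`:

```scala
import java.util.Properties

// Hypothetical partitioned-read settings: Spark issues numPartitions
// parallel queries, each covering a range of partitionColumn between
// lowerBound and upperBound.
val readProps = new Properties()
readProps.put("numPartitions", "8")
readProps.put("partitionColumn", "id") // must be numeric, date, or timestamp
readProps.put("lowerBound", "1")
readProps.put("upperBound", "1000000")

// These would be merged with the connection properties and passed to
// spark.read.jdbc(jdbcUrl, "my_table", readProps)
```

Rows outside the bounds are still read; the bounds only decide how the work is split across partitions.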

Load your SQL Server JDBC driver onto Azure Databricks

  • Databricks comes preloaded with JDBC libraries for many data stores, but you can attach other JDBC libraries and reference them in your code.
  • See our Libraries Documentation for instructions on how to install a Java JAR.

Check that the SQL Server JDBC driver is available

This checks that the driver class is on your classpath. From a notebook in another language, such as Python, you can run this check in a %scala cell.

    // Register the SQL Server driver
    Class.forName("com.microsoft.sqlserver.jdbc.SQLServerDriver")
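A defensive variant of this check (a sketch, not Databricks-specific) wraps the lookup in `Try`, so a missing driver JAR reports `false` instead of throwing `ClassNotFoundException`:

```scala
import scala.util.Try

// Returns true only when the named class can be loaded from the classpath
def driverAvailable(className: String): Boolean =
  Try(Class.forName(className)).isSuccess

// A JDK class illustrates the positive case; the SQL Server check
// succeeds only when the driver JAR is attached to the cluster.
driverAvailable("java.util.Properties")
driverAvailable("com.microsoft.sqlserver.jdbc.SQLServerDriver")
```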

Load Your SQL Configuration

    import java.util.Properties

    // Build the parameters into a JDBC url to pass into the DataFrame APIs
    val jdbcUsername = ""
    val jdbcPassword = ""
    val jdbcHostname = "sql-dbtestcustomer-dev-westus.database.windows.net"
    val jdbcPort = 1433
    val jdbcDatabase = ""
    val jdbcUrl = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase}"

    // Create a Properties() object to hold the parameters. You can create the JDBC URL without passing the user/password parameters directly.
    // These settings are based on the connection string provided by Azure SQL Database. Add them to the connection properties.

    val connectionProperties = new Properties()
    connectionProperties.put("user", jdbcUsername)
    connectionProperties.put("password", jdbcPassword)
    connectionProperties.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
    connectionProperties.put("encrypt", "true")
    connectionProperties.put("trustServerCertificate", "false")
    connectionProperties.put("hostNameInCertificate", "*.database.windows.net")
    connectionProperties.put("loginTimeout", "30")
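Before attempting a connection, a quick sanity check (with placeholder values standing in for the variables above) confirms the URL and properties are assembled in the expected shape:

```scala
import java.util.Properties

// Placeholder values (hypothetical) mirroring the variables defined above
val hostname = "myserver.database.windows.net"
val url = s"jdbc:sqlserver://${hostname}:1433;database=mydb"

val props = new Properties()
props.put("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
props.put("loginTimeout", "30")

// The URL should follow jdbc:sqlserver://<host>:<port>;database=<db>
assert(url == "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
assert(props.getProperty("driver").endsWith("SQLServerDriver"))
```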

Check Connectivity to your SQL Database

    import java.sql.DriverManager
    val connection = DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
    assert(!connection.isClosed())
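The snippet above leaves the test connection open. A small loan-pattern helper (a sketch; `using` is not part of the JDBC API) guarantees the connection is closed even if the check throws:

```scala
// Runs f against any AutoCloseable (java.sql.Connection included) and
// always closes it, even when f throws.
def using[A <: AutoCloseable, B](resource: A)(f: A => B): B =
  try f(resource) finally resource.close()

// Usage against the database (requires network access to the server):
// using(DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)) { conn =>
//   assert(!conn.isClosed)
// }
```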

Reading data from JDBC

In this section, we’ll write a small test table to SQL Data Warehouse and read it back, using the connectionProperties defined above.

    // Implicits are needed for toDF and the $ column syntax
    import spark.implicits._

    val testTableName = "table_" + System.currentTimeMillis

    // Build a small test DataFrame with ids 1 through 99
    val tempDf = List.range(1, 100).toDF("id")

    tempDf
      .select($"id")
      .write
      .jdbc(jdbcUrl, testTableName, connectionProperties)

    val actualResult =
      spark
        .read
        .jdbc(jdbcUrl, testTableName, connectionProperties)
        .orderBy($"id")
        .coalesce(1)
        .as[Long]
        .collect
        .toSeq
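Since the DataFrame was built from List.range(1, 100), a successful round trip should return the ids 1 through 99 in order; the expected sequence can be built locally for comparison:

```scala
// ids written above: 1 to 99 inclusive (List.range's upper bound is exclusive)
val expected: Seq[Long] = (1L to 99L)

// On a live cluster, compare against the collected result:
// assert(actualResult == expected)
```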