Testing for Databricks Connect for Python

Note

This article covers Databricks Connect for Databricks Runtime 13.0 and above.

This article describes how to use pytest to test code that uses Databricks Connect for Databricks Runtime 13.0 and above. For more information about Databricks Connect, see Databricks Connect for Python.

This information assumes that you have already installed Databricks Connect for Python. See Install Databricks Connect for Python.

You can run pytest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use pytest to test your functions that accept and return PySpark DataFrame objects in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.

For example, the following file named nyctaxi_functions.py contains a get_spark function that returns a SparkSession instance and a get_nyctaxi_trips function that returns a DataFrame representing the trips table in the samples catalog’s nyctaxi schema:

nyctaxi_functions.py:

from databricks.connect import DatabricksSession
from pyspark.sql import DataFrame, SparkSession

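def get_spark() -> SparkSession:
  # Create or reuse a Databricks Connect session using the default configuration.
  spark = DatabricksSession.builder.getOrCreate()
  return spark

def get_nyctaxi_trips() -> DataFrame:
  # Read the trips table from the samples catalog's nyctaxi schema.
  spark = get_spark()
  df = spark.read.table("samples.nyctaxi.trips")
  return df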

The following file named main.py calls get_nyctaxi_trips, which in turn calls get_spark:

main.py:

from nyctaxi_functions import get_nyctaxi_trips

df = get_nyctaxi_trips()
df.show(5)

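The following file named test_nyctaxi_functions.py tests whether the get_spark function returns a SparkSession instance and whether the get_nyctaxi_trips function returns a DataFrame that contains at least one row of data. Both tests create a session through DatabricksSession, so they require a working Databricks Connect configuration: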

test_nyctaxi_functions.py:

import pyspark.sql.connect.session
from nyctaxi_functions import get_nyctaxi_trips, get_spark

def test_get_spark():
  spark = get_spark()
  assert isinstance(spark, pyspark.sql.connect.session.SparkSession)

def test_get_nyctaxi_trips():
  df = get_nyctaxi_trips()
  assert df.count() > 0
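
Not every check needs the cluster. As a minimal sketch, assuming a hypothetical variant named get_nyctaxi_trips_from that takes the session as a parameter (it is not part of the files above), a stand-in object from the standard library's unittest.mock can verify which table the function reads without opening a connection:

```python
from unittest.mock import MagicMock


def get_nyctaxi_trips_from(spark):
    """Hypothetical variant of get_nyctaxi_trips that receives the session as an argument."""
    return spark.read.table("samples.nyctaxi.trips")


def test_get_nyctaxi_trips_from_reads_expected_table():
    # MagicMock stands in for the session, so no cluster connection is made.
    fake_spark = MagicMock()

    df = get_nyctaxi_trips_from(fake_spark)

    # The function should request exactly this table, exactly once.
    fake_spark.read.table.assert_called_once_with("samples.nyctaxi.trips")
    # It should hand back whatever the reader returned, unchanged.
    assert df is fake_spark.read.table.return_value
```

This only confirms that the function asks for samples.nyctaxi.trips. It says nothing about the table's contents, so it complements the connected tests above and does not replace them.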

To run these tests, run the pytest command from the root of the code project. The output should be similar to the following:

$ pytest
=================== test session starts ====================
platform darwin -- Python 3.11.7, pytest-8.1.1, pluggy-1.4.0
rootdir: <project-rootdir>
collected 2 items

test_nyctaxi_functions.py .. [100%]
======================== 2 passed ==========================