Testing for Databricks Connect for Python
This article covers Databricks Connect for Databricks Runtime 13.3 LTS and above.
This article describes how to run tests for Databricks Connect using pytest. For information about installing Databricks Connect, see Install Databricks Connect for Python.
You can run pytest on local code that does not need a connection to a cluster in a remote Databricks workspace. For example, you might use pytest to test your functions that accept and return PySpark DataFrame objects in local memory. To get started with pytest and run it locally, see Get Started in the pytest documentation.
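As a minimal sketch of local code that pytest can exercise without any cluster connection, consider the helper below. The function qualified_table_name and both tests are hypothetical and are not part of this article's sample project; they only illustrate the shape of a test that runs entirely on your machine.

```python
# Hypothetical helper (not part of the sample project): pure string logic,
# so no SparkSession or cluster connection is needed to test it.
def qualified_table_name(catalog: str, schema: str, table: str) -> str:
    """Join the three parts of a table name with dots."""
    parts = [catalog, schema, table]
    if any(not part for part in parts):
        raise ValueError("catalog, schema, and table must all be non-empty")
    return ".".join(parts)


# pytest collects functions whose names start with "test_".
def test_qualified_table_name():
    assert qualified_table_name("samples", "nyctaxi", "trips") == "samples.nyctaxi.trips"


def test_qualified_table_name_rejects_empty_part():
    try:
        qualified_table_name("samples", "", "trips")
    except ValueError:
        return
    raise AssertionError("expected ValueError for an empty schema name")
```

Saved in a file whose name starts with test_, these functions would be picked up by the pytest command like any other test.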
When running Databricks Connect from the terminal, pytest only works with the DEFAULT configuration profile. The profile should include the Databricks compute you want to use, either a cluster or serverless compute. For information about configuring compute, see Compute configuration for Databricks Connect.
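The following sketch shows one way to check, before running tests, that a DEFAULT profile names some compute. It assumes the profile uses the cluster_id or serverless_compute_id field; the sample file contents are placeholders, and the default_profile_compute helper is hypothetical rather than part of Databricks Connect. Check the compute configuration article for the fields that apply to your setup.

```python
import configparser
from typing import Optional

# Placeholder contents in the style of a .databrickscfg file.
# The host, token, and cluster ID values are not real.
SAMPLE_CFG = """
[DEFAULT]
host = https://<workspace-instance-name>
token = <personal-access-token>
cluster_id = <cluster-id>
"""


def default_profile_compute(cfg_text: str) -> Optional[str]:
    """Return the name of the compute field set in the DEFAULT profile, if any."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    defaults = parser.defaults()  # key/value pairs of the [DEFAULT] section
    # Assumption: compute is declared through one of these two fields.
    for key in ("cluster_id", "serverless_compute_id"):
        if key in defaults:
            return key
    return None


print(default_profile_compute(SAMPLE_CFG))
```

A result of None would suggest that the DEFAULT profile does not specify compute, in which case tests that call a remote workspace are likely to fail when they try to connect.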
For example, consider the following file named nyctaxi_functions.py, which contains a get_spark function that returns a SparkSession instance and a get_nyctaxi_trips function that returns a DataFrame representing the trips table in the samples catalog’s nyctaxi schema:
nyctaxi_functions.py:
from databricks.connect import DatabricksSession
from pyspark.sql import DataFrame, SparkSession

def get_spark() -> SparkSession:
    spark = DatabricksSession.builder.getOrCreate()
    return spark

def get_nyctaxi_trips() -> DataFrame:
    spark = get_spark()
    df = spark.read.table("samples.nyctaxi.trips")
    return df
The following file named main.py calls the get_spark and get_nyctaxi_trips functions:
main.py:
from nyctaxi_functions import *
df = get_nyctaxi_trips()
df.show(5)
The following file named test_nyctaxi_functions.py tests whether the get_spark function returns a SparkSession instance and whether the get_nyctaxi_trips function returns a DataFrame that contains at least one row of data:
test_nyctaxi_functions.py:
import pyspark.sql.connect.session
from nyctaxi_functions import *

def test_get_spark():
    spark = get_spark()
    assert isinstance(spark, pyspark.sql.connect.session.SparkSession)

def test_get_nyctaxi_trips():
    df = get_nyctaxi_trips()
    assert df.count() > 0
To run these tests, run the pytest command from the code project’s root, which should produce test results similar to the following:
$ pytest
=================== test session starts ====================
platform darwin -- Python 3.11.7, pytest-8.1.1, pluggy-1.4.0
rootdir: <project-rootdir>
collected 2 items
test_nyctaxi_functions.py .. [100%]
======================== 2 passed ==========================