Unit testing for notebooks

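You can use unit testing to help improve the quality and consistency of your notebooks' code. Unit testing is an approach to testing self-contained units of code, such as functions, early and often. It helps you find problems with your code faster, uncover mistaken assumptions about your code sooner, and streamline your overall coding efforts.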
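As a minimal illustration of testing a self-contained unit, the following sketch pairs one small function with one unit test, using only the Python standard library. The function `normalize_column_name` is hypothetical and is not one of this article's examples.

```python
def normalize_column_name(name: str) -> str:
    """Return a column name with surrounding whitespace removed, in lowercase."""
    return name.strip().lower()


def test_normalize_column_name():
    # Each assertion checks one expected input/output pair for the unit.
    assert normalize_column_name("  Clarity ") == "clarity"
    assert normalize_column_name("carat") == "carat"


# Call the test directly here; a runner such as pytest would normally collect and call it.
test_normalize_column_name()
print("test_normalize_column_name passed")
```

Because the function has no dependencies, the test runs in milliseconds. That makes it practical to rerun after every change.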

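This article is an introduction to basic unit testing with functions. Advanced concepts are also supported when you unit test notebooks, but they are outside the scope of this article. These concepts include unit testing classes and interfaces, and the use of stubs, mocks, and test harnesses. This article also does not cover other kinds of testing, such as integration testing, system testing, or acceptance testing. It also does not cover non-functional testing methods such as performance testing or usability testing.

This article demonstrates the following: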

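How to organize functions and their unit tests.
How to write functions in Python, R, and Scala, as well as user-defined functions in SQL, that are well-designed to be unit tested.
How to call these functions from Python, R, Scala, and SQL notebooks.
How to write unit tests in Python, R, and Scala by using the popular test frameworks pytest for Python, testthat for R, and ScalaTest for Scala. Also, how to write SQL that unit tests SQL user-defined functions (SQL UDFs).
How to run these unit tests from Python, R, Scala, and SQL notebooks.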

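Note

Databricks recommends writing and running your unit tests in a notebook. You can run some commands in the web terminal, but the web terminal has more limitations, such as a lack of support for Spark. See Run shell commands in Databricks web terminal.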

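Organize functions and unit tests

There are a few common approaches for organizing your functions and their unit tests with notebooks. Each approach has its benefits and challenges.

For Python, R, and Scala notebooks, common approaches include the following: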

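Store functions and their unit tests outside of notebooks.
- Benefits: You can call these functions from both inside and outside of notebooks. Test frameworks are designed to run tests outside of notebooks.
- Challenges: This approach is not supported for Scala notebooks. It also increases the number of files to track and maintain.
Store functions in one notebook and their unit tests in a separate notebook.
- Benefits: These functions are easier to reuse across notebooks.
- Challenges: The number of notebooks to track and maintain increases. These functions cannot be used outside of notebooks. They can also be more difficult to test outside of notebooks.
Store functions and their unit tests within the same notebook.
- Benefits: Functions and their unit tests are stored within a single notebook for easier tracking and maintenance.
- Challenges: These functions can be more difficult to reuse across notebooks. They cannot be used outside of notebooks. They can also be more difficult to test outside of notebooks.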

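For Python and R notebooks, Databricks recommends storing functions and their unit tests outside of notebooks. For Scala notebooks, Databricks recommends putting functions in one notebook and their unit tests in a separate notebook.

For SQL notebooks, Databricks recommends storing functions as SQL user-defined functions (SQL UDFs) in your schemas (also known as databases). You can then call these SQL UDFs and their unit tests from SQL notebooks.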

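Write functions

This section describes a simple set of example functions that determine the following:

Whether a table exists in a database.
Whether a column exists in a table.
How many rows in a column contain a given value.

These functions are intentionally simple. This lets you focus on the details of unit testing rather than on the functions themselves.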

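To get the best unit testing results, a function should return a single predictable outcome of a single data type. For example, a function that checks whether something exists should return a boolean value of true or false. A function that returns the number of rows that exist should return a non-negative whole number. In the first case, the function should not return false when something does not exist and the thing itself when it does exist. Likewise, in the second case, it should not return the number of rows when rows exist and false when no rows exist.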
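The following sketch contrasts a mixed-type return value with single-type return values, using plain Python lists in place of DataFrames. It assumes nothing beyond the standard library. All three function names are hypothetical and only illustrate the guideline above.

```python
from typing import Sequence, Union


# Anti-pattern (hypothetical): returns the matched name (str) when found,
# but False (bool) when not found, so callers and tests must handle two types.
def find_column_mixed(columns: Sequence[str], name: str) -> Union[str, bool]:
    return name if name in columns else False


# Preferred: always returns a bool.
def column_exists(columns: Sequence[str], name: str) -> bool:
    return name in columns


# Preferred: always returns a non-negative int, even when nothing matches.
def count_matches(values: Sequence[str], target: str) -> int:
    return sum(1 for value in values if value == target)


def test_single_return_types():
    columns = ["carat", "cut", "clarity"]
    assert column_exists(columns, "clarity") is True
    assert column_exists(columns, "price") is False
    assert count_matches(["SI2", "SI1", "SI2"], "SI2") == 2
    assert count_matches([], "SI2") == 0  # zero, not False


test_single_return_types()
```

In Python, `bool` is a subclass of `int`, so `False == 0` evaluates to true. This means a function that sometimes returns `False` instead of `0` can pass an equality-based assertion while still returning the wrong type. This is one more reason to keep return types uniform.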

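You can add these functions to an existing Databricks workspace in Python, R, Scala, or SQL, as follows.

The following Python code assumes that you have set up Databricks Git folders (Repos), added a repo, and opened the repo in your Databricks workspace.

Create a file named myfunctions.py within the repo, and add the following contents to the file. Other examples in this article expect this file to be named myfunctions.py. You can use a different name for your own file.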

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Because this file is not a Databricks notebook, you
# must create a Spark session. Databricks notebooks
# create a Spark session for you by default.
spark = SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()

# Does the specified table exist in the specified database?
def tableExists(tableName, dbName):
  return spark.catalog.tableExists(f"{dbName}.{tableName}")

# Does the specified column exist in the given DataFrame?
def columnExists(dataFrame, columnName):
  return columnName in dataFrame.columns

# How many rows are there for the specified value in the specified column
# in the given DataFrame?
def numRowsInColumnForValue(dataFrame, columnName, columnValue):
  df = dataFrame.filter(col(columnName) == columnValue)

  return df.count()

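The following R code assumes that you have set up Databricks Git folders (Repos), added a repo, and opened the repo in your Databricks workspace.

Create a file named myfunctions.r within the repo, and add the following contents to the file. Other examples in this article expect this file to be named myfunctions.r. You can use a different name for your own file.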

library(SparkR)

# Does the specified table exist in the specified database?
table_exists <- function(table_name, db_name) {
  tableExists(paste(db_name, ".", table_name, sep = ""))
}

# Does the specified column exist in the given DataFrame?
column_exists <- function(dataframe, column_name) {
  column_name %in% colnames(dataframe)
}

# How many rows are there for the specified value in the specified column
# in the given DataFrame?
num_rows_in_column_for_value <- function(dataframe, column_name, column_value) {
  df = filter(dataframe, dataframe[[column_name]] == column_value)

  count(df)
}

Create a Scala notebook named myfunctions with the following contents. Other examples in this article expect this notebook to be named myfunctions. You can use a different name for your own notebook.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Does the specified table exist in the specified database?
def tableExists(tableName: String, dbName: String) : Boolean = {
  return spark.catalog.tableExists(dbName + "." + tableName)
}

// Does the specified column exist in the given DataFrame?
def columnExists(dataFrame: DataFrame, columnName: String) : Boolean = {
  dataFrame.columns.contains(columnName)
}

// How many rows are there for the specified value in the specified column
// in the given DataFrame?
def numRowsInColumnForValue(dataFrame: DataFrame, columnName: String, columnValue: String) : Long = {
  val df = dataFrame.filter(col(columnName) === columnValue)

  return df.count()
}

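The following SQL code assumes that the third-party sample dataset diamonds exists in a schema named default, within a catalog named main, and that this catalog is accessible from your Databricks workspace. If the catalog or schema that you want to use has a different name, change one or both of the following USE statements to match.

Create a SQL notebook and add the following contents to this new notebook. Then attach the notebook to a cluster and run the notebook to add the following SQL UDFs to the specified catalog and schema.

Note

The SQL UDFs table_exists and column_exists work only with Unity Catalog. SQL UDF support for Unity Catalog is in Public Preview.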

USE CATALOG main;
USE SCHEMA default;

CREATE OR REPLACE FUNCTION table_exists(catalog_name STRING,
                                        db_name      STRING,
                                        table_name   STRING)
  RETURNS BOOLEAN
  RETURN if(
    (SELECT count(*) FROM system.information_schema.tables
     WHERE table_catalog = table_exists.catalog_name
       AND table_schema  = table_exists.db_name
       AND table_name    = table_exists.table_name) > 0,
    true,
    false
  );

CREATE OR REPLACE FUNCTION column_exists(catalog_name STRING,
                                         db_name      STRING,
                                         table_name   STRING,
                                         column_name  STRING)
  RETURNS BOOLEAN
  RETURN if(
    (SELECT count(*) FROM system.information_schema.columns
     WHERE table_catalog = column_exists.catalog_name
       AND table_schema  = column_exists.db_name
       AND table_name    = column_exists.table_name
       AND column_name   = column_exists.column_name) > 0,
    true,
    false
  );

CREATE OR REPLACE FUNCTION num_rows_for_clarity_in_diamonds(clarity_value STRING)
  RETURNS BIGINT
  RETURN SELECT count(*)
         FROM main.default.diamonds
         WHERE clarity = clarity_value
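The SQL UDFs above answer existence questions by querying catalog metadata in `system.information_schema`. SQLite has no information_schema, but you can try the same idea with Python's built-in sqlite3 module: query the catalog metadata and return a boolean. This is a local, hypothetical analogue for experimenting with the logic. It is not a substitute for the Unity Catalog UDFs. `sqlite_master` and `pragma_table_info` are SQLite-specific, and the Python function names only mirror the UDF names.

```python
import sqlite3


def table_exists(conn: sqlite3.Connection, table_name: str) -> bool:
    # sqlite_master lists schema objects; count entries that are tables with this name.
    row = conn.execute(
        "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    return row[0] > 0


def column_exists(conn: sqlite3.Connection, table_name: str, column_name: str) -> bool:
    # pragma_table_info(<table>) returns one row per column (no rows if the table is missing).
    row = conn.execute(
        "SELECT count(*) FROM pragma_table_info(?) WHERE name = ?",
        (table_name, column_name),
    ).fetchone()
    return row[0] > 0


def num_rows_for_clarity(conn: sqlite3.Connection, clarity_value: str) -> int:
    # Same shape as num_rows_for_clarity_in_diamonds: a count of matching rows.
    row = conn.execute(
        "SELECT count(*) FROM diamonds WHERE clarity = ?", (clarity_value,)
    ).fetchone()
    return row[0]


# Hypothetical in-memory table with a few diamonds-like columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diamonds (carat REAL, cut TEXT, clarity TEXT)")
conn.executemany(
    "INSERT INTO diamonds VALUES (?, ?, ?)",
    [(0.23, "Ideal", "SI2"), (0.21, "Premium", "SI1")],
)

print(table_exists(conn, "diamonds"))               # True
print(column_exists(conn, "diamonds", "clarity"))   # True
print(column_exists(conn, "diamonds", "price"))     # False
print(num_rows_for_clarity(conn, "SI2"))            # 1
```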

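Call functions

This section describes code that calls the preceding functions. For example, you could use these functions to count the number of rows in a table where a specified value appears in a specified column. Before you do that, however, you should check that the table actually exists and that the column actually exists in that table. The following code checks for these conditions.

If you added the functions from the preceding section to your Databricks workspace, you can call these functions from your workspace as follows.

Create a Python notebook in the same folder as the preceding myfunctions.py file in your repo, and add the following contents to the notebook. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.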

from myfunctions import *

tableName   = "diamonds"
dbName      = "default"
columnName  = "clarity"
columnValue = "VVS2"

# If the table exists in the specified database...
if tableExists(tableName, dbName):

  df = spark.sql(f"SELECT * FROM {dbName}.{tableName}")

  # And the specified column exists in that table...
  if columnExists(df, columnName):
    # Then report the number of rows for the specified value in that column.
    numRows = numRowsInColumnForValue(df, columnName, columnValue)

    print(f"There are {numRows} rows in '{tableName}' where '{columnName}' equals '{columnValue}'.")
  else:
    print(f"Column '{columnName}' does not exist in table '{tableName}' in schema (database) '{dbName}'.")
else:
  print(f"Table '{tableName}' does not exist in schema (database) '{dbName}'.") 

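Create an R notebook in the same folder as the preceding myfunctions.r file in your repo, and add the following contents to the notebook. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.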

library(SparkR)
source("myfunctions.r")

table_name   <- "diamonds"
db_name      <- "default"
column_name  <- "clarity"
column_value <- "VVS2"

# If the table exists in the specified database...
if (table_exists(table_name, db_name)) {

  df = sql(paste("SELECT * FROM ", db_name, ".", table_name, sep = ""))

  # And the specified column exists in that table...
  if (column_exists(df, column_name)) {
    # Then report the number of rows for the specified value in that column.
    num_rows = num_rows_in_column_for_value(df, column_name, column_value)

    print(paste("There are ", num_rows, " rows in table '", table_name, "' where '", column_name, "' equals '", column_value, "'.", sep = "")) 
  } else {
    print(paste("Column '", column_name, "' does not exist in table '", table_name, "' in schema (database) '", db_name, "'.", sep = ""))
  }

} else {
  print(paste("Table '", table_name, "' does not exist in schema (database) '", db_name, "'.", sep = ""))
}

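Create another Scala notebook in the same folder as the preceding myfunctions Scala notebook, and add the following contents to this new notebook.

In this new notebook's first cell, add the following code, which calls the %run magic. This magic makes the contents of the myfunctions notebook available to your new notebook.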

%run ./myfunctions

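In this new notebook's second cell, add the following code. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.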

val tableName   = "diamonds"
val dbName      = "default"
val columnName  = "clarity"
val columnValue = "VVS2"

// If the table exists in the specified database...
if (tableExists(tableName, dbName)) {

  val df = spark.sql("SELECT * FROM " + dbName + "." + tableName)

  // And the specified column exists in that table...
  if (columnExists(df, columnName)) {
    // Then report the number of rows for the specified value in that column.
    val numRows = numRowsInColumnForValue(df, columnName, columnValue)

    println("There are " + numRows + " rows in '" + tableName + "' where '" + columnName + "' equals '" + columnValue + "'.")
  } else {
    println("Column '" + columnName + "' does not exist in table '" + tableName + "' in database '" + dbName + "'.")
  }

} else {
  println("Table '" + tableName + "' does not exist in database '" + dbName + "'.")
}

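For SQL, add the following code to a new cell in the preceding notebook or to a cell in a separate notebook. If necessary, change the schema or catalog names to match yours. Then run this cell to see the results.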

SELECT CASE
-- If the table exists in the specified catalog and schema...
WHEN
  table_exists("main", "default", "diamonds")
THEN
  -- And the specified column exists in that table...
  (SELECT CASE
   WHEN
     column_exists("main", "default", "diamonds", "clarity")
   THEN
     -- Then report the number of rows for the specified value in that column.
     printf("There are %d rows in table 'main.default.diamonds' where 'clarity' equals 'VVS2'.",
            num_rows_for_clarity_in_diamonds("VVS2"))
   ELSE
     printf("Column 'clarity' does not exist in table 'main.default.diamonds'.")
   END)
ELSE
  printf("Table 'main.default.diamonds' does not exist.")
END
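All four examples in this section follow the same check-then-count flow: confirm the table exists, then confirm the column exists, and only then count rows. The following sketch shows that flow in plain Python. A dict named `catalog` stands in for a schema, and the hypothetical `Table` class holds column names and row dicts. These are illustrative assumptions, not a Databricks API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Table:
    columns: List[str]
    rows: List[dict] = field(default_factory=list)


def describe_count(catalog: Dict[str, Table], table_name: str,
                   column_name: str, column_value: object) -> str:
    """Report a row count only after confirming the table and column exist."""
    # Guard 1: the table must exist in the stand-in catalog.
    if table_name not in catalog:
        return f"Table '{table_name}' does not exist."
    table = catalog[table_name]
    # Guard 2: the column must exist in that table.
    if column_name not in table.columns:
        return f"Column '{column_name}' does not exist in table '{table_name}'."
    # Only now is it safe to count matching rows.
    num_rows = sum(1 for row in table.rows if row.get(column_name) == column_value)
    return f"There are {num_rows} rows in '{table_name}' where '{column_name}' equals '{column_value}'."


catalog = {
    "diamonds": Table(
        columns=["carat", "cut", "clarity"],
        rows=[
            {"carat": 0.23, "cut": "Ideal", "clarity": "SI2"},
            {"carat": 0.21, "cut": "Premium", "clarity": "SI1"},
        ],
    )
}

print(describe_count(catalog, "diamonds", "clarity", "SI2"))
# There are 1 rows in 'diamonds' where 'clarity' equals 'SI2'.
print(describe_count(catalog, "diamonds", "color", "E"))
# Column 'color' does not exist in table 'diamonds'.
print(describe_count(catalog, "gems", "clarity", "SI2"))
# Table 'gems' does not exist.
```

Each branch returns a message of the same type (str). This keeps the three outcomes easy to assert on in a test.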

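Write unit tests

This section describes code that tests each of the functions described toward the beginning of this article. If you change these functions in the future, you can use the unit tests to determine whether the functions still work as you expect.

If you added the functions from the beginning of this article to your Databricks workspace, you can add unit tests for these functions to your workspace as follows.

Create another file named test_myfunctions.py in the same folder as the preceding myfunctions.py file in your repo, and add the following contents to the file. By default, pytest looks for .py files whose names start with test_ (or end with _test) to test. Similarly, by default, pytest looks inside these files for functions whose names start with test_ to test.

In general, it is a best practice to not run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, you should run unit tests against non-production data. One common approach is to create fake data that is as close as possible to the production data. The following code example creates fake data for the unit tests to run against.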

import pytest
import pyspark
from myfunctions import *
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType, StringType

tableName    = "diamonds"
dbName       = "default"
columnName   = "clarity"
columnValue  = "SI2"

# Because this file is not a Databricks notebook, you
# must create a Spark session. Databricks notebooks
# create a Spark session for you by default.
spark = SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()

# Create fake data for the unit tests to run against.
# In general, it is a best practice to not run unit tests
# against functions that work with data in production.
schema = StructType([ \
  StructField("_c0",     IntegerType(), True), \
  StructField("carat",   FloatType(),   True), \
  StructField("cut",     StringType(),  True), \
  StructField("color",   StringType(),  True), \
  StructField("clarity", StringType(),  True), \
  StructField("depth",   FloatType(),   True), \
  StructField("table",   IntegerType(), True), \
  StructField("price",   IntegerType(), True), \
  StructField("x",       FloatType(),   True), \
  StructField("y",       FloatType(),   True), \
  StructField("z",       FloatType(),   True), \
])

data = [ (1, 0.23, "Ideal",   "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43 ), \
         (2, 0.21, "Premium", "E", "SI1", 59.8, 61, 326, 3.89, 3.84, 2.31 ) ]

df = spark.createDataFrame(data, schema)

# Does the table exist?
def test_tableExists():
  assert tableExists(tableName, dbName) is True

# Does the column exist?
def test_columnExists():
  assert columnExists(df, columnName) is True

# Is there at least one row for the value in the specified column?
def test_numRowsInColumnForValue():
  assert numRowsInColumnForValue(df, columnName, columnValue) > 0
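You can sketch pytest's default discovery rules, described earlier for test_myfunctions.py, with Python's fnmatch module. The sketch assumes pytest's default `python_files` patterns (`test_*.py` and `*_test.py`) and its default `python_functions` prefix (`test`). It only approximates which names match. It is not pytest's actual collection logic, which also handles test classes, conftest.py files, and configuration overrides. The helper names are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List

# Assumed to mirror pytest's default python_files setting.
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")
# Assumed to mirror pytest's default python_functions prefix.
DEFAULT_FUNCTION_PREFIX = "test"


def is_test_file(filename: str) -> bool:
    # A file qualifies if it matches any of the default glob patterns (case-sensitive).
    return any(fnmatchcase(filename, pattern) for pattern in DEFAULT_FILE_PATTERNS)


def collectable_functions(names: Iterable[str]) -> List[str]:
    # Keep only function names that start with the default prefix.
    return [name for name in names if name.startswith(DEFAULT_FUNCTION_PREFIX)]


files = ["myfunctions.py", "test_myfunctions.py", "myfunctions_test.py"]
print([f for f in files if is_test_file(f)])
# ['test_myfunctions.py', 'myfunctions_test.py']
print(collectable_functions(["test_tableExists", "test_columnExists", "helper"]))
# ['test_tableExists', 'test_columnExists']
```

This is why the helper module myfunctions.py is imported by the tests but never collected as a test file itself.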

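Create another file named test_myfunctions.r in the same folder as the preceding myfunctions.r file in your repo, and add the following contents to the file. By default, testthat looks for .r files whose names start with test to test.

In general, it is a best practice to not run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, you should run unit tests against non-production data. One common approach is to create fake data that is as close as possible to the production data. The following code example creates fake data for the unit tests to run against.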

library(testthat)
source("myfunctions.r")

table_name   <- "diamonds"
db_name      <- "default"
column_name  <- "clarity"
column_value <- "SI2"

# Create fake data for the unit tests to run against.
# In general, it is a best practice to not run unit tests
# against functions that work with data in production.
schema <- structType(
  structField("_c0",     "integer"),
  structField("carat",   "float"),
  structField("cut",     "string"),
  structField("color",   "string"),
  structField("clarity", "string"),
  structField("depth",   "float"),
  structField("table",   "integer"),
  structField("price",   "integer"),
  structField("x",       "float"),
  structField("y",       "float"),
  structField("z",       "float"))

data <- list(list(as.integer(1), 0.23, "Ideal",   "E", "SI2", 61.5, as.integer(55), as.integer(326), 3.95, 3.98, 2.43),
             list(as.integer(2), 0.21, "Premium", "E", "SI1", 59.8, as.integer(61), as.integer(326), 3.89, 3.84, 2.31))

df <- createDataFrame(data, schema)

# Does the table exist?
test_that ("The table exists.", {
  expect_true(table_exists(table_name, db_name))
})

# Does the column exist?
test_that ("The column exists in the table.", {
  expect_true(column_exists(df, column_name))
})

# Is there at least one row for the value in the specified column?
test_that ("There is at least one row in the query result.", {
  expect_true(num_rows_in_column_for_value(df, column_name, column_value) > 0)
})
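The Python and R tests above build fake rows by hand to match the production schema. One way to keep such fake data honest is to check it against the intended schema before the tests rely on it. The following plain-Python sketch does this. `FAKE_SCHEMA`, `FAKE_ROWS`, and `rows_match_schema` are hypothetical names, and Python's `int`, `float`, and `str` stand in for Spark's IntegerType, FloatType, and StringType.

```python
from typing import List, Sequence, Tuple

# Hypothetical (column name, Python type) pairs standing in for the Spark schema.
FAKE_SCHEMA: List[Tuple[str, type]] = [
    ("_c0", int), ("carat", float), ("cut", str), ("color", str),
    ("clarity", str), ("depth", float), ("table", int), ("price", int),
    ("x", float), ("y", float), ("z", float),
]

FAKE_ROWS = [
    (1, 0.23, "Ideal",   "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43),
    (2, 0.21, "Premium", "E", "SI1", 59.8, 61, 326, 3.89, 3.84, 2.31),
]


def rows_match_schema(rows: Sequence[tuple], schema: Sequence[Tuple[str, type]]) -> bool:
    """Return True only if every row has one value of the expected type per column."""
    for row in rows:
        if len(row) != len(schema):
            return False
        for value, (_, expected_type) in zip(row, schema):
            # Reject bool values explicitly, because bool is a subclass of int.
            if type(value) is bool or not isinstance(value, expected_type):
                return False
    return True


print(rows_match_schema(FAKE_ROWS, FAKE_SCHEMA))  # True
bad_row = (1, "0.23", "Ideal", "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43)
print(rows_match_schema([bad_row], FAKE_SCHEMA))  # False: carat is a str
```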

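Create another Scala notebook in the same folder as the preceding myfunctions Scala notebook, and add the following contents to this new notebook.

In the new notebook's first cell, add the following code, which calls the %run magic. This magic makes the contents of the myfunctions notebook available to your new notebook.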

%run ./myfunctions

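In the second cell, add the following code. This code defines your unit tests and specifies how to run them.

In general, it is a best practice to not run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, you should run unit tests against non-production data. One common approach is to create fake data that is as close as possible to the production data. The following code example creates fake data for the unit tests to run against.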

import org.scalatest._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, FloatType, StringType}
import scala.collection.JavaConverters._

class DataTests extends AsyncFunSuite {

  val tableName   = "diamonds"
  val dbName      = "default"
  val columnName  = "clarity"
  val columnValue = "SI2"

  // Create fake data for the unit tests to run against.
  // In general, it is a best practice to not run unit tests
  // against functions that work with data in production.
  val schema = StructType(Array(
                 StructField("_c0",     IntegerType),
                 StructField("carat",   FloatType),
                 StructField("cut",     StringType),
                 StructField("color",   StringType),
                 StructField("clarity", StringType),
                 StructField("depth",   FloatType),
                 StructField("table",   IntegerType),
                 StructField("price",   IntegerType),
                 StructField("x",       FloatType),
                 StructField("y",       FloatType),
                 StructField("z",       FloatType)
               ))

  val data = Seq(
                  // Float literals (f suffix) match the FloatType columns in the schema.
                  Row(1, 0.23f, "Ideal",   "E", "SI2", 61.5f, 55, 326, 3.95f, 3.98f, 2.43f),
                  Row(2, 0.21f, "Premium", "E", "SI1", 59.8f, 61, 326, 3.89f, 3.84f, 2.31f)
                ).asJava

  val df = spark.createDataFrame(data, schema)

  // Does the table exist?
  test("The table exists") {
    assert(tableExists(tableName, dbName) == true)
  }

  // Does the column exist?
  test("The column exists") {
    assert(columnExists(df, columnName) == true)
  }

  // Is there at least one row for the value in the specified column?
  test("There is at least one matching row") {
    assert(numRowsInColumnForValue(df, columnName, columnValue) > 0)
  }
}

nocolor.nodurations.nostacks.stats.run(new DataTests)

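Note

This code example uses the FunSuite style of testing in ScalaTest (here, its asynchronous variant, AsyncFunSuite). For other available testing styles, see Selecting testing styles for your project.

For SQL, before you add unit tests, be aware that it is generally a best practice to not run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, you should run unit tests against non-production data. One common approach is to run unit tests against views instead of tables.

To create a view, call the CREATE VIEW command from a new cell in either the preceding notebook or a separate notebook. The following example assumes that you have an existing table named diamonds within a schema (database) named default within a catalog named main. Change these names to match your own as needed, and then run only that cell.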

USE CATALOG main;
USE SCHEMA default;

CREATE VIEW view_diamonds AS
SELECT * FROM diamonds;

After you create the view, add each of the following SELECT statements to its own new cell, either in the preceding notebook or in a separate notebook. Change the names to match your own as needed.

SELECT if(table_exists("main", "default", "view_diamonds"),
          printf("PASS: The table 'main.default.view_diamonds' exists."),
          printf("FAIL: The table 'main.default.view_diamonds' does not exist."));

SELECT if(column_exists("main", "default", "view_diamonds", "clarity"),
          printf("PASS: The column 'clarity' exists in the table 'main.default.view_diamonds'."),
          printf("FAIL: The column 'clarity' does not exist in the table 'main.default.view_diamonds'."));

SELECT if(num_rows_for_clarity_in_diamonds("VVS2") > 0,
          printf("PASS: The table 'main.default.view_diamonds' has at least one row where the column 'clarity' equals 'VVS2'."),
          printf("FAIL: The table 'main.default.view_diamonds' does not have at least one row where the column 'clarity' equals 'VVS2'."));

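Run unit tests

This section describes how to run the unit tests that you coded in the preceding section. When you run the unit tests, you get results showing which unit tests passed and which failed.

If you added the unit tests from the preceding section to your Databricks workspace, you can run these unit tests from your workspace. You can run them either manually or on a schedule.

Create a Python notebook in the same folder as the preceding test_myfunctions.py file in your repo, and add the following contents.

In the new notebook's first cell, add the following code, which calls the %pip magic, and then run the cell. This magic installs pytest.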

%pip install pytest

In the second cell, add the following code, and then run the cell. The results show which unit tests passed and which failed.

import pytest
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])

# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
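`pytest.main` returns an exit code, and the final assertion above turns a nonzero code into a failed cell. If you want the same fail-the-run behavior without installing pytest, Python's built-in unittest module supports a comparable pattern. The following sketch is a swapped-in alternative for illustration, not the approach this article uses. The test case and the pure-Python `column_exists` stand-in are hypothetical.

```python
import unittest


def column_exists(columns, column_name):
    # Hypothetical pure-Python stand-in for the article's columnExists.
    return column_name in columns


class ColumnExistsTests(unittest.TestCase):
    def test_existing_column(self):
        self.assertTrue(column_exists(["carat", "clarity"], "clarity"))

    def test_missing_column(self):
        self.assertFalse(column_exists(["carat", "clarity"], "price"))


def run_tests() -> bool:
    # Load and run the tests programmatically, as a notebook cell would.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ColumnExistsTests)
    result = unittest.TextTestRunner(verbosity=2).run(suite)
    return result.wasSuccessful()


if __name__ == "__main__":
    # Fail the run (like the pytest cell above) if any test failed.
    assert run_tests(), "The unittest run failed. See the log for details."
```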

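Create an R notebook in the same folder as the preceding test_myfunctions.r file in your repo, and add the following contents.

In the first cell, add the following code, which calls the install.packages function, and then run the cell. This function installs testthat.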

install.packages("testthat")

In the second cell, add the following code, and then run the cell. The results show which unit tests passed and which failed.

library(testthat)
source("myfunctions.r")

test_dir(".", reporter = "tap")

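For Scala, run the first cell and then the second cell in the notebook from the preceding section. The results show which unit tests passed and which failed.

For SQL, run each of the three cells in the notebook from the preceding section. The results show whether each unit test passed or failed.

If you no longer need the view after you run your unit tests, you can delete it. To delete this view, add the following code to a new cell within one of the preceding notebooks, and then run only that cell.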

DROP VIEW view_diamonds;
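You can try the full view-based lifecycle locally with Python's built-in sqlite3 module: create a view, run PASS/FAIL checks against it, then drop it. This is a hypothetical SQLite analogue, not Databricks SQL. It has no catalogs or USE statements, the rows are made up, and the existence check queries SQLite's `sqlite_master`. It shows that dropping the view leaves the underlying table and its rows intact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE diamonds (carat REAL, cut TEXT, clarity TEXT)")
conn.executemany(
    "INSERT INTO diamonds VALUES (?, ?, ?)",
    [(0.23, "Ideal", "SI2"), (0.21, "Premium", "SI1"), (0.29, "Premium", "VVS2")],
)

# Tests read from a view rather than from the table itself.
conn.execute("CREATE VIEW view_diamonds AS SELECT * FROM diamonds")


def object_exists(name: str) -> bool:
    # sqlite_master lists both tables and views.
    row = conn.execute("SELECT count(*) FROM sqlite_master WHERE name = ?", (name,)).fetchone()
    return row[0] > 0


def check(condition: bool, message: str) -> str:
    # Mirror the PASS/FAIL messages produced by the SQL tests above.
    return ("PASS: " if condition else "FAIL: ") + message


vvs2_rows = conn.execute(
    "SELECT count(*) FROM view_diamonds WHERE clarity = 'VVS2'"
).fetchone()[0]
results = [
    check(object_exists("view_diamonds"), "The view 'view_diamonds' exists."),
    check(vvs2_rows > 0, "The view has at least one row where 'clarity' equals 'VVS2'."),
]
for line in results:
    print(line)

# Clean up: dropping the view does not touch the underlying table.
conn.execute("DROP VIEW view_diamonds")
table_rows = conn.execute("SELECT count(*) FROM diamonds").fetchone()[0]
print(object_exists("view_diamonds"), table_rows)  # False 3
```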

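Tip

You can view the results of your notebook runs, including unit test results, in your cluster's driver logs. You can also specify a location for your cluster's log delivery.

You can set up a continuous integration and continuous delivery or deployment (CI/CD) system, such as GitHub Actions, to automatically run your unit tests whenever your code changes. For an example, see the coverage of GitHub Actions in Software engineering best practices for notebooks.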


Additional resources

pytest

testthat

ScalaTest

SQL