Unit testing for notebooks

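You can use unit testing to help improve the quality and consistency of your notebooks' code. Unit testing is an approach to testing self-contained units of code, such as functions, early and often. This helps you find problems with your code faster, uncover mistaken assumptions about your code sooner, and streamline your overall coding efforts.

This article is an introduction to basic unit testing with functions. Advanced concepts such as unit testing classes and interfaces, as well as the use of stubs, mocks, and test harnesses, are also supported when unit testing for notebooks, but they are outside the scope of this article. This article also does not cover other kinds of testing, such as integration testing, system testing, acceptance testing, or non-functional testing methods such as performance testing or usability testing.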

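This article demonstrates the following:

How to organize functions and their unit tests.
How to write functions in Python, R, and Scala, as well as user-defined functions in SQL, that are well-designed to be unit tested.
How to call these functions from Python, R, Scala, and SQL notebooks.
How to write unit tests in Python, R, and Scala by using the popular test frameworks pytest for Python, testthat for R, and ScalaTest for Scala. Also how to write SQL that unit tests SQL user-defined functions (SQL UDFs).
How to run these unit tests from Python, R, Scala, and SQL notebooks.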

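Note

Databricks recommends writing and running your unit tests in a notebook. While you can run some commands in the web terminal, the web terminal has more limitations, such as a lack of support for Spark. See "Run shell commands in Databricks web terminal".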

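Organize functions and unit tests

There are a few common approaches for organizing your functions and their unit tests with notebooks. Each approach has its benefits and challenges.

For Python, R, and Scala notebooks, common approaches include the following: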

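Store functions and their unit tests outside of notebooks.
- Benefits: You can call these functions from within and outside of notebooks. Test frameworks are better designed to run tests outside of notebooks.
- Challenges: This approach is not supported for Scala notebooks. This approach also increases the number of files to track and maintain.
Store functions in one notebook and their unit tests in a separate notebook.
- Benefits: These functions are easier to reuse across notebooks.
- Challenges: The number of notebooks to track and maintain increases. These functions cannot be used outside of notebooks. These functions can also be more difficult to test outside of notebooks.
Store functions and their unit tests within the same notebook.
- Benefits: Functions and their unit tests are stored within a single notebook for easier tracking and maintenance.
- Challenges: These functions can be more difficult to reuse across notebooks. These functions cannot be used outside of notebooks. These functions can also be more difficult to test outside of notebooks.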

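For Python and R notebooks, Databricks recommends storing functions and their unit tests outside of notebooks. For Scala notebooks, Databricks recommends including functions in one notebook and their unit tests in a separate notebook.

For SQL notebooks, Databricks recommends that you store functions as SQL user-defined functions (SQL UDFs) in your schemas (also known as databases). You can then call these SQL UDFs and their unit tests from SQL notebooks.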
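To illustrate the first approach for Python, the following sketch stores a function in a module file and then imports it from the same folder, which is roughly what a notebook or a test file next to that module would do. The file name sample_functions.py, the function is_non_negative_integer, and the helper load_module_from_folder are hypothetical names used only for this illustration, and a temporary folder stands in for a repo folder.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module contents, used only for this illustration.
MODULE_SOURCE = '''
def is_non_negative_integer(value):
    """Return True only for int values that are zero or greater."""
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0
'''

def load_module_from_folder(folder, module_name):
    """Import module_name from folder, as code located in that folder would."""
    sys.path.insert(0, str(folder))
    try:
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(str(folder))

# A temporary folder stands in for the repo folder that holds the module.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "sample_functions.py").write_text(MODULE_SOURCE)
    sample_functions = load_module_from_folder(folder, "sample_functions")

result_ok = sample_functions.is_non_negative_integer(3)
result_bad = sample_functions.is_non_negative_integer(-1)
print(result_ok, result_bad)
```

Because the function lives in an ordinary module, the same import works from a notebook, from a test file, or from any other Python code that can see the folder.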

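Write functions

This section describes a simple set of example functions that determine the following:

Whether a table exists in a database.
Whether a column exists in a table.
How many rows exist in a column for a value within that column.

These functions are intended to be simple, so that you can focus on the unit testing details in this article rather than on the functions themselves.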

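To get the best unit testing results, a function should return a single predictable outcome of a single data type. For example, to check whether something exists, the function should return a boolean value of true or false. To return the number of rows that exist, the function should return a non-negative whole number. In the first example, the function should not return false if something does not exist but the thing itself if it does exist. Likewise, in the second example, the function should not return the number of rows if rows exist but false if no rows exist.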
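The following plain-Python sketch contrasts functions that follow this guidance with one that does not. The function names are hypothetical and are not part of the examples used later in this article.

```python
def count_matches(values, target):
    """Always return a non-negative int, even when nothing matches."""
    return sum(1 for value in values if value == target)

def contains(values, target):
    """Always return a bool, never the matching element itself."""
    return any(value == target for value in values)

def count_matches_mixed(values, target):
    """Anti-pattern: returns False instead of 0 when nothing matches."""
    count = count_matches(values, target)
    return count if count > 0 else False

clarity_values = ["SI2", "SI1", "VVS2", "SI2"]

# Single-type functions are simple to assert against.
assert count_matches(clarity_values, "SI2") == 2
assert count_matches(clarity_values, "IF") == 0
assert contains(clarity_values, "VVS2") is True
assert contains(clarity_values, "IF") is False

# The mixed version forces every caller and test to handle two types.
assert count_matches_mixed(clarity_values, "IF") is False
assert isinstance(count_matches_mixed(clarity_values, "SI2"), int)
```

With the single-type versions, each unit test needs only one kind of assertion, and callers never have to branch on the type of the result.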

You can add these functions to an existing Databricks workspace as follows, in Python, R, Scala, or SQL.

Python
R
Scala
SQL

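The following code assumes that you have set up Databricks Git folders, added a repo, and have the repo open in your Databricks workspace.

Create a file named myfunctions.py within the repo, and add the following contents to the file. Other examples in this article expect this file to be named myfunctions.py. You can use different names for your own files.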

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Because this file is not a Databricks notebook, you
# must create a Spark session. Databricks notebooks
# create a Spark session for you by default.
spark = SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()

# Does the specified table exist in the specified database?
def tableExists(tableName, dbName):
  return spark.catalog.tableExists(f"{dbName}.{tableName}")

# Does the specified column exist in the given DataFrame?
def columnExists(dataFrame, columnName):
  return columnName in dataFrame.columns

# How many rows are there for the specified value in the specified column
# in the given DataFrame?
def numRowsInColumnForValue(dataFrame, columnName, columnValue):
  df = dataFrame.filter(col(columnName) == columnValue)

  return df.count()
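Because columnExists reads only the columns attribute of its argument, its logic can be checked quickly without a Spark session. The following sketch restates the function in condensed form so that the block is self-contained, and it uses a hypothetical stand-in object (fake_df, built with types.SimpleNamespace) in place of a real DataFrame. This is an assumption made for illustration only and is not a substitute for testing against an actual DataFrame.

```python
from types import SimpleNamespace

# Restated in condensed form from myfunctions.py so this sketch runs without Spark.
def columnExists(dataFrame, columnName):
  return columnName in dataFrame.columns

# Hypothetical stand-in that mimics only the `columns` attribute of a DataFrame.
fake_df = SimpleNamespace(columns=["carat", "cut", "color", "clarity"])

has_clarity = columnExists(fake_df, "clarity")
has_weight = columnExists(fake_df, "weight")
print(has_clarity, has_weight)
```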

Create a file named myfunctions.r within the repo, and add the following contents to the file. Other examples in this article expect this file to be named myfunctions.r. You can use different names for your own files.

library(SparkR)

# Does the specified table exist in the specified database?
table_exists <- function(table_name, db_name) {
  tableExists(paste(db_name, ".", table_name, sep = ""))
}

# Does the specified column exist in the given DataFrame?
column_exists <- function(dataframe, column_name) {
  column_name %in% colnames(dataframe)
}

# How many rows are there for the specified value in the specified column
# in the given DataFrame?
num_rows_in_column_for_value <- function(dataframe, column_name, column_value) {
  df = filter(dataframe, dataframe[[column_name]] == column_value)

  count(df)
}

Create a Scala notebook named myfunctions with the following contents. Other examples in this article expect this notebook to be named myfunctions. You can use different names for your own notebooks.

Scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Does the specified table exist in the specified database?
def tableExists(tableName: String, dbName: String) : Boolean = {
  return spark.catalog.tableExists(dbName + "." + tableName)
}

// Does the specified column exist in the given DataFrame?
def columnExists(dataFrame: DataFrame, columnName: String) : Boolean = {
  return dataFrame.columns.contains(columnName)
}

// How many rows are there for the specified value in the specified column
// in the given DataFrame?
def numRowsInColumnForValue(dataFrame: DataFrame, columnName: String, columnValue: String) : Long = {
  val df = dataFrame.filter(col(columnName) === columnValue)

  return df.count()
}

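The following code assumes that you have the third-party sample dataset diamonds within a schema named default within a catalog named main that is accessible from your Databricks workspace. If the catalog or schema that you want to use has a different name, then change one or both of the following USE statements to match.

Create a SQL notebook and add the following contents to this new notebook. Then attach the notebook to a cluster and run the notebook to add the following SQL UDFs to the specified catalog and schema.

Note

The SQL UDFs table_exists and column_exists work only with Unity Catalog. SQL UDF support for Unity Catalog is in Public Preview.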

SQL
USE CATALOG main;
USE SCHEMA default;

CREATE OR REPLACE FUNCTION table_exists(catalog_name STRING,
                                        db_name      STRING,
                                        table_name   STRING)
  RETURNS BOOLEAN
  RETURN if(
    (SELECT count(*) FROM system.information_schema.tables
     WHERE table_catalog = table_exists.catalog_name
       AND table_schema  = table_exists.db_name
       AND table_name    = table_exists.table_name) > 0,
    true,
    false
  );

CREATE OR REPLACE FUNCTION column_exists(catalog_name STRING,
                                         db_name      STRING,
                                         table_name   STRING,
                                         column_name  STRING)
  RETURNS BOOLEAN
  RETURN if(
    (SELECT count(*) FROM system.information_schema.columns
     WHERE table_catalog = column_exists.catalog_name
       AND table_schema  = column_exists.db_name
       AND table_name    = column_exists.table_name
       AND column_name   = column_exists.column_name) > 0,
    true,
    false
  );

CREATE OR REPLACE FUNCTION num_rows_for_clarity_in_diamonds(clarity_value STRING)
  RETURNS BIGINT
  RETURN SELECT count(*)
         FROM main.default.diamonds
         WHERE clarity = clarity_value

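Call functions

This section describes code that calls the preceding functions. You could use these functions, for example, to count the number of rows in a table where a specified value exists within a specified column. However, before you proceed, you would want to check whether the table actually exists, and whether the column actually exists in that table. The following code checks for these conditions.

If you added the functions from the preceding section to your Databricks workspace, you can call these functions from your workspace as follows.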

Python
R
Scala
SQL

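Create a Python notebook in the same folder as the preceding myfunctions.py file in your repo, and add the following contents to the notebook. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.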

from myfunctions import *

tableName   = "diamonds"
dbName      = "default"
columnName  = "clarity"
columnValue = "VVS2"

# If the table exists in the specified database...
if tableExists(tableName, dbName):

  df = spark.sql(f"SELECT * FROM {dbName}.{tableName}")

  # And the specified column exists in that table...
  if columnExists(df, columnName):
    # Then report the number of rows for the specified value in that column.
    numRows = numRowsInColumnForValue(df, columnName, columnValue)

    print(f"There are {numRows} rows in '{tableName}' where '{columnName}' equals '{columnValue}'.")
  else:
    print(f"Column '{columnName}' does not exist in table '{tableName}' in schema (database) '{dbName}'.")
else:
  print(f"Table '{tableName}' does not exist in schema (database) '{dbName}'.") 
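The check-then-count flow above (table first, then column, then row count) can also be written as a function that returns its message instead of printing it, which makes each branch straightforward to assert against. The following is a minimal sketch under stated assumptions: CATALOG is a hypothetical in-memory stand-in for a catalog, and report_row_count is a hypothetical helper that is not part of myfunctions.py.

```python
# Hypothetical in-memory stand-in for a catalog: {schema: {table: rows}}.
CATALOG = {
    "default": {
        "diamonds": [
            {"carat": 0.23, "clarity": "SI2"},
            {"carat": 0.21, "clarity": "SI1"},
            {"carat": 0.29, "clarity": "VVS2"},
        ]
    }
}

def report_row_count(catalog, db_name, table_name, column_name, column_value):
    """Check the table, then the column, then count rows; return the message."""
    tables = catalog.get(db_name, {})
    if table_name not in tables:
        return f"Table '{table_name}' does not exist in schema (database) '{db_name}'."
    rows = tables[table_name]
    if not any(column_name in row for row in rows):
        return f"Column '{column_name}' does not exist in table '{table_name}'."
    num_rows = sum(1 for row in rows if row.get(column_name) == column_value)
    return f"There are {num_rows} rows in '{table_name}' where '{column_name}' equals '{column_value}'."

print(report_row_count(CATALOG, "default", "diamonds", "clarity", "VVS2"))
print(report_row_count(CATALOG, "default", "diamonds", "weight", "VVS2"))
print(report_row_count(CATALOG, "default", "emeralds", "clarity", "VVS2"))
```

Returning a string keeps the function to a single data type, in line with the guidance in the "Write functions" section.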

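Create an R notebook in the same folder as the preceding myfunctions.r file in your repo, and add the following contents to the notebook. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.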

library(SparkR)
source("myfunctions.r")

table_name   <- "diamonds"
db_name      <- "default"
column_name  <- "clarity"
column_value <- "VVS2"

# If the table exists in the specified database...
if (table_exists(table_name, db_name)) {

  df = sql(paste("SELECT * FROM ", db_name, ".", table_name, sep = ""))

  # And the specified column exists in that table...
  if (column_exists(df, column_name)) {
    # Then report the number of rows for the specified value in that column.
    num_rows = num_rows_in_column_for_value(df, column_name, column_value)

    print(paste("There are ", num_rows, " rows in table '", table_name, "' where '", column_name, "' equals '", column_value, "'.", sep = "")) 
  } else {
    print(paste("Column '", column_name, "' does not exist in table '", table_name, "' in schema (database) '", db_name, "'.", sep = ""))
  }

} else {
  print(paste("Table '", table_name, "' does not exist in schema (database) '", db_name, "'.", sep = ""))
}

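Create another Scala notebook in the same folder as the preceding myfunctions Scala notebook, and add the following contents to this new notebook.

In this new notebook's first cell, add the following code, which calls the %run magic. This magic makes the contents of the myfunctions notebook available to your new notebook.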

%run ./myfunctions

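In this new notebook's second cell, add the following code. Change the variable values for the table name, the schema (database) name, the column name, and the column value as needed. Then attach the notebook to a cluster and run the notebook to see the results.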

Scala
val tableName   = "diamonds"
val dbName      = "default"
val columnName  = "clarity"
val columnValue = "VVS2"

// If the table exists in the specified database...
if (tableExists(tableName, dbName)) {

  val df = spark.sql("SELECT * FROM " + dbName + "." + tableName)

  // And the specified column exists in that table...
  if (columnExists(df, columnName)) {
    // Then report the number of rows for the specified value in that column.
    val numRows = numRowsInColumnForValue(df, columnName, columnValue)

    println("There are " + numRows + " rows in '" + tableName + "' where '" + columnName + "' equals '" + columnValue + "'.")
  } else {
    println("Column '" + columnName + "' does not exist in table '" + tableName + "' in database '" + dbName + "'.")
  }

} else {
  println("Table '" + tableName + "' does not exist in database '" + dbName + "'.")
}

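Add the following code to a new cell in the preceding notebook or to a cell in a separate notebook. Change the schema or catalog names if necessary to match yours, and then run this cell to see the results.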

SQL
SELECT CASE
-- If the table exists in the specified catalog and schema...
WHEN
  table_exists("main", "default", "diamonds")
THEN
  -- And the specified column exists in that table...
  (SELECT CASE
   WHEN
     column_exists("main", "default", "diamonds", "clarity")
   THEN
     -- Then report the number of rows for the specified value in that column.
     printf("There are %d rows in table 'main.default.diamonds' where 'clarity' equals 'VVS2'.",
            num_rows_for_clarity_in_diamonds("VVS2"))
   ELSE
     printf("Column 'clarity' does not exist in table 'main.default.diamonds'.")
   END)
ELSE
  printf("Table 'main.default.diamonds' does not exist.")
END

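Write unit tests

This section describes code that tests each of the functions that are described toward the beginning of this article. If you make any changes to the functions in the future, you can use unit tests to determine whether those functions still work as you expect them to.

If you added the functions toward the beginning of this article to your Databricks workspace, you can add unit tests for these functions to your workspace as follows.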

Python
R
Scala
SQL

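Create another file named test_myfunctions.py in the same folder as the preceding myfunctions.py file in your repo, and add the following contents to the file. By default, pytest looks for .py files whose names start with test_ (or end with _test) to test. Similarly, by default, pytest looks inside of these files for functions whose names start with test_ to test.

In general, it is a best practice to not run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, you should run unit tests against non-production data. One common approach is to create fake data that is as close as possible to the production data. The following code example creates fake data for the unit tests to run against.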

import pytest
import pyspark
from myfunctions import *
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, FloatType, StringType

tableName    = "diamonds"
dbName       = "default"
columnName   = "clarity"
columnValue  = "SI2"

# Because this file is not a Databricks notebook, you
# must create a Spark session. Databricks notebooks
# create a Spark session for you by default.
spark = SparkSession.builder \
                    .appName('integrity-tests') \
                    .getOrCreate()

# Create fake data for the unit tests to run against.
# In general, it is a best practice to not run unit tests
# against functions that work with data in production.
schema = StructType([ \
  StructField("_c0",     IntegerType(), True), \
  StructField("carat",   FloatType(),   True), \
  StructField("cut",     StringType(),  True), \
  StructField("color",   StringType(),  True), \
  StructField("clarity", StringType(),  True), \
  StructField("depth",   FloatType(),   True), \
  StructField("table",   IntegerType(), True), \
  StructField("price",   IntegerType(), True), \
  StructField("x",       FloatType(),   True), \
  StructField("y",       FloatType(),   True), \
  StructField("z",       FloatType(),   True), \
])

data = [ (1, 0.23, "Ideal",   "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43 ), \
         (2, 0.21, "Premium", "E", "SI1", 59.8, 61, 326, 3.89, 3.84, 2.31 ) ]

df = spark.createDataFrame(data, schema)

# Does the table exist?
def test_tableExists():
  assert tableExists(tableName, dbName) is True

# Does the column exist?
def test_columnExists():
  assert columnExists(df, columnName) is True

# Is there at least one row for the value in the specified column?
def test_numRowsInColumnForValue():
  assert numRowsInColumnForValue(df, columnName, columnValue) > 0
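Because the fake data is fully known in advance, a test can assert an exact row count instead of only checking for "at least one". The following sketch shows the idea in plain Python, without Spark, so that it can run anywhere. The helpers rows_match_columns and count_rows_for_value are hypothetical analogues written for this illustration and are not part of myfunctions.py.

```python
# Column names mirror the schema used in test_myfunctions.py.
COLUMNS = ["_c0", "carat", "cut", "color", "clarity", "depth",
           "table", "price", "x", "y", "z"]

FAKE_ROWS = [
    (1, 0.23, "Ideal",   "E", "SI2", 61.5, 55, 326, 3.95, 3.98, 2.43),
    (2, 0.21, "Premium", "E", "SI1", 59.8, 61, 326, 3.89, 3.84, 2.31),
]

def rows_match_columns(rows, columns):
    """Return True when every row has exactly one value per column."""
    return all(len(row) == len(columns) for row in rows)

def count_rows_for_value(rows, columns, column_name, column_value):
    """Plain-Python analogue of numRowsInColumnForValue for tuples."""
    index = columns.index(column_name)
    return sum(1 for row in rows if row[index] == column_value)

def test_fake_rows_are_well_formed():
    assert rows_match_columns(FAKE_ROWS, COLUMNS)

def test_exact_count_for_known_value():
    # The fake data is fully known, so the test can pin the exact count.
    assert count_rows_for_value(FAKE_ROWS, COLUMNS, "clarity", "SI2") == 1

def test_zero_count_for_absent_value():
    assert count_rows_for_value(FAKE_ROWS, COLUMNS, "clarity", "VVS2") == 0

# Run the tests directly so the sketch also works without pytest.
for test in (test_fake_rows_are_well_formed,
             test_exact_count_for_known_value,
             test_zero_count_for_absent_value):
    test()
```

An exact assertion catches regressions that a "greater than zero" check would miss, such as a filter that accidentally matches every row.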

Create another file named test_myfunctions.r in the same folder as the preceding myfunctions.r file in your repo, and add the following contents to the file. By default, testthat looks for .r files whose names start with test to test.

library(testthat)
source("myfunctions.r")

table_name   <- "diamonds"
db_name      <- "default"
column_name  <- "clarity"
column_value <- "SI2"

# Create fake data for the unit tests to run against.
# In general, it is a best practice to not run unit tests
# against functions that work with data in production.
schema <- structType(
  structField("_c0",     "integer"),
  structField("carat",   "float"),
  structField("cut",     "string"),
  structField("color",   "string"),
  structField("clarity", "string"),
  structField("depth",   "float"),
  structField("table",   "integer"),
  structField("price",   "integer"),
  structField("x",       "float"),
  structField("y",       "float"),
  structField("z",       "float"))

data <- list(list(as.integer(1), 0.23, "Ideal",   "E", "SI2", 61.5, as.integer(55), as.integer(326), 3.95, 3.98, 2.43),
             list(as.integer(2), 0.21, "Premium", "E", "SI1", 59.8, as.integer(61), as.integer(326), 3.89, 3.84, 2.31))

df <- createDataFrame(data, schema)

# Does the table exist?
test_that ("The table exists.", {
  expect_true(table_exists(table_name, db_name))
})

# Does the column exist?
test_that ("The column exists in the table.", {
  expect_true(column_exists(df, column_name))
})

# Is there at least one row for the value in the specified column?
test_that ("There is at least one row in the query result.", {
  expect_true(num_rows_in_column_for_value(df, column_name, column_value) > 0)
})

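Create another Scala notebook in the same folder as the preceding myfunctions Scala notebook, and add the following contents to this new notebook.

In the new notebook's first cell, add the following code, which calls the %run magic. This magic makes the contents of the myfunctions notebook available to your new notebook.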

%run ./myfunctions

In the second cell, add the following code. This code defines your unit tests and specifies how to run them.

Scala
import org.scalatest._
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, IntegerType, FloatType, StringType}
import scala.collection.JavaConverters._

class DataTests extends AsyncFunSuite {

  val tableName   = "diamonds"
  val dbName      = "default"
  val columnName  = "clarity"
  val columnValue = "SI2"

  // Create fake data for the unit tests to run against.
  // In general, it is a best practice to not run unit tests
  // against functions that work with data in production.
  val schema = StructType(Array(
                 StructField("_c0",     IntegerType),
                 StructField("carat",   FloatType),
                 StructField("cut",     StringType),
                 StructField("color",   StringType),
                 StructField("clarity", StringType),
                 StructField("depth",   FloatType),
                 StructField("table",   IntegerType),
                 StructField("price",   IntegerType),
                 StructField("x",       FloatType),
                 StructField("y",       FloatType),
                 StructField("z",       FloatType)
               ))

  // FloatType columns expect Float values, so the literals use the f suffix.
  val data = Seq(
                  Row(1, 0.23f, "Ideal",   "E", "SI2", 61.5f, 55, 326, 3.95f, 3.98f, 2.43f),
                  Row(2, 0.21f, "Premium", "E", "SI1", 59.8f, 61, 326, 3.89f, 3.84f, 2.31f)
                ).asJava

  val df = spark.createDataFrame(data, schema)

  // Does the table exist?
  test("The table exists") {
    assert(tableExists(tableName, dbName) == true)
  }

  // Does the column exist?
  test("The column exists") {
    assert(columnExists(df, columnName) == true)
  }

  // Is there at least one row for the value in the specified column?
  test("There is at least one matching row") {
    assert(numRowsInColumnForValue(df, columnName, columnValue) > 0)
  }
}

nocolor.nodurations.nostacks.stats.run(new DataTests)

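Note

This code example uses the FunSuite style of testing in ScalaTest. For other available testing styles, see "Selecting testing styles for your project".

Before you add unit tests, be aware that in general, it is a best practice to not run unit tests against functions that work with data in production. This is especially important for functions that add, remove, or otherwise change data. To protect your production data from being compromised by your unit tests in unexpected ways, you should run unit tests against non-production data. One common approach is to run unit tests against views instead of tables.

To create a view, you can call the CREATE VIEW command from a new cell in either the preceding notebook or a separate notebook. The following example assumes that you have an existing table named diamonds within a schema (database) named default within a catalog named main. Change these names to match your own as needed, and then run only that cell.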

SQL
USE CATALOG main;
USE SCHEMA default;

CREATE VIEW view_diamonds AS
SELECT * FROM diamonds;

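After you create the view, add each of the following SELECT statements to its own new cell in the preceding notebook or to its own new cell in a separate notebook. Change the names to match your own as needed.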

SQL
SELECT if(table_exists("main", "default", "view_diamonds"),
          printf("PASS: The table 'main.default.view_diamonds' exists."),
          printf("FAIL: The table 'main.default.view_diamonds' does not exist."));

SELECT if(column_exists("main", "default", "view_diamonds", "clarity"),
          printf("PASS: The column 'clarity' exists in the table 'main.default.view_diamonds'."),
          printf("FAIL: The column 'clarity' does not exist in the table 'main.default.view_diamonds'."));

SELECT if(num_rows_for_clarity_in_diamonds("VVS2") > 0,
          printf("PASS: The table 'main.default.view_diamonds' has at least one row where the column 'clarity' equals 'VVS2'."),
          printf("FAIL: The table 'main.default.view_diamonds' does not have at least one row where the column 'clarity' equals 'VVS2'."));

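Run unit tests

This section describes how to run the unit tests that you coded in the preceding section. When you run the unit tests, you get results showing which unit tests passed and which failed.

If you added the unit tests from the preceding section to your Databricks workspace, you can run these unit tests from your workspace. You can run these unit tests either manually or on a schedule.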

Python
R
Scala
SQL

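Create a Python notebook in the same folder as the preceding test_myfunctions.py file in your repo, and add the following contents.

In the new notebook's first cell, add the following code, and then run the cell, which calls the %pip magic. This magic installs pytest.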

%pip install pytest

In the second cell, add the following code, and then run the cell. The results show which unit tests passed and which failed.

import pytest
import sys

# Skip writing pyc files on a readonly filesystem.
sys.dont_write_bytecode = True

# Run pytest.
retcode = pytest.main([".", "-v", "-p", "no:cacheprovider"])

# Fail the cell execution if there are any test failures.
assert retcode == 0, "The pytest invocation failed. See the log for details."
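The pytest.main call above points pytest at the current folder, so which files and functions get run depends on the naming conventions described in the "Write unit tests" section. The following sketch approximates those conventions with simple pattern checks, which can help when a test is unexpectedly not picked up. The helper names are hypothetical, and this is not pytest's actual collection code; it only mirrors the defaults as stated in this article.

```python
from fnmatch import fnmatch

# Default file name patterns as described earlier in this article.
DEFAULT_FILE_PATTERNS = ("test_*.py", "*_test.py")

def looks_like_test_file(filename):
    """Return True if filename matches one of the default file patterns."""
    return any(fnmatch(filename, pattern) for pattern in DEFAULT_FILE_PATTERNS)

def looks_like_test_function(name):
    """Return True if a function name starts with the test_ prefix."""
    return name.startswith("test_")

for filename in ("test_myfunctions.py", "myfunctions_test.py", "myfunctions.py"):
    print(filename, looks_like_test_file(filename))

for name in ("test_tableExists", "tableExists"):
    print(name, looks_like_test_function(name))
```

Under these conventions, myfunctions.py itself is not treated as a test file, and helper functions inside test_myfunctions.py that lack the test_ prefix are not run as tests.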

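Create an R notebook in the same folder as the preceding test_myfunctions.r file in your repo, and add the following contents.

In the first cell, add the following code, and then run the cell, which calls the install.packages function. This function installs testthat.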

R
install.packages("testthat")

In the second cell, add the following code, and then run the cell. The results show which unit tests passed and which failed.

library(testthat)
source("myfunctions.r")

test_dir(".", reporter = "tap")

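Run each of the three cells in the notebook from the preceding section. The results show whether each unit test passed or failed.

If you no longer need the view after you run your unit tests, you can delete the view. To delete this view, add the following code to a new cell within one of the preceding notebooks, and then run only that cell.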

SQL
DROP VIEW view_diamonds;

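Tip

You can view the results of your notebook runs (including unit test results) in your cluster's driver logs. You can also specify a location for your cluster's log delivery.

You can set up a continuous integration and continuous delivery or deployment (CI/CD) system, such as GitHub Actions, to automatically run your unit tests whenever your code changes. For an example, see the coverage of GitHub Actions in "Software engineering best practices for notebooks".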


Additional resources

pytest

testthat

ScalaTest

SQL
