Snowflake

Snowflake is a cloud-based SQL data warehouse that emphasizes performance, minimal tuning, support for diverse data sources, and security. This article explains how to read data from and write data to Snowflake using the Databricks Snowflake connector.

Databricks and Snowflake have partnered to provide a first-class connector experience for customers of both products. This saves you from having to import and install libraries on your clusters, and therefore prevents version conflicts and misconfiguration.

Requirements

Databricks Runtime 4.2 or above.

Snowflake Connector for Spark notebooks

The following notebooks provide simple examples of how to write data to and read data from Snowflake. See Using the Spark Connector for more details. In particular, see Setting Configuration Options for the Connector for all configuration options.

Tip

Avoid exposing your Snowflake username and password in notebooks by using Secrets, which are demonstrated in the notebooks.
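For orientation, here is a minimal sketch of a read and a write with the connector, with credentials pulled from Secrets. The secret scope and key names, the connection values, and the table names are placeholders rather than values from the notebooks; substitute your own.

```python
# Minimal sketch: read a Snowflake table into a Spark DataFrame and write it back.
# All scope, key, connection, and table names below are placeholders.
user = dbutils.secrets.get(scope="snowflake", key="username")
password = dbutils.secrets.get(scope="snowflake", key="password")

options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": user,
    "sfPassword": password,
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
}

# Read a Snowflake table into a Spark DataFrame.
df = (spark.read
      .format("snowflake")
      .options(**options)
      .option("dbtable", "<table>")
      .load())

# Write a Spark DataFrame back to Snowflake.
(df.write
   .format("snowflake")
   .options(**options)
   .option("dbtable", "<target_table>")
   .mode("overwrite")
   .save())
```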

Train a machine learning model and save results to Snowflake

The following notebook walks through best practices for using the Snowflake Connector for Spark. It writes data to Snowflake, uses Snowflake for some basic data manipulation, trains a machine learning model in Databricks, and writes the results back to Snowflake.
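The sketch below illustrates the general shape of that workflow; it is not the notebook itself. The query, column names, model choice, and table names are placeholders, and `options` refers to the connection-options dictionary from the earlier sketch.

```python
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression
from pyspark.sql.functions import col

# Push basic data preparation down to Snowflake with the "query" option,
# then cast the NUMBER columns to Spark doubles for training.
training_df = (spark.read
               .format("snowflake")
               .options(**options)  # connection options from the earlier sketch
               .option("query", "SELECT feature_1, feature_2, label FROM <training_table>")
               .load()
               .select(col("feature_1").cast("double").alias("feature_1"),
                       col("feature_2").cast("double").alias("feature_2"),
                       col("label").cast("double").alias("label")))

# Train a simple model in Databricks.
assembler = VectorAssembler(inputCols=["feature_1", "feature_2"], outputCol="features")
assembled = assembler.transform(training_df)
model = LinearRegression(labelCol="label", featuresCol="features").fit(assembled)
predictions = model.transform(assembled).select("feature_1", "feature_2", "prediction")

# Write the scored results back to Snowflake.
(predictions.write
   .format("snowflake")
   .options(**options)
   .option("dbtable", "<results_table>")
   .mode("overwrite")
   .save())
```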

Store ML training results in Snowflake notebook

Frequently asked questions (FAQ)

Why don’t my Spark DataFrame columns appear in the same order in Snowflake?

The Snowflake Connector for Spark doesn’t respect the order of the columns in the table being written to; you must explicitly specify the mapping between DataFrame and Snowflake columns. To specify this mapping, use the columnmap parameter.
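For example, an append that maps DataFrame columns to existing Snowflake columns with different names and order might look like the following sketch, where `df` is the DataFrame to write, the table and column names are placeholders, and `options` is a connection-options dictionary such as the one in the earlier sketch.

```python
# Hypothetical mapping: DataFrame columns "id" and "amount" are written to the
# Snowflake columns ORDER_ID and ORDER_AMOUNT of an existing table.
(df.write
   .format("snowflake")
   .options(**options)
   .option("dbtable", "<existing_table>")
   .option("columnmap", "Map(id -> ORDER_ID, amount -> ORDER_AMOUNT)")
   .mode("append")
   .save())
```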

Why is INTEGER data written to Snowflake always read back as DECIMAL?

Snowflake represents all INTEGER types as NUMBER, which can cause a change in data type when you write data to and read data from Snowflake. For example, INTEGER data can be converted to DECIMAL when writing to Snowflake, because INTEGER and DECIMAL are semantically equivalent in Snowflake (see Snowflake Numeric Data Types).
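If you need the original Spark type back, one workaround is to cast the column after reading. A sketch, with placeholder table and column names and the connection options from the earlier sketch:

```python
from pyspark.sql.functions import col

# The "id" column comes back as DecimalType (Snowflake NUMBER); cast it back
# to a Spark integral type after the read.
df = (spark.read
      .format("snowflake")
      .options(**options)
      .option("dbtable", "<table>")
      .load()
      .withColumn("id", col("id").cast("long")))
```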

Why are the fields in my Snowflake table schema always uppercase?

Snowflake uses uppercase for unquoted identifiers by default, so the table schema you write is converted to uppercase.
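If you prefer lowercase column names on the Spark side, one workaround is to rename the columns after reading. A sketch, again with placeholder names and the connection options from the earlier sketch:

```python
# Read the table, then normalize the uppercase column names that Snowflake
# returns back to lowercase on the Spark side.
df = (spark.read
      .format("snowflake")
      .options(**options)
      .option("dbtable", "<table>")
      .load())

df = df.toDF(*[c.lower() for c in df.columns])
```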