Snowflake is a cloud-based SQL data warehouse that focuses on great performance, zero-tuning, diversity of data sources, and security. This article explains how to read data from and write data to Snowflake using the Databricks Snowflake connector.
Databricks and Snowflake have partnered to bring a first-class connector experience for customers of both Databricks and Snowflake, saving you from having to import and load libraries into your clusters, and therefore preventing version conflicts and misconfiguration.
The following notebooks provide simple examples of how to write data to and read data from Snowflake. See Using the Spark Connector for more details. In particular, see Setting Configuration Options for the Connector for all configuration options.
Avoid exposing your Snowflake username and password in notebooks by using Secrets, which are demonstrated in the notebooks.
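As a rough sketch of the pattern those notebooks follow, the snippet below builds the connector options from a secret scope and wraps one write and one read. The option names (`sfUrl`, `sfUser`, `sfPassword`, `sfDatabase`, `sfSchema`, `sfWarehouse`, `dbtable`) are the connector's own. The scope name, secret key names, and angle-bracket values are placeholders invented for illustration, and `spark` and `dbutils` are assumed to be the objects a Databricks notebook provides.

```python
def snowflake_options(get_secret, scope="snowflake-demo"):
    """Build connector options, pulling credentials from a secret scope.

    `get_secret` is expected to behave like `dbutils.secrets.get(scope, key)`.
    The scope, key names, and connection values are placeholders.
    """
    return {
        "sfUrl": "<account>.snowflakecomputing.com",
        "sfUser": get_secret(scope, "username"),
        "sfPassword": get_secret(scope, "password"),
        "sfDatabase": "<database>",
        "sfSchema": "<schema>",
        "sfWarehouse": "<warehouse>",
    }


def write_table(df, options, table, mode="overwrite"):
    """Write a Spark DataFrame to a Snowflake table."""
    (df.write.format("snowflake")
        .options(**options)
        .option("dbtable", table)
        .mode(mode)
        .save())


def read_table(spark, options, table):
    """Read a Snowflake table into a Spark DataFrame."""
    return (spark.read.format("snowflake")
            .options(**options)
            .option("dbtable", table)
            .load())


# In a Databricks notebook (not executed here):
# opts = snowflake_options(dbutils.secrets.get)
# write_table(spark.range(5), opts, "<table>")
# display(read_table(spark, opts, "<table>"))
```

Passing the secret getter in as an argument keeps the credentials out of the notebook source and lets the option-building logic be checked without a cluster.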
The following notebook walks through best practices for using the Snowflake Connector for Spark. It writes data to Snowflake, uses Snowflake for some basic data manipulation, trains a machine learning model in Databricks, and writes the results back to Snowflake.
Why don’t my Spark DataFrame columns appear in the same order in Snowflake?
The Snowflake Connector for Spark doesn’t respect the order of the columns in the table being written to; you must explicitly specify the mapping between DataFrame and Snowflake columns. To specify this mapping, use the columnmap parameter.
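A minimal sketch of setting that mapping follows. It assumes the `columnmap` value is a single string of the form `Map(spark_col -> snowflake_col, ...)`, as described in Snowflake's connector documentation, and that the write appends to an existing table. The helper names and the example column names are hypothetical.

```python
def build_columnmap(mapping):
    """Render {spark_column: snowflake_column} as a columnmap string.

    Assumes the connector expects "Map(a -> b, c -> d)"; confirm the
    exact format against the connector documentation for your version.
    """
    if not mapping:
        raise ValueError("columnmap needs at least one column pair")
    pairs = ", ".join(f"{src} -> {dst}" for src, dst in mapping.items())
    return f"Map({pairs})"


def append_with_columnmap(df, options, table, mapping):
    """Append a DataFrame to an existing Snowflake table using a column map."""
    (df.write.format("snowflake")
        .options(**options)
        .option("dbtable", table)
        .option("columnmap", build_columnmap(mapping))
        .mode("append")
        .save())


# Hypothetical usage in a notebook, with `opts` holding the connection options:
# append_with_columnmap(df, opts, "<table>", {"id": "CUSTOMER_ID", "name": "FULL_NAME"})
```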
Why is INTEGER data written to Snowflake always read back as DECIMAL?
Snowflake represents all INTEGER types as NUMBER, which can cause a change in data type when you write data to and read data from Snowflake. For example, INTEGER data can be converted to DECIMAL when writing to Snowflake, because INTEGER and DECIMAL are semantically equivalent in Snowflake (see Snowflake Numeric Data Types).
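If downstream code needs integer columns again, one possible workaround is to cast them back after reading. The sketch below is an assumption-laden illustration rather than connector behavior: the helper names are hypothetical, `BIGINT` is an arbitrary target type, and values too large for the target type will not survive the cast.

```python
def integer_cast_exprs(columns, integer_columns, target_type="BIGINT"):
    """Build Spark SQL select expressions that cast the chosen columns.

    Columns not listed in `integer_columns` pass through unchanged,
    and the original column order is preserved.
    """
    wanted = set(integer_columns)
    exprs = []
    for name in columns:
        quoted = f"`{name}`"
        if name in wanted:
            exprs.append(f"CAST({quoted} AS {target_type}) AS {quoted}")
        else:
            exprs.append(quoted)
    return exprs


def restore_integers(df, integer_columns, target_type="BIGINT"):
    """Return a DataFrame with the given DECIMAL columns cast back to integers."""
    return df.selectExpr(*integer_cast_exprs(df.columns, integer_columns, target_type))


# Hypothetical usage after reading a table back from Snowflake:
# df = restore_integers(df, ["ID"])
```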
Why are the fields in my Snowflake table schema always uppercase?
Snowflake stores unquoted identifiers in uppercase by default, which means that the field names in the table schema are converted to uppercase.
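If lowercase names are preferred on the Spark side, one simple option is to rename the columns after reading. The helper below is a hypothetical sketch and only changes the DataFrame, not the Snowflake table. The connector documentation also describes a `keep_column_case` option for writes, which may be worth checking for your connector version.

```python
def with_lowercase_columns(df):
    """Return a DataFrame whose column names are lowercased.

    Raises ValueError if two columns would collide after lowercasing.
    """
    lowered = [name.lower() for name in df.columns]
    if len(set(lowered)) != len(lowered):
        raise ValueError("column names collide after lowercasing")
    return df.toDF(*lowered)


# Hypothetical usage after reading a table back from Snowflake:
# df = with_lowercase_columns(df)
```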