Load data using COPY INTO with an instance profile
This article describes how to use the COPY INTO command to load data from an Amazon S3 bucket in your AWS account into a table in Databricks SQL.
The steps in this article assume that your admin has configured a SQL warehouse to use an AWS instance profile so that you can access your source files in S3. If your admin configured a Unity Catalog external location with a storage credential, see Load data using COPY INTO with Unity Catalog volumes or external locations instead. If your admin gave you temporary credentials (an AWS access key ID, a secret key, and a session token), see Load data using COPY INTO with temporary credentials instead.
Databricks recommends using the COPY INTO command for incremental and bulk data loading with Databricks SQL.
COPY INTO works well for data sources that contain thousands of files. Databricks recommends that you use Auto Loader for loading millions of files, which is not supported in Databricks SQL.
Before you begin
Before you load data into Databricks, make sure you have the following:
- Access to data in S3. Your admin must first complete the steps in Configure data access for ingestion so your Databricks SQL warehouse can read your source files.
- A Databricks SQL warehouse that uses the instance profile that your admin created.
- The Can manage permission on the SQL warehouse.
- The fully qualified S3 URI.
- Familiarity with the Databricks SQL user interface.
Step 1: Confirm access to data in cloud storage
To confirm that you have access to the correct data in cloud object storage, do the following:
- 
In the sidebar, click Create > Query. 
- 
In the SQL editor's menu bar, select a SQL warehouse. 
- 
In the SQL editor, paste the following code: select * from csv.<path>Replace <path>with the S3 URI that you received from your admin. For example,s3://<bucket>/<folder>/.
- 
Click Run. 
Step 2: Create a table
This step describes how to create a table in your Databricks workspace to hold the incoming data.
- 
In the SQL editor, paste the following code: SQLCREATE TABLE <catalog_name>.<schema_name>.<table_name> (
 tpep_pickup_datetime TIMESTAMP,
 tpep_dropoff_datetime TIMESTAMP,
 trip_distance DOUBLE,
 fare_amount DOUBLE,
 pickup_zip INT,
 dropoff_zip INT
 );
- 
Click Run. 
Step 3: Load data from cloud storage into the table
This step describes how to load data from an S3 bucket into the table you created in your Databricks workspace.
- 
In the sidebar, click Create > Query. 
- 
In the SQL editor's menu bar, select a SQL warehouse and make sure the SQL warehouse is running. 
- 
In the SQL editor, paste the following code. In this code, replace: - <s3-bucket>with the name of your S3 bucket.
- <folder>with the name of the folder in your S3 bucket.
 SQLCOPY INTO <catalog-name>.<schema-name>.<table-name>
 FROM 's3://<s3-bucket>/<folder>/'
 FILEFORMAT = CSV
 FORMAT_OPTIONS (
 'header' = 'true',
 'inferSchema' = 'true'
 )
 COPY_OPTIONS (
 'mergeSchema' = 'true'
 );
 SELECT * FROM <catalog_name>.<schema_name>.<table_name>;noteFORMAT_OPTIONSdiffers depending onFILEFORMAT. In this case, theheaderoption instructs Databricks to treat the first row of the CSV file as a header, and theinferSchemaoptions instructs Databricks to automatically determine the data type of each field in the CSV file.
- 
Click Run. noteIf you click Run again, no new data is loaded into the table. This is because the COPY INTOcommand only processes what it considers to be new data.
Clean up
You can clean up the associated resources in your workspace if you no longer want to keep them.
Delete the tables
- 
In the sidebar, click Create > Query. 
- 
Select a SQL warehouse and make sure that the SQL warehouse is running. 
- 
Paste the following code: SQLDROP TABLE <catalog-name>.<schema-name>.<table-name>;
- 
Click Run. 
- 
Hover over the tab for this query, and then click the X icon. 
Delete the queries in the SQL editor
- In the sidebar, click SQL Editor.
- In the SQL editor's menu bar, hover over the tab for each query that you created for this tutorial, and then click the X icon.
Additional resources
- The COPY INTO reference article