Load data using COPY INTO with compute credentials

Databricks recommends using the COPY INTO command for incremental and bulk data loading with Databricks SQL.

note

COPY INTO works well for data sources that contain thousands of files. For data sources with millions of files, Databricks recommends Auto Loader instead; Auto Loader is not supported in Databricks SQL.

Before you begin

Before you load data into Databricks, make sure you have a SQL warehouse that you can use and access to your source data in cloud object storage, with compute credentials configured for the warehouse.

Step 1: Confirm access to data in cloud storage

To confirm that you have access to the correct data in cloud object storage, do the following:

  1. In the sidebar, click Create > Query.

  2. In the SQL editor’s menu bar, select a SQL warehouse.

  3. In the SQL editor, paste the following code, replacing <path> with the path to your data in cloud object storage:

    SQL
    SELECT * FROM csv.`<path>`
  4. Click Run.
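
For example, if your files were in a hypothetical S3 location such as s3://my-bucket/nyc-taxi/ (illustrative only, not part of this tutorial), the query would be:

    SQL
    SELECT * FROM csv.`s3://my-bucket/nyc-taxi/`

If the query returns rows, your SQL warehouse's compute credentials can read the data.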

Step 2: Create a table

This step describes how to create a table in your Databricks workspace to hold the incoming data.

  1. In the SQL editor, paste the following code:

    SQL
    CREATE TABLE <catalog_name>.<schema_name>.<table_name> (
      tpep_pickup_datetime TIMESTAMP,
      tpep_dropoff_datetime TIMESTAMP,
      trip_distance DOUBLE,
      fare_amount DOUBLE,
      pickup_zip INT,
      dropoff_zip INT
    );
  2. Click Run.
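
Optionally, to confirm that the table exists and has the expected columns, you can describe it (same placeholders as above):

    SQL
    DESCRIBE TABLE <catalog_name>.<schema_name>.<table_name>;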

Step 3: Load data from cloud storage into the table
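
This step loads the data you confirmed in Step 1 into the table you created in Step 2 using the COPY INTO command.

  1. In the SQL editor, paste the following code. This is a minimal sketch that assumes a CSV source with a header row; replace <path> with the path you used in Step 1:

    SQL
    COPY INTO <catalog_name>.<schema_name>.<table_name>
    FROM '<path>'
    FILEFORMAT = CSV
    FORMAT_OPTIONS (
      'header' = 'true',
      'inferSchema' = 'true'
    )
    COPY_OPTIONS (
      'mergeSchema' = 'true'
    );
  2. Click Run.

COPY INTO skips files that it has already loaded, so you can rerun the command as new files arrive in the source location.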

Clean up

If you no longer want to keep the resources you created in this tutorial, you can delete them from your workspace.

Delete the tables

  1. In the sidebar, click Create > Query.

  2. Select a SQL warehouse and make sure that the SQL warehouse is running.

  3. Paste the following code, using the same names you used in Step 2 (a variant that tolerates a missing table appears after these steps):

    SQL
    DROP TABLE <catalog_name>.<schema_name>.<table_name>;
  4. Click Run.

  5. Hover over the tab for this query, and then click the X icon.
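
If you might run this cleanup more than once, a variant with IF EXISTS avoids an error when the table has already been deleted:

    SQL
    DROP TABLE IF EXISTS <catalog_name>.<schema_name>.<table_name>;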

Delete the queries in the SQL editor

  1. In the sidebar, click SQL Editor.
  2. In the SQL editor’s menu bar, hover over the tab for each query that you created for this tutorial, and then click the X icon.
