Copy Into (Delta Lake on Databricks)

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported. See COPY INTO.

COPY INTO table_identifier
  FROM [ file_location | (SELECT identifier_list FROM file_location) ]
  FILEFORMAT = data_source
  [FILES = (file_name [, ...]) | PATTERN = 'regex_pattern']
  [FORMAT_OPTIONS ('data_source_reader_option' = 'value', ...)]
  [COPY_OPTIONS ('force' = {'false' | 'true'})]

Load data from a file location into a Delta table. This is a retriable and idempotent operation; files in the source location that have already been loaded are skipped.

table_identifier

The Delta table to copy into.

FROM file_location

The file location to load the data from. Files in this location must have the format specified in FILEFORMAT.
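
For example, the following loads every Parquet file found at a location (the target and source paths here are placeholders):

COPY INTO delta.`target_path`
  FROM 'source_path'
  FILEFORMAT = PARQUET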

SELECT identifier_list

Selects the specified columns or expressions from the source data before copying into the Delta table.
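
For example, the following sketch selects and casts columns while loading. The column names are illustrative; when the 'header' option is unset, the Spark CSV reader names columns _c0, _c1, and so on:

COPY INTO delta.`target_path`
  FROM (SELECT _c0 AS id, CAST(_c1 AS DATE) AS event_date FROM 'source_path')
  FILEFORMAT = CSV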

FILEFORMAT = data_source

The format of the source files to load. One of CSV, JSON, AVRO, ORC, PARQUET.

FILES

A list of up to 1,000 file names to load. Cannot be specified with PATTERN.
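
For example, assuming the parenthesized list form shown in the syntax above (the file names are placeholders):

COPY INTO delta.`target_path`
  FROM 'source_path'
  FILEFORMAT = JSON
  FILES = ('f1.json', 'f2.json')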

PATTERN

A regex pattern that identifies the files to load from the source directory. Cannot be specified with FILES.
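
For example, to load only files whose names match a character class (the pattern is illustrative):

COPY INTO delta.`target_path`
  FROM 'source_path'
  FILEFORMAT = PARQUET
  PATTERN = 'part-00[0-5].parquet'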

FORMAT_OPTIONS

Options to be passed to the Apache Spark data source reader for the specified format.
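
For example, passing standard Spark CSV reader options:

COPY INTO delta.`target_path`
  FROM 'source_path'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')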

COPY_OPTIONS

Options to control the operation of the COPY INTO command. The only option is 'force'; if set to 'true', idempotency is disabled and files are loaded regardless of whether they’ve been loaded before.
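
For example, to force files to be loaded again even if they were ingested previously:

COPY INTO delta.`target_path`
  FROM 'source_path'
  FILEFORMAT = JSON
  COPY_OPTIONS ('force' = 'true')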

Examples

COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')
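
A second sketch, combining a column selection with a forced re-load; the table name, path, and column names are placeholders, and 'multiLine' is a standard Spark JSON reader option:

COPY INTO my_table
  FROM (SELECT key, index, textData FROM 'source_path')
  FILEFORMAT = JSON
  FORMAT_OPTIONS ('multiLine' = 'true')
  COPY_OPTIONS ('force' = 'true')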