COPY INTO (Delta Lake on Databricks)

Preview

This feature is in Public Preview.

COPY INTO table_identifier
  FROM [ file_location | (SELECT identifier_list FROM file_location) ]
  FILEFORMAT = data_source
  [FILES = (file_name [, ...]) | PATTERN = 'regex_pattern']
  [FORMAT_OPTIONS ('data_source_reader_option' = 'value', ...)]
  [COPY_OPTIONS ('force' = {'false' | 'true'})]

Load data from a file location into a Delta table. This is a retriable and idempotent operation: files in the source location that have already been loaded are skipped.
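A minimal invocation might look like the following (the table name and source path are hypothetical). Running the same statement a second time loads nothing new, because already-loaded files are skipped:

COPY INTO my_table
  FROM '/mnt/raw/events'   -- hypothetical source location
  FILEFORMAT = JSON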

table_identifier

The Delta table to copy into. This can be a table name, optionally qualified with a database name, or the location of an existing Delta table in the form delta.`<path-to-table>`.

FROM file_location

The file location to load the data from. Files in this location must have the format specified in FILEFORMAT.

SELECT identifier_list

Selects the specified columns or expressions from the source data before copying into the Delta table.
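For example, expressions in the list can rename or cast columns during the load. A sketch, assuming a headerless CSV source at a hypothetical path (Spark names such columns _c0, _c1, ...):

COPY INTO my_table
  FROM (SELECT _c0 AS id, CAST(_c1 AS DOUBLE) AS amount  -- rename and cast during the copy
        FROM '/mnt/raw/csv')
  FILEFORMAT = CSV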

FILEFORMAT = data_source

The format of the source files to load. One of CSV, JSON, AVRO, ORC, PARQUET.

FILES

A list of up to 1,000 file names to load. Cannot be specified with PATTERN.
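For instance (file names and path hypothetical):

COPY INTO my_table
  FROM '/mnt/raw/events'
  FILEFORMAT = JSON
  FILES = ('f1.json', 'f2.json')   -- load only these two files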

PATTERN

A regex pattern that identifies the files to load from the source directory. Cannot be specified with FILES.
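A sketch with a hypothetical directory layout; the pattern is matched against file paths relative to the source directory (see also the example at the end of this page):

COPY INTO my_table
  FROM '/mnt/raw/events'
  FILEFORMAT = JSON
  PATTERN = 'daily/part-[0-9]*.json'   -- only matching files are loaded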

FORMAT_OPTIONS

Options to be passed to the Apache Spark data source reader for the specified format.
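For example, with CSV sources the options are those of the Spark CSV reader, such as 'header', 'inferSchema', and 'sep' (source path hypothetical):

COPY INTO my_table
  FROM '/mnt/raw/csv'
  FILEFORMAT = CSV
  FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true', 'sep' = ',')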

COPY_OPTIONS

Options to control the operation of the COPY INTO command. The only supported option is 'force'; when set to 'true', idempotency is disabled and files are loaded regardless of whether they have been loaded before.
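For example, to force a reload of every matching file in the source location (path hypothetical):

COPY INTO my_table
  FROM '/mnt/raw/events'
  FILEFORMAT = JSON
  COPY_OPTIONS ('force' = 'true')   -- reload files even if previously loaded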

Examples

COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')