COPY INTO (Delta Lake on Databricks)

Preview

This feature is in Public Preview.

Loads data from a file location into a Delta table. This is a retriable and idempotent operation: files in the source location that have already been loaded are skipped.
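
For instance, a minimal invocation (the table name and source path below are hypothetical) could look as follows; rerunning the same statement loads only files that have appeared in the source location since the previous run:

COPY INTO my_delta_table         -- hypothetical target table
  FROM '/mnt/raw/events'         -- hypothetical source location
  FILEFORMAT = JSON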

Syntax

COPY INTO table_identifier
  FROM [ file_location | (SELECT identifier_list FROM file_location) ]
  FILEFORMAT = data_source
  [FILES = (file_name [, ...]) | PATTERN = 'regex_pattern']
  [FORMAT_OPTIONS ('data_source_reader_option' = 'value', ...)]
  [COPY_OPTIONS ('force' = {'false' | 'true'})]

  • table_identifier

    • [database_name.] table_name: A table name, optionally qualified with a database name.
    • delta.`<path-to-table>`: The location of an existing Delta table.
  • FROM file_location

    The file location to load the data from. Files in this location must have the format specified in FILEFORMAT.

  • SELECT identifier_list

    Selects the specified columns or expressions from the source data before copying into the Delta table.

  • FILEFORMAT = data_source

    The format of the source files to load. One of CSV, JSON, AVRO, ORC, PARQUET.

  • FILES

    A list of up to 1,000 file names to load. Cannot be specified together with PATTERN. See the sketch after this list.

  • PATTERN

    A regex pattern that identifies the files to load from the source directory. Cannot be specified with FILES.

  • FORMAT_OPTIONS

    Options to be passed to the Apache Spark data source reader for the specified format.

  • COPY_OPTIONS

    Options to control the operation of the COPY INTO command. The only supported option is 'force'; when set to 'true', idempotency is disabled and files are loaded regardless of whether they have been loaded before (see the second example under Examples below).
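
As referenced in the FILES description above, a minimal sketch of loading an explicit list of files (the paths and file names here are hypothetical):

COPY INTO delta.`target_path`
  FROM 'source_path'
  FILEFORMAT = JSON
  FILES = ('f1.json', 'f2.json')  -- hypothetical file names; at most 1,000 entries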

Examples

COPY INTO delta.`target_path`
  FROM (SELECT key, index, textData, 'constant_value' FROM 'source_path')
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')
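
A second sketch, reusing the placeholder paths from the example above: if already-loaded files must be reloaded, for example after the source data was corrected in place, COPY_OPTIONS disables the idempotency check.

COPY INTO delta.`target_path`
  FROM 'source_path'
  FILEFORMAT = CSV
  PATTERN = 'folder1/file_[a-g].csv'
  FORMAT_OPTIONS('header' = 'true')
  COPY_OPTIONS('force' = 'true')  -- reload matching files even if previously loaded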