If your Databricks cluster or SQL warehouse doesn’t have permissions to read your source files, you can use temporary credentials to access data from external cloud object storage and load files into a Delta Lake table.
Depending on how your organization manages your cloud security, you may need to ask a cloud administrator or power user to provide you with credentials.
Credential and encryption options are available in Databricks Runtime 10.2 and above.
COPY INTO supports:
Azure SAS tokens to read data from ADLS Gen2 and Azure Blob Storage. Azure Blob Storage temporary tokens are at the container level, whereas ADLS Gen2 tokens can be at the directory level in addition to the container level. Databricks recommends using directory level SAS tokens when possible. The SAS token must have “Read”, “List”, and “Permissions” permissions.
AWS STS tokens to read data from AWS S3. Your tokens should have the “s3:GetObject*”, “s3:ListBucket”, and “s3:GetBucketLocation” permissions.
To avoid misuse or exposure of temporary credentials, Databricks recommends that you set expiration horizons that are just long enough to complete the task.
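As a rough illustration of sizing an expiration horizon, the sketch below computes a token expiry just long enough to cover an expected load. The durations are assumptions for the example, not recommendations; in practice you would base them on how long your own loads take.

```python
from datetime import datetime, timedelta, timezone

# Assumed durations for illustration only.
estimated_load = timedelta(minutes=30)  # expected COPY INTO runtime, e.g. from prior runs
safety_buffer = timedelta(minutes=10)   # small cushion for retries

def token_expiry(start: datetime) -> datetime:
    """Return an expiry timestamp just long enough to complete the task."""
    return start + estimated_load + safety_buffer

start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(token_expiry(start).isoformat())  # 2024-01-01T12:40:00+00:00
```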
COPY INTO supports loading encrypted data from AWS S3. To load encrypted data, provide the type of encryption and the key to decrypt the data.
The following examples load data from S3 and ADLS Gen2 using temporary credentials to access the source data.
COPY INTO my_json_data
FROM 's3://my-bucket/jsonData' WITH (
  CREDENTIAL (AWS_ACCESS_KEY = '...', AWS_SECRET_KEY = '...', AWS_SESSION_TOKEN = '...')
)
FILEFORMAT = JSON

COPY INTO my_json_data
FROM 'abfss://container@storageAccount.dfs.core.windows.net/jsonData' WITH (
  CREDENTIAL (AZURE_SAS_TOKEN = '...')
)
FILEFORMAT = JSON
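If you assemble such statements programmatically, for example in a notebook that fetches short-lived credentials at run time, a small helper keeps the statement in one place. The Python sketch below is illustrative only: the helper name and parameters are assumptions, and the resulting string would typically be passed to spark.sql() in a Databricks notebook rather than printed.

```python
def copy_into_with_temp_creds(table, source_path, access_key, secret_key, session_token):
    """Build a COPY INTO statement that reads JSON from S3 with temporary credentials.

    Illustrative helper, not part of any Databricks API. In a notebook you would
    execute the result with spark.sql(stmt); avoid logging it, since it embeds
    the (temporary) credentials in plain text.
    """
    return (
        f"COPY INTO {table}\n"
        f"FROM '{source_path}' WITH (\n"
        f"  CREDENTIAL (AWS_ACCESS_KEY = '{access_key}', "
        f"AWS_SECRET_KEY = '{secret_key}', AWS_SESSION_TOKEN = '{session_token}')\n"
        f")\n"
        f"FILEFORMAT = JSON"
    )

stmt = copy_into_with_temp_creds(
    "my_json_data", "s3://my-bucket/jsonData",
    "AKIA...", "...", "...",
)
print(stmt)
```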
The following example loads data from S3 using customer-provided encryption keys.
COPY INTO my_json_data
FROM 's3://my-bucket/jsonData' WITH (
  ENCRYPTION (TYPE = 'AWS_SSE_C', MASTER_KEY = '...')
)
FILEFORMAT = JSON
The following example loads JSON data from a file on AWS S3 into the external Delta table my_json_data. This table must be created before COPY INTO can be executed. The command uses one existing credential to write to the external Delta table and another to read from the S3 location.
COPY INTO my_json_data WITH (CREDENTIAL target_credential)
FROM 's3://my-bucket/jsonData' WITH (CREDENTIAL source_credential)
FILEFORMAT = JSON
FILES = ('f.json')