FileStore
Important
This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported.
FileStore is a special folder within DBFS where you can save files and have them accessible to your web browser. You can use FileStore to:
Save files, such as images and libraries, that are accessible within HTML and JavaScript when you call displayHTML.
Save output files that you want to download to your local desktop.
Upload CSVs and other data files from your local desktop to process on Databricks.
When you use certain features, Databricks puts files in the following folders under FileStore:
/FileStore/jars - contains uploaded legacy workspace libraries. If you delete files in this folder, libraries that reference these files in your workspace may no longer work.
/FileStore/tables - contains the files that you import using the UI. If you delete files in this folder, tables that you created from these files may no longer be accessible.
Important
Libraries can be installed from DBFS when using Databricks Runtime 14.3 LTS and below. However, any workspace user can modify library files stored in DBFS. To improve the security of libraries in a Databricks workspace, storing library files in the DBFS root is deprecated and disabled by default in Databricks Runtime 15.1 and above. See Storing libraries in DBFS root is deprecated and disabled by default.
Instead, Databricks recommends uploading all libraries, including Python libraries, JAR files, and Spark connectors, to workspace files or Unity Catalog volumes, or using library package repositories. If your workload does not support these patterns, you can also use libraries stored in cloud object storage.
Save a file to FileStore
You can use dbutils.fs.put to write arbitrary text files to the /FileStore directory in DBFS:
dbutils.fs.put("/FileStore/my-stuff/my-file.txt", "This is the actual text that will be saved to disk. Like a 'Hello world!' example")
In the following, replace <databricks-instance> with the workspace URL of your Databricks deployment.
Files stored in /FileStore are accessible in your web browser at https://<databricks-instance>/files/. For example, the file you stored in /FileStore/my-stuff/my-file.txt is accessible at https://<databricks-instance>/files/my-stuff/my-file.txt.
However, if there is ?o= in the deployment URL, for example, https://<databricks-instance>/?o=6280049833385130, replace https://<databricks-instance>/files/my-stuff/my-file.txt with https://<databricks-instance>/files/my-stuff/my-file.txt?o=###### where the number after o= is the same as in your URL.
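The mapping from a DBFS path to a browser URL described above can be sketched as a small helper. This is only an illustration, not part of any Databricks library: the function name filestore_url, its parameters, and the host name my-workspace.example.com are all hypothetical.

```python
from typing import Optional
from urllib.parse import quote


def filestore_url(instance: str, dbfs_path: str, org_id: Optional[str] = None) -> str:
    """Build the browser URL for a file stored under /FileStore.

    Hypothetical helper for illustration. `instance` is the workspace host
    name and `org_id` is the number after `o=` in the deployment URL, if any.
    """
    # Accept both "dbfs:/FileStore/..." and "/FileStore/..." forms.
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    prefix = "/FileStore/"
    if not path.startswith(prefix):
        raise ValueError("path must be under /FileStore: " + dbfs_path)
    # /FileStore/<rest> is served at /files/<rest>.
    url = "https://" + instance + "/files/" + quote(path[len(prefix):])
    if org_id:
        # Deployments whose URL carries ?o= need the same value appended.
        url += "?o=" + org_id
    return url


print(filestore_url("my-workspace.example.com", "/FileStore/my-stuff/my-file.txt"))
```

Passing org_id appends the ?o= query parameter, which matches the case where the deployment URL contains it.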
Note
You can also use the DBFS file upload interfaces to put files in the /FileStore directory. See Explore and create tables in DBFS.
Embed static images in notebooks
You can use the files/ location to embed static images into your notebooks:
displayHTML("<img src ='files/image.jpg'>")
or Markdown image import syntax:
%md
![my_test_image](files/image.jpg)
Example using Markdown
For example, suppose you have the Databricks logo image file in FileStore:
dbfs ls dbfs:/FileStore/
databricks-logo-mobile.png
When you reference that file from a Markdown cell using the image syntax shown above, for example ![my_test_image](files/databricks-logo-mobile.png), the image is rendered in the cell.
Example using DBFS API and requests Python HTTP library
You can upload static images using the DBFS API and the requests Python HTTP library. In the following example:
Replace <databricks-instance> with the workspace URL of your Databricks deployment.
Replace <token> with the value of your personal access token.
Replace <image-dir> with the location in FileStore where you want to upload the image files.
Note
As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens.
If you use personal access token authentication, Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
import json
import os
from base64 import b64encode

import requests

TOKEN = '<token>'
headers = {'Authorization': 'Bearer %s' % TOKEN}
url = "https://<databricks-instance>/api/2.0"
dbfs_dir = "dbfs:/FileStore/<image-dir>/"

def perform_query(path, headers, data=None):
    # POST a JSON payload to a DBFS API endpoint and return the parsed JSON response.
    session = requests.Session()
    resp = session.request('POST', url + path, data=json.dumps(data or {}), verify=True, headers=headers)
    return resp.json()

def mkdirs(path, headers):
    # Create the target directory, including any missing parents.
    _data = {}
    _data['path'] = path
    return perform_query('/dbfs/mkdirs', headers=headers, data=_data)

def create(path, overwrite, headers):
    # Open a write stream; the response contains the stream handle.
    _data = {}
    _data['path'] = path
    _data['overwrite'] = overwrite
    return perform_query('/dbfs/create', headers=headers, data=_data)

def add_block(handle, data, headers):
    # Append one base64-encoded block to the open stream.
    _data = {}
    _data['handle'] = handle
    _data['data'] = data
    return perform_query('/dbfs/add-block', headers=headers, data=_data)

def close(handle, headers):
    # Close the stream identified by the handle.
    _data = {}
    _data['handle'] = handle
    return perform_query('/dbfs/close', headers=headers, data=_data)

def put_file(src_path, dbfs_path, overwrite, headers):
    # Upload a local file in 1 MB blocks; returns None once the stream is closed.
    handle = create(dbfs_path, overwrite, headers=headers)['handle']
    print("Putting file: " + dbfs_path)
    with open(src_path, 'rb') as local_file:
        while True:
            contents = local_file.read(2**20)
            if len(contents) == 0:
                break
            add_block(handle, b64encode(contents).decode(), headers=headers)
    close(handle, headers=headers)

mkdirs(path=dbfs_dir, headers=headers)

# Upload every PNG file in the current working directory.
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
    if f.endswith(".png"):
        target_path = dbfs_dir + f
        resp = put_file(src_path=f, dbfs_path=target_path, overwrite=True, headers=headers)
        if resp is None:
            print("Success")
        else:
            print(resp)
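The block-by-block upload in the example above comes down to reading the file in fixed-size chunks and base64-encoding each chunk before sending it. A minimal sketch of just that step, using an in-memory stream so it runs without a workspace, might look like the following. The name iter_base64_blocks and the 4-byte demo block size are assumptions made for illustration.

```python
import io
from base64 import b64encode
from typing import BinaryIO, Iterator


def iter_base64_blocks(stream: BinaryIO, block_size: int = 2**20) -> Iterator[str]:
    """Yield base64-encoded text blocks read from a binary stream.

    Hypothetical helper: each yielded string corresponds to what the
    example above passes as the `data` field of an add-block request.
    """
    while True:
        contents = stream.read(block_size)
        if not contents:
            # An empty read means the end of the stream was reached.
            break
        yield b64encode(contents).decode()


# Small in-memory demonstration with a deliberately tiny block size.
blocks = list(iter_base64_blocks(io.BytesIO(b"hello world"), block_size=4))
print(blocks)
```

Because each block is encoded independently, the receiving side can decode and append the blocks in order to reconstruct the original bytes.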
Scale static images
To scale the size of an image that you have saved to DBFS, copy the image to /FileStore and then resize using image parameters in displayHTML:
dbutils.fs.cp('dbfs:/user/experimental/MyImage-1.png','dbfs:/FileStore/images/')
displayHTML('''<img src="files/images/MyImage-1.png" style="width:600px;height:600px;">''')
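If you scale several images, building the tag from parameters can keep the sizes consistent. The following is a sketch under that assumption; scaled_img_tag is a hypothetical helper, not a Databricks function, and it only produces the HTML string.

```python
from html import escape


def scaled_img_tag(src: str, width_px: int, height_px: int) -> str:
    """Return an <img> tag with an inline width/height style.

    Hypothetical helper; `src` is a path relative to the notebook, such as
    "files/images/MyImage-1.png".
    """
    style = "width:{}px;height:{}px;".format(width_px, height_px)
    # Escape the path so quotes or angle brackets cannot break the attribute.
    return '<img src="{}" style="{}">'.format(escape(src, quote=True), style)


html = scaled_img_tag("files/images/MyImage-1.png", 600, 600)
print(html)
```

In a notebook, you would pass the resulting string to displayHTML.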