FileStore

FileStore is a special folder within the Databricks File System (DBFS) where you can save files and make them accessible from your web browser. You can use FileStore to:

  • Save files, such as images and libraries, that are accessible within HTML and JavaScript when you call displayHTML.
  • Save output files that you want to download to your local desktop.
  • Upload CSVs and other data files from your local desktop to process on Databricks.

When you use certain features, Databricks puts files in the following folders under FileStore:

  • /FileStore/jars - contains libraries that you upload. If you delete files in this folder, libraries that reference these files in your workspace may no longer work.
  • /FileStore/tables - contains the files that you import using the UI. If you delete files in this folder, tables that you created from these files may no longer be accessible.
  • /FileStore/plots - contains images created in notebooks when you call display() on a Python or R plot object, such as a ggplot or matplotlib plot. If you delete files in this folder, you may have to regenerate those plots in the notebooks that reference them. See Matplotlib and ggplot2 for more information.
  • /FileStore/import-stage - contains temporary files created when you import notebooks or Databricks archive files. These temporary files disappear after the notebook import completes.

Save a file to FileStore

To save a file to FileStore, put it in the /FileStore directory in DBFS:

dbutils.fs.put("/FileStore/my-stuff/my-file.txt", "Contents of my file")
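On clusters that expose DBFS through the local FUSE mount at /dbfs, ordinary Python file I/O can also write into FileStore. The sketch below assumes that mount is available; the helper name save_to_filestore and its root parameter are hypothetical, and root is parameterized only so the function can be exercised outside Databricks.

```python
import os


def save_to_filestore(relative_path, contents, root="/dbfs/FileStore"):
    """Write text to <root>/<relative_path>, creating parent folders as needed.

    Assumes the DBFS FUSE mount at /dbfs (hypothetical helper, not a Databricks API).
    Returns the local path that was written.
    """
    local_path = os.path.join(root, relative_path)
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    with open(local_path, "w") as f:
        f.write(contents)
    return local_path
```

With the default root, save_to_filestore("my-stuff/my-file.txt", "Contents of my file") would write the same file as the dbutils.fs.put call above.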

In the following examples, replace <databricks-instance> with the workspace URL of your Databricks deployment.

Files stored in /FileStore are accessible in your web browser at https://<databricks-instance>/files/. For example, the file you stored in /FileStore/my-stuff/my-file.txt is accessible at https://<databricks-instance>/files/my-stuff/my-file.txt.

If your deployment URL contains ?o=, for example https://<databricks-instance>/?o=6280049833385130, append the same parameter to the file URL: use https://<databricks-instance>/files/my-stuff/my-file.txt?o=###### instead of https://<databricks-instance>/files/my-stuff/my-file.txt, where the number after o= is the same as in your deployment URL.
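The mapping from a FileStore path to its browser URL, including the optional ?o= parameter, can be written as a small helper. This is a sketch: filestore_url and its parameters are hypothetical names, and it assumes the input is a DBFS path under /FileStore, with or without the dbfs: prefix.

```python
def filestore_url(instance, dbfs_path, workspace_id=None):
    """Map a DBFS path under /FileStore to the URL that serves it in a browser.

    instance: workspace host, e.g. the <databricks-instance> placeholder.
    workspace_id: the number after ?o= in the deployment URL, if there is one.
    """
    path = dbfs_path
    if path.startswith("dbfs:"):
        path = path[len("dbfs:"):]
    prefix = "/FileStore/"
    if not path.startswith(prefix):
        raise ValueError("only files under /FileStore are served at /files/")
    # /FileStore/<rest> is served at https://<instance>/files/<rest>
    url = "https://%s/files/%s" % (instance, path[len(prefix):])
    if workspace_id is not None:
        url += "?o=%s" % workspace_id
    return url
```

For example, filestore_url("<databricks-instance>", "/FileStore/my-stuff/my-file.txt", 6280049833385130) returns "https://<databricks-instance>/files/my-stuff/my-file.txt?o=6280049833385130".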

Embed static images in notebooks

You can use the files/ location to embed static images into your notebooks:

displayHTML("<img src='files/image.jpg'>")

or use Markdown image syntax:

%md
![my_test_image](files/image.jpg)
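When the image location is known as a DBFS path, the relative files/ src used above can be derived from it. The following is a minimal sketch; filestore_img_html is a hypothetical helper, and it assumes the image lives under /FileStore.

```python
import html


def filestore_img_html(dbfs_path, alt=""):
    """Build an <img> tag for displayHTML from a DBFS path under /FileStore.

    The src is relative ("files/..."), matching the displayHTML example above.
    """
    path = dbfs_path[len("dbfs:"):] if dbfs_path.startswith("dbfs:") else dbfs_path
    prefix = "/FileStore/"
    if not path.startswith(prefix):
        raise ValueError("image must be stored under /FileStore")
    src = "files/" + path[len(prefix):]
    # Escape attribute values so unusual file names cannot break the markup.
    return '<img src="%s" alt="%s">' % (html.escape(src, quote=True), html.escape(alt, quote=True))
```

In a notebook you would then call displayHTML(filestore_img_html("/FileStore/image.jpg")).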

You can upload static images using the DBFS API and the requests Python HTTP library. In the following example:

  • Replace <databricks-instance> with the workspace URL of your Databricks deployment.
  • Replace <token> with the value of your personal access token.
  • Replace <image-dir> with the location in FileStore where you want to upload the image files.

import json
import os
from base64 import b64encode

import requests

TOKEN = '<token>'
headers = {'Authorization': 'Bearer %s' % TOKEN}
url = "https://<databricks-instance>/api/2.0"
dbfs_dir = "dbfs:/FileStore/<image-dir>/"

def perform_query(path, headers, data=None):
  # POST a JSON body to a DBFS API endpoint; raise on HTTP errors.
  resp = requests.post(url + path, data=json.dumps(data or {}), headers=headers)
  resp.raise_for_status()
  return resp.json()

def mkdirs(path, headers):
  return perform_query('/dbfs/mkdirs', headers=headers, data={'path': path})

def create(path, overwrite, headers):
  # Opens a stream for writing; the response contains a handle.
  return perform_query('/dbfs/create', headers=headers,
                       data={'path': path, 'overwrite': overwrite})

def add_block(handle, data, headers):
  # Appends a block of base64-encoded data to the open stream.
  return perform_query('/dbfs/add-block', headers=headers,
                       data={'handle': handle, 'data': data})

def close(handle, headers):
  # Closes the stream, which completes the upload.
  return perform_query('/dbfs/close', headers=headers, data={'handle': handle})

def put_file(src_path, dbfs_path, overwrite, headers):
  handle = create(dbfs_path, overwrite, headers=headers)['handle']
  print("Putting file: " + dbfs_path)
  with open(src_path, 'rb') as local_file:
    while True:
      # Read the local file in 1 MB (2**20-byte) chunks.
      contents = local_file.read(2**20)
      if len(contents) == 0:
        break
      add_block(handle, b64encode(contents).decode(), headers=headers)
  close(handle, headers=headers)

mkdirs(path=dbfs_dir, headers=headers)
# Upload every .png file in the current local directory.
files = [f for f in os.listdir('.') if os.path.isfile(f)]
for f in files:
  if f.endswith(".png"):
    target_path = dbfs_dir + f
    # put_file raises requests.HTTPError if any DBFS API call fails.
    put_file(src_path=f, dbfs_path=target_path, overwrite=True, headers=headers)
    print("Success")
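The core of put_file is splitting the local file into 2**20-byte pieces and base64-encoding each piece before it is sent to /dbfs/add-block. The generator below isolates that step so it can be checked without a workspace; iter_blocks is a hypothetical helper, not part of the DBFS API.

```python
from base64 import b64encode


def iter_blocks(stream, block_size=2**20):
    """Yield one base64-encoded string per block_size chunk of a binary stream."""
    while True:
        chunk = stream.read(block_size)
        if not chunk:
            break  # end of file
        yield b64encode(chunk).decode("ascii")
```

Inside put_file, the read loop could then become: for block in iter_blocks(local_file): add_block(handle, block, headers=headers). Keeping the chunking in a generator means the whole file is never held in memory at once.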

Scale static images

To scale an image that you have saved to DBFS, copy the image to /FileStore and then set its size with the width and height style properties of the <img> tag in displayHTML:

dbutils.fs.cp('dbfs:/user/experimental/MyImage-1.png','dbfs:/FileStore/images/')
displayHTML('''<img src="files/images/MyImage-1.png" style="width:600px;height:600px;">''')
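Fixing both width and height can distort an image whose proportions differ from the target box. The sketch below builds the same kind of tag, but lets you omit the height so the browser keeps the original aspect ratio; scaled_img_html is a hypothetical helper name.

```python
import html


def scaled_img_html(src, width_px, height_px=None):
    """Return an <img> tag sized with inline CSS, for use with displayHTML.

    If height_px is None, only the width is set, so the browser scales the
    height to preserve the image's aspect ratio.
    """
    style = "width:%dpx;" % width_px
    if height_px is not None:
        style += "height:%dpx;" % height_px
    return '<img src="%s" style="%s">' % (html.escape(src, quote=True), style)
```

For example, displayHTML(scaled_img_html("files/images/MyImage-1.png", 600)) shows the image 600 pixels wide without stretching it.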

Use a JavaScript library

The following notebook shows how to use FileStore to host a JavaScript library.
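One common pattern is to upload the library file under /FileStore and load it from a <script> tag through the files/ path, then run your own code that uses it. The sketch below only builds that HTML string for displayHTML; it assumes the library was already copied to a location such as /FileStore/js/, and both the path files/js/my-library.js and the helper name script_html are hypothetical.

```python
import html


def script_html(lib_src, init_js):
    """Build HTML that loads a JavaScript library from FileStore, then runs init_js."""
    return (
        '<script src="%s"></script>\n'
        "<script>%s</script>" % (html.escape(lib_src, quote=True), init_js)
    )
```

In a notebook this would be passed to displayHTML, for example displayHTML(script_html("files/js/my-library.js", "...")), where the second argument is whatever code calls into the library.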

FileStore demo notebook
