renv on Databricks

renv is an R package that lets users manage R dependencies specific to the notebook.

Using renv, you can create and manage the R library environment for your project, save the state of these libraries to a lockfile, and later restore libraries as required. Together, these tools can help make projects more isolated, portable, and reproducible.

Basic renv workflow

Install renv

You can install renv as a cluster-scoped library or as a notebook-scoped library. To install renv as a notebook-scoped library, use:

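# Load devtools, which provides install_version()
library(devtools)

# Install renv into the notebook-scoped library from a CRAN mirror
install_version(
  package = "renv",
  repos   = "http://cran.us.r-project.org"
)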

Databricks recommends using a CRAN snapshot as the repository to fix the package version.
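As a rough illustration of that recommendation, the sketch below pins both the repository and the package version. The snapshot date, the renv version, the helper name install_pinned_renv, and the Posit Package Manager URL pattern are all assumptions chosen for the example. Substitute the snapshot service and version that your organization uses.

```r
# Hypothetical values: choose the snapshot date and renv version you want to pin.
snapshot_date <- "2023-06-01"
renv_version  <- "0.17.3"

# Assumption: Posit Package Manager serves dated CRAN snapshots at this URL form.
snapshot_repo <- paste0("https://packagemanager.posit.co/cran/", snapshot_date)

# Hypothetical helper: installs the pinned renv version from the snapshot repository.
install_pinned_renv <- function(version = renv_version, repos = snapshot_repo) {
  devtools::install_version(package = "renv", version = version, repos = repos)
}
```

Calling install_pinned_renv() in a notebook cell would then perform the installation. Because the repository URL is tied to a date, rerunning the cell later should resolve the same package versions.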

Initialize renv session with pre-installed R libraries

The first step when using renv is to initialize a session using renv::init(). Then set .libPaths() so that packages are downloaded to your notebook-scoped R library path by default.

renv::init(settings = list(external.libraries=.libPaths()))
.libPaths(c(.libPaths()[2], .libPaths()))
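To see what the .libPaths() call does, the following sketch reproduces the reordering on a plain character vector. The three paths are made up for illustration, and it assumes that the second entry is the notebook-scoped library, which is what the text above implies. The actual order on your cluster may differ, so it is worth printing .libPaths() first.

```r
# Hypothetical stand-in for the value of .libPaths() right after renv::init()
paths <- c(
  "/renv/project/library",     # assumed: renv project library
  "/notebook/scoped/library",  # assumed: notebook-scoped library
  "/databricks/spark/R/lib"    # assumed: pre-installed runtime library
)

# Same expression as in the call above: put the second entry first.
reordered <- c(paths[2], paths)

# .libPaths() removes duplicate entries, so the effective search order is:
effective <- unique(reordered)
print(effective)
```

Because R installs into the first entry of the library search path by default, moving the notebook-scoped path to the front makes it the default installation target.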

Use renv to install additional packages

You can now use renv’s API to install and remove R packages. For example, to install the latest version of digest, run the following inside of a notebook cell.

renv::install("digest")

To install an old version of digest, run the following inside of a notebook cell.

renv::install("digest@0.6.18")

To install digest from GitHub, run the following inside of a notebook cell.

renv::install("eddelbuettel/digest")

To install a package from Bioconductor, run the following inside of a notebook cell.

# (note: requires the BiocManager package)
renv::install("bioc::Biobase")

Note that the renv::install API uses the renv Cache.

Use renv to save your R notebook environment to DBFS

Run the following command once before saving the environment.

renv::settings$snapshot.type("all")

This sets renv to snapshot all packages that are installed into libPaths, not just the ones that are currently used in the notebook. See renv documentation for more information.

Now you can run the following inside of a notebook cell to save the current state of your environment.

renv::snapshot(lockfile="/dbfs/PATH/TO/WHERE/YOU/WANT/TO/SAVE/renv.lock", force=TRUE)

This updates the lockfile by capturing all packages installed on libPaths. It also moves your lockfile from the local filesystem to DBFS, where it persists even if your cluster terminates or restarts.
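An renv lockfile is a JSON document, so you can inspect what was captured without restoring it. The sketch below assumes the jsonlite package is available and, to stay self-contained, writes a small illustrative lockfile to a temporary file. On Databricks you would instead point lockfile_path at the DBFS path that you passed to renv::snapshot().

```r
library(jsonlite)

# Illustrative lockfile contents; a real lockfile is written by renv::snapshot().
lockfile_path <- tempfile(fileext = ".lock")
writeLines('{
  "R": {
    "Version": "4.2.2",
    "Repositories": [{"Name": "CRAN", "URL": "https://cran.r-project.org"}]
  },
  "Packages": {
    "digest": {
      "Package": "digest",
      "Version": "0.6.18",
      "Source": "Repository",
      "Repository": "CRAN"
    }
  }
}', lockfile_path)

# Parse the lockfile as nested lists.
lock <- fromJSON(lockfile_path, simplifyVector = FALSE)

# Build a named vector mapping each recorded package to its version.
versions <- vapply(lock$Packages, function(p) p$Version, character(1))
print(versions)
```

Listing the recorded versions this way is a quick check that the snapshot captured the packages you expect before you rely on it from another cluster.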

Reinstall a renv environment given a lockfile from DBFS

First, make sure that your new cluster is running the same Databricks Runtime version as the cluster on which you first created the renv environment. This ensures that the pre-installed R packages are identical. You can find a list of these packages in each runtime’s release notes. After you install renv, run the following inside of a notebook cell.

renv::init(settings = list(external.libraries=.libPaths()))
.libPaths(c(.libPaths()[2], .libPaths()))
renv::restore(lockfile="/dbfs/PATH/TO/WHERE/YOU/SAVED/renv.lock", exclude=c("Rserve", "SparkR"))

This copies your lockfile from DBFS into the local file system and then restores any packages specified in the lockfile.

Note

To avoid missing repository errors, exclude the Rserve and SparkR packages from package restoration. Both of these packages are pre-installed in all runtimes.

renv Cache

A very useful feature of renv is its global package cache, which is shared across all renv projects on the cluster. It speeds up installation times and saves disk space. The renv cache does not cache packages downloaded through the devtools API or through install.packages() called with any arguments other than pkgs.