dist-keras is an open-source framework for distributed training of Keras models (deep neural networks). It leverages Apache Spark to distribute and coordinate the training computation, and runs training directly on data in Spark DataFrames. dist-keras provides a built-in set of optimization strategies, such as Downpour and Dynamic SGD. To learn more about the available optimization strategies, see the dist-keras README.
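To illustrate what a Downpour-style strategy does: workers read a set of shared central parameters, compute a gradient on their own data shard, and push updates back independently. The following is a minimal, sequential NumPy sketch of that parameter-server update on a toy linear-regression problem; it is conceptual only and does not use the dist-keras or Spark APIs (all names and numbers here are illustrative).

```python
import numpy as np

# Conceptual sketch of Downpour-style SGD (not the dist-keras API):
# each "worker" pulls the central parameters, computes a gradient on
# its own shard, and pushes an update back. Real Downpour does this
# asynchronously across machines; here the workers run in turn.
rng = np.random.default_rng(0)

# Synthetic linear-regression data, split into one shard per worker.
true_w = np.array([2.0, -3.0])
X = rng.normal(size=(300, 2))
y = X @ true_w
shards = np.array_split(np.arange(300), 3)

w = np.zeros(2)  # central ("parameter server") weights
lr = 0.1

for step in range(100):
    for shard in shards:            # sequential stand-in for async workers
        local_w = w.copy()          # worker pulls current parameters
        Xs, ys = X[shard], y[shard]
        grad = 2 * Xs.T @ (Xs @ local_w - ys) / len(shard)
        w -= lr * grad              # worker pushes its update

print(np.round(w, 3))  # w converges toward true_w
```

Because each worker computes its gradient against a possibly stale copy of the parameters, updates from different workers can interleave; strategies such as Dynamic SGD vary how those asynchronous updates are weighted and applied.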
For single-machine training, see the Keras guide. For inference, we recommend that you use Deep Learning Pipelines, which leverages Spark to efficiently perform large-scale batch inference for Keras and TensorFlow models.
On CPU-only clusters, attach tensorflow to your cluster as a PyPI library.
When you use dist-keras on GPU-enabled clusters, you should leverage the tensorflow-gpu library to take advantage of GPU acceleration. However, dist-keras depends on the CPU-only build of TensorFlow by default. Therefore, we recommend that you build dist-keras as an egg modified to depend on tensorflow-gpu and attach it to your cluster as a library:
- Clone dist-keras to your local machine (git clone https://github.com/cerndb/dist-keras).
- Open setup.py (the Python file in the root project directory).
- Modify the install_requires keyword argument (an array of Python dependencies); specifically, replace tensorflow with tensorflow-gpu.
- From the root project directory, run python setup.py bdist_egg to build dist-keras as an egg.
- Upload the egg file (the only file in ./dist; for example, ./dist/dist_keras-0.2.1-py2.7.egg) to Databricks and attach it to your cluster.
- Install the following additional Python dependencies as PyPI libraries:
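The clone-modify-build steps above can be sketched in the shell. The stand-in setup.py below exists only to make the dependency edit concrete (the real dist-keras file lists more dependencies), and the sed pattern assumes the dependency appears as the bare string 'tensorflow'; inspect the actual file before relying on an automated edit.

```shell
# Scratch directory with a stand-in setup.py; in practice you would run
# the sed edit inside your clone of dist-keras instead.
workdir=$(mktemp -d)
cat > "$workdir/setup.py" <<'EOF'
from setuptools import setup

setup(
    name='dist-keras',
    install_requires=['keras', 'tensorflow'],
)
EOF

# Swap the CPU-only TensorFlow dependency for the GPU build.
sed -i "s/'tensorflow'/'tensorflow-gpu'/" "$workdir/setup.py"
grep install_requires "$workdir/setup.py"

# In a real clone, you would then build the egg from the project root:
#   python setup.py bdist_egg
# The egg appears in ./dist, ready to upload to Databricks.
```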
The following example notebook describes how the library works and demonstrates various training workflows. The notebook has been tested on GPU-enabled clusters using an egg built from commit 04cf7767e636cf614ea1fdb98753fe79647f81db of dist-keras.