Distributed Deep Learning with dist-keras¶
dist-keras is an open-source framework for distributed training of Keras models (deep neural networks). It leverages Apache Spark to distribute and coordinate the training computation, and runs training directly on data in Spark DataFrames. dist-keras provides a built-in set of optimization strategies, such as Downpour and Dynamic SGD. To learn more about the available optimization strategies, see the dist-keras README.
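As a rough illustration of training on a Spark DataFrame, the following sketch wraps the DOWNPOUR trainer in a small helper. The trainer arguments follow the usage shown in the dist-keras README and example notebooks, and they may differ in the version you install. The column names, worker count, and the helper functions themselves are hypothetical and exist only for this example.

```python
# Default trainer settings for this sketch. The column names are hypothetical:
# they must match the feature and label columns of your own DataFrame.
DOWNPOUR_DEFAULTS = {
    "worker_optimizer": "adagrad",
    "loss": "categorical_crossentropy",
    "num_workers": 4,
    "batch_size": 32,
    "communication_window": 5,
    "num_epoch": 1,
    "features_col": "features",
    "label_col": "label",
}


def trainer_kwargs(**overrides):
    """Return the default trainer settings with any overrides applied."""
    kwargs = dict(DOWNPOUR_DEFAULTS)
    kwargs.update(overrides)
    return kwargs


def train_with_downpour(keras_model, training_df, **overrides):
    """Train keras_model on a Spark DataFrame and return the trained model.

    Assumes dist-keras is attached to the cluster. The import is done lazily
    so that this module can be loaded on machines without dist-keras.
    """
    from distkeras.trainers import DOWNPOUR

    trainer = DOWNPOUR(keras_model=keras_model, **trainer_kwargs(**overrides))
    return trainer.train(training_df)
```

On a cluster you would call something like train_with_downpour(model, training_df, num_workers=8), where model is a Keras model and training_df is a DataFrame that already contains the feature and label columns.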
For single-machine training, see the Keras guide. For inference, we recommend that you use Deep Learning Pipelines, which leverages Spark to efficiently perform large-scale batch inference for Keras and TensorFlow models.
Installing dist-keras is a two-step process:
- Specify network configurations on the driver and Spark workers.
- Install the dist-keras library.
Specifying network configuration¶
For both CPU- and GPU-enabled clusters, dist-keras requires that you specify additional networking configurations on the driver and Spark workers before you install the library itself. We recommend that you set this configuration through an init script. The notebook below demonstrates how:
Installing on CPU clusters¶
On CPU-only clusters, attach dist-keras to your cluster as a PyPI library.
Installing on GPU clusters¶
When you use dist-keras on GPU-enabled clusters, you should use the tensorflow-gpu library to take advantage of GPU acceleration. However, dist-keras depends on the CPU-only build of TensorFlow by default. We therefore recommend that you build dist-keras as an egg modified to depend on tensorflow-gpu and attach it to your cluster as a library:
- Clone dist-keras to your local machine (git clone https://github.com/cerndb/dist-keras).
- Open setup.py (the Python file in the root project directory).
- Modify the install_requires keyword argument (a list of Python dependencies); specifically, replace the tensorflow dependency with tensorflow-gpu.
- From the root project directory, run python setup.py bdist_egg to build dist-keras as an egg.
- Upload the egg file (the only file in ./dist; for example, ./dist/dist_keras-0.2.1-py2.7.egg) to Databricks and attach it to your cluster.
- Install the following additional Python dependencies as PyPI libraries:
The example notebook below has been tested on GPU-enabled clusters using an egg built from commit 04cf7767e636cf614ea1fdb98753fe79647f81db of dist-keras.
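If you prefer to script the setup.py change from the GPU build steps rather than edit the file by hand, a minimal sketch follows. It assumes the dependency appears in setup.py as a quoted string beginning with tensorflow; the helper name and the sample line are hypothetical and are not taken from the dist-keras repository.

```python
import re

# Matches a quoted requirement whose name is exactly "tensorflow", optionally
# followed by a version specifier (for example ">=1.0"). Names such as
# "tensorflow-gpu" or "tensorflow_hub" are deliberately not matched.
_TF_REQUIREMENT = re.compile(r"""(['"])tensorflow(?![\w-])([^'"]*)\1""")


def use_gpu_tensorflow(setup_py_source):
    """Return the setup.py text with tensorflow replaced by tensorflow-gpu."""
    return _TF_REQUIREMENT.sub(r"\1tensorflow-gpu\2\1", setup_py_source)


# Hypothetical sample line, used only to show the transformation.
sample = "install_requires=['tensorflow', 'keras'],"
print(use_gpu_tensorflow(sample))
```

To apply it to a real checkout, read setup.py from the root project directory, pass its contents through use_gpu_tensorflow, and write the result back before you run the egg build. Because tensorflow-gpu is never matched, running the helper twice leaves the file unchanged.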