Due to licensing restrictions, the LZO compression codec is not available by default on Databricks clusters. To read an LZO compressed file, you must use an init script to install the codec on your cluster at launch time.
This article includes two notebooks:
- Init LZO compressed files
  - Builds the LZO codec.
  - Creates an init script that:
    - Installs the LZO compression libraries and the lzop command, and copies the LZO codec to the proper class path.
    - Configures Spark to use the LZO compression codec.
- Read LZO compressed files - Uses the codec installed by the init script.
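
As a rough illustration of what the first notebook's init script and Spark settings might look like, here is a minimal Python sketch. It is not the notebook's actual code. The helper names `render_init_script` and `lzo_spark_conf`, the Debian/Ubuntu package names, and both file paths are assumptions made for this example. The codec class names and the `io.compression.*` keys come from the hadoop-lzo project; the `spark.hadoop.` prefix is the usual way to pass a Hadoop property through Spark.

```python
import shlex

# Codec classes to register: two Hadoop defaults plus the hadoop-lzo codecs.
CODECS = [
    "org.apache.hadoop.io.compress.DefaultCodec",
    "org.apache.hadoop.io.compress.GzipCodec",
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
]


def lzo_spark_conf() -> dict:
    """Return Spark settings that register the LZO codecs with Hadoop."""
    return {
        "spark.hadoop.io.compression.codecs": ",".join(CODECS),
        "spark.hadoop.io.compression.codec.lzo.class":
            "com.hadoop.compression.lzo.LzoCodec",
    }


def render_init_script(jar_source: str, jar_dest_dir: str) -> str:
    """Render a bash init script that installs LZO tooling and copies the codec jar.

    Both arguments are hypothetical paths chosen by the caller.
    """
    lines = [
        "#!/bin/bash",
        "set -e",
        "# Install the LZO libraries and the lzop command (package names assumed).",
        "apt-get update",
        "apt-get install -y liblzo2-dev lzop",
        "# Copy the previously built codec jar onto the class path.",
        f"cp {shlex.quote(jar_source)} {shlex.quote(jar_dest_dir)}/",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder paths; substitute the locations used on your cluster.
    print(render_init_script("/dbfs/tmp/hadoop-lzo.jar", "/databricks/jars"))
    for key, value in lzo_spark_conf().items():
        print(f"{key} {value}")
```

Rendering the script as a string makes it easy to inspect before saving it to the location your cluster reads init scripts from. The settings returned by `lzo_spark_conf` could be supplied as cluster Spark configuration or written out by the init script itself, depending on how the cluster is set up.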