Model inference using TensorFlow and TensorRT

NVIDIA TensorRT is a high-performance inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT is installed in the GPU-enabled version of Databricks Runtime 7.0 and above.

The following notebook demonstrates the deep learning inference workflow recommended by Databricks. This example shows how to optimize a trained ResNet-50 model with TensorRT for model inference.

Databricks recommends that you use the G4 instance type series, which is optimized for deploying machine learning models in production.

Model inference TensorFlow-TensorRT notebook