NVIDIA TensorRT is a high-performance inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT is installed in the GPU-enabled version of Databricks Runtime 7.0 (Unsupported) and above.
The following notebook demonstrates the Databricks-recommended deep learning inference workflow. The example shows how to optimize a trained ResNet-50 model with TensorRT for model inference.
Databricks recommends using the G4 instance type series, which is optimized for deploying machine learning models in production.
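As a rough illustration of the TensorRT optimization step mentioned above, the following minimal sketch converts a TensorFlow SavedModel with the TF-TRT converter that ships with TensorFlow. It is not taken from the notebook: the use of TF-TRT, the helper name `optimize_saved_model`, the `SUPPORTED_PRECISIONS` tuple, and the example paths are all assumptions made for illustration, and the converter's constructor arguments may differ between TensorFlow versions.

```python
# Precision modes this sketch accepts. INT8 is left out because it
# additionally requires a calibration step.
SUPPORTED_PRECISIONS = ("FP32", "FP16")


def optimize_saved_model(input_dir, output_dir, precision="FP16"):
    """Convert a TensorFlow SavedModel into a TensorRT-optimized SavedModel.

    Hypothetical helper: input_dir and output_dir are placeholder paths
    chosen by the caller.
    """
    if precision not in SUPPORTED_PRECISIONS:
        raise ValueError(
            f"precision must be one of {SUPPORTED_PRECISIONS}, got {precision!r}"
        )

    # Imported lazily so the argument check above runs even on a machine
    # without TensorFlow or a GPU.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    # Start from the default conversion parameters and override the precision.
    params = trt.DEFAULT_TRT_CONVERSION_PARAMS._replace(precision_mode=precision)
    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=input_dir,
        conversion_params=params,
    )
    converter.convert()          # replace supported subgraphs with TensorRT ops
    converter.save(output_dir)   # write the optimized SavedModel to disk
    return output_dir


# Example usage on a GPU cluster (paths are placeholders):
#
#   import tensorflow as tf
#   model = tf.keras.applications.ResNet50(weights="imagenet")
#   tf.saved_model.save(model, "/tmp/resnet50_saved_model")
#   optimize_saved_model("/tmp/resnet50_saved_model", "/tmp/resnet50_trt")
```

The optimized SavedModel can then be loaded with `tf.saved_model.load` and used for inference like the original model. FP16 is used as the default here only as an example; the right precision depends on the accuracy requirements of the application.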