Model inference using TensorFlow and TensorRT
The example notebook in this article demonstrates the Databricks-recommended deep learning inference workflow with TensorFlow and TensorRT. It shows how to optimize a trained ResNet-50 model with TensorRT for model inference.
NVIDIA TensorRT is a high-performance inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT is installed in the GPU-enabled version of Databricks Runtime for Machine Learning.
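The snippet below is a minimal sketch of this workflow, assuming a GPU cluster running Databricks Runtime for Machine Learning with TensorFlow 2.x and TF-TRT available. The DBFS paths are placeholders, and the exact `TrtConversionParams` fields and `TrtGraphConverterV2` options vary slightly across TensorFlow releases.

```python
import tensorflow as tf
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Placeholder paths -- point these at your own storage locations.
SAVED_MODEL_DIR = "/dbfs/tmp/resnet50_saved_model"
TRT_MODEL_DIR = "/dbfs/tmp/resnet50_trt_fp16"

# Export a pretrained Keras ResNet-50 as a SavedModel.
model = tf.keras.applications.ResNet50(weights="imagenet")
model.save(SAVED_MODEL_DIR)

# Convert the SavedModel with TF-TRT, building FP16 engines.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir=SAVED_MODEL_DIR,
    conversion_params=params,
)
converter.convert()  # replaces supported subgraphs with TensorRT ops
converter.save(TRT_MODEL_DIR)

# Run inference with the optimized model.
loaded = tf.saved_model.load(TRT_MODEL_DIR)
infer = loaded.signatures["serving_default"]
images = tf.random.uniform((8, 224, 224, 3))  # dummy batch for illustration
outputs = infer(images)  # inspect infer.structured_input_signature if the
                         # signature requires a named keyword argument
```

FP16 typically improves latency and throughput with minimal accuracy loss; INT8 can reduce latency further but requires a representative calibration dataset.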
Databricks recommends that you use the G4 instance type series, which is optimized for deploying machine learning models in production.