Model inference using TensorFlow and TensorRT

The example notebook in this article demonstrates the Databricks-recommended deep learning inference workflow with TensorFlow and TensorRT. It shows how to optimize a trained ResNet-50 model with TensorRT for model inference.

NVIDIA TensorRT is a high-performance inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT is installed in the GPU-enabled version of Databricks Runtime for Machine Learning.
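As a rough illustration of what optimizing a model with TensorRT can look like, the following minimal sketch converts an existing TensorFlow SavedModel with TF-TRT and loads the result for inference. It is not taken from the notebook. It assumes a TensorFlow 2.x build that includes TF-TRT support (as on a GPU-enabled cluster), a ResNet-50 model already exported in SavedModel format, and a `serving_default` signature on that model. The helper names `normalize_precision`, `convert_with_tensorrt`, and `load_for_inference` are hypothetical, and INT8 is left out because it needs calibration data.

```python
# Precision modes this sketch handles. TF-TRT also supports INT8,
# but that requires calibration data, which is omitted here.
SUPPORTED_PRECISIONS = ("FP32", "FP16")


def normalize_precision(precision: str) -> str:
    """Return the upper-cased precision mode, or raise if this sketch does not handle it."""
    mode = precision.upper()
    if mode not in SUPPORTED_PRECISIONS:
        raise ValueError(
            f"unsupported precision {precision!r}; expected one of {SUPPORTED_PRECISIONS}"
        )
    return mode


def convert_with_tensorrt(saved_model_dir: str, output_dir: str, precision: str = "FP16") -> None:
    """Convert a SavedModel with TF-TRT and write the optimized model to output_dir."""
    # Imported lazily so the helper above can be used without TensorFlow installed.
    from tensorflow.python.compiler.tensorrt import trt_convert as trt

    converter = trt.TrtGraphConverterV2(
        input_saved_model_dir=saved_model_dir,
        precision_mode=normalize_precision(precision),
    )
    converter.convert()         # replace supported subgraphs with TensorRT ops
    converter.save(output_dir)  # write the converted SavedModel


def load_for_inference(output_dir: str):
    """Load the converted SavedModel and return its serving function."""
    import tensorflow as tf

    loaded = tf.saved_model.load(output_dir)
    # Assumption: the model was exported with the default serving signature.
    return loaded.signatures["serving_default"]
```

With hypothetical paths, a call such as `convert_with_tensorrt("/tmp/resnet50_saved_model", "/tmp/resnet50_trt")` followed by `infer = load_for_inference("/tmp/resnet50_trt")` would produce a function that accepts a batch of input tensors. FP16 is used as the default here because it usually reduces latency on GPUs with Tensor Cores, at the cost of some numerical precision.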

Databricks recommends the G4 instance type series, which is optimized for deploying machine learning models in production.

Model inference TensorFlow-TensorRT notebook
