Model inference using TensorFlow and TensorRT

The example notebook in this article demonstrates the Databricks recommended deep learning inference workflow with TensorFlow and TensorFlow-TensorRT (TF-TRT). It shows how to optimize a trained ResNet-50 model with TensorRT for model inference.
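The starting point for this workflow is a trained model in SavedModel format, the input the TF-TRT converter expects. As a minimal sketch (the DBFS path is a placeholder for illustration, not taken from the notebook), a pretrained Keras ResNet-50 can be exported like this:

```python
import tensorflow as tf

# Load a ResNet-50 pretrained on ImageNet and export it as a SavedModel,
# the format the TF-TRT converter consumes.
model = tf.keras.applications.ResNet50(weights="imagenet")
model.save("/dbfs/tmp/resnet50_saved_model")  # placeholder path
```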

NVIDIA TensorRT is a high-performance inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT is installed in the GPU-enabled version of Databricks Runtime for Machine Learning.
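A minimal TF-TRT conversion sketch follows, assuming the SavedModel path above and FP16 precision; both are illustrative choices, not values from the notebook:

```python
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# Replace supported subgraphs of the SavedModel with TensorRT-optimized
# engines. FP16 typically improves throughput on GPUs with Tensor Cores.
params = trt.TrtConversionParams(precision_mode=trt.TrtPrecisionMode.FP16)
converter = trt.TrtGraphConverterV2(
    input_saved_model_dir="/dbfs/tmp/resnet50_saved_model",  # placeholder path
    conversion_params=params,
)
converter.convert()
converter.save("/dbfs/tmp/resnet50_trt_saved_model")  # placeholder path
```

The optimized model can then be reloaded with `tf.saved_model.load` and served through its `serving_default` signature like any other SavedModel.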

Databricks recommends that you use the G4 instance type series, which is optimized for deploying machine learning models in production.

Model inference TensorFlow-TensorRT notebook
