Build a wide-and-deep model in a recommender system

Using example notebooks, this article walks you through building a recommender system based on a wide-and-deep model. Building the machine learning pipeline for a wide-and-deep recommender involves the stages shown in this diagram:

Workflow for a wide-and-deep recommender

This reference solution covers the stages shown in blue:

  • Model training and evaluation

  • Model export and version management

  • Batch model inference

  • Online model serving

For information on the steps not covered, see Project stages not covered.

What is a wide-and-deep model?

An effective choice for a recommender system, a wide-and-deep model combines a linear (wide) model with a deep neural network. The wide component memorizes patterns in historical data about customer choices, while the deep component generalizes from those patterns to recommend relevant items the customer has not chosen before.

To learn more, see this academic paper: Wide & Deep Learning for Recommender Systems.
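
To make the combination concrete, here is a minimal TensorFlow/Keras sketch of the idea. It is not the model from the example notebook: the input names (user_id, item_id), vocabulary sizes, and layer widths are placeholder assumptions, and the wide component is simplified to per-ID weights rather than the cross-product features described in the paper.

    import tensorflow as tf

    # Placeholder vocabulary sizes and embedding width (assumptions for this sketch).
    num_users, num_items, embedding_dim = 10_000, 5_000, 16

    user_id = tf.keras.Input(shape=(1,), dtype="int64", name="user_id")
    item_id = tf.keras.Input(shape=(1,), dtype="int64", name="item_id")

    # Deep component: learned embeddings passed through dense layers (generalization).
    user_emb = tf.keras.layers.Flatten()(
        tf.keras.layers.Embedding(num_users, embedding_dim)(user_id))
    item_emb = tf.keras.layers.Flatten()(
        tf.keras.layers.Embedding(num_items, embedding_dim)(item_id))
    deep = tf.keras.layers.Concatenate()([user_emb, item_emb])
    deep = tf.keras.layers.Dense(64, activation="relu")(deep)
    deep = tf.keras.layers.Dense(32, activation="relu")(deep)
    deep_logit = tf.keras.layers.Dense(1)(deep)

    # Wide component: one linear weight per user and per item (memorization).
    # The paper's wide part also uses cross-product features; this is a simplification.
    user_wide = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(num_users, 1)(user_id))
    item_wide = tf.keras.layers.Flatten()(tf.keras.layers.Embedding(num_items, 1)(item_id))
    wide_logit = tf.keras.layers.Add()([user_wide, item_wide])

    # Combine the two components by summing their logits, then predict the
    # probability of a user-item interaction.
    output = tf.keras.layers.Activation("sigmoid")(
        tf.keras.layers.Add()([wide_logit, deep_logit]))

    model = tf.keras.Model(inputs=[user_id, item_id], outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=[tf.keras.metrics.AUC()])

In the full formulation, the wide part gets most of its memorization power from cross-product transformations of sparse features; the sketch above keeps only the simplest linear terms.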

Databricks tools highlights

The notebook covers several tools provided on Databricks that simplify building a machine learning pipeline:

  1. SparkDatasetConverter (see the usage sketch after this list)

  2. MLflow model registry

  3. MLflow model serving
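
As a brief illustration of the first of these, the following sketch shows how SparkDatasetConverter (from the Petastorm library) caches a Spark DataFrame and exposes it as a tf.data.Dataset for training. The cache directory and table name are placeholders; use a path and table appropriate to your workspace.

    from petastorm.spark import SparkDatasetConverter, make_spark_converter

    # Directory where the converter materializes the DataFrame (placeholder path).
    spark.conf.set(SparkDatasetConverter.PARENT_CACHE_DIR_URL_CONF,
                   "file:///dbfs/tmp/petastorm/cache")

    # Any Spark DataFrame works; here the training split is assumed to be
    # saved as a Delta table named "train".
    train_df = spark.table("train")
    converter_train = make_spark_converter(train_df)

    # Inside the context manager, the converter yields a tf.data.Dataset
    # that can be passed directly to model.fit(...).
    with converter_train.make_tf_dataset(batch_size=1024) as train_dataset:
        print(train_dataset.element_spec)

Sketches of the MLflow Model Registry and batch-inference steps appear later in this article.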

Notebook describing user dataset

The dataset used in this notebook consists of the following Delta tables:

  • user_profile: contains the user_id values and their static profiles

  • item_profile: contains the item_id values and their static profiles

  • user_item_interaction: contains events where a user interacts with an item. This table is randomly split into three Delta tables to build and evaluate the model: train, validation, and test.

This data format is common for recommendation problems. Some examples are:

  • For ad recommenders, the items are ads and the user-item interactions are records of users clicking the ads.

  • For online shopping recommenders, the items are products and the user-item interactions are records of users reviewing or ordering those products.

When you adapt this notebook to your dataset, you only need to save your data in the Delta tables and provide the table names and locations. The code for loading data can mostly be reused.
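
A minimal sketch of that loading pattern, assuming the tables are registered under the names listed above (substitute your own table names or storage locations):

    # Load the profile and interaction tables (placeholder table names).
    user_profile = spark.table("user_profile")
    item_profile = spark.table("item_profile")
    interactions = spark.table("user_item_interaction")

    # Randomly split the interaction events into the train, validation, and
    # test sets described above (split fractions are placeholders).
    train_df, val_df, test_df = interactions.randomSplit([0.7, 0.15, 0.15], seed=42)

    # Join the static profiles onto the interaction events to form model inputs.
    train_features = (train_df
                      .join(user_profile, on="user_id")
                      .join(item_profile, on="item_id"))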

See the dataset generation notebook for details.

Generate and save the dataset notebook


Notebook example: wide-and-deep model

A wide-and-deep model combines a wide linear model with a deep neural network to handle memorization and generalization required for good recommendations.

This model is just one of many deep learning models suited to the recommender problem, and the pipeline shown here applies to machine learning workflows more generally. The focus is on showing how to build the workflow; you can swap in a different model for your own use case and tune it for better evaluation metrics.
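
Whichever model you use, the export, version-management, and batch-inference stages follow the same MLflow pattern. The following is a hedged sketch, assuming a trained Keras model in a variable named model, a placeholder registered-model name, and a Spark DataFrame candidates_df containing the model's input columns for candidate user-item pairs.

    import mlflow
    import mlflow.keras

    # Log the trained model and register it in the MLflow Model Registry.
    # "wide_and_deep_recommender" is a placeholder registered-model name.
    with mlflow.start_run():
        mlflow.keras.log_model(
            model,
            artifact_path="model",
            registered_model_name="wide_and_deep_recommender",
        )

    # Batch inference: load a specific registered version as a Spark UDF
    # and score candidate user-item pairs.
    predict_udf = mlflow.pyfunc.spark_udf(
        spark, model_uri="models:/wide_and_deep_recommender/1")

    scored = candidates_df.withColumn("score", predict_udf(*candidates_df.columns))

For online serving, the same registered version can then be enabled for MLflow Model Serving from the Model Registry, which exposes it behind a REST endpoint.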

Build and serve a wide-and-deep model in a recommender system notebook


Project stages not covered

To keep the notebook focused on showing how to implement a recommender system, the following stages are not covered. These stages are shown as gray blocks in the workflow diagram.

  1. Data collection and exploratory data analysis. See Run your first ETL workload on Databricks.

  2. Feature engineering. Feature engineering is an important part of a recommender system, and much has been written about it. This notebook assumes that you already have a curated dataset of user-item interactions. For details about the dataset used in this notebook, see Notebook describing user dataset.

  3. Model tuning. Model tuning involves revising the code of the existing pipeline (feature engineering, model structure, model hyperparameters), or even revisiting the data collection stage, to improve the model's performance. For more information about model tuning tools on Databricks, see Hyperparameter tuning.