Using example notebooks, this article walks you through building a recommender system with a wide-and-deep model. Building a machine learning pipeline for a wide-and-deep recommender system involves the stages shown in this diagram:
This reference solution covers the stages shown in blue:
Model training and evaluation
Model export and version management
Batch model inference
Online model serving
For information on the steps not covered, see Project stages not covered.
A wide-and-deep model is an effective choice for a recommender system because it combines a linear model with the capabilities of a deep learning model. The linear model analyzes historical data related to customer choices, while the deep learning component generalizes from that data to broaden the set of relevant recommendations.
To learn more, see:
The notebook covers several tools provided on Databricks that simplify building a machine learning pipeline:
The dataset used in this notebook consists of the following Delta tables:
user_profile: contains the user_id values and their static profiles
item_profile: contains the item_id values and their static profiles
user_item_interaction: contains events where a user interacts with an item. This table is randomly split into three Delta tables to build and evaluate the model:
This data format is common for recommendation problems. Some examples are:
For ad recommenders, the items are ads and the user-item interactions are records of users clicking the ads.
For online shopping recommenders, the items are products and the user-item interactions are records of user reviews or orders.
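To make the table layout concrete, the following is a minimal sketch that uses in-memory Python lists as stand-ins for the three Delta tables. Only the table names and the user_id and item_id columns come from the description above; the age_group, category, and clicked columns, and the join_profiles helper, are hypothetical and chosen for illustration.

```python
# Hypothetical in-memory stand-ins for the three Delta tables.
# Columns other than user_id and item_id are illustrative assumptions.
user_profile = [
    {"user_id": 1, "age_group": "18-25"},
    {"user_id": 2, "age_group": "26-35"},
]
item_profile = [
    {"item_id": 10, "category": "shoes"},
    {"item_id": 11, "category": "books"},
]
user_item_interaction = [
    {"user_id": 1, "item_id": 10, "clicked": 1},
    {"user_id": 1, "item_id": 11, "clicked": 0},
    {"user_id": 2, "item_id": 11, "clicked": 1},
]


def join_profiles(interactions, users, items):
    """Attach the static user and item profiles to each interaction row."""
    users_by_id = {u["user_id"]: u for u in users}
    items_by_id = {i["item_id"]: i for i in items}
    rows = []
    for event in interactions:
        row = dict(event)  # copy so the source table is left untouched
        row.update(users_by_id[event["user_id"]])
        row.update(items_by_id[event["item_id"]])
        rows.append(row)
    return rows


training_rows = join_profiles(user_item_interaction, user_profile, item_profile)
```

Each resulting row carries the interaction plus both static profiles, which is the kind of flat example a model typically consumes.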
When you adapt this notebook to your dataset, you only need to save your data in the Delta tables and provide the table names and locations. The code for loading data can mostly be reused.
See the dataset generation notebook for details.
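The random three-way split of user_item_interaction can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the dataset generation notebook's code: the random_three_way_split helper, the 80/10/10 weights, the seed, and the subset names are all hypothetical.

```python
import random


def random_three_way_split(rows, weights=(0.8, 0.1, 0.1), seed=42):
    """Randomly partition rows into three disjoint subsets.

    The weights and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(rows)  # copy so the caller's list keeps its order
    rng.shuffle(shuffled)
    n = len(shuffled)
    first_cut = round(n * weights[0])
    second_cut = first_cut + round(n * weights[1])
    return (
        shuffled[:first_cut],
        shuffled[first_cut:second_cut],
        shuffled[second_cut:],
    )


# Hypothetical interaction rows: every user interacts with every item once.
interactions = [{"user_id": u, "item_id": i} for u in range(10) for i in range(10)]
train_rows, val_rows, test_rows = random_three_way_split(interactions)
```

Fixing the seed keeps the split reproducible across runs. On Spark DataFrames, DataFrame.randomSplit(weights, seed) offers a comparable operation, although its subset sizes are only approximately proportional to the weights.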
A wide-and-deep model combines a wide linear model with a deep neural network to handle both the memorization and the generalization required for good recommendations.
This model is just one example among many deep learning models for the recommender problem, or for machine learning pipelines in general. The focus here is on showing how to build the workflow. You can swap in a different model for your own use case and tune it for better evaluation metrics.
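The way the two parts combine can be sketched as a forward pass in plain Python. This is a simplified illustration, not the notebook's implementation: there is no training loop, the wide part is reduced to one weight per (user, item) cross feature, and the function names, layer sizes, and initialization are assumptions made for the example.

```python
import math
import random

EMBEDDING_DIM = 4  # illustrative size, not taken from the notebook
HIDDEN_UNITS = 8   # illustrative size, not taken from the notebook


def init_model(user_ids, item_ids, seed=0):
    """Create randomly initialized parameters for the sketch."""
    rng = random.Random(seed)

    def vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    return {
        # Wide part: one weight per (user, item) cross feature (memorization).
        "wide": {},
        # Deep part: dense embeddings and one hidden layer (generalization).
        "user_emb": {u: vec(EMBEDDING_DIM) for u in user_ids},
        "item_emb": {i: vec(EMBEDDING_DIM) for i in item_ids},
        "hidden_w": [vec(2 * EMBEDDING_DIM) for _ in range(HIDDEN_UNITS)],
        "hidden_b": [0.0] * HIDDEN_UNITS,
        "out_w": vec(HIDDEN_UNITS),
        "bias": 0.0,
    }


def wide_logit(model, user_id, item_id):
    # Pairs never seen in the data contribute nothing to the wide part.
    return model["wide"].get((user_id, item_id), 0.0)


def deep_logit(model, user_id, item_id):
    # Concatenate the two embeddings, then apply a ReLU hidden layer.
    x = model["user_emb"][user_id] + model["item_emb"][item_id]
    hidden = [
        max(0.0, sum(w * v for w, v in zip(weights, x)) + b)
        for weights, b in zip(model["hidden_w"], model["hidden_b"])
    ]
    return sum(w * h for w, h in zip(model["out_w"], hidden))


def predict(model, user_id, item_id):
    """Sum the wide and deep logits, then squash to a probability."""
    logit = (
        wide_logit(model, user_id, item_id)
        + deep_logit(model, user_id, item_id)
        + model["bias"]
    )
    return 1.0 / (1.0 + math.exp(-logit))


model = init_model(user_ids=[1, 2], item_ids=[10, 11])
before = predict(model, 1, 10)
model["wide"][(1, 10)] = 2.0  # pretend training memorized this exact pair
after = predict(model, 1, 10)
```

Raising the wide weight for one pair lifts only that pair's score, while pairs the wide part has never seen still receive a score from the embeddings alone.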
To keep the notebook focused on showing how to implement a recommender system, the following stages are not covered. These stages are shown as gray blocks in the workflow diagram.
Data collection and exploratory data analysis. See Run your first ETL workload on Databricks.
Feature engineering. Feature engineering is an important part of a recommender system, and much information is available on this topic. This notebook assumes that you have a curated dataset containing user-item interactions. For details about the dataset used in this notebook, see Notebook describing user dataset. For more information about feature engineering, see the following resources:
The Databricks Solution Accelerators notebooks Personalizing the Customer Experience with Recommendations show examples of feature engineering in a recommender system.
See Preprocess data for machine learning and deep learning for examples of feature engineering with scikit-learn, MLlib, and transfer learning.
Model tuning. Model tuning involves revising the code of the existing pipeline, including feature engineering, model structure, model hyperparameters, or even updating the data collection stage, to improve the model’s performance. For more information about tools for model tuning on Databricks, see Hyperparameter tuning.
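As a rough illustration of the hyperparameter part of model tuning, the sketch below runs a plain random search. It is not one of the Databricks tuning tools: the validation_loss function is a toy stand-in for training and evaluating the model, and the parameter names and search ranges are hypothetical.

```python
import random


def validation_loss(learning_rate, hidden_units):
    # Toy stand-in for "train the model, then evaluate on validation data".
    # A real pipeline would fit the model and return a measured metric.
    return (learning_rate - 0.01) ** 2 * 1e4 + abs(hidden_units - 64) / 64


def random_search(num_trials=20, seed=0):
    """Sample hyperparameters at random and keep the lowest-loss trial."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(num_trials):
        params = {
            # Log-uniform sample between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "hidden_units": rng.choice([16, 32, 64, 128]),
        }
        loss = validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search()
```

Evaluating candidates on validation data rather than test data keeps the test set untouched for the final evaluation.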