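from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.feature import HashingTF, StringIndexer, Tokenizer
# Define pipeline components
# handleInvalid="keep" assigns unseen topics to an extra index instead of raising an error
labelIndexer = StringIndexer(inputCol="topic", outputCol="label", handleInvalid="keep")
tokenizer = Tokenizer(inputCol="text", outputCol="words")
# Hash the tokenized words into a fixed-length term-frequency vector
hashingTF = HashingTF(inputCol="words", outputCol="features")
# The classifier reads the default columns labelCol="label" and featuresCol="features"
dt = DecisionTreeClassifier()
# Construct a Pipeline object using the defined components
pipeline = Pipeline(stages=[labelIndexer, tokenizer, hashingTF, dt])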
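To make the data flow through the feature stages above more concrete, here is a minimal pure-Python sketch of roughly what `StringIndexer`, `Tokenizer`, and `HashingTF` compute. It is an illustration under stated assumptions, not Spark's implementation: the helper names (`index_labels`, `tokenize`, `hashing_tf`), the sample rows, and the 16-slot vector are hypothetical, and `zlib.crc32` stands in for the MurmurHash3 function that Spark uses.

```python
import zlib
from collections import Counter


def index_labels(topics):
    """Map each distinct topic to a float index, most frequent topic first."""
    counts = Counter(topics)
    # Descending frequency mirrors StringIndexer's default ordering;
    # ties are broken alphabetically in this sketch.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {topic: float(i) for i, topic in enumerate(ordered)}


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def hashing_tf(words, num_features=16):
    """Count words into a fixed-size vector using the hashing trick."""
    vec = [0.0] * num_features
    for word in words:
        # crc32 is a stand-in hash (assumption); Spark uses MurmurHash3.
        slot = zlib.crc32(word.encode("utf-8")) % num_features
        vec[slot] += 1.0
    return vec


# Hypothetical (topic, text) rows standing in for the training DataFrame.
rows = [
    ("spark", "Spark ML pipelines"),
    ("spark", "Spark pipelines chain stages"),
    ("mlflow", "MLflow logs models"),
]

label_map = index_labels([topic for topic, _ in rows])
features = [hashing_tf(tokenize(text)) for _, text in rows]
```

Each row ends up as a numeric label plus a fixed-length count vector, which is the shape of input the decision tree consumes. Two different words can land in the same slot; a larger vector (Spark's `HashingTF` defaults to 2^18 features) makes such collisions less likely.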
MLflow Deployment: Train PySpark Model and Log in MLeap Format
This notebook walks through the process of training a PySpark pipeline model and logging it in MLeap format with MLflow.
The notebook contains the following sections:
Setup
Train a PySpark Pipeline model