Tuning distributed training algorithms: Hyperopt and Apache Spark MLlib

Databricks Runtime for Machine Learning includes Hyperopt, a library for ML hyperparameter tuning in Python, and Apache Spark MLlib, a library of distributed algorithms for training ML models (also often called "Spark ML"). This example notebook shows how to use them together.

Use case

Distributed machine learning workloads in Python for which you want to tune hyperparameters.

In this example notebook

The demo shows how to tune hyperparameters for an example machine learning workflow in MLlib. You can follow this example to tune other distributed machine learning algorithms from MLlib or from other libraries.

This guide includes two sections to illustrate the process you can follow to develop your own workflow:
- Part 1. Run distributed training using MLlib
- Part 2. Use Hyperopt to tune hyperparameters

Requirements

This notebook requires Databricks Runtime for Machine Learning.

MLflow autologging
This notebook demonstrates how to track model training and tuning with MLflow. Starting with MLflow version 1.17.0, you can use MLflow autologging with pyspark.ml. If your cluster is running Databricks Runtime for ML 8.2 or below, you can upgrade the MLflow client to add this pyspark.ml support. Upgrading is not required to run the notebook.
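If you need to enable pyspark.ml autologging explicitly, a minimal sketch (assuming MLflow 1.17.0 or above is installed):

```python
import mlflow.pyspark.ml

# Enable MLflow autologging for pyspark.ml: parameters, metrics, and models
# are logged automatically when an MLlib estimator's fit() is called.
mlflow.pyspark.ml.autolog()
```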
To upgrade MLflow to a version that supports pyspark.ml autologging, uncomment and run the following cell.
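The original upgrade cell is not shown here; a minimal sketch, assuming a notebook-scoped %pip install on Databricks (the pinned version is an assumption, any release at or above 1.17.0 adds the support):

```python
# Uncomment the line below to upgrade the MLflow client,
# then re-run the notebook so the new version is picked up.
# %pip install --upgrade mlflow==1.17.0
```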
Part 1. Run distributed training using MLlib
This section shows a simple example of distributed training using MLlib. For more information and examples, see these resources:
- Databricks documentation on MLlib (AWS|Azure|GCP)
- Apache Spark MLlib programming guide
- Apache Spark MLlib Python API documentation
Load data
This notebook uses the classic MNIST handwritten digit recognition dataset. Each example is a vector of pixels representing an image of a handwritten digit.
These datasets are stored in the popular LibSVM dataset format. The following cell shows how to load them using MLlib's LibSVM dataset reader utility.
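A minimal sketch of the load step, assuming the notebook's predefined spark session and the Databricks-hosted copy of MNIST (the /databricks-datasets path is an assumption; adjust it to where your files live):

```python
# Load the MNIST training and test sets from LibSVM files.
# MNIST images are 28x28, so each example has 784 features.
train_df = (spark.read.format("libsvm")
            .option("numFeatures", "784")
            .load("/databricks-datasets/mnist-digits/data-001/mnist-digits-train.txt")
            .cache())
test_df = (spark.read.format("libsvm")
           .option("numFeatures", "784")
           .load("/databricks-datasets/mnist-digits/data-001/mnist-digits-test.txt")
           .cache())
```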
Create a function to train a model
In this section, you define a function to train a decision tree. Wrapping the training code in a function is important because it lets you pass the function to Hyperopt for tuning later.
Details: The tree algorithm needs to know that the labels are categories 0-9, rather than continuous values. This example uses the StringIndexer class to do this. A Pipeline ties this feature preprocessing together with the tree algorithm. ML Pipelines are tools Spark provides for piecing together machine learning algorithms into workflows. To learn more about Pipelines, check out other ML example notebooks in Databricks and the ML Pipelines user guide.
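A minimal sketch of such a function; the name train_tree, the F1 evaluation metric, and the train_df/test_df DataFrames from the load step above are assumptions, not the notebook's exact code:

```python
import mlflow
from pyspark.ml import Pipeline
from pyspark.ml.classification import DecisionTreeClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.feature import StringIndexer

def train_tree(minInstancesPerNode, maxBins):
    """Train a decision tree pipeline and return (model, f1_score) on the test set."""
    # nested=True lets each training call log its own run under an outer tuning run.
    with mlflow.start_run(nested=True):
        # Index the 0-9 labels as categories rather than continuous values.
        indexer = StringIndexer(inputCol="label", outputCol="indexedLabel")
        dtc = DecisionTreeClassifier(labelCol="indexedLabel",
                                     minInstancesPerNode=minInstancesPerNode,
                                     maxBins=maxBins)
        # The Pipeline ties the label preprocessing to the tree algorithm.
        pipeline = Pipeline(stages=[indexer, dtc])
        model = pipeline.fit(train_df)

        predictions = model.transform(test_df)
        evaluator = MulticlassClassificationEvaluator(
            labelCol="indexedLabel", predictionCol="prediction", metricName="f1")
        f1 = evaluator.evaluate(predictions)
        mlflow.log_metric("f1_score", f1)
    return model, f1
```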
Part 2. Use Hyperopt to tune hyperparameters
In this section, you create the Hyperopt workflow:
- Define a function to minimize
- Define a search space over hyperparameters
- Specify the search algorithm and use fmin() to tune the model
For more information about the Hyperopt APIs, see the Hyperopt documentation.
Define the search space over hyperparameters
This example tunes two hyperparameters: minInstancesPerNode and maxBins. See the Hyperopt documentation for details on defining a search space and parameter expressions.
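A minimal sketch of such a search space; the ranges below are assumptions for illustration, not tuned recommendations:

```python
from hyperopt import hp

# Both hyperparameters are integers, so draw from quantized uniform
# distributions (hp.quniform returns floats; cast to int in the objective).
space = {
    'minInstancesPerNode': hp.quniform('minInstancesPerNode', 10, 200, 1),
    'maxBins': hp.quniform('maxBins', 2, 32, 1),
}
```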
Tune the model using Hyperopt fmin()
- Set max_evals to the maximum number of points in hyperparameter space to test (the maximum number of models to fit and evaluate). Because this command evaluates many models, it can take several minutes to execute.
- You must also specify which search algorithm to use. The two main choices are:
  - hyperopt.tpe.suggest: Tree of Parzen Estimators, a Bayesian approach that iteratively and adaptively selects new hyperparameter settings to explore based on previous results
  - hyperopt.rand.suggest: Random search, a non-adaptive approach that samples the search space randomly

A sketch that puts these pieces together follows the note below.
Important:
When using Hyperopt with MLlib and other distributed training algorithms, do not pass a trials argument to fmin(). When you do not include the trials argument, Hyperopt uses the default Trials class, which runs on the cluster driver. Hyperopt needs to evaluate each trial on the driver node so that each trial can initiate distributed training jobs.

Do not use the SparkTrials class with MLlib. SparkTrials is designed to distribute trials for algorithms that are not themselves distributed. MLlib uses distributed computing already and is not compatible with SparkTrials.
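A minimal sketch of the tuning call, assuming the train_tree function and space dictionary defined above; the objective name and the max_evals value are illustrative choices:

```python
import mlflow
from hyperopt import STATUS_OK, fmin, tpe

def objective(params):
    # hp.quniform yields floats, so cast back to ints before training.
    model, f1 = train_tree(int(params['minInstancesPerNode']),
                           int(params['maxBins']))
    # fmin() minimizes the loss, so negate the score we want to maximize.
    return {'loss': -f1, 'status': STATUS_OK}

# Note: no `trials` argument, so the default Trials class runs on the
# driver, letting each trial launch its own distributed MLlib job.
with mlflow.start_run():
    best_params = fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=8,  # small value so the example finishes quickly
    )
print(best_params)
```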