Labelbox

Labelbox provides tooling, infrastructure, and enabling services for AI data. Using Labelbox, AI teams can customize a workflow to operate, manage, and improve data labeling, data cataloging, and model debugging in a single, unified platform. Labelbox is designed to help AI teams build and operate production-grade machine learning systems.

You can connect Databricks clusters that run the Machine Learning version of the Databricks Runtime to Labelbox.

Connect to Labelbox by using Partner Connect

Note

If you already have a Labelbox account, Databricks recommends that you skip ahead to Connect to Labelbox manually instead. This is because the new connection experience in Partner Connect is optimized for new Labelbox accounts.

  1. Make sure your Databricks account, workspace, and the signed-in user all meet the requirements for Partner Connect.

  2. In the sidebar, click Partner Connect.

  3. Click the Labelbox tile.

    Note

    If the Labelbox tile has a check mark icon inside it, someone else in your workspace has already created an ML cluster in this workspace along with a related Databricks service principal. A notebook named labelbox_databricks_example should also already exist in the Workspace/Shared/labelbox_demo folder in your workspace. Skip ahead to Connect to Labelbox manually.

  4. In the Connect to partner dialog, click Next. Partner Connect creates the following resources in your workspace:

    • An ML cluster named LABELBOX_CLUSTER by default. (You can change this default name before you click Next.)
    • A Databricks service principal named LABELBOX_USER.
  5. For Email, enter the email address that you want to use for your new Labelbox account.

  6. Click Connect to Labelbox.

  7. Follow the on-screen instructions to Sign Up for your new Labelbox account.

  8. After you sign in to your new Labelbox account, Partner Connect creates the following resources in your workspace, if they do not already exist:

    • A Databricks personal access token that is associated with the LABELBOX_USER service principal.
    • A notebook named labelbox_databricks_example.ipynb in the Workspace/Shared/labelbox_demo folder.
  9. Create a Labelbox API key for your Labelbox account, if you do not have one. Copy the API key and save it in a secure location, as the key will eventually be hidden from view.

  10. Skip ahead to Set up the ML cluster and Labelbox starter notebook.
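
Step 9 asks you to keep the Labelbox API key somewhere safe. As a minimal sketch of one way to avoid pasting the key into notebook cells, the helper below reads it from an environment variable and masks it for logging. The variable name LABELBOX_API_KEY and both function names are assumptions made for this illustration; they are not defined by Labelbox or Databricks.

```python
import os

# Hypothetical variable name chosen for this example; use whatever name
# your own secret store or job configuration provides.
API_KEY_ENV_VAR = "LABELBOX_API_KEY"


def load_labelbox_api_key(env_var: str = API_KEY_ENV_VAR) -> str:
    """Return the API key from the environment, or raise if it is absent."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before connecting to Labelbox.")
    return key


def mask(key: str, visible: int = 4) -> str:
    """Hide all but the last few characters so the key can be logged safely."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Inside a Databricks notebook, a secret scope read with dbutils.secrets.get(scope=..., key=...) is an alternative to an environment variable; the scope and key names are yours to choose.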

Connect to Labelbox manually

The following instructions describe how to connect Labelbox to a Databricks cluster.

Note

To connect faster, use Partner Connect.

  1. You must have an available cluster that has the Machine Learning version of the Databricks Runtime. To check this for an existing cluster, look for ML in the Runtime column when you display the cluster in your workspace. If you do not have an available ML cluster, create a cluster and for Databricks Runtime Version, choose a version from the ML list.
  2. Go to the Labelbox page to Sign Up for a new Labelbox account or to Log In to your existing Labelbox account.
  3. Create a Labelbox API key for your Labelbox account, if you do not have one. Copy the API key and save it in a secure location. The key will eventually be hidden from view, and you will need it later.
  4. Check for a Labelbox starter notebook in your workspace:
    1. In your Databricks workspace, ensure that you are in the Data Science & Engineering or Databricks Machine Learning environment. Use the sidebar persona-switcher if necessary.
    2. In the sidebar, click Workspace > Shared.
    3. If a folder named labelbox_demo does not already exist, create it:
      1. Click the down arrow next to Shared.
      2. Click Create > Folder.
      3. Enter labelbox_demo.
      4. Click Create Folder.
    4. Click the labelbox_demo folder. If a starter notebook named labelbox_databricks_example.ipynb does not exist in the folder, import it:
      1. Click the down arrow next to labelbox_demo.
      2. Click Import.
      3. Click URL.
      4. Enter https://github.com/Labelbox/labelbox-python/blob/develop/examples/integrations/databricks/labelbox_databricks_example.ipynb and click Import.
  5. Continue with Set up the ML cluster and Labelbox starter notebook.
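
Steps 1 and 4 above can also be approached from a script. The sketch below is illustrative only and makes two assumptions: that ML runtime version keys contain the marker -ml- (for example 10.4.x-cpu-ml-scala2.12), and that the Workspace/Shared/labelbox_demo folder corresponds to the path /Shared/labelbox_demo. The second helper only builds a request body in the shape used by the Databricks Workspace API import endpoint (POST /api/2.0/workspace/import); it does not send anything. Both function names and the default notebook name are hypothetical.

```python
import base64


def is_ml_runtime(spark_version: str) -> bool:
    """Heuristic: ML runtime version keys are assumed to contain '-ml-'."""
    return "-ml-" in spark_version


def build_import_request(
    notebook_bytes: bytes,
    folder: str = "/Shared/labelbox_demo",       # assumed path for Workspace/Shared/labelbox_demo
    name: str = "labelbox_databricks_example",   # assumed target notebook name
) -> dict:
    """Build a JSON body for importing a Jupyter notebook into the workspace."""
    return {
        "path": f"{folder}/{name}",
        "format": "JUPYTER",  # the source file is an .ipynb notebook
        "content": base64.b64encode(notebook_bytes).decode("ascii"),
        "overwrite": False,   # do not replace a notebook that already exists
    }
```

Here notebook_bytes is the content of the downloaded labelbox_databricks_example.ipynb file. Sending the request additionally requires your workspace URL and a personal access token, which are left out of this sketch.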

Set up the ML cluster and Labelbox starter notebook

  1. In your Databricks workspace, ensure that you are still in the Data Science & Engineering or Databricks Machine Learning environment. Use the sidebar persona-switcher if necessary.

  2. Check that the required Labelbox libraries are installed in your ML cluster:

    1. In the sidebar, click Compute.

    2. Click your ML cluster. Use the Filter box to find it, if necessary.

      Note

      If you used Partner Connect to connect to Labelbox, the ML cluster’s name should be LABELBOX_CLUSTER.

    3. Click the Libraries tab.

    4. If the labelbox package is not listed, install it:

      1. Click Install New.
      2. Click PyPI.
      3. For Package, enter labelbox.
      4. Click Install.
    5. If the labelspark package is not listed, install it:

      1. Click Install New.
      2. Click PyPI.
      3. For Package, enter labelspark.
      4. Click Install.
  3. Attach your ML cluster to the starter notebook:

    1. In the sidebar, click Workspace > Shared > labelbox_demo > labelbox_databricks_example.ipynb.
    2. Attach your ML cluster to the notebook.
  4. Browse through the notebook to learn how to automate Labelbox.
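
As a quick check that step 2 succeeded, the following sketch can be run in a notebook cell attached to the ML cluster. It reports which of the two packages named above are not importable, using only the Python standard library. The function name missing_packages is an assumption made for this example.

```python
import importlib.util

# The two PyPI packages that the steps above install on the cluster.
REQUIRED_PACKAGES = ("labelbox", "labelspark")


def missing_packages(names=REQUIRED_PACKAGES) -> list:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Install from PyPI on the cluster:", ", ".join(missing))
else:
    print("labelbox and labelspark are importable.")
```

Once both packages import, the Labelbox Python SDK client is typically created with labelbox.Client(api_key=...), using the API key you saved earlier.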

For more information, see the README in GitHub for the starter notebook. See also the Labelbox Docs.