John Snow Labs provides production-grade, scalable, and trainable versions of the latest research in natural language processing (NLP) through the following products:
Spark NLP: state-of-the-art NLP for Python, Java, or Scala.
Spark NLP for Healthcare: state-of-the-art clinical and biomedical NLP.
Spark OCR: a scalable, private, and highly accurate OCR and de-identification library.
You can integrate your Databricks clusters with John Snow Labs.
John Snow Labs does not integrate with Databricks SQL warehouses (formerly Databricks SQL endpoints).
The Partner Connect steps cover the most popular NLP and OCR tasks and do the following:
Create a new cluster in your Databricks workspace.
Automatically install John Snow Labs NLP and OCR libraries on the new cluster.
Create and deploy a 30-day trial license for John Snow Labs NLP and OCR libraries.
Copy 20+ ready-to-use Python notebooks to the new cluster.
To connect to John Snow Labs using Partner Connect, follow the steps in Connect to ML partners using Partner Connect. The John Snow Labs connection is different from standard machine learning connections in the following ways:
To complete the Partner Connect steps, you need a valid credit card. Your credit card is subject to pay-as-you-go charges that begin after the trial ends.
After you follow the on-screen instructions to start your John Snow Labs NLP trial, check your email inbox for a message from John Snow Labs that explains how to get started, then follow the instructions in the message. The message can take up to half an hour to arrive.
To connect your Databricks workspace to John Snow Labs using Partner Connect, see Connect to ML partners using Partner Connect.
Follow these instructions to automatically install the John Snow Labs NLP and OCR libraries and notebooks on your cluster, and to activate your trial of John Snow Labs if you do not already have a John Snow Labs account.
Before you integrate with John Snow Labs, you must have the following:
A Databricks cluster in your Databricks workspace.
A Databricks personal access token.
As a security best practice when you authenticate with automated tools, systems, scripts, and apps, Databricks recommends that you use OAuth tokens.
If you use personal access token authentication, Databricks recommends using personal access tokens belonging to service principals instead of workspace users. To create tokens for service principals, see Manage tokens for a service principal.
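Before you start the integration steps, it can help to confirm that the workspace URL and token you plan to enter are accepted by your workspace and that the target cluster is visible. The following is a minimal sketch, not part of the John Snow Labs installer. It assumes the Databricks REST API endpoint `GET /api/2.0/clusters/list` with bearer-token authentication; the helper names `build_clusters_list_request` and `list_cluster_names` are hypothetical and exist only for illustration.

```python
import json
import urllib.request
from typing import List


def build_clusters_list_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the Clusters API list endpoint.

    Assumption: the workspace exposes /api/2.0/clusters/list and accepts
    the token as a bearer token in the Authorization header.
    """
    base = workspace_url.strip().rstrip("/")
    if not base.startswith("https://"):
        raise ValueError("workspace URL is expected to start with https://")
    request = urllib.request.Request(f"{base}/api/2.0/clusters/list", method="GET")
    request.add_header("Authorization", f"Bearer {token}")
    return request


def list_cluster_names(workspace_url: str, token: str, timeout: float = 30.0) -> List[str]:
    """Send the request and return the names of the clusters the token can see.

    Raises urllib.error.HTTPError if the workspace rejects the token.
    """
    request = build_clusters_list_request(workspace_url, token)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        payload = json.load(response)
    # A workspace with no clusters may omit the "clusters" key entirely.
    return [cluster.get("cluster_name", "") for cluster in payload.get("clusters", [])]
```

Calling `list_cluster_names` with your own workspace URL and token should return a list that includes the cluster you intend to select. An HTTP error at this point usually means the URL or token needs attention before you continue. Avoid hard-coding the token in notebooks or scripts; reading it from an environment variable or a secret store is the safer choice.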
To integrate with John Snow Labs, complete these steps:
Make sure you meet the requirements for John Snow Labs.
Go to the John Snow Labs NLP on Databricks webpage.
Click Install in my Databricks account.
In the Please tell us about yourself dialog, enter your first name, last name, and company email address.
For Databricks instance url, enter your Databricks workspace URL, for example
For Databricks access token, enter your Databricks personal access token value from the requirements in this article.
Click Test connection.
After the connection succeeds, for Choose a cluster to install on, select the cluster from the requirements in this article.
Click Get Trial License.
Check your email inbox for a message from John Snow Labs that contains a request to validate your email address.
In the message, click Validate my email.
After several minutes, check your email inbox again for another message from John Snow Labs that explains how to get started. In some cases, this message can take up to half an hour to arrive.
Follow the instructions in the message.
To upgrade your trial of John Snow Labs, sign in to your John Snow Labs account at https://my.johnsnowlabs.com/login.
Continue to next steps.
Explore one or more of the following resources on the John Snow Labs website: