This tutorial demonstrates how to quickly get started with the Databricks extension for Visual Studio Code by running a basic Python code file on a Databricks cluster in your remote workspace.
The Databricks extension for Visual Studio Code enables you to connect to your remote Databricks workspaces from the Visual Studio Code integrated development environment (IDE) running on your local development machine. Through these connections, you can:
Synchronize local code that you develop in Visual Studio Code with code in your remote workspaces.
Run local Python code files from Visual Studio Code on Databricks clusters in your remote workspaces.
Run local Python code files (.py) and Python, R, Scala, and SQL (.sql) notebooks from Visual Studio Code as automated Databricks jobs in your remote workspaces.
The Databricks extension for Visual Studio Code supports running R, Scala, and SQL notebooks as automated jobs but does not provide any deeper support for these languages within Visual Studio Code.
The following hands-on tutorial assumes:
Visual Studio Code is already running and has a local project opened.
You have already generated a Databricks personal access token for your target Databricks workspace. See Databricks personal access token authentication.
You have already added your Databricks personal access token as a token field, along with your workspace instance URL (for example, https://dbc-a1b2345c-d6e7.cloud.databricks.com) as a host field, to the DEFAULT configuration profile in your local .databrickscfg file. See Databricks configuration profiles.
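For reference, a DEFAULT profile that satisfies this prerequisite looks like the following sketch. The host value is the example workspace URL from this tutorial; replace the token placeholder with your own personal access token.

```ini
[DEFAULT]
host  = https://dbc-a1b2345c-d6e7.cloud.databricks.com
token = <your-personal-access-token>
```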
Follow these steps:
Install the extension: on the Databricks extension for Visual Studio Code page in the Visual Studio Code Marketplace, click Install. To complete the installation, follow the on-screen instructions.
Open the extension: On the sidebar, click the Databricks logo.
Start configuring the extension: In the Configuration pane, click Configure Databricks.
Set the Databricks workspace: In the Command Palette, for Databricks Host, enter your workspace instance URL, for example
https://dbc-a1b2345c-d6e7.cloud.databricks.com. Then press Enter.
Click the entry DEFAULT: Authenticate using the DEFAULT profile.
Set the Databricks cluster: In the Configuration pane, click Cluster, and then click the gear (Configure cluster) icon.
Click the entry for the cluster that you want to use.
Start the cluster, if it is not already started: In the Configuration pane, next to Cluster, click the play (Start Cluster) icon.
Set the sync destination: In the Configuration pane, click Sync Destination, and then click the gear (Configure sync destination) icon.
In the Command Palette, click the sync destination name that is randomly generated by the extension.
Create a basic, local Python code file to sync and run: On the sidebar, click the Explorer logo.
On the main menu, click File > New File. Name the file demo.py and save it to the project root.
Add the following code to the file and then save it. This code creates and displays the contents of a basic PySpark DataFrame:
from pyspark.sql import SparkSession
from pyspark.sql.types import *

spark = SparkSession.builder.getOrCreate()

schema = StructType([
    StructField('CustomerID', IntegerType(), False),
    StructField('FirstName', StringType(), False),
    StructField('LastName', StringType(), False)
])

data = [
    [1000, 'Mathijs', 'Oosterhout-Rijntjes'],
    [1001, 'Joost', 'van Brunswijk'],
    [1002, 'Stan', 'Bokenkamp']
]

customers = spark.createDataFrame(data, schema)
customers.show()

# Output:
#
# +----------+---------+-------------------+
# |CustomerID|FirstName|           LastName|
# +----------+---------+-------------------+
# |      1000|  Mathijs|Oosterhout-Rijntjes|
# |      1001|    Joost|      van Brunswijk|
# |      1002|     Stan|          Bokenkamp|
# +----------+---------+-------------------+
In the Configuration pane, next to Sync Destination, click the circled arrows (Start synchronization) icon.
In the Explorer view, right-click the demo.py file, and then click Upload and Run File on Databricks. The output appears in the Debug Console pane.
Now that you have successfully used the Databricks extension for Visual Studio Code to run a basic Python file, learn more about how to use the extension:
Learn about additional ways to set up authentication for the extension, beyond Databricks personal access token authentication. See Authentication setup for the Databricks extension for Visual Studio Code.
Learn how to enable PySpark and Databricks Utilities code completion, run or debug Python code with Databricks Connect, run a file or a notebook as a Databricks job, run tests with
pytest, use environment variable definitions files, create custom run configurations, and more. See Development tasks for the Databricks extension for Visual Studio Code.
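As a taste of the pytest workflow mentioned above, here is a minimal, hedged sketch of a hypothetical test_demo.py. It assumes you factor the row data from demo.py into a helper function so the logic can be checked without a live Spark session; the helper name is illustrative, not part of the extension or Databricks APIs.

```python
# Hypothetical test_demo.py: a minimal pytest-style sketch.
# Assumes the row data from demo.py is factored into a helper
# so it can be tested locally without a Spark session.

def make_customer_rows():
    # Same rows as the data list in demo.py, as plain Python lists.
    return [
        [1000, 'Mathijs', 'Oosterhout-Rijntjes'],
        [1001, 'Joost', 'van Brunswijk'],
        [1002, 'Stan', 'Bokenkamp'],
    ]

def test_customer_rows_shape():
    rows = make_customer_rows()
    assert len(rows) == 3
    # Every row carries a CustomerID, FirstName, and LastName.
    assert all(len(row) == 3 for row in rows)

def test_customer_ids_are_unique():
    ids = [row[0] for row in make_customer_rows()]
    assert len(ids) == len(set(ids))
```

Running pytest from the project root would discover and execute both tests; because they avoid Spark entirely, they run quickly on your local machine.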