Overview

This notebook shows how to create and query a table or DataFrame from a file you have uploaded to DBFS, the Databricks File System, which lets you store data for querying inside Databricks. It assumes the file you want to read is already in DBFS.

The notebook is written in Python, so the default cell type is Python. You can switch a cell to another language with the %LANGUAGE magic syntax (for example, start a cell with %sql to write SQL); Python, Scala, SQL, and R are all supported.

# File location and type
file_location = "/FileStore/tables/gentle_introduction_to_apache_spark-1.html"
# Spark has no built-in "html" data source, so read the page as plain text,
# one line per row. For a CSV upload this would be "csv" instead.
file_type = "text"
# CSV options
infer_schema = "false"
first_row_is_header = "false"
delimiter = ","
# The applied options are for CSV files. For other file types, these will be ignored.
df = spark.read.format(file_type) \
.option("inferSchema", infer_schema) \
.option("header", first_row_is_header) \
.option("sep", delimiter) \
.load(file_location)
display(df)
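
A minimal sketch of registering the DataFrame as a temporary view so it can be queried with SQL from this notebook; the view name is illustrative and simply mirrors the uploaded file name:

# Register the DataFrame as a temporary view for SQL access.
temp_table_name = "gentle_introduction_to_apache_spark_1_html"
df.createOrReplaceTempView(temp_table_name)

# Query the view with spark.sql (or from a %sql cell):
display(spark.sql(f"SELECT * FROM {temp_table_name} LIMIT 10"))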
# With the DataFrame registered as a temp view, it is only available to this notebook. If you'd like other users to be able to query this table, you can also create a permanent table from the DataFrame.
# Once saved, this table will persist across cluster restarts and allow various users across different notebooks to query the data.
# To do so, choose a table name and uncomment the df.write line below.
# Note: hyphens are not valid in unquoted table names, so the "-1" in the file name becomes "_1" here.
permanent_table_name = "gentle_introduction_to_apache_spark_1_html"
# df.write.format("parquet").saveAsTable(permanent_table_name)
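
Once the table is saved, any notebook in the workspace can read it back; a minimal sketch, kept commented out like the write above until the table actually exists:

# df2 = spark.table(permanent_table_name)
# display(df2)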