sql

The sql function executes a SQL query on Spark SQL tables and returns the result as a SparkDataFrame.

Syntax:

  • sql(sqlQuery)

Parameters:

  • sqlQuery: String, SQL query

Output:

  • SparkDataFrame

Guide: SparkR documentation <http://spark.apache.org/docs/latest/sparkr.html>__

Approach 1: Perform SQL queries on Spark SQL tables.

-- Create Spark SQL table from CSV file
-- mode "FAILFAST" will abort file parsing with a RuntimeException if any malformed lines are encountered

CREATE TEMPORARY TABLE temp_diamonds
USING com.databricks.spark.csv
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true", mode "FAILFAST")

require(SparkR)

# Create SparkDataFrame using sql() and SQL queries
diamondsDF <- sql("SELECT carat, cut, price FROM temp_diamonds")
head(diamondsDF)
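The value returned by sql() is an ordinary SparkDataFrame, so the usual SparkR inspection functions apply to it. A minimal sketch (assuming the temp_diamonds table created above exists):

```r
# Inspect the SparkDataFrame returned by sql()
diamondsDF <- sql("SELECT carat, cut, price FROM temp_diamonds")

printSchema(diamondsDF)   # print column names and types
nrow(diamondsDF)          # row count (triggers a Spark job)
```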

Approach 2: Perform SQL queries on your SparkDataFrames. This works in conjunction with registerTempTable, which registers a SparkDataFrame as a temporary table so it can be queried with SQL.

# Create SparkDataFrame from local R data frame
df <- createDataFrame(mtcars)
head(df)
# Register df as temporary table
# Note: temporary tables do not persist across clusters or cluster restarts
registerTempTable(df, "tempDF")

# Create new SparkDataFrame using sql() and SQL queries
# Select rows where df$cyl == 6 and return first 4 columns
query <- sql("SELECT mpg, cyl, disp, hp FROM tempDF WHERE cyl = 6")
head(query)
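The same result can usually be obtained with the SparkDataFrame API instead of a SQL string. A sketch using SparkR's select and filter functions (assuming the df created from mtcars above):

```r
# Equivalent query expressed with the SparkDataFrame API rather than SQL:
# keep rows where cyl == 6, then project the first four columns
subsetDF <- select(filter(df, df$cyl == 6), "mpg", "cyl", "disp", "hp")
head(subsetDF)
```

Which form to use is largely a matter of taste; the SQL string is convenient for ad hoc queries, while the API composes more naturally with other R code.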
# View registered tables
# The isTemporary column indicates whether a table is temporary
head(sql("SHOW TABLES"))