Chart and Graph Types with Python

This notebook covers the various charts and graphs that are built into Databricks.

While Python is used to generate the test data displayed in the visualizations in this notebook, the information about how to configure these charts and graphs applies to notebooks in any language.

A Table View is the most basic way to view data.

Only the first 1000 rows are displayed in the table view.

from pyspark.sql import Row
 
array = map(lambda x: Row(key="k_%04d" % x, value = x), range(1, 5001))
largeDataFrame = spark.createDataFrame(sc.parallelize(array))
largeDataFrame.registerTempTable("largeTable")
display(spark.sql("select * from largeTable"))
 
key      value
k_0001   1
k_0002   2
k_0003   3
…        …
k_0022   22

Showing all 5000 rows.

Configure tables with Plot Options....

  • The Keys section specifies the control variable, which is typically displayed as the X-Axis on most graph types. Most graphs can plot roughly 1000 values for the keys, though the exact limit varies by graph type.
  • The Values section specifies the observed variable, which is typically displayed on the Y-Axis and tends to be a numerical value on most graph types.
  • The Series groupings section specifies how to break out the data. For a bar graph, each series grouping gets a different bar color, with a legend denoting the value of each grouping. Many graph types can only handle series groupings with 10 or fewer unique values.

Some graph types also allow specifying even more options - and those will be discussed as applicable.
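
To make the mapping concrete, here is a minimal, hypothetical DataFrame in which one column plays each role (x for Keys, y for Values, group for Series groupings); the real examples below follow the same pattern.

from pyspark.sql import Row

# Hypothetical example of the three roles: x -> Keys, y -> Values, group -> Series groupings.
plotOptionsExample = spark.createDataFrame([
  Row(x="2020", group="a", y=1.0),
  Row(x="2020", group="b", y=2.0),
  Row(x="2021", group="a", y=3.0),
  Row(x="2021", group="b", y=4.0)
])
display(plotOptionsExample)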

A Pivot table is another way to view data in a table format.

Instead of just returning the raw rows of the table, it can automatically sort the data and compute counts, totals, or averages of the values stored in the table.

  • Read more about Pivot Tables here: http://en.wikipedia.org/wiki/Pivot_table
  • For a Pivot Table, the key, series grouping, and value fields can be specified.
  • The key is the first column, and there will be one row per key in the Pivot Table.
  • There will be an additional column for each unique value of the series grouping.
  • The cells of the table contain the values field, which must be a numerical field that can be combined using aggregation functions.
  • Each cell in the Pivot Table is calculated from multiple rows of the original table.
    • Select SUM, AVG, MIN, MAX, or COUNT as the way to combine those original rows into the cell.
  • Pivoting is done on the server side of Databricks Cloud to calculate the cell values.

To create a Pivot table, click the Graph icon below a result and select Pivot.

# Click on the Plot Options Button...to see how this pivot table was configured.
from pyspark.sql import Row
 
largePivotSeries = map(lambda x: Row(key="k_%03d" % (x % 200), series_grouping = "group_%d" % (x % 3), value = x), range(1, 5001))
largePivotDataFrame = spark.createDataFrame(sc.parallelize(largePivotSeries))
largePivotDataFrame.registerTempTable("table_to_be_pivoted")
display(spark.sql("select * from table_to_be_pivoted"))
 
key     group_1   group_2   group_0
k_001   21609     20008     18408
k_002   18416     21618     20016
k_003   20024     18424     21627
…       …         …         …
k_021   20168     18568     21789

Another way to think of a pivot table is that it does a group by on your original table by the key and series grouping, but instead of outputting (key, series_grouping, aggregation_function(value)) tuples, it outputs a table whose schema is the key plus one column per unique value of the series grouping.

  • See the results of the group by statement below, which contain all the data that is in the pivot table above, though the schema of the results is different. (A DataFrame API sketch of the same pivot follows the results.)
%sql select key, series_grouping, sum(value) from table_to_be_pivoted group by key, series_grouping order by key, series_grouping
 
key     series_grouping   sum(value)
k_000   group_0           21600
k_000   group_1           20000
k_000   group_2           23400
k_001   group_0           18408
k_001   group_1           21609
k_001   group_2           20008
…       …                 …

Showing all 600 rows.
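
The same reshaping can also be sketched directly with the DataFrame API; this is roughly the computation the pivot table above performs when SUM is selected as the aggregation.

from pyspark.sql import functions as F

# Roughly what the server-side pivot computes: one row per key, one column per
# unique series_grouping value, each cell holding sum(value).
pivoted = (spark.table("table_to_be_pivoted")
  .groupBy("key")
  .pivot("series_grouping")
  .agg(F.sum("value"))
  .orderBy("key"))
display(pivoted)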

A Bar Chart is a type of visual pivot table graph and a great basic way to visualize data.

  • Plot Options... was used to configure the graph below.
  • The Key is Year and appears on the X-Axis.
  • The Series groupings is Product and there is a different color to denote each of those.
  • The Values is salesAmount and appears on the Y-Axis.
  • Sum was selected as the aggregation method, which means rows are summed for pivoting.
from pyspark.sql import Row
salesEntryDataFrame = spark.createDataFrame(sc.parallelize([
  Row(category="fruits_and_vegetables", product="apples", year=2012, salesAmount=100.50),
  Row(category="fruits_and_vegetables", product="oranges", year=2012, salesAmount=100.75),
  Row(category="fruits_and_vegetables", product="apples", year=2013, salesAmount=200.25),
  Row(category="fruits_and_vegetables", product="oranges", year=2013, salesAmount=300.65),
  Row(category="fruits_and_vegetables", product="apples", year=2014, salesAmount=300.65),
  Row(category="fruits_and_vegetables", product="oranges", year=2015, salesAmount=100.35),
  Row(category="butcher_shop", product="beef", year=2012, salesAmount=200.50),
  Row(category="butcher_shop", product="chicken", year=2012, salesAmount=200.75),
  Row(category="butcher_shop", product="pork", year=2013, salesAmount=400.25),
  Row(category="butcher_shop", product="beef", year=2013, salesAmount=600.65),
  Row(category="butcher_shop", product="beef", year=2014, salesAmount=600.65),
  Row(category="butcher_shop", product="chicken", year=2015, salesAmount=200.35),
  Row(category="misc", product="gum", year=2012, salesAmount=400.50),
  Row(category="misc", product="cleaning_supplies", year=2012, salesAmount=400.75),
  Row(category="misc", product="greeting_cards", year=2013, salesAmount=800.25),
  Row(category="misc", product="kitchen_utensils", year=2013, salesAmount=1200.65),
  Row(category="misc", product="cleaning_supplies", year=2014, salesAmount=1200.65),
  Row(category="misc", product="cleaning_supplies", year=2015, salesAmount=400.35)
]))
salesEntryDataFrame.registerTempTable("test_sales_table")
display(spark.sql("select * from test_sales_table"))
[Bar chart: salesAmount (Y-Axis) by year (X-Axis), one colored bar series per category]

Tip: Hover over each bar in the chart below to see the exact values plotted.
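
For reference, the numbers behind the bars can be reproduced with a simple group by; this is a sketch of the kind of sum-per-key-and-grouping aggregation the chart performs, here totaling salesAmount per year and category.

from pyspark.sql import functions as F

# The aggregation behind the bars: total salesAmount per year and category.
barData = (spark.table("test_sales_table")
  .groupBy("year", "category")
  .agg(F.sum("salesAmount").alias("totalSales"))
  .orderBy("year", "category"))
display(barData)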

A Line Graph is another example of a pivot table graph that can highlight trends for your data set.

  • Plot Options... was used to configure the graph below.
  • The Key is Year and appears on the X-Axis.
  • The Series groupings is Category and there is a different color to denote each category.
  • The Values is salesAmount and appears on the Y-Axis.
  • Sum is selected as the aggregation method.
%sql select cast(string(year) as date) as year, category, salesAmount from test_sales_table
[Line graph: salesAmount (Y-Axis) over time (X-Axis, Jan 2012 through 2015), one line per category]

A Pie Chart is a pivot table graph type that lets you see what percentage of the whole your values represent.

  • NOTE: As opposed to the previous examples, Key & Series Groupings have been switched.
  • Plot Options... was used to configure the graph below.
  • The Key is Category and one color is used for each category.
  • The Series groupings is Year and there is a different pie chart for each year.
  • The Values is salesAmount and is used to calculate the percentage of the pie.
  • Sum is selected as the aggregation method.
%sql select * from test_sales_table
[Pie charts: salesAmount share by category (fruits_and_vegetables, butcher_shop, misc), one pie per year, 2012-2015]

A Map Graph is a way to visualize your data on a map.

  • Plot Options... was used to configure the graph below.
  • Keys should contain the field with the location.
  • Series groupings is always ignored for World Map graphs.
  • Values should contain exactly one field with a numerical value.
  • Since there can be multiple rows with the same location key, choose SUM, AVG, MIN, MAX, or COUNT as the way to combine the values for a single key.
  • Different values are denoted by color on the map, and ranges are always spaced evenly.

Tip: Apply a smoothing function to your graph if your values are not evenly distributed (one such transform is sketched after the map below).

from pyspark.sql import Row
stateRDD = spark.createDataFrame(sc.parallelize([
  Row(state="MO", value=1), Row(state="MO", value=10),
  Row(state="NH", value=4),
  Row(state="MA", value=8),
  Row(state="NY", value=4),
  Row(state="CA", value=7)
]))
stateRDD.registerTempTable("test_state_table")
display(spark.sql("Select * from test_state_table"))
[Map of the US: states shaded by aggregated value, color ranges 0-12]
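
As a sketch of the smoothing tip above: any monotonic transform of the value column can be applied before display so the evenly spaced color ranges are not dominated by a few large values. log1p is used here purely for illustration.

from pyspark.sql import functions as F

# Illustrative only: log-transform the value column so the map's color ranges
# spread more evenly when the raw values are heavily skewed.
smoothedStates = spark.table("test_state_table").withColumn("value", F.log1p("value"))
display(smoothedStates)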

To plot a graph of the world, use country codes in ISO 3166-1 alpha-3 format as the key.

from pyspark.sql import Row
worldRDD = spark.createDataFrame(sc.parallelize([
  Row(country="USA", value=1000),
  Row(country="JPN", value=23),
  Row(country="GBR", value=23),
  Row(country="FRA", value=21),
  Row(country="TUR", value=3)
]))
display(worldRDD)
[World map: countries shaded by value, color ranges 0-1000]

A Scatter Plot allows you to see if there is a correlation between two variables.

  • Plot Options... was selected to configure the graph below.
  • Keys will be used to color the points on the graphs - with a legend on the side.
  • Series Grouping is ignored.
  • Value must contain at least two numerical fields. This graph has a, b, and c as the values.
  • The diagonal of the resulting plot matrix shows the kernel density plot of each variable.
  • Each row of the matrix has the same variable on the Y-Axis, and each column has the same variable on the X-Axis.
from pyspark.sql import Row
scatterPlotRDD = spark.createDataFrame(sc.parallelize([
  Row(key="k1", a=0.2, b=120, c=1), Row(key="k1", a=0.4, b=140, c=1), Row(key="k1", a=0.6, b=160, c=1), Row(key="k1", a=0.8, b=180, c=1),
  Row(key="k2", a=0.2, b=220, c=1), Row(key="k2", a=0.4, b=240, c=1), Row(key="k2", a=0.6, b=260, c=1), Row(key="k2", a=0.8, b=280, c=1),
  Row(key="k1", a=1.8, b=120, c=1), Row(key="k1", a=1.4, b=140, c=1), Row(key="k1", a=1.6, b=160, c=1), Row(key="k1", a=1.8, b=180, c=1),
  Row(key="k2", a=1.8, b=220, c=2), Row(key="k2", a=1.4, b=240, c=2), Row(key="k2", a=1.6, b=260, c=2), Row(key="k2", a=1.8, b=280, c=2),
  Row(key="k1", a=2.2, b=120, c=1), Row(key="k1", a=2.4, b=140, c=1), Row(key="k1", a=2.6, b=160, c=1), Row(key="k1", a=2.8, b=180, c=1),
  Row(key="k2", a=2.2, b=220, c=3), Row(key="k2", a=2.4, b=240, c=3), Row(key="k2", a=2.6, b=260, c=3), Row(key="k2", a=2.8, b=280, c=3)
]))
display(scatterPlotRDD)
[Scatter plot matrix of a, b, and c, with points colored by key and kernel density plots on the diagonal]
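
To complement the visual check for correlation, the correlation coefficient can also be computed directly on the DataFrame; a small sketch using the built-in stat functions:

# A numeric companion to the visual check: Pearson correlation between plotted fields.
print(scatterPlotRDD.stat.corr("a", "b"))
print(scatterPlotRDD.stat.corr("a", "c"))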

LOESS Fit Curves for Scatter Plots

LOESS is a method of performing local regression on your data to produce a smooth estimation curve that describes the trend of your scatter plot. It does this by interpolating a curve within each neighborhood of data points. The LOESS fit curve is controlled by a bandwidth parameter that specifies how many neighboring points should be used to smooth each point of the plot. A high bandwidth parameter (close to 1) gives a very smooth curve that may gloss over local structure in the data, while a low bandwidth parameter (close to 0) smooths very little and tends to follow the noise.

LOESS fit curves are now available for scatter plots. Here is an example of how you can create a LOESS fit for your scatter plots.

NOTE: If your dataset has more than 5000 data points, the LOESS fit is computed using the first 5000 points.

import numpy as np
import math
 
# Create data points for scatter plot
np.random.seed(0)
points = sc.parallelize(range(0,1000)).map(lambda x: (x/100.0, 4 * math.sin(x/100.0) + np.random.normal(4,1))).toDF()

You can turn this data into a scatter plot using the controls on the bottom left of the display table.

[Screenshot: selecting Scatter from the plot menu]

You can now access the LOESS fit option when you select Plot Options:

[Screenshot: the LOESS fit option in the Plot Options dialog]

You can experiment with the bandwidth parameter to see how the curve adapts to noisy data.
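
If you want to see the effect of the bandwidth outside the plot UI, one option is to fit the curve locally, for example with statsmodels (assuming it is installed on the cluster; this is only a sketch, not how the built-in LOESS fit is computed):

from statsmodels.nonparametric.smoothers_lowess import lowess

# Sketch: fit LOWESS curves locally to see how the bandwidth (frac) changes the fit.
# Column names _1 (x) and _2 (y) come from the tuple-based toDF() call above.
pdf = points.toPandas()
smooth_wide  = lowess(pdf["_2"], pdf["_1"], frac=0.8)   # close to 1: very smooth
smooth_tight = lowess(pdf["_2"], pdf["_1"], frac=0.1)   # close to 0: follows the noise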

Once you accept the change, you will see the LOESS fit on your scatter plot!

display(points)
[Scatter plot of the generated points with the LOESS fit curve]

A Histogram allows you to determine the distribution of values.

  • Plot Options... was selected to configure the graph below.
  • Value should contain exactly one field.
  • Series Grouping is always ignored.
  • Keys can support up to 2 fields.
    • When no key is specified, exactly one histogram is output.
    • When 2 fields are specified, then there is a trellis of histograms.
  • Aggregation is not applicable.
  • Number of bins is a special option that appears only for histogram plots, and controls the number of bins in the histogram.
  • Bins are computed on the server side for histograms, so all of the rows in a table can be plotted (a sketch of the binning is shown after the plot below).
from pyspark.sql import Row
# Hover over the entry in the histogram to read off the exact valued plotted.
histogramRDD = spark.createDataFrame(sc.parallelize([
  Row(key1="a", key2="x", val=0.2), Row(key1="a", key2="x", val=0.4), Row(key1="a", key2="x", val=0.6), Row(key1="a", key2="x", val=0.8), Row(key1="a", key2="x", val=1.0), 
  Row(key1="b", key2="z", val=0.2), Row(key1="b", key2="x", val=0.4), Row(key1="b", key2="x", val=0.6), Row(key1="b", key2="y", val=0.8), Row(key1="b", key2="x", val=1.0), 
  Row(key1="a", key2="x", val=0.2), Row(key1="a", key2="y", val=0.4), Row(key1="a", key2="x", val=0.6), Row(key1="a", key2="x", val=0.8), Row(key1="a", key2="x", val=1.0), 
  Row(key1="b", key2="x", val=0.2), Row(key1="b", key2="x", val=0.4), Row(key1="b", key2="x", val=0.6), Row(key1="b", key2="z", val=0.8), Row(key1="b", key2="x", val=1.0)]))
display(histogramRDD)
[Trellis of histograms of val, one panel per key1/key2 combination]
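
The bin counts themselves can also be computed with the RDD histogram helper, which bins across all rows; a sketch (the built-in plot manages this for you):

# Sketch: the same kind of binning the histogram performs, here with 5 equal-width bins.
binEdges, binCounts = histogramRDD.rdd.map(lambda row: row.val).histogram(5)
print(binEdges)
print(binCounts)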

A Quantile plot lets you view the value of a field at a given quantile.

  • For more information on Quantile Plots, see http://en.wikipedia.org/wiki/Normal_probability_plot.
  • Plot Options... was selected to configure the graph below.
  • Value should contain exactly one field.
  • Series Grouping is always ignored.
  • Keys can support up to 2 fields.
    • When no key is specified, exactly one quantile plot is output.
    • When 2 fields are specified, then there is a trellis of quantile plots.
  • Aggregation is not applicable.
  • Quantiles are not calculated on the server side for now, so only the first 1000 rows are reflected in the plot (a sketch of computing quantiles over the full table follows the plot below).
from pyspark.sql import Row
quantileSeries = map(lambda x: Row(key="key_%01d" % (x % 4), grouping="group_%01d" % (x % 3), otherField=x, value=x*x), range(1, 5001))
quantileSeriesRDD = spark.createDataFrame(sc.parallelize(quantileSeries))
display(quantileSeriesRDD)
[Trellis of quantile plots of value, one panel per key/grouping combination]
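
If you need quantiles computed over the full table rather than only the first 1000 rows, approxQuantile runs on the cluster; a small sketch:

# Sketch: approximate quartiles of value over all 5000 rows (last argument is the relative error).
print(quantileSeriesRDD.approxQuantile("value", [0.25, 0.5, 0.75], 0.01))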

A Q-Q plot shows you how a field of values is distributed.

  • For more information on Q-Q plots, see http://en.wikipedia.org/wiki/Q%E2%80%93Q_plot.
  • Value should contain one or two fields.
  • Series Grouping is always ignored.
  • Keys can support up to 2 fields.
    • When no key is specified, exactly one Q-Q plot is output.
    • When 2 fields are specified, then there is a trellis of Q-Q plots.
  • Aggregation is not applicable.
  • Q-Q plots are not calculated on the server side for now, so only the first 1000 rows are reflected in the plot.
from pyspark.sql import Row
qqPlotSeries = map(lambda x: Row(key="key_%03d" % (x % 5), grouping="group_%01d" % (x % 3), value=x, value_squared=x*x), range(1, 5001))
qqPlotRDD = spark.createDataFrame(sc.parallelize(qqPlotSeries))

When there is only one field specified for Values, a Q-Q plot will just compare the distribution of the field with a normal distribution.

display(qqPlotRDD)
[Q-Q plot of value quantiles against normal quantiles]

When there are two fields specified for Values, a Q-Q plot will compare the distribution of the two fields with each other.

display(qqPlotRDD)
[Q-Q plot comparing the quantiles of value and value_squared]

Up to two keys can be configured with a Q-Q plot to create a trellis of plots.

display(qqPlotRDD)
[Trellis of Q-Q plots of value vs. value_squared, one panel per key/grouping combination]

A Box plot gives you an idea of what the expected range of values is and shows the outliers.

  • See http://en.wikipedia.org/wiki/Box_plot for more information on Box Plots.
  • Value should contain exactly one field.
  • Series Grouping is always ignored.
  • Keys can be added.
    • There will be one box and whisker plot for each combination of values for the keys.
  • Aggregation is not applicable.
  • Box plots are not calculated on the server side for now, so only the first 1000 rows are reflected in the plot (a sketch of computing the statistics over all rows follows the plot below).
  • The Median value of the Box plot is displayed when you hover over the box.
from pyspark.sql import Row
import random
# Hovering over the Box will display the exact median value.
boxSeries = map(lambda x: Row(key="key_%01d" % (x % 2), grouping="group_%01d" % (x % 3), value=random.randint(0, x)), range(1, 5001))
boxSeriesRDD = spark.createDataFrame(sc.parallelize(boxSeries))
display(boxSeriesRDD)
[Box plots of value, one box per key/grouping combination]
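
Since the plot only reflects the first 1000 rows, the underlying box statistics can also be computed over the full table on the cluster, for example with percentile_approx; a sketch:

from pyspark.sql import functions as F

# Sketch: approximate box statistics per key/grouping combination over the full table.
boxStats = (boxSeriesRDD
  .groupBy("key", "grouping")
  .agg(F.min("value").alias("lo"),
       F.expr("percentile_approx(value, array(0.25, 0.5, 0.75))").alias("quartiles"),
       F.max("value").alias("hi")))
display(boxStats)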