DataFrame to nested JSON example(Scala)
Loading...

Convert a flattened Dataframe to nested JSON by nesting a case class within another case class.

Case classes are defined based on your JSON schema requirement. The output schema we are expecting per row is:

"""{"emp_details":{"id":{"id":"64989"},"name":"ADELYN","position":"SALESMAN","depId":{"dep_id":"3001"}},"incrementDate":"2000-02-20","commission":"400.00","country":"France","hireDate":{"hire_date":"1991-02-20"},"reports_to":{"reporting":[{"manager_id":"66928"}]}}"""

Step 1 - Define your custom nested schema using case classes.

case class empId(id:String)
case class depId(dep_id:String)
case class details(id:empId,name:String,position:String,depId:depId)
case class code(manager_id:String) 
case class reporting(reporting:Array[code])
case class hireDate(hire_date:String)
case class emp_record(emp_details:details,incrementDate:String,commission:String,country:String,hireDate:hireDate,reports_to:reporting)

//Simulating a flattended DF 
val DF = Seq(("64989","ADELYN", "SALESMAN","66928","1991-02-20","1700.00","400.00","3001","2000-02-20","France"),("64999","Raj", "SALESMAN","66928","1991-02-20","1700.00","400.00","3001","2000-02-20","Ind")).toDF("emp_id","emp_name","job_name","manager_id","hire_date","salary","commission","dep_id","increment_date","country")
display(DF)

Step 2 - Convert the flattented DF to a nested structure using map to pass every row object to a case class.

import spark.implicits._
val nestedDF= DF.map(r=>{
val empID_1= empId(r.getString(0))
val depId_1 = depId(r.getString(7))
val details_1=details(empID_1,r.getString(1),r.getString(2),depId_1)
val code_1=code(r.getString(3)) 
val reporting_1 = reporting(Array(code_1))
val hireDate_1 = hireDate(r.getString(4))
emp_record(details_1,r.getString(8),r.getString(6),r.getString(9),hireDate_1,reporting_1)

}
)

display(nestedDF)
nestedDF.repartition(1).write.mode("overWrite").option("multiLine","true").json("dbfs:/tmp/test/json1/")

Identify the JSON file name.

%fs ls dbfs:/tmp/test/json1/

Enter the name of the JSON output file in the next command and re-run the cell to ensure the data is correctly nested.

%fs head dbfs:/tmp/test/json1/<name of file identified in previous command>.json

Example JSON output should match the following:

{"emp_details":{"id":{"id":"64989"},"name":"ADELYN","position":"SALESMAN","depId":{"dep_id":"3001"}},"incrementDate":"2000-02-20","commission":"400.00","country":"France","hireDate":{"hire_date":"1991-02-20"},"reports_to":{"reporting":[{"manager_id":"66928"}]}} {"emp_details":{"id":{"id":"64999"},"name":"Raj","position":"SALESMAN","depId":{"dep_id":"3001"}},"incrementDate":"2000-02-20","commission":"400.00","country":"Ind","hireDate":{"hire_date":"1991-02-20"},"reports_to":{"reporting":[{"manager_id":"66928"}]}}