CSE 512 Assignment 5 (Delta Lake)
This assignment requires you to set up an sbt project and complete five SQL queries.
The steps for the assignment are as follows:
1. Download the provided zip file from Canvas and set up your IntelliJ IDE with it.
The project includes the setup for Apache Sedona and Delta Lake. It should
look as follows (image for reference).
2. Open the Scala file and write the following five queries in the given order:
2.1. Read the given testpoint.csv file in CSV format, write it in Delta format, and
save it as firstpointdata, i.e., firstPointQuery().
2.2. Read firstpointdata in Delta format and print the total count of the points,
i.e., secondPointQuery().
2.3. Read the given testenvelope.csv file in CSV format, write it in Delta format,
and save it as firstpolydata, i.e., firstPloygonQuery().
2.4. Read firstpolydata in Delta format and print the total count of the polygons,
i.e., secondPolygonQuery().
2.5. Read firstpointdata in Delta format and find the total count of point
pairs in which the distance between the two points is less than 2,
i.e., JoinQuery().
3. Hints: Go through the given links for reading and writing. You can read the given
files into a DataFrame, create temporary tables from it, and use sparkSession.sql("")
to write SQL statements.
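The read, write, and query pattern from the hints can be sketched as follows. This is a minimal illustration, not the assignment solution: the object name DeltaRoundTripSketch, the file sample.csv, the table name sampledata, and the view name sample are all hypothetical, and the session configuration assumes Spark and Delta Lake are on the classpath (the provided project may already set this up for you).

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: CSV -> Delta -> temporary view -> SQL count.
object DeltaRoundTripSketch {

  def run(): Long = {
    // Local session with the Delta Lake SQL extension enabled.
    // Assumption: the provided project may configure this differently.
    val spark = SparkSession.builder()
      .appName("delta-round-trip-sketch")
      .master("local[*]")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Made-up input data in a temporary directory, so the sketch does not
    // depend on the assignment files.
    val workDir = Files.createTempDirectory("delta-sketch")
    val csvPath = workDir.resolve("sample.csv")
    Files.write(csvPath, "1.0,2.0\n3.0,4.0\n5.0,6.0\n".getBytes("UTF-8"))
    val deltaPath = workDir.resolve("sampledata").toString

    // Read the file in CSV format (no header row in this sample).
    val csvDf = spark.read
      .format("csv")
      .option("header", "false")
      .load(csvPath.toString)

    // Write the same rows in Delta format.
    csvDf.write.format("delta").mode("overwrite").save(deltaPath)

    // Read the Delta table back and expose it to SQL as a temporary view.
    val deltaDf = spark.read.format("delta").load(deltaPath)
    deltaDf.createOrReplaceTempView("sample")

    // Run a SQL statement against the view and extract the single count value.
    val total = spark.sql("SELECT COUNT(*) AS total FROM sample").first().getLong(0)

    spark.stop()
    total
  }

  def main(args: Array[String]): Unit =
    println(s"Row count: ${run()}")
}
```

The same pattern applies to any CSV input: only the paths and the view name change, and the SQL string passed to sql can be any query over the registered view.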
Submit only the AssignmentFive.scala file, with complete code, to Canvas.
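To make the pair-counting condition in query 2.5 concrete, here is a small plain-Scala sketch that does not use Spark or Sedona. It assumes Euclidean distance on (x, y) coordinates; in the project itself a spatial SQL function such as Sedona's ST_Distance would likely play that role. The names PairDistanceSketch, Point, countOrderedPairs, and countUnorderedPairs are hypothetical. The handout does not say whether a point paired with itself, or both orderings of the same pair, should be counted, so the sketch shows both conventions rather than choosing one.

```scala
// Hypothetical sketch: counting point pairs closer than a threshold.
object PairDistanceSketch {

  final case class Point(x: Double, y: Double)

  // Euclidean distance between two points (an assumption about the metric).
  def distance(a: Point, b: Point): Double =
    math.hypot(a.x - b.x, a.y - b.y)

  // Counts ordered pairs (a, b), including each point paired with itself.
  // This mirrors what an unrestricted SQL self-join on the points would count.
  def countOrderedPairs(points: Seq[Point], threshold: Double): Long =
    points.map(a => points.count(b => distance(a, b) < threshold).toLong).sum

  // Counts unordered pairs of two different rows (i < j), so no self-pairs
  // and no double counting. Indices are used so duplicate points stay distinct.
  def countUnorderedPairs(points: Seq[Point], threshold: Double): Long = {
    val indexed = points.toIndexedSeq
    val pairs = for {
      i <- indexed.indices
      j <- (i + 1) until indexed.size
      if distance(indexed(i), indexed(j)) < threshold
    } yield (i, j)
    pairs.size.toLong
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample points, not taken from testpoint.csv.
    val sample = Seq(Point(0.0, 0.0), Point(1.0, 0.0), Point(5.0, 5.0))
    println(s"Ordered pairs (with self-pairs): ${countOrderedPairs(sample, 2.0)}")
    println(s"Unordered distinct pairs: ${countUnorderedPairs(sample, 2.0)}")
  }
}
```

For n points, the two counts are related by ordered = 2 * unordered + n whenever the threshold is positive, because every point is at distance 0 from itself.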