Spark – Read JSON file to RDD
JSON has become one of the most common data formats exchanged between nodes on the internet and between applications.
In this tutorial, we shall learn how to read a JSON file to an RDD with the help of SparkSession, DataFrameReader and Dataset<Row>.toJavaRDD().
Steps to Read JSON file to Spark RDD
To read a JSON file to a Spark RDD,
- Create a SparkSession.
SparkSession spark = SparkSession.builder().appName("Spark Example - Read JSON to RDD").master("local[2]").getOrCreate();
- Get DataFrameReader of the SparkSession.
spark.read()
- Use DataFrameReader.json(String jsonFilePath) to read the contents of JSON to Dataset<Row>.
spark.read().json(jsonPath)
- Use Dataset<Row>.toJavaRDD() to convert Dataset<Row> to JavaRDD<Row>.
spark.read().json(jsonPath).toJavaRDD()
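The four steps above can also be written with one variable per step, which makes the intermediate types visible. This is only a minimal sketch: the class name JsonToRddSteps and the helper readJson are hypothetical names chosen for illustration, and the path data/employees.json is assumed to exist.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonToRddSteps {

    // Hypothetical helper: reads a JSON file and returns its rows as an RDD.
    static JavaRDD<Row> readJson(SparkSession spark, String jsonPath) {
        DataFrameReader reader = spark.read();         // step 2: get the DataFrameReader
        Dataset<Row> dataset = reader.json(jsonPath);  // step 3: JSON to Dataset<Row>
        return dataset.toJavaRDD();                    // step 4: Dataset<Row> to JavaRDD<Row>
    }

    public static void main(String[] args) {
        // step 1: create the SparkSession
        SparkSession spark = SparkSession.builder()
                .appName("Spark Example - Read JSON to RDD")
                .master("local[2]")
                .getOrCreate();

        // assumed input path, one JSON object per line
        JavaRDD<Row> rows = readJson(spark, "data/employees.json");
        System.out.println("Rows read: " + rows.count());

        spark.stop();
    }
}
```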
Example : Spark – Read JSON file to RDD
The following Java program reads a JSON file to a Spark RDD and prints its contents.
employees.json
{"name":"Michael", "salary":3000}
{"name":"Andy", "salary":4500}
{"name":"Justin", "salary":3500}
{"name":"Berta", "salary":4000}
{"name":"Raju", "salary":3000}
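Note that employees.json holds one complete JSON object per line, which is the layout DataFrameReader.json() expects by default. If the same records were stored as a single JSON array spread over several lines, the reader's multiLine option would likely be needed. The following is a sketch under that assumption (Spark 2.2 or later); the class name MultiLineJsonToRDD, the helper readMultiLine and the file data/employees_array.json are hypothetical.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MultiLineJsonToRDD {

    // Hypothetical helper: reads a JSON document that spans several lines,
    // for example a top-level array of objects, into an RDD of rows.
    static JavaRDD<Row> readMultiLine(SparkSession spark, String jsonPath) {
        return spark.read()
                .option("multiLine", true) // parse each file as one JSON document
                .json(jsonPath)
                .toJavaRDD();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Spark Example - Read multi-line JSON to RDD")
                .master("local[2]")
                .getOrCreate();

        // assumed input: a JSON array of employee objects
        JavaRDD<Row> items = readMultiLine(spark, "data/employees_array.json");
        System.out.println("Rows read: " + items.count());

        spark.stop();
    }
}
```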
JSONtoRDD.java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JSONtoRDD {

    public static void main(String[] args) {
        // configure spark
        SparkSession spark = SparkSession
                .builder()
                .appName("Spark Example - Read JSON to RDD")
                .master("local[2]")
                .getOrCreate();

        // read JSON file to RDD
        String jsonPath = "data/employees.json";
        JavaRDD<Row> items = spark.read().json(jsonPath).toJavaRDD();

        // print each row of the RDD
        items.foreach(item -> {
            System.out.println(item);
        });

        // release the resources held by the session
        spark.stop();
    }
}
Output
[Michael,3000]
[Andy,4500]
[Justin,3500]
[Berta,4000]
[Raju,3000]
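Each element printed above is a Row, so individual fields can also be read by name instead of printing the whole row. The following is a minimal sketch, assuming the employees.json shown earlier and assuming Spark infers the salary column as a long (its default for JSON integers); the class name EmployeeFields and the helpers describe and describeAll are hypothetical.

```java
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class EmployeeFields {

    // Hypothetical helper: formats one Row as "<name> earns <salary>".
    static String describe(Row row) {
        String name = row.getAs("name");
        // assumes salary was inferred as a long column
        long salary = row.getLong(row.fieldIndex("salary"));
        return name + " earns " + salary;
    }

    // Hypothetical helper: reads the file and describes every employee.
    static List<String> describeAll(SparkSession spark, String jsonPath) {
        JavaRDD<Row> items = spark.read().json(jsonPath).toJavaRDD();
        return items.map(EmployeeFields::describe).collect();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("Spark Example - Read JSON to RDD")
                .master("local[2]")
                .getOrCreate();

        describeAll(spark, "data/employees.json").forEach(System.out::println);

        spark.stop();
    }
}
```

Here collect() brings the mapped strings back to the driver, which is reasonable for a small file like this one but may not suit large datasets.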
Conclusion
In this Spark Tutorial, we have learnt how to read a JSON file to a Spark RDD with the help of an example Java program.