Spark – Read JSON file to RDD

JSON has become one of the most common data formats exchanged between applications and between nodes on the internet.

In this tutorial, we shall learn how to read a JSON file to an RDD with the help of SparkSession, DataFrameReader and Dataset<Row>.toJavaRDD().

Steps to Read JSON file to Spark RDD

To read a JSON file to a Spark RDD:

  1. Create a SparkSession.
    SparkSession spark = SparkSession
    		.builder()
    		.appName("Spark Example - Read JSON to RDD")
    		.master("local[2]")
    		.getOrCreate();
  2. Get the DataFrameReader of the SparkSession.
    spark.read()
  3. Use DataFrameReader.json(String jsonFilePath) to read the contents of the JSON file into a Dataset<Row>.
    spark.read().json(jsonPath)
  4. Use Dataset<Row>.toJavaRDD() to convert the Dataset<Row> to a JavaRDD<Row>.
    spark.read().json(jsonPath).toJavaRDD()
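
The steps above chain into a single expression, but it can help to see each intermediate object in its own variable. The following is a minimal sketch, assuming the spark-sql dependency is on the classpath. The class name JsonToRddSteps and the helper readJsonToRdd are hypothetical names used only for illustration.

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.DataFrameReader;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for this sketch.
public class JsonToRddSteps {

    // Hypothetical helper: performs steps 2 to 4 one at a time.
    static JavaRDD<Row> readJsonToRdd(SparkSession spark, String jsonPath) {
        // Step 2: get the DataFrameReader of the session
        DataFrameReader reader = spark.read();

        // Step 3: read the JSON file into a Dataset<Row>
        Dataset<Row> dataset = reader.json(jsonPath);

        // Optional: print the schema that Spark inferred from the file
        dataset.printSchema();

        // Step 4: convert the Dataset<Row> to a JavaRDD<Row>
        return dataset.toJavaRDD();
    }

    public static void main(String[] args) {
        // Step 1: create the SparkSession
        SparkSession spark = SparkSession
                .builder()
                .appName("Spark Example - Read JSON to RDD")
                .master("local[2]")
                .getOrCreate();

        // The path is passed as the first argument, or defaults to an assumed location
        String jsonPath = args.length > 0 ? args[0] : "data/employees.json";
        JavaRDD<Row> rows = readJsonToRdd(spark, jsonPath);
        System.out.println("Number of rows: " + rows.count());

        spark.stop();
    }
}
```

Holding the Dataset<Row> in a variable also makes it easy to inspect the inferred schema before converting to an RDD.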

Example : Spark – Read JSON file to RDD

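Following is a Java program that reads a JSON file to a Spark RDD and prints its contents. The input file, employees.json, holds one JSON object per line, which is the layout DataFrameReader.json() expects by default.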

employees.json

{"name":"Michael", "salary":3000}
{"name":"Andy", "salary":4500}
{"name":"Justin", "salary":3500}
{"name":"Berta", "salary":4000}
{"name":"Raju", "salary":3000}

JSONtoRDD.java

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JSONtoRDD {
	public static void main(String[] args) {
		// configure spark
		SparkSession spark = SparkSession
				.builder()
				.appName("Spark Example - Read JSON to RDD")
				.master("local[2]")
				.getOrCreate();

		// read the JSON file into an RDD of Rows
		String jsonPath = "data/employees.json";
		JavaRDD<Row> items = spark.read().json(jsonPath).toJavaRDD();

		// print each Row
		items.foreach(item -> {
			System.out.println(item);
		});

		// release the resources held by the session
		spark.stop();
	}
}

Output

[Michael,3000]
[Andy,4500]
[Justin,3500]
[Berta,4000]
[Raju,3000]
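
Each line of the output is one Row, and its fields can also be read individually by name. The following is a rough sketch rather than part of the original program. RowFieldsSketch and describe are hypothetical names, and it assumes the same employees.json file, in which Spark infers name as a string and salary as a long.

```java
import java.util.List;

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Hypothetical class name, used only for this sketch.
public class RowFieldsSketch {

    // Hypothetical helper: turns each Row into a readable line
    // by looking up its fields by name.
    static List<String> describe(JavaRDD<Row> rows) {
        return rows
                .map(row -> {
                    String name = row.getString(row.fieldIndex("name"));
                    // whole-number JSON values are inferred as long
                    long salary = row.getLong(row.fieldIndex("salary"));
                    return name + " earns " + salary;
                })
                .collect(); // bring the results back to the driver
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession
                .builder()
                .appName("Spark Example - Read JSON to RDD")
                .master("local[2]")
                .getOrCreate();

        JavaRDD<Row> items = spark.read().json("data/employees.json").toJavaRDD();
        describe(items).forEach(System.out::println);

        spark.stop();
    }
}
```

Looking fields up with fieldIndex avoids depending on column position, which follows the schema Spark inferred and not necessarily the key order in the file.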

Conclusion

In this Spark tutorial, we have learnt to read a JSON file to a Spark RDD with the help of an example Java program.