Spark – Write Dataset to JSON file

The Dataset class provides an interface for saving the content of a non-streaming Dataset to external storage, and JSON is one of the many formats it supports. In this tutorial, we shall learn to write a Dataset to a JSON file.

Steps to Write Dataset to JSON file in Spark

To write a Spark Dataset to a JSON file:

  1. Call the write() method on the Dataset. It returns a DataFrameWriter, which can write the data in many formats (JSON, CSV, Parquet, ORC, text and others).
    Dataset.write()
  2. Call json() on the writer and pass the path of the folder to create. Spark writes the rows of the Dataset into that folder as one or more JSON files, with one JSON object per line.
    Dataset.write().json(pathToJSONout)

Example – Spark – Write Dataset to JSON file

In the following Java example, we shall read some data into a Dataset and write the Dataset as JSON into the folder specified by the path.

WriteDataSetToJSON.java

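import java.io.Serializable;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class WriteDataSetToJSON {
	// Encoders.bean() maps columns to JavaBean properties, so the class
	// needs a public no-arg constructor plus getters and setters
	public static class Employee implements Serializable {
		private String name;
		// Spark's JSON reader infers whole numbers as long (bigint);
		// an int field would be rejected by .as() as a narrowing cast
		private long salary;

		public String getName() { return name; }
		public void setName(String name) { this.name = name; }
		public long getSalary() { return salary; }
		public void setSalary(long salary) { this.salary = salary; }
	}

	public static void main(String[] args) {
		// configure spark to run locally with 2 threads
		SparkSession spark = SparkSession
				.builder()
				.appName("Spark Example - Write Dataset to JSON File")
				.master("local[2]")
				.getOrCreate();

		// read the input JSON (one object per line) into a typed Dataset
		Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
		String jsonPath = "data/employees.json";
		Dataset<Employee> ds = spark.read().json(jsonPath).as(employeeEncoder);

		// write dataset to JSON files in the given folder;
		// by default this fails if the folder already exists (SaveMode.ErrorIfExists)
		ds.write().json("data/out_employees/");

		spark.stop();
	}
}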

Output

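A folder data/out_employees/ is created. It contains the data as one or more part-*.json files (the number depends on the partitions of the Dataset), with one JSON object per line, and an empty _SUCCESS file that Spark writes only when the job completes successfully. If the job fails, the _SUCCESS file is absent.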
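To check the result without starting Spark, the output folder can be inspected with plain Java. The sketch below is a minimal illustration that assumes Spark's usual layout described above: data files named part-*.json holding one JSON record per line, plus an empty _SUCCESS marker. The class OutputFolderCheck and its methods are hypothetical names chosen for this example, not part of Spark.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: inspects a folder written by Dataset.write().json(...)
public class OutputFolderCheck {

	// True if the _SUCCESS marker exists, i.e. the write job committed
	public static boolean succeeded(Path outDir) {
		return Files.isRegularFile(outDir.resolve("_SUCCESS"));
	}

	// Collects every non-blank line of every part-*.json file;
	// each such line is expected to be one JSON record (JSON Lines).
	// Hidden checksum files such as .part-...json.crc do not match the glob.
	public static List<String> readRecords(Path outDir) throws IOException {
		List<String> records = new ArrayList<>();
		try (DirectoryStream<Path> parts = Files.newDirectoryStream(outDir, "part-*.json")) {
			for (Path part : parts) {
				for (String line : Files.readAllLines(part)) {
					if (!line.trim().isEmpty()) {
						records.add(line);
					}
				}
			}
		}
		return records;
	}

	public static void main(String[] args) throws IOException {
		// default to the folder used in the example above
		Path outDir = Paths.get(args.length > 0 ? args[0] : "data/out_employees");
		System.out.println("Job succeeded: " + succeeded(outDir));
		for (String record : readRecords(outDir)) {
			System.out.println(record);
		}
	}
}
```

The order in which part files are listed is not guaranteed, so records from different part files may print in any order.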


Conclusion

In this Spark tutorial on writing a Dataset to a JSON file, we have learnt to use the write() method of the Dataset class and export the data to JSON files using the json() method.