Spark – Write Dataset to JSON file
The Dataset class provides an interface for saving the content of a non-streaming Dataset to external storage. JSON is one of the many formats it supports. In this tutorial, we shall learn to write a Dataset to a JSON file.
Steps to Write Dataset to JSON file in Spark
To write a Spark Dataset to a JSON file:
- Apply the write() method to the Dataset. It returns a DataFrameWriter, which supports many output formats.
  Dataset.write()
- Call json() on the writer and provide the path to the folder where the JSON output has to be created with data from the Dataset.
  Dataset.write().json(pathToJSONout)
Example – Spark – Write Dataset to JSON file
In the following Java example, we shall read some data into a Dataset and write the Dataset as JSON to the folder specified by the path.
WriteDataSetToJSON.java
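import java.io.Serializable;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoder;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

public class WriteDataSetToJSON {

    // Encoders.bean() discovers columns through JavaBean getters and setters,
    // so the fields are private and exposed through accessors.
    public static class Employee implements Serializable {
        private String name;
        // long, because Spark infers JSON integers as bigint
        private long salary;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public long getSalary() { return salary; }
        public void setSalary(long salary) { this.salary = salary; }
    }

    public static void main(String[] args) {
        // configure spark
        SparkSession spark = SparkSession
                .builder()
                .appName("Spark Example - Write Dataset to JSON File")
                .master("local[2]")
                .getOrCreate();

        // read the input JSON and map each row to an Employee
        Encoder<Employee> employeeEncoder = Encoders.bean(Employee.class);
        String jsonPath = "data/employees.json";
        Dataset<Employee> ds = spark.read().json(jsonPath).as(employeeEncoder);

        // write dataset to JSON files in the given folder
        ds.write().json("data/out_employees/");

        // release Spark resources
        spark.stop();
    }
}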
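The program above reads data/employees.json, which this tutorial does not show. By default, spark.read().json() expects one JSON object per line, so a small input file could be generated with plain Java as in the following sketch. The class name SampleEmployeesFile, the helper toJsonLine, and the two records are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: writes a tiny line-delimited JSON input file for the example.
public class SampleEmployeesFile {

    // Build one JSON object on a single line.
    // Assumption: the name contains no characters that need JSON escaping.
    public static String toJsonLine(String name, long salary) {
        return "{\"name\":\"" + name + "\",\"salary\":" + salary + "}";
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample records, not data from the tutorial.
        List<String> lines = Arrays.asList(
                toJsonLine("Alice", 5000L),
                toJsonLine("Bob", 4200L));

        // Same relative path the Spark example reads from.
        Path input = Paths.get("data", "employees.json");
        Files.createDirectories(input.getParent());
        Files.write(input, lines, StandardCharsets.UTF_8);
    }
}
```

Run it once from the project root before the Spark program so that the relative path data/employees.json resolves to the same file.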
Output
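A folder data/out_employees/ is created. It contains one or more part-*.json files, each holding one JSON object per line, and an empty _SUCCESS marker file if the write completed successfully. With the default save mode, the write fails if the folder already exists.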
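To check the result programmatically, the output folder can be listed with plain Java. The sketch below assumes the usual layout of a successful, uncompressed batch write: data files whose names start with part- and end with .json, plus a _SUCCESS marker. The class OutputFolderCheck and its helper methods are illustrative names and not part of Spark.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Illustrative helper for inspecting the folder written by ds.write().json(...).
public class OutputFolderCheck {

    // True if the listing contains the _SUCCESS marker file.
    public static boolean hasSuccessMarker(List<String> fileNames) {
        return fileNames.contains("_SUCCESS");
    }

    // Keep only the data files; hidden checksum files (names starting with ".") are skipped.
    public static List<String> partFiles(List<String> fileNames) {
        return fileNames.stream()
                .filter(n -> n.startsWith("part-") && n.endsWith(".json"))
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Same output folder as in the Spark example.
        Path out = Paths.get("data", "out_employees");

        List<String> names;
        try (Stream<Path> entries = Files.list(out)) {
            names = entries
                    .map(p -> p.getFileName().toString())
                    .collect(Collectors.toList());
        }

        System.out.println("Write succeeded: " + hasSuccessMarker(names));
        System.out.println("Data files: " + partFiles(names));
    }
}
```

The filtering logic is kept separate from the directory listing so that it can be exercised without running a Spark job first.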
Conclusion
In this Spark Tutorial – Write Dataset to JSON file, we have learnt to use the write() method of the Dataset class and export the data to JSON files using the json() method.