Spark – Create RDD
To create an RDD in Apache Spark, some of the possible ways are:
- Create RDD from List<T> using Spark Parallelize.
- Create RDD from a text file.
- Create RDD from a JSON file.

In this tutorial, we will go through examples covering each of the processes mentioned above.
Example – Create RDD from List<T>
In this example, we will take a List of strings and create a Spark RDD from it.
RDDfromList.java
import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
public class RDDfromList {
	public static void main(String[] args) {
		// configure spark
		SparkConf sparkConf = new SparkConf().setAppName("Create RDD from List Example")
				.setMaster("local[2]").set("spark.executor.memory","2g");
		// start a spark context
		JavaSparkContext sc = new JavaSparkContext(sparkConf);
		// read list to RDD
		List<String> data = Arrays.asList("Learn","Apache","Spark","with","Tutorial Kart"); 
		JavaRDD<String> items = sc.parallelize(data,1);
		// apply a function for each element of RDD
		items.foreach(item -> {
			System.out.println("* "+item); 
		});
	}
}
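The second argument to parallelize controls how many partitions the resulting RDD is split into, which bounds the parallelism of later operations on it. A minimal sketch of checking this with getNumPartitions (the class name ParallelizePartitions and the helper method are our own, not part of the example above):

```java
import java.util.Arrays;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ParallelizePartitions {
	// create an RDD from a small list with the requested number of slices
	// and return how many partitions Spark actually assigned
	public static int partitionCount(int numSlices) {
		SparkConf conf = new SparkConf().setAppName("Parallelize Partitions")
				.setMaster("local[2]");
		JavaSparkContext sc = new JavaSparkContext(conf);
		JavaRDD<String> rdd = sc.parallelize(
				Arrays.asList("Learn", "Apache", "Spark"), numSlices);
		int n = rdd.getNumPartitions();
		sc.stop();
		return n;
	}

	public static void main(String[] args) {
		System.out.println("Partitions: " + partitionCount(3));
	}
}
```

With numSlices set to 1, as in the example above, all elements land in a single partition and foreach prints them in list order.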
Example – Create RDD from Text file
In this example, we have data in a text file and will create an RDD from it.
ReadTextToRDD.java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
public class ReadTextToRDD {
	public static void main(String[] args) {
		// configure spark
		SparkConf sparkConf = new SparkConf().setAppName("Read Text to RDD")
										.setMaster("local[2]").set("spark.executor.memory","2g");
		// start a spark context
		JavaSparkContext sc = new JavaSparkContext(sparkConf);
		
		// provide path to input text file
		String path = "data/rdd/input/sample.txt";
		
		// read text file to RDD
		JavaRDD<String> lines = sc.textFile(path);
		
		// collect RDD for printing
		for(String line:lines.collect()){
			System.out.println(line);
		}
	}
}
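Note that collect() pulls the entire RDD back to the driver, which is fine for a small sample file but can exhaust driver memory on large inputs. A sketch of computing summary numbers on the cluster instead, using count and a map/reduce over line lengths (the class name and helper are ours; it assumes a non-empty text file, since reduce on an empty RDD throws):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class TextFileStats {
	// return {number of lines, total characters} for a text file,
	// without collecting the file contents to the driver
	public static long[] stats(String path) {
		SparkConf conf = new SparkConf().setAppName("Text File Stats")
				.setMaster("local[2]");
		JavaSparkContext sc = new JavaSparkContext(conf);
		JavaRDD<String> lines = sc.textFile(path);
		long numLines = lines.count();
		// map each line to its length, then sum the lengths with reduce
		long numChars = lines.map(String::length)
				.reduce(Integer::sum);
		sc.stop();
		return new long[]{numLines, numChars};
	}

	public static void main(String[] args) {
		long[] s = stats("data/rdd/input/sample.txt");
		System.out.println(s[0] + " lines, " + s[1] + " characters");
	}
}
```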
Example – Create RDD from JSON file
In this example, we will create an RDD from a JSON file.
JSONtoRDD.java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
public class JSONtoRDD {
	public static void main(String[] args) {
		// configure spark
		SparkSession spark = SparkSession
				.builder()
				.appName("Spark Example - Read JSON to RDD")
				.master("local[2]")
				.getOrCreate();
		// read JSON file to RDD
		String jsonPath = "data/employees.json";
		JavaRDD<Row> items = spark.read().json(jsonPath).toJavaRDD();
		items.foreach(item -> {
			System.out.println(item); 
		});
	}
}
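By default, spark.read().json() expects JSON Lines input: one complete JSON object per line. If your file is a pretty-printed object or a top-level JSON array spanning multiple lines, enable the multiLine option. A sketch (the class name, helper method, and file path are our own illustrations, not from the example above):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MultiLineJSONtoRDD {
	// read a multi-line (pretty-printed or array-style) JSON file
	// and return the number of rows Spark parsed from it
	public static long countRows(String path) {
		SparkSession spark = SparkSession
				.builder()
				.appName("Read multi-line JSON to RDD")
				.master("local[2]")
				.getOrCreate();
		// multiLine lets the reader parse a file that is not
		// one-JSON-object-per-line
		Dataset<Row> df = spark.read()
				.option("multiLine", true)
				.json(path);
		long n = df.toJavaRDD().count();
		spark.stop();
		return n;
	}

	public static void main(String[] args) {
		// hypothetical path, for illustration only
		System.out.println("Rows: " + countRows("data/employees_multiline.json"));
	}
}
```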
Conclusion
In this Spark tutorial, we have learned to create a Spark RDD from a List, and from text and JSON files on the file system, with the help of example programs.
