Spark RDD foreach
Spark RDD foreach is used to apply a function to each element of an RDD. In this tutorial, we shall learn the usage of the RDD.foreach() method with example Spark applications.
Syntax of RDD foreach
public void foreach(scala.Function1<T,scala.runtime.BoxedUnit> f)
The argument can be a lambda expression, or you can use the org.apache.spark.api.java.function.VoidFunction functional interface as the assignment target for a lambda expression or method reference.
The foreach() method does not modify the contents of the RDD. Also note that the function runs on the executors: in a cluster deployment, anything printed inside it appears in the executor logs rather than on the driver console (with a local master, as in the example below, the output does appear on the console).
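How a functional interface serves as the target of a lambda can be sketched without a Spark cluster. The snippet below defines a minimal stand-in for Spark's VoidFunction (the real interface lives in org.apache.spark.api.java.function and declares a checked exception) and a hypothetical forEachElement helper that mimics what foreach does with it; both names are illustrative, not Spark API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VoidFunctionSketch {

    // Minimal stand-in for Spark's org.apache.spark.api.java.function.VoidFunction;
    // the real interface declares "void call(T t) throws Exception".
    public interface VoidFunction<T> {
        void call(T t);
    }

    // Hypothetical helper mimicking what RDD.foreach does, locally:
    // apply the function to each element, leaving the data unchanged.
    public static <T> void forEachElement(List<T> data, VoidFunction<T> f) {
        for (T item : data) {
            f.call(item);
        }
    }

    public static void main(String[] args) {
        List<String> data = Arrays.asList("Learn", "Apache", "Spark");
        List<String> seen = new ArrayList<>();

        // A lambda expression is a valid target for the functional interface
        forEachElement(data, item -> seen.add(item));

        // Equivalent explicit implementation of the same interface
        forEachElement(data, new VoidFunction<String>() {
            @Override
            public void call(String item) {
                System.out.println("* " + item);
            }
        });

        // The source list itself is not modified
        System.out.println(data);
    }
}
```

Either form can be passed wherever the functional interface is expected; the lambda is just shorter syntax for the anonymous class.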
Example – Spark RDD foreach
In this example, we will take an RDD with strings as elements. We shall use RDD.foreach() on this RDD, and for each item in the RDD, we shall print the item.
RDDforEach.java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RDDforEach {

    public static void main(String[] args) {
        // configure spark
        SparkConf sparkConf = new SparkConf().setAppName("Spark RDD foreach Example")
                .setMaster("local[2]").set("spark.executor.memory", "2g");

        // start a spark context
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // read list to RDD
        List<String> data = Arrays.asList("Learn", "Apache", "Spark", "with", "Tutorial Kart");
        JavaRDD<String> items = sc.parallelize(data, 1);

        // apply a function for each element of RDD
        items.foreach(item -> {
            System.out.println("* " + item);
        });

        // close the spark context
        sc.close();
    }
}
Output
* Learn
* Apache
* Spark
* with
* Tutorial Kart
Conclusion
In this Spark Tutorial – RDD foreach, we have learnt to apply a function to each of the elements of an RDD using the RDD.foreach() method.