Spark RDD foreach
Spark RDD foreach is used to apply a function to each element of an RDD. In this tutorial, we shall learn the usage of the RDD.foreach() method with example Spark applications.
Syntax of RDD foreach
public void foreach(scala.Function1<T,scala.runtime.BoxedUnit> f)
In the Java API, the argument can be a lambda expression or a method reference, because the org.apache.spark.api.java.function.VoidFunction<T> functional interface serves as the assignment target for either; an explicit implementation of VoidFunction<T> works as well.
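As a sketch, the same foreach() call can also be written with an explicit VoidFunction<T> instead of an inline lambda. The class name ForeachVoidFunction and the sample data below are illustrative, not part of the Spark API.

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class ForeachVoidFunction {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("foreach with VoidFunction")
                .setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> words = sc.parallelize(Arrays.asList("Learn", "Apache", "Spark"));

        // An explicit VoidFunction<T>: call() returns void and exists only for side effects
        VoidFunction<String> printer = new VoidFunction<String>() {
            @Override
            public void call(String s) {
                System.out.println("* " + s);
            }
        };

        words.foreach(printer);

        sc.close();
    }
}
```

Both styles compile to the same thing; the lambda form is shorter, while the explicit form makes the VoidFunction contract visible.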
foreach() is an action: it runs the given function on each element for its side effects and does not modify the contents of the RDD. Note that when running on a cluster, anything printed inside foreach() appears in the executors' stdout, not on the driver console.
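Since foreach() returns nothing to the driver, the supported way to gather a result from it is an accumulator. The following sketch (class name ForeachAccumulator and helper sumWithForeach are illustrative) sums an RDD of integers through foreach() and then collects the RDD to show its contents are unchanged.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.util.LongAccumulator;

public class ForeachAccumulator {

    // Sums the elements via foreach() and a LongAccumulator; returns the total.
    static long sumWithForeach(JavaSparkContext sc, JavaRDD<Integer> rdd) {
        LongAccumulator sum = sc.sc().longAccumulator("sum");
        rdd.foreach(n -> sum.add(n));   // side effect only; the RDD is untouched
        return sum.value();
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("foreach accumulator")
                .setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4));

        System.out.println("sum = " + sumWithForeach(sc, numbers));

        // collect() shows foreach() left the RDD contents as they were
        List<Integer> after = numbers.collect();
        System.out.println("elements = " + after);

        sc.close();
    }
}
```

Plain local variables would not work here: closures are serialized to the executors, so mutating a driver-side variable inside foreach() has no visible effect on the driver. Accumulators exist precisely for this pattern.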
Example – Spark RDD foreach
In this example, we take an RDD with strings as its elements, call RDD.foreach() on it, and print each item.
RDDforEach.java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class RDDforEach {
    public static void main(String[] args) {
        // configure spark
        SparkConf sparkConf = new SparkConf().setAppName("Spark RDD foreach Example")
                .setMaster("local[2]").set("spark.executor.memory", "2g");

        // start a spark context
        JavaSparkContext sc = new JavaSparkContext(sparkConf);

        // read list to RDD
        List<String> data = Arrays.asList("Learn", "Apache", "Spark", "with", "Tutorial Kart");
        JavaRDD<String> items = sc.parallelize(data, 1);

        // apply a function for each element of RDD
        items.foreach(item -> System.out.println("* " + item));

        sc.close();
    }
}
Output
* Learn
* Apache
* Spark
* with
* Tutorial Kart
Conclusion
In this Spark Tutorial – RDD foreach, we have learnt to apply a function to each element of an RDD using the RDD.foreach() method.