Spark Transformation Operators -> union



union merges two datasets. The two datasets must have the same element type. The RDD produced by union has a partition count equal to the sum of the parent RDDs' partition counts.

Java implementation

package transformations;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.util.Arrays;

/**
 * @Author yqq
 * @Date 2021/12/09 17:10
 * @Version 1.0
 */
public class UnionTest {
    public static void main(String[] args) {
        JavaSparkContext context = new JavaSparkContext(
                new SparkConf()
                        .setAppName("union")
                        .setMaster("local")
        );
        context.setLogLevel("ERROR");

        // Two source RDDs with explicit partition counts: 2 and 3.
        JavaRDD<String> rdd = context.parallelize(Arrays.asList("a", "b", "c", "e", "f"), 2);
        JavaRDD<String> rdd1 = context.parallelize(Arrays.asList("a", "b", "f", "h", "g"), 3);

        // union concatenates the two RDDs; the result has 2 + 3 = 5 partitions.
        JavaRDD<String> union = rdd.union(rdd1);
        System.out.println("rdd partition length = " + rdd.getNumPartitions());
        System.out.println("rdd1 partition length = " + rdd1.getNumPartitions());
        System.out.println("union partition length = " + union.getNumPartitions());
        union.foreach(e -> System.out.print(e + "\t"));
    }
}
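Run locally, the program confirms the partition-sum rule: rdd has 2 partitions, rdd1 has 3, so the union reports 5. Note that RDD union keeps duplicates (unlike SQL's UNION): "a", "b", and "f" appear in both inputs and are therefore printed twice. If set semantics are needed, distinct() can be chained afterwards. A minimal sketch reusing the RDDs above (distinct, unlike union, triggers a shuffle):

// Deduplicate the merged result; distinct() shuffles the data.
JavaRDD<String> deduped = rdd.union(rdd1).distinct();
deduped.foreach(e -> System.out.print(e + "\t"));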

Scala implementation

package transformation

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/**
 * @Author yqq
 * @Date 2021/12/09 17:33
 * @Version 1.0
 */
object UnionTest {
  def main(args: Array[String]): Unit = {
    val context = new SparkContext(
      new SparkConf()
        .setMaster("local")
        .setAppName("union")
    )
    context.setLogLevel("ERROR")

    // makeRDD and parallelize are equivalent here; both create an RDD from a local collection.
    val rdd: RDD[String] = context.makeRDD(Array[String]("a", "b", "c", "d", "e"))
    val rdd1: RDD[String] = context.parallelize(Array[String]("a", "b", "f", "g", "h"))

    // union concatenates the two RDDs without a shuffle.
    val value: RDD[String] = rdd.union(rdd1)
    value.foreach(print)
  }
}
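The Scala version behaves the same way: the merged RDD contains all ten elements, including the duplicated "a" and "b". Internally, union is a narrow transformation: in the common case Spark builds a UnionRDD whose partitions are simply the parent partitions laid end to end, which is why no shuffle occurs and why the partition counts add up.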

