2022-11-26
How to write MapReduce programs in a Maven project, and how to manage several MapReduce programs in one Maven project
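When we create an ordinary MapReduce project and need a utility class while coding, we have to track down the matching JAR ourselves and import it into the project by hand. That is quite inconvenient, so I recommend using a Maven project instead.
Without further ado, let's get started.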
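First, register your locally installed Maven in Eclipse.
Select the path of your local Maven installation.
Tick the Maven installation you just added.
Add the settings file (settings.xml) of your local Maven installation.
Next, create a Maven project.
You can see that the Maven project was created successfully!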
Now let's configure pom.xml so that Maven pulls in the JARs a MapReduce program needs in order to run.
Here is my project file for reference.
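The project file itself is not reproduced in this text. As a rough sketch only, a dependency section along the following lines is usually enough for the org.apache.hadoop imports used in this article. The choice of the hadoop-client artifact and the version number 2.7.7 are my assumptions, not taken from the original project; the version should match the Hadoop version of your own cluster.

```xml
<!-- Sketch of the dependency part of pom.xml; not the author's actual file. -->
<properties>
  <!-- Assumption: replace with the Hadoop version your cluster runs. -->
  <hadoop.version>2.7.7</hadoop.version>
</properties>

<dependencies>
  <!-- Assumption: hadoop-client brings in the HDFS and MapReduce client classes. -->
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```

With a declaration like this, Maven downloads the JARs and their transitive dependencies, so nothing has to be copied into the project by hand.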
Let's try it out with the classic WordCount example.
I won't go over how to create a new class; here is the code directly.
package com.gong.fusion.Alert;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper extends Mapper
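The listing breaks off at the Mapper declaration, so the map and reduce bodies are not visible here. To show the idea they implement, here is a minimal plain-Java sketch that imitates the two steps without any Hadoop classes: the "map" step splits each line into tokens with StringTokenizer (which the listing imports), and the "reduce" step sums one per occurrence of each word. The class name WordCountSketch, its methods, and the sample input are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Plain-Java imitation of the WordCount map and reduce steps (no Hadoop involved). */
final class WordCountSketch {

    /** "Map" step: split one input line on whitespace and emit each token. */
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            tokens.add(itr.nextToken());
        }
        return tokens;
    }

    /** "Reduce" step: add 1 per occurrence, i.e. sum the (word, 1) pairs per word. */
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys sorted
        for (String line : lines) {
            for (String word : tokenize(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the file used in the article.
        List<String> lines = List.of("hello hadoop", "hello maven");
        countWords(lines).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the sample input this prints hadoop 1, hello 2 and maven 1, one tab-separated pair per line. In the real job the same summing happens on the cluster, with the framework grouping the emitted pairs by key between the two steps.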
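My Eclipse is already connected to the HDFS of my big data cluster.
Remember to add this file.
Let's run the code.
It ran successfully!
Let's check the result on HDFS.
So we have now run a MapReduce program inside a Maven project.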
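Next, let's look at how to manage multiple MapReduce programs.
We create a MyDriver class to manage the MapReduce programs, plus a second MapReduce program class, WordMean.
WordMean has the same content as WordCount; I only changed the name and the output path.
Of course this would not happen in real development; I did it only to make testing easier.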
package com.gong.fusion.Alert;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.gong.fusion.Alert.WordCount.IntSumReducer;
import com.gong.fusion.Alert.WordCount.TokenizerMapper;

public class WordMean {

    public static class TokenizerMapper extends Mapper
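Since the WordMean class in this article is only a renamed copy of WordCount, it does not actually compute a mean. For readers curious what a real version would need, here is a small plain-Java sketch of one common approach: accumulate the number of words and their total length, then divide at the end. It uses no Hadoop classes, and the name WordMeanSketch and its method are hypothetical.

```java
import java.util.List;
import java.util.StringTokenizer;

/** Plain-Java sketch of an average-word-length computation (no Hadoop involved). */
final class WordMeanSketch {

    /** Returns the mean token length over all lines, or 0.0 when there are no tokens. */
    static double meanWordLength(List<String> lines) {
        long count = 0;        // how many words were seen
        long totalLength = 0;  // sum of the lengths of those words
        for (String line : lines) {
            StringTokenizer itr = new StringTokenizer(line);
            while (itr.hasMoreTokens()) {
                totalLength += itr.nextToken().length();
                count++;
            }
        }
        return count == 0 ? 0.0 : (double) totalLength / count;
    }

    public static void main(String[] args) {
        // Hypothetical sample input: lengths 2 and 4 give a mean of 3.0.
        System.out.println(meanWordLength(List.of("ab abcd")));
    }
}
```

In a MapReduce setting the two running totals would be emitted by the mappers and summed by the reducer, with the division done once both sums are known.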
package com.gong.fusion.Alert;

import org.apache.hadoop.util.ProgramDriver;

public class MyDriver {

    public static void main(String[] argv) {
        int exitCode = -1;
        ProgramDriver pgd = new ProgramDriver();
        try {
            // Register each MapReduce program under the alias used on the command line.
            pgd.addClass("wordcount", WordCount.class,
                    "A map/reduce program that counts the words in the input files.");
            pgd.addClass("wordmean", WordMean.class,
                    "A map/reduce program that counts the average length of the words in the input files.");
            // Run the program whose alias is given as the first argument.
            exitCode = pgd.run(argv);
        } catch (Throwable e) {
            e.printStackTrace();
        }
        System.exit(exitCode);
    }
}
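To make the alias mechanism less of a black box, here is a simplified plain-Java imitation of this kind of dispatch: aliases are stored in a map, the first command-line argument selects a class, and that class's static main method is invoked by reflection with the remaining arguments. This is only a sketch of the idea, not Hadoop's actual ProgramDriver implementation; the class AliasDriverSketch and the two demo programs are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Simplified imitation of alias-based dispatch; not Hadoop's ProgramDriver. */
final class AliasDriverSketch {

    // alias -> class whose static main(String[]) should be run
    private final Map<String, Class<?>> programs = new LinkedHashMap<>();
    // alias -> human-readable description, shown when the alias is unknown
    private final Map<String, String> descriptions = new LinkedHashMap<>();

    void addClass(String alias, Class<?> mainClass, String description) {
        programs.put(alias, mainClass);
        descriptions.put(alias, description);
    }

    /** Returns 0 after running the program named by argv[0], or -1 if the alias is missing or unknown. */
    int run(String[] argv) {
        if (argv.length == 0 || !programs.containsKey(argv[0])) {
            System.out.println("Valid program names are:");
            descriptions.forEach((alias, text) -> System.out.println("  " + alias + ": " + text));
            return -1;
        }
        // Everything after the alias is passed on to the selected program.
        String[] rest = Arrays.copyOfRange(argv, 1, argv.length);
        try {
            Method main = programs.get(argv[0]).getMethod("main", String[].class);
            main.invoke(null, (Object) rest);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not run " + argv[0], e);
        }
        return 0;
    }

    /** Hypothetical stand-in for a real job class. */
    static final class DemoWordCount {
        public static void main(String[] args) {
            System.out.println("wordcount called with " + Arrays.toString(args));
        }
    }

    /** Hypothetical stand-in for a second job class. */
    static final class DemoWordMean {
        public static void main(String[] args) {
            System.out.println("wordmean called with " + Arrays.toString(args));
        }
    }

    public static void main(String[] argv) {
        AliasDriverSketch driver = new AliasDriverSketch();
        driver.addClass("wordcount", DemoWordCount.class, "counts words");
        driver.addClass("wordmean", DemoWordMean.class, "averages word lengths");
        // Fall back to a demo alias when no arguments are given.
        int code = driver.run(argv.length > 0 ? argv : new String[] {"wordcount"});
        System.out.println("exit code " + code);
    }
}
```

The design point is that one JAR needs only a single entry class, and adding another program is one more addClass call rather than a separate build.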
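Now the MyDriver class manages both MapReduce programs.
Let's package the program with Maven and run it on the big data cluster.
Open a cmd window on your computer, change to your project directory, and run mvn clean to clear out the old build output.
Then package the project with mvn package.
Packaging succeeded!
The package normally ends up in the target directory.
Upload the package to the big data cluster. I won't go into how; use an upload tool or the rz command.
Let's run it on the cluster.
We only need to append the alias of one of the MapReduce classes after the JAR name; the aliases are defined in the MyDriver class.
You can see that the two MapReduce programs were given different aliases.
Now let's look at the results.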
[hadoop@cdh-master hadoop]$ hadoop jar Alert-0.0.1-SNAPSHOT.jar wordcount
18/08/10 20:07:14 INFO client.RMProxy: Connecting to ResourceManager at cdh-master/192.168.211.13:8032
18/08/10 20:07:18 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/08/10 20:08:02 INFO input.FileInputFormat: Total input paths to process : 1
18/08/10 20:08:03 INFO mapreduce.JobSubmitter: number of splits:1
18/08/10 20:08:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533902197727_0001
18/08/10 20:08:07 INFO impl.YarnClientImpl: Submitted application application_1533902197727_0001
18/08/10 20:08:08 INFO mapreduce.Job: The url to track the job:
18/08/10 20:08:08 INFO mapreduce.Job: Running job: job_1533902197727_0001
18/08/10 20:09:16 INFO mapreduce.Job: Job job_1533902197727_0001 running in uber mode : false
18/08/10 20:09:16 INFO mapreduce.Job:  map 0% reduce 0%
18/08/10 20:11:28 INFO mapreduce.Job:  map 100% reduce 0%
18/08/10 20:11:52 INFO mapreduce.Job:  map 100% reduce 100%
18/08/10 20:11:54 INFO mapreduce.Job: Job job_1533902197727_0001 completed successfully
18/08/10 20:11:54 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=136
        FILE: Number of bytes written=218031
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=204
        HDFS: Number of bytes written=87
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=118978
        Total time spent by all reduces in occupied slots (ms)=20993
        Total time spent by all map tasks (ms)=118978
        Total time spent by all reduce tasks (ms)=20993
        Total vcore-seconds taken by all map tasks=118978
        Total vcore-seconds taken by all reduce tasks=20993
        Total megabyte-seconds taken by all map tasks=121833472
        Total megabyte-seconds taken by all reduce tasks=21496832
    Map-Reduce Framework
        Map input records=7
        Map output records=18
        Map output bytes=163
        Map output materialized bytes=132
        Input split bytes=110
        Combine input records=18
        Combine output records=12
        Reduce input groups=12
        Reduce shuffle bytes=132
        Reduce input records=12
        Reduce output records=12
        Spilled Records=24
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=852
        CPU time spent (ms)=37740
        Physical memory (bytes) snapshot=316510208
        Virtual memory (bytes) snapshot=3017236480
        Total committed heap usage (bytes)=136122368
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=94
    File Output Format Counters
        Bytes Written=87
Now let's run the other MapReduce program.
[hadoop@cdh-master hadoop]$ hadoop jar Alert-0.0.1-SNAPSHOT.jar wordmean
18/08/10 20:13:22 INFO client.RMProxy: Connecting to ResourceManager at cdh-master/192.168.211.13:8032
18/08/10 20:13:24 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
18/08/10 20:13:33 INFO input.FileInputFormat: Total input paths to process : 1
18/08/10 20:13:33 INFO mapreduce.JobSubmitter: number of splits:1
18/08/10 20:13:34 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1533902197727_0002
18/08/10 20:13:35 INFO impl.YarnClientImpl: Submitted application application_1533902197727_0002
18/08/10 20:13:35 INFO mapreduce.Job: The url to track the job:
18/08/10 20:13:35 INFO mapreduce.Job: Running job: job_1533902197727_0002
18/08/10 20:15:22 INFO mapreduce.Job: Job job_1533902197727_0002 running in uber mode : false
18/08/10 20:15:22 INFO mapreduce.Job:  map 0% reduce 0%
18/08/10 20:16:30 INFO mapreduce.Job:  map 100% reduce 0%
18/08/10 20:16:56 INFO mapreduce.Job:  map 100% reduce 100%
18/08/10 20:16:57 INFO mapreduce.Job: Job job_1533902197727_0002 completed successfully
18/08/10 20:16:58 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=136
        FILE: Number of bytes written=218025
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=204
        HDFS: Number of bytes written=87
        HDFS: Number of read operations=6
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=1
        Launched reduce tasks=1
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=65084
        Total time spent by all reduces in occupied slots (ms)=23726
        Total time spent by all map tasks (ms)=65084
        Total time spent by all reduce tasks (ms)=23726
        Total vcore-seconds taken by all map tasks=65084
        Total vcore-seconds taken by all reduce tasks=23726
        Total megabyte-seconds taken by all map tasks=66646016
        Total megabyte-seconds taken by all reduce tasks=24295424
    Map-Reduce Framework
        Map input records=7
        Map output records=18
        Map output bytes=163
        Map output materialized bytes=132
        Input split bytes=110
        Combine input records=18
        Combine output records=12
        Reduce input groups=12
        Reduce shuffle bytes=132
        Reduce input records=12
        Reduce output records=12
        Spilled Records=24
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=493
        CPU time spent (ms)=8170
        Physical memory (bytes) snapshot=312655872
        Virtual memory (bytes) snapshot=3007705088
        Total committed heap usage (bytes)=150081536
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=94
    File Output Format Counters
        Bytes Written=87
[hadoop@cdh-master hadoop]$
In the two different output paths you can see the results of the two programs, each run separately.