A Partitioner groups the data flowing through multiple map and reduce tasks: before the Mapper output reaches the Reducers, each record is assigned to a partition, which determines which reduce task (and therefore which machine) processes that group, so each partition's results are written to a separate output file.
The implementation code follows:
package com.itbuilder.mr;

import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import com.itbuilder.mr.bean.DataBean;

/**
 * Mobile traffic aggregation, partitioned by carrier prefix.
 * @author mrh
 */
public class GRSDataCount {

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration());
        job.setJarByClass(GRSDataCount.class);

        job.setMapperClass(DCMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(DataBean.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));

        job.setReducerClass(DCReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DataBean.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setPartitionerClass(DCPartitioner.class);
        // The partitioner below returns values 0..3, so the job should be
        // run with at least 4 reduce tasks (args[2] >= 4).
        job.setNumReduceTasks(Integer.parseInt(args[2]));

        job.waitForCompletion(true);
    }

    /**
     * Emits (phone number, DataBean) for each input line.
     * @author mrh
     */
    public static class DCMapper extends Mapper<LongWritable, Text, Text, DataBean> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] datas = value.toString().split("\t");
            DataBean dataBean = new DataBean(datas[1],
                    Long.parseLong(datas[8]), Long.parseLong(datas[9]));
            context.write(new Text(dataBean.getTelNo()), dataBean);
        }
    }

    /**
     * Partitioner: routes each phone number by its three-digit prefix.
     * @author mrh
     */
    public static class DCPartitioner extends Partitioner<Text, DataBean> {
        private static Map<String, Integer> providerMap = new HashMap<String, Integer>();
        static {
            providerMap.put("135", 1);
            providerMap.put("136", 1);
            providerMap.put("137", 1);
            providerMap.put("138", 1);
            providerMap.put("139", 1);
            providerMap.put("150", 2);
            providerMap.put("159", 2);
            providerMap.put("180", 3);
            providerMap.put("182", 3);
        }

        @Override
        public int getPartition(Text key, DataBean value, int numPartitions) {
            String code = key.toString();
            Integer partition = providerMap.get(code.substring(0, 3));
            if (partition == null) {
                return 0; // unknown prefixes all go to partition 0
            }
            return partition.intValue();
        }
    }

    /**
     * Sums upload and download traffic per phone number.
     * @author mrh
     */
    public static class DCReducer extends Reducer<Text, DataBean, Text, DataBean> {
        @Override
        protected void reduce(Text key, Iterable<DataBean> beans, Context context)
                throws IOException, InterruptedException {
            long upPayLoad = 0;
            long downPayLoad = 0;
            for (DataBean bean : beans) {
                upPayLoad += bean.getUpload();
                downPayLoad += bean.getDownload();
            }
            DataBean outBean = new DataBean(key.toString(), upPayLoad, downPayLoad);
            context.write(key, outBean);
        }
    }
}
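The core of the partitioner is the prefix lookup: the first three digits of the phone number select the partition, and anything not in the table falls back to partition 0. The following standalone sketch isolates just that lookup so it can be tried without a Hadoop cluster; the class name `PartitionLookupDemo` and the sample phone numbers are made up for illustration, and the prefix table is an abbreviated copy of the one above.

```java
import java.util.HashMap;
import java.util.Map;

// Standalone sketch of the prefix-based partition lookup (no Hadoop needed).
public class PartitionLookupDemo {
    static final Map<String, Integer> PROVIDER_MAP = new HashMap<String, Integer>();
    static {
        // Abbreviated copy of the prefix table from the partitioner above.
        PROVIDER_MAP.put("135", 1);
        PROVIDER_MAP.put("136", 1);
        PROVIDER_MAP.put("150", 2);
        PROVIDER_MAP.put("180", 3);
    }

    // Same logic as getPartition: look up the first three digits,
    // fall back to partition 0 for unknown prefixes.
    static int partitionFor(String telNo) {
        Integer p = PROVIDER_MAP.get(telNo.substring(0, 3));
        return p == null ? 0 : p.intValue();
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("13512345678")); // prefix 135 -> 1
        System.out.println(partitionFor("15098765432")); // prefix 150 -> 2
        System.out.println(partitionFor("17000000000")); // unknown prefix -> 0
    }
}
```

Because every possible return value must name a real reduce task, the number of reducers passed on the command line has to cover the largest partition index (here, at least 4).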