
Packaging and Running a MapReduce Job

news 2025/5/14 5:39:08

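1. Write the MapReduce program

Start by writing the MapReduce program, which usually consists of a Mapper class, a Reducer class, and a Driver class. For example, a simple WordCount program, where the main method of WordCount acts as the driver: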

java

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
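
To make the data flow of the program above easier to follow, here is a minimal plain-Java sketch of the same three steps: map each line to (word, 1) pairs, group the pairs by word, and sum each group. It uses no Hadoop classes; the class name WordCountLocalSketch and its method names are hypothetical, and the in-memory grouping only approximates what the framework's shuffle does across machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java walk-through of the WordCount data flow; no Hadoop classes are used.
class WordCountLocalSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(itr.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle step (in-memory stand-in): group emitted values by key, keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum the values collected for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int val : values) {
            sum += val;
        }
        return sum;
    }

    // Runs map, shuffle and reduce over all input lines and returns word -> count.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello world", "hello hadoop")));
    }
}
```

Because summing is associative and commutative, the same reduce logic can also run as a partial pre-aggregation on the map side, which is why the WordCount driver can register IntSumReducer as the combiner as well as the reducer.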

2. Create a Maven project (recommended)

Use Maven to manage dependencies and packaging. An example pom.xml:

xml

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>mapreduce-example</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <hadoop.version>3.3.6</hadoop.version>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.4.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>WordCount</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>

3. Package the project

Package it with Maven:

bash

mvn clean package

This produces a JAR file that bundles all dependencies (usually at target/mapreduce-example-1.0-SNAPSHOT.jar).
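
Before submitting the job, it can be useful to confirm which main class the packaged JAR actually declares. The following is a minimal sketch that reads the Main-Class attribute from a JAR manifest using only the standard java.util.jar API. The class name JarManifestCheck is hypothetical, and the default path is simply the one mentioned above.

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Reads the Main-Class entry from a JAR manifest (standard library only).
class JarManifestCheck {

    // Returns the Main-Class attribute, or null if the JAR has no manifest or no such entry.
    static String mainClassOf(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default: the artifact path produced by the pom.xml in this article.
        String jarPath = args.length > 0 ? args[0] : "target/mapreduce-example-1.0-SNAPSHOT.jar";
        System.out.println("Main-Class: " + mainClassOf(jarPath));
    }
}
```

With the pom.xml shown earlier, the expectation is that this reports WordCount, since that is the value given to the ManifestResourceTransformer.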

4. Upload the input data to HDFS

Assuming the input file is input.txt, upload it to HDFS:

bash

hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put input.txt /user/hadoop/input/

5. Run the MapReduce job

Submit the job with the hadoop jar command:

bash

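hadoop jar target/mapreduce-example-1.0-SNAPSHOT.jar /user/hadoop/input /user/hadoop/output

Arguments:
- target/mapreduce-example-1.0-SNAPSHOT.jar: path to the packaged JAR file.
- /user/hadoop/input: HDFS input path.
- /user/hadoop/output: HDFS output path (it must not already exist; the job creates it).

The main class name (WordCount, the class that contains the main method) is not passed on the command line here, because the pom.xml above already records it as Main-Class in the JAR manifest. When the manifest names a main class, hadoop jar uses it and hands every remaining argument to the program, so an extra WordCount would be read as the input path. Pass the class name right after the JAR path only when the JAR has no Main-Class entry.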
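
How the arguments after the JAR path are interpreted depends on whether the JAR manifest names a main class. The sketch below is a simplified model of that rule, not Hadoop's own launcher code; the class LaunchArgsSketch, the nested Launch type, and the resolve method are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of main-class resolution for "hadoop jar"; not Hadoop's own code.
class LaunchArgsSketch {

    // The class that will be run and the arguments handed to its main method.
    static final class Launch {
        final String mainClass;
        final List<String> programArgs;

        Launch(String mainClass, List<String> programArgs) {
            this.mainClass = mainClass;
            this.programArgs = programArgs;
        }
    }

    // manifestMainClass is the manifest's Main-Class value, or null if it has none.
    static Launch resolve(String manifestMainClass, List<String> argsAfterJar) {
        if (manifestMainClass != null) {
            // The manifest wins: every argument after the JAR path goes to the program.
            return new Launch(manifestMainClass, new ArrayList<>(argsAfterJar));
        }
        if (argsAfterJar.isEmpty()) {
            throw new IllegalArgumentException("no Main-Class in manifest and no class name given");
        }
        // No manifest entry: the first argument is the class name, the rest go to the program.
        return new Launch(argsAfterJar.get(0),
                new ArrayList<>(argsAfterJar.subList(1, argsAfterJar.size())));
    }

    public static void main(String[] args) {
        Launch withManifest = resolve("WordCount",
                Arrays.asList("/user/hadoop/input", "/user/hadoop/output"));
        System.out.println(withManifest.mainClass + " " + withManifest.programArgs);

        Launch withoutManifest = resolve(null,
                Arrays.asList("WordCount", "/user/hadoop/input", "/user/hadoop/output"));
        System.out.println(withoutManifest.mainClass + " " + withoutManifest.programArgs);
    }
}
```

In both calls the program ends up with the input path in args[0] and the output path in args[1], which is what the WordCount driver expects.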

6. View the results

bash

hdfs dfs -cat /user/hadoop/output/part-r-00000
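
If the counts need further processing, the text output can be parsed back into a map. This is a minimal sketch that assumes each output line has the form word, a tab character, then the count, which is the default layout of the text output format; a job configured with a different separator would need the parsing adjusted. The class name OutputParserSketch is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Parses "word<TAB>count" lines, as written by the WordCount job, into a map.
class OutputParserSketch {

    // Reads all lines from the source and returns word -> count in file order.
    static Map<String, Integer> parse(Reader source) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines that do not match the assumed layout
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Reads from standard input so the output of "hdfs dfs -cat" can be piped in.
        Map<String, Integer> counts = parse(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        counts.forEach((word, count) -> System.out.println(word + " -> " + count));
    }
}
```

After compiling the class, one way to use it is to pipe the command above into it: hdfs dfs -cat /user/hadoop/output/part-r-00000 | java OutputParserSketch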
