Packaging and Running a MapReduce Job

1. Write the MapReduce Program

First, write the MapReduce program itself, which typically consists of a Mapper class, a Reducer class, and a driver class. For example, a simple WordCount program:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values,
                           Context context
        ) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```
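For a quick smoke test without Maven, the class can also be compiled directly against a local Hadoop installation. A minimal sketch, assuming the hadoop command is on the PATH (the wordcount.jar name is just illustrative):

```bash
# `hadoop classpath` prints the classpath of the local Hadoop client libraries
export HADOOP_CLASSPATH=$(hadoop classpath)
javac -classpath "$HADOOP_CLASSPATH" WordCount.java
# Bundle the generated classes (including the inner Mapper/Reducer classes)
jar cf wordcount.jar WordCount*.class
```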

2. Create a Maven Project (Recommended)

Use Maven to manage dependencies and packaging. A sample pom.xml:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.example</groupId>
    <artifactId>mapreduce-example</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <hadoop.version>3.3.6</hadoop.version>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.4.1</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <transformers>
                                <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <mainClass>WordCount</mainClass>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
```
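Note that hadoop-client already pulls in hadoop-common and hadoop-mapreduce-client-core transitively, so the two explicit entries are redundant but harmless. If you need to check which versions Maven actually resolves, something like the following should work:

```bash
# Show only the org.apache.hadoop artifacts in the resolved dependency tree
mvn -q dependency:tree -Dincludes=org.apache.hadoop
```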

3. Package the Project

Package with Maven:

```bash
mvn clean package
```

This produces a JAR that bundles all dependencies (typically at target/mapreduce-example-1.0-SNAPSHOT.jar).
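Before submitting, it is worth a quick check that the shaded JAR really contains the WordCount classes and that the shade plugin set the manifest's Main-Class as configured above:

```bash
# List the WordCount classes packed into the shaded JAR
jar tf target/mapreduce-example-1.0-SNAPSHOT.jar | grep WordCount
# Print the manifest; it should contain a "Main-Class: WordCount" line
# (uses unzip, which may need to be installed separately)
unzip -p target/mapreduce-example-1.0-SNAPSHOT.jar META-INF/MANIFEST.MF
```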

4. Upload Input Data to HDFS

Assuming the input file is input.txt, upload it to HDFS:

```bash
hdfs dfs -mkdir -p /user/hadoop/input
hdfs dfs -put input.txt /user/hadoop/input/
```
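A quick listing confirms the file landed where the job expects it:

```bash
hdfs dfs -ls /user/hadoop/input
```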

5. Run the MapReduce Job

Submit the job with the hadoop jar command:

```bash
hadoop jar target/mapreduce-example-1.0-SNAPSHOT.jar WordCount /user/hadoop/input /user/hadoop/output
```

  • Argument breakdown
    • target/mapreduce-example-1.0-SNAPSHOT.jar: path to the packaged JAR.
    • WordCount: the main class (the one containing the main method). It has no package declaration here, so the bare name works; a class inside a package would need its fully qualified name, e.g. com.example.WordCount.
    • /user/hadoop/input: HDFS input path.
    • /user/hadoop/output: HDFS output path. It must not already exist; the job creates it itself (see the cleanup snippet below).
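MapReduce refuses to overwrite an existing output directory: rerunning the job against the same path fails with FileAlreadyExistsException. Clear a stale directory before resubmitting:

```bash
# -f suppresses the error when the directory does not exist yet
hdfs dfs -rm -r -f /user/hadoop/output
```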

6. View the Results

```bash
hdfs dfs -cat /user/hadoop/output/part-r-00000
```
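If the job ran with more than one reducer there will be several part-r-* files. -getmerge concatenates all of them into a single local file (local-wordcount.txt is just an illustrative name):

```bash
hdfs dfs -getmerge /user/hadoop/output local-wordcount.txt
head local-wordcount.txt
```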
