- The user-defined Mapper must extend the framework's Mapper parent class
- The Mapper's input data comes in key-value (KV) pairs (the KV types are customizable)
- The Mapper's business logic goes in the map() method
- The Mapper's output data is also in KV-pair form (the KV types are customizable)
- The map() method (run by the MapTask process) is called once for each input <K, V> pair
- The user-defined Reducer must extend the framework's Reducer parent class
- The Reducer's input data types match the Mapper's output data types, also KV pairs
- The Reducer's business logic goes in the reduce() method
- The ReduceTask process calls reduce() once for each group of <K, V> pairs that share the same key
The Driver is effectively a client of the YARN cluster: it submits our entire program to YARN as a Job object that encapsulates the MapReduce program's run parameters.
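For example (a sketch of the WordCount data flow, not from the original notes): the input line "hadoop spark hadoop" produces the map output (hadoop, 1), (spark, 1), (hadoop, 1); the shuffle phase groups the values by key into (hadoop, [1, 1]) and (spark, [1]); reduce() is then called once per group and emits (hadoop, 2) and (spark, 1).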
<dependencies>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>RELEASE</version>
    </dependency>
    <dependency>
        <groupId>org.apache.logging.log4j</groupId>
        <artifactId>log4j-core</artifactId>
        <version>2.8.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.2</version>
    </dependency>
</dependencies>
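To produce the jar used in the cluster commands at the end of this section (assuming the project's artifactId is mapreduce and its version is 1.0-SNAPSHOT, which matches the jar name used there), package the project with Maven:
mvn clean package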
In the project's src/main/resources directory, create a new file named "log4j.properties":
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
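Note: this file configures log4j 1.x (the org.apache.log4j.* classes), which Hadoop 2.7.2 pulls in transitively; the log4j-core 2.8.2 dependency in the pom above is Log4j 2 and, by default, does not read this file.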
WcMapper.java:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
public class WcMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Reusable Writable wrappers for the output key and value
    private Text word = new Text();
    private IntWritable one = new IntWritable(1);

    /**
     * map() holds the core logic; it is called once per input record.
     * @param key the byte offset of this line within the file
     * @param value the content of this line
     * @param context the task context, used to emit output KV pairs
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Get the content of this line
        String line = value.toString();
        // Split the line into words
        String[] words = line.split(" ");
        // Emit each word as a (word, 1) pair
        for (String word : words) {
            this.word.set(word);
            context.write(this.word, one);
        }
    }
}
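Note that word and one are instance fields rather than locals inside map(): the framework calls map() once per input record, so reusing a single Text and IntWritable avoids allocating fresh objects for every line of input.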
WcReducer.java:
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    // Reusable Writable wrapper for the output value
    private IntWritable result = new IntWritable();

    /**
     * reduce() is called once per key, with all of that key's values.
     * @param key the word
     * @param values all of the 1s emitted for this word
     * @param context the task context, used to emit output KV pairs
     * @throws IOException
     * @throws InterruptedException
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Running total for this word
        int sum = 0;
        // Accumulate all values that share this key
        for (IntWritable value : values) {
            sum += value.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
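One Hadoop-specific caveat: the framework reuses the IntWritable instance behind the values iterator between iterations, so read the primitive out with get() (as above) rather than holding on to the Writable objects themselves.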
WcDriver.java:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class WcDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Create a new Job
        Job job = Job.getInstance(new Configuration(), "MyWordCount");
        // 2. Set the jar the Job ships to the cluster (located via the driver class)
        job.setJarByClass(WcDriver.class);
        // 3. Set the Job's Mapper and Reducer
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);
        // Optional combiner: pre-aggregates map output locally; WcReducer can be
        // reused because summing is associative and commutative
        // job.setCombinerClass(WcReducer.class);
        // 4. Set the Mapper's output types and the Job's final output types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 5. Set the input and output paths
        // (import FileInputFormat and FileOutputFormat from the longer package
        // names, i.e. org.apache.hadoop.mapreduce.lib.*)
        // The input directory holds the files to process
        FileInputFormat.setInputPaths(job, new Path("d:/input"));
        // The output directory must not already exist
        FileOutputFormat.setOutputPath(job, new Path("d:/output"));
        // Submit the job and wait for it to finish
        boolean b = job.waitForCompletion(true);
        System.exit(b ? 0 : 1);
    }
}
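The driver above hardcodes the local paths d:/input and d:/output, while the hadoop jar command at the end of this section passes /input and /output as arguments. A minimal sketch of the usual adaptation, reading the paths from args so the same jar also works on the cluster (WcDriverArgs is a hypothetical name, not part of the original project):
WcDriverArgs.java:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;
public class WcDriverArgs {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Job job = Job.getInstance(new Configuration(), "MyWordCount");
        job.setJarByClass(WcDriverArgs.class);
        job.setMapperClass(WcMapper.class);
        job.setReducerClass(WcReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // args[0] = input path, args[1] = output path (must not exist yet)
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}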
- Extract the Hadoop archive for Windows 10 to a path containing no Chinese characters
- Add a HADOOP_HOME environment variable pointing to the extracted directory
- Add %HADOOP_HOME%\bin to the Path variable
- Restart the computer
Run the jar on the cluster (for the /input and /output arguments to take effect, the driver must read its paths from args, as in the sketch above):
hadoop jar mapreduce-1.0-SNAPSHOT.jar com.atguigu.mapreduce.wordcount.WcDriver /input /output
View the result (each output line is a word and its count, separated by a tab):
hadoop fs -cat /output/*
Never lightly promise others what you are not capable of doing; once you have made a promise, you must keep your word. — George Washington