words = lines.flatMap(lambda line: line.split(" "))
# Pair each word with an initial count of 1, then use reduceByKey to sum the counts per key (word)
result = words.map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)
result.saveAsTextFile("file:///usr/hadoop/wordcount/output")
```
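To make the reduceByKey semantics concrete, the word-count pipeline can be emulated in plain Python. This is a minimal sketch of the same logic, not Spark itself; the sample input lines are invented for illustration:

```python
from collections import defaultdict

# Sample input standing in for the RDD of text lines (invented for illustration)
lines = ["hello world", "hello spark"]

# flatMap: split every line into words and flatten into one list
words = [w for line in lines for w in line.split(" ")]

# map: pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: combine the values of all pairs that share the same key
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(dict(counts))  # {'hello': 2, 'world': 1, 'spark': 1}
```

In real Spark the reduction runs in parallel across partitions, so the combining function must be associative and commutative, as addition is here.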
4.2 Hadoop Code Example
The following is a simple WordCount example that counts the words in the input text and writes the counts as the final output.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {

    public static class TokenizerMapper
            extends Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
    public static class IntSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();