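A Hadoop MapReduce job operates on key/value pairs: it maps one key/value space onto another. Specifically:
(input) <k1, v1> -> map -> <k2, v2> -> combine -> <k2, v2> -> reduce -> <k3, v3> (output)
The combine step is optional. When it is used, its input and output types are the same as the map output types <k2, v2>.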
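To make the type flow concrete, here is a minimal plain-Java simulation of a word count. It assumes k1 is a line offset, v1 a line of text, k2/k3 a word, and v2/v3 a count. It does not use Hadoop's API: the class and method names are hypothetical, and the `group` step stands in for Hadoop's shuffle and sort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the MapReduce key/value flow (no Hadoop classes).
// <k1, v1> = <Long offset, String line>, <k2, v2> = <String word, Integer count>,
// <k3, v3> = <String word, Integer total>.
class KeyValueFlow {

    // map: one <k1, v1> pair -> zero or more <k2, v2> pairs (k1 is ignored here)
    static List<Map.Entry<String, Integer>> map(long offset, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Stand-in for shuffle/sort: group values by key, with keys in sorted order
    static TreeMap<String, List<Integer>> group(List<Map.Entry<String, Integer>> pairs) {
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> e : pairs) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        return grouped;
    }

    // Both combine and reduce sum the counts for one key
    static int sum(List<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    static TreeMap<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> combined = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            // Treat each line as one map task and combine its output locally:
            // <k2, v2> -> <k2, v2>, same types in and out
            for (Map.Entry<String, List<Integer>> g : group(map(offset, line)).entrySet()) {
                combined.add(Map.entry(g.getKey(), sum(g.getValue())));
            }
            offset += line.length() + 1;
        }
        // reduce: <k2, list(v2)> -> <k3, v3>
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : group(combined).entrySet()) {
            result.put(g.getKey(), sum(g.getValue()));
        }
        return result;
    }
}
```

For example, `KeyValueFlow.run(List.of("a b a", "b c"))` yields `{a=2, b=2, c=1}`. In a real Hadoop job, the same types appear as the generic parameters of the job classes, e.g. `Mapper<LongWritable, Text, Text, IntWritable>` and `Reducer<Text, IntWritable, Text, IntWritable>`.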
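In Hadoop, every key and value type must implement the Writable interface so it can be serialized. Because keys are also sorted, a key type must additionally be comparable. In practice, this means implementing WritableComparable, which extends both Writable and Comparable.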
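The contract is small. A Writable writes its fields to a `DataOutput` and reads them back from a `DataInput`, and a key type adds `compareTo`. The sketch below mirrors that shape using hypothetical interfaces `SimpleWritable` and `SimpleWritableComparable`, so it compiles without Hadoop on the classpath. `LongKey` is likewise an illustrative name, not a Hadoop class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-ins mirroring the shape of Hadoop's Writable / WritableComparable
interface SimpleWritable {
    void write(DataOutput out) throws IOException;     // serialize this object's fields
    void readFields(DataInput in) throws IOException;  // overwrite fields from the stream
}

interface SimpleWritableComparable<T> extends SimpleWritable, Comparable<T> {}

// An illustrative key type: serializable for the shuffle, comparable for the sort
class LongKey implements SimpleWritableComparable<LongKey> {
    private long value;

    // Hadoop creates Writable instances via reflection, so a no-arg constructor is expected
    LongKey() {}
    LongKey(long value) { this.value = value; }

    long get() { return value; }
    void set(long value) { this.value = value; }

    @Override public void write(DataOutput out) throws IOException { out.writeLong(value); }
    @Override public void readFields(DataInput in) throws IOException { value = in.readLong(); }
    @Override public int compareTo(LongKey o) { return Long.compare(value, o.value); }

    // Serialize to bytes, then deserialize into a fresh object
    static LongKey roundTrip(LongKey key) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            key.write(new DataOutputStream(bytes));
            LongKey copy = new LongKey();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Note that `readFields` fills an existing object rather than returning a new one. This lets the framework reuse a single key or value instance across many records.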
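Hadoop provides Writable wrappers for most Java primitive types, such as IntWritable and LongWritable, plus Text for strings. These three (Text, IntWritable, LongWritable) are the ones used most often. The primitive wrappers provide get() and set() methods for reading and updating the wrapped value; Text uses set() and toString(). Hadoop also provides wrappers for arrays and maps: ArrayWritable, MapWritable and SortedMapWritable.
A custom key type implements WritableComparable. The following IntPair class is a key made of two ints:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Originally declared "public static class" (nested inside a job class);
// shown here as a top-level class so the file compiles on its own.
public class IntPair implements WritableComparable<IntPair> {
    private int first = 0;
    private int second = 0;

    public void set(int left, int right) {
        first = left;
        second = right;
    }

    public int getFirst() { return first; }

    public int getSecond() { return second; }

    // Undo the offset applied in write()
    @Override
    public void readFields(DataInput in) throws IOException {
        first = in.readInt() + Integer.MIN_VALUE;
        second = in.readInt() + Integer.MIN_VALUE;
    }

    // Subtracting Integer.MIN_VALUE flips the sign bit, so the serialized bytes,
    // compared as unsigned values, sort in the same order as the signed ints.
    // The raw Comparator below relies on this.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(first - Integer.MIN_VALUE);
        out.writeInt(second - Integer.MIN_VALUE);
    }

    // Used by the default HashPartitioner to pick a reducer
    @Override
    public int hashCode() {
        return first * 157 + second;
    }

    @Override
    public boolean equals(Object right) {
        if (right instanceof IntPair) {
            IntPair r = (IntPair) right;
            return r.first == first && r.second == second;
        } else {
            return false;
        }
    }

    // Raw comparator: sorts serialized keys byte by byte, without deserializing them
    public static class Comparator extends WritableComparator {
        public Comparator() {
            super(IntPair.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            return compareBytes(b1, s1, l1, b2, s2, l2);
        }
    }

    // Register the raw comparator as the default for IntPair
    static {
        WritableComparator.define(IntPair.class, new Comparator());
    }

    // Order by first, then by second
    @Override
    public int compareTo(IntPair o) {
        if (first != o.first) {
            return first < o.first ? -1 : 1;
        } else if (second != o.second) {
            return second < o.second ? -1 : 1;
        } else {
            return 0;
        }
    }
}
```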
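The `Integer.MIN_VALUE` offset in `write()` is what allows the raw `Comparator` to sort IntPair keys by comparing bytes directly. The self-contained check below encodes pairs the same way and compares the bytes as unsigned values, which is the ordering `WritableComparator.compareBytes` uses. It then checks that the result agrees with the field-by-field order of `IntPair.compareTo`. This is a sketch; `OffsetEncodingDemo` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class OffsetEncodingDemo {
    // Write two ints big-endian, exactly as DataOutput.writeInt does
    private static byte[] writePair(int a, int b) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            out.writeInt(a);
            out.writeInt(b);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Same encoding as IntPair.write: subtracting MIN_VALUE flips the sign bit
    static byte[] encode(int first, int second) {
        return writePair(first - Integer.MIN_VALUE, second - Integer.MIN_VALUE);
    }

    // No offset, for contrast
    static byte[] encodeRaw(int first, int second) {
        return writePair(first, second);
    }

    // Lexicographic comparison treating each byte as unsigned (0..255)
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }

    // Reference order, matching IntPair.compareTo: by first, then by second
    static int compareFields(int f1, int s1, int f2, int s2) {
        if (f1 != f2) {
            return Integer.compare(f1, f2);
        }
        return Integer.compare(s1, s2);
    }
}
```

Without the offset, -1 is written as `0xFFFFFFFF` and would sort after 1 (`0x00000001`) in a byte-wise comparison. `encodeRaw` reproduces that wrong order.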