Hadoop generally offers two join strategies: the map-side join (also called a replicated join) and the reduce-side join (also called a repartition join or common join).
1. Reduce-side join
This strategy relies on the MapReduce framework's sort-merge mechanism to bring records with the same key together. In the map phase, each input dataset is read and every record is emitted keyed by its join key, with the remaining columns wrapped in the value and tagged with the source table. In the reduce phase, all records sharing the same join key are collected, the cartesian product of the two sides is computed, and the joined rows are emitted.
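To make the repartition join concrete, here is a minimal MapReduce sketch. The table layouts and class names ("users", "orders") are made up for illustration; each input dataset gets its own mapper, which would be wired to its input path with MultipleInputs in the driver:

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class ReduceSideJoin {

    // Mapper for the "users" dataset: emits (join key, record tagged with "U").
    public static class UserMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assumed layout: user_id,user_name
            String[] fields = line.toString().split(",", 2);
            context.write(new Text(fields[0]), new Text("U\t" + fields[1]));
        }
    }

    // Mapper for the "orders" dataset: emits (join key, record tagged with "O").
    public static class OrderMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Assumed layout: order_id,user_id,amount -> key on user_id
            String[] fields = line.toString().split(",", 3);
            context.write(new Text(fields[1]), new Text("O\t" + fields[0] + "," + fields[2]));
        }
    }

    // All records sharing a join key arrive in the same reduce() call; split them by
    // tag and emit the cross product. Buffering both sides is the simplest form; real
    // implementations use a secondary sort so only one side is held in memory.
    public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text joinKey, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            List<String> users = new ArrayList<>();
            List<String> orders = new ArrayList<>();
            for (Text v : values) {
                String s = v.toString();
                if (s.startsWith("U\t")) {
                    users.add(s.substring(2));
                } else {
                    orders.add(s.substring(2));
                }
            }
            for (String u : users) {
                for (String o : orders) {
                    context.write(joinKey, new Text(u + "\t" + o));
                }
            }
        }
    }
}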
2. Map-side join
If one of the input datasets is small enough, it can be replicated to every map task (shipped to each map host via the DistributedCache). When a map task starts, it first loads this small table into memory; then, as the map function iterates over the big table, it looks up the in-memory records with the same join key and performs the join on the spot.
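A minimal sketch of this replicated join, assuming the small "users" file has been distributed with the DistributedCache (e.g. job.addCacheFile(new URI(".../users.csv#users.csv"))) so that it is symlinked into the task's working directory; the file layouts and names are hypothetical:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapSideJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final Map<String, String> userById = new HashMap<>();

    @Override
    protected void setup(Context context) throws IOException {
        // Load the replicated small table fully into memory, once per map task.
        // "users.csv" is assumed to be the localized DistributedCache symlink.
        try (BufferedReader reader = new BufferedReader(new FileReader("users.csv"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",", 2);   // user_id,user_name
                userById.put(fields[0], fields[1]);
            }
        }
    }

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        // The big "orders" table streams through map(); probe the in-memory hash table.
        String[] fields = line.toString().split(",", 3); // order_id,user_id,amount
        String userName = userById.get(fields[1]);
        if (userName != null) { // inner join: drop orders with no matching user
            context.write(new Text(fields[1]),
                new Text(userName + "\t" + fields[0] + "," + fields[2]));
        }
    }
}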
How Hive decides to execute a map-side join
At compile time, Hive generates a conditional task for every common join. For each table that participates in the join, it assumes that table could be the big one and generates a corresponding mapjoin task, then packs all of these mapjoin tasks into the conditional task (List<Task<? extends Serializable>> resTasks) and records a mapping from each big-table alias to its mapjoin task. At runtime, the resolver reads the input file size of every table alias; if the combined file size of the small tables falls below the configured threshold (hive.mapjoin.smalltable.filesize, 25 MB by default), the converted mapjoin task is executed. Each mapjoin task is also given a backup task, namely the original common join task, which is launched if the mapjoin task fails.
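Before reading the real resolver code below, here is a simplified, self-contained sketch of that runtime heuristic (hypothetical class and method names, and it ignores the aliasToTask check that the real code performs): pick the alias with the largest input size as the big table, and keep the converted mapjoin only if the remaining small tables together stay under the threshold.

import java.util.HashMap;
import java.util.Map;

public class MapJoinHeuristicSketch {

    /** Returns the big-table alias if a converted map join is allowed, otherwise null. */
    static String chooseBigTable(Map<String, Long> aliasToSize, long smallTableThreshold) {
        String bigAlias = null;
        long bigSize = -1L;
        long totalSize = 0L;
        for (Map.Entry<String, Long> e : aliasToSize.entrySet()) {
            totalSize += e.getValue();
            if (e.getValue() > bigSize) {
                bigSize = e.getValue();
                bigAlias = e.getKey();
            }
        }
        // Sum of the small tables = total minus the chosen big table.
        long smallTablesSum = totalSize - (bigAlias == null ? 0 : bigSize);
        // Mirrors the check against hive.mapjoin.smalltable.filesize: fall back to the
        // common join (return null) when the small side is too large to hold in memory.
        return (bigAlias != null && smallTablesSum <= smallTableThreshold) ? bigAlias : null;
    }

    public static void main(String[] args) {
        Map<String, Long> sizes = new HashMap<>();
        sizes.put("orders", 8_000_000_000L);   // big fact table
        sizes.put("dim_user", 12_000_000L);    // 12 MB dimension table
        // 25 MB is the default of hive.mapjoin.smalltable.filesize
        System.out.println(chooseBigTable(sizes, 25_000_000L)); // prints "orders"
    }
}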
Flow diagram:
ConditionalResolverCommonJoin.java
The resolver.getTasks(conf, resolverCtx) method:
public List<Task<? extends Serializable>> getTasks(HiveConf conf, Object objCtx) {
  ConditionalResolverCommonJoinCtx ctx = (ConditionalResolverCommonJoinCtx) objCtx;
  List<Task<? extends Serializable>> resTsks = new ArrayList<Task<? extends Serializable>>();

  // get aliasToPath and pass it to the heuristic
  HashMap<String, ArrayList<String>> pathToAliases = ctx.getPathToAliases();
  HashMap<String, Long> aliasToKnownSize = ctx.getAliasToKnownSize();
  String bigTableAlias = this.resolveMapJoinTask(pathToAliases, ctx
      .getAliasToTask(), aliasToKnownSize, ctx.getHdfsTmpDir(), ctx
      .getLocalTmpDir(), conf);

  if (bigTableAlias == null) {
    // run common join task
    resTsks.add(ctx.getCommonJoinTask());
  } else {
    // run the map join task
    Task<? extends Serializable> task = ctx.getAliasToTask().get(bigTableAlias);
    // set task tag
    if (task.getTaskTag() == Task.CONVERTED_LOCAL_MAPJOIN) {
      task.getBackupTask().setTaskTag(Task.BACKUP_COMMON_JOIN);
    }
    resTsks.add(task);
  }

  return resTsks;
}
The resolveMapJoinTask method:
private String resolveMapJoinTask(
    HashMap<String, ArrayList<String>> pathToAliases,
    HashMap<String, Task<? extends Serializable>> aliasToTask,
    HashMap<String, Long> aliasToKnownSize, String hdfsTmpDir,
    String localTmpDir, HiveConf conf) {

  String bigTableFileAlias = null;
  long smallTablesFileSizeSum = 0;

  Map<String, AliasFileSizePair> aliasToFileSizeMap = new HashMap<String, AliasFileSizePair>();
  for (Map.Entry<String, Long> entry : aliasToKnownSize.entrySet()) {
    String alias = entry.getKey();
    AliasFileSizePair pair = new AliasFileSizePair(alias, entry.getValue());
    aliasToFileSizeMap.put(alias, pair);
  }

  try {
    // need to compute the input size at runtime, and select the biggest as
    // the big table.
    for (Map.Entry<String, ArrayList<String>> oneEntry : pathToAliases.entrySet()) {
      String p = oneEntry.getKey();
      // this path is intermediate data
      if (p.startsWith(hdfsTmpDir) || p.startsWith(localTmpDir)) {
        ArrayList<String> aliasArray = oneEntry.getValue();
        if (aliasArray.size() <= 0) {
          continue;
        }
        Path path = new Path(p);
        FileSystem fs = path.getFileSystem(conf);
        long fileSize = fs.getContentSummary(path).getLength();
        for (String alias : aliasArray) {
          AliasFileSizePair pair = aliasToFileSizeMap.get(alias);
          if (pair == null) {
            pair = new AliasFileSizePair(alias, 0);
            aliasToFileSizeMap.put(alias, pair);
          }
          pair.size += fileSize;
        }
      }
    }

    // generate file size to alias mapping; but not set file size as key,
    // because different file may have the same file size.
    List<AliasFileSizePair> aliasFileSizeList = new ArrayList<AliasFileSizePair>(
        aliasToFileSizeMap.values());

    Collections.sort(aliasFileSizeList);

    // iterating through this list from the end to beginning, trying to find
    // the big table for mapjoin
    int idx = aliasFileSizeList.size() - 1;
    boolean bigAliasFound = false;
    while (idx >= 0) {
      AliasFileSizePair pair = aliasFileSizeList.get(idx);
      String alias = pair.alias;
      long size = pair.size;
      idx--;
      if (!bigAliasFound && aliasToTask.get(alias) != null) {
        // got the big table
        bigAliasFound = true;
        bigTableFileAlias = alias;
        continue;
      }
      smallTablesFileSizeSum += size;
    }

    // compare with threshold
    long threshold = HiveConf.getLongVar(conf, HiveConf.ConfVars.HIVESMALLTABLESFILESIZE);
    if (smallTablesFileSizeSum <= threshold) {
      return bigTableFileAlias;
    } else {
      return null;
    }
  } catch (Exception e) {
    e.printStackTrace();
    return null;
  }
}
References:
https://issues.apache.org/jira/browse/HIVE-1642
https://cwiki.apache.org/Hive/configuration-properties.html
https://cwiki.apache.org/Hive/languagemanual-joins.html
Original post: http://blog.csdn.net/lalaguozhe/article/details/9082921. Please credit the source when reposting.