hive按照某个字段分组,然后获取每个分组中最新的n条数据

hive -e "use db; select t.advertId,t.exposureNum from (select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2) t where t.rn=1;" > exposureInfo.txt;

解释说明:
ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn
表示通过 advertId 进行分区,然后在每个分区中按照时间降序排列结果。
ROW_NUMBER() 生成的时候通OVER(通过)(PARTITION BY advertId ORDER BY addTime desc) 这里面的结果生成行号

其中:

select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2

通过上面SQL生成的结果类似:
hive按照某个字段分组,然后获取每个分组中最新的n条数据_第1张图片

通过:

hive -e "use db; select t.advertId,t.exposureNum from (select advertId,exposureNum,ROW_NUMBER() OVER(PARTITION BY advertId ORDER BY addTime desc) AS rn FROM tb_advert_flow_money where ftype = 2) t where t.rn=1;" > exposureInfo.txt;

生成的数据为:
hive按照某个字段分组,然后获取每个分组中最新的n条数据_第2张图片

你可能感兴趣的:(#,HIVE(数据库仓库工具))