Replacing IN and NOT IN in Hive SQL

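Hive SQL has only limited support for NOT IN and IN subqueries. From my testing so far, a single Hive statement seems to accept only one NOT IN or IN subquery, and more than one is not supported. To replace NOT IN, use a LEFT JOIN and keep the rows where the join key (id) IS NULL. To replace IN, use a LEFT JOIN and keep the rows where id IS NOT NULL, or use LEFT SEMI JOIN, which is better optimized.
Suppose we want the ids that are in table a but not in table b.
NOT IN example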

select id from a
where id not in (select id from b);

Hive SQL replacement example

select id from a
left join b on b.id = a.id
where b.id is null;
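
For the opposite case (ids that are in both a and b), the IN replacement described above might look like the sketch below. It reuses the tables a and b from the example. The alias b_ids is made up for illustration. One thing to watch for: a plain LEFT JOIN with IS NOT NULL returns a row of a once per matching row of b, so the sketch deduplicates b first. LEFT SEMI JOIN does not have that problem.

```sql
-- Original form, for comparison:
-- select id from a where id in (select id from b);

-- LEFT SEMI JOIN form: each row of a is returned at most once,
-- even when b contains the same id several times.
select a.id
from a
left semi join b on a.id = b.id;

-- LEFT JOIN form: deduplicate b first so that rows of a are not multiplied.
-- (b_ids is a hypothetical alias.)
select a.id
from a
left join (select distinct id from b) b_ids on b_ids.id = a.id
where b_ids.id is not null;
```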

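Hive notes
Date functions in Hive SQL do not understand MySQL-style format strings such as '%Y-%m-%d'. No error is raised, but the format does not do what you expect.
Format patterns are written in the Java style, for example 'yyyy-MM-dd', and date values can be written as plain strings such as '2019-06-16'.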
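
A small sketch of the date handling that does work in Hive. The table a and its create_time column in the last statement are assumptions for illustration only.

```sql
-- date_format takes Java-style pattern letters in Hive.
select date_format('2019-06-16 10:30:00', 'yyyy-MM-dd');  -- 2019-06-16

-- to_date keeps only the date part of a timestamp string.
select to_date('2019-06-16 10:30:00');                    -- 2019-06-16

-- Comparing against a plain date string.
-- (a.create_time is a hypothetical column.)
select id
from a
where to_date(create_time) = '2019-06-16';
```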
While I'm at it, here is a fairly complex case I worked through myself.
Original SQL

SELECT
  count(DISTINCT t.passport_user_id) 新增用户数,
  count( t.passport_user_id) 新增用户交易笔数,
  sum(t.trade_amount) 新增用户交易金额
FROM t_trade t
INNER JOIN t_user u ON u.passport_user_id = t.passport_user_id
WHERE t.id IN (
  SELECT min(d.id) id
  FROM t_trade d
  INNER JOIN t_bank b ON d.bank_code = b.bank_code
  WHERE b.method_type = 1
    and DATE_FORMAT(d.create_time,'%Y-%m-%d')= '2019-05-23'
    AND d.trade_type = 3
    AND d.bank_trade_status = 1  
    AND d.passport_user_id NOT IN (
      SELECT d1.passport_user_id
      FROM t_trade d1
      INNER JOIN t_bank b ON d1.bank_code = b.bank_code
      WHERE b.method_type = 1
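      -- The hard part: this condition references a column of the outer query, so in the rewritten SQL I had to move the whole subquery over. Hopefully I can find a better solution later.
      AND d1.income_begin_date < d.income_begin_date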
      AND d1.trade_type = 3
      AND d1.bank_trade_status = 1 
    )
  GROUP BY d.passport_user_id
)

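SQL after replacement
LEFT SEMI JOIN checks whether the left table's join key exists in the right table and, if it does, returns the left table's row. Note: only columns of the left table are returned. Columns of the right table are never returned, and they cannot appear in the WHERE conditions either; they may only be used in the ON clause.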
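
A minimal sketch of what this restriction means in practice, using the tables a and b from the first example. The status column on b and the alias b1 are assumptions for illustration. A filter on the right table has to move into a subquery (or the ON clause), because the right table is not visible in WHERE.

```sql
-- Rejected under the restriction above: b is referenced in WHERE.
-- select a.id from a left semi join b on a.id = b.id where b.status = 1;

-- Filter the right side inside a subquery instead.
-- (b.status and b1 are hypothetical names.)
select a.id
from a
left semi join (
  select id
  from b
  where status = 1
) b1 on a.id = b1.id;
```

This is the same shape the rewritten query uses: all conditions on the right side live inside the subquery that follows LEFT SEMI JOIN.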

SELECT
  '金超' channel,
  count(DISTINCT t.passport_user_id) new_user,
  count( t.passport_user_id) new_count,
  sum(t.trade_amount) new_amount
FROM odsdb_bankconsignment_prod.t_trade t
INNER JOIN odsdb_bankconsignment_prod.t_user u ON u.passport_user_id = t.passport_user_id
LEFT SEMI JOIN (
  SELECT min(d.id) id
  FROM odsdb_bankconsignment_prod.t_trade d
  INNER JOIN odsdb_bankconsignment_prod.t_bank b ON d.bank_code = b.bank_code
  LEFT OUTER JOIN (
    SELECT d1.passport_user_id
    FROM odsdb_bankconsignment_prod.t_trade d1
    INNER JOIN odsdb_bankconsignment_prod.t_bank b ON d1.bank_code = b.bank_code
    WHERE b.method_type = 1
      AND DATE_FORMAT(d1.create_time,'yyyy-MM-dd') < '2019-06-18'
      AND d1.trade_type = 3
      AND d1.bank_trade_status = 1
  ) f
    ON d.passport_user_id = f.passport_user_id
  WHERE b.method_type = 1
    and DATE_FORMAT(d.create_time,'yyyy-MM-dd')= '2019-06-18'
    AND d.trade_type = 3
    AND d.bank_trade_status = 1  
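    -- and f.income_begin_date
    AND f.passport_user_id IS NULL  -- anti-join: keep users with no earlier qualifying trade
  GROUP BY d.passport_user_id
) m ON t.id = m.id;  -- m is an arbitrary alias for the semi-join subquery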
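
The fixed cutoff date in the rewritten query stands in for the correlated condition d1.income_begin_date < d.income_begin_date from the original SQL. One possible way to keep that condition without a correlated subquery is sketched below: compute each user's earliest income_begin_date over all qualifying trades, then keep only the trades that fall on that earliest date. This is a sketch, not a tested replacement. It assumes passport_user_id and income_begin_date are never NULL, and the aliases b1, first_trade, first_income_begin_date and picked are made up for illustration.

```sql
SELECT
  count(DISTINCT t.passport_user_id) AS new_user,
  count(t.passport_user_id)          AS new_count,
  sum(t.trade_amount)                AS new_amount
FROM odsdb_bankconsignment_prod.t_trade t
INNER JOIN odsdb_bankconsignment_prod.t_user u
  ON u.passport_user_id = t.passport_user_id
LEFT SEMI JOIN (
  SELECT min(d.id) AS id
  FROM odsdb_bankconsignment_prod.t_trade d
  INNER JOIN odsdb_bankconsignment_prod.t_bank b
    ON d.bank_code = b.bank_code
  INNER JOIN (
    -- Earliest income_begin_date per user over all qualifying trades.
    SELECT d1.passport_user_id,
           min(d1.income_begin_date) AS first_income_begin_date
    FROM odsdb_bankconsignment_prod.t_trade d1
    INNER JOIN odsdb_bankconsignment_prod.t_bank b1
      ON d1.bank_code = b1.bank_code
    WHERE b1.method_type = 1
      AND d1.trade_type = 3
      AND d1.bank_trade_status = 1
    GROUP BY d1.passport_user_id
  ) first_trade
    ON d.passport_user_id = first_trade.passport_user_id
  WHERE b.method_type = 1
    AND date_format(d.create_time, 'yyyy-MM-dd') = '2019-06-18'
    AND d.trade_type = 3
    AND d.bank_trade_status = 1
    -- Same meaning as "no qualifying trade of this user has an earlier
    -- income_begin_date", given the non-NULL assumption above.
    AND d.income_begin_date = first_trade.first_income_begin_date
  GROUP BY d.passport_user_id
) picked
  ON t.id = picked.id;
```

The equality works because every row d that passes the filters is itself a qualifying trade, so the user's minimum can never be later than d.income_begin_date. The two are equal exactly when no earlier qualifying trade exists.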
