Four common join types in Hive

1. Left join (left join)
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name
         > ),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 4 as user_id,21 as age
         > )
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > left join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id;
1	xiaoming	1	19
2	xiaolan	2	20
3	xiaoxin	NULL	NULL
Time taken: 0.936 seconds, Fetched 3 row(s)

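The left table drives the result: every row of the left table is returned, while the right table contributes only the rows that match a left-table row on the join condition. For left rows without a match, the right table's columns come back as NULL.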
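The rule can be checked on the same sample rows without a Spark cluster. The following is a minimal sketch, not the original environment: it assumes Python's built-in sqlite3 module as a stand-in for spark-sql, loads test1 and test2 into real tables instead of CTEs, and adds an order by so the row order is deterministic. The helper names sample_db and left_join_rows are hypothetical.

```python
import sqlite3


def sample_db():
    """In-memory SQLite database holding the article's test1 and test2 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test1 (user_id integer, name text)")
    conn.execute("create table test2 (user_id integer, age integer)")
    conn.executemany(
        "insert into test1 values (?, ?)",
        [(1, "xiaoming"), (2, "xiaolan"), (3, "xiaoxin")],
    )
    conn.executemany(
        "insert into test2 values (?, ?)",
        [(1, 19), (2, 20), (4, 21)],
    )
    return conn


# Same join as in the transcript, minus the pass-through subqueries.
LEFT_JOIN = """
select t1.user_id, t1.name, t2.user_id, t2.age
from test1 t1
left join test2 t2
  on t1.user_id = t2.user_id
order by t1.user_id
"""


def left_join_rows():
    """Return the left join result as a list of tuples (SQL NULL becomes None)."""
    conn = sample_db()
    try:
        return conn.execute(LEFT_JOIN).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in left_join_rows():
        print(row)
```

User 3 has no row in test2, so its last two fields are None, which is how sqlite3 reports SQL NULL. User 4 exists only in test2 and does not appear at all.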

2. Right join (right join)
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name
         > ),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 4 as user_id,21 as age
         > )
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > right join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id;
1	xiaoming	1	19
2	xiaolan	2	20
NULL	NULL	4	21
Time taken: 0.936 seconds, Fetched 3 row(s)

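The right table drives the result: every row of the right table is returned, while the left table contributes only the rows that match a right-table row on the join condition. For right rows without a match, the left table's columns come back as NULL.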
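Put differently, a right join is a left join with the two tables swapped. The sketch below illustrates this under the assumption that Python's built-in sqlite3 module is an acceptable stand-in for spark-sql; older SQLite builds do not accept right join at all, so the query is written in the swapped left join form. The names sample_db and right_join_rows are hypothetical, and the order by is added only to make the row order deterministic.

```python
import sqlite3


def sample_db():
    """In-memory SQLite database holding the article's test1 and test2 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test1 (user_id integer, name text)")
    conn.execute("create table test2 (user_id integer, age integer)")
    conn.executemany(
        "insert into test1 values (?, ?)",
        [(1, "xiaoming"), (2, "xiaolan"), (3, "xiaoxin")],
    )
    conn.executemany(
        "insert into test2 values (?, ?)",
        [(1, 19), (2, 20), (4, 21)],
    )
    return conn


# test2 is now the driving (left) table. The select list keeps the article's
# column order, so the output has the same shape as the right join above.
RIGHT_AS_LEFT = """
select t1.user_id, t1.name, t2.user_id, t2.age
from test2 t2
left join test1 t1
  on t1.user_id = t2.user_id
order by t2.user_id
"""


def right_join_rows():
    """Return the rows of 'test1 right join test2', written as a left join."""
    conn = sample_db()
    try:
        return conn.execute(RIGHT_AS_LEFT).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in right_join_rows():
        print(row)
```

The three rows match the spark-sql output: users 1 and 2 are complete, and user 4, which exists only in test2, has None in the two test1 columns.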

3. Inner join (inner join)
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name
         > ),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 4 as user_id,21 as age
         > )
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > inner join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id;
1	xiaoming	1	19
2	xiaolan	2	20
Time taken: 1.108 seconds, Fetched 2 row(s)

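The join condition keeps only the intersection of the two tables: rows that find no match in the other table are dropped from both sides.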
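The "intersection" reading can be checked by comparing the keys returned by the inner join with a plain set intersection of the two key columns. This is a small sketch under stated assumptions: Python's built-in sqlite3 module stands in for spark-sql, and sample_db, inner_join_keys and shared_keys are hypothetical helper names. The two results agree here because user_id is unique in each table; with duplicate keys an inner join returns one row per matching pair, so it can return more rows than the set intersection has elements.

```python
import sqlite3


def sample_db():
    """In-memory SQLite database holding the article's test1 and test2 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test1 (user_id integer, name text)")
    conn.execute("create table test2 (user_id integer, age integer)")
    conn.executemany(
        "insert into test1 values (?, ?)",
        [(1, "xiaoming"), (2, "xiaolan"), (3, "xiaoxin")],
    )
    conn.executemany(
        "insert into test2 values (?, ?)",
        [(1, 19), (2, 20), (4, 21)],
    )
    return conn


def inner_join_keys(conn):
    """user_id values that survive the inner join, in ascending order."""
    sql = """
    select t1.user_id
    from test1 t1
    inner join test2 t2
      on t1.user_id = t2.user_id
    order by t1.user_id
    """
    return [row[0] for row in conn.execute(sql)]


def shared_keys(conn):
    """user_id values present in both tables, computed with a Python set."""
    left = {row[0] for row in conn.execute("select user_id from test1")}
    right = {row[0] for row in conn.execute("select user_id from test2")}
    return sorted(left & right)


if __name__ == "__main__":
    conn = sample_db()
    print(inner_join_keys(conn))
    print(shared_keys(conn))
    conn.close()
```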

4. Full join, also called full outer join (full join)
spark-sql> with test1 as (
         > select 1 as user_id,'xiaoming' as name
         > union all 
         > select 2 as user_id,'xiaolan' as name
         > union all 
         > select 3 as user_id,'xiaoxin' as name
         > ),
         > 
         > test2 as (
         > select 1 as user_id,19 as age
         > union all 
         > select 2 as user_id,20 as age
         > union all 
         > select 4 as user_id,21 as age
         > )
         > 
         > select 
         >   t1.user_id,
         >   t1.name,
         >   t2.user_id,
         >   t2.age
         > from 
         > (select 
         >   user_id,
         >   name
         > from test1) t1
         > 
         > full join
         > (select 
         >   user_id,
         >   age
         > from test2) t2
         > on t1.user_id=t2.user_id;
1	xiaoming	1	19
3	xiaoxin	NULL	NULL
NULL	NULL	4	21
2	xiaolan	2	20

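The join condition yields the union of the two tables: matched rows are returned with the columns of both sides, and for rows that have no match, the columns from the other table come back as NULL.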
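One way to see the "union" is to build the full join from pieces: a left join supplies every test1 row, and a second query adds the test2 rows that found no partner. The sketch below is an illustration of that decomposition, not how Hive or Spark execute a full join. It assumes Python's built-in sqlite3 module as a stand-in for spark-sql (older SQLite builds do not accept full join, so only left join is used), and sample_db and full_join_rows are hypothetical names.

```python
import sqlite3


def sample_db():
    """In-memory SQLite database holding the article's test1 and test2 rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table test1 (user_id integer, name text)")
    conn.execute("create table test2 (user_id integer, age integer)")
    conn.executemany(
        "insert into test1 values (?, ?)",
        [(1, "xiaoming"), (2, "xiaolan"), (3, "xiaoxin")],
    )
    conn.executemany(
        "insert into test2 values (?, ?)",
        [(1, 19), (2, 20), (4, 21)],
    )
    return conn


# Part 1: all test1 rows, with test2 columns where a match exists.
# Part 2: only the test2 rows with no match in test1 (t1.user_id is null),
#         so rows already produced by part 1 are not repeated.
FULL_JOIN_EMULATED = """
select t1.user_id, t1.name, t2.user_id, t2.age
from test1 t1
left join test2 t2
  on t1.user_id = t2.user_id
union all
select t1.user_id, t1.name, t2.user_id, t2.age
from test2 t2
left join test1 t1
  on t1.user_id = t2.user_id
where t1.user_id is null
"""


def full_join_rows():
    """Return the emulated full join result; row order is not guaranteed."""
    conn = sample_db()
    try:
        return conn.execute(FULL_JOIN_EMULATED).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in full_join_rows():
        print(row)
```

The result holds the same four rows as the spark-sql output. As in that output, the order of the rows should not be relied on unless the query adds an order by.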

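Summary:
To attach extra columns to a main table, left join is the usual choice; to keep only the rows that the tables have in common, use inner join; to keep every row from all the joined tables, use full join.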
