HIVE: Creating and Using Complex Data Types

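array type: read elements by index, e.g. col[0]; build a value with the array() function.

MAP type: read values with col['key']; build a value with the str_to_map() function. A MAP is a set of key-value pairs.

struct type: read fields with col.field; build a value with the named_struct() function.

get_json_object: read values with get_json_object(col, '$.field'); build the input by concatenating string columns into the form {"key":value,"key1":value1}.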
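The four access patterns above can be sketched in a single query. This is a minimal illustration, assuming a Hive version that allows SELECT without a FROM clause in the inner query (0.13 or later); the column aliases arr, m, s, and j and all literal values are made up for the example.

```sql
-- One inline row holding an array, a map, a struct, and a JSON string.
SELECT
  t.arr[0]                         AS first_city,     -- index into the array
  t.m['english']                   AS english_score,  -- look up a map key (value is a STRING)
  t.s.score                        AS struct_score,   -- read a struct field
  get_json_object(t.j, '$.opcode') AS opcode          -- read a JSON attribute
FROM (
  SELECT
    array('beijing', 'shanghai')                  AS arr,
    str_to_map('math:90,english:100', ',', ':')   AS m,
    named_struct('course', 'math', 'score', 89)   AS s,
    '{"opcode":"50"}'                             AS j
) t;
```

Each expression in the outer select list corresponds to one of the four bullet points, in the same order.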


The examples below walk through each type:

array:

Sample data for test_person:

biansutao beijing,shanghai,tianjin,hangzhou

linan changchu,chengdu,wuhan

create table  default.test_person(name string,work_locations array<string>)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY '\t'

COLLECTION ITEMS TERMINATED BY ',';

LOAD DATA  INPATH 'hdfs://sfbdp1/user/xxxx/upload/test_person.txt' OVERWRITE INTO TABLE default.test_person;

select name,work_locations,work_locations[0] from default.test_person limit 10;


insert into table default.test_person select 'AA' as name,array('china','beijing') from dual;

select * from  default.test_person limit 10;


MAP type:

Sample data for test_score:

AA 数学:90,语文:80,英语:100

BB 数学:90,语文:80,英语:100

drop table default.test_score;

create table default.test_score(name string, score map<string,int>)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ' '

COLLECTION ITEMS TERMINATED BY ','

MAP KEYS TERMINATED BY ':';

truncate table default.test_score;

load data inpath 'hdfs://sfbdp1/user/xxx/upload/test_score.txt' overwrite into table default.test_score;

select *,score['英语'] from default.test_score limit 10;


create table test_map as select 'CC' as name,str_to_map('唱歌:90,跳舞:80',',',':') as score from dual;

select * from test_map limit 10;
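One detail worth keeping in mind: str_to_map() returns a map whose keys and values are both strings, so numeric values need a cast before arithmetic. The sketch below assumes the default delimiters (',' between pairs, ':' between key and value); the keys sing and dance are hypothetical stand-ins for the data above.

```sql
SELECT
  m['dance']                  AS raw_value,   -- the string '80'
  cast(m['dance'] AS int) + 5 AS adjusted,    -- 85, after casting to int
  m['missing']                AS absent_key   -- NULL: the key is not in the map
FROM (
  SELECT str_to_map('sing:90,dance:80') AS m
) t;
```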


Other examples:

CREATE TABLE IF NOT EXISTS default.explode_laterview_org(

    day1_num BIGINT

    ,day2_num BIGINT

    ,day3_num BIGINT

    ,day4_num BIGINT

    ,day5_num BIGINT

    ,day6_num BIGINT

    ,day7_num BIGINT

    ,campaign_name STRING

    ,campaign_id BIGINT

);

INSERT OVERWRITE TABLE default.explode_laterview_org VALUES

(40, 20, 10, 4, 4, 2, 1, 'zoo', 2 )

,(100, 80, 53, 40, 7, 6, 5, 'moji', 3)

;

Using str_to_map:

select campaign_id,campaign_name,

STR_TO_MAP(

CONCAT(

'day1_num=',CAST (day1_num AS STRING),

'&day2_num=',CAST (day2_num AS STRING),

'&day3_num=',CAST (day3_num AS STRING),

'&day4_num=',CAST (day4_num AS STRING),

'&day5_num=',CAST (day5_num AS STRING),

'&day6_num=',CAST (day6_num AS STRING),

'&day7_num=',CAST (day7_num AS STRING)

),'&', '=') as aa

from default.explode_laterview_org;

Unpivoting with str_to_map (each day column becomes its own row):

select * from (

select campaign_id,campaign_name,

STR_TO_MAP(

CONCAT(

'day1_num=',CAST (day1_num AS STRING),

'&day2_num=',CAST (day2_num AS STRING),

'&day3_num=',CAST (day3_num AS STRING),

'&day4_num=',CAST (day4_num AS STRING),

'&day5_num=',CAST (day5_num AS STRING),

'&day6_num=',CAST (day6_num AS STRING),

'&day7_num=',CAST (day7_num AS STRING)

),'&', '=') as aa

from default.explode_laterview_org

) t lateral view explode(aa) mycol as mycol1,mycol2;

Note on using explode:

explode(ARRAY) produces one row per array element.

explode(MAP) produces one row per key-value pair, with the key in one column and the value in another.

explode(array)

select explode(array_col) as new_col from  table_name

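array_col: a column of array type

new_col: the column that holds the elements of array_col after explode

explode(map)

A map is a key-value structure, so explode turns it into two columns: one built from the keys and one built from the values.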

select explode(map_col) as (map_key_col,map_value_col) from table_name

map_col: a column of map type

map_key_col: the key of each entry in map_col after it is split

map_value_col: the value of each entry in map_col after it is split

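Limitations of explode used directly in the select list:

1) Other columns of the source table cannot be selected alongside it.

2) It cannot be combined with group by, cluster by, distribute by, or sort by.

3) UDTF calls cannot be nested.

4) No other expressions are allowed in the same select list.


lateral view explode() lifts these restrictions:

One or more lateral view explode clauses: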

SELECT myCol1, myCol2 FROM baseTable

LATERAL VIEW explode(col1) myTable1 AS myCol1

LATERAL VIEW explode(col2) myTable2 AS myCol2;
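As a rough illustration of what LATERAL VIEW makes possible, the sketch below selects the exploded value and aggregates with GROUP BY, which a bare explode() in the select list does not allow. The CTE people and its two rows are invented for the example and only mimic the shape of test_person.

```sql
WITH people AS (
  SELECT 'biansutao' AS name, array('beijing', 'shanghai') AS work_locations
  UNION ALL
  SELECT 'linan' AS name, array('chengdu', 'beijing') AS work_locations
)
-- A bare UDTF next to another column is rejected by Hive:
--   SELECT name, explode(work_locations) FROM people;
-- With LATERAL VIEW the exploded value behaves like an ordinary column.
SELECT loc, count(*) AS people_cnt
FROM people
LATERAL VIEW explode(work_locations) t AS loc
GROUP BY loc;
```

With this sample data, beijing is counted twice and the other two cities once each.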

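Note: with lateral view explode(array()), when the array is empty the source row is dropped from the result entirely, rather than appearing with NULL in the exploded column. With lateral view outer explode(array()), the row is kept and only the exploded column is NULL.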
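The difference can be seen with a one-row inline table. This is a small sketch; the alias p and the value 'AA' are placeholders, and the inner SELECT without FROM assumes Hive 0.13 or later.

```sql
-- Plain LATERAL VIEW: the empty array yields no rows, so the 'AA' row disappears.
SELECT p.name, loc
FROM (SELECT 'AA' AS name) p
LATERAL VIEW explode(array()) t AS loc;

-- LATERAL VIEW OUTER: the 'AA' row is kept and loc is NULL.
SELECT p.name, loc
FROM (SELECT 'AA' AS name) p
LATERAL VIEW OUTER explode(array()) t AS loc;
```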



struct type:

Sample data for test_struct:

1 english,80

2 math,89

3 chinese,95

CREATE TABLE default.test_struct(id int,course struct<course:string,score:int>)

ROW FORMAT DELIMITED

FIELDS TERMINATED BY ' '

COLLECTION ITEMS TERMINATED BY ',';

load data inpath 'hdfs://sfbdp1/user/xxxx/upload/test_struct.txt' overwrite into table default.test_struct;

select *,course.course,course.score from default.test_struct limit 10;


insert into table default.test_struct select '4' as id,named_struct('course','ant','score',100) from dual;

select * from default.test_struct limit 10;
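Struct fields can also be used in a WHERE clause, not only in the select list. The following is a minimal sketch that builds two rows inline with named_struct() instead of reading the table; the rows and the threshold 85 are illustrative only.

```sql
SELECT
  t.id,
  t.course.course AS course_name,  -- struct field named "course"
  t.course.score  AS score         -- struct field named "score"
FROM (
  SELECT 1 AS id, named_struct('course', 'english', 'score', 80) AS course
  UNION ALL
  SELECT 2 AS id, named_struct('course', 'math', 'score', 89) AS course
) t
WHERE t.course.score >= 85;        -- keeps only the math row
```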

get_json_object:

Building:

concat('{"barscantm":"',barscantm,

'","opcode":"',opcode,

'"}') as json_new

Reading:

get_json_object( json_new,'$.opcode')
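Put together, the round trip might look like the sketch below. The inline source row and its values are hypothetical; note that concat() does no escaping, so this only produces valid JSON when the column values contain no double quotes or backslashes.

```sql
SELECT
  json_new,
  get_json_object(json_new, '$.opcode')     AS opcode,      -- returned as a string
  get_json_object(json_new, '$.barscantm')  AS barscantm
FROM (
  -- Build the JSON text by concatenating string columns.
  SELECT concat('{"barscantm":"', barscantm,
                '","opcode":"', opcode,
                '"}') AS json_new
  FROM (
    SELECT '2021-05-19 10:00:00' AS barscantm, '50' AS opcode
  ) src
) t;
```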


Two more functions for array columns:

array_contains(col,'str') checks whether the array in col contains the string 'str'

with tables as

(

select 'A' as useid,'1' as pageid

union all

select 'B' as useid,'1' as pageid

union all

select 'A' as useid,'2' as pageid

union all

select 'B' as useid,'3' as pageid

union all

select 'A' as useid,'4' as pageid

union all

select 'B' as useid,'4' as pageid

)

select * from (

select useid,collect_set(pageid) as pageid

from

(

select useid,pageid

from tables

group by useid,pageid

) t

group by useid

) t1

where array_contains(pageid,'1') and array_contains(pageid,'4')

-- users (useid) who visited both pageid 1 and pageid 4


sort_array(col) sorts the array in col in ascending order

select name,work_locations from default.test_person limit 10;

select name,sort_array(work_locations) from default.test_person limit 10;
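Both functions can be tried on a literal array without any table. This is a small sketch with made-up values; size() is included only to show the element count.

```sql
SELECT
  array_contains(pages, '4') AS saw_page_4,    -- true
  array_contains(pages, '2') AS saw_page_2,    -- false
  sort_array(pages)          AS pages_sorted,  -- ["1","3","4"]
  size(pages)                AS page_cnt       -- 3
FROM (
  SELECT array('4', '1', '3') AS pages
) t;
```

Because the elements here are strings, sort_array() orders them as strings, so '10' would sort before '2'; cast to a numeric type first if numeric order is needed.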

lateral view explode:

select waybill_no,package_no,

packageno

from gdl.tt_waybill_info

lateral view explode(package_no) mycol as packageno

where inc_day='20210519' and size(package_no)>1
