Author: IvanCodes
Date: May 20, 2025
Column: Hive Tutorial
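In Hive, we often need to view or process data in a shape that differs from the original table structure. To simplify complex queries, provide data abstraction, and handle complex data types such as arrays and Maps, Hive offers two mechanisms: views (View) and Lateral View.
A Hive view is a virtual table whose contents are defined by a query. The view itself stores no physical data; each time it is queried, Hive runs the SELECT statement in its definition and returns the result.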
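Advantages of views: they simplify complex queries and provide a layer of data abstraction over the underlying tables.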
Basic view operations:
(1) Create a view (CREATE VIEW)
CREATE VIEW [IF NOT EXISTS] view_name
[(column_list)]
[COMMENT view_comment]
AS SELECT_statement;
Example: assume two tables, employees (id, name, department_id, salary) and departments (id, name).
CREATE VIEW employee_department_details_view
COMMENT 'Shows employee name, salary, and their department name'
AS
SELECT e.name AS employee_name, e.salary, d.name AS department_name
FROM employees e
JOIN departments d ON e.department_id = d.id;
(2) Query a view
SELECT * FROM employee_department_details_view WHERE salary > 60000;
(3) Show the view definition
SHOW CREATE TABLE employee_department_details_view;
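Besides SHOW CREATE TABLE, Hive's DESCRIBE FORMATTED also works on views. A minimal sketch, reusing the view created above; the exact layout of the output varies by Hive version, so none is shown here:

```sql
-- Print column info plus view metadata. For a view, the table type is
-- reported as VIRTUAL_VIEW and the stored query text is included.
DESCRIBE FORMATTED employee_department_details_view;
```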
(4) Modify a view (ALTER VIEW)
Syntax:
ALTER VIEW [db_name.]view_name AS SELECT_statement;
Example:
ALTER VIEW employee_department_details_view AS
SELECT e.name AS emp_name, e.salary, d.name AS dept_name, e.id AS emp_id -- renamed columns and added a column
FROM employees e
JOIN departments d ON e.department_id = d.id;
(5) Drop a view (DROP VIEW)
Syntax:
DROP VIEW [IF EXISTS] [db_name.]view_name;
Example:
DROP VIEW IF EXISTS employee_department_details_view;
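Sometimes a Hive table contains complex data types such as arrays (ARRAY) or maps (MAP). To "explode" each element or key-value pair of such a collection into its own row for finer-grained analysis, we need Lateral View.
Lateral View is normally used together with a table-generating function (UDTF, User-Defined Table-generating Function); the most common one is explode(). explode() takes an array or a Map as input and emits one row for each array element or each Map key-value pair.
How Lateral View works:
Lateral View first applies the UDTF to each row of the base table. It then joins the UDTF's output rows with the input row that produced them, forming the rows of a new virtual table.
Syntax: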
SELECT ...
FROM base_table
LATERAL VIEW udtf(expression) table_alias AS column_alias_1 [, column_alias_2, ...];
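udtf(expression): the table-generating function and its argument, for example explode(array_column) or explode(map_column).
table_alias: the alias given to the virtual table that Lateral View produces.
column_alias_1, ...: the aliases given to the columns the UDTF outputs. explode(array) outputs one column; explode(map) outputs two columns (key, value).
Example 1: exploding an array
Assume a table user_hobbies (user_id INT, hobbies ARRAY<STRING>).
-- Assume user_hobbies contains: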
-- 1, ['reading', 'hiking']
-- 2, ['coding', 'gaming', 'reading']
SELECT user_id, single_hobby
FROM user_hobbies
LATERAL VIEW explode(hobbies) exploded_hobbies_table AS single_hobby;
The query returns:
1, reading
1, hiking
2, coding
2, gaming
2, reading
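One edge case the example above does not exercise: explode() emits no rows for an empty or NULL array, so a plain LATERAL VIEW drops such users from the result. Hive's OUTER keyword keeps them. The sketch below assumes a hypothetical third row (user_id 3 with an empty hobbies array) that is not part of the sample data above:

```sql
-- Hypothetical extra row in user_hobbies: 3, []
-- LATERAL VIEW OUTER still emits one row for user 3,
-- with single_hobby set to NULL.
SELECT user_id, single_hobby
FROM user_hobbies
LATERAL VIEW OUTER explode(hobbies) exploded_hobbies_table AS single_hobby;
```

With the plain LATERAL VIEW form, user 3 would not appear in the output at all.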
Example 2: exploding a Map
Assume a table product_attributes (product_id INT, attributes MAP<STRING, STRING>).
-- Assume product_attributes contains:
-- 101, {'color':'red', 'size':'M'}
-- 102, {'material':'cotton', 'brand':'XYZ'}
SELECT product_id, attr_key, attr_value
FROM product_attributes
LATERAL VIEW explode(attributes) exploded_attributes_table AS attr_key, attr_value;
The query returns:
101, color, red
101, size, M
102, material, cotton
102, brand, XYZ
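A particular strength of Lateral View is that it can appear inside the AS SELECT definition of a view. This lets us create a view that permanently exposes the exploded form of the data.
Example: create a view that shows each user's individual hobbies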
CREATE VIEW user_individual_hobbies_view AS
SELECT user_id, single_hobby
FROM user_hobbies
LATERAL VIEW explode(hobbies) exploded_hobbies_table AS single_hobby;
-- Later queries
SELECT * FROM user_individual_hobbies_view WHERE user_id = 1;
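Because the view behaves like a table with one row per (user, hobby) pair, ordinary aggregation works on it directly. As an illustration (the user_count alias is introduced here and is not part of the original example), the following counts how many users list each hobby, with the sort applied in the querying statement rather than in the view:

```sql
-- Count users per hobby on top of the exploded view.
SELECT single_hobby, COUNT(*) AS user_count
FROM user_individual_hobbies_view
GROUP BY single_hobby
ORDER BY user_count DESC;
```

With the two sample rows shown earlier, reading has a count of 2 and every other hobby a count of 1.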
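Notes:
Read-only: Hive views are read-only and cannot be the target of INSERT, UPDATE, or DELETE operations.
ORDER BY restriction: using ORDER BY directly in the SELECT of a view definition is not recommended (unless it is combined with LIMIT). Apply sorting in the final query against the view instead.
Summary: Hive views provide a logical abstraction layer over the data, while Lateral View handles the processing and transformation of complex structures such as arrays and Maps. Combining the two makes data analysis and presentation considerably more flexible and convenient.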
Background tables:
products (product_id INT, product_name STRING, category STRING, price DECIMAL(8,2), tags ARRAY<STRING>)
sales (sale_id INT, product_id INT, sale_date STRING, quantity INT, customer_id INT, sale_details MAP<STRING, STRING>)
customers (customer_id INT, customer_name STRING, city STRING)
Based on these table structures, insert some sample data of your own for testing.
For example:
One row in products: (1, 'Laptop X', 'Electronics', 1200.00, array('slim', 'powerful', '15-inch'))
One row in sales: (101, 1, '2023-01-15', 1, 201, map('channel','online', 'promo_code','SAVE10'))
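To make the exercises runnable, the tables need to exist and hold a few rows. The following is one possible setup, not the author's own. It assumes the element types ARRAY<STRING> and MAP<STRING, STRING>, a Hive version that allows SELECT without a FROM clause, and a customers row (201, 'Alice', 'Beijing') that is invented purely for illustration. Complex values are built with array() and map() in an INSERT ... SELECT, because Hive's INSERT ... VALUES does not accept complex types.

```sql
-- Table definitions (element types are assumptions, see above).
CREATE TABLE IF NOT EXISTS products (
  product_id   INT,
  product_name STRING,
  category     STRING,
  price        DECIMAL(8,2),
  tags         ARRAY<STRING>
);

CREATE TABLE IF NOT EXISTS sales (
  sale_id      INT,
  product_id   INT,
  sale_date    STRING,
  quantity     INT,
  customer_id  INT,
  sale_details MAP<STRING, STRING>
);

CREATE TABLE IF NOT EXISTS customers (
  customer_id   INT,
  customer_name STRING,
  city          STRING
);

-- The two sample rows given in the text.
INSERT INTO TABLE products
SELECT 1, 'Laptop X', 'Electronics', CAST(1200.00 AS DECIMAL(8,2)),
       array('slim', 'powerful', '15-inch');

INSERT INTO TABLE sales
SELECT 101, 1, '2023-01-15', 1, 201,
       map('channel', 'online', 'promo_code', 'SAVE10');

-- Hypothetical customer matching customer_id 201 in the sales row.
INSERT INTO TABLE customers
SELECT 201, 'Alice', 'Beijing';
```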
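Exercises:
(1) Create a view product_basic_info_view that shows the product_id, product_name, and price of every product.
(2) Create a view product_tags_expanded_view that explodes the tags array of the products table, so that each row shows product_id, product_name, and a single tag.
(3) Create a view sales_details_expanded_view that explodes the sale_details Map of the sales table, so that each row shows sale_id, product_id, and the Map's detail_key and detail_value.
(4) Create a view electronics_product_tags_view that shows only the product_name and each exploded tag of products whose category is 'Electronics'.
(5) Create a view total_quantity_per_product_view that shows the total quantity sold (total_quantity) for each product_name.
(6) Based on total_quantity_per_product_view, create a new view high_sales_products_view that shows only products whose total_quantity is greater than 10.
(7) Modify the view product_tags_expanded_view so that it additionally shows the product's category.
(8) Show the creation statement of the view sales_details_expanded_view.
(9) Suppose you need a view customer_purchase_channels_view that shows each customer's (customer_name) purchase channel by exploding sales.sale_details (assume it contains a 'channel' key). Write the SQL statement that creates this view (it requires joining the customers and sales tables).
(10) Drop the view product_basic_info_view.
Answers:
(1) Create product_basic_info_view:
CREATE VIEW product_basic_info_view AS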
SELECT product_id, product_name, price
FROM products;
(2) Create product_tags_expanded_view:
CREATE VIEW product_tags_expanded_view AS
SELECT p.product_id, p.product_name, single_tag
FROM products p
LATERAL VIEW explode(p.tags) exploded_tags_table AS single_tag;
(3) Create sales_details_expanded_view:
CREATE VIEW sales_details_expanded_view AS
SELECT s.sale_id, s.product_id, detail_key, detail_value
FROM sales s
LATERAL VIEW explode(s.sale_details) exploded_details_table AS detail_key, detail_value;
(4) Create electronics_product_tags_view:
CREATE VIEW electronics_product_tags_view AS
SELECT p.product_name, single_tag
FROM products p
LATERAL VIEW explode(p.tags) exploded_tags_table AS single_tag
WHERE p.category = 'Electronics';
(5) Create total_quantity_per_product_view:
CREATE VIEW total_quantity_per_product_view AS
SELECT p.product_name, SUM(s.quantity) AS total_quantity
FROM sales s
JOIN products p ON s.product_id = p.product_id
GROUP BY p.product_name;
(6) Create high_sales_products_view:
CREATE VIEW high_sales_products_view AS
SELECT product_name, total_quantity
FROM total_quantity_per_product_view
WHERE total_quantity > 10;
(7) Modify product_tags_expanded_view:
ALTER VIEW product_tags_expanded_view AS
SELECT p.product_id, p.product_name, p.category, single_tag
FROM products p
LATERAL VIEW explode(p.tags) exploded_tags_table AS single_tag;
(8) Show the creation statement of sales_details_expanded_view:
SHOW CREATE TABLE sales_details_expanded_view;
(9) Create customer_purchase_channels_view:
CREATE VIEW customer_purchase_channels_view AS
SELECT
  c.customer_name,
  d.detail_value AS purchase_channel
FROM
  customers c
JOIN (
  -- Explode the Map in a subquery, then join the result to customers
  SELECT s.customer_id, detail_key, detail_value
  FROM sales s
  LATERAL VIEW explode(s.sale_details) details_table AS detail_key, detail_value
) d ON c.customer_id = d.customer_id
WHERE d.detail_key = 'channel';
(10) Drop product_basic_info_view:
DROP VIEW IF EXISTS product_basic_info_view;