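Integrating HBase with Hive, so that HBase is operated through SQL statements written in Hive, has the following benefits:
Integrating HBase with Hive and operating HBase through Hive SQL also has some drawbacks, mainly in the following respects:
Although integrating HBase with Hive has some drawbacks, integrating the two is still of significant value, mainly for the following reasons:
For the respective installation steps of HBase 1.4.8 and Hive 2.3.3, see the article "Building a Big Data Lab Environment on the OpenEuler Domestic Operating System" (基于OpenEuler国产操作系统大数据实验环境搭建). The steps for integrating the two are as follows: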
<property>
  <name>hive.zookeeper.quorum</name>
  <value>s1,s2,s3</value>
</property>
<property>
  <name>hive.zookeeper.client.port</name>
  <value>2181</value>
</property>
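Before starting Hive, it can help to confirm that both ZooKeeper properties are actually present in the configuration. The following is a minimal sketch, not part of the original steps: it assumes the properties live in Hive's hive-site.xml, parses an inline copy of the fragment instead of the real file, and the helper names `read_properties` and `missing_properties` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for hive-site.xml (assumption: the real file wraps
# its <property> elements in a <configuration> root, Hadoop-style).
HIVE_SITE = """
<configuration>
  <property>
    <name>hive.zookeeper.quorum</name>
    <value>s1,s2,s3</value>
  </property>
  <property>
    <name>hive.zookeeper.client.port</name>
    <value>2181</value>
  </property>
</configuration>
"""

REQUIRED = ("hive.zookeeper.quorum", "hive.zookeeper.client.port")


def read_properties(xml_text):
    """Return a {name: value} dict for every <property> element."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def missing_properties(props, required=REQUIRED):
    """List the required property names that are absent or empty."""
    return [name for name in required if not props.get(name)]


props = read_properties(HIVE_SITE)
print("ZooKeeper hosts:", props["hive.zookeeper.quorum"].split(","))
print("Missing properties:", missing_properties(props))
```

To check a real installation, the inline string could be replaced with the contents of the actual configuration file.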
cp $HBASE_HOME/conf/hbase-site.xml $HIVE_HOME/conf/
[root@s1 hive]# hive
……
Logging initialized using configuration in jar:file:/mysoft/hive/lib/hive-common-2.3.3.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>
create table hive_to_hbase_emp_table(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping"=":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno") tblproperties("hbase.table.name"="hive_to_hbase_emp_table");
Running the SQL above produces the following result:
hive> create table hive_to_hbase_emp_table(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double,
> deptno int)
> stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' with serdeproperties ("hbase.columns.mapping"=":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno") tblproperties("hbase.table.name"="hive_to_hbase_emp_table");
OK
Time taken: 4.958 seconds
hive>
This shows that Hive and HBase are now integrated correctly.
hbase(main):001:0> list
TABLE
hive_to_hbase_emp_table
1 row(s) in 0.8890 seconds
=> ["hive_to_hbase_emp_table"]
hbase(main):002:0>
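The `hbase.columns.mapping` property in the CREATE TABLE statement above is positional: its entries line up one-to-one with the Hive columns, and `:key` marks the column that becomes the HBase row key. The sketch below only illustrates that pairing in plain Python; it is not how Hive implements it, and `pair_columns` is a hypothetical helper.

```python
# Column names and mapping string copied from the CREATE TABLE statement.
HIVE_COLUMNS = ["empno", "ename", "job", "mgr", "hiredate", "sal", "comm", "deptno"]
MAPPING = ":key,info:ename,info:job,info:mgr,info:hiredate,info:sal,info:comm,info:deptno"


def pair_columns(hive_columns, mapping):
    """Pair each Hive column with its HBase target, by position.

    Returns a dict: the row-key column maps to None, every other
    column maps to a (column_family, qualifier) tuple.
    """
    entries = mapping.split(",")
    if len(entries) != len(hive_columns):
        raise ValueError(
            f"{len(hive_columns)} Hive columns but {len(entries)} mapping entries"
        )
    pairs = {}
    for column, entry in zip(hive_columns, entries):
        if entry == ":key":
            pairs[column] = None  # stored as the HBase row key
        else:
            family, _, qualifier = entry.partition(":")
            pairs[column] = (family, qualifier)
    return pairs


pairs = pair_columns(HIVE_COLUMNS, MAPPING)
for column, target in pairs.items():
    print(column, "->", "row key" if target is None else ":".join(target))
```

A mismatch between the number of Hive columns and the number of mapping entries is a common cause of a failing table definition, which is why the sketch checks the two lengths first.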
create table hive_inner_emp(
empno int,
ename string,
job string,
mgr int,
hiredate string,
sal double,
comm double,
deptno int)
row format delimited fields terminated by ',';
Running the SQL above creates the hive_inner_emp table.
load data local inpath '/tools/emp.csv' into table hive_inner_emp;
If the command runs successfully, the data has been loaded into the hive_inner_emp table:
hive> select * from hive_inner_emp;
OK
7369 SMITH CLERK 7902 1980/12/17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981/2/20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981/2/22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981/4/2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981/9/28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981/5/1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981/6/9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987/4/19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981/11/17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981/9/8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987/5/23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981/12/3 9500.0 NULL 30
7902 FORD ANALYST 7566 1981/12/3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982/1/23 1300.0 NULL 10
Time taken: 0.668 seconds, Fetched: 14 row(s)
Insert the data from the hive_inner_emp table into the hive_to_hbase_emp_table table:
insert into hive_to_hbase_emp_table select * from hive_inner_emp;
A successful run looks like this:
hive> insert into hive_to_hbase_emp_table select * from hive_inner_emp;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20250513134124_d0a93e1f-e852-405e-ad52-2ddaae763d9f
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1747100297537_0014, Tracking URL = http://s1:8088/proxy/application_1747100297537_0014/
Kill Command = /mysoft/hadoop/bin/hadoop job -kill job_1747100297537_0014
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2025-05-13 13:41:56,018 Stage-3 map = 0%, reduce = 0%
2025-05-13 13:42:17,910 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 4.98 sec
MapReduce Total cumulative CPU time: 4 seconds 980 msec
Ended Job = job_1747100297537_0014
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 4.98 sec HDFS Read: 5951 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 980 msec
OK
Time taken: 55.57 seconds
hive>
hbase(main):002:0> scan 'hive_to_hbase_emp_table'
ROW COLUMN+CELL
7369 column=info:deptno, timestamp=1747114968171, value=20
7369 column=info:ename, timestamp=1747114968171, value=SMITH
7369 column=info:hiredate, timestamp=1747114968171, value=1980/12/17
7369 column=info:job, timestamp=1747114968171, value=CLERK
7369 column=info:mgr, timestamp=1747114968171, value=7902
7369 column=info:sal, timestamp=1747114968171, value=800.0
7499 column=info:comm, timestamp=1747114968171, value=300.0
7499 column=info:deptno, timestamp=1747114968171, value=30
7499 column=info:ename, timestamp=1747114968171, value=ALLEN
7499 column=info:hiredate, timestamp=1747114968171, value=1981/2/20
7499 column=info:job, timestamp=1747114968171, value=SALESMAN
7499 column=info:mgr, timestamp=1747114968171, value=7698
7499 column=info:sal, timestamp=1747114968171, value=1600.0
7521 column=info:comm, timestamp=1747114968171, value=500.0
7521 column=info:deptno, timestamp=1747114968171, value=30
7521 column=info:ename, timestamp=1747114968171, value=WARD
7521 column=info:hiredate, timestamp=1747114968171, value=1981/2/22
7521 column=info:job, timestamp=1747114968171, value=SALESMAN
7521 column=info:mgr, timestamp=1747114968171, value=7698
7521 column=info:sal, timestamp=1747114968171, value=1250.0
7566 column=info:deptno, timestamp=1747114968171, value=20
7566 column=info:ename, timestamp=1747114968171, value=JONES
7566 column=info:hiredate, timestamp=1747114968171, value=1981/4/2
7566 column=info:job, timestamp=1747114968171, value=MANAGER
7566 column=info:mgr, timestamp=1747114968171, value=7839
7566 column=info:sal, timestamp=1747114968171, value=2975.0
7654 column=info:comm, timestamp=1747114968171, value=1400.0
7654 column=info:deptno, timestamp=1747114968171, value=30
7654 column=info:ename, timestamp=1747114968171, value=MARTIN
7654 column=info:hiredate, timestamp=1747114968171, value=1981/9/28
7654 column=info:job, timestamp=1747114968171, value=SALESMAN
7654 column=info:mgr, timestamp=1747114968171, value=7698
7654 column=info:sal, timestamp=1747114968171, value=1250.0
7698 column=info:deptno, timestamp=1747114968171, value=30
7698 column=info:ename, timestamp=1747114968171, value=BLAKE
7698 column=info:hiredate, timestamp=1747114968171, value=1981/5/1
7698 column=info:job, timestamp=1747114968171, value=MANAGER
7698 column=info:mgr, timestamp=1747114968171, value=7839
7698 column=info:sal, timestamp=1747114968171, value=2850.0
7782 column=info:deptno, timestamp=1747114968171, value=10
7782 column=info:ename, timestamp=1747114968171, value=CLARK
7782 column=info:hiredate, timestamp=1747114968171, value=1981/6/9
7782 column=info:job, timestamp=1747114968171, value=MANAGER
7782 column=info:mgr, timestamp=1747114968171, value=7839
7782 column=info:sal, timestamp=1747114968171, value=2450.0
7788 column=info:deptno, timestamp=1747114968171, value=20
7788 column=info:ename, timestamp=1747114968171, value=SCOTT
7788 column=info:hiredate, timestamp=1747114968171, value=1987/4/19
7788 column=info:job, timestamp=1747114968171, value=ANALYST
7788 column=info:mgr, timestamp=1747114968171, value=7566
7788 column=info:sal, timestamp=1747114968171, value=3000.0
7839 column=info:deptno, timestamp=1747114968171, value=10
7839 column=info:ename, timestamp=1747114968171, value=KING
7839 column=info:hiredate, timestamp=1747114968171, value=1981/11/17
7839 column=info:job, timestamp=1747114968171, value=PRESIDENT
7839 column=info:sal, timestamp=1747114968171, value=5000.0
7844 column=info:comm, timestamp=1747114968171, value=0.0
7844 column=info:deptno, timestamp=1747114968171, value=30
7844 column=info:ename, timestamp=1747114968171, value=TURNER
7844 column=info:hiredate, timestamp=1747114968171, value=1981/9/8
7844 column=info:job, timestamp=1747114968171, value=SALESMAN
7844 column=info:mgr, timestamp=1747114968171, value=7698
7844 column=info:sal, timestamp=1747114968171, value=1500.0
7876 column=info:deptno, timestamp=1747114968171, value=20
7876 column=info:ename, timestamp=1747114968171, value=ADAMS
7876 column=info:hiredate, timestamp=1747114968171, value=1987/5/23
7876 column=info:job, timestamp=1747114968171, value=CLERK
7876 column=info:mgr, timestamp=1747114968171, value=7788
7876 column=info:sal, timestamp=1747114968171, value=1100.0
7900 column=info:deptno, timestamp=1747114968171, value=30
7900 column=info:ename, timestamp=1747114968171, value=JAMES
7900 column=info:hiredate, timestamp=1747114968171, value=1981/12/3
7900 column=info:job, timestamp=1747114968171, value=CLERK
7900 column=info:mgr, timestamp=1747114968171, value=7698
7900 column=info:sal, timestamp=1747114968171, value=9500.0
7902 column=info:deptno, timestamp=1747114968171, value=20
7902 column=info:ename, timestamp=1747114968171, value=FORD
7902 column=info:hiredate, timestamp=1747114968171, value=1981/12/3
7902 column=info:job, timestamp=1747114968171, value=ANALYST
7902 column=info:mgr, timestamp=1747114968171, value=7566
7902 column=info:sal, timestamp=1747114968171, value=3000.0
7934 column=info:deptno, timestamp=1747114968171, value=10
7934 column=info:ename, timestamp=1747114968171, value=MILLER
7934 column=info:hiredate, timestamp=1747114968171, value=1982/1/23
7934 column=info:job, timestamp=1747114968171, value=CLERK
7934 column=info:mgr, timestamp=1747114968171, value=7782
7934 column=info:sal, timestamp=1747114968171, value=1300.0
14 row(s) in 0.9040 seconds
hbase(main):003:0>
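In the scan output above, rows whose Hive columns are NULL simply have no cell for that column: row 7369 has no info:comm, and row 7839 has neither info:mgr nor info:comm. The sketch below models that behaviour in plain Python as an illustration only; `row_to_cells` is a hypothetical helper, and it assumes values are stored as strings, as they appear in the scan.

```python
# (Hive column, HBase target) pairs taken from hbase.columns.mapping.
COLUMNS = [
    ("empno", ":key"),
    ("ename", "info:ename"),
    ("job", "info:job"),
    ("mgr", "info:mgr"),
    ("hiredate", "info:hiredate"),
    ("sal", "info:sal"),
    ("comm", "info:comm"),
    ("deptno", "info:deptno"),
]


def row_to_cells(row):
    """Turn one Hive row into (row_key, cells), skipping NULL values.

    cells is a list of (column, value) tuples sorted by column name,
    which is the order the HBase shell prints them in.
    """
    row_key = None
    cells = []
    for (_, target), value in zip(COLUMNS, row):
        if target == ":key":
            row_key = str(value)
        elif value is not None:  # NULL columns produce no cell at all
            cells.append((target, str(value)))
    return row_key, sorted(cells)


smith = (7369, "SMITH", "CLERK", 7902, "1980/12/17", 800.0, None, 20)
king = (7839, "KING", "PRESIDENT", None, "1981/11/17", 5000.0, None, 10)

for row in (smith, king):
    key, cells = row_to_cells(row)
    print(key, [column for column, _ in cells])
```

This is worth keeping in mind when counting cells in HBase: the number of cells per row depends on how many columns are non-NULL, not on the number of columns in the Hive table.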
hive> INSERT INTO TABLE hive_to_hbase_emp_table VALUES (7935, 'Alice', 'Engineer', null, '2023-01-01', 10000.0, null, 10);
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20250513134721_628fd4af-a8f7-4c6e-b9c1-9e8f22e78c03
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1747100297537_0015, Tracking URL = http://s1:8088/proxy/application_1747100297537_0015/
Kill Command = /mysoft/hadoop/bin/hadoop job -kill job_1747100297537_0015
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
2025-05-13 13:47:49,256 Stage-3 map = 0%, reduce = 0%
2025-05-13 13:48:06,096 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 6.2 sec
MapReduce Total cumulative CPU time: 6 seconds 200 msec
Ended Job = job_1747100297537_0015
MapReduce Jobs Launched:
Stage-Stage-3: Map: 1 Cumulative CPU: 6.2 sec HDFS Read: 6082 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 200 msec
OK
Time taken: 47.183 seconds
hive>
hbase(main):006:0> get 'hive_to_hbase_emp_table','7935'
COLUMN CELL
info:deptno timestamp=1747115316165, value=10
info:ename timestamp=1747115316165, value=Alice
info:hiredate timestamp=1747115316165, value=2023-01-01
info:job timestamp=1747115316165, value=Engineer
info:sal timestamp=1747115316165, value=10000.0
5 row(s) in 0.1260 seconds
At this point, the integration of Hive 2.3.3 and HBase 1.4.8 is complete. For the hive_to_hbase_emp_table table, Hive stores only the table's metadata; the table's path on HDFS is:
drwxr-xr-x - root supergroup 0 2025-05-13 15:47 /user/hive/warehouse/hive_to_hbase_emp_table
The actual data is stored under the HBase path:
[root@s1 conf]# hdfs dfs -ls -R /hbase/data/default/hive_to_hbase_emp_table
drwxr-xr-x - root supergroup 0 2025-05-13 15:26 /hbase/data/default/hive_to_hbase_emp_table/.tabledesc
-rw-r--r-- 3 root supergroup 303 2025-05-13 15:26 /hbase/data/default/hive_to_hbase_emp_table/.tabledesc/.tableinfo.0000000001
drwxr-xr-x - root supergroup 0 2025-05-13 15:26 /hbase/data/default/hive_to_hbase_emp_table/.tmp
drwxr-xr-x - root supergroup 0 2025-05-13 15:26 /hbase/data/default/hive_to_hbase_emp_table/14a4000780edafd6a8b46b52f046f863
-rw-r--r-- 3 root supergroup 58 2025-05-13 15:26 /hbase/data/default/hive_to_hbase_emp_table/14a4000780edafd6a8b46b52f046f863/.regioninfo
drwxr-xr-x - root supergroup 0 2025-05-13 15:26 /hbase/data/default/hive_to_hbase_emp_table/14a4000780edafd6a8b46b52f046f863/info
drwxr-xr-x - root supergroup 0 2025-05-13 15:26 /hbase/data/default/hive_to_hbase_emp_table/14a4000780edafd6a8b46b52f046f863/recovered.edits
-rw-r--r-- 3 root supergroup 0 2025-05-13 15:26 /hbase/data/default/hive_to_hbase_emp_table/14a4000780edafd6a8b46b52f046f863/recovered.edits/2.seqid
[root@s1 conf]#