Hive 4.0.1 Deployment Guide

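1. Prerequisites

  • Operating system: CentOS 7 or Ubuntu 20.04 is recommended (this walkthrough uses CentOS Linux release 7.9.2009 (Core)).
  • Java: Java 8 or later is recommended.
  • Hadoop: Hive relies on Hadoop for distributed storage. Hadoop 3.x is recommended (this walkthrough uses Hadoop 3.3.6).
  • Database: the Hive Metastore needs a backing database such as MySQL, PostgreSQL, or Oracle. This walkthrough uses MySQL.
  • The server's IP address is 192.168.128.130.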

2. Download and Extract Hive

  1. Download the Hive 4.0.1 tarball:

    wget https://downloads.apache.org/hive/hive-4.0.1/apache-hive-4.0.1-bin.tar.gz
    
  2. Extract the archive and move it to the installation path:

    tar -zxvf apache-hive-4.0.1-bin.tar.gz
    mv apache-hive-4.0.1-bin /opt/hive
    
  3. Set the environment variables by adding the following lines to ~/.bashrc:

    export HIVE_HOME=/opt/hive
    export PATH=$PATH:$HIVE_HOME/bin
    

    Then run source ~/.bashrc to apply the changes.
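
A quick sanity check after reloading the shell can catch a wrong path early. The sketch below only confirms that an install directory contains an executable hive launcher; the function name check_hive_home is hypothetical and not part of Hive.

```shell
# Hypothetical helper: report whether a Hive install directory looks usable.
# With no argument it falls back to $HIVE_HOME.
check_hive_home() {
  dir="${1:-${HIVE_HOME:-}}"
  if [ -z "$dir" ]; then
    echo "HIVE_HOME is not set"
    return 1
  fi
  if [ ! -x "$dir/bin/hive" ]; then
    echo "no executable hive launcher under $dir/bin"
    return 1
  fi
  echo "ok: $dir"
}
```

Calling check_hive_home with no arguments checks the directory exported in ~/.bashrc; running hive --version afterwards is another way to confirm that the launcher on PATH is the 4.0.1 install.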

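3. Configure the Hive Metastore Database

  1. Create the Hive metastore database. The example below uses MySQL (installing MySQL itself is covered in other documentation):

    • Start MySQL and log in: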
      mysql -u root -p
      
    • Create the database:
      CREATE DATABASE hive_metastore;
      
    • Create a user and grant privileges:
      CREATE USER 'hive'@'%' IDENTIFIED BY 'Hive_123456';
      GRANT ALL PRIVILEGES ON hive_metastore.* TO 'hive'@'%';
      FLUSH PRIVILEGES;
      
  2. Set the database connection details in the Hive configuration:

    • Edit hive-site.xml, located at $HIVE_HOME/conf/hive-site.xml:

  
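    <configuration>
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
        <description>Location of default Hive warehouse where managed tables are stored.</description>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.128.130:3306/hive_metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connection URL to connect to the Hive Metastore database, here with MySQL as the backend database.</description>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>JDBC driver class name for connecting to the Hive Metastore database.</description>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
        <description>Username for connecting to the Hive Metastore database.</description>
      </property>

      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>Hive_123456</value>
        <description>Password for connecting to the Hive Metastore database.</description>
      </property>

      <property>
        <name>datanucleus.schema.autoCreateAll</name>
        <value>true</value>
        <description>When set to true, DataNucleus will automatically create tables and columns if they do not already exist in the schema.</description>
      </property>

      <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
        <description>Disables schema verification, allowing automatic updates of the Metastore schema without manual intervention.</description>
      </property>

      <property>
        <name>hive.server2.enable.doAs</name>
        <value>true</value>
        <description>Enables HiveServer2 to execute queries as the user who submitted the query, rather than the HiveServer2 service user.</description>
      </property>

      <property>
        <name>hive.server2.authentication</name>
        <value>NONE</value>
        <description>Specifies the authentication mode for HiveServer2 connections. Options include NONE, KERBEROS, LDAP, PAM, and CUSTOM.</description>
      </property>
    </configuration>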
  

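  • Make sure the MySQL JDBC driver JAR is in $HIVE_HOME/lib (with Connector/J 8.x the driver class is com.mysql.cj.jdbc.Driver; the legacy name com.mysql.jdbc.Driver still loads but logs a deprecation warning):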

    cp /path/to/mysql-connector-java.jar $HIVE_HOME/lib/
    

4. Initialize the Metastore

Run the following command to initialize the Hive metastore schema:

schematool -initSchema -dbType mysql
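
After initialization, schematool's -info option can be used to confirm that the schema was written to MySQL. The wrapper below is a minimal sketch: the function name show_metastore_schema and the SCHEMATOOL override variable are hypothetical conveniences, while -dbType and -info are schematool options.

```shell
# Hypothetical wrapper: print the metastore schema version recorded in the database.
# SCHEMATOOL may be overridden; it defaults to the schematool found on PATH.
show_metastore_schema() {
  tool="${SCHEMATOOL:-schematool}"
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "schematool not found; is \$HIVE_HOME/bin on PATH?"
    return 1
  fi
  # -info reads the connection settings from hive-site.xml and reports the schema version.
  "$tool" -dbType mysql -info
}
```

If the command cannot connect, the JDBC URL, user name, and password in hive-site.xml are the first things to re-check.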

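5. Start HiveServer2

Hive 4.0.1 no longer provides the legacy Hive CLI (the hive command now launches Beeline), so clients connect through Beeline. With hive.server2.authentication set to NONE as configured above, HiveServer2 accepts connections without verifying credentials.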

hive --service hiveserver2 &
  • Check that port 10000 is listening.
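
HiveServer2 may need some time after launch before it accepts connections, so polling the port is more reliable than a single check. The following is a minimal sketch that relies on bash's /dev/tcp pseudo-device (it will not work in plain sh); the function name wait_for_port and the retry count are assumptions for illustration.

```shell
# Hypothetical helper (bash only): poll until a TCP port accepts connections.
# Usage: wait_for_port HOST PORT [TRIES]; waits one second between attempts.
wait_for_port() {
  local host="$1" port="$2" tries="${3:-30}" i=0
  while [ "$i" -lt "$tries" ]; do
    # Opening /dev/tcp/HOST/PORT succeeds only if something is listening there.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "port $port on $host is open"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "port $port on $host did not open after $tries attempts"
  return 1
}
```

For this setup the call would be wait_for_port 192.168.128.130 10000 60. On the server itself, ss -ltn or netstat -tln piped through grep 10000 shows the same information.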

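6. Configure Anonymous User Login

Edit core-site.xml. Because hive.server2.enable.doAs is true, HiveServer2 impersonates the connecting user, so Hadoop must allow the account that runs HiveServer2 (root here) to act as a proxy user: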

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:8020</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/var/log/hadoop/tmp</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
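
A missing or misspelled proxyuser entry typically shows up later as an impersonation error in Beeline, so it can help to check the file before restarting anything. The sketch below is a plain text search, not an XML parser, and the function name has_proxyuser is hypothetical.

```shell
# Hypothetical helper: succeed if core-site.xml declares both proxyuser keys for a user.
# Usage: has_proxyuser FILE USER
has_proxyuser() {
  file="$1"; user="$2"
  # -F treats the pattern as a fixed string, so the dots are matched literally.
  grep -qF "<name>hadoop.proxyuser.$user.hosts</name>" "$file" &&
  grep -qF "<name>hadoop.proxyuser.$user.groups</name>" "$file"
}
```

Assuming the file lives at $HADOOP_HOME/etc/hadoop/core-site.xml, the check would be has_proxyuser "$HADOOP_HOME/etc/hadoop/core-site.xml" root. The new settings take effect after Hadoop is restarted, or after running hdfs dfsadmin -refreshSuperUserGroupsConfiguration and yarn rmadmin -refreshSuperUserGroupsConfiguration.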

7. Verify the Deployment

beeline -u jdbc:hive2://192.168.128.130:10000 -n root
[root@master opt]# beeline  -u jdbc:hive2://192.168.128.130:10000 -n root
Connecting to jdbc:hive2://192.168.128.130:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:hive2://192.168.128.130:10000> create database test1;
INFO  : Compiling command(queryId=root_20241029145312_c6b5e83b-f5a7-488b-b2ca-b3ef3336298a): create database test1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:null, properties:null)
INFO  : Completed compiling command(queryId=root_20241029145312_c6b5e83b-f5a7-488b-b2ca-b3ef3336298a); Time taken: 2.054 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20241029145312_c6b5e83b-f5a7-488b-b2ca-b3ef3336298a): create database test1
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=root_20241029145312_c6b5e83b-f5a7-488b-b2ca-b3ef3336298a); Time taken: 0.169 seconds
No rows affected (2.721 seconds)
0: jdbc:hive2://192.168.128.130:10000> show databases;
INFO  : Compiling command(queryId=root_20241029145320_63834d7a-1027-4ca4-933e-927dcccbebb8): show databases
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO  : Completed compiling command(queryId=root_20241029145320_63834d7a-1027-4ca4-933e-927dcccbebb8); Time taken: 0.236 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
INFO  : Executing command(queryId=root_20241029145320_63834d7a-1027-4ca4-933e-927dcccbebb8): show databases
INFO  : Starting task [Stage-0:DDL] in serial mode
INFO  : Completed executing command(queryId=root_20241029145320_63834d7a-1027-4ca4-933e-927dcccbebb8); Time taken: 0.11 seconds
+----------------+
| database_name  |
+----------------+
| default        |
| test1          |
+----------------+
2 rows selected (0.605 seconds)
0: jdbc:hive2://192.168.128.130:10000>
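
The interactive session above can also be turned into a scripted check with Beeline's -e option. This is a minimal sketch: the function name database_exists and the BEELINE override variable are hypothetical, and it assumes that the tsv2 output format prints one plain database name per line.

```shell
# Hypothetical helper: succeed if a database name is visible through HiveServer2.
# Usage: database_exists JDBC_URL USER DATABASE
# BEELINE may be overridden; it defaults to the beeline found on PATH.
database_exists() {
  url="$1"; user="$2"; db="$3"
  # Run one statement non-interactively and look for an exact line match.
  "${BEELINE:-beeline}" -u "$url" -n "$user" \
    --silent=true --showHeader=false --outputformat=tsv2 \
    -e 'show databases;' 2>/dev/null | grep -qx "$db"
}
```

For example, database_exists jdbc:hive2://192.168.128.130:10000 root test1 && echo "test1 is visible" repeats the check made by hand above.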

8. Two Ways to Connect

  • Connect through the hive command:
[root@master opt]# hive
Beeline version 4.0.1 by Apache Hive
beeline> !connect jdbc:hive2://192.168.128.130:10000 -n root
Connecting to jdbc:hive2://192.168.128.130:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://192.168.128.130:10000> !quit
Closing: 0: jdbc:hive2://192.168.128.130:10000
  • Connect directly with the beeline command:
[root@master opt]# beeline  -u jdbc:hive2://192.168.128.130:10000 -n root
Connecting to jdbc:hive2://192.168.128.130:10000
Connected to: Apache Hive (version 4.0.1)
Driver: Hive JDBC (version 4.0.1)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 4.0.1 by Apache Hive
0: jdbc:hive2://192.168.128.130:10000>
