Presto
[Official site] https://prestodb.github.io/overview.html
Presto is a distributed system that runs on a cluster of machines. A full installation includes a coordinator and multiple workers. Queries are submitted from a client such as the Presto CLI to the coordinator. The coordinator parses, analyzes and plans the query execution, then distributes the processing to the workers.
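That client-to-coordinator exchange can also be seen directly over Presto's HTTP protocol. A sketch, assuming a coordinator already listening on hadoop-01:8099 (the host and port used later in this walkthrough; the user name is arbitrary):

```shell
# Submit a statement to the coordinator over HTTP. The JSON response
# contains a nextUri to poll; the coordinator plans the query and
# hands processing to the workers behind the scenes.
curl -s -X POST \
  -H 'X-Presto-User: test' \
  --data 'SELECT 1' \
  http://hadoop-01:8099/v1/statement
```

This only works against a running cluster; in practice the Presto CLI (set up below) wraps this protocol for you.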
Presto has a few basic requirements:
Linux or Mac OS X
Java 8, 64-bit
Python 2.4+
HADOOP / HIVE
Presto supports reading Hive data from the following versions of Hadoop:
Apache Hadoop 1.x
Apache Hadoop 2.x
Cloudera CDH 4
Cloudera CDH 5
The following file formats are supported: Text, SequenceFile, RCFile, ORC and Parquet.
Additionally, a remote Hive metastore is required. Local or embedded mode is not supported. Presto does not use MapReduce and thus only requires HDFS.
CASSANDRA
Cassandra 2.x is required. This connector is completely independent of the Hive connector and only requires an existing Cassandra installation.
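The Cassandra connector is configured like any other catalog. A minimal etc/catalog/cassandra.properties sketch (the contact-point hostname is an example, not from this walkthrough):

```properties
connector.name=cassandra
cassandra.contact-points=cassandra-host1
```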
TPC-H
The TPC-H connector dynamically generates data that can be used for experimenting with and testing Presto. This connector has no external requirements.
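Because it needs nothing external, TPC-H is convenient for smoke-testing a fresh installation. A minimal etc/catalog/tpch.properties:

```properties
connector.name=tpch
```

After a restart, a query such as `SELECT * FROM tpch.tiny.nation;` returns generated data without any Hadoop or Cassandra setup.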
Server download:
https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.226/presto-server-0.226.tar.gz
cd presto-server-0.226
mkdir etc
Create an etc directory inside the installation directory. This will hold the following configuration:
Node Properties: environmental configuration specific to each node
JVM Config: command line options for the Java Virtual Machine
Config Properties: configuration for the Presto server
Catalog Properties: configuration for Connectors (data sources)
etc/node.properties
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/shidian/servers/presto-server-0.226/data
etc/jvm.config
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
etc/log.properties
com.facebook.presto=INFO
etc/config.properties
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8099
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://192.168.162.100:8099
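The properties above configure a single node acting as both coordinator and worker. On a multi-node cluster, the workers instead get a config.properties like the following sketch (ports and memory limits reuse this walkthrough's values; discovery.uri must point at the coordinator):

```properties
coordinator=false
http-server.http.port=8099
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery.uri=http://192.168.162.100:8099
```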
Create the catalog directory (note: etc/catalog under the installation directory, not the system /etc):
mkdir -p etc/catalog
In that directory, create:
etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hadoop-01:9083
hive.config.resources=/shidian/servers/hadoop-2.7.7/etc/hadoop/core-site.xml,/shidian/servers/hadoop-2.7.7/etc/hadoop/hdfs-site.xml
hive.allow-drop-table=true
etc/catalog/jmx.properties
connector.name=jmx
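The whole etc/ layout described above can be scaffolded in one pass. A sketch, to be run from the presto-server-0.226 installation directory; every value mirrors this walkthrough, so adjust node.id, paths, and addresses per node (hive.properties follows the same pattern and is omitted here):

```shell
#!/bin/sh
# Scaffold the Presto etc/ configuration layout in one pass.
set -e
mkdir -p etc/catalog

cat > etc/node.properties <<'EOF'
node.environment=production
node.id=ffffffff-ffff-ffff-ffff-ffffffffffff
node.data-dir=/shidian/servers/presto-server-0.226/data
EOF

cat > etc/jvm.config <<'EOF'
-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError
EOF

cat > etc/log.properties <<'EOF'
com.facebook.presto=INFO
EOF

cat > etc/config.properties <<'EOF'
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8099
query.max-memory=5GB
query.max-memory-per-node=1GB
query.max-total-memory-per-node=2GB
discovery-server.enabled=true
discovery.uri=http://192.168.162.100:8099
EOF

cat > etc/catalog/jmx.properties <<'EOF'
connector.name=jmx
EOF
```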
Start the Hive metastore on port 9083:
[root@hadoop-01 bin]# ./hive --service metastore -p 9083 &
[2] 101098
[root@hadoop-01 bin]# ls: cannot access /shidian/servers/spark-2.1.3-bin-hadoop2.7/lib/spark-assembly-*.jar: No such file or directory
Starting Hive Metastore Server
Start the Presto server:
./launcher start
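Beyond start, the bin/launcher script supports a few other day-to-day operations (all run from the installation directory):

```shell
./launcher start    # start as a background daemon; logs go under var/log/
./launcher run      # run in the foreground (useful when debugging config)
./launcher status   # report whether the server is running
./launcher stop     # stop the daemon gracefully
```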
Client download:
https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.226/presto-cli-0.226-executable.jar
After downloading, copy it into the following directory:
~/presto-server-0.226/bin/
Rename it to presto, make it executable with chmod +x, then run it:
Client login command:
./presto --server hadoop-01:8099 --catalog hive --schema default
Successful login:
[root@hadoop-01 bin]# ./presto --server hadoop-01:8099 --catalog hive --schema default
presto:default> show schemas;
       Schema
--------------------
 db1
 default
 information_schema
(3 rows)

Query 20190925_202301_00016_t22a5, FINISHED, 1 node
Splits: 19 total, 19 done (100.00%)
0:00 [3 rows, 43B] [8 rows/s, 116B/s]

presto:default> use db1;
USE
presto:db1> select * from t1 order by id;
 id
----
 1
 2
(2 rows)

Query 20190925_202310_00020_t22a5, FINISHED, 1 node
Splits: 20 total, 20 done (100.00%)
0:00 [2 rows, 4B] [4 rows/s, 8B/s]

presto:db1>
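The CLI can also run non-interactively via its --execute flag, which is convenient for scripting. A sketch against the same coordinator as above:

```shell
# Run a single statement in batch mode and print the result as CSV,
# then exit (no interactive prompt).
./presto --server hadoop-01:8099 --catalog hive --schema default \
  --execute 'show schemas;' --output-format CSV
```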