Presto Installation and Deployment

Dependencies

Presto version: presto-server-0.191

Java version: 1.8

Python version: 2.6

Configuration

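Create an etc directory under the Presto root directory.

Inside etc, create a catalog directory and the files config.properties, jvm.config, log.properties, and node.properties.

To connect to Hive, also create hive.properties inside the catalog directory.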
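The layout described above can be sketched as a few shell commands. This is only an illustration: PRESTO_HOME is an assumed variable name, and it defaults to a scratch directory so the commands can be tried without touching a real installation.

```shell
#!/bin/sh
# PRESTO_HOME is an assumed variable name; it falls back to a scratch
# directory so this sketch is safe to run anywhere.
PRESTO_HOME="${PRESTO_HOME:-$(mktemp -d)}"

# etc/ holds the node configuration; etc/catalog/ holds one file per connector.
mkdir -p "$PRESTO_HOME/etc/catalog"

# Create the four configuration files as empty placeholders.
for f in config.properties jvm.config log.properties node.properties; do
    touch "$PRESTO_HOME/etc/$f"
done

# Only needed when connecting to Hive.
touch "$PRESTO_HOME/etc/catalog/hive.properties"

# Show the resulting tree.
ls -R "$PRESTO_HOME/etc"
```

Point PRESTO_HOME at the unpacked presto-server-0.191 directory to create the files in place, then fill them in as shown in the following sections.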

config.properties (coordinator)

coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080

config.properties (worker)

coordinator=false
http-server.http.port=8080
query.max-memory=50GB
query.max-memory-per-node=1GB
discovery.uri=http://example.net:8080

config.properties (both coordinator and worker)

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=8080
query.max-memory=5GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://example.net:8080
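The coordinator and worker files above differ in only a few lines, so they can be generated from one template. The sketch below assumes a hypothetical helper named write_config (it is not part of Presto) and reuses the values shown above.

```shell
#!/bin/sh
# write_config is a hypothetical helper, not part of Presto. It prints a
# config.properties for the given role ("coordinator" or "worker").
write_config() {
    role="$1"
    uri="$2"
    case "$role" in
        coordinator)
            echo "coordinator=true"
            echo "node-scheduler.include-coordinator=false"
            ;;
        worker)
            echo "coordinator=false"
            ;;
        *)
            echo "unknown role: $role" >&2
            return 1
            ;;
    esac
    echo "http-server.http.port=8080"
    echo "query.max-memory=50GB"
    echo "query.max-memory-per-node=1GB"
    # Only the coordinator runs the embedded Discovery service.
    if [ "$role" = coordinator ]; then
        echo "discovery-server.enabled=true"
    fi
    # Every node points at the same Discovery URI (the coordinator).
    echo "discovery.uri=$uri"
}

# Print the worker variant; redirect to etc/config.properties to use it.
write_config worker http://example.net:8080
```

Keeping the shared values in one place makes it harder for a worker to end up with a different discovery.uri or memory limit than the coordinator.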

config.properties parameter descriptions (from the official documentation)

coordinator: Allow this Presto instance to function as a coordinator (accept queries from clients and manage query execution).

node-scheduler.include-coordinator: Allow scheduling work on the coordinator. For larger clusters, processing work on the coordinator can impact query performance because the machine’s resources are not available for the critical task of scheduling, managing and monitoring query execution.

http-server.http.port: Specifies the port for the HTTP server. Presto uses HTTP for all communication, internal and external.

query.max-memory: The maximum amount of distributed memory that a query may use.

query.max-memory-per-node: The maximum amount of memory that a query may use on any one machine.

discovery-server.enabled: Presto uses the Discovery service to find all the nodes in the cluster. Every Presto instance will register itself with the Discovery service on startup. In order to simplify deployment and avoid running an additional service, the Presto coordinator can run an embedded version of the Discovery service. It shares the HTTP server with Presto and thus uses the same port.

discovery.uri: The URI to the Discovery server. Because we have enabled the embedded version of Discovery in the Presto coordinator, this should be the URI of the Presto coordinator. Replace example.net:8080 to match the host and port of the Presto coordinator. This URI must not end in a slash.

jmx.rmiregistry.port: Specifies the port for the JMX RMI registry. JMX clients should connect to this port.

jmx.rmiserver.port: Specifies the port for the JMX RMI server. Presto exports many metrics that are useful for monitoring via JMX.
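The rule that discovery.uri must not end in a slash is easy to get wrong when copying a URL. A minimal sketch of a pre-start check follows; check_discovery_uri and read_discovery_uri are hypothetical helpers written for this example, not Presto tools.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (status 0) when the URI does not end in a slash.
check_discovery_uri() {
    case "$1" in
        */) echo "invalid: '$1' ends in a slash" >&2; return 1 ;;
        *)  return 0 ;;
    esac
}

# Hypothetical helper: print the discovery.uri value from a config.properties file.
read_discovery_uri() {
    sed -n 's/^discovery\.uri=//p' "$1"
}

# Usage example against a throwaway file.
conf="$(mktemp)"
printf 'coordinator=true\ndiscovery.uri=http://example.net:8080\n' > "$conf"
check_discovery_uri "$(read_discovery_uri "$conf")" && echo "discovery.uri looks fine"
```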

node.properties

node.environment=production
node.id=node01
node.data-dir=/var/presto/data

node.properties parameter descriptions (from the official documentation)

node.environment: The name of the environment. All Presto nodes in a cluster must have the same environment name.

node.id: The unique identifier for this installation of Presto. This must be unique for every node. This identifier should remain consistent across reboots or upgrades of Presto. If running multiple installations of Presto on a single machine (i.e. multiple nodes on the same machine), each installation must have a unique identifier.

node.data-dir: The location (filesystem path) of the data directory. Presto will store logs and other data here.
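Because node.id must be unique per node and stable across restarts, one option is to derive it from the host name instead of typing it by hand. The sketch below is one possible approach under that assumption; OUT_DIR and the "presto-" prefix are made up for the example, and the scheme only works with a single Presto installation per machine.

```shell
#!/bin/sh
# OUT_DIR is an assumed variable name; it defaults to a scratch directory.
OUT_DIR="${OUT_DIR:-$(mktemp -d)}"

# uname -n prints the host name, which stays the same across reboots and
# upgrades. With several installations on one machine this would NOT be unique.
NODE_ID="presto-$(uname -n)"

# Write node.properties with the same environment and data directory as above.
cat > "$OUT_DIR/node.properties" <<EOF
node.environment=production
node.id=$NODE_ID
node.data-dir=/var/presto/data
EOF

cat "$OUT_DIR/node.properties"
```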

jvm.config

-server
-Xmx16G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:+ExitOnOutOfMemoryError

log.properties

com.facebook.presto=INFO

hive.properties

# Choose the connector name to match your cluster and the name of the plugin directory.
connector.name=hive-hadoop2
# Set this to the host running the hive-metastore service; here it is installed on the master node.
hive.metastore.uri=thrift://node16.test:9083
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml

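Start Presto on every node with bin/launcher start (use bin/launcher run to start it in the foreground instead).

Download the Presto CLI executable jar, rename it to presto, make it executable (chmod +x presto), then run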

./presto --server localhost:8080 --catalog hive --schema default

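to test the installation.
In my tests so far, count(1) over 100 million rows takes about 2 minutes on a single 16 GB machine and about 1 minute on two 16 GB machines.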
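As a rough reading of those timings, the arithmetic below converts them to rows per second. It only restates the two measurements above (rounded to 120 and 60 seconds) and is not a benchmark.

```shell
#!/bin/sh
ROWS=100000000        # 100 million rows
ONE_NODE_SECS=120     # about 2 minutes on one 16 GB machine
TWO_NODE_SECS=60      # about 1 minute on two 16 GB machines

# Shell arithmetic is integer-only, so the rates are rounded down.
ONE_NODE_RATE=$((ROWS / ONE_NODE_SECS))
TWO_NODE_RATE=$((ROWS / TWO_NODE_SECS))
SPEEDUP=$((ONE_NODE_SECS / TWO_NODE_SECS))

echo "one node:  $ONE_NODE_RATE rows/s"
echo "two nodes: $TWO_NODE_RATE rows/s"
echo "speedup:   ${SPEEDUP}x"
```

That is roughly 0.83 million rows per second on one machine and 1.67 million on two, so for this simple query doubling the machines about doubled the throughput.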
