SparkSQL configuration (with Hive as the data source)

Hive configuration (MySQL as the metastore backend, HDFS as the data storage):

1. Edit hive-env.sh (you can copy hive-env.sh.template and modify it)

# Hadoop home directory
export HADOOP_HOME=/usr/local/hadoop
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/hive/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/hive/lib

2. Edit hive-site.xml (refer to hive-default.xml.template)

The properties below configure the MySQL connection for the metastore:
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>youpassword</value>
    <description>password to use against metastore database</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
  </property>
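Two prerequisites are easy to miss at this point: Hive needs the MySQL JDBC driver on its classpath, and on Hive 2.x the metastore schema must be initialized in MySQL before first use. A minimal sketch, assuming the paths from hive-env.sh above (the connector jar name/version is an assumption; use whatever you downloaded):

```shell
# Put the MySQL JDBC driver on Hive's classpath
# (jar name/version is an assumption -- substitute your download)
cp mysql-connector-java-5.1.40-bin.jar /usr/local/hive/lib/

# Hive 2.x only: initialize the metastore schema in the MySQL
# database named in javax.jdo.option.ConnectionURL
/usr/local/hive/bin/schematool -dbType mysql -initSchema
```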
This completes the basic Hive configuration.
Then run $HIVE_HOME/bin/hive and check that it starts successfully!
-------------------------------------------------------
Configuring Spark
1. Edit spark-env.sh
# Size the memory settings to your machine. Note: if they are set too small, jobs will fail with "no resource" errors.
export SCALA_HOME=/usr/local/spark
export JAVA_HOME=/usr/local/jdk1.8.0
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop
export SPARK_MASTER_IP=master
export SPARK_WORKER_MEMORY=800m
export SPARK_EXECUTOR_MEMORY=800m
export SPARK_DRIVER_MEMORY=800m
export SPARK_WORKER_CORES=4
export MASTER=spark://master:7077

2. Configure spark-defaults.conf
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two thr"
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://master:9000/historyserverforSpark
# Web UI for browsing Spark's completed jobs
spark.yarn.historyServer.address        master:18080
spark.history.fs.logDirectory   hdfs://master:9000/historyserverforSpark 
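One gotcha with the settings above: the directory named by spark.eventLog.dir is not created automatically, and applications fail at startup if it is missing. A sketch of the preparation steps, using the paths from the config above (SPARK_HOME pointing at your install is an assumption):

```shell
# Create the event-log directory referenced by spark.eventLog.dir
hdfs dfs -mkdir -p hdfs://master:9000/historyserverforSpark

# Start the history server; it reads spark.history.fs.logDirectory
# and serves the web UI on port 18080
$SPARK_HOME/sbin/start-history-server.sh
```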

3. Configure slaves (two worker nodes here)
slave1
slave2
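With slaves in place, the standalone cluster can be brought up from the master node, and jps is a quick way to confirm the daemons are running (a sketch, assuming SPARK_HOME points at your install):

```shell
# Start the Master on this host and a Worker on each host listed in conf/slaves
$SPARK_HOME/sbin/start-all.sh

# Verify: 'Master' should appear here, and 'Worker' on slave1/slave2
jps
```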
-------------------------------------------------------
Add a hive-site.xml under spark/conf with the following content:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://master:9083</value>
    <description>Thrift URI for the remote metastore. Used by metastore client to connect to remote metastore.</description>
  </property>
</configuration>
4. Start the Hive metastore service
hive --service metastore
5. Start SparkSQL
./bin/spark-sql
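Once the metastore service from step 4 is up, a quick way to confirm that SparkSQL can reach Hive is to run a statement non-interactively with -e; the databases listed will be whatever already exists in your metastore:

```shell
# Connect to the standalone master and list the databases Hive knows about
./bin/spark-sql --master spark://master:7077 -e "SHOW DATABASES;"
```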
