Spark on YARN Configuration

Configuration for the Spark on YARN deploy mode.

spark-env.sh is configured as follows (skip this in standalone deploy mode):

export JAVA_HOME=/usr/jdk64/jdk
export SPARK_HOME=/opt/spark


# Options read in YARN client/cluster mode
export SPARK_CONF_DIR=/opt/spark/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf
export YARN_CONF_DIR=/etc/hadoop/conf
export SPARK_EXECUTOR_CORES=1
export SPARK_EXECUTOR_MEMORY=2G
export SPARK_DRIVER_MEMORY=1G

export SPARK_LOG_DIR=/data0/logs/spark2

spark-defaults.conf configuration:

  spark.master                      yarn
  spark.eventLog.enabled            true
  spark.eventLog.dir                hdfs:///spark2-history/
  spark.history.fs.logDirectory     hdfs:///spark2-history/
  spark.history.provider            org.apache.spark.deploy.history.FsHistoryProvider
  spark.serializer                  org.apache.spark.serializer.KryoSerializer
  spark.driver.memory               1g
  spark.eventLog.compress           true
  spark.driver.extraJavaOptions     -Dhdp.version=2.3.4.0-3485
  spark.yarn.am.extraJavaOptions    -Dhdp.version=2.3.4.0-3485
  # keep the staging files (jars, etc.) after the job finishes
  spark.yarn.preserve.staging.files true
  # make the RM UI link to the history server UI
  spark.yarn.historyServer.address  s10-hadoop:18080
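
Since spark.eventLog.dir and the history server are configured above, the event log directory must exist on HDFS before jobs run. A minimal sketch, assuming the paths from the config above:

  # create the event log directory on HDFS
  hdfs dfs -mkdir -p /spark2-history
  # start the history server; it reads spark.history.fs.logDirectory
  $SPARK_HOME/sbin/start-history-server.sh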

 

The Hadoop dependency here is the HDP distribution, which led to a few issues:

1. Submitting the Spark Pi test in yarn-cluster mode failed with: "Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster"

The cause: when YARN uploads the relevant jars, the local jar paths contain a ${hdp.version} placeholder that is never resolved.

Add the following configuration:

  # add the hdp.version parameter in spark-defaults.conf
  spark.driver.extraJavaOptions     -Dhdp.version=2.3.4.0-3485
  spark.yarn.am.extraJavaOptions    -Dhdp.version=2.3.4.0-3485


  # create a java-opts file under $SPARK_HOME/conf and add:
  -Dhdp.version=2.3.4.0-3485
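
The same file can be created from the shell; a one-line sketch, assuming the SPARK_HOME set earlier:

  echo "-Dhdp.version=2.3.4.0-3485" > $SPARK_HOME/conf/java-opts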

After these changes, the Spark Pi test runs successfully in yarn-cluster mode.
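
For reference, the test can be submitted like this (a sketch; the examples jar path varies with the Spark version and layout):

  spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master yarn \
    --deploy-mode cluster \
    $SPARK_HOME/examples/jars/spark-examples_*.jar 100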

 

2. Enabling spark-sql to read Hive data

   Under SPARK_CONF_DIR, add a symlink (not a copy) named hive-site.xml pointing to the native Hive configuration file, so any later changes to it are picked up automatically; see the sketch below.
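
A sketch of the symlink, assuming the Hive config lives at the usual HDP location /etc/hive/conf:

  ln -s /etc/hive/conf/hive-site.xml $SPARK_CONF_DIR/hive-site.xml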

 

3. When running spark-sql queries against Hive tables, the job fails with: Class com.hadoop.compression.lzo.LzoCodec not found

  hadoop-lzo is assumed to be installed already and is not described in detail here.

Specify the hadoop-lzo jar directory in spark-env.sh:

export SPARK_DIST_CLASSPATH=$SPARK_DIST_CLASSPATH:/usr/hdp/2.3.4.0-3485/hadoop/lib/*

Jars placed on SPARK_DIST_CLASSPATH are visible to both the driver and the executors.
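
A quick way to verify is to query an LZO-compressed Hive table through spark-sql (a sketch; lzo_table is a hypothetical table name):

  # if the classpath is correct, this no longer fails with "LzoCodec not found"
  spark-sql -e "SELECT COUNT(*) FROM lzo_table"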

 

References:

https://stackoverflow.com/questions/23441142/class-com-hadoop-compression-lzo-lzocodec-not-found-for-spark-on-cdh-5

https://spark.apache.org/docs/2.2.0/hadoop-provided.html
