(1) Building Hadoop 2.7.1 from source: http://aperise.iteye.com/blog/2246856
(2) Preparing to install Hadoop 2.7.1: http://aperise.iteye.com/blog/2253544
(3) Cluster installation supported by both 1.x and 2.x: http://aperise.iteye.com/blog/2245547
(4) Preparing to install HBase: http://aperise.iteye.com/blog/2254451
(5) Installing HBase: http://aperise.iteye.com/blog/2254460
(6) Installing Snappy: http://aperise.iteye.com/blog/2254487
(7) HBase performance tuning: http://aperise.iteye.com/blog/2282670
(8) Benchmarking HBase with Yahoo YCSB: http://aperise.iteye.com/blog/2248863
(9) spring-hadoop in practice: http://aperise.iteye.com/blog/2254491
(10) Installing a ZooKeeper-based Hadoop HA cluster: http://aperise.iteye.com/blog/2305809
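1. What is Hadoop
1.1 History of Hadoop
Doug Cutting is the creator of Apache Lucene. The Apache Nutch project started in 2002 as part of Apache Lucene. By 2005, all of Nutch's major algorithms had been ported to run on MapReduce and NDFS. In February 2006, MapReduce and NDFS were moved out of Nutch into a Lucene subproject named Hadoop.
Hadoop is not an acronym; it is a made-up name. Doug Cutting, the project's creator, explains where the name came from: "The name my kid gave a stuffed yellow elephant. Short, relatively easy to spell and pronounce, meaningless, and not used elsewhere: those are my naming criteria. Kids are good at generating such."
1.2 Hadoop in the narrow sense
In my view, Hadoop in the narrow sense refers to the Apache Hadoop project itself, which consists of the following modules:
- Hadoop Common: a set of components and interfaces for distributed file systems and general I/O
- Hadoop Distributed File System (HDFS): a distributed file system
- Hadoop YARN: a framework for job scheduling and resource management
- Hadoop MapReduce: a distributed data-processing programming model for parallel computation over large data sets
In the narrow sense, Hadoop addresses three problems: HDFS provides distributed storage, YARN provides job scheduling and resource management, and MapReduce provides a programming model that lets developers write code for offline big-data processing.
1.3 Hadoop in the broad sense
In my view, Hadoop in the broad sense refers to the whole Hadoop ecosystem. The ecosystem contains many subprojects, each created to solve a particular kind of problem. Its main components are shown in the figure below: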

2. Two ways to deploy a Hadoop cluster
2.1 NameNode + SecondaryNameNode, supported by both Hadoop 1.x and Hadoop 2.x

2.2 Active NameNode + standby NameNode, supported only by Hadoop 2.x
2.3 Cluster modes described on the official Hadoop site
1) Single-node Hadoop setup
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/SingleCluster.html
2) Cluster modes
Cluster mode 1 (NameNode + SecondaryNameNode, supported by both Hadoop 1.x and Hadoop 2.x)
http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-common/ClusterSetup.html
Cluster mode 2 (active NameNode + standby NameNode, supported only by Hadoop 2.x, also called Hadoop HA). The official documentation covers HDFS HA and YARN HA separately.
HDFS HA (ZooKeeper + JournalNode): http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
HDFS HA (ZooKeeper + NFS): http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailability
YARN HA (ZooKeeper): http://hadoop.apache.org/docs/r2.7.2/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html
Production environments mostly combine HDFS HA (ZooKeeper + JournalNode: active NameNode + standby NameNode + JournalNode + DFSZKFailoverController + DataNode) with YARN HA (ZooKeeper: active ResourceManager + standby ResourceManager + NodeManager). This article covers the NameNode + SecondaryNameNode mode supported by both Hadoop 1.x and Hadoop 2.x. That mode is mainly suited to learning and practice because it needs fewer machines, but its NameNode is a single point of failure.
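3. Installing Hadoop
3.1 Required software
- Java 1.7.x is required; the Java distribution from Sun is recommended. In my testing, Hadoop 2.7.1 does not run on JDK 1.6, so JDK 1.7 is used here. Download page: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
- ssh must be installed and sshd must be kept running so that the Hadoop scripts can manage the remote Hadoop daemons.
- Hadoop package download: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
3.2 Environment
- Operating system: Red Hat Enterprise Linux Server release 5.8 (Tikanga)
- Master and slave servers: Master 192.168.181.66, Slave1 192.168.88.21, Slave2 192.168.88.22
3.3 Passwordless SSH login
SSH must be installed on Linux first, because Hadoop logs in to every node of the cluster over SSH to operate on it; please install it yourself. I use the hadoop user: each server generates a public key, and the keys are then merged into authorized_keys.
1. CentOS does not enable passwordless SSH login by default. Uncomment the following two lines in /etc/ssh/sshd_config on every server. Before the change: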
#RSAAuthentication yes
#PubkeyAuthentication yes
After the change (run service sshd restart afterwards):
RSAAuthentication yes
PubkeyAuthentication yes
For the remaining steps, see http://aperise.iteye.com/blog/2253544
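The merge step described in this section (every server generates a public key, and the keys are combined into authorized_keys) could be sketched as follows. This is a minimal sketch, not the procedure from the linked post: the function name merge_pubkeys and the staging directory are hypothetical, and it assumes the id_rsa.pub file of each server has already been copied into that directory under a distinct name.

```shell
# merge_pubkeys KEY_DIR AUTH_FILE
# Append every *.pub file found in KEY_DIR to AUTH_FILE, drop blank and
# duplicate lines, and restrict the file to its owner as sshd expects.
# KEY_DIR is a hypothetical staging directory, not something Hadoop defines.
merge_pubkeys() (
    key_dir=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # Keep existing entries first, then add keys that are not present yet.
    awk 'NF && !seen[$0]++' "$auth_file" "$key_dir"/*.pub > "$auth_file.tmp"
    mv "$auth_file.tmp" "$auth_file"
    chmod 600 "$auth_file"
)
```

A possible use: run ssh-keygen -t rsa as the hadoop user on each server, collect the resulting public keys in one directory such as /tmp/keys, call merge_pubkeys /tmp/keys ~/.ssh/authorized_keys, and copy the merged file to ~/.ssh/ on every node. Running the function twice leaves the file unchanged.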
3.4 Installing the JDK
Hadoop 2.7 requires JDK 7. With JDK 1.6, Hadoop fails at startup with the following error:
[hadoop@nmsc1 bin]# ./hdfs namenode -format
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/hadoop/hdfs/server/namenode/NameNode : Unsupported major.minor version 51.0
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hadoop.hdfs.server.namenode.NameNode. Program will exit.
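The "Unsupported major.minor version 51.0" message in the trace above identifies the class-file version of the Hadoop classes. Class-file major versions start at 45 for JDK 1.1 and grow by one per Java release, so 51 corresponds to Java 7, which a Java 6 JVM cannot load. The helper names below are hypothetical; the block is only a sketch of that arithmetic.

```shell
# java_release_for_major MAJOR
# Print the Java release that introduced class-file major version MAJOR
# (50 is Java 6, 51 is Java 7, 52 is Java 8).
java_release_for_major() {
    echo $(( $1 - 44 ))
}

# needs_newer_jdk CLASS_MAJOR RUNTIME_RELEASE
# Succeed when classes compiled for CLASS_MAJOR are too new for a JVM of
# release RUNTIME_RELEASE, the situation behind UnsupportedClassVersionError.
needs_newer_jdk() {
    [ "$(java_release_for_major "$1")" -gt "$2" ]
}
```

For example, needs_newer_jdk 51 6 succeeds (the case shown in the trace), while needs_newer_jdk 51 7 fails because JDK 7 can load these classes.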
1. Download jdk-7u65-linux-x64.gz and place it at /opt/java/jdk-7u65-linux-x64.gz.
2. Extract it with the command tar -zxvf jdk-7u65-linux-x64.gz.
3. Edit /etc/profile and append the following to the end of the file:
export JAVA_HOME=/opt/java/jdk1.7.0_65
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
4. Apply the configuration with the command source /etc/profile.
5. Run java -version to check that the JDK environment is configured correctly.
[hadoop@nmsc2 java]# java -version
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
[hadoop@nmsc2 java]#
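3.5 Installing Hadoop 2.7
1. Download hadoop-2.7.1.tar.gz on the master only and place it at /opt/hadoop-2.7.1.tar.gz.
2. Extract it with the command tar -xzvf hadoop-2.7.1.tar.gz.
3. Under /home, create the directories that will hold the data: hadoop/tmp, hadoop/hdfs, hadoop/hdfs/data and hadoop/hdfs/name.
4. Configure core-site.xml in /opt/hadoop-2.7.1/etc/hadoop:
<configuration>
  <property>
    <name>fs.trash.interval</name>
    <value>1440</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://192.168.181.66:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/tmp</value>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>200</value>
    <description>The number of server threads for the namenode.</description>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>100</value>
    <description>The number of server threads for the datanode.</description>
  </property>
</configuration>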
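5. Configure hdfs-site.xml in /opt/hadoop-2.7.1/etc/hadoop:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hdfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hdfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>192.168.181.66:9001</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>107374182400</value>
    <description>Reserved space in bytes per volume. Always leave this much space free for non dfs use.</description>
  </property>
  <property>
    <name>dfs.client.socket-timeout</name>
    <value>600000</value>
  </property>
  <property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>409600</value>
  </property>
</configuration>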
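The value 107374182400 given for dfs.datanode.du.reserved is a byte count and equals 100 GiB (100 * 1024 * 1024 * 1024). Byte-valued settings like this one are easy to mistype, so a small conversion helper can be useful. The function name to_bytes is hypothetical, and the sketch assumes a shell with 64-bit arithmetic, which bash provides.

```shell
# to_bytes COUNT UNIT
# Convert COUNT binary units (K, M, G or T) to a plain byte count.
to_bytes() {
    case $2 in
        K) echo $(( $1 * 1024 )) ;;
        M) echo $(( $1 * 1024 * 1024 )) ;;
        G) echo $(( $1 * 1024 * 1024 * 1024 )) ;;
        T) echo $(( $1 * 1024 * 1024 * 1024 * 1024 )) ;;
        *) echo "unknown unit: $2" >&2; return 1 ;;
    esac
}
```

Here to_bytes 100 G prints 107374182400, the reserved space configured for each DataNode volume.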
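6. In /opt/hadoop-2.7.1/etc/hadoop, save mapred-site.xml.template as mapred-site.xml and change its contents as follows:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>192.168.181.66:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>192.168.88.21:19888</value>
  </property>
</configuration>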
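7. Configure yarn-site.xml in /opt/hadoop-2.7.1/etc/hadoop, changing its contents as follows:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>192.168.181.66:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>192.168.181.66:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>192.168.181.66:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>192.168.181.66:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>192.168.181.66:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2048</value>
  </property>
</configuration>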
8. Set JAVA_HOME in hadoop-env.sh and yarn-env.sh under /opt/hadoop-2.7.1/etc/hadoop; Hadoop will not start without it:
export JAVA_HOME=/opt/java/jdk1.7.0_65
9. Edit slaves under /opt/hadoop-2.7.1/etc/hadoop: remove the default localhost and add the slave nodes:
#localhost
192.168.88.22
192.168.88.21
10. Copy the configured Hadoop to the same location on every node with scp:
chmod -R 777 /home/hadoop
chmod -R 777 /opt/hadoop-2.7.1
scp -r /opt/hadoop-2.7.1 192.168.88.22:/opt/
scp -r /home/hadoop 192.168.88.22:/home
scp -r /opt/hadoop-2.7.1 192.168.88.21:/opt/
scp -r /home/hadoop 192.168.88.21:/home
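The scp commands above repeat the same copy for each slave. As a sketch, the host list could instead be read from the slaves file, so that adding a node only means editing that file. The function name print_copy_commands is hypothetical. It prints the commands rather than running them so they can be reviewed first, and it skips blank lines and lines commented out with '#', such as the #localhost entry.

```shell
# print_copy_commands SLAVES_FILE SRC_DIR DEST_PARENT
# Print one "scp -r" command per host listed in SLAVES_FILE.
print_copy_commands() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
    while read -r host; do
        echo "scp -r $2 $host:$3"
    done
}
```

For instance, print_copy_commands /opt/hadoop-2.7.1/etc/hadoop/slaves /opt/hadoop-2.7.1 /opt/ lists the copies for the installation directory; piping that output to sh would execute them.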
11. Start Hadoop on the master server; the slave nodes are started automatically. Go to the /opt/hadoop-2.7.1 directory.
(1) Format the NameNode with the command ./hdfs namenode -format:
[hadoop@nmsc2 bin]# cd /opt/hadoop-2.7.1/bin/
[hadoop@nmsc2 bin]# ls
container-executor hadoop hadoop.cmd hdfs hdfs.cmd mapred mapred.cmd rcc test-container-executor yarn yarn.cmd
[hadoop@nmsc2 bin]# ./hdfs namenode -format
15/09/23 16:03:17 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = nmsc2/127.0.0.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.1
STARTUP_MSG: classpath = /opt/hadoop-2.7.1/etc/hadoop:/opt/hadoop-2.7.1/share/hadoop/common/lib/httpclient-4.2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/slf4j-api-1.7.10.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-json-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/htrace-core-3.1.0-incubating.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-framework-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-client-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/avro-1.7.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/gson-2.2.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/curator-recipes-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/mockito-all-1.8.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/paranamer-2.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsch-0.1.42.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/api-asn1-api-1.0.0-M20.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jsp-api-2.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/hadoop-annotations-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/activation-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/httpcore-4.2.5.jar:/opt/hadoop-2.7.1/share/hadoop/common/l
ib/xmlenc-0.52.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-collections-3.2.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-beanutils-1.7.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-httpclient-3.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-net-3.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-math3-3.1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/apacheds-i18n-2.0.0-M15.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jaxb-api-2.2.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/java-xmlbuilder-0.4.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/zookeeper-3.4.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-digester-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jettison-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/junit-4.11.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/stax-api-1.0-2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-configuration-1.6.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/api-util-1.0.0-M20.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/hadoop-auth-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/snappy-java-1.0.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-xc-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jackson-jaxrs-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jets3t-0.9.0.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/commons-beanutils-core-1.8.0.jar:/opt/hadoop-2.7.1/share/
hadoop/common/lib/hamcrest-core-1.3.jar:/opt/hadoop-2.7.1/share/hadoop/common/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-nfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/common/hadoop-common-2.7.1-tests.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xml-apis-1.3.04.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/htrace-core-3.1.0-incubating.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xmlenc-0.52.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/netty-all-4.0.23.Final.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/xercesImpl-2.9.1.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/lib/commons-daemon-1.0.13.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/hdfs/hadoop-hdfs-2.7.1-tests.jar:/opt/hadoop-
2.7.1/share/hadoop/hdfs/hadoop-hdfs-nfs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-json-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guice-servlet-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-logging-1.1.3.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-guice-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-cli-1.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guice-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jsr305-3.0.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/guava-11.0.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/zookeeper-3.4.6-tests.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/aopalliance-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/servlet-api-2.5.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-lang-2.6.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/javax.inject-1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/activation-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-codec-1.4.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/commons-collections-3.2.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jaxb-api-2.2.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/zookeeper-3.4.6.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jettison-1.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jetty-util-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jersey-client-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/stax-api-1.0-2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jaxb-impl-2.2.3-1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/log4j-1.2.17.jar
:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-xc-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jackson-jaxrs-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/jetty-6.1.26.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-tests-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-web-proxy-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-applications-distributedshell-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-registry-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-applications-unmanaged-am-launcher-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-sharedcachemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-api-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-nodemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-resourcemanager-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-client-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/yarn/hadoop-yarn-server-applicationhistoryservice-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/guice-servlet-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/xz-1.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-guice-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/commons-compress-1.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jackson-mapper-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-server-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/avro-1.7.4.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/guice-3.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/paranamer-2.3.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/aopalliance-1.0
.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/hadoop-annotations-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/javax.inject-1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/commons-io-2.4.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/netty-3.6.2.Final.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jersey-core-1.9.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/jackson-core-asl-1.9.13.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/junit-4.11.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/log4j-1.2.17.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/protobuf-java-2.5.0.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/snappy-java-1.0.4.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/asm-3.2.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/hamcrest-core-1.3.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/lib/leveldbjni-all-1.8.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1-tests.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-plugins-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-hs-2.7.1.jar:/opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.7.1.jar:/contrib/capacity-scheduler/*.jar
STARTUP_MSG: build = https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a; compiled by 'jenkins' on 2015-06-29T06:04Z
STARTUP_MSG: java = 1.7.0_65
************************************************************/
15/09/23 16:03:17 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
15/09/23 16:03:17 INFO namenode.NameNode: createNameNode [-format]
15/09/23 16:03:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-695216ce-3c4e-47e4-a31f-24a7e40d8791
15/09/23 16:03:18 INFO namenode.FSNamesystem: No KeyProvider found.
15/09/23 16:03:18 INFO namenode.FSNamesystem: fsLock is fair:true
15/09/23 16:03:18 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
15/09/23 16:03:18 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
15/09/23 16:03:18 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
15/09/23 16:03:18 INFO blockmanagement.BlockManager: The block deletion will start around 2015 Sep 23 16:03:18
15/09/23 16:03:18 INFO util.GSet: Computing capacity for map BlocksMap
15/09/23 16:03:18 INFO util.GSet: VM type = 64-bit
15/09/23 16:03:18 INFO util.GSet: 2.0% max memory 889 MB = 17.8 MB
15/09/23 16:03:18 INFO util.GSet: capacity = 2^21 = 2097152 entries
15/09/23 16:03:18 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
15/09/23 16:03:18 INFO blockmanagement.BlockManager: defaultReplication = 1
15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxReplication = 512
15/09/23 16:03:18 INFO blockmanagement.BlockManager: minReplication = 1
15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
15/09/23 16:03:18 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
15/09/23 16:03:18 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
15/09/23 16:03:18 INFO blockmanagement.BlockManager: encryptDataTransfer = false
15/09/23 16:03:18 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
15/09/23 16:03:18 INFO namenode.FSNamesystem: fsOwner = hadoop(auth:SIMPLE)
15/09/23 16:03:18 INFO namenode.FSNamesystem: supergroup = supergroup
15/09/23 16:03:18 INFO namenode.FSNamesystem: isPermissionEnabled = true
15/09/23 16:03:18 INFO namenode.FSNamesystem: HA Enabled: false
15/09/23 16:03:18 INFO namenode.FSNamesystem: Append Enabled: true
15/09/23 16:03:18 INFO util.GSet: Computing capacity for map INodeMap
15/09/23 16:03:18 INFO util.GSet: VM type = 64-bit
15/09/23 16:03:18 INFO util.GSet: 1.0% max memory 889 MB = 8.9 MB
15/09/23 16:03:18 INFO util.GSet: capacity = 2^20 = 1048576 entries
15/09/23 16:03:18 INFO namenode.FSDirectory: ACLs enabled? false
15/09/23 16:03:18 INFO namenode.FSDirectory: XAttrs enabled? true
15/09/23 16:03:18 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
15/09/23 16:03:18 INFO namenode.NameNode: Caching file names occuring more than 10 times
15/09/23 16:03:18 INFO util.GSet: Computing capacity for map cachedBlocks
15/09/23 16:03:18 INFO util.GSet: VM type = 64-bit
15/09/23 16:03:18 INFO util.GSet: 0.25% max memory 889 MB = 2.2 MB
15/09/23 16:03:18 INFO util.GSet: capacity = 2^18 = 262144 entries
15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
15/09/23 16:03:18 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
15/09/23 16:03:18 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
15/09/23 16:03:18 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
15/09/23 16:03:18 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
15/09/23 16:03:18 INFO util.GSet: Computing capacity for map NameNodeRetryCache
15/09/23 16:03:18 INFO util.GSet: VM type = 64-bit
15/09/23 16:03:18 INFO util.GSet: 0.029999999329447746% max memory 889 MB = 273.1 KB
15/09/23 16:03:18 INFO util.GSet: capacity = 2^15 = 32768 entries
15/09/23 16:03:18 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1469452028-127.0.0.1-1442995398776
15/09/23 16:03:18 INFO common.Storage: Storage directory /home/hadoop/dfs/name has been successfully formatted.
15/09/23 16:03:19 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/09/23 16:03:19 INFO util.ExitUtil: Exiting with status 0
15/09/23 16:03:19 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at nmsc2/127.0.0.1
************************************************************/
[hadoop@nmsc2 bin]#
(2) Start everything with ./start-all.sh, or start the parts separately with ./start-dfs.sh and ./start-yarn.sh:
[hadoop@nmsc1 bin]# cd /opt/hadoop-2.7.1/sbin/
[hadoop@nmsc1 sbin]# ./start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
15/09/23 16:48:22 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [192.168.88.21]
192.168.88.21: starting namenode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-namenode-nmsc1.out
192.168.88.22: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-datanode-nmsc2.out
Starting secondary namenodes [192.168.88.21]
192.168.88.21: starting secondarynamenode, logging to /opt/hadoop-2.7.1/logs/hadoop-hadoop-secondarynamenode-nmsc1.out
15/09/23 16:48:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
resourcemanager running as process 5881. Stop it first.
192.168.88.22: starting nodemanager, logging to /opt/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-nmsc2.out
[hadoop@nmsc1 sbin]#
(3) To stop everything, run ./stop-all.sh.
The figure below shows how the Hadoop shell scripts call one another:

(4) Run jps to see the running processes:
[hadoop@nmsc1 sbin]# jps
14201 Jps
5881 ResourceManager
13707 NameNode
13924 SecondaryNameNode
[hadoop@nmsc1 sbin]#
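12. Web access. Open the required ports first, or simply turn off the firewall.
(1) Run the command that matches your system:
systemctl stop firewalld.service (CentOS)
chkconfig iptables on (Red Hat: enable the firewall)
chkconfig iptables off (Red Hat: disable the firewall)
(2) Open http://192.168.181.66:8088/ in a browser. This is the ResourceManager web UI, where you can view all kinds of information about the cluster.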

(3) Open http://192.168.181.66:50070/ in a browser (NameNode)

(4) Open http://192.168.181.66:9001 in a browser (SecondaryNameNode)

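13. Installation is complete. This is only the beginning of a big-data application; the next step is to write programs that call the Hadoop APIs and put HDFS and MapReduce to work for your own needs.
3.6 Problems encountered
1) On startup, the DataNode reports the error "Problem connecting to server"
The fix is to edit /etc/hosts; see http://blog.csdn.net/renfengjun/article/details/25320043
2) When YARN starts, the NodeManager fails to start with the error "doesn't satisfy minimum allocations, Sending SHUTDOWN signal". The cause is that yarn.nodemanager.resource.memory-mb in yarn-site.xml gives the NodeManager too little memory. The value must not be lower than 1024 MB, which is the default of yarn.scheduler.minimum-allocation-mb.
3) HBase reports: util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using built
Fix: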
sudo rm -r /opt/hbase-1.2.1/lib/native
sudo mkdir /opt/hbase-1.2.1/lib/native
sudo mkdir /opt/hbase-1.2.1/lib/native/Linux-amd64-64
sudo cp -r /opt/hadoop-2.7.1/lib/native/* /opt/hbase-1.2.1/lib/native/Linux-amd64-64/
4) Creating an HBase table with a compression codec fails with "Compression algorithm 'snappy' previously failed test. Set hbase.table.sanity.checks to false at conf"
The create statement was: create 'signal1', { NAME => 'info', COMPRESSION => 'SNAPPY' }, SPLITS => ['00000000000000000000000000000000','10000000000000000000000000000000','20000000000000000000000000000000','30000000000000000000000000000000','40000000000000000000000000000000','50000000000000000000000000000000','60000000000000000000000000000000','70000000000000000000000000000000','80000000000000000000000000000000','90000000000000000000000000000000']
The fix is to add the following to hbase-site.xml:
<property>
  <name>hbase.table.sanity.checks</name>
  <value>false</value>
</property>
5) The NodeManager fails to start with the following error:
2016-01-26 18:45:10,891 INFO org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Registering with RM using containers :[]
2016-01-26 18:45:11,778 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from slavery01 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:270)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:196)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:271)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:486)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:533)
2016-01-26 18:45:11,781 INFO org.apache.hadoop.service.AbstractService: Service org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl failed in state STARTED; cause: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: NodeManager from slavery01 doesn't satisfy minimum allocations, Sending SHUTDOWN signal to the NodeManager.
Fix: set the following in yarn-site.xml:
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
6) WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Fix: see http://zhidao.baidu.com/link?url=_cOK3qt3yzgWwifuMGuZhSOTUyKTiYZfyHr3Xd1id345B9SvSIGsJ-mGLDsk4QseWmBnY5LjxgwHwjKQ4UTFtm8IV6J2im4QfSRh__MhzpW
7) Many people running single-node Hadoop hit the error "Hadoop hostname: Unknown host"
Fix: first check the machine's IP address with ifconfig and its host name with hostname, for example 192.168.55.128 and hadoop. Add the record 192.168.55.128 hadoop to /etc/hosts, then change localhost to hadoop in core-site.xml and mapred-site.xml. After that, run ./hdfs namenode -format and then sbin/start-all.sh.
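Problems 1) and 7) both come down to a missing host name record in /etc/hosts. The sketch below adds such a record only when the name is not mapped yet, so it can be run repeatedly. The function name ensure_hosts_entry is hypothetical, and the file is passed as an argument so the function can be tried on a copy before it touches /etc/hosts, which requires root.

```shell
# ensure_hosts_entry HOSTS_FILE IP NAME
# Append "IP NAME" to HOSTS_FILE unless a non-comment line already lists NAME
# as a host name or alias.
ensure_hosts_entry() {
    if awk -v name="$3" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$1"
    then
        return 0
    fi
    printf '%s %s\n' "$2" "$3" >> "$1"
}
```

With the example values from problem 7), the call would be ensure_hosts_entry /etc/hosts 192.168.55.128 hadoop.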
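3.7 I found another user's summary of installing hadoop2.7+hbase1.0+hive1.2 online; see the attachment "我学大数据技术(hadoop2.7+hbase1.0+hive1.2).pdf"
Other well-written articles:
Hadoop 2.7.1 distributed installation, preparation: http://my.oschina.net/allman90/blog/485352
Hadoop 2.7.1 distributed installation, installation: http://my.oschina.net/allman90/blog/486117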
3.8 Common shell commands
# List the files and directories under the HDFS path /user/
bin/hdfs dfs -ls /user/
# Upload the local file /opt/smsmessage.txt to the HDFS directory /user/
bin/hdfs dfs -put /opt/smsmessage.txt /user/
# Download the HDFS file /user/smsmessage.txt to the local directory /opt/
bin/hdfs dfs -get /user/smsmessage.txt /opt/
# Show the contents of the text file /opt/smsmessage.txt in HDFS
bin/hdfs dfs -cat /opt/smsmessage.txt
# Show the contents of the HDFS file /user/smsmessage.txt
bin/hdfs dfs -text /user/smsmessage.txt
# Delete the HDFS file /user/smsmessage.txt
bin/hdfs dfs -rm /user/smsmessage.txt
# Before running a balance operation, you can limit the network bandwidth it uses; this sets 10 MB/s, given in bytes per second (10*1024*1024)
bin/hdfs dfsadmin -setBalancerBandwidth 10485760
# Run the WordCount example that ships with Hadoop. /input must exist in HDFS and contain files; /output is the output directory, which MapReduce creates automatically
bin/hadoop jar /opt/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /input /output
# Check the health of the whole file system. Note that fsck does not itself restore under-replicated blocks; a separate NameNode thread handles that asynchronously.
cd /opt/hadoop-2.7.1/bin
./hdfs fsck /
# Set the replication factor for everything under the root directory /
cd /opt/hadoop-2.7.1/bin
./hadoop fs -setrep -R 2 /
# The following command does the same
./hdfs dfs -setrep -R 2 /
# Print detailed information about every block of a file, including the rack of each DataNode
cd /opt/hadoop-2.7.1/bin
./hdfs fsck /user/distribute-hadoop-boss/tmp/pgv/20090813/1000000103/input/JIFEN.QQ.COM.2009-08-13-18.30 -files -blocks -locations -racks
# Show the configured values of dfs.client.block.write.replace-datanode-on-failure.enable and dfs.client.block.write.replace-datanode-on-failure.policy
cd /opt/hadoop-2.7.1/bin
./hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.enable
./hdfs getconf -confKey dfs.client.block.write.replace-datanode-on-failure.policy
# Start HDFS. This reads slaves and the configuration files and starts the HDFS services on all nodes
cd /opt/hadoop-2.7.1/sbin
./start-dfs.sh
# Start YARN. This reads slaves and the configuration files and starts the YARN services on all nodes
cd /opt/hadoop-2.7.1/sbin
./start-yarn.sh
# Start a single service (namenode, secondarynamenode, journalnode or datanode) on the local machine only; replace start with stop to stop it
./hadoop-daemon.sh start namenode
./hadoop-daemon.sh start secondarynamenode
./hadoop-daemon.sh start journalnode
./hadoop-daemon.sh start datanode
# Check whether HDFS is in safe mode
[hadoop@nmsc2 bin]$ cd /opt/hadoop-2.7.1/bin
[hadoop@nmsc2 bin]$ ./hdfs dfsadmin -safemode get
Safe mode is OFF
[hadoop@nmsc2 bin]$
# Leave safe mode
[hadoop@nmsc2 bin]$ cd /opt/hadoop-2.7.1/bin
[hadoop@nmsc2 bin]$ ./hdfs dfsadmin -safemode leave
Safe mode is OFF
[hadoop@nmsc2 bin]$
# Show the values of some configuration parameters
[hadoop@nmsc1 bin]$ cd /opt/hadoop-2.7.1/bin
[hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.datanode.handler.count
100
[hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.handler.count
200
[hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.avoid.read.stale.datanode
false
[hadoop@nmsc1 bin]$ ./hdfs getconf -confKey dfs.namenode.avoid.write.stale.datanode
false
[hadoop@nmsc1 bin]$
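The safe mode check shown earlier in this section prints "Safe mode is OFF" when HDFS accepts writes. A script that must wait for that state could test the output as in the sketch below. The function name safemode_is_off is hypothetical; it reads the command output on standard input, so it can be tried without a running cluster.

```shell
# safemode_is_off
# Read the output of "hdfs dfsadmin -safemode get" on stdin and succeed only
# if it reports that safe mode is OFF.
safemode_is_off() {
    grep -q '^Safe mode is OFF'
}
```

A possible use from /opt/hadoop-2.7.1/bin: ./hdfs dfsadmin -safemode get | safemode_is_off && echo "HDFS is writable".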

- 我学大数据技术_hadoop2.7_hbase1.0_hive1.2_.pdf (5 MB)