I used JDK 1.7, which can be downloaded here.
After downloading, unpacking, and installing it, you need to configure the Java environment variables (I just edit ~/.bash_profile directly). The setup on OS X is a bit awkward; the approach commonly recommended online (see "Mac OS 上设置 JAVA_HOME") is:
export JAVA_HOME=`/usr/libexec/java_home`
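If you have multiple JDKs installed, /usr/libexec/java_home can also pin a specific version via its -v flag; a minimal ~/.bash_profile sketch (the version number is an assumption, adjust it to your install):

# pin JAVA_HOME to a 1.7 JDK; change -v to match the version you installed
export JAVA_HOME=$(/usr/libexec/java_home -v 1.7)

Then reload the profile and verify:

source ~/.bash_profile
echo $JAVA_HOME
java -version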
Setting up SSH keys
A single-node pseudo-distributed deployment needs SSH access to localhost; generate a key pair like this:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
If the SSH service is not already enabled on this machine, tick "System Preferences -> Sharing -> Remote Login".
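To verify that passwordless login works:

# should log in without prompting for a password
ssh localhost
exit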
Next, download Hadoop from the official site. The advice online is to create a dedicated user for configuring and managing the Hadoop environment; the lazy option is to just unpack it somewhere under your current user's home directory.
The environment variable setup is below; in the rest of this post, $HADOOP_HOME denotes the Hadoop root directory.
# hadoop
export HADOOP_HOME=~/Environment/hadoop-2.3.0
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
The hadoop command should work at this point; test it like so:
[ 502 ~ ]$hadoop version
Hadoop 2.3.0
Subversion http://svn.apache.org/repos/asf/hadoop/common -r 1567123
Compiled by jenkins on 2014-02-11T13:40Z
Compiled with protoc 2.5.0
From source with checksum dfe46336fbc6a044bc124392ec06b85
This command was run using /Users/chenshijiang/Environment/hadoop-2.3.0/share/hadoop/common/hadoop-common-2.3.0.jar
Before using Hadoop, a few configuration files need to be edited; in Hadoop 2.3.0 they all live under $HADOOP_HOME/etc/hadoop. The changes to each file are listed below.
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/username/Environment/hadoop-2.3.0/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
Note the hadoop.tmp.dir setting here: it works around the "Hadoop namenode fails to start" problem (see the reference at the end of this post).
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/Users/username/Environment/hadoop-2.3.0/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/Users/username/Environment/hadoop-2.3.0/hdfs/datanode</value>
  </property>
</configuration>
Note that the namenode and datanode directories must be created before Hadoop is started; see the sketch below.
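A minimal sketch, assuming the directory layout used in the configuration above:

# create the directories referenced in core-site.xml and hdfs-site.xml
mkdir -p ~/Environment/hadoop-2.3.0/tmp
mkdir -p ~/Environment/hadoop-2.3.0/hdfs/namenode
mkdir -p ~/Environment/hadoop-2.3.0/hdfs/datanode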
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
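One caveat: the Hadoop 2.x tarball ships only mapred-site.xml.template, so if mapred-site.xml does not exist yet, create it from the template first:

cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml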
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>40960</value>
  </property>
</configuration>
At this point, all the preparation work is essentially done.
In earlier Hadoop versions you could start everything with start-all.sh; that approach is now deprecated.
[ 503 ~ ]$start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
As the output suggests, we start HDFS and YARN in turn; after each step, jps shows which daemons are running:
[ 505 ~ ]$start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /log/path
localhost: starting datanode, logging to /log/path
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /log/path
[ 506 ~ ]$jps
27592 Jps
27310 NameNode
27519 SecondaryNameNode
27405 DataNode
[ 507 ~ ]$start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /log/path
localhost: starting nodemanager, logging to /log/path
[ 508 ~ ]$jps
27737 NodeManager
27777 Jps
27640 ResourceManager
27310 NameNode
27519 SecondaryNameNode
27405 DataNode
Hadoop is now up. Open localhost:50070 and localhost:8088 in a browser to see the HDFS and YARN web UIs, respectively.
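HDFS can also be sanity-checked from the command line, e.g.:

# summarize capacity and the live datanode
hdfs dfsadmin -report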
Now we can try running a job.
[ 510 ~ ]$cd $HADOOP_HOME
[ 511 ~/Environment/hadoop-2.3.0 ]$hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar pi 10 5
# log omitted here, it is rather long
Alternatively, first upload a file to HDFS with the following commands:
[ 513 ~/Environment/hadoop-2.3.0 ]$hadoop fs -mkdir hdfs://localhost:9000/user/
[ 514 ~/Environment/hadoop-2.3.0 ]$hadoop fs -mkdir hdfs://localhost:9000/user/username
[ 517 ~/Environment/hadoop-2.3.0 ]$hadoop fs -copyFromLocal README.txt hdfs://localhost:9000/user/username/readme.txt
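To confirm the upload (the username path being whatever was created above):

hadoop fs -ls hdfs://localhost:9000/user/username/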
Then run a job that uses that file:
[ 518 ~/Environment/hadoop-2.3.0 ]$hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.3.0.jar wordcount readme.txt out
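Once the job finishes, the result can be read back from HDFS; a minimal check (the part-r-00000 name assumes the default single reducer):

hadoop fs -ls out
hadoop fs -cat out/part-r-00000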
*For more Hadoop usage, run hadoop to see the built-in help, or Google it yourself.
After both jobs have run, you can inspect them in the YARN web UI (screenshot: job results).
Stopping Hadoop mirrors starting it: replace start-*.sh with stop-*.sh.
[ 521 ~/Environment/hadoop-2.3.0 ]$stop-yarn.sh
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
localhost: nodemanager did not stop gracefully after 5 seconds: killing with kill -9
no proxyserver to stop
[ 522 ~/Environment/hadoop-2.3.0 ]$stop-dfs.sh
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: stopping secondarynamenode
Problems you may hit when running jobs
14/03/21 13:49:11 INFO mapreduce.Job: Job job_1395379328591_0005 failed with state FAILED due to: Application application_1395379328591_0005 failed 2 times due to AM Container for appattempt_1395379328591_0005_000002 exited with exitCode: 127 due to: Exception from container-launch: org.apache.hadoop.util.Shell$ExitCodeException:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:418)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:195)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:283)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:79)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
This happens because the JAVA_HOME that YARN uses differs from the system's; the fix is:
[ 583 ~/Environment/hadoop-2.3.0 ]$sudo ln -s /usr/bin/java /bin/java
Password:
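An alternative to the symlink, if you prefer not to touch /bin, is to set JAVA_HOME explicitly in Hadoop's own environment script; a sketch using the same java_home helper:

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=$(/usr/libexec/java_home)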
Also, remember that the namenode must be initialized (formatted) before HDFS is started for the first time; note that formatting wipes any existing HDFS data:

hadoop namenode -format
References:
Official quickstart: http://hadoop.apache.org/docs/r1.0.4/cn/quickstart.html
Hadoop namenode fails to start (Hadoop namenode无法启动)
http://stackoverflow.com/questions/20390217/mapreduce-job-in-headless-environment-fails-n-times-due-to-am-container-exceptio