Reference: http://hadoop.apache.org/docs/r1.2.1/index.html
1. Hadoop single-node deployment (test environment)
[root@lv1 ~]# useradd -u 800 hadoop
[root@lv1 ~]# passwd hadoop
[hadoop@lv1 ~]$ ssh-keygen
[hadoop@lv1 ~]$ ssh-copy-id localhost
[hadoop@lv1 ~]$ ssh localhost
[hadoop@lv1 ~]$ logout
[hadoop@lv1 ~]$ ssh 192.168.2.145
[hadoop@lv1 ~]$ logout
[hadoop@lv1 ~]$ ssh lv1.example.com
[hadoop@lv1 ~]$ logout
lftp i:/> get pub/docs/hadoop/hadoop-1.2.1.tar.gz
lftp i:/> get pub/docs/java/jdk-6u32-linux-x64.bin
[hadoop@lv1 ~]$ tar zxf hadoop-1.2.1.tar.gz
[hadoop@lv1 ~]$ chown -R hadoop.hadoop hadoop-1.2.1/
[hadoop@lv1 ~]$ ln -s hadoop-1.2.1/ hadoop
[hadoop@lv1 ~]$ sh jdk-6u32-linux-x64.bin
[hadoop@lv1 ~]$ mv jdk1.6.0_32/ hadoop
[hadoop@lv1 ~]$ cd hadoop
[hadoop@lv1 hadoop]$ ln -s jdk1.6.0_32/ jdk
[hadoop@lv1 hadoop]$ vim conf/hadoop-env.sh    # uncomment and set: export JAVA_HOME=/home/hadoop/hadoop/jdk
Test
[hadoop@lv1 hadoop]$ cp -r conf/ input
[hadoop@lv1 hadoop]$ bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'    # grep/filter test
[hadoop@lv1 hadoop]$ cat output/part-00000
1	dfs.server.namenode.
1	dfsadmin
[hadoop@lv1 hadoop]$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount input test    # word-count test
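In standalone mode the job output lands in a local directory, so the word counts can be inspected directly. A quick look at the top results (a sketch, assuming the wordcount job above wrote to the local test/ directory):

[hadoop@lv1 hadoop]$ cat test/* | sort -k2 -rn | head    # words with the highest counts first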
2. Pseudo-distributed Hadoop
All MapReduce work runs on a single node.
[hadoop@lv1 ~]$ mkdir bin
[hadoop@lv1 ~]$ ln -s /home/hadoop/hadoop/jdk/bin/jps ~/bin/    # puts the jps command on PATH
Configuration files
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
[hadoop@lv1 hadoop]$ bin/hadoop namenode -format    # format (initialize) the NameNode
[hadoop@lv1 hadoop]$ bin/start-all.sh               # start all daemons
[hadoop@lv1 hadoop]$ jps
2641 NameNode             # the storage master
2912 JobTracker           # dispatches jobs; co-located with the master
3032 TaskTracker          # runs tasks
2744 DataNode             # stores data blocks
2847 SecondaryNameNode
Monitor via the web UIs:
NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
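The same UIs can be sanity-checked from the shell. A minimal sketch using curl against the standard Hadoop 1.x web ports; each should return HTTP 200:

[hadoop@lv1 hadoop]$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/    # NameNode UI
[hadoop@lv1 hadoop]$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50030/    # JobTracker UI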
[hadoop@lv1 hadoop]$ bin/hadoop fs -mkdir input
[hadoop@lv1 hadoop]$ bin/hadoop fs -put conf/* input
[hadoop@lv1 hadoop]$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount input output
[hadoop@lv1 hadoop]$ bin/hadoop fs -cat output/*
3. Fully distributed
Lab environment:
192.168.2.145  lv1.example.com    runs SecondaryNameNode, JobTracker, NameNode
192.168.2.146  node1.example.com  runs TaskTracker, DataNode
192.168.2.189  node2.example.com  runs TaskTracker, DataNode
192.168.2.142  node3.example.com  used later to demonstrate adding/removing a node
Passwordless SSH from master to slaves
[root@node1 ~]# useradd -u 800 hadoop
[root@node2 ~]# useradd -u 800 hadoop
[root@node1 ~]# passwd hadoop
[hadoop@lv1 ~]$ scp -r .ssh/ node1.example.com:
[hadoop@lv1 ~]$ scp -r .ssh/ node2.example.com:
[hadoop@lv1 ~]$ ssh node1.example.com    # verify passwordless login
[hadoop@lv1 ~]$ ssh node2.example.com
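Note that scp-ing the whole .ssh/ directory also copies the master's private key onto every slave. If only key-based login is needed, ssh-copy-id installs just the public key; a sketch, assuming password authentication is still enabled on the slaves:

[hadoop@lv1 ~]$ for h in node1.example.com node2.example.com; do ssh-copy-id $h; done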
Edit the configuration files
[hadoop@lv1 hadoop]$ vim conf/masters
lv1.example.com
[hadoop@lv1 hadoop]$ vim conf/slaves
node1.example.com
node2.example.com
[hadoop@lv1 hadoop]$ vim conf/core-site.xml     # change localhost to lv1.example.com
[hadoop@lv1 hadoop]$ vim conf/mapred-site.xml   # change localhost to lv1.example.com
[hadoop@lv1 hadoop]$ vim conf/hdfs-site.xml     # set dfs.replication to 2, so each block is kept in 2 copies
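The three edits above can also be scripted. A sketch using sed, assuming the files still contain the pseudo-distributed values from section 2:

[hadoop@lv1 hadoop]$ sed -i 's/localhost/lv1.example.com/' conf/core-site.xml     # fs.default.name -> hdfs://lv1.example.com:9000
[hadoop@lv1 hadoop]$ sed -i 's/localhost/lv1.example.com/' conf/mapred-site.xml   # mapred.job.tracker -> lv1.example.com:9001
[hadoop@lv1 hadoop]$ sed -i 's#<value>1</value>#<value>2</value>#' conf/hdfs-site.xml   # dfs.replication = 2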
[hadoop@lv1 ~]$ scp -r hadoop-1.2.1/ node1.example.com:
[hadoop@lv1 ~]$ scp -r hadoop-1.2.1/ node2.example.com:
Create the symlinks on both worker nodes
[hadoop@node1 ~]$ ln -s hadoop-1.2.1/ hadoop
[hadoop@node1 ~]$ mkdir bin
[hadoop@node1 ~]$ cd bin/
[hadoop@node1 bin]$ ln -s ~/hadoop/jdk/bin/jps .    # repeat the same steps on node2
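If the pseudo-distributed daemons from section 2 are still running on lv1, stop them before reformatting; this is an easy step to miss:

[hadoop@lv1 hadoop]$ bin/stop-all.sh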
Reformat the NameNode
[hadoop@lv1 hadoop]$ bin/hadoop namenode -format
[hadoop@lv1 hadoop]$ bin/start-all.sh    # start Hadoop
starting namenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-namenode-lv1.example.com.out
node1.example.com: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-node1.example.com.out
node2.example.com: starting datanode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-datanode-node2.example.com.out
lv1.example.com: starting secondarynamenode, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-secondarynamenode-lv1.example.com.out
starting jobtracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-jobtracker-lv1.example.com.out
node1.example.com: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-node1.example.com.out
node2.example.com: starting tasktracker, logging to /home/hadoop/hadoop-1.2.1/libexec/../logs/hadoop-hadoop-tasktracker-node2.example.com.out
Test
[hadoop@lv1 hadoop]$ bin/hadoop fs -put conf input
[hadoop@lv1 hadoop]$ bin/hadoop fs -ls
drwxr-xr-x   - hadoop supergroup          0 2014-06-15 02:19 /user/hadoop/input
[hadoop@lv1 hadoop]$ bin/hadoop jar hadoop-examples-1.2.1.jar grep input output 'dfs[a-z.]+'
[hadoop@lv1 hadoop]$ bin/hadoop fs -cat output/*    # view the result
Browse http://192.168.2.145:50030 (JobTracker) and http://192.168.2.145:50070 (NameNode) to monitor the job and HDFS.
Dynamically adding and removing nodes
[hadoop@lv1 hadoop]$ dd if=/dev/zero of=bigfile bs=1M count=100
[hadoop@lv1 hadoop]$ bin/hadoop fs -mkdir files
[hadoop@lv1 hadoop]$ bin/hadoop fs -put bigfile files    # upload into HDFS
[hadoop@lv1 hadoop]$ bin/hadoop dfsadmin -report         # with replication 2, each of the two datanodes holds ~100 MB
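To see exactly where the blocks of bigfile landed, fsck can list blocks together with their replica locations (the path assumes the default /user/hadoop home directory used above):

[hadoop@lv1 hadoop]$ bin/hadoop fsck /user/hadoop/files/bigfile -files -blocks -locations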
[root@node3 ~]# useradd -u 800 hadoop
[root@node3 ~]# echo mmmmmm | passwd --stdin hadoop
[hadoop@lv1 ~]$ scp -r .ssh/ node3.example.com:
[hadoop@node3 bin]$ ln -s ~/hadoop-1.2.1/jdk/bin/jps .
[hadoop@node3 ~]$ ln -s hadoop-1.2.1/ hadoop
[hadoop@lv1 hadoop]$ vim conf/slaves    # add node3.example.com
[hadoop@node3 hadoop]$ bin/hadoop-daemon.sh start datanode
[hadoop@node3 hadoop]$ bin/hadoop-daemon.sh start tasktracker
[hadoop@lv1 hadoop]$ bin/hadoop dfsadmin -report    # now shows 3 nodes, but the data is not yet balanced
[hadoop@node3 hadoop]$ bin/start-balancer.sh        # start the balancer
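Removal is the other half of this section's title. In Hadoop 1.x a DataNode is decommissioned through an exclude file plus dfsadmin -refreshNodes; a sketch of that procedure, where the file name conf/datanode-excludes is my own choice:

[hadoop@lv1 hadoop]$ echo node3.example.com > conf/datanode-excludes
[hadoop@lv1 hadoop]$ vim conf/hdfs-site.xml    # add: <property><name>dfs.hosts.exclude</name><value>/home/hadoop/hadoop/conf/datanode-excludes</value></property>
[hadoop@lv1 hadoop]$ bin/hadoop dfsadmin -refreshNodes    # the NameNode starts re-replicating node3's blocks elsewhere
[hadoop@lv1 hadoop]$ bin/hadoop dfsadmin -report          # wait until node3 is reported as Decommissioned
[hadoop@node3 hadoop]$ bin/hadoop-daemon.sh stop datanode
[hadoop@node3 hadoop]$ bin/hadoop-daemon.sh stop tasktracker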