Ubuntu Hadoop Fully Distributed Installation

References:

1.   http://hadoop.apache.org/common/docs/r0.19.2/cn/quickstart.html#%E8%BF%90%E8%A1%8CHadoop%E9%9B%86%E7%BE%A4%E7%9A%84%E5%87%86%E5%A4%87%E5%B7%A5%E4%BD%9C

2.  http://yymmiinngg.iteye.com/blog/706699

3.  http://www.linuxidc.com/Linux/2011-08/41661.htm


1. Configure the JDK

     Important: configure the environment variables

     vi /etc/profile

    Append the following at the end:

# set Java environment
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH

# optional: Tomcat and Ant (only needed if they are installed)
export CATALINA_HOME=/usr/local/tomcat
export CLASSPATH=$CLASSPATH:$CATALINA_HOME/lib
export PATH=$PATH:$CATALINA_HOME/bin

export ANT_HOME=/usr/local/ant
export PATH=$PATH:$ANT_HOME/bin
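
After saving /etc/profile, reload it and confirm the JDK is picked up (a quick sanity check; the exact version string depends on the JDK you installed):

$ source /etc/profile
$ echo $JAVA_HOME        # should print /usr/lib/jvm/java-6-sun
$ java -version          # should report the Sun JDK 1.6 version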


2. Configure SSH

Install SSH:

sudo apt-get install openssh-server


    a. Create a public/private key pair on the local host with ssh-keygen
                            [jsmith@local-host ~]$ ssh-keygen -t rsa

                            Enter file in which to save the key (/home/jsmith/.ssh/id_rsa):[Enter key]

                            Enter passphrase (empty for no passphrase): [Press enter key]

                            Enter same passphrase again: [Press enter key]

                            Your identification has been saved in /home/jsmith/.ssh/id_rsa.

                            Your public key has been saved in /home/jsmith/.ssh/id_rsa.pub.

                            The key fingerprint is: 33:b3:fe:af:95:95:18:11:31:d5:de:96:2f:f2:35:f9 jsmith@local-host

            When this finishes, a hidden .ssh folder will have been created under the home directory.

                                   $ cd .ssh

Then run ls to list the files.

 

                                  cp id_rsa.pub  authorized_keys




[hadoop@hadoop .ssh]$ scp authorized_keys   node2:/home/hadoop/.ssh/

[hadoop@hadoop .ssh]$ chmod 644 authorized_keys

b. Copy the public key to the remote host with ssh-copy-id
                            [root@Namenode ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub  root@Datanode1   // Datanode1 is the datanode's IP (or hostname)

                            root@Datanode1's password:

                            Now try logging into the machine, with "ssh 'root@Datanode1'", and check in:

                            .ssh/authorized_keys to make sure we haven't added extra keys that you weren't expecting.

                            [Note: ssh-copy-id appends the key to the remote host's .ssh/authorized_keys.]

   c. Log in to the remote host directly
                            [root@Namenode ~]# ssh Datanode1

                            Last login: Sun Nov 16 17:22:33 2008 from 192.168.1.2

                            [Note: SSH does not ask for a password.]

                            [root@Datanode1 ~]

                            [Note: you are now logged in on the remote host.]

   d. Note: everything here is executed on the Namenode, and the Namenode also needs passwordless access to itself, i.e.

            run the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys


      [Do NOT run this one]  [root@Namenode ~]# ssh-copy-id -i ~/.ssh/id_rsa.pub root@Namenode

      For the remaining nodes, simply repeat steps a-c for Datanode2 and Datanode3.

      Passwordless access must work; otherwise starting the Hadoop cluster is guaranteed to fail.
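
      Before moving on, verify passwordless login from the Namenode to every node, itself included (hostnames follow the examples above; adjust to your own). Each command should print the remote date without asking for a password:

      $ ssh Datanode1 date
      $ ssh Datanode2 date
      $ ssh Datanode3 date
      $ ssh localhost date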


3. Configure Hadoop (files under the conf folder)

       A. namenode:

                 a.  core-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://10.108.32.97:9000</value>   <!-- 10.108.32.97 is the Namenode IP -->
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/yourname/tmp</value>   <!-- note: the tmp directory must be empty -->
    </property>
</configuration>

b. hadoop-env.sh

 

# The java implementation to use.  Required.
 export JAVA_HOME=/usr/lib/jvm/java-6-sun-1.6.0.06
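
The exact JAVA_HOME directory name depends on which JDK package is installed, so it is worth confirming it exists before relying on the value above (the path below is simply the usual JVM install location on Ubuntu):

$ ls /usr/lib/jvm/
# set JAVA_HOME in conf/hadoop-env.sh to the directory actually listed there,
# e.g. /usr/lib/jvm/java-6-sun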

           c.  hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/yourname/hdfs/name</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/home/yourname/hdfs/data</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
</configuration>
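
It is safest to create the directories referenced by dfs.name.dir, dfs.data.dir, and hadoop.tmp.dir ahead of time on the relevant nodes (and keep the tmp directory empty); a minimal sketch, assuming the example paths above:

$ mkdir -p /home/yourname/hdfs/name /home/yourname/hdfs/data /home/yourname/tmp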
       


          d. mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
    <property>
        <name>mapred.job.tracker</name>
        <value>10.108.32.97:9001</value>
    </property>
</configuration>
     e. conf/masters:

     the IP address of the namenode

    f.  conf/slaves:

     the IP addresses of the datanodes (one per line)
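
     For example, with the Namenode IP used above and two hypothetical datanode addresses (replace them with your real ones), the two files would look like:

     conf/masters:
     10.108.32.97

     conf/slaves:
     10.108.32.98
     10.108.32.99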

g. scp -r /home/yourname/hadoop slave1:/home/dataname1/

   scp -r /home/yourname/hadoop slave2:/home/dataname2/
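
   A quick check that the copy landed where Hadoop expects it (hostnames and paths follow the example above):

   ssh slave1 ls /home/dataname1/hadoop/conf
   ssh slave2 ls /home/dataname2/hadoop/conf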


B. Format a new distributed filesystem:
$ bin/hadoop namenode -format

Start the Hadoop daemons:
$ bin/start-all.sh

The Hadoop daemon logs are written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interfaces for the NameNode and the JobTracker; by default they are available at:

NameNode - http://localhost:50070/
JobTracker - http://localhost:50030/
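
To confirm the daemons actually started, run jps (it ships with the JDK) on each node, and ask HDFS for a datanode report:

$ jps                             # on the namenode: NameNode, SecondaryNameNode, JobTracker
$ jps                             # on each datanode: DataNode, TaskTracker
$ bin/hadoop dfsadmin -report     # every datanode should be listed as available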

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run one of the example programs provided with the distribution:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
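
Once the job finishes, its results land in the output directory on HDFS (as part-* files); you can list it with:

$ bin/hadoop fs -ls output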

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files directly on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you are finished, stop the daemons:
$ bin/stop-all.sh
