Hadoop consists of three core components: HDFS for distributed storage, MapReduce for distributed computation, and YARN for distributed resource management and scheduling.
- **NameNode**: manages the HDFS namespace, stores the block-mapping information (metadata), and handles client access to HDFS.
- **SecondaryNameNode**: not a hot standby for the NameNode. It periodically merges the namespace image (fsimage) with the edit log (edits) so the edit log does not grow without bound; its checkpoint can help recover metadata after a NameNode failure, but it cannot automatically take over as an active NameNode.
- **DataNode**: stores the actual file data. Files are split into blocks, which are stored with multiple replicas across different DataNodes.
- **ResourceManager (RM)**: the cluster-wide resource manager; allocates and schedules resources across the cluster and accepts application submissions.
- **NodeManager (NM)**: the per-node agent; manages a single node's resources, launches and monitors Containers, and reports status to the ResourceManager.
- **Container**: YARN's resource abstraction; encapsulates a slice of a node's resources (memory, CPU) in which a task runs.
- **ApplicationMaster (AM)**: one per application; negotiates Containers from the ResourceManager and coordinates the application's tasks.
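
For a concrete view of these components on a running cluster (such as the one set up below), the HDFS and YARN CLIs can report them; the path `/user/hadoop/demo.txt` here is just a hypothetical example file:

```bash
# Show how a file is split into blocks and where the replicas live
# (answered by the NameNode from its block-mapping metadata).
hdfs fsck /user/hadoop/demo.txt -files -blocks -locations

# List the NodeManagers registered with the ResourceManager.
yarn node -list

# List applications; each running application has its own ApplicationMaster.
yarn application -list
```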
Unless otherwise noted, every server should use the same configuration.
| | hadoop300 | hadoop301 | hadoop302 |
| --- | --- | --- | --- |
| NameNode | V | | |
| DataNode | V | V | V |
| SecondaryNameNode | | | V |
| ResourceManager | | V | |
| NodeManager | V | V | V |
Extract hadoop-3.1.3 and create a symlink to it under the `~/app` directory; do the same on hadoop301 and hadoop302:

```
[hadoop@hadoop300 app]$ pwd
/home/hadoop/app
[hadoop@hadoop300 app]$ ll
lrwxrwxrwx 1 hadoop hadoop 47 Feb 21 12:33 hadoop -> /home/hadoop/app/manager/hadoop_mg/hadoop-3.1.3
```
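
A minimal sketch of that step, assuming the tarball sits in the current directory and the `manager/hadoop_mg` layout shown in the listing above:

```bash
# Extract the distribution, then point a stable symlink at it so the
# environment config can reference ~/app/hadoop regardless of version.
tar -zxvf hadoop-3.1.3.tar.gz -C /home/hadoop/app/manager/hadoop_mg/
ln -s /home/hadoop/app/manager/hadoop_mg/hadoop-3.1.3 /home/hadoop/app/hadoop
```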
Edit `~/.bash_profile`:

```bash
# ============ java =========================
export JAVA_HOME=/home/hadoop/app/jdk
export PATH=$PATH:$JAVA_HOME/bin

# ======================= Hadoop ============================
export HADOOP_HOME=/home/hadoop/app/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
```
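
A quick sanity check after reloading the profile, assuming the symlinks above are in place:

```bash
source ~/.bash_profile
java -version      # should report the JDK under /home/hadoop/app/jdk
hadoop version     # should report Hadoop 3.1.3
```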
In `${HADOOP_HOME}/etc/hadoop`, add the JDK environment variable `export JAVA_HOME=/home/hadoop/app/jdk` to each of the three files hadoop-env.sh, mapred-env.sh, and yarn-env.sh.
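
A minimal one-liner sketch for appending that export to all three files, assuming they are still at their default locations:

```bash
# Append JAVA_HOME to the three env scripts in one pass.
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  echo 'export JAVA_HOME=/home/hadoop/app/jdk' >> ${HADOOP_HOME}/etc/hadoop/$f
done
```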
Edit the `${HADOOP_HOME}/etc/hadoop/core-site.xml` file:
```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop300:8020</value>
</property>
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>hadoop</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.users</name>
  <value>*</value>
</property>
```
Edit the `${HADOOP_HOME}/etc/hadoop/hdfs-site.xml` file:

```xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>hadoop302:9868</value>
</property>
```
Edit the `${HADOOP_HOME}/etc/hadoop/yarn-site.xml` file:

```xml
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop301</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>200</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.log.server.url</name>
  <value>http://hadoop300:19888/jobhistory/logs/</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
</property>
```
Edit the `${HADOOP_HOME}/etc/hadoop/mapred-site.xml` file:

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop300:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop300:19888</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
```
Edit the `${HADOOP_HOME}/etc/hadoop/workers` file to set the list of cluster nodes (tip: do not leave blank lines or stray spaces):

```
hadoop300
hadoop301
hadoop302
```
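
The configuration must be identical on all three machines. A minimal sketch for pushing it from hadoop300 to the others, assuming rsync is installed and passwordless SSH is set up between the nodes:

```bash
# Sync the Hadoop config directory to the other two nodes.
for host in hadoop301 hadoop302; do
  rsync -av ${HADOOP_HOME}/etc/hadoop/ ${host}:${HADOOP_HOME}/etc/hadoop/
done
```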
Format the NameNode before the first start (on hadoop300 only):

```
[hadoop@hadoop300 app]$ hdfs namenode -format
```
Start HDFS (on hadoop300):

```
[hadoop@hadoop300 ~]$ start-dfs.sh
Starting namenodes on [hadoop300]
Starting datanodes
Starting secondary namenodes [hadoop302]
```
Start YARN (on hadoop301, where the ResourceManager runs):

```
[hadoop@hadoop301 ~]$ start-yarn.sh
Starting resourcemanager
Starting nodemanagers
```
Start the JobHistory server (on hadoop300):

```
[hadoop@hadoop300 hadoop]$ mapred --daemon start historyserver
```
Check the daemons on every node with jps (`xcall` is a helper script that runs a command on all nodes):

```
[hadoop@hadoop300 hadoop]$ xcall jps
--------- hadoop300 ----------
16276 JobHistoryServer
30597 DataNode
19641 Jps
30378 NameNode
3242 NodeManager
--------- hadoop301 ----------
24596 DataNode
19976 Jps
27133 ResourceManager
27343 NodeManager
--------- hadoop302 ----------
24786 SecondaryNameNode
27160 NodeManager
24554 DataNode
19676 Jps
```
- HDFS NameNode web UI: hadoop300:9870
- HDFS SecondaryNameNode web UI: hadoop302:9868
- JobHistory web UI: hadoop300:19888
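
As a smoke test, the bundled examples jar can be used to submit a small MapReduce job (the jar path below assumes the stock 3.1.3 distribution layout):

```bash
# Estimate pi with 2 maps x 10 samples; the job should show up in the
# YARN UI while running and in the JobHistory UI after it finishes.
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar pi 2 10
```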
Create a cluster start/stop script, `vim hadoop.sh` (note that start-yarn.sh must run on hadoop301, where the ResourceManager lives, while HDFS and the JobHistory server are started from hadoop300):

```bash
#!/bin/bash
case $1 in
"start")
    echo "---------- Starting the Hadoop cluster ------------"
    echo "Starting HDFS"
    ssh hadoop300 "source ~/.bash_profile; start-dfs.sh"
    echo "Starting YARN"
    ssh hadoop301 "source ~/.bash_profile; start-yarn.sh"
    echo "Starting JobHistory"
    ssh hadoop300 "source ~/.bash_profile; mapred --daemon start historyserver"
    ;;
"stop")
    echo "---------- Stopping the Hadoop cluster ------------"
    echo "Stopping JobHistory"
    ssh hadoop300 "source ~/.bash_profile; mapred --daemon stop historyserver"
    echo "Stopping YARN"
    ssh hadoop301 "source ~/.bash_profile; stop-yarn.sh"
    echo "Stopping HDFS"
    ssh hadoop300 "source ~/.bash_profile; stop-dfs.sh"
    ;;
*)
    echo "Usage: $0 {start|stop}"
    ;;
esac
```
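
Usage, assuming the script runs on a node with passwordless SSH to hadoop300 and hadoop301:

```bash
chmod +x hadoop.sh
./hadoop.sh start   # bring up HDFS, YARN, and the JobHistory server
./hadoop.sh stop    # shut everything down
```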