

Hadoop Effective Space = (MaxAllocFactor * DiskSize * (#Disk - RaidDisks)) / ReplicationFactor

MaxAllocFactor = Maximum recommended allocation of raw disk space, 75% for Hadoop (the remaining 25% is left for temporary data)

DiskSize = Size of your drive

#Disk = Number of drives

RaidDisks = Number of disks consumed by RAID; for Hadoop this is 0

ReplicationFactor = Number of copies kept of each piece of data; Hadoop recommends three copies, so the replication factor is 3.
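
The formula above can be sketched as a small Python function. This is only an illustration: the function name `effective_space_tb` and its parameter names are hypothetical, chosen to mirror the terms defined above, and the defaults are the values this section recommends for Hadoop.

```python
def effective_space_tb(disk_size_tb, num_disks, raid_disks=0,
                       max_alloc_factor=0.75, replication_factor=3):
    """Effective HDFS space of one node, in TB (hypothetical helper).

    Mirrors: (MaxAllocFactor * DiskSize * (#Disk - RaidDisks)) / ReplicationFactor
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    usable_disks = num_disks - raid_disks
    if usable_disks < 0:
        raise ValueError("raid_disks cannot exceed num_disks")
    return max_alloc_factor * disk_size_tb * usable_disks / replication_factor


# Example: a node with twelve 2TB drives in JBOD (no RAID disks)
print(effective_space_tb(disk_size_tb=2, num_disks=12))  # 6.0
```

With the defaults, only a quarter of the raw capacity (0.75 / 3) ends up as effective space, which is why twelve 2TB drives yield about 6TB.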


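Calculating the usable space per node:
Assume a replication factor of 3 and that temporary space takes up 25% of the raw disk space. Under these assumptions, the number of hosts needed to process 10TB of data on a cluster whose hosts each have 2TB of disk space is calculated as follows:
1. Divide the host's total storage by the replication factor
   2TB / 3 ≈ 666 GB

2. From that, subtract the 25% reserved for temporary data
   666 GB * 0.75 ≈ 500 GB

3. So each node with 2TB of disk storage has only about 500GB of usable space

4. Divide the dataset size by this value to get the number of nodes required
   10TB / 500 GB = 20

So a cluster that processes 10TB of data needs at least 20 nodes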
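
The same steps can be written as a short Python sketch. The function name `nodes_required` and its parameters are hypothetical, introduced only for illustration. It uses `fractions.Fraction` so that 2/3 is not rounded partway through, and rounds the result up because a partial node is not possible.

```python
import math
from fractions import Fraction


def nodes_required(dataset_tb, node_raw_tb, replication_factor=3,
                   temp_fraction=Fraction(1, 4)):
    """Minimum node count for a dataset (hypothetical helper).

    Step 1: divide raw node storage by the replication factor.
    Step 2: keep only the share not reserved for temporary data.
    Step 4: divide the dataset size by the usable space and round up.
    """
    per_replica = Fraction(node_raw_tb) / replication_factor
    usable_tb = per_replica * (1 - Fraction(temp_fraction))
    return math.ceil(Fraction(dataset_tb) / usable_tb)


# The worked example: 10TB of data on nodes with 2TB of disk each
print(nodes_required(dataset_tb=10, node_raw_tb=2))  # 20
```

Because the usable space per node works out to exactly 0.5TB here, the node count is simply twice the dataset size in TB.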


Here are the recommended specifications for DataNode/TaskTracker nodes in a balanced Hadoop cluster:

1. 12-24 1-4TB hard disks in a JBOD (Just a Bunch Of Disks) configuration


2. 2 quad-/hex-/octo-core CPUs, running at least 2-2.5GHz


3. 64-512GB of RAM


4. Bonded Gigabit Ethernet or 10 Gigabit Ethernet (the higher the storage density, the more network throughput is needed)



Here are the recommended specifications for NameNode/JobTracker/Standby NameNode nodes. The drive count will fluctuate depending on the amount of redundancy:

1. 4–6 1TB hard disks in a JBOD configuration (1 for the OS, 2 for the FS image [RAID 1], 1 for Apache ZooKeeper, and 1 for the Journal node)


2. 2 quad-/hex-/octo-core CPUs, running at least 2-2.5GHz


3. 64-128GB of RAM


4. Bonded Gigabit Ethernet or 10 Gigabit Ethernet

Below is a list of various hardware configurations for different workloads, including our original “balanced” recommendation:

1. Light Processing Configuration (1U/machine): Two hex-core CPUs, 24-64GB memory, and 8 disk drives (1TB or 2TB)


2. Balanced Compute Configuration (1U/machine): Two hex-core CPUs, 48-128GB memory, and 12-16 disk drives (1TB or 2TB) directly attached using the motherboard controller. These are often available as twins with two motherboards and 24 drives in a single 2U cabinet.


3. Storage Heavy Configuration (2U/machine): Two hex-core CPUs, 48-96GB memory, and 16-24 disk drives (2TB-4TB). This configuration will cause high network traffic if multiple nodes or racks fail, because HDFS must re-replicate the large volume of data those nodes held.


4. Compute Intensive Configuration (2U/machine): Two hex-core CPUs, 64-512GB memory, and 4-8 disk drives (1TB or 2TB)