HDFS failure: NameNode in safe mode: The reported blocks 12xx needs additional xx blocks to reach the threshold 0.999

1 In a CDH environment, HDFS cannot perform any operation; every request fails with an error saying the connection to the NameNode failed because it is in safe mode.

 

2 The HDFS instance view in Cloudera Manager shows a red warning: cannot create /tmp/.cloudera.....

 

3 Check the NameNode log:     /var/log/hadoop-hdfs/hadoop-cmf-hdfs-NAMENODE-cdh00.log.out

Error message:

org.apache.hadoop.ipc.RetriableException: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /tmp/hive/hive/8d99a26e-35a2-4416-9070-0bb0a5db1c51. Name node is in safe mode.
The reported blocks 505 needs additional 1 blocks to reach the threshold 0.9990 of total blocks 507.
The number of live datanodes 3 has reached the minimum number 1. Safe mode will be turned off automatically once the thresholds have been reached. NamenodeHostName:cdh00
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkNameNodeSafeMode(FSNamesystem.java:1439)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3100)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

4 Analysis:

In plain terms: the DataNodes have reported XXXXX blocks to the NameNode, but XX more blocks are still needed to reach 0.999 of the total yyyyy blocks (in this log: 505 reported out of 507, 1 more needed).

The fraction of blocks lost by the DataNodes exceeds the configured threshold, so the NameNode automatically stays in safe mode and rejects all write operations.
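Both the current state and the threshold can be inspected from the command line. A minimal sketch, assuming the hdfs client is on the PATH of the CDH node (the property names are stock Hadoop ones; CDH may override their values in Cloudera Manager):

    # current safe mode state (Safe mode is ON / OFF)
    sudo -u hdfs hdfs dfsadmin -safemode get

    # block-report fraction required to leave safe mode (default 0.999)
    hdfs getconf -confKey dfs.namenode.safemode.threshold-pct

    # minimum number of live DataNodes required (default 0)
    hdfs getconf -confKey dfs.namenode.safemode.min.datanodes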

5 Fix, the blunt approach: delete the files whose blocks are corrupt or missing:

    Step 1: Run this command to leave safe mode (hadoop dfsadmin is deprecated; use hdfs dfsadmin): sudo -u hdfs hdfs dfsadmin -safemode leave

    Step 2: Run a health check and delete the files with corrupt or missing blocks. Note that -delete removes those files permanently, so the affected data is lost.

    sudo -u hdfs hdfs fsck / -delete
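Before resorting to -delete, it is worth enumerating exactly which files are affected, and optionally quarantining rather than deleting them. A sketch using standard fsck flags:

    # list only the corrupt files/blocks, without touching them
    sudo -u hdfs hdfs fsck / -list-corruptfileblocks

    # move corrupted files to /lost+found instead of deleting them
    sudo -u hdfs hdfs fsck / -move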

6 In production, recovery is usually attempted first: locate the blocks and identify which data is missing.

(1) List the files, blocks, and block locations under /:    sudo -u hdfs hdfs fsck / -files -blocks -locations
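On a large namespace the full fsck report is long; filtering it for the problem entries narrows things down quickly. A minimal sketch (the exact status strings can vary slightly between Hadoop versions):

    # keep only the lines that mention corrupt or missing blocks
    sudo -u hdfs hdfs fsck / -files -blocks -locations | egrep -i 'corrupt|missing'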

     


(2) Recover the lease on the files found above:

    sudo -u hdfs hdfs debug recoverLease -path <path> [-retries <num-retries>]

    -retries is the number of attempts; the default is 1.
    -path must be a file, not a directory; identify the specific file from the fsck output in step (1).

    Example:    hdfs debug recoverLease -path /tmp/test.log
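When fsck reports many files still open for write (a common reason for the one missing block that keeps the NameNode in safe mode), the lease recovery can be looped over all of them. A hypothetical sketch; the fsck output format differs between Hadoop versions, so verify the grep/awk extraction against your cluster's output before running it:

    # list files open for write and try to recover each lease
    for f in $(sudo -u hdfs hdfs fsck / -openforwrite 2>/dev/null \
               | grep OPENFORWRITE | awk '{print $1}' | grep '^/'); do
      sudo -u hdfs hdfs debug recoverLease -path "$f" -retries 3
    done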

 
 
