Hadoop 3.x New Features
Features | Hadoop 2.x | Hadoop 3.x |
---|---|---|
Minimum Required Java Version | JDK 6 and above for early 2.x releases; JDK 7 and above from Hadoop 2.7 onward. | JDK 8 is the minimum Java runtime version required to run Hadoop 3.x, because many of its dependency libraries require JDK 8. |
Fault Tolerance | Fault tolerance is handled through replication (3 replicas by default), which adds storage and network bandwidth overhead. | Support for Erasure Coding in HDFS provides fault tolerance at a lower storage cost (1 unit of data + 0.5 units of parity = 1.5x disk usage). |
Storage Scheme | Follows a 3x replication scheme for data recovery, leading to 200% storage overhead. For instance, if there are 8 data blocks, a total of 24 blocks will occupy the storage space. | Storage overhead in Hadoop 3.x is reduced to 50% with support for Erasure Coding. In this case, if there are 8 data blocks, a total of only 12 blocks will occupy the storage space. |
Change in Port Numbers | HDFS NameNode: 8020; HDFS DataNode: 50010; Secondary NameNode HTTP: 50090 | HDFS NameNode: 9820 (the 3.0.0 default; later 3.x releases reverted it to 8020); HDFS DataNode: 9866; Secondary NameNode HTTP: 9868 |
YARN Timeline Service | The YARN Timeline Service (v1.x) in Hadoop 2.x relies on a single writer/reader instance and has scalability issues. | The YARN Timeline Service has been enhanced with ATS v2, which improves scalability and reliability. |
Intra-DataNode Balancing | The HDFS Balancer in Hadoop 2.x balances data only across DataNodes, so skew between the disks inside a single DataNode, caused by adding or replacing disks, is left unaddressed. | Intra-DataNode balancing (the `hdfs diskbalancer` tool) has been introduced in Hadoop 3.x to address the skew that occurs when disks are added or replaced. |
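Number of NameNodes | Hadoop 2.x HA supports one active NameNode and a single standby NameNode (one active, one standby). The Secondary NameNode is a checkpointing helper, not a standby. | Hadoop 3.x supports more than two NameNodes: one active and multiple standbys. |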
Heap Size | In Hadoop 2.x, the heap size for map and reduce tasks must be set through two related properties, mapreduce.{map,reduce}.java.opts and mapreduce.{map,reduce}.memory.mb. | In Hadoop 3.x, the heap size or mapreduce.{map,reduce}.memory.mb is derived automatically from whichever of the two is set. |
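
The storage figures in the table can be checked with a little arithmetic. The following is a minimal sketch, not Hadoop code: the function names are made up for illustration, and the 6 data + 3 parity layout is assumed because it matches HDFS's default RS-6-3 erasure coding policy and the "1 data + 0.5 parity" ratio quoted above. It also assumes ideal, fully striped data; small files carry proportionally more parity.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Extra storage per unit of data under N-way replication (3 replicas -> 2, i.e. 200%)."""
    return Fraction(replicas - 1, 1)


def erasure_coding_overhead(data_units: int, parity_units: int) -> Fraction:
    """Extra storage per unit of data under a (data, parity) erasure code (6, 3 -> 1/2, i.e. 50%)."""
    return Fraction(parity_units, data_units)


def raw_storage(logical_blocks: int, overhead: Fraction) -> Fraction:
    """Total blocks' worth of raw storage consumed by `logical_blocks` of user data."""
    return logical_blocks * (1 + overhead)


# The table's example: 8 data blocks.
print(raw_storage(8, replication_overhead(3)))        # 3x replication
print(raw_storage(8, erasure_coding_overhead(6, 3)))  # RS(6,3) erasure coding
```

Under these assumptions the two calls give 24 and 12 blocks, which are the numbers in the Storage Scheme row.
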
HDFS HA logic
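- Add `JournalNode` processes, which share and push edit-log information between the active and standby NameNodes. `JournalNode` is a log service that Hadoop implements based on the Paxos protocol.
- Add a ZooKeeper cluster that makes the leader-election decision, configured through `ha.zookeeper.quorum`. The ZooKeeper server process shows up as `QuorumPeerMain`.
- Add a ZooKeeper failover controller (ZKFC) process on each NameNode machine. It monitors the NameNode on the same machine, takes part in the election, and switches the local NameNode between the active and standby states. The ZKFC process shows up as `DFSZKFailoverController`.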
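
To make the pieces above concrete, here is a rough sketch of how the JournalNode and ZooKeeper settings fit together. The property names (`dfs.nameservices`, `dfs.ha.namenodes.*`, `dfs.namenode.shared.edits.dir`, `dfs.ha.automatic-failover.enabled`, `ha.zookeeper.quorum`) and the default ports 8485 and 2181 are real Hadoop and ZooKeeper defaults. The helper functions, the nameservice `mycluster`, and the host names are hypothetical and only serve the illustration. A working HA setup needs further properties (per-NameNode RPC addresses, a client failover proxy provider, fencing methods) that are left out here.

```python
import xml.etree.ElementTree as ET


def ha_properties(nameservice, namenode_ids, journal_hosts, zk_hosts):
    """Return (hdfs_site, core_site) property dicts for a JournalNode-based HA setup.

    Hypothetical helper: only the core HA properties are produced.
    """
    # qjournal URI: JournalNode RPC endpoints joined by ';', followed by the journal id.
    journal_uri = (
        "qjournal://"
        + ";".join(f"{host}:8485" for host in journal_hosts)
        + f"/{nameservice}"
    )
    hdfs_site = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenode_ids),
        "dfs.namenode.shared.edits.dir": journal_uri,
        # Lets the ZKFC processes perform automatic failover.
        "dfs.ha.automatic-failover.enabled": "true",
    }
    core_site = {
        # ZooKeeper ensemble used by ZKFC for election, as host:port pairs.
        "ha.zookeeper.quorum": ",".join(f"{host}:2181" for host in zk_hosts),
    }
    return hdfs_site, core_site


def to_configuration_xml(props):
    """Render a property dict in the <configuration><property>... layout of *-site.xml."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Example with made-up hosts: one active plus two standby NameNodes.
hdfs_site, core_site = ha_properties(
    "mycluster",
    ["nn1", "nn2", "nn3"],
    ["jn1", "jn2", "jn3"],
    ["zk1", "zk2", "zk3"],
)
print(to_configuration_xml(hdfs_site))
print(to_configuration_xml(core_site))
```

An odd number of JournalNodes and ZooKeeper servers is the usual choice, since both rely on a majority being reachable.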