I. HDFS NameNode HA
In Hadoop 1.0, the NameNode is a single point of failure in an HDFS cluster: when the NameNode is unavailable, the whole HDFS cluster is unavailable. Likewise, if you need to take the NameNode offline temporarily for maintenance or upgrade, the HDFS cluster cannot be used while it is stopped.
HA solves this single-point-of-failure problem to a large extent.
2. Key points of namenode HA
1) The metadata management mode must change:
Each NameNode keeps a copy of the metadata in memory;
Only the NameNode in the Active state can write the edits log;
Both NameNodes can read the edits log;
The shared edits log is kept in shared storage (the two mainstream implementations are qjournal and NFS);
2) A state-management module is needed:
Hadoop implements a ZKFailoverController (ZKFC), a process resident on each NameNode host. Each ZKFC monitors its local NameNode and uses ZooKeeper to track state. When a state switch is needed, ZKFC performs the switch, and split-brain must be prevented during the switch.
3) Passwordless SSH login must be configured between the two NameNodes. This is needed later for isolation: SSH to the other NameNode host and kill its NameNode process completely, to prevent split-brain.
4) Fencing: only one NameNode provides service to the outside at any given time.
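The arrangement in point 1) — a single active writer for the shared edits log, while every NameNode keeps a copy of the metadata in memory — can be sketched as a toy model. This is plain illustrative Python with invented names (`SharedEditLog`, `NameNode`), not Hadoop code:

```python
class SharedEditLog:
    """Toy stand-in for the shared storage (qjournal/NFS) holding edits."""
    def __init__(self):
        self.edits = []          # append-only list of edit records
        self.writer = None       # alias of the NameNode allowed to write

    def append(self, who, edit):
        if who != self.writer:   # only the active NameNode may write edits
            raise PermissionError(f"{who} is not the active writer")
        self.edits.append(edit)

class NameNode:
    """Toy NameNode keeping a copy of the metadata in memory."""
    def __init__(self, name, log):
        self.name, self.log = name, log
        self.metadata = {}       # in-memory namespace, e.g. path -> size

    def write(self, path, size):
        # Active-only operation: log the edit first, then apply it in memory.
        self.log.append(self.name, (path, size))
        self.metadata[path] = size

    def catch_up(self):
        # Standby replays the shared edits into its own in-memory copy.
        for path, size in self.log.edits:
            self.metadata[path] = size

log = SharedEditLog()
nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)
log.writer = "nn1"               # nn1 is active

nn1.write("/a.txt", 3)
nn2.catch_up()                   # the standby reads the same edits
print(nn2.metadata)              # {'/a.txt': 3}

try:
    nn2.write("/b.txt", 1)       # the standby must not write edits
except PermissionError as e:
    print("rejected:", e)
```

The point of the single-writer rule is exactly the fencing requirement in point 4): if both NameNodes could append edits, the shared log would diverge.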
3. Automatic failover mechanism of namenode HA
In addition to the two NameNodes, automatic failover adds two components: a ZooKeeper cluster and the ZKFailoverController (ZKFC).
ZKFC is a ZooKeeper client responsible for monitoring the state of the NameNode; one ZKFC process runs on each NameNode host.
1) Health monitoring:
ZKFC periodically pings the NameNode on the same host with a health-check command. As long as the NameNode responds in time with a healthy status, ZKFC considers the node healthy. If the node crashes, freezes, or otherwise enters an unhealthy state, the health monitor marks it as unhealthy.
2) ZooKeeper session management:
While the local NameNode is healthy, ZKFC keeps a session open in ZooKeeper. If the local NameNode is active, ZKFC also holds a special lock znode, which relies on ZooKeeper's support for short-lived (ephemeral) nodes: if the session ends, the lock node is deleted automatically.
ZKFC creates a node /hadoop-ha/<HA cluster name> in ZooKeeper, which has two child nodes:
ActiveBreadCrumb: a persistent node whose value records the HA cluster name, the alias of the active node, and the address of the active node. Clients that want to reach the NameNode service read it to find the address of the active NameNode, so it must be a persistent node.
ActiveStandbyElectorLock: an ephemeral node whose value records the same information. It acts as an exclusive lock: only the holder of this node may modify the value of ActiveBreadCrumb. Because it is ephemeral, the node exists as long as the active NameNode's session with ZooKeeper stays alive. The standby NameNode is also connected to ZooKeeper, but when it sees that the ephemeral node already exists, it knows the lock is taken and does nothing. When the active NameNode fails, its ZKFC's connection to ZooKeeper is closed and the ephemeral node disappears. The standby's ZKFC then recreates the ephemeral node, which amounts to acquiring the lock, updates the value of ActiveBreadCrumb, and its NameNode naturally becomes the new active.
3) ZooKeeper-based election:
If the local NameNode is healthy and ZKFC finds that no other node currently holds the lock znode, it tries to acquire the lock itself. If it succeeds, it has won the election and is responsible for running the failover so that its local NameNode becomes active.
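The lock handover described above can be modeled with a toy in-memory "ZooKeeper". This is a conceptual sketch in plain Python with invented names (`ToyZooKeeper`, `try_become_active`); the real ZKFC uses the ZooKeeper client API, not this code:

```python
class ToyZooKeeper:
    """Minimal stand-in for ZooKeeper: znodes live in a dict, and ephemeral
    znodes are deleted automatically when their owner's session ends."""
    def __init__(self):
        self.znodes = {}     # path -> (value, owner); owner None = persistent

    def create(self, path, value, owner=None):
        if path in self.znodes:
            return False     # someone already holds this node
        self.znodes[path] = (value, owner)
        return True

    def set(self, path, value):
        owner = self.znodes.get(path, (None, None))[1]
        self.znodes[path] = (value, owner)

    def close_session(self, owner):
        # Drop every ephemeral node created by this session.
        self.znodes = {p: (v, o) for p, (v, o) in self.znodes.items()
                       if o != owner}

LOCK = "/hadoop-ha/mycluster/ActiveStandbyElectorLock"
CRUMB = "/hadoop-ha/mycluster/ActiveBreadCrumb"

def try_become_active(zk, nn):
    """A ZKFC tries to grab the ephemeral lock; on success it also
    updates the persistent breadcrumb that clients read."""
    if zk.create(LOCK, nn, owner=nn):   # ephemeral lock: owner = this session
        zk.create(CRUMB, nn)            # persistent; no-op if it already exists
        zk.set(CRUMB, nn)               # record the new active NameNode
        return True
    return False

zk = ToyZooKeeper()
print(try_become_active(zk, "nn1"))   # True  -> nn1 becomes active
print(try_become_active(zk, "nn2"))   # False -> lock taken, nn2 stays standby

zk.close_session("nn1")               # active dies; the ephemeral lock vanishes
print(try_become_active(zk, "nn2"))   # True  -> nn2 takes over the lock
print(zk.znodes[CRUMB][0])            # nn2
```

Note how ActiveBreadCrumb survives the session loss (it is persistent) while ActiveStandbyElectorLock does not; that asymmetry is what lets clients keep finding the active NameNode while still allowing the standby to win the lock.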
4. HA configuration
(1) environmental planning
JDK and ZooKeeper deployment are not repeated here; see the previous article.
Basic environment configuration:
Add hostname resolution in /etc/hosts on each machine;
Configure passwordless SSH login from each host to itself and the other two hosts;
Turn off the firewall and SELinux.
A full Hadoop deployment is covered in the previous article; here we focus on the NameNode HA configuration.
(2) Modify the configuration files:
core-site.xml:

<configuration>
  <!-- Logical name (nameservice ID) of the NameNode HA cluster -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
  <!-- Directory where HDFS stores data blocks and metadata -->
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/modules/HA/hadoop-2.8.4/data/ha_data</value>
  </property>
  <!-- ip:port of every node in the ZooKeeper cluster -->
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>bigdata121:2181,bigdata122:2181,bigdata123:2181</value>
  </property>
</configuration>
hdfs-site.xml:

<configuration>
  <!-- Nameservice ID; must match the cluster name defined in core-site.xml -->
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <!-- Aliases of the NameNode nodes in the cluster -->
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <!-- RPC address of nn1 -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>bigdata121:9000</value>
  </property>
  <!-- RPC address of nn2 -->
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>bigdata122:9000</value>
  </property>
  <!-- HTTP address of nn1 -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>bigdata121:50070</value>
  </property>
  <!-- HTTP address of nn2 -->
  <property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>bigdata122:50070</value>
  </property>
  <!-- JournalNode URI where the NameNodes store the shared edits log;
       multiple nodes are separated by semicolons -->
  <property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://bigdata121:8485;bigdata122:8485/mycluster</value>
  </property>
  <!-- Isolation (fencing) mechanism, so that only one NameNode serves clients
       at a time. shell and sshfence are available; sshfence kills the NameNode
       process on the failed host to prevent split-brain -->
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
  </property>
  <!-- Private key used by sshfence to SSH to the other NameNode host
       without a password and kill its NameNode process -->
  <property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_rsa</value>
  </property>
  <!-- Storage directory on the JournalNode servers -->
  <property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/opt/modules/HA/hadoop-2.8.4/data/jn</value>
  </property>
  <!-- Turn off permission checking -->
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <!-- Failover proxy provider that clients use to find the active NameNode
       of the configured HA cluster -->
  <property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <!-- Enable automatic failover, so no manual namenode failover is needed -->
  <property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
  </property>
</configuration>
Synchronize the configuration files to each node, using scp or rsync as you prefer.
(3) start cluster
On first startup:
cd /opt/modules/HA/hadoop-2.8.4

1) Start the journalnode service on each journalnode node:
sbin/hadoop-daemon.sh start journalnode

2) On nn1, format the namenode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode

3) On nn2, sync the namenode data from nn1 to the local namenode through the journalnodes:
bin/hdfs namenode -bootstrapStandby

4) Start nn2:
sbin/hadoop-daemon.sh start namenode

5) On nn1, start all datanodes:
sbin/hadoop-daemons.sh start datanode

6) Check the state of the two namenodes (normally one is active and the other standby):
bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2

7) Manually switch between active and standby:
bin/hdfs haadmin -transitionToActive <namenode alias>
bin/hdfs haadmin -transitionToStandby <namenode alias>
Note: manual switching requires turning off automatic failover in hdfs-site.xml, otherwise an error is reported; alternatively, force the transition with --forceactive.
After startup, you can manually stop the active NameNode and watch the NameNode that was standby automatically become active. When the stopped NameNode comes back online, it becomes standby.
(4) Why is there no SNN?
When we started the whole NameNode HA cluster, we found there was no SecondaryNameNode (SNN). Naively, I assumed it had to be started manually, so I started it by hand, and got an error.
Checking the SNN startup log reveals the exception:
org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode: Failed to start secondary namenode
java.io.IOException: Cannot use SecondaryNameNode in an HA cluster. The Standby Namenode will perform checkpointing.
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:189)
        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:690)
The message is clear: in HA mode the standby NameNode takes over the SNN's checkpointing duties, so a separate SNN is not needed. This is also a sensible design, since it makes full use of the standby NameNode instead of leaving it idle.
II. YARN ResourceManager HA
1. Working mechanism
This is similar to the NameNode HA above; ZKFC is likewise used to monitor the RM.
A node named /yarn-leader-election/<yarn cluster name> is created in ZooKeeper,
with the same two child nodes: ActiveBreadCrumb and ActiveStandbyElectorLock.
They play the same roles as before, so the details are not repeated; the working mechanism is essentially the same.
2. HA configuration
(2) Configuration file (yarn-site.xml)
<configuration>
  <!-- Site specific YARN configuration properties -->
  <!-- Shuffle service through which reducers fetch data -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <!-- Log retention time of 7 days, in seconds -->
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>604800</value>
  </property>
  <!-- Enable resourcemanager HA -->
  <property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
  </property>
  <!-- Cluster name shared by the two resourcemanagers -->
  <property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster-yarn1</value>
  </property>
  <!-- Aliases of the two resourcemanager nodes -->
  <property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
  </property>
  <!-- Addresses of the two RMs -->
  <property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>bigdata121</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>bigdata122</value>
  </property>
  <!-- Address of the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>bigdata121:2181,bigdata122:2181,bigdata123:2181</value>
  </property>
  <!-- Enable automatic recovery / automatic failover -->
  <property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
  </property>
  <!-- Store resourcemanager state in the ZooKeeper cluster -->
  <property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
  </property>
</configuration>
Synchronize the configuration file to the other nodes.
(3) start cluster
On bigdata121, start yarn:
sbin/start-yarn.sh

On bigdata122, start the second resourcemanager:
sbin/yarn-daemon.sh start resourcemanager

Check the service state:
bin/yarn rmadmin -getServiceState rm1
bin/yarn rmadmin -getServiceState rm2
The failover test is the same as for the NameNode and is not repeated here.