Big data learning (1) Hadoop installation

Cluster architecture

Installing Hadoop essentially means configuring the HDFS and YARN clusters. In this architecture, each HDFS DataNode needs to be configured with the location of the NameNode; similarly, every NodeManager in YARN needs to be configured with the location of the ResourceManager.

Are the NameNode and ResourceManager single points of failure in a cluster environment? In Hadoop 1.0 they were, but this has been solved in 2.0; for details see:

https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.html

https://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop-name-node/index.html


Configuration

Since every machine uses the same configuration, the usual approach is to configure one server and then copy the files to the other servers.

JAVA_HOME

Configure JAVA_HOME in the hadoop-env.sh file.
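For example, the following line can be set in hadoop-env.sh (the JDK path below is an assumption; point it at your actual installation):

```shell
# hadoop-env.sh: set JAVA_HOME explicitly, since Hadoop's scripts do not
# always inherit it from the login shell. The JDK path is an example.
export JAVA_HOME=/usr/local/jdk1.8.0
```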

core-site.xml

Configure the HDFS file system here; fs.defaultFS specifies the address of the HDFS NameNode:

<property>
    <name>fs.defaultFS</name>
    <value>hdfs://{hdfs-name-node-server-host}:9000</value>
</property>

Configure the storage directory for files generated at Hadoop runtime through hadoop.tmp.dir:

<property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop-data/tmp</value>
</property>
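Note that both properties must sit inside the <configuration> root element; a minimal core-site.xml might look like this (the hostname is a placeholder, and port 9000 matches the value above):

```xml
<?xml version="1.0"?>
<configuration>
    <!-- NameNode address; replace the host placeholder with your server -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://{hdfs-name-node-server-host}:9000</value>
    </property>
    <!-- Base directory for Hadoop's runtime files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop-data/tmp</value>
    </property>
</configuration>
```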

 

hdfs-site.xml

Configure the number of replicas and the secondary NameNode:

<property>
    <name>dfs.replication</name>
    <value>1</value>
</property>
        
<property>
    <name>dfs.secondary.http.address</name>
    <value>{second-namenode-host}:50090</value>
</property>

 

yarn-site.xml

To configure the resource manager of YARN:

<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>{resource-manager-host}</value>
</property>

Configure how reducers obtain map output (the shuffle service):

<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
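As with the other config files, these properties go inside the <configuration> root element of yarn-site.xml:

```xml
<?xml version="1.0"?>
<configuration>
    <!-- Host running the ResourceManager; placeholder hostname -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>{resource-manager-host}</value>
    </property>
    <!-- Auxiliary service that lets reducers fetch map output -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
```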

 

Finally, remember to add Hadoop's bin and sbin directories to the PATH environment variable:

export HADOOP_HOME=/usr/local/hadoop-2.6.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
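To make the variables persistent across sessions, these lines can go in /etc/profile (or ~/.bashrc); the JDK path is an assumed example:

```shell
# /etc/profile fragment: put Java and Hadoop on the PATH.
# Adjust JAVA_HOME and HADOOP_HOME to your actual install locations.
export JAVA_HOME=/usr/local/jdk1.8.0
export HADOOP_HOME=/usr/local/hadoop-2.6.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

After `source /etc/profile`, running `hadoop version` should print the installed version if everything is wired up.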

 

Format namenode

hdfs namenode -format (the older form hadoop namenode -format also works)

 

Start Hadoop

Start the NameNode of HDFS first:

 hadoop-daemon.sh start namenode

Then start the DataNode on each DataNode machine in the cluster:

 hadoop-daemon.sh start datanode

Check the startup result with jps:

[root@server1 ~]# jps
2111 Jps
2077 NameNode

If the startup is successful, you can see a page similar to the following at http://server1:50070:

 


Start YARN

[root@vcentos1 sbin]# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop-2.6.5/logs/yarn-root-resourcemanager-vcentos1.out
vcentos3: starting nodemanager, logging to /usr/local/hadoop-2.6.5/logs/yarn-root-nodemanager-vcentos3.out
vcentos2: starting nodemanager, logging to /usr/local/hadoop-2.6.5/logs/yarn-root-nodemanager-vcentos2.out
[root@server1 sbin]# jps
2450 ResourceManager
2516 Jps
2077 NameNode

The scripts in Hadoop's sbin directory are used to manage the Hadoop services:

hadoop-daemon.sh: starts or stops a single NameNode or DataNode independently;

start/stop-dfs.sh: together with etc/hadoop/slaves, starts or stops the NameNode and all the DataNodes in the cluster in one batch;

start/stop-yarn.sh: together with etc/hadoop/slaves, starts or stops the ResourceManager and all the NodeManagers in the cluster in one batch;
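Both batch scripts read the list of worker hosts from etc/hadoop/slaves, one hostname per line; for the cluster in the logs above it would contain something like:

```
vcentos2
vcentos3
```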

The commands in the bin directory provide the hdfs, yarn and mapreduce services:

[root@server1 bin]# hadoop fs 
Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-usage [cmd ...]]
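As a quick smoke test of a running cluster, a few of these commands can be combined (paths are examples; this assumes HDFS is up and the current user has write access):

```shell
# Create a directory in HDFS, upload a local file, read it back, clean up.
hdfs dfs -mkdir -p /tmp/smoke-test
echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -put -f /tmp/hello.txt /tmp/smoke-test/
hdfs dfs -cat /tmp/smoke-test/hello.txt   # should print: hello hdfs
hdfs dfs -rm -r -skipTrash /tmp/smoke-test
```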

 

 

 

References:

Latest installation documentation: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/ClusterSetup.html

2.6.5 installation documents: http://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-common/SingleCluster.html

Secondary Namenode: http://blog.madhukaraphatak.com/secondary-namenode---what-it-really-do/


Posted on Mon, 18 May 2020 08:25:53 -0700 by EZbb