I. Cluster Planning
This post builds a three-node Hadoop cluster: all three hosts run the DataNode and NodeManager services, while the NameNode and ResourceManager services run only on hadoop001.
Hadoop runs on the JDK, which must be installed beforehand. The JDK installation steps are covered separately in:
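The role layout above can be written down as data, which makes it easy to derive the list of worker hosts. This is a minimal sketch: hadoop001 comes from the post, while hadoop002 and hadoop003 are hypothetical names for the other two nodes. The derived text is what would typically go into etc/hadoop/slaves (Hadoop 2.x), a file renamed to workers in Hadoop 3.x.

```python
# Hypothetical three-node layout; only hadoop001 is named in the post.
CLUSTER = {
    "hadoop001": {"NameNode", "ResourceManager", "DataNode", "NodeManager"},
    "hadoop002": {"DataNode", "NodeManager"},
    "hadoop003": {"DataNode", "NodeManager"},
}


def hosts_running(role, cluster=CLUSTER):
    """Return the sorted host names that run the given daemon."""
    return sorted(host for host, roles in cluster.items() if role in roles)


def workers_file(cluster=CLUSTER):
    """Text for the slaves/workers file: one DataNode host per line."""
    return "\n".join(hosts_running("DataNode", cluster)) + "\n"


print(hosts_running("NameNode"))
print(workers_file(), end="")
```

Keeping the plan as data means the master-only roles (NameNode, ResourceManager) and the per-node roles can be checked against each other before any configuration file is written.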
Installation of JDK under L ...
Posted on Mon, 16 Sep 2019 04:19:53 -0700 by edup_pt
Setting Up a Hadoop 2.8.0 Environment
2019-08-09 12:12:44 -0700
This article covers installing a Hadoop cluster on CentOS 7.
Posted on Sun, 25 Aug 2019 23:35:47 -0700 by gammaster
Microblog data format: ID, content
TF (term frequency): how many times a word appears in a single microblog.
N: the total number of microblogs.
DF (document frequency): the number of microblogs in which a word appears.
The example uses four reduce tasks, numbered from 0: three count word frequencies (TF), and one statisti ...
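The three quantities above combine into a TF-IDF weight. The excerpt is cut off before the formula, so this single-process sketch assumes the common weighting TF × log(N / DF); it is not the post's MapReduce job, and the input (microblogs already split into words) is a hypothetical format.

```python
import math
from collections import Counter


def tf_idf(microblogs):
    """microblogs: dict mapping microblog ID -> list of already-segmented words.

    Returns {ID: {word: weight}} using the assumed weighting TF * log(N / DF).
    """
    n = len(microblogs)  # N: total number of microblogs
    df = Counter()       # DF: number of microblogs containing each word
    for words in microblogs.values():
        df.update(set(words))  # count each word at most once per microblog
    scores = {}
    for mid, words in microblogs.items():
        tf = Counter(words)  # TF: occurrences of each word in this microblog
        scores[mid] = {w: count * math.log(n / df[w]) for w, count in tf.items()}
    return scores


sample = {"1": ["hadoop", "yarn", "hadoop"], "2": ["hadoop", "hive"]}
print(tf_idf(sample))
```

A word that appears in every microblog ("hadoop" here) gets weight 0, because log(N / DF) = log(1). In the MapReduce version, N, TF, and DF are computed by separate reduce tasks and joined afterwards.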
Posted on Thu, 31 Jan 2019 04:18:16 -0800 by Jimbit
For building the high-availability cluster itself, see the author's earlier post: https://blog.csdn.net/PowerBlogger/article/details/83018127
The steps for setting up YARN on top of the HDFS high-availability cluster are as follows:
Find mapred-site.xml.template in the Hadoop installation directory and rename it ...
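The rename step can be scripted. The excerpt is truncated before the target name, so this sketch assumes the usual Hadoop 2.x convention: the template under etc/hadoop/ becomes mapred-site.xml, and MapReduce is pointed at YARN through the mapreduce.framework.name property. The function name enable_yarn_for_mapreduce is hypothetical. ElementTree drops the template's comments and stylesheet processing instruction when it rewrites the file.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def enable_yarn_for_mapreduce(hadoop_home):
    """Rename mapred-site.xml.template to mapred-site.xml (if not done yet) and
    set mapreduce.framework.name to yarn. Assumes the Hadoop 2.x etc/hadoop/ layout."""
    conf_dir = Path(hadoop_home) / "etc" / "hadoop"
    target = conf_dir / "mapred-site.xml"
    if not target.exists():
        shutil.move(str(conf_dir / "mapred-site.xml.template"), str(target))
    tree = ET.parse(target)
    root = tree.getroot()  # the <configuration> element
    # Update the property if it already exists, otherwise append it,
    # so running the function twice does not create duplicates.
    for prop in root.findall("property"):
        if prop.findtext("name") == "mapreduce.framework.name":
            prop.find("value").text = "yarn"
            break
    else:
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = "mapreduce.framework.name"
        ET.SubElement(prop, "value").text = "yarn"
    tree.write(target, encoding="utf-8", xml_declaration=True)
    return target
```

Usage: call enable_yarn_for_mapreduce("/path/to/hadoop") on each node, or run it once and distribute the resulting file.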
Posted on Wed, 30 Jan 2019 14:51:14 -0800 by Aethaellyn
The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive.
Hive is a data management framework built on the Hadoop cluster system, ...
Posted on Sat, 26 Jan 2019 09:03:14 -0800 by Naug
Errors When Shutting Down a Hadoop Cluster
1. The following errors are reported:
[root@server4 sbin]# ./stop-yarn.sh
stopping yarn daemons
no resourcemanager to stop
server5: no nodemanager to stop
server6: no nodemanager to stop
server4: no nodemanager to stop
no proxyserver to stop
[root@server4 sbin]# ./stop-dfs.sh
Stopping name ...
Posted on Sat, 26 Jan 2019 04:42:14 -0800 by fj1200
By default, the pid files are stored in the /tmp directory.
Each pid file contains the process ID of the corresponding daemon.
[hadoop@hadoop001 ~]$ cd /tmp
[hadoop@hadoop001 tmp]$ ll
drwxrwxr-x. 4 hadoop hadoop 4096 Sep 18 10:05 hadoop-hadoop
-rw-rw-r--. 1 hadoop hadoop 6 Oct 20 22:44 hadoop-hadoop-datanode.pid
-rw-rw-r--. 1 hadoop hadoop 6 Oct ...
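The stop scripts decide what to stop by reading these pid files; if a file is missing (for example because /tmp was cleaned), they report "no ... to stop" even though the daemon may still be running. This sketch imitates that check in Python, assuming the file naming shown in the listing, hadoop-&lt;user&gt;-&lt;daemon&gt;.pid; daemon_status is a hypothetical helper, not part of Hadoop.

```python
import os
from pathlib import Path


def daemon_status(daemon, user="hadoop", pid_dir="/tmp", prefix="hadoop"):
    """Report whether a Hadoop daemon looks alive, based only on its pid file.

    Assumes the naming <pid_dir>/<prefix>-<user>-<daemon>.pid seen in /tmp above;
    YARN daemons in Hadoop 2.x typically use the prefix "yarn".
    """
    pid_file = Path(pid_dir) / f"{prefix}-{user}-{daemon}.pid"
    if not pid_file.exists():
        return f"no {daemon} to stop"  # pid file missing, e.g. /tmp was cleaned
    pid = int(pid_file.read_text().strip())
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists (POSIX)
    except ProcessLookupError:
        return f"no {daemon} to stop"  # stale pid file: process is gone
    except PermissionError:
        pass  # process exists but belongs to another user
    return f"{daemon} running as pid {pid}"


print(daemon_status("datanode"))
```

A common remedy is to point HADOOP_PID_DIR (in hadoop-env.sh) and YARN_PID_DIR (in yarn-env.sh) at a directory that is not cleaned automatically.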
Posted on Mon, 21 Jan 2019 11:39:13 -0800 by irkevin
You need three virtual machines, each configured identically.
The firewall must be disabled on each machine, preferably permanently.
Create a dedicated user, hadoop.
Set the host name for each server.
Set up the hosts mapping between IP addresses and host names.
On all three hosts, set up ssh sec ...
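The IP-to-host-name mapping step can be done idempotently, so rerunning it on each machine does not duplicate entries. This sketch works on the text of a hosts file (on Linux, /etc/hosts); the host names hadoop001 to hadoop003 and the 192.168.1.x addresses are hypothetical examples, and merge_hosts is an illustrative helper.

```python
def merge_hosts(existing, nodes):
    """Return hosts-file text with one line per cluster node appended,
    skipping host names that are already mapped.

    nodes: dict mapping host name -> IP address.
    """
    mapped = set()
    for line in existing.splitlines():
        fields = line.split("#", 1)[0].split()  # ignore comments
        mapped.update(fields[1:])  # fields[0] is the IP; the rest are names
    lines = existing.rstrip("\n").split("\n") if existing.strip() else []
    for name, ip in nodes.items():
        if name not in mapped:
            lines.append(f"{ip}\t{name}")
    return "\n".join(lines) + "\n"


nodes = {"hadoop001": "192.168.1.101",
         "hadoop002": "192.168.1.102",
         "hadoop003": "192.168.1.103"}
print(merge_hosts("127.0.0.1\tlocalhost\n", nodes), end="")
```

Writing the result back to /etc/hosts requires root, and the same mapping should be applied on all three machines.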
Posted on Sat, 19 Jan 2019 15:00:13 -0800 by staggman