Case 5 - Mining High Weight Items in Microblog Advertisements

Microblog content (as shown): ID, content. Formula terms: TF: the number of times a word appears in a single microblog. N: the total number of microblogs. DF: the number of microblogs in which the word appears. The case uses four reduce tasks, numbered from 0: three compute the term frequency TF, and one statisti ...
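The excerpt is cut off before the weight formula itself, so the following is only a minimal single-machine sketch of how TF, N and DF are usually combined. It assumes the standard TF-IDF weight TF * ln(N / DF), which the post may or may not use. The function name `tf_idf`, the whitespace tokenization and the sample data are hypothetical, and the sketch does not reproduce the MapReduce job from the post.

```python
import math
from collections import Counter

def tf_idf(microblogs):
    """Return {(blog_id, word): weight}, with weight = TF * ln(N / DF).

    Assumption: the standard TF-IDF formula; the original post's exact
    formula is not visible in the excerpt.
    """
    n = len(microblogs)  # N: total number of microblogs
    # TF: per-microblog word counts (naive whitespace tokenization)
    tf = {blog_id: Counter(text.split()) for blog_id, text in microblogs.items()}
    # DF: number of microblogs that contain each word at least once
    df = Counter()
    for counts in tf.values():
        df.update(counts.keys())
    return {
        (blog_id, word): count * math.log(n / df[word])
        for blog_id, counts in tf.items()
        for word, count in counts.items()
    }

# Hypothetical sample data: microblog ID -> content
sample = {
    "1": "hadoop hive hadoop",
    "2": "hadoop yarn",
    "3": "hive",
}
weights = tf_idf(sample)
```

A word that appears in every microblog gets weight 0 under this formula, which is why rare words stand out as high-weight terms.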

Posted on Thu, 31 Jan 2019 04:18:16 -0800 by Jimbit

Building Yarn Based on High Availability HDFS Distributed Cluster

For building the high-availability cluster itself, see the author's other post: https://blog.csdn.net/PowerBlogger/article/details/83018127 Cluster planning: The steps for building YARN on a high-availability HDFS distributed cluster are as follows: Find mapred-site.xml.template in the Hadoop installation directory and rename it ...

Posted on Wed, 30 Jan 2019 14:51:14 -0800 by Aethaellyn

Step by step install hive

The Apache Hive ™ data warehouse software facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. Structure can be projected onto data already in storage. A command line tool and JDBC driver are provided to connect users to Hive. Hive is a data management framework built on top of a Hadoop cluster, ...

Posted on Sat, 26 Jan 2019 09:03:14 -0800 by Naug

Errors Reported When Shutting Down a Hadoop Cluster

1. The following errors are reported:
[root@server4 sbin]# ./stop-yarn.sh
stopping yarn daemons
no resourcemanager to stop
server5: no nodemanager to stop
server6: no nodemanager to stop
server4: no nodemanager to stop
no proxyserver to stop
[root@server4 sbin]# ./stop-dfs.sh
Stopping name ...

Posted on Sat, 26 Jan 2019 04:42:14 -0800 by fj1200

Analysis of the HDFS pid File (Big Data Learning)

pid file: By default, pid files are stored in the /tmp directory. Each pid file contains the process number.
[hadoop@hadoop001 ~]$ cd /tmp
[hadoop@hadoop001 tmp]$ ll
total 132
drwxrwxr-x. 4 hadoop hadoop 4096 Sep 18 10:05 hadoop-hadoop
-rw-rw-r--. 1 hadoop hadoop 6 Oct 20 22:44 hadoop-hadoop-datanode.pid
-rw-rw-r--. 1 hadoop hadoop 6 Oct ...
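Since a pid file holds nothing but a process number, a short sketch can show how such a file might be read and checked. The helper names `read_pid` and `is_running` are hypothetical and are not part of Hadoop. The check relies on POSIX signal 0 semantics, so it is assumed to run on Linux or another Unix-like system.

```python
import os
from pathlib import Path

def read_pid(pid_file):
    """Read the process number stored in a pid file (hypothetical helper)."""
    return int(Path(pid_file).read_text().strip())

def is_running(pid):
    """Return True if a process with this pid exists (POSIX only).

    Signal 0 performs the existence and permission checks without
    delivering any signal to the process.
    """
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True
```

With the listing above, a call might look like `is_running(read_pid("/tmp/hadoop-hadoop-datanode.pid"))`.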

Posted on Mon, 21 Jan 2019 11:39:13 -0800 by irkevin

Building a fully distributed Hadoop environment

Preliminary preparation: Three virtual machines are required, and each one must have exactly the same configuration. Firewalls must be disabled, preferably permanently. Create a dedicated hadoop user. Set the host name on each server. Map IP addresses to host names in the hosts file. On the three hosts, set up ssh sec ...

Posted on Sat, 19 Jan 2019 15:00:13 -0800 by staggman