How to Rerun Data

For offline tasks, re-running data is routine: the program may run and then crash, or the output may turn out to be wrong and need a re-run after investigation. When re-running, though, you must check whether data has already been written to HBase, or whether that day's Hive partition already contains data. If Hive h ...
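The excerpt is cut off, but a minimal sketch of the re-run safety idea it describes for the Hive side is to make the job idempotent by overwriting the day's partition rather than appending to it. The table and staging names below are hypothetical, not from the original post:

```java
// Hypothetical helper: builds an idempotent Hive re-run statement.
// INSERT OVERWRITE replaces the partition's existing data, so re-running
// the job for the same day cannot duplicate rows. Names are illustrative.
public class RerunSql {
    static String overwritePartition(String table, String dt) {
        return "INSERT OVERWRITE TABLE " + table
                + " PARTITION (dt='" + dt + "')"
                + " SELECT * FROM staging_" + table + " WHERE dt='" + dt + "'";
    }

    public static void main(String[] args) {
        System.out.println(overwritePartition("orders", "2019-10-03"));
    }
}
```

For HBase, the equivalent precaution is to delete or version-overwrite the rows for that day's key range before re-writing.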

Posted on Thu, 03 Oct 2019 19:06:21 -0700 by aunquarra

Giraph Source Analysis - Adding Message Statistics

Author | Bai Song. 1. Add a class that writes the message count of each superstep to Hadoop's Counters. Create a new GiraphMessages class under the package org.apache.giraph.counters to count the number of messages. The source code is as follows: package org.apache.giraph.counters; import java.util.Iterator; import java.util.Map; import org.apache ...
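The GiraphMessages source is truncated in the excerpt; as a rough standalone sketch of the same idea — accumulating message counts per superstep — stripped of the Hadoop Counter machinery (the real class would back these values with Hadoop Counters instead of a plain map):

```java
import java.util.Map;
import java.util.TreeMap;

// Standalone sketch of per-superstep message counting. This is NOT the
// actual org.apache.giraph.counters.GiraphMessages source, only the
// bookkeeping shape it implements.
public class MessageStats {
    private final Map<Long, Long> messagesPerSuperstep = new TreeMap<>();

    // Record that n messages were sent during the given superstep.
    void increment(long superstep, long n) {
        messagesPerSuperstep.merge(superstep, n, Long::sum);
    }

    long get(long superstep) {
        return messagesPerSuperstep.getOrDefault(superstep, 0L);
    }
}
```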

Posted on Wed, 02 Oct 2019 23:23:45 -0700 by HoangLong

Hive custom function UDF UDTF UDAF

Hive custom functions: UDF, UDTF, UDAF. UDF: user-defined (ordinary) functions that operate on a single row's values; a UDF can only implement one-in, one-out operations. Defining a UDF that computes the minimum of two numbers: public class Min extends UDF { public Double evaluate(Double a, Double b) { ...
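The Min example is cut off in the excerpt; a completed version of the evaluate method, shown here without the org.apache.hadoop.hive.ql.exec.UDF superclass so it runs standalone (in Hive the class would extend UDF, and Hive calls evaluate once per row — one in, one out):

```java
// Completed sketch of the excerpt's two-argument minimum UDF.
// In Hive this class would extend org.apache.hadoop.hive.ql.exec.UDF.
public class Min {
    public Double evaluate(Double a, Double b) {
        if (a == null || b == null) {
            return null; // propagate SQL NULLs, as Hive UDFs conventionally do
        }
        return Math.min(a, b);
    }
}
```

After packaging it into a jar, such a function is registered in Hive with ADD JAR and CREATE TEMPORARY FUNCTION before use.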

Posted on Wed, 02 Oct 2019 19:48:10 -0700 by jtbaker

hadoop distributed file system

1. Single version DHCP. The source address is 0.0.0.0 and the destination address is 255.255.255.255. The ports are UDP 67 and UDP 68, one for sending and one for receiving: the client broadcasts its configuration request to server port 67 (bootps), and the server broadcasts its response to client port 68 (bootpc). Default format ...

Posted on Wed, 02 Oct 2019 19:19:24 -0700 by azunoman

Construction of hive 3.1.0 under centos7

The previous post, Construction of a hadoop 3.2.0 fully distributed cluster under CentOS 7, introduced building the Hadoop cluster. This article introduces setting up the Hive environment. Contents: I. Host environment; II. Build the Hadoop cluster first; III. Installation of h ...

Posted on Tue, 01 Oct 2019 18:44:51 -0700 by *Lynette

Hadoop Series - HDFS Java API

I. Brief Introduction. To use the HDFS API, you need to add a dependency on hadoop-client. If it is a CDH build of Hadoop, you also need to specify its repository address: <?xml version="1.0" encoding="UTF-8"?> <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ...
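The POM in the excerpt is truncated; the dependency and CDH repository declarations it is leading up to look roughly like the following (the version string and repository URL are illustrative — match them to your cluster):

```xml
<!-- hadoop-client dependency; pick the version that matches your cluster -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.6.0-cdh5.15.2</version>
</dependency>

<!-- For CDH builds, Cloudera's artifacts live in their own repository -->
<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
```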

Posted on Tue, 17 Sep 2019 06:53:55 -0700 by wildmalc

Hadoop Series-Hadoop Cluster Environment Construction

I. Cluster Planning. A three-node Hadoop cluster is built here: all three hosts run the DataNode and NodeManager services, while the NameNode and ResourceManager services run only on hadoop001. Pre-conditions: Hadoop runs on the JDK, which must be installed in advance; the installation steps are documented separately in Installation of JDK under L ...

Posted on Mon, 16 Sep 2019 04:19:53 -0700 by edup_pt

Hadoop Series - Distributed Computing Framework MapReduce

1. Overview of MapReduce. Hadoop MapReduce is a distributed computing framework for writing batch applications. Programs written with it can be submitted to a Hadoop cluster for parallel processing of large datasets. A MapReduce job splits the input dataset into independent blocks, which are processed by the map tasks in parallel, and the framework sorts the ...
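The split → map → sort → reduce pipeline described above can be sketched in plain Java, with no Hadoop API — just the contract: map emits (key, value) pairs, the framework groups and sorts them by key, and reduce aggregates each group. Word count is the usual illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the MapReduce contract using word count.
// map: line -> (word, 1) pairs; shuffle/sort: group pairs by key
// (TreeMap keeps keys sorted, mirroring the framework's sort phase);
// reduce: sum each key's values.
public class MiniMapReduce {
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {                       // each input split
            for (String word : line.split("\\s+")) {      // map phase
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);  // shuffle + reduce
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {hadoop=1, map=2, reduce=2}
        System.out.println(wordCount(Arrays.asList("hadoop map reduce", "map reduce")));
    }
}
```

In real Hadoop the map and reduce halves run in separate tasks on separate hosts; this sketch only shows the data flow.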

Posted on Fri, 13 Sep 2019 09:21:27 -0700 by tharagleb

HDFS File Interface

Basic command format: hadoop fs -cmd < args >. ls: hadoop fs -ls / lists the directories and files in the root of the HDFS file system; hadoop fs -ls -R / lists all directories and files in the HDFS file system recursively. put: hadoop fs -put < l ...

Posted on Fri, 06 Sep 2019 07:08:35 -0700 by Wesf90

Spark from zero to Spark API In Java8

Spark API in Java 8. 1. map, flatMap. map is easy to understand: it passes each element of the source JavaRDD into the call method and, one element at a time, returns the transformed result to build a new JavaRDD. map sample code: L ...
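The JavaRDD sample code is truncated in the excerpt; the one-in-one-out shape of map versus the one-in-many-out shape of flatMap can be illustrated with plain java.util.stream, in the same Java 8 lambda style, without a Spark dependency:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// map vs flatMap, shown with java.util.stream rather than JavaRDD:
// map produces exactly one output element per input element, while
// flatMap lets one input expand into zero or more outputs.
public class MapFlatMapDemo {
    public static void main(String[] args) {
        List<String> lines = Arrays.asList("spark map", "flatMap");

        // map: each line -> its word count (one element in, one out)
        List<Integer> wordCounts = lines.stream()
                .map(line -> line.split(" ").length)
                .collect(Collectors.toList()); // [2, 1]

        // flatMap: each line -> a stream of its words (one in, many out)
        List<String> words = lines.stream()
                .flatMap(line -> Arrays.stream(line.split(" ")))
                .collect(Collectors.toList()); // [spark, map, flatMap]

        System.out.println(wordCounts + " " + words);
    }
}
```

JavaRDD.map and JavaRDD.flatMap follow the same two shapes, with the lambda wrapped in Spark's Function/FlatMapFunction interfaces.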

Posted on Mon, 02 Sep 2019 20:01:48 -0700 by dubhcat