Row-column Conversion in Hive

Row to column, multi-row to multi-column. Data sheet row2col:
    col1 col2 col3
    a    c    1
    a    d    2
    a    e    3
    b    c    4
    b    d    5
    b    e    6
Now we need to translate it into:
    col1 c d e
    a    1 2 3
    b    ...
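
A minimal HiveQL sketch of this multi-row-to-multi-column pivot, assuming the table and column names shown above (row2col with col1, col2, col3) and that col2 only takes the values c, d and e:

    -- one output row per col1 value; each distinct col2 value becomes a column
    SELECT col1,
           MAX(CASE WHEN col2 = 'c' THEN col3 END) AS c,
           MAX(CASE WHEN col2 = 'd' THEN col3 END) AS d,
           MAX(CASE WHEN col2 = 'e' THEN col3 END) AS e
    FROM row2col
    GROUP BY col1;

The GROUP BY collapses the three rows per col1 into one, and each CASE/MAX pair picks out the value belonging to its target column.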

Posted on Mon, 07 Oct 2019 01:40:00 -0700 by Russia

How to Rerun Data

For offline tasks, re-running data is a normal thing: the program may crash while running, or the output may turn out to be incorrect and need to be re-run after investigation. But when you re-run, you need to pay attention to whether data has already been written to HBase, or whether the Hive partition for that day already contains data. If hive h ...
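
One common way to make such a Hive re-run safe (a minimal sketch with hypothetical database, table and column names) is to write each day's output with INSERT OVERWRITE into that day's partition, so re-running the job replaces the partition instead of appending duplicate rows:

    -- overwrites only the dt='2019-10-03' partition; re-running is idempotent
    INSERT OVERWRITE TABLE dw.orders_daily PARTITION (dt = '2019-10-03')
    SELECT order_id, user_id, amount
    FROM ods.orders
    WHERE dt = '2019-10-03';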

Posted on Thu, 03 Oct 2019 19:06:21 -0700 by aunquarra

Hive custom function UDF UDTF UDAF

UDF: user-defined (ordinary) functions that only act on a single row's values; a UDF can only implement a one-in, one-out operation. Defining a UDF that computes the minimum of two numbers: public class Min extends UDF { public Double evaluate(Double a, Double b) { ...
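
Once such a class is compiled and packaged into a jar, it can be registered and called from HiveQL roughly as follows (the jar path and function name here are hypothetical; the class name Min is the one from the excerpt above):

    -- register the compiled UDF and use it like a built-in function
    ADD JAR /tmp/hive-udf-min.jar;
    CREATE TEMPORARY FUNCTION min2 AS 'Min';
    SELECT min2(3.0, 5.0);   -- one row in, one value out: returns 3.0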

Posted on Wed, 02 Oct 2019 19:48:10 -0700 by jtbaker

Construction of hive 3.1.0 under centos7

The previous blog post, Construction of hadoop 3.2.0 fully distributed cluster under Centos 7, introduced how to build the Hadoop cluster. This article introduces how to set up the Hive environment. Catalog: I. Host environment; II. Build the Hadoop cluster first; III. Installation of h ...

Posted on Tue, 01 Oct 2019 18:44:51 -0700 by *Lynette

sqoop job for automatic incremental import

Ordinary incremental import:
    # The problem is that we have to manually change the value of last-value every time we incrementally import.
    # Otherwise, it will be imported in full every time. Seems inflexible.
    bin/sqoop import \
      --connect jdbc:mysql://hadoo ...

Posted on Mon, 30 Sep 2019 08:40:46 -0700 by JimmyD

Flink Bucketing Sink Source Analysis

0x1 Digest The BucketingSink class provides complete functionality for landing data into HDFS. It is not recommended to implement this yourself in real business; using this class directly avoids a number of pitfalls. Note: this article is based on the Flink 1.6.3 source code. 0x2 Structure Analysis of the BucketingSink Class We focus on three ...

Posted on Sun, 22 Sep 2019 04:37:40 -0700 by Nuv

Hadoop Series - Distributed Computing Framework MapReduce

1. Overview of MapReduce Hadoop MapReduce is a distributed computing framework for writing batch applications. Programs written with it can be submitted to the Hadoop cluster for parallel processing of large datasets. A MapReduce job splits the input dataset into independent blocks, which are processed by the map tasks in parallel, and the framework sorts the ...

Posted on Fri, 13 Sep 2019 09:21:27 -0700 by tharagleb

Hadoop Cluster Modification: Adjusting Cluster Version

Link to the original text: http://www.cnblogs.com/DamianZhou/p/4184026.html Catalog: Hadoop Cluster Modification and Cluster Version Adjustment; Modification Notes; Detailed steps: 1. JDK modification 2 ...

Posted on Wed, 17 Jul 2019 13:02:17 -0700 by pesoto74

Big Data Learning: Initial Use of Data Processing Tool Pig

Brief introduction: Pig is a large-scale data analysis platform based on Hadoop. It provides a SQL-like language called Pig Latin, whose compiler converts SQL-like data analysis requests into a series of optimized MapReduce operations. Characteristics: focuses on massive data set analysis; runs on the cluster computing archit ...

Posted on Sun, 14 Jul 2019 13:03:15 -0700 by Hillu

CDH Integrated LDAP Configuration

Reproduced from the Java Chen Blog by Java Chen. Original link address: http://blog.javachen.com/2014/11/12/config-ldap-with-kerberos-in-cdh-hadoop.html Building on the basic configuration above, some configurations have been added. This article describes the process of integrating LDAP with a CDH Hadoop cluster, where LDAP installs OpenL ...

Posted on Thu, 16 May 2019 01:44:27 -0700 by MerlinJR