Hadoop Cluster Modification: Adjusting Cluster Version

Link to the original text: http://www.cnblogs.com/DamianZhou/p/4184026.html. Contents: Hadoop cluster modification and cluster version adjustment; modification notes; detailed steps; 1. JDK modification; 2 ...

Posted on Wed, 17 Jul 2019 13:02:17 -0700 by pesoto74

Big Data Learning: Initial Use of Data Processing Tool Pig

Brief introduction: Pig is a platform for large-scale data analysis built on Hadoop. It provides a SQL-like language called Pig Latin, whose compiler converts SQL-style data analysis requests into a series of optimized MapReduce operations. Characteristics: focused on analysis of massive data sets; runs on the cluster computing archit ...
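The "Pig Latin compiles down to MapReduce" idea in this excerpt can be illustrated with a toy, single-process map/shuffle/reduce pipeline in plain Python. This is a sketch for intuition only: real Pig generates distributed Hadoop MapReduce jobs, and the function names and sample data here are illustrative, not Pig's API.

```python
from collections import defaultdict

def run_mapreduce(records, map_fn, reduce_fn):
    """Toy single-process MapReduce: map, shuffle by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):   # map phase: emit (key, value) pairs
            groups[key].append(value)       # shuffle phase: group values by key
    return {key: reduce_fn(key, values) for key, values in groups.items()}

# Roughly the pipeline a word-count Pig Latin script would compile to:
#   words  = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
#   counts = FOREACH (GROUP words BY word) GENERATE group, COUNT(words);
lines = ["hadoop pig hive", "pig latin", "hadoop"]
counts = run_mapreduce(
    lines,
    map_fn=lambda line: [(word, 1) for word in line.split()],
    reduce_fn=lambda word, ones: sum(ones),
)
print(counts["hadoop"])  # 2
```

The point of the sketch is the shape of the translation: a FOREACH becomes the map phase, GROUP BY becomes the shuffle, and the aggregate becomes the reduce.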

Posted on Sun, 14 Jul 2019 13:03:15 -0700 by Hillu

CDH Integrated LDAP Configuration

Reproduced from the Java Chen blog, by Java Chen. Original link address: http://blog.javachen.com/2014/11/12/config-ldap-with-kerberos-in-cdh-hadoop.html. Building on the basic configuration referenced above, some configurations have been added. This article describes the process of integrating LDAP with a CDH Hadoop cluster, where LDAP installs OpenL ...

Posted on Thu, 16 May 2019 01:44:27 -0700 by MerlinJR

Tutorial: Complete Data File Format Processing with Data Lake Analytics + OSS

0. Preface: Data Lake Analytics is a serverless interactive query and analysis service on the cloud. Users can use standard SQL statements to query and analyze data stored on OSS and TableStore without moving it. The product has now officially launched on Aliyun; you are welcome to apply for a trial and experience a more convenient data analysis service. Pleas ...

Posted on Wed, 08 May 2019 21:06:40 -0700 by htcilt

Spark Persistence and Shared Variables

1. The persistence operator cache. Introduction: normally an RDD does not hold real data, only metadata describing the RDD. Even after the cache method is called on the RDD, it still holds no real data; only when the first action operator is invoked is the RDD's data generated, and then the cache operati ...
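The lazy-materialization behaviour this excerpt describes can be mimicked in plain Python. The class below is a sketch of the semantics only, not Spark's API: calling cache() merely records intent, nothing is computed until an "action" runs, and a counter shows whether a second action recomputes or hits the cache.

```python
class LazyDataset:
    """Toy model of RDD laziness: compute() runs only when an action fires."""
    def __init__(self, compute_fn):
        self._compute = compute_fn
        self._cache_requested = False
        self._cached = None
        self.times_computed = 0  # instrumentation: how often data was built

    def cache(self):
        # Like RDD.cache(): only marks intent; no data is produced yet.
        self._cache_requested = True
        return self

    def _materialize(self):
        if self._cached is not None:
            return self._cached
        self.times_computed += 1
        data = self._compute()
        if self._cache_requested:
            self._cached = data  # the first action fills the cache
        return data

    def count(self):  # an "action" operator
        return len(self._materialize())

ds = LazyDataset(lambda: [x * x for x in range(1000)]).cache()
assert ds.times_computed == 0   # cache() alone materialized nothing
ds.count(); ds.count()
assert ds.times_computed == 1   # the second action hit the cache
```

Without the cache() call, each action would recompute the data from scratch, which is exactly the recomputation that Spark persistence exists to avoid.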

Posted on Sun, 05 May 2019 01:32:37 -0700 by techker

Hive Environment Construction and Simple Use

There is a set of CDH versions of hadoop, hive, and zookeeper, all mutually compatible. Link: https://pan.baidu.com/s/1wmyMw9RVNMD4NNOg4u4VZg Extraction code: m888. Reconfigure the Hadoop runtime environment once; it is configured in detail at https://blog.csdn.net/kxj19980524/article/details/88954645 <configuration> <property> & ...

Posted on Mon, 22 Apr 2019 15:18:34 -0700 by Hexen

Hive Basic Environment Architecture (with Java and Hadoop Environment Architecture)

Hive depends on Hadoop, and Hadoop depends on Java, so the first step is to set up a Java environment. Setting up the Java environment: 1. Use yum to check whether Java is already installed: yum list installed | grep java. 2. If it is, you can either uninstall and reinstall it, or skip the installation steps. The uninstall command is: yum -y remove java ...

Posted on Wed, 27 Mar 2019 22:21:30 -0700 by mrinfin1ty

Install Hadoop 2.7.3 + Hive 2.1.1 + Sqoop under macOS

Hadoop installation. First install the JDK, then open ~/.bash_profile with vim and add:
export JAVA_HOME="YOUR_JAVA_HOME"
export PATH=$PATH:$JAVA_HOME/bin
When the configuration is complete, run java -version:
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
ssh secret- ...

Posted on Wed, 13 Feb 2019 02:30:18 -0800 by delphipgmr

Fast Construction of Spark SQL on CarbonData Using Streaming Pro

Preface: CarbonData has released version 1.0, and it is evolving quickly. This version removes the Kettle dependency, making deployment and use easy, and it supports multiple Spark versions such as 1.6+ and 2.0+. Streaming Pro lets you experience CarbonData with a simple command and supports HTTP/JDBC access patterns. Download the Spark distribution. For ...

Posted on Tue, 12 Feb 2019 06:30:19 -0800 by Richardtagger

Learning the Hive Analysis Functions grouping sets, cube, and rollup

Source data table statement:
hive> show create table bi_all_access_log;
OK
CREATE TABLE `bi_all_access_log`(
  `appsource` string,
  `appkey` string,
  `identifier` string,
  `uid` string)
PARTITIONED BY (
  `pt_month` string,
  `pt_day` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIE ...
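To build intuition for what GROUPING SETS computes over a table like bi_all_access_log, here is a small pure-Python emulation. This is illustrative only: the column names merely echo the excerpt, Hive does this in SQL, and real Hive additionally exposes GROUPING() to distinguish a rolled-up NULL from a genuine NULL value.

```python
from collections import Counter

def grouping_sets(rows, all_cols, sets):
    """Emulate GROUPING SETS: one GROUP BY pass per grouping set, with None
    standing in for columns rolled up in that set, results unioned together."""
    result = Counter()
    for group_cols in sets:
        for row in rows:
            key = tuple(row[c] if c in group_cols else None for c in all_cols)
            result[key] += 1
    return result

rows = [
    {"appsource": "ios", "appkey": "k1"},
    {"appsource": "ios", "appkey": "k2"},
    {"appsource": "android", "appkey": "k1"},
]
cols = ("appsource", "appkey")
# Like: GROUP BY appsource, appkey
#       GROUPING SETS ((appsource, appkey), (appsource), ())  -- i.e. ROLLUP
out = grouping_sets(rows, cols, [cols, ("appsource",), ()])
print(out[("ios", None)])  # 2: the per-appsource rollup over appkey
```

In the same vein, CUBE is just GROUPING SETS over every subset of the grouped columns, and ROLLUP over the prefixes, which is exactly the list of sets passed in above.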

Posted on Thu, 07 Feb 2019 10:54:17 -0800 by paradigmapc