Reading Hive data with SparkSQL in IDEA

The traditional Hive execution engine is MapReduce. SparkSQL, officially released with Spark 1.3, is largely compatible with Apache Hive. Backed by Spark's computing power, processing Hive data with Spark is far faster than running it on traditional Hive. Using SparkSQL in IDEA to read the data in H ...

Posted on Mon, 30 Mar 2020 14:23:09 -0700 by bl00dshooter

The first lesson of big data on the cloud: a guide to hidden pitfalls in MaxCompute authorization and external-table operations

1. Sub-account creation and AK information binding. If this is your first time logging in to the DataPlus platform and using DataWorks with a sub-account, you need to confirm the following information: • the business alias of the primary account to which the sub-account belongs; • the user name and password of the sub-account; • the AccessKey ID ...

Posted on Tue, 10 Mar 2020 22:54:52 -0700 by physaux

Installation and deployment of Sqoop

I. Overview. What is Sqoop? Sqoop (pronounced "skoop") is an open-source tool mainly used to transfer data between Hadoop (Hive) and traditional databases (MySQL, PostgreSQL, ...). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into Hadoop's HDFS, or import data from ...

Posted on Tue, 10 Mar 2020 03:58:20 -0700 by drath

Running the Flink SQL Client 1.10 source code integrated with Hive in IDEA

Recently, I have been following up on Flink SQL to prepare for a deeper understanding. This article mainly records the process of running the SQL Client source code. For the Hadoop, Hive, and other related environments involved in this article, see the previous article, "The integration of Flink SQL Client 1.10 and Hive to read real-ti ...

Posted on Tue, 03 Mar 2020 20:40:29 -0800 by cdhogan

30. Sqoop, an auxiliary tool for offline systems -- a useful program

1. What is Sqoop? Apache Sqoop (TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. In essence, it converts a Sqoop statement into a MapReduce job (map tasks only). Characteristics -- advantages: cross-platform data integration; dis ...

Posted on Thu, 27 Feb 2020 22:22:51 -0800 by athyzafiris

Taobao Double 11 big data analysis (data preparation)

Article directory: Preface; Data content analysis; meaning of the `user_log.csv` file contents; content meaning of `train.csv` and `test.csv`; Upload the data to the Linux system and decompress it; Data set preprocessing; File information interception; Import the data into Hive; Confirm that the Had ...

Posted on Tue, 25 Feb 2020 23:24:24 -0800 by jara06

The usage of some *ByKey operators in PySpark

Preparation:

```python
import pyspark
from pyspark import SparkContext
from pyspark import SparkConf

conf = SparkConf().setAppName("lg").setMaster("local[4]")
sc = SparkContext.getOrCreate(conf)
```

1. aggregateByKey. The former function passed to aggregateByKey is applied within each partition, and the latter fun ...

Posted on Sat, 22 Feb 2020 01:35:39 -0800 by YOUAREtehSCENE

[Hadoop big data platform component building series] - Hive component configuration

Brief introduction. This article describes building the Hive component of the Hadoop big data platform (MySQL must be set up before building Hive). Software versions used: apache-hive-1.1.0-bin.tar, mysql-connector-java-5.1.47.jar (Baidu Cloud extraction code: vk6v). Install Hive: extract the Hive inst ...

Posted on Fri, 21 Feb 2020 06:06:37 -0800 by chrisuk

How to configure Hive in Spring Boot? This blog may help you!

Recently, I completed a big data project whose framework includes Spring Boot. Because it is an offline data analysis project, Hive was selected as a component (Spark or HBase may be used for real-time ...). This blog is about how to configure Hive in Spring ...

Posted on Wed, 12 Feb 2020 10:19:54 -0800 by mrmom

Hive functions explained in detail, with a list of hands-on cases

1. Hive built-in functions. 1.1 Numerical functions. 1) Rounding function: round. Syntax: round(double a). Return value: BIGINT. Note: returns the integer part of a double value, following the usual round-half-up rule.

```sql
hive> select round(3.1415926) from tableName;
3
hive> select round(3.5) from tableName;
4
hive> create table tableName a ...
```
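Note that the half-up behaviour shown above differs from Python's built-in `round()`, which rounds halves to the nearest even digit. A small sketch reproducing Hive's results with the standard `decimal` module (the `hive_round` helper is mine, not from the article):

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical helper: mimic Hive's round(double a) for positive inputs
# by rounding halves up, unlike Python's built-in banker's rounding.
def hive_round(x: str) -> int:
    return int(Decimal(x).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(hive_round("3.1415926"))  # 3  (matches hive> select round(3.1415926))
print(hive_round("3.5"))        # 4  (matches hive> select round(3.5))
print(round(2.5))               # 2  -- Python's built-in rounds half to even
```

Passing the value as a string (or `Decimal`) avoids binary-float surprises such as `Decimal(2.675)` not being exactly 2.675.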

Posted on Wed, 05 Feb 2020 04:14:30 -0800 by AndrewBacca