The traditional Hive execution engine is MapReduce. Spark SQL was officially released with Spark 1.3 and is largely compatible with Apache Hive. Backed by Spark's computing power, processing Hive data with Spark is far faster than with traditional Hive. Using Spark SQL in IDEA to read the data in H ...
Posted on Mon, 30 Mar 2020 14:23:09 -0700 by bl00dshooter
1. Sub-account creation and AK information binding. If this is your first time logging in to the Digital Plus platform and using DataWorks with a sub-account, you need to confirm the following information: • the business alias of the primary account to which the sub-account belongs; • the user name and password of the sub-account; • the AccessKey ID ...
Posted on Tue, 10 Mar 2020 22:54:52 -0700 by physaux
What is Sqoop?
Sqoop (pronounced "scoop") is an open-source tool mainly used for transferring data between Hadoop (Hive) and traditional relational databases (MySQL, PostgreSQL, etc.). It can import data from a relational database (such as MySQL, Oracle, or Postgres) into Hadoop's HDFS, or import data from ...
Posted on Tue, 10 Mar 2020 03:58:20 -0700 by drath
Recently, I have been following up on Flink SQL to prepare for a deeper understanding. This article mainly records the process of running the SQL client source code~~
For the Hadoop, Hive, and other related environments involved in this article, see the previous article, The integration of Flink SQL client 1.10 and hive to read real-ti ...
Posted on Tue, 03 Mar 2020 20:40:29 -0800 by cdhogan
1. What is Sqoop
Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases.
Sqoop converts its statements into MapReduce tasks (map tasks only).
Advantage: cross-platform data integration
Posted on Thu, 27 Feb 2020 22:22:51 -0800 by athyzafiris
Data content analysis
`user_log.csv` file content meaning
Content meaning of `train.csv` and `test.csv`
Upload the data to Linux system and decompress it
Data set preprocessing
File information interception
Import data into Hive
Confirm that the Had ...
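The preprocessing steps above (intercepting file information before importing into Hive) can be sketched in pure Python. This is a hypothetical illustration: the real column layout of `user_log.csv` is not shown in the excerpt, so the sample columns and the helper name `intercept_fields` are assumptions.

```python
import csv
import io

def intercept_fields(raw_text, keep_indices):
    """Keep only the listed column indices from each CSV row and drop the
    header line, so the resulting file matches a Hive table schema."""
    reader = csv.reader(io.StringIO(raw_text))
    next(reader)  # skip the header row before a Hive LOAD DATA
    return [[row[i] for i in keep_indices] for row in reader]

# Hypothetical sample mimicking a user-log file; real columns may differ.
sample = "user_id,item_id,action,time\n1,100,click,0911\n2,200,buy,0912\n"
rows = intercept_fields(sample, [0, 1, 2])
print(rows)  # [['1', '100', 'click'], ['2', '200', 'buy']]
```

The trimmed rows could then be written back to a file and loaded with Hive's `LOAD DATA` statement.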
Posted on Tue, 25 Feb 2020 23:24:24 -0800 by jara06
from pyspark import SparkContext
from pyspark import SparkConf
In aggregateByKey, the former function is applied to values within each partition, and the latter fun ...
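The two-function contract of aggregateByKey can be illustrated without a Spark cluster. The sketch below is plain Python (not Spark code) emulating the semantics under stated assumptions: `seq_func` folds values into the zero value inside each partition, and `comb_func` merges per-partition results for the same key; the helper name `aggregate_by_key` is ours.

```python
from collections import defaultdict

def aggregate_by_key(partitions, zero, seq_func, comb_func):
    """Emulate Spark's aggregateByKey: seq_func runs inside each
    partition starting from the zero value; comb_func merges the
    per-partition results for the same key across partitions."""
    per_partition = []
    for part in partitions:
        acc = defaultdict(lambda: zero)
        for k, v in part:
            acc[k] = seq_func(acc[k], v)   # within-partition fold
        per_partition.append(acc)
    merged = {}
    for acc in per_partition:
        for k, v in acc.items():
            merged[k] = comb_func(merged[k], v) if k in merged else v
    return merged

# Two partitions of (key, value) pairs: take the max within each
# partition, then sum the partition maxima across partitions.
parts = [[("a", 1), ("a", 3)], [("a", 5), ("b", 2)]]
result = aggregate_by_key(parts, 0, max, lambda x, y: x + y)
print(result)  # {'a': 8, 'b': 2}
```

This mirrors the distinction the excerpt draws: the first function never crosses partition boundaries, while the second only sees already-aggregated per-partition values.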
Posted on Sat, 22 Feb 2020 01:35:39 -0800 by YOUAREtehSCENE
This article introduces how to build the Hive component of a Hadoop big data platform (MySQL needs to be installed before building Hive).
Software versions used: apache-hive-1.1.0-bin.tar, mysql-connector-java-5.1.47.jar (Baidu Cloud extraction code: vk6v)
Extract hive inst ...
Posted on Fri, 21 Feb 2020 06:06:37 -0800 by chrisuk
Recently, I completed a big data project whose framework includes Spring Boot. Because it is offline data analysis, Hive was chosen during component selection (Spark or HBase may be used for real-time ...). This blog is about how to configure Hive in the Spring ...
Posted on Wed, 12 Feb 2020 10:19:54 -0800 by mrmom
1. Built-in functions of the Hive system
1.1 Numerical calculation functions
1. Rounding function: round
Syntax: round(double a)
Return value: BIGINT
Note: returns the integer part of a double value, rounding half up
hive> select round(3.1415926) from tableName;
hive> select round(3.5) from tableName;
hive> create table tableName a ...
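The half-up rounding described above can be reproduced outside Hive for comparison. A minimal Python sketch, assuming the excerpt's "following rounding" means round-half-up: note that Python's built-in `round` uses banker's rounding (round-half-even), so `decimal` with `ROUND_HALF_UP` is used instead; the helper name `hive_round` is ours.

```python
from decimal import Decimal, ROUND_HALF_UP

def hive_round(x):
    """Round half up to the nearest integer, matching the half-up
    behavior the excerpt describes for Hive's round()."""
    return int(Decimal(str(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(hive_round(3.1415926))  # 3
print(hive_round(3.5))        # 4
```

The two calls correspond to the `select round(...)` queries above: `round(3.1415926)` yields 3 and `round(3.5)` yields 4.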
Posted on Wed, 05 Feb 2020 04:14:30 -0800 by AndrewBacca