Practical Flume usage skills

1. Flume overview. Flume is a distributed system for collecting, aggregating, and transmitting massive amounts of log data. Flume's main function is to read data from a server's local disk in real time and write it to HDFS. Agent: sends data from a Source to the destination in the form of events, and consists of a Source, ...
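The Source → Channel → Sink chain of the Agent described above can be sketched as a minimal Flume properties file. This is a hedged example, not from the article: the agent name `a1`, the log path, and the HDFS URL are all placeholders you would replace with your own.

```properties
# a1 = agent name; tails a local log file and writes events to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: read the server's local disk log in (near) real time
a1.sources.r1.type = TAILDIR
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between Source and Sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events to HDFS, one directory per day
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true
a1.sinks.k1.channel = c1
```

Started with `flume-ng agent -n a1 -f <this file>`, this wires the three components into the event pipeline the overview describes.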

Posted on Thu, 04 Jun 2020 10:59:09 -0700 by jamz310

Build hive clusters based on different versions of Hadoop (with configuration files)

This tutorial uses two scenarios: one is hive-1.21 on hadoop 2.6.5, and the other mainly covers building hive on hadoop3.x. First up: 1. Local (embedded derby) step. This storage requires running a mysql server locally and configuring ...
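For the non-embedded variant the excerpt alludes to (a local MySQL server backing the metastore), the relevant part of `hive-site.xml` typically looks like the sketch below. This is an assumption-laden template, not the article's own config: host, database name, user, and password are placeholders.

```xml
<configuration>
  <!-- Point the Hive metastore at a local MySQL database -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive_password</value>
  </property>
</configuration>
```

The MySQL JDBC driver jar must also be placed in Hive's `lib/` directory for this to work.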

Posted on Thu, 28 May 2020 09:18:13 -0700 by adam119

PySpark saves DataFrame data as a Hive partition table

Create a SparkSession: from pyspark.sql import SparkSession; spark = SparkSession.builder.enableHiveSupport().appName('test_app').getOrCreate(); sc = spark.sparkContext; hc = HiveContext(sc) (note that HiveContext must also be imported from pyspark.sql). 1. Spark creates a partition table # You can change append to overwrite, so that if the table already exists, the previous table will be deleted and a ...

Posted on Mon, 11 May 2020 01:18:45 -0700 by [xNet]DrDre

Configuring Hive on Spark in cdh6.3.2

Environment: a Dell XPS 15 (32 GB RAM, 1 TB SSD, a Samsung 1 TB portable SSD on a Thunderbolt 3 port, and a 4 TB WD Elements external mechanical hard disk) running win10; three Centos7 virtual machines are used to test a cdh6.3.2 cluster (the highest version of the free community edition), with self-compiled Phoenix 5.1.0, flink1.10.0, elasticsearch6. ...

Posted on Tue, 05 May 2020 05:12:05 -0700 by mwichmann4

Reading Hive data with Spark SQL in IDEA

The traditional Hive execution engine is MapReduce. Spark SQL was officially released after Spark 1.3 and is largely compatible with Apache Hive. Backed by Spark's computing power, processing Hive data with Spark SQL is far faster than traditional Hive. Using Spark SQL in IDEA to read the data in H ...
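In Spark SQL terms, the kind of read the article describes boils down to an ordinary HiveQL query submitted through a Hive-enabled session. The database and table names below are hypothetical placeholders, not from the article:

```sql
-- Assumes a Hive database `testdb` with a table `user_log` already exists
-- and the Spark session was created with Hive support enabled.
USE testdb;

SELECT user_id, item_id, action
FROM user_log
WHERE action = 'buy'
LIMIT 10;
```

Because Spark SQL reuses the Hive metastore, the same query text that runs in the Hive CLI generally runs unchanged via `spark.sql(...)`.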

Posted on Mon, 30 Mar 2020 14:23:09 -0700 by bl00dshooter

Big data on the cloud, lesson one: a pitfall guide to MaxCompute authorization and external table operations

1. Sub-account creation and AK information binding. If you are logging in to the platform and using DataWorks with a sub-account for the first time, you need to confirm the following information: • the business alias of the primary account to which the sub-account belongs; • the user name and password of the sub-account; • the AccessKey ID ...

Posted on Tue, 10 Mar 2020 22:54:52 -0700 by physaux

Installation and deployment of Sqoop

I. Overview. What is Sqoop? Sqoop (pronounced: skup) is an open-source tool mainly used for transferring data between Hadoop (Hive) and traditional databases (mysql, postgresql ...). You can import data from a relational database (such as mysql, Oracle, or Postgres) into Hadoop's HDFS, or import data from ...
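After installation, the import direction the overview mentions (relational database → HDFS) is a single CLI invocation. This is a hedged sketch, not the article's own command: the JDBC URL, credentials, table name, and target directory are placeholders.

```shell
# Hypothetical example: import one MySQL table into HDFS
sqoop import \
  --connect jdbc:mysql://dbhost:3306/shop \
  --username root \
  --password secret \
  --table orders \
  --target-dir /user/hive/warehouse/orders \
  --num-mappers 1
```

Under the hood Sqoop turns this into a MapReduce job; `--num-mappers` controls how many parallel map tasks read the source table.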

Posted on Tue, 10 Mar 2020 03:58:20 -0700 by drath

Running the Flink SQL Client 1.10 source code integrated with hive in IDEA

Recently, I have been following up on Flink SQL to prepare for a deeper understanding. This article mainly records the process of running the SQL client source code~~ For the hadoop, hive, and other related environments involved in this article, see the previous article, The integration of Flink SQL client 1.10 and hive to read real-ti ...
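For reference, outside of IDEA a built Flink 1.10 distribution launches the same SQL client from the command line; the environment-file name below is a placeholder for one that declares a HiveCatalog.

```shell
# Start the Flink SQL client in embedded mode, loading catalogs
# (e.g. a HiveCatalog) from a YAML environment file
./bin/sql-client.sh embedded -e conf/sql-client-hive.yaml
```

Running the client from source in IDEA, as the article does, amounts to invoking the same entry point with an equivalent classpath and environment file.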

Posted on Tue, 03 Mar 2020 20:40:29 -0800 by cdhogan

30. Sqoop, an auxiliary tool for offline systems -- a good helper

1. What is sqoop? Apache Sqoop(TM) is a tool designed for efficiently transferring bulk data between Apache Hadoop and structured datastores such as relational databases. Nature: it converts sqoop statements into MapReduce tasks (map tasks only). Characteristics: Advantages: cross-platform data integration. Dis ...

Posted on Thu, 27 Feb 2020 22:22:51 -0800 by athyzafiris

Taobao Double 11 big data analysis (data preparation)

Article directory: Preface; Data content analysis; meaning of the contents of the user_log.csv file; meaning of the contents of train.csv and test.csv; uploading the data to a Linux system and decompressing it; data set preprocessing; file information interception; importing the data into Hive; confirming that the Had ...
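The "file information interception" step (carving the first N records out of a huge log file before importing it into Hive for a test run) can be sketched in plain Python. The file names and the `head_csv` helper below are hypothetical, not from the article:

```python
def head_csv(src_path, dst_path, n_rows, keep_header=True):
    """Copy the header plus the first n_rows data lines of a large CSV."""
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        if keep_header:
            dst.write(src.readline())  # copy the header line as-is
        for i, line in enumerate(src):
            if i >= n_rows:            # stop after n_rows data lines
                break
            dst.write(line)

# Example: build a tiny sample file, then keep only its first 2 data rows
with open("user_log_sample.csv", "w", encoding="utf-8") as f:
    f.write("user_id,item_id,action\n1,10,click\n2,20,buy\n3,30,cart\n")
head_csv("user_log_sample.csv", "small.csv", 2)
# small.csv now holds the header plus the first 2 data rows
```

The same effect is often achieved with `head -n` on Linux; a Python helper is handy when the truncation is part of a larger preprocessing script.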

Posted on Tue, 25 Feb 2020 23:24:24 -0800 by jara06