Reading Hive data with Spark SQL in IDEA

Hive's traditional compute engine is MapReduce. After Spark 1.3, Spark SQL was officially released, and it is largely compatible with Apache Hive. Because it runs on Spark's compute engine, processing Hive data with Spark is far faster than with traditional Hive. Using Spark SQL in IDEA to read the data in H ...

Posted on Mon, 30 Mar 2020 14:23:09 -0700 by bl00dshooter

Cupid Console: a powerful tool for MaxCompute Spark job management and control

Background: At present, the MaxCompute platform supports running Spark jobs. Spark jobs rely on MaxCompute's Cupid platform and can be submitted to MaxCompute in a community-compatible way. They can read and write MaxCompute tables and share Project resources with the existing SQL/MR jobs on MaxCompute. Please refer to ...

Posted on Tue, 03 Mar 2020 00:44:54 -0800 by edmore

Taobao Double 11 big data analysis (Spark analysis)

Article directory: Preface; test.csv and train.csv data preprocessing; Processing of test.csv file; Processing of train.csv file; Spark processes data execution environment; Upload files to HDFS; MySQL preparation; Launch Spark Shell; Prediction of repeat customers by SVM classifier; Output results to mysql ...

Posted on Wed, 26 Feb 2020 22:30:09 -0800 by artic

Spark command details

In this blog post, Alice gives you more details about Spark commands. spark-shell: Introduction. Previously, we used spark-shell to submit tasks. spark-shell is the interactive shell program that ships with Spark, which makes it easy for users to program interactively. Users can write spark programs with ...

Posted on Thu, 20 Feb 2020 17:41:05 -0800 by Cut

Analysis of the Spark checkpoint principle and source code

I. Overview. What is Checkpoint? In production, a Spark job often involves a long chain of RDD transformations (for example, a job contains 10000 RDDs), or the RDDs produced by a particular transformation are especially complex and time-consuming to compute (for example, the calculation often exce ...

Posted on Sat, 15 Feb 2020 21:22:45 -0800 by jf3000

Spark source code analysis: when is the Master's onStart() method called?

As we all know, the Master's life cycle methods are: constructor -> onStart -> receive* -> onStop. But the Master's main method never calls onStart directly, so when is onStart called? This is actually related to Spark's underlying Netty communication architecture. In th ...

Posted on Fri, 07 Feb 2020 08:25:08 -0800 by jib

Machine learning feature engineering

Python machine learning 3-day quick start python machine learning in 2018 [dark horse programmer] (2) Feature engineering 1. Dictionary feature extraction from sklearn.feature_extraction import DictVectorizer def dict_demo(): ''' //Dictionary feature extraction :return: ' ...

Posted on Mon, 03 Feb 2020 09:03:03 -0800 by MitchEvans
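
The dictionary feature extraction named in the excerpt above can be illustrated without any library. The following is a minimal pure-Python sketch of the general idea (one-hot encode string values, pass numbers through); it is not the post's code and not scikit-learn's DictVectorizer implementation. The function name dict_vectorize and the sample records are hypothetical, made up for illustration.

```python
def dict_vectorize(records):
    """Turn a list of dicts into (feature_names, matrix).

    String values become one-hot "key=value" columns;
    numeric values are copied into a column named after the key.
    """
    # First pass: collect every column name that appears in the data.
    names = set()
    for rec in records:
        for key, value in rec.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    feature_names = sorted(names)
    index = {name: i for i, name in enumerate(feature_names)}

    # Second pass: fill one dense row per record.
    matrix = []
    for rec in records:
        row = [0.0] * len(feature_names)
        for key, value in rec.items():
            if isinstance(value, str):
                row[index[f"{key}={value}"]] = 1.0
            else:
                row[index[key]] = float(value)
        matrix.append(row)
    return feature_names, matrix


# Hypothetical sample data, invented for this sketch.
records = [
    {"city": "Beijing", "temperature": 100},
    {"city": "Shanghai", "temperature": 60},
    {"city": "Shenzhen", "temperature": 30},
]
feature_names, matrix = dict_vectorize(records)
print(feature_names)
for row in matrix:
    print(row)
```

Each distinct city gets its own 0/1 column, while the numeric temperature keeps a single column, so categorical fields can be fed to models that expect numeric input.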

RDD programming study notes 3: data reading and writing

Local read scala> var textFile = sc.textFile("file:///root/1.txt") textFile: org.apache.spark.rdd.RDD[String] = file:///root/1.txt MapPartitionsRDD[57] at textFile at <console>:24 scala> textFile.saveAsTextFile("file:///root/writeback") scala> textFile.foreach(println) hadoop hello bi ...

Posted on Wed, 29 Jan 2020 05:34:55 -0800 by dizel247

"Class - Basic Concept 3" Learned by Scala

Type checks using pattern matching: In practical development, for example in Spark's source code, many places use pattern matching for type checks, which is more concise and clear and makes the code easier to maintain and extend. Functionally, pattern matching works just like isInstanceOf: it is sufficient to judge objects that are predomina ...

Posted on Mon, 27 Jan 2020 19:59:45 -0800 by psyion

Forest Rain Case--Analysis of Taobao Fake Data

Data analysis and forecast of Taobao Double 11. Preparation: software tools. The systems and software involved in this case: Linux (CentOS 7), MySQL, Tomcat (7.0.9), Hadoop (3.2.0), Hive (2.3.5), Sqoop (1.4.6), ECharts (4.5.0), IDEA (2019.1.3), Spark (2.3.1) ...

Posted on Thu, 23 Jan 2020 01:15:49 -0800 by designedfree4u