Hive's traditional execution engine is MapReduce. Spark SQL was officially released as of Spark 1.3 and is largely compatible with Apache Hive. Because it runs on Spark's compute engine, processing Hive data with Spark SQL is much faster than with traditional Hive. Using Spark SQL in IDEA to read the data in H ...
Posted on Mon, 30 Mar 2020 14:23:09 -0700 by bl00dshooter
The MaxCompute platform currently supports running Spark jobs. Spark jobs rely on MaxCompute's Cupid platform and can be submitted to MaxCompute in a community-compatible way. They can read and write MaxCompute tables and share Project resources with the existing SQL/MR jobs on MaxCompute. Please refer to ...
Posted on Tue, 03 Mar 2020 00:44:54 -0800 by edmore
test.csv and train.csv data preprocessing
Processing of test.csv file
Processing of train.csv file
Spark processes data
Upload files to HDFS
Launch Spark Shell
Prediction of repeat customers by SVM classifier
Output results to MySQL ...
Posted on Wed, 26 Feb 2020 22:30:09 -0800 by artic
In this blog post, Alice gives you more details about Spark commands.
Previously, we used spark-shell to submit tasks. spark-shell is the interactive shell that ships with Spark, which makes it easy for users to program interactively. Users can write Spark programs with ...
Posted on Thu, 20 Feb 2020 17:41:05 -0800 by Cut
What is Checkpoint?
In production, a Spark job often involves a long chain of RDD transformations (for example, a single job containing 10,000 RDDs), or an RDD produced by a particular transformation is especially complex and time-consuming to compute (for example, the calculation often exce ...
Posted on Sat, 15 Feb 2020 21:22:45 -0800 by jf3000
As we all know, the lifecycle of the Master is: constructor -> onStart -> receive* -> onStop. However, the Master's main method never calls onStart directly, so when is onStart called?
This is related to Spark's underlying Netty communication architecture.
In th ...
Posted on Fri, 07 Feb 2020 08:25:08 -0800 by jib
Python machine learning
3-day quick start python machine learning in 2018 [dark horse programmer]
(2) Feature Engineering
1. Dictionary feature extraction
from sklearn.feature_extraction import DictVectorizer
# Dictionary feature extraction
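As a rough illustration of what dictionary feature extraction does, the plain-Python sketch below one-hot encodes string values into "key=value" columns and passes numeric values through unchanged. It is not scikit-learn's implementation: the function name dict_vectorize and the sample records are hypothetical and chosen only for this example. DictVectorizer follows the same naming scheme for its feature columns.

```python
def dict_vectorize(records):
    """One-hot encode string values; keep numeric values as-is.

    Returns (feature_names, rows), where each row is a list of floats
    aligned with feature_names.
    """
    def feature_name(key, value):
        # String values become "key=value" columns; numbers keep the key.
        return f"{key}={value}" if isinstance(value, str) else key

    # Collect every feature name seen in the data, in sorted order.
    names = sorted(
        {feature_name(k, v) for rec in records for k, v in rec.items()}
    )
    index = {name: i for i, name in enumerate(names)}

    rows = []
    for rec in records:
        row = [0.0] * len(names)
        for k, v in rec.items():
            col = index[feature_name(k, v)]
            row[col] = 1.0 if isinstance(v, str) else float(v)
        rows.append(row)
    return names, rows


# Hypothetical sample data, not taken from the original post.
records = [
    {"city": "Beijing", "temperature": 100},
    {"city": "Shanghai", "temperature": 60},
    {"city": "Shenzhen", "temperature": 30},
]

names, rows = dict_vectorize(records)
print(names)
for row in rows:
    print(row)
```

Each distinct city gets its own 0/1 column, while temperature stays a single numeric column, so categorical values are not given an artificial ordering.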
Posted on Mon, 03 Feb 2020 09:03:03 -0800 by MitchEvans
scala> var textFile = sc.textFile("file:///root/1.txt")
textFile: org.apache.spark.rdd.RDD[String] = file:///root/1.txt MapPartitionsRDD at textFile at <console>:24
Posted on Wed, 29 Jan 2020 05:34:55 -0800 by dizel247
Type checking using pattern matching
In real-world code, such as Spark's source code, pattern matching is used in many places to check types. It is more concise and clearer, and the resulting code is easier to maintain and extend.
With pattern matching, functionally, just like isInstanceOf, it is sufficient to judge objects that are predomina ...
Posted on Mon, 27 Jan 2020 19:59:45 -0800 by psyion
Data Analysis and Forecast of Taobao Shuang11
The system and software involved in this case:
Linux system (CentOS 7)
Posted on Thu, 23 Jan 2020 01:15:49 -0800 by designedfree4u