What is Checkpoint?
In production, Spark often has to deal with very long chains of RDD transformations (for example, a job containing 10,000 RDDs), or with RDDs produced by transformations whose computation is particularly complex and time-consuming (for example, a computation that often exce ...
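A minimal sketch of how checkpointing cuts such a long lineage. This assumes a live SparkContext `sc` on a cluster; the checkpoint directory path and the transformation chain are placeholders, not from the original article:

```scala
import org.apache.spark.rdd.RDD

// Assumes an existing SparkContext `sc`; the HDFS path is a placeholder.
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

val base: RDD[Int] = sc.parallelize(1 to 1000000)
// Imagine many chained transformations here; the lineage keeps growing.
val derived = base.map(_ * 2).filter(_ % 3 == 0)

// Cache first so the separate checkpoint job does not recompute everything,
// then checkpoint: the RDD is written to reliable storage and its lineage
// is truncated, so a failure no longer replays the whole chain.
derived.cache()
derived.checkpoint()
derived.count() // the first action triggers the checkpoint job
```

Note that `checkpoint()` only marks the RDD; the actual write happens after the first action, which is why caching beforehand avoids computing the RDD twice.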
Posted on Sat, 15 Feb 2020 21:22:45 -0800 by jf3000
As we all know, the lifecycle of a Master is: constructor -> onStart -> receive* -> onStop. But onStart is never called directly in Master's main method, so when is it actually invoked?
This is tied to Spark's underlying Netty-based communication architecture.
In th ...
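The mechanism can be sketched without Spark at all. The toy classes below (not Spark's real ones) model the idea behind Spark's Dispatcher/Inbox: when an endpoint registers with the RPC environment, an internal OnStart message is queued first, so onStart runs before any other message is processed:

```scala
import scala.collection.mutable

// Internal inbox messages, mirroring Spark's OnStart/OnStop control messages.
sealed trait InboxMessage
case object OnStart extends InboxMessage
case object OnStop extends InboxMessage
final case class UserMessage(body: String) extends InboxMessage

trait RpcEndpoint {
  def onStart(): Unit = {}
  def receive(msg: String): Unit
  def onStop(): Unit = {}
}

final class Inbox(endpoint: RpcEndpoint) {
  // OnStart is enqueued at registration time, before any user message.
  private val messages = mutable.Queue[InboxMessage](OnStart)
  def post(msg: InboxMessage): Unit = messages.enqueue(msg)
  def process(): Unit = while (messages.nonEmpty) messages.dequeue() match {
    case OnStart        => endpoint.onStart()
    case UserMessage(b) => endpoint.receive(b)
    case OnStop         => endpoint.onStop()
  }
}

val log = mutable.Buffer[String]()
val master = new RpcEndpoint {
  override def onStart(): Unit = log += "onStart"
  def receive(msg: String): Unit = log += s"receive:$msg"
  override def onStop(): Unit = log += "onStop"
}

val inbox = new Inbox(master) // "registration" queues OnStart automatically
inbox.post(UserMessage("ElectedLeader"))
inbox.post(OnStop)
inbox.process()
// log now contains: onStart, receive:ElectedLeader, onStop
```

So Master never needs to call onStart itself: registering the endpoint is enough, and the dispatch loop guarantees onStart fires before the first real message.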
Posted on Fri, 07 Feb 2020 08:25:08 -0800 by jib
Python machine learning
Notes from the 2018 course "3-Day Quick Start to Python Machine Learning" [Dark Horse Programmer]
(2) Feature Engineering
1. Dictionary feature extraction
from sklearn.feature_extraction import DictVectorizer
# Dictionary feature extraction: turn dicts of {feature name: value} into vectors
vec = DictVectorizer(sparse=False)
Posted on Mon, 03 Feb 2020 09:03:03 -0800 by MitchEvans
scala> var textFile = sc.textFile("file:///root/1.txt")
textFile: org.apache.spark.rdd.RDD[String] = file:///root/1.txt MapPartitionsRDD at textFile at <console>:24
Posted on Wed, 29 Jan 2020 05:34:55 -0800 by dizel247
Type checks using pattern matching
In real-world development, for example throughout Spark's source code, pattern matching is widely used for type checks: it is more concise and clear, and keeps the code maintainable and extensible.
Functionally, pattern matching serves the same purpose as isInstanceOf; it is sufficient for judging objects that are predomina ...
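A small self-contained illustration of the idiom (the function and cases are illustrative, not taken from Spark's source). Each case both tests the runtime type, like isInstanceOf, and binds a correctly typed variable, like asInstanceOf, in one construct:

```scala
// Type checks via pattern matching: each case tests the runtime type and
// binds a typed variable; adding a new case extends the behavior cleanly.
def describe(x: Any): String = x match {
  case s: String       => s"String of length ${s.length}"
  case n: Int if n > 0 => s"positive Int: $n"
  case n: Int          => s"non-positive Int: $n"
  case xs: List[_]     => s"List with ${xs.size} elements"
  case _               => "unknown type"
}
```

For example, `describe("spark")` yields "String of length 5" and `describe(3.14)` falls through to "unknown type". With isInstanceOf/asInstanceOf the same logic would need a chain of if/else branches and explicit casts.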
Posted on Mon, 27 Jan 2020 19:59:45 -0800 by psyion
Data Analysis and Forecasting for Taobao's Double 11 (Singles' Day)
The systems and software involved in this case:
Linux (CentOS 7)
Posted on Thu, 23 Jan 2020 01:15:49 -0800 by designedfree4u
This is the ninth article in the Spring Cloud series. Understanding the first eight articles will help you follow this one:
Introduction to Spring Cloud and its common components
Spring Cloud Part 2: using and understanding the Eureka registry
Spring Cloud Part 3: building a highly available Eureka registry
Spring Cloud Par ...
Posted on Thu, 19 Dec 2019 03:08:19 -0800 by cdherold
Posted on Thu, 12 Dec 2019 09:10:21 -0800 by mcbeckel
1. Introduction to Spark
Spark 1.2.0 uses Scala 2.10; to write applications you need a compatible Scala version (e.g., 2.10.x).
When writing a Spark application you need to add a Maven dependency on Spark, which is available from Maven Central:
groupId = org.apache.spark
artifactId = spark-core_ ...
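In sbt the equivalent dependency can be declared as below; this is a build.sbt sketch for the Spark 1.2.0 era, where the `%%` operator automatically appends the Scala binary version to the artifact name:

```scala
// build.sbt (sketch)
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"
```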
Posted on Thu, 12 Dec 2019 06:48:43 -0800 by ch3m1st
Relationship data is modeled as directed edges (from -> to)
Vertex IDs are stored as Long
When Spark computes over such a network, some algorithms cannot be implemented, or their cost is too high, because of the sheer volume of data
To reduce the computational load, or to optimize the computation, the isolated relation clusters in the overall relationship n ...
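One way to find such isolated clusters is to run union-find over the edge list, so each connected sub-network can then be processed on its own. A stdlib-only sketch (the Long vertex IDs and sample edges are hypothetical):

```scala
import scala.collection.mutable

// Union-find over (from, to) edges with Long vertex IDs: vertices that
// share a root belong to the same connected component.
final class UnionFind {
  private val parent = mutable.Map[Long, Long]()
  def find(x: Long): Long = {
    val p = parent.getOrElseUpdate(x, x)
    if (p == x) x
    else { val r = find(p); parent(x) = r; r } // path compression
  }
  def union(a: Long, b: Long): Unit = parent(find(a)) = find(b)
}

// Hypothetical edge list: two disconnected clusters, {1,2,3} and {10,11}.
val edges: Seq[(Long, Long)] = Seq((1L, 2L), (2L, 3L), (10L, 11L))

val uf = new UnionFind
edges.foreach { case (from, to) => uf.union(from, to) }

// Group every vertex by its component root.
val components = edges.flatMap { case (a, b) => Seq(a, b) }
  .distinct
  .groupBy(uf.find)
  .values.map(_.toSet).toSet
// components == Set(Set(1, 2, 3), Set(10, 11))
```

Once the components are known, each isolated sub-network can be computed independently (or skipped entirely), which is exactly the pressure-reducing split the post describes.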
Posted on Wed, 11 Dec 2019 09:39:39 -0800 by misheck