Spark Notes: Pattern Matching and Case Classes

Scala has a very powerful pattern matching mechanism that applies in many situations, such as switch-style statements and type checking. Scala also provides case classes, which are optimized for pattern matching and allow fast matching. 1.1. Matching strings package cn.itcast.cases import scala.util.Random object CaseDemo01 extends App{ v ...

Posted on Thu, 22 Aug 2019 00:15:15 -0700 by atsphpflash

Spark Core Custom Sort, Partition

Custom Sorting (Important) Simple data types can be sorted directly in Spark, but more complex ordering conditions require a custom sort. import org.apache.spark.rdd.RDD import org.apache.spark.{SparkConf, SparkContext} //Custom Sorting obj ...

Posted on Thu, 15 Aug 2019 00:19:09 -0700 by EddieFoyJr

How to dynamically parse the schema of JSON data from Kafka with the schema_of_json method in Structured Streaming

How to parse the schema of JSON data from Kafka in Structured Streaming In production, the fields in a message may change, for example when a new field is added, but the Spark program cannot be stopped. So instead of hard-coding the schema in the program, consider inferring the schema through the ...

Posted on Thu, 08 Aug 2019 23:52:18 -0700 by newbienewbie

Introduction to GraphX and GraphFrames Testing

Overview GraphX is the Spark component for graphs and graph computation. GraphX introduces a new graph abstraction by extending the Spark RDD: a directed multigraph with properties attached to its vertices and edges. Like every Spark module, it has an abstract data struct ...

Posted on Sat, 20 Jul 2019 20:33:38 -0700 by e11rof

Spark Source Code Analysis: The Checkpoint Process

Summary The checkpoint mechanism exists because the DAG of a Spark application that repeatedly accesses the same data may be huge, and the computation chain within a task may be long. If a task fails partway through, recomputing the whole chain is very time-consuming. It is therefore necessary to checkpoint the RDD, which is ...

Posted on Thu, 04 Jul 2019 13:14:08 -0700 by vMan

Netty Server Startup: ServerBootstrap Source Code Analysis

Netty Server Startup: ServerBootstrap Source Code Analysis In the first article, I analyzed the parameter settings and startup process of Bootstrap, the boot class of the Netty client in Spark. Another important part remains unexplored: the initialization and startup process of the server side. So in this section, we w ...

Posted on Sat, 29 Jun 2019 12:36:44 -0700 by Alkimuz

Message Receipt Solution for XMPP Protocol

Find a way in distress The concept of a message receipt has existed since the early days of instant messaging. Its purpose is to provide a safeguard for cases where, for various reasons, a message is not delivered to the other party. The main causes of this problem are network instability and server or client faults, which lead to ...

Posted on Wed, 26 Jun 2019 11:30:06 -0700 by show8bbs

TensorFlow on Spark: A Guide to Getting Out of Pitfalls

As machine learning and deep learning become more and more popular, TensorFlow, an open source deep learning framework launched by Jeff Dean's team at Google, has attracted a lot of attention. TensorFlow is flexible, allowing users to use multiple devices (such as different CPUs and GPUs) on multiple machines. However, because TensorFlow distr ...

Posted on Sun, 09 Jun 2019 13:10:45 -0700 by kcengel

Spark Storage System: BlockManager Source Code Analysis

Based on the previous series of analyses, we have a general understanding of how a Spark job proceeds from creation, through scheduling and distribution, to execution, and finally to returning its results to the driver. But many questions remain from reading the source code. The main one is the important basic modules involved in spa ...

Posted on Sun, 09 Jun 2019 10:02:32 -0700 by rdimaggio

Spark Task Assignment: TaskSchedulerImpl Source Code Analysis

TaskSchedulerImpl In the previous article, we saw that after DAGScheduler divides the entire computation chain of a job into multiple stages based on shuffle dependencies, it starts by submitting the final ResultStage. Because of the dependencies between stages, the stages actually end up being submitted from top to bottom along the computation chain. Each stag ...

Posted on Sun, 02 Jun 2019 12:43:11 -0700 by pleigh