Page single-hop conversion rate module

1. Requirements analysis: 1. Get the task ID from the spark-submit script submitted by the user and retrieve the task's parameters. 2. Obtain the data within the specified date range for computation, slice out the page-visit flow per session, and calculate the conversion rate of visits between consecutive pages, e.g. targetPageFlow: ...
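As a hedged illustration of the steps above, the following PySpark sketch (the session/page column names, example records, and the comma-separated targetPageFlow format are assumptions, not the article's code) computes the single-hop conversion rate between consecutive pages:

# Minimal sketch, assuming one record per page visit: (session_id, action_time, page_id).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('page_conversion').getOrCreate()

target_page_flow = '1,2,3,4,5,6,7'                  # taken from the task parameters
pages = target_page_flow.split(',')
target_hops = set(zip(pages[:-1], pages[1:]))        # e.g. ('1','2'), ('2','3'), ...

visits = spark.createDataFrame(
    [('s1', '2020-06-05 10:00:00', '1'), ('s1', '2020-06-05 10:00:05', '2'),
     ('s1', '2020-06-05 10:00:09', '3'), ('s2', '2020-06-05 11:00:00', '1')],
    ['session_id', 'action_time', 'page_id'])

def session_hops(rows):
    # Order a session's visits by time, then keep only the hops we were asked about.
    ordered = [r.page_id for r in sorted(rows, key=lambda r: r.action_time)]
    return [hop for hop in zip(ordered[:-1], ordered[1:]) if hop in target_hops]

hop_counts = (visits.rdd.groupBy(lambda r: r.session_id)
                    .flatMap(lambda kv: session_hops(kv[1]))
                    .countByValue())
page_counts = visits.rdd.map(lambda r: r.page_id).countByValue()

for (src, dst), n in hop_counts.items():
    # Single-hop conversion rate = visits of src followed by dst / total visits of src.
    print(f'{src}->{dst}: {n / page_counts[src]:.2%}')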

Posted on Fri, 05 Jun 2020 00:03:52 -0700 by defunct

How to read and write Aliyun HBase using MaxCompute Spark

Background: Spark on MaxCompute can access instances (e.g. ECS, HBase, RDS) within a VPC in Alibaba Cloud. The underlying MaxCompute network is isolated from external networks by default, and Spark on MaxCompute provides a solution: through the configuration spark.hadoop.odps.cupid.vpc.domain.list it can access the HBase of Alibaba Cloud's VPC network e ...
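A hedged sketch of how that property is typically supplied when building the Spark session; the region ID, VPC ID, and HBase ZooKeeper endpoint below are placeholders, and the exact JSON schema should be verified against the MaxCompute documentation:

from pyspark.sql import SparkSession

# Placeholder VPC access list (assumed structure): region, VPC id, and the
# domains/ports Spark is allowed to reach inside the VPC.
vpc_domain_list = (
    '{"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-xxxxxxxx","zones":[{"urls":'
    '[{"domain":"hb-xxxx-master1-001.hbase.rds.aliyuncs.com","port":2181}]}]}]}'
)

spark = (SparkSession.builder
         .appName('maxcompute_hbase')
         .config('spark.hadoop.odps.cupid.vpc.domain.list', vpc_domain_list)
         .getOrCreate())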

Posted on Mon, 01 Jun 2020 23:53:10 -0700 by Assorro

Training DeepFM under PAI-Notebook

DeepFM is arguably one of the most common CTR prediction models at present. For a recommendation system based on CTR estimation, the most important thing is to learn the feature combinations behind user click behavior. In different recommendation scenarios, low-order or high-order combinatorial fe ...
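As a brief, hedged aside (not the article's PAI-Notebook code): DeepFM predicts sigmoid(y_linear + y_FM + y_DNN), where the FM component captures low-order (second-order) feature interactions over embeddings shared with the DNN. A minimal numpy sketch of that second-order term:

import numpy as np

def fm_second_order(V, x):
    """V: (n_features, k) embedding matrix, x: (n_features,) feature values."""
    # Standard FM identity: 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    sum_then_square = (V.T @ x) ** 2          # shape (k,)
    square_then_sum = (V.T ** 2) @ (x ** 2)   # shape (k,)
    return 0.5 * float(np.sum(sum_then_square - square_then_sum))

rng = np.random.default_rng(0)
print(fm_second_order(rng.normal(size=(10, 4)),
                      rng.integers(0, 2, size=10).astype(float)))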

Posted on Thu, 14 May 2020 20:06:07 -0700 by ZaZall

PySpark saves DataFrame data as a Hive partitioned table

Create a SparkSession:
from pyspark.sql import SparkSession, HiveContext
spark = SparkSession.builder.enableHiveSupport().appName('test_app').getOrCreate()
sc = spark.sparkContext
hc = HiveContext(sc)
1. Spark creates a partitioned table
# You can change append to overwrite, so that if the table already exists, the previous table will be deleted and a ...
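The excerpt cuts off at the write step; a minimal sketch of the rest, assuming a hypothetical database/table name and a date partition column:

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().appName('test_app').getOrCreate()

df = spark.createDataFrame(
    [(1, 'a', '2020-05-11'), (2, 'b', '2020-05-11')],
    ['id', 'name', 'dt'])

# Let Spark create the partitioned table from the DataFrame schema;
# change 'append' to 'overwrite' to replace an existing table.
df.write.mode('append').partitionBy('dt').saveAsTable('test_db.test_table')

# Alternatively, insert into an already existing partitioned table
# using dynamic partitioning.
spark.sql("set hive.exec.dynamic.partition.mode=nonstrict")
df.write.mode('append').insertInto('test_db.test_table')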

Posted on Mon, 11 May 2020 01:18:45 -0700 by [xNet]DrDre

Node + JS for large file fragment upload

1. What is fragment upload? Fragment upload means splitting a large file into several blocks and transmitting them one by one. The benefit is that it reduces the overhead of re-uploading. For example: if the file we upload is a large file, the upload takes a long time. With the added influence of network instability, it is easy t ...
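The article itself targets Node.js; purely to illustrate the chunking arithmetic (number of fragments = ceil(file size / chunk size)), here is a short, language-agnostic Python sketch with an assumed 5 MB chunk size:

import math, os

CHUNK_SIZE = 5 * 1024 * 1024  # 5 MB per fragment (assumed)

def split_into_chunks(path):
    # Yield (index, total count, bytes) for each fragment of the file.
    total = os.path.getsize(path)
    count = math.ceil(total / CHUNK_SIZE)
    with open(path, 'rb') as f:
        for index in range(count):
            yield index, count, f.read(CHUNK_SIZE)

# Each fragment would be uploaded separately with its index, and the server
# reassembles the blocks in index order once all fragments have arrived.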

Posted on Tue, 05 May 2020 15:23:53 -0700 by affordit

CDH 6.3.2: configure Hive on Spark

Environment: a Dell XPS 15 (32 GB RAM, 1 TB SSD, a Samsung 1 TB portable SSD on the Thunderbolt 3 port, and a 4 TB WD Elements external mechanical hard disk) running Windows 10. Three CentOS 7 virtual machines are used to test a CDH 6.3.2 cluster (the highest version of the free community edition), with self-compiled Phoenix 5.1.0, flink1.10.0, elasticsearch6. ...

Posted on Tue, 05 May 2020 05:12:05 -0700 by mwichmann4

IDEA: SparkSQL reads data from Hive

The traditional Hive computation engine is MapReduce. After Spark 1.3, SparkSQL was officially released, and it is basically compatible with Apache Hive. Thanks to Spark's powerful computing capability, processing Hive data with Spark is far faster than with traditional Hive. Using SparkSQL in IDEA to read the data in H ...
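A minimal sketch of the same idea (the article presumably uses a Scala project in IDEA; the database and table names here are placeholders):

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName('read_hive')
         .enableHiveSupport()   # requires hive-site.xml to be visible to Spark
         .getOrCreate())

# Hypothetical database/table; any Hive-visible table works the same way.
df = spark.sql("SELECT * FROM test_db.user_behavior LIMIT 10")
df.show()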

Posted on Mon, 30 Mar 2020 14:23:09 -0700 by bl00dshooter

Cupid Console: a powerful tool for MaxCompute Spark job management and control

Background: At present, the MaxCompute platform supports running Spark jobs. Spark jobs rely on MaxCompute's Cupid platform and can be submitted to MaxCompute for execution in a community-compatible way. They support reading and writing MaxCompute tables and share Project resources with the existing SQL/MR jobs on MaxCompute. Please refer to ...

Posted on Tue, 03 Mar 2020 00:44:54 -0800 by edmore

Taobao Double 11 big data analysis (Spark analysis)

Article directory: Preface; test.csv and train.csv data preprocessing; processing of the test.csv file; processing of the train.csv file; Spark data-processing execution environment; uploading files to HDFS; MySQL preparation; launching Spark Shell; prediction of repeat customers with an SVM classifier; output of results to MySQL ...
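The last two items in that outline (SVM classification and the MySQL export) could look roughly like the following PySpark sketch; the feature columns, HDFS paths, and MySQL connection details are all assumptions, not the article's code:

from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LinearSVC

spark = SparkSession.builder.appName('double11_svm').getOrCreate()

# Assumed preprocessed inputs on HDFS with an assumed label/feature layout.
train = spark.read.csv('hdfs:///double11/train.csv', header=True, inferSchema=True)
assembler = VectorAssembler(inputCols=['age_range', 'gender', 'merchant_id'],
                            outputCol='features')
svm = LinearSVC(labelCol='label', featuresCol='features', maxIter=20)
model = svm.fit(assembler.transform(train))

test = spark.read.csv('hdfs:///double11/test.csv', header=True, inferSchema=True)
pred = (model.transform(assembler.transform(test))
             .select('user_id', 'merchant_id', 'prediction'))

# Write predictions to MySQL over JDBC (connection details are placeholders;
# the MySQL JDBC driver must be on the Spark classpath).
(pred.write.format('jdbc')
     .option('url', 'jdbc:mysql://localhost:3306/dbtaobao')
     .option('dbtable', 'rebuy_prediction')
     .option('user', 'root').option('password', '******')
     .mode('overwrite').save())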

Posted on Wed, 26 Feb 2020 22:30:09 -0800 by artic

Spark command details

In this blog post, Alice gives you more details about Spark commands. spark-shell introduction: previously, we used spark-shell to submit tasks. spark-shell is the interactive shell program that comes with Spark, which makes it easy for users to program interactively. Users can write Spark programs with ...
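spark-shell is a Scala REPL; as a hedged illustration of the kind of program typically written interactively there, here is the equivalent word count in PySpark (the input path is a placeholder, and in the pyspark shell the spark/sc objects are already predefined):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('shell_demo').getOrCreate()
sc = spark.sparkContext

counts = (sc.textFile('hdfs:///wordcount/input')     # assumed input path
            .flatMap(lambda line: line.split(' '))
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
print(counts.take(10))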

Posted on Thu, 20 Feb 2020 17:41:05 -0800 by Cut