Hadoop example: calculating the total amount of stock transactions

The optional final project for the Cloud Computing and Big Data Overview course requires a stock case study with the following specific requirements. Case: the attached file TextData.txt (shown in Fig. 1) contains daily stock trading data from 2011-1 to today, the t ...
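The summing step can be sketched as a Hadoop-streaming-style map/reduce in plain Python. The record layout used here ("stock code,amount") is an assumption for illustration; the real TextData.txt format is defined in the attached case document.

```python
# Sketch of the map/reduce logic for totalling transaction amounts per stock.
# Record format "code,amount" is a hypothetical stand-in for TextData.txt.
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit (stock_code, amount) pairs from CSV-like records."""
    for line in lines:
        code, amount = line.split(",")
        yield code, float(amount)

def reduce_phase(pairs):
    """Reducer: sum the amounts grouped by stock code."""
    totals = defaultdict(float)
    for code, amount in pairs:
        totals[code] += amount
    return dict(totals)

records = ["600519,1200.5", "000001,300.0", "600519,799.5"]
print(reduce_phase(map_phase(records)))  # {'600519': 2000.0, '000001': 300.0}
```

In a real Hadoop job the same two functions would become the mapper and reducer, with the shuffle phase doing the grouping by key.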

Posted on Thu, 04 Jun 2020 11:54:49 -0700 by poppy

Using skills of Flume

1. Flume overview. Flume is a distributed system for massive log collection, aggregation and transmission. Flume's main function is to read data from the server's local disk in real time and write it to HDFS. Agent: sends data from Source to destination in the form of events, and includes a Source, ...
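A minimal agent definition following this Source → Channel → Sink model might look like the following (the agent name, tailed file, and NameNode address are placeholders):

```properties
# flume.conf: tail a local log file and write events to HDFS
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: read the server's local disk in real time
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app.log

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory

# Sink: write the data to HDFS, partitioned by day
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode-host:9000/flume/events/%Y%m%d
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# Wire source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Start it with `flume-ng agent --name a1 --conf-file flume.conf` plus your site's `--conf` directory.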

Posted on Thu, 04 Jun 2020 10:59:09 -0700 by jamz310

How to read and write Aliyun Hbase using MaxCompute Spark

Background: Spark on MaxCompute can access instances (e.g. ECS, HBase, RDS) within a VPC in Alibaba Cloud. The underlying MaxCompute network is isolated from external networks by default, and Spark on MaxCompute provides a solution through the configuration spark.hadoop.odps.cupid.vpc.domain.list to access HBase in Alibaba Cloud's VPC network e ...
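As a rough sketch, that property takes a JSON description of the VPC endpoints to open up; the region, VPC ID, domain, and port below are placeholders, and the exact schema should be taken from the Alibaba Cloud documentation:

```properties
# spark-defaults.conf (values are placeholders)
spark.hadoop.odps.cupid.vpc.domain.list={"regionId":"cn-beijing","vpcs":[{"vpcId":"vpc-xxxxxxxx","zones":[{"urls":[{"domain":"hb-xxxx-master1-001.hbase.rds.aliyuncs.com","port":2181}]}]}]}
```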

Posted on Mon, 01 Jun 2020 23:53:10 -0700 by Assorro

Build hive clusters based on different versions of Hadoop (with configuration files)

This tutorial covers two scenarios: one uses hive-1.21 with hadoop 2.6.5, and the other is a build based on hadoop3.x hive. First come first: 1. Local (embedded derby) step. This storage requires running a mysql server locally and configuring ...
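For the MySQL-backed metastore, the relevant hive-site.xml entries typically look like this (the host, database name, and credentials are placeholders to adapt to your setup):

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive_metastore?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive_password</value>
</property>
```

With embedded Derby instead, no external database server is needed, but only one Hive session can use the metastore at a time.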

Posted on Thu, 28 May 2020 09:18:13 -0700 by adam119

Big data learning (1) Hadoop installation

Cluster architecture: the installation of Hadoop is essentially the configuration of the HDFS and YARN clusters. As the architecture diagram below shows, each DataNode in HDFS needs to be configured with the location of the NameNode; similarly, every NodeManager in YARN needs to be configured with the location of the ResourceManager. NameNode and Resour ...
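Concretely, those two locations are set in core-site.xml and yarn-site.xml on every node; the hostnames and port below are placeholders:

```xml
<!-- core-site.xml: tells every DataNode (and client) where the NameNode is -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://namenode-host:9000</value>
</property>

<!-- yarn-site.xml: tells every NodeManager where the ResourceManager is -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>resourcemanager-host</value>
</property>
```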

Posted on Mon, 18 May 2020 08:25:53 -0700 by EZbb

Using the wagon-maven-plugin to automatically deploy a project

The maven dependency of this plug-in is: <dependency> <groupId>org.codehaus.mojo</groupId> <artifactId>wagon-maven-plugin</artifactId> <version>1.0</version> </dependency> The document address of the plug-in is: http://www.mojohaus.org/wagon-maven-plugin/ ...
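Note that as a plugin it is normally declared under <build><plugins> rather than <dependencies>; a minimal upload configuration for the wagon:upload-single goal might look like this (the server URL and target path are placeholders):

```xml
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>wagon-maven-plugin</artifactId>
  <version>1.0</version>
  <configuration>
    <!-- artifact to copy and where to copy it (placeholders) -->
    <fromFile>target/${project.build.finalName}.jar</fromFile>
    <url>scp://deploy-user@example-server/opt/app</url>
  </configuration>
</plugin>
```

The scp protocol additionally requires a wagon-ssh extension dependency on the plugin; see the plugin documentation linked above.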

Posted on Sun, 03 May 2020 15:55:34 -0700 by Sanoz0r

Build a fully distributed Hadoop2.6 environment under Centos7

1. Download the Hadoop package and JDK. 1. Download Hadoop: https://archive.apache.org/dist/hadoop/common/hadoop-2.6.4/hadoop-2.6.4.tar.gz 2. Download JDK: https://pan.baidu.com/s/1lbu7eBEtgjeGIi2bWthLnA (extraction code: 0j0j) 2. Prepare virtual machines. 1. Create a new virtual machine (Centos7) in VMware, which is omitted here. 2. ...
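After extracting both archives, the environment variables are typically appended to /etc/profile; the install paths below are assumptions and should match wherever you actually unpacked the packages:

```shell
# /etc/profile additions (example paths -- adjust to your install dirs)
export JAVA_HOME=/usr/local/jdk1.8.0_181
export HADOOP_HOME=/usr/local/hadoop-2.6.4
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```

Run `source /etc/profile` afterwards so the current shell picks up the new variables.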

Posted on Sat, 02 May 2020 21:01:08 -0700 by calavera

Detailed steps for Hadoop installation

Before you begin: if you want to successfully build a local Hadoop cluster by following this blog, you first need to follow the video course Three-day Starter Big Data Practice Course to build a local cluster environment. The chapters you need to study in that course are: Course objectives, VMWare WorkStation Installation, Create Virtual Machine, Ins ...

Posted on Tue, 28 Apr 2020 10:28:10 -0700 by Twentyoneth

Day02 -- Python data types: list, tuple, dictionary and set

List in python. # A list is an instance of the list class: enclosed in brackets, with elements separated by commas. The elements in a list can be numbers, strings, lists, Booleans, etc., and lists can be nested. ========= Basic operations of list ========= (1) Common operations of list: list1 = [11,22,33,44,55] # len: view the number of elements in the list print(l ...
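The operations mentioned above can be run directly; a short self-contained demo:

```python
# Common list operations in Python
list1 = [11, 22, 33, 44, 55]
print(len(list1))            # number of elements -> 5
list1.append(66)             # add an element to the end
print(list1[0], list1[-1])   # first and last element -> 11 66

# Elements may be numbers, strings, lists, Booleans; lists can be nested
nested = [1, "two", [3.0, True]]
print(nested[2][1])          # index into the inner list -> True
```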

Posted on Thu, 23 Apr 2020 08:36:01 -0700 by djr587

Reading Hive data with SparkSql in IDEA

The traditional Hive computing engine is MapReduce. SparkSql was officially released after Spark 1.3 and is basically compatible with Apache Hive. Thanks to Spark's powerful computing power, processing Hive data with Spark is far faster than with traditional Hive. Using SparkSql in IDEA to read the data in H ...
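A minimal PySpark sketch of the idea; it assumes a Spark installation with Hive support and a hive-site.xml on the classpath, and the database and table names are placeholders:

```python
from pyspark.sql import SparkSession

# enableHiveSupport() lets the SparkSession use the Hive metastore
spark = (SparkSession.builder
         .appName("read-hive-demo")
         .enableHiveSupport()
         .getOrCreate())

# Query an existing Hive table through SparkSql
df = spark.sql("SELECT * FROM demo_db.demo_table LIMIT 10")
df.show()

spark.stop()
```

In IDEA the same pattern works from a project that has the Spark and Hive client dependencies on its classpath.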

Posted on Mon, 30 Mar 2020 14:23:09 -0700 by bl00dshooter