How to Rerun Data

In offline tasks, re-running data is routine: a job may hang partway through, or its output may turn out to be incorrect and need to be regenerated after checking. When you re-run, though, it is important to check whether data has already been written to HBase, or whether the Hive partition for that day already contains data. If hive h ...

Posted on Thu, 03 Oct 2019 19:06:21 -0700 by aunquarra

Hadoop Distributed File System

1. Standalone DHCP. The source address is 0.0.0.0 and the target address is 255.255.255.255. The ports are UDP 67 and UDP 68, one for sending and one for receiving. The client broadcasts its configuration request to port 67 (bootps), and the server broadcasts its reply to port 68 (bootpc). Default format ...

Posted on Wed, 02 Oct 2019 19:19:24 -0700 by azunoman

Hadoop Series - Distributed Computing Framework MapReduce

1. Overview of MapReduce. Hadoop MapReduce is a distributed computing framework for writing batch applications. Programs written with it can be submitted to a Hadoop cluster to process large datasets in parallel. A MapReduce job splits the input dataset into separate blocks, which are processed by map tasks in parallel, and the framework sorts the ...

Posted on Fri, 13 Sep 2019 09:21:27 -0700 by tharagleb
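
The split, map, sort, and reduce flow described in the MapReduce excerpt above can be illustrated with a small word count. This is only a sketch in plain Java: it does not use the Hadoop API, and the class and method names (MapReduceSketch, map, reduce, wordCount) are made up for illustration. A TreeMap stands in for the framework's sort-by-key step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical, Hadoop-free illustration of the map -> sort -> reduce flow.
class MapReduceSketch {

    // Map phase: emit a (word, 1) pair for every word in one input block.
    static List<Map.Entry<String, Integer>> map(String block) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : block.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Reduce phase: sum all counts collected for a single key.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    static Map<String, Integer> wordCount(List<String> blocks) {
        // Each block is mapped independently, so the blocks can run in parallel.
        List<List<Map.Entry<String, Integer>>> mapped = blocks.parallelStream()
                .map(MapReduceSketch::map)
                .collect(Collectors.toList());

        // Sort/group step: collect values per key; TreeMap keeps keys sorted.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> pairs : mapped) {
            for (Map.Entry<String, Integer> pair : pairs) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }

        // Reduce each key's grouped values to a single total.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> blocks = Arrays.asList("hadoop spark hadoop", "hbase spark hadoop");
        System.out.println(wordCount(blocks)); // {hadoop=3, hbase=1, spark=2}
    }
}
```

In a real job the blocks would be input splits on HDFS and the grouping would happen across the network, but the shape of the computation is the same.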

DirectByteBuffer and File IO Details

The java.nio package is a newer API that Java provides for IO. It re-implements IO operations using models such as channels and selectors. DirectByteBuffer is one of the classes in the nio package. It holds byte data, and is notable because it stores that data in off-heap memory. Unlike traditional objects, objects are in the ...

Posted on Thu, 05 Sep 2019 17:50:11 -0700 by mulysa
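
To make the heap versus off-heap distinction in the DirectByteBuffer excerpt above concrete, here is a minimal sketch using only the standard java.nio API. ByteBuffer.allocateDirect is the usual public way to obtain a direct buffer. The class name DirectBufferSketch and the roundTrip helper are hypothetical and not taken from the original article.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical example: allocate heap and direct buffers and round-trip bytes.
class DirectBufferSketch {

    // Write text into an off-heap (direct) buffer, then read it back.
    static String roundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        ByteBuffer direct = ByteBuffer.allocateDirect(bytes.length);
        direct.put(bytes);   // copy from the heap array into off-heap memory
        direct.flip();       // switch the buffer from writing to reading
        byte[] out = new byte[direct.remaining()];
        direct.get(out);     // copy back from off-heap memory to the heap
        return new String(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteBuffer heap = ByteBuffer.allocate(16);
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true
        System.out.println(roundTrip("file io"));
    }
}
```

A direct buffer is typically handed to a channel (for example FileChannel.read or FileChannel.write) so that the IO call can work on the off-heap memory without an extra copy through a heap array.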

Hadoop Cluster Modification: Adjusting Cluster Version

Link to the original text: http://www.cnblogs.com/DamianZhou/p/4184026.html. Contents: Hadoop Cluster Modification and Cluster Version Adjustment; Modification Notes; Detailed Steps; 1. JDK modification; 2 ...

Posted on Wed, 17 Jul 2019 13:02:17 -0700 by pesoto74

A Cluster Load Scoring Method for HBase Load Balancing

The HMaster is responsible for distributing regions evenly across the region servers. One of the threaded tasks in the HMaster is dedicated to balancing and runs every five minutes by default. Each load-balancing operation can be divided into two steps: generating the load-balancing plan, and having the AssignmentManager class execute the plan. Let's go into ...

Posted on Sat, 13 Jul 2019 15:15:04 -0700 by phpnewbie8

Elasticsearch Learning Summary 6: Using an Observer to Synchronize Data from HBase to Elasticsearch

Recently, for the company's unified log collection and processing platform, Elasticsearch was the clear technology choice, because it can search system logs quickly, both for troubleshooting from logs and for tracing business call chains. Some fields of the company's application logs, such as content, do not need to be stored in ES. A ...

Posted on Mon, 24 Jun 2019 17:09:12 -0700 by Tr4mpldUndrfooT

HBase Table Operations and the Java API

HBase list. Use the list command to list all tables: hbase(main):001:0 > list. Listing tables using the Java API: the following program uses the Java API to list all tables in HBase. import java.io.IOException; import org.apache.hadoop.conf.Configuration; import org.apache.hadoop.hbase.HBaseConfiguration; import org.apach ...

Posted on Fri, 17 May 2019 16:29:48 -0700 by po

Spark Persistence and Shared Variables

1. The persistence operator cache. Introduction: Normally, an RDD does not hold real data; it only holds metadata describing the RDD. Calling the cache method on an RDD still does not give it any real data. Not until the first action operator triggers generation of the RDD's data does the cache operati ...

Posted on Sun, 05 May 2019 01:32:37 -0700 by techker
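
The lazy behaviour described in the Spark excerpt above (calling cache only marks the dataset, and the first action materializes it) can be mimicked in a few lines of plain Java. This is a toy model under stated assumptions, not Spark's API or implementation: LazyCacheSketch, computeCalls, and isMaterialized are hypothetical names, and a Supplier stands in for the RDD's lineage.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Toy model of a lazily evaluated dataset with an optional cache (not Spark).
class LazyCacheSketch<T> {
    private final Supplier<List<T>> compute; // describes how to build the data
    private boolean cacheRequested = false;
    private List<T> cached = null;
    int computeCalls = 0;                    // how often the data was generated

    LazyCacheSketch(Supplier<List<T>> compute) {
        this.compute = compute;
    }

    // Like cache(): only records the request; no data is produced here.
    LazyCacheSketch<T> cache() {
        cacheRequested = true;
        return this;
    }

    boolean isMaterialized() {
        return cached != null;
    }

    // Produce the data, reusing the stored copy when one exists.
    private List<T> data() {
        if (cached != null) {
            return cached;
        }
        computeCalls++;
        List<T> result = compute.get();
        if (cacheRequested) {
            cached = result; // the first action fills the cache
        }
        return result;
    }

    // An "action": forces the data to be generated.
    long count() {
        return data().size();
    }

    public static void main(String[] args) {
        LazyCacheSketch<Integer> ds =
                new LazyCacheSketch<Integer>(() -> Arrays.asList(1, 2, 3)).cache();
        System.out.println(ds.isMaterialized()); // false: cache() alone does nothing
        System.out.println(ds.count());          // 3: first action computes and stores
        System.out.println(ds.count());          // 3: served from the stored copy
        System.out.println(ds.computeCalls);     // 1
    }
}
```

Without the cache() call, each count() in this model would invoke the supplier again, which corresponds to recomputing a dataset for every action.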

Big Data Development Project-Telecom Project 2-Transmission Data

Contents: 1. Configuring Flume files; 2. Getting the data acquisition part working; 2.1 Start ZooKeeper and the cluster; 2.2 Start the Kafka cluster; 2.3 Start the Flume cluster; 2.4 Produce data; 3. Preparing the data consumption environment; 3.1 Add Maven configuration; 3.2 Add Maven configuration; 4. Data consumption tools; 4.1 PropertiesUti ...

Posted on Mon, 22 Apr 2019 18:06:34 -0700 by softnmedia