Data Processing Delay (GC pool) Caused by HBase GC

While tuning processing time, we added timestamps to the program and found that some batch HBase queries took more than 8 s, which is unacceptable. HBase's logs reveal the reason: they contain entries mentioning 'GC pool'. It is speculated th ...

Added by muadzir on Fri, 04 Oct 2019 22:23:57 +0300

Intelligent Mail Marketing Using Markov Model in MapReduce

Intelligent Mail Marketing Using Markov Model in MapReduce (II). In this post, we use the MapReduce computing framework to generate the following output for each customer ID: customerID (Date_1, Amount_1); (Date_2, Amount_2); ...; (Date_N, Amount_N ...

Added by cristal777 on Tue, 01 Oct 2019 12:12:56 +0300

Hadoop Big Data: Combiner/serialization/sorting in MapReduce

Combiner in MapReduce: (1) The combiner is a component of MR programs in addition to the Mapper and the Reducer. (2) The parent class of combiner components is Reducer. (3) The difference between a combiner and a reducer lies in where they run: the combiner runs on every map task node, while the reducer receives the output of a ...

Added by northcave on Mon, 30 Sep 2019 23:29:29 +0300
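
To make the combiner entry above concrete, here is a minimal sketch in plain Python (not Hadoop code). It assumes a word-count job, and the names map_words, sum_counts, and run_job are hypothetical helpers invented for this illustration. The same aggregation function serves as both combiner and reducer, mirroring the point that a combiner's parent class is Reducer; the only difference is that the combiner runs on each map task's local output.

```python
from collections import defaultdict


def map_words(line):
    """Map step: emit (word, 1) for every word in one input line."""
    return [(word, 1) for word in line.split()]


def sum_counts(pairs):
    """Reducer-style aggregation: sum the values per key.

    Used both as the combiner (per map task) and as the final reducer.
    """
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def run_job(splits, use_combiner):
    """Simulate a job; each element of `splits` is one map task's input lines.

    Returns the final counts and the number of records sent to the reducer.
    """
    shuffled = []
    for lines in splits:
        task_output = [pair for line in lines for pair in map_words(line)]
        if use_combiner:
            # Local aggregation on the map side, before anything is shuffled.
            task_output = sum_counts(task_output)
        shuffled.extend(task_output)
    return sum_counts(shuffled), len(shuffled)


# Two hypothetical map tasks with made-up input lines.
splits = [["a b a", "a c"], ["b b c"]]
result_plain, shuffled_plain = run_job(splits, use_combiner=False)
result_combined, shuffled_combined = run_job(splits, use_combiner=True)
print(result_combined, shuffled_plain, shuffled_combined)
```

In this toy run the final counts are identical with and without the combiner, but fewer records reach the reducer when each map task pre-aggregates its own output.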

K-Mean Clustering of MapReduce (End)

K-Mean Clustering of MapReduce (End). In the previous post, K-Mean Clustering of MapReduce (I), the basic principle of the K-means clustering algorithm was introduced, followed by how to implement the K-means clustering algorithm with MapReduce. MapReduce solution: The MapReduce solu ...

Added by ak_mypayday on Sat, 21 Sep 2019 11:03:09 +0300

Hadoop Series: Building Hadoop High Availability Cluster Based on ZooKeeper

Introduction to High Availability: Hadoop high availability can be divided into HDFS high availability and YARN high availability. Their implementations are broadly similar, but the HDFS NameNode has much stricter data storage and consistency requirements than the YARN ResourceManager, so its implementation is more complex. So let's explain the following: 1.1 ...

Added by enterume on Tue, 17 Sep 2019 16:53:49 +0300

Big Data Series: Learning HDFS

1. HDFS (Distributed File System) 1.1 Distributed File System: When the size of a data set exceeds the storage capacity of a single computer, the data set must be stored across multiple machines on the network. A file system compo ...

Added by HuggieBear on Thu, 12 Sep 2019 05:30:13 +0300

Installation and Configuration of Hive

To explore what Hive can do, we set out to learn it. Leaving the tool's strengths and weaknesses aside for now, let's install Hive first... We use MySQL to store Hive's metadata (the Metastore), so install MySQL first. The specific ...

Added by alpachino on Sat, 07 Sep 2019 15:05:02 +0300

Implementing MapReduce Complex Case in IDEA

Let's implement a more complex case: finding the common friends of every pair of users. A:B,C,D,F,E,O B:A,C,E,K C:F,A,D,I D:A,E,F,L E:B,C,D,M,L F:A,B,C,D,E,O,M G:A,C,D,E,F H:A,C,D,E,O I:A,O J:B,O K:A,C,D L:D,E,F M:E,F,G O:A,H,I,J /* The map function in the first st ...

Added by asgsoft on Tue, 03 Sep 2019 16:38:46 +0300
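
The friend-list case in the entry above can be sketched in plain Python (not Hadoop code). This is an illustration under two assumptions: that the task is to find the friends shared by each pair of users, and that it is solved in two map/reduce-style stages, which the truncated excerpt only hints at. The names stage_one and stage_two are hypothetical; the input is the data quoted in the excerpt.

```python
from collections import defaultdict
from itertools import combinations

# Friend lists exactly as quoted in the excerpt: "user:friend,friend,..."
FRIEND_LISTS = (
    "A:B,C,D,F,E,O B:A,C,E,K C:F,A,D,I D:A,E,F,L E:B,C,D,M,L "
    "F:A,B,C,D,E,O,M G:A,C,D,E,F H:A,C,D,E,O I:A,O J:B,O "
    "K:A,C,D L:D,E,F M:E,F,G O:A,H,I,J"
)


def stage_one(records):
    """Stage 1: for each friend, collect the users who list that friend."""
    owners = defaultdict(list)
    for record in records:
        user, friends = record.split(":")
        for friend in friends.split(","):  # map: emit (friend, user)
            owners[friend].append(user)    # reduce: group users by friend
    return owners


def stage_two(owners):
    """Stage 2: any two users who list the same friend share that friend."""
    common = defaultdict(list)
    for friend, users in owners.items():
        for pair in combinations(sorted(users), 2):  # map: emit (pair, friend)
            common[pair].append(friend)              # reduce: group friends by pair
    return {pair: sorted(friends) for pair, friends in common.items()}


common_friends = stage_two(stage_one(FRIEND_LISTS.split()))
print(common_friends[("A", "B")])  # friends listed by both A and B
```

Sorting the users before pairing keeps each pair key in a single canonical order, so ("A", "B") and ("B", "A") never end up as separate keys.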

Construction of High Availability HA

Configure High Availability: 1. Install ZooKeeper. 2. Edit zoo.cfg in the conf folder under the ZooKeeper installation directory. If it does not exist, copy zoo_ (add the IP addresses of the three machines, create a directory, create myid under the directory, ...

Added by EviL_CodE on Thu, 29 Aug 2019 15:35:47 +0300

Flume Learning - Including Installation

1. What is Flume: Flume is a highly available, highly reliable, distributed system, provided by Cloudera, for collecting, aggregating, and transferring massive volumes of log data. Flume is based on a streaming architecture and is flexible and simple. Flume Composition A ...

Added by SeenGee on Wed, 28 Aug 2019 12:43:54 +0300