Hadoop ecosystem - MapReduce Job submission source code analysis
1. Debug environment preparation
1.1 Debug code: the classic MR introductory example, WordCount
1.1.1 Mapper class
public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        for (String word : value.toString().split("\\s+")) {    // split each input line into words
            context.write(new Text(word), new LongWritable(1)); // emit a (word, 1) pair per word
        }
    }
}
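Because the article traces the job-submission path, the natural companion to the mapper is a driver class; job.waitForCompletion(true) is where that submission path begins. The following is only a minimal sketch for debugging purposes: the class name, the input/output arguments, and the absence of a reducer are illustrative assumptions, not taken from the original article.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist yet)
        // Breakpoint here: waitForCompletion() -> submit() is the entry into the submission code path.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}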
Added by zuzupus on Sun, 06 Feb 2022 08:42:00 +0200
Spark Chasing-the-Wife series (value-type RDDs)
Today is the third day of the Lunar New Year. Hou sai lei (awesome)!
Small talk
These days I've been sending her a red envelope every night, a New Year's red envelope, sometimes with a sticker pack attached. Even though it's Chinese New Year, it doesn't feel very festive, and my throat hurts from eating melon seeds.
There are many operators in Spark, in ...
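The excerpt is cut off before the operators themselves, but the value-type operators referred to are transformations such as map, filter, and flatMap. A minimal sketch using Spark's Java API, assuming a local master and made-up sample data (neither comes from the original post):

import java.util.Arrays;
import java.util.List;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class ValueRddOps {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("value-rdd-ops").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> nums = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));
            JavaRDD<Integer> doubled = nums.map(x -> x * 2);          // map: transform every element
            List<Integer> big = doubled.filter(x -> x > 4).collect(); // filter: keep matching elements
            System.out.println(big);                                  // prints [6, 8, 10]
        }
    }
}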
Added by Rincewind on Thu, 03 Feb 2022 13:40:07 +0200
Installing Atlas for Apache-version big data components
A detailed record of installing Atlas 2.1.0 against Apache open-source big data components (test environment)
Note: this Atlas installation draws on a large number of online materials; the record is kept only for future reference. If anything in this article infringes, please contact me immediately.
Component version
Component name | Com ...
Added by Dark.Munk on Wed, 02 Feb 2022 06:26:47 +0200
Big data technology: Hadoop (introduction) overview, runtime environment setup, and operating modes
1 Hadoop overview
1.1 what is Hadoop
(1) Hadoop is a distributed system infrastructure developed by the Apache Foundation.
(2) It mainly solves the problems of massive data storage and massive data analysis and computation.
(3) In a broad sense, Hadoop usually refers to a broader concept: the Hadoop ecosystem.
1.2 Hadoop advantages
(1) High ...
Added by cuongvt on Tue, 01 Feb 2022 04:04:45 +0200
Hadoop - quick start
Any discussion of big data has to mention its most useful weapon: Hadoop. This article is the fastest way to get started with Hadoop, giving you a quick introduction and an intuitive feel for it; it can also serve as a quick index of the steps involved. It answers the following questions:
Understand what Hadoop is
What Hadoop is used for and how to use ...
Added by malcx on Tue, 01 Feb 2022 00:53:13 +0200
Hadoop-related issues
1. MapReduce job OOM
(1) If the task has not even started, an OOM error is reported directly.
AM log:
21/05/10 15:15:13 INFO mapreduce.Job: Task Id : attempt_1617064346277_101596_m_000000_1, Status : FAILED
Error: Java heap space
21/05/10 15:15:16 INFO mapreduce.Job: Task Id : attempt_1617064346277_101596_m_000000_2, Status : FAILED
Error: ...
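A common remedy for map-side "Java heap space" failures like the ones above is to raise the task container memory and the JVM heap together. Below is a hedged sketch of doing this from the driver via standard MapReduce properties; the 2048/4096 MB figures and the -Xmx values are illustrative choices, not values from the original post.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class OomTuningDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("mapreduce.map.memory.mb", "2048");         // container size for each map task (MB)
        conf.set("mapreduce.map.java.opts", "-Xmx1638m");    // map JVM heap, kept below the container limit
        conf.set("mapreduce.reduce.memory.mb", "4096");      // container size for each reduce task (MB)
        conf.set("mapreduce.reduce.java.opts", "-Xmx3276m"); // reduce JVM heap
        Job job = Job.getInstance(conf, "oom-tuning-example");
        // ... set mapper, reducer, input and output as usual, then submit.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}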
Added by dercof on Mon, 31 Jan 2022 18:08:20 +0200
hive partition notes
hive partition
1. Single-level partitions
A partition in Hive is a subdirectory. It is conceptually much like an input split on the map side: splitting is likewise meant to improve parallelism. Partitioning lays out the table's data separately, so when you query the table you include the partition information in the predicate and avoid scanning the whole table; it is an optimization scheme.
The pa ...
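Since each partition is just a subdirectory keyed on the partition column, a small example makes the pruning concrete. The sketch below is only illustrative: the access_log table, the dt partition column, and the HiveServer2 endpoint are assumptions rather than details from the note, and it goes through the standard Hive JDBC driver.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HivePartitionExample {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "hive", "");
             Statement stmt = conn.createStatement()) {
            // Every distinct dt value becomes its own subdirectory under the table's HDFS location.
            stmt.execute("CREATE TABLE IF NOT EXISTS access_log (line STRING) PARTITIONED BY (dt STRING)");
            // Filtering on the partition column prunes the scan to one subdirectory instead of the whole table.
            try (ResultSet rs = stmt.executeQuery("SELECT count(*) FROM access_log WHERE dt = '2022-01-29'")) {
                while (rs.next()) {
                    System.out.println(rs.getLong(1));
                }
            }
        }
    }
}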
Added by jeff21 on Sat, 29 Jan 2022 17:12:37 +0200
Some experience using Hadoop
Some experience using HDFS
Written up front:
I've been doing big data work at the company for a while now, so I'm taking the time to sort out the problems I've encountered and some of the better optimization approaches.
1. HDFS multi-directory storage
1.1 Production server disks
1.2 Configure multiple directories in the hdfs-site.xml file, and pay attention t ...
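The excerpt breaks off here, but the property configured for multiple storage directories is the DataNode data-directory list, dfs.datanode.data.dir, set in hdfs-site.xml as a comma-separated list with one entry per disk (the example paths below are my assumption). A minimal sketch that simply prints the effective value is a quick way to confirm the entry was picked up:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DataDirCheck {
    public static void main(String[] args) {
        Configuration conf = new HdfsConfiguration();          // loads hdfs-default.xml and hdfs-site.xml
        // Expected to print something like /data1/dfs/dn,/data2/dfs/dn (illustrative paths).
        String dataDirs = conf.get("dfs.datanode.data.dir");
        System.out.println("DataNode data directories: " + dataDirs);
    }
}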
Added by Soldier Jane on Fri, 28 Jan 2022 02:06:47 +0200
Importing and processing business data in the offline data warehouse
I don't know how well this will come out, but I'll do my best to explain it.
Data synchronization
The previous article covered using Sqoop to move data between MySQL and HDFS in both directions.
This is the second chapter on the offline data warehouse; it is about processing business data.
The basic business data of the offline data warehouse are stored in M ...
Added by kurtsu on Thu, 27 Jan 2022 19:00:14 +0200
Hadoop ecosystem - HDFS small file solution
Preface
Some of the content is excerpted from the training materials of Shang Silicon Valley, Dark Horse, and other providers.
1. Hadoop Archive
HDFS is not good at storing small files, because each file has at least one block, and the metadata of each block will occupy memory in the NameNode. If there are a large number of small files ...
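The excerpt stops here, but the section heading points at Hadoop Archive (HAR), which packs many small files into a single archive so the NameNode has far fewer objects to track. As a hedged sketch, the snippet below lists the contents of an archive assumed to already exist at /archives/demo.har (a hypothetical path, created beforehand with the hadoop archive tool); HAR is exposed through the ordinary FileSystem API via the har:// scheme.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HarListExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path har = new Path("har:///archives/demo.har");   // hypothetical archive location on the default HDFS
        FileSystem fs = har.getFileSystem(conf);           // resolves to the HAR filesystem for the har:// scheme
        for (FileStatus status : fs.listStatus(har)) {     // the archive behaves like a read-only directory
            System.out.println(status.getPath() + "  " + status.getLen());
        }
    }
}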
Added by dstantdog3 on Tue, 25 Jan 2022 10:12:58 +0200