Hadoop + Spark big data analysis: Hadoop cluster construction
Article catalogue
Preface
1, Download and configuration of the cluster environment
1. Download Hadoop
2. Configure Hadoop environment variables
Configure the Hadoop core environment
Configure core-site.xml
Configure hdfs-site.xml
Configure mapred-site.xml
Configure yarn-site.xml
Configure workers
Disable the firewall
2, Clone ...
Added by jonniejoejonson on Tue, 08 Feb 2022 05:25:06 +0200
CDH 6.1: Upgrading Impala to version 3.4 to enable automatic metadata refresh, and solutions to the problems encountered
On CDH 6.1, we upgraded Impala and enabled the automatic metadata refresh function. We ran into some problems along the way and eventually solved them by checking the logs, reading the source code, searching Google, and so on. This article sorts them out as a way of giving back to the community.
The main reference ...
Added by gwydionwaters on Tue, 08 Feb 2022 02:43:35 +0200
Crawling the CSDN hot list with 1 line of code: Python, ha beer style
Eraser, a funny senior Internet bug
Project background
Group friend: Sister Eraser, what is the minimum number of lines of code needed to crawl the CSDN hot list data?
Sister Eraser: About 10, I would estimate.
Group friend: Oh baby, show me your code!
That is where this project's requirement comes from: crawl the CSDN hot list with as few lines of code as possible.
The import module ...
Added by J@ystick_FI on Mon, 07 Feb 2022 10:09:55 +0200
Flink deduplication schemes
Flink deduplication
Deduplicated counting is a common metric calculation in data analysis, for example the number of users visiting a website in a day or the number of users clicking on an advertisement. Offline calculation is a full, one-time calculation process, and the deduplicated results can usually be obtained by dist ...
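As a minimal sketch of the offline, full, one-time case described above, an exact deduplicated count can be computed in plain Python with one set per day. This is an illustration only, not the article's Flink code; the `events` data and the `daily_unique_visitors` helper are hypothetical names introduced here.

```python
from collections import defaultdict


def daily_unique_visitors(events):
    """Count distinct user ids per day in a single full pass over the data."""
    seen = defaultdict(set)  # day -> set of user ids already counted
    for day, user_id in events:
        seen[day].add(user_id)  # adding an existing id is a no-op
    return {day: len(users) for day, users in seen.items()}


# Hypothetical visit log: (day, user_id) pairs, with one repeat visit.
events = [
    ("2022-02-07", "u1"),
    ("2022-02-07", "u2"),
    ("2022-02-07", "u1"),
    ("2022-02-08", "u1"),
]
result = daily_unique_visitors(events)
print(result)
```

The set keeps every id seen so far, which is fine for a bounded batch but is exactly the state that grows without limit in a streaming job.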
Added by cowboy_x on Mon, 07 Feb 2022 05:46:01 +0200
ES introductory study notes
Introduction:
ES is a distributed, document-oriented non-relational database (a document is similar to a single record in a relational database). By default, every field of a document is indexed, so the data in every field can be searched. It can scale horizontally to hundreds of servers to store and process PB-level data. ES is based on ...
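To make "every field is indexed and searchable" concrete, here is a toy sketch in plain Python. It is not how ES is implemented; the documents, field names, and the `build_field_index` and `search` helpers are hypothetical and exist only for illustration.

```python
from collections import defaultdict


def build_field_index(docs):
    """Map (field, lowercase token) to the set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for field, value in doc.items():
            for token in str(value).lower().split():
                index[(field, token)].add(doc_id)
    return index


def search(index, field, term):
    """Return the sorted ids of documents whose `field` contains `term`."""
    return sorted(index.get((field, term.lower()), set()))


# Hypothetical documents keyed by id.
docs = {
    1: {"title": "Flink deduplication", "author": "alice"},
    2: {"title": "Spark learning notes", "author": "bob"},
    3: {"title": "Flink real-time warehouse", "author": "bob"},
}
index = build_field_index(docs)
print(search(index, "title", "flink"))  # [1, 3]
print(search(index, "author", "bob"))   # [2, 3]
```

Because the index key includes the field name, a term is only matched in the field that was asked for.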
Added by daloss on Mon, 07 Feb 2022 03:25:07 +0200
Spark learning notes [1]: Scala environment installation and basic syntax
As the saying goes, to do a good job you must first sharpen your tools. Spark's development language is not Java but Scala. Although both run on the JVM, the basic characteristics of the two languages differ somewhat. Here is a ...
Added by GateGuardian on Sun, 06 Feb 2022 08:36:12 +0200
Apache Hudi source code analysis - Z-order layout optimization
This article aims to gradually become familiar with the overall architecture of Hudi by following one specific feature, and will not discuss the implementation details of the algorithm.
I am a Hudi newcomer; if you find any problems, please correct them.
Spark: version 3.1.2
Hudi: branch master
Time: 2022/02/06, first edition
Objective: to r ...
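For readers unfamiliar with the term, the general idea of a Z-order (Morton) value can be sketched as interleaving the bits of two coordinates, so that sorting by the result keeps points that are close in both dimensions near each other. This is a generic textbook illustration, not Hudi's implementation; the `z_order` function and its `bits` parameter are hypothetical names.

```python
def z_order(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a single integer.

    Bit i of x goes to position 2*i, bit i of y goes to position 2*i + 1.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting a 2x2 grid by Z-value visits the cells in a "Z" shape.
points = [(1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(points, key=lambda p: z_order(p[0], p[1]))
print(ordered)  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```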
Added by blakey on Sun, 06 Feb 2022 06:26:03 +0200
Big data project: Flink real-time data warehouse (DWM layer)
Design ideas
Previously, we split the data into independent Kafka topics through stream splitting and other processing. Next, when processing the data, we need to consider the metrics used in real-time calculation. Timeliness is the goal of a real-time data warehouse. Therefore, in some scenarios, it is not necessary to have a ...
Added by SteveMellor on Thu, 03 Feb 2022 21:34:05 +0200
Spark Chasing-Wife Series (Value-type RDDs)
Today is the third day of the lunar new year. Monkey Sai Lei
Small talk
These days I send her a red envelope every night, a New Year's red envelope, sometimes with a sticker added. The Chinese New Year is nice, but it does not feel very festive. My throat hurts from eating melon seeds.
There are many operators in Spark, in ...
Added by Rincewind on Thu, 03 Feb 2022 13:40:07 +0200
Getting started with elastic_search
Basic concepts
An index is similar to a table in a traditional relational database. It is a place to store related documents.
Document type [removed after version 7.0]
Document (doc)
A doc represents one piece of data in the index, like a record in a database table. A doc stores its data in JSON format.
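As a small illustration of a doc stored as JSON, the sketch below serializes a record to JSON text and reads it back using Python's standard `json` module. The field names and values are hypothetical and do not come from the original notes.

```python
import json

# Hypothetical document: one "record", with a nested list field that a
# flat relational row would not hold directly.
doc = {
    "title": "Getting started",
    "views": 42,
    "tags": ["es", "notes"],
}

serialized = json.dumps(doc, sort_keys=True)  # the JSON text form of the doc
restored = json.loads(serialized)             # back to a Python dict

print(serialized)
print(restored == doc)  # True
```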
ES architecture design
Simple defini ...
Added by abie10 on Thu, 03 Feb 2022 05:10:31 +0200