Used car price forecast -- task 1 understanding and baseline
Preface
This is a beginner-level data analysis competition organized by the Tianchi data platform. The task is used car price prediction, and the data is provided by Tianchi. When I first saw the problem, my first reaction was to use linear regression. Of course, this is the simp ...
Added by justin.nethers on Tue, 08 Mar 2022 15:56:41 +0200
Illustrated big data | Hands-on case: Hive setup and application examples
Author: [Han Xinzi](https://github.com/HanXinzi-AI) @ [ShowMeAI](http://www.showmeai.tech/)
[Tutorial address](http://www.showmeai.tech/tutorials/84): http://www.showmeai.tech/tutorials/84
[Address of this article](http://www.showmeai.tech/article-detail/171): http://www.showmeai.tech/article-detail/171
Notice: All Rights Reserved. Plea ...
Added by Rupo on Tue, 08 Mar 2022 12:35:20 +0200
Linux disk storage management: LVM logical volumes
Basic concepts of LVM logical volumes
Different hard disks or partitions are logically added to a unified volume group.
The VG (volume group) is equivalent to one large logical hard disk.
An LV (logical volume) is equivalent to a partition; it takes a certain amount of space out of the volume group.
The creation and management of logical volumes and hard disk partitions are si ...
Added by harkonenn on Tue, 08 Mar 2022 09:26:47 +0200
Build a virtual machine Linux cluster environment required for learning big data
On the right side of the page there is a table of contents that lets you jump to the section you want by title. If it is not on the right, look on the left.
This article contains my study notes for Hadoop 3.1.x. Video resource address: https://www.bilibili.com/video/BV1Qp4y1n7EN?p=34&spm_id_from=pageDriver
1. Three virtual machines ...
Added by factoring2117 on Mon, 07 Mar 2022 20:39:41 +0200
01 Hadoop learning notes - Hadoop cluster construction
1 Introduction
1. What is it? Hadoop is open source software from Apache for big data storage and computing.
2. What can it do? a. big data storage b. distributed computing c. (computer) resource scheduling
3. Features
High performance, low cost, high efficiency, and reliability. It is general-purpose and simple to use.
4. Version 4.1 open source communi ...
Added by j.bouwers on Mon, 07 Mar 2022 12:52:31 +0200
Day 32 of learning big data - loops and dates
for loop
Format 1:
for ((i=1; i<=j; i++))
do
    loop body
done
Format 2:
for i in {start..end}  # two dots between the start and end values
do
    loop body
done
Format 3:
for i in $(seq end)
do
    loop body
done
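The three formats above can be sketched as one runnable script. This is a minimal illustration, assuming bash (format 1 and brace expansion are not part of POSIX sh) and that the `seq` utility is installed; the upper bound 3 and the variable names `out1`, `out2`, and `out3` are made up for the example.

```shell
#!/usr/bin/env bash

# Format 1: C-style loop, counting from 1 up to j.
j=3
out1=""
for ((i = 1; i <= j; i++)); do
    out1+="$i "
done

# Format 2: brace expansion with literal start and end values.
out2=""
for i in {1..3}; do
    out2+="$i "
done

# Format 3: seq generates the numbers from 1 to the given end value.
out3=""
for i in $(seq 3); do
    out3+="$i "
done

# All three loops visit the same values: 1, 2, 3.
printf '%s\n' "$out1" "$out2" "$out3"
```

In bash, brace expansion happens before variable expansion, so `{1..$j}` does not produce a range. When the end value is held in a variable, format 1 or format 3 is the safer choice.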
For ...
Added by fatfrank on Sun, 06 Mar 2022 16:39:34 +0200
WholeStageCodegenExec in Spark (whole-stage code generation)
Background
In a previous article, "Analysis and solution of DataSourceScanExec NullPointerException caused by Spark DPP", we skipped over the step where dynamic code generation failed. This time, let's analyze it. The SQL is still the one from the article mentioned above.
Analysis
After running the SQL, we can see the following physical plan: We can see ...
Added by sgoldenb on Sat, 05 Mar 2022 12:43:09 +0200
How to reallocate data after adding nodes in Kafka cluster
This redistribution is implemented by adding partitions to the specified nodes to share the load. For details of this scheme, please refer to this link: "Pitfall avoidance guide: summary of schemes for rapid expansion of a Kafka cluster" (Java theory and practice, CSDN blog)
Steps to add a node
Connect to the server of another node. After copying t ...
Added by rodneykm on Sat, 05 Mar 2022 09:02:03 +0200
Small file processing topics
I. MapReduce
1.1 Problems caused by small files
On HDFS, each file occupies 150 bytes of memory on the NameNode. If there are too many small files, they occupy a lot of NameNode memory and metadata lookups become very slow. When MapReduce processes the data, each small file should be started ...
Added by zeus1 on Sat, 05 Mar 2022 06:27:32 +0200
Flink_08_SQL (personal summary)
Statement: 1. *** 2. Because this is a personal summary, the article is written as concisely as possible. 3. If anything is wrong or inappropriate, please point it out.
FlinkSQL & ...
Added by adrianl on Sat, 05 Mar 2022 04:36:46 +0200