Used car price forecast -- task 1 understanding and baseline

Preface: This is a beginner-level data analysis competition organized by the Tianchi data platform. The task is used car price prediction, and the data is provided by Tianchi. When I first saw the topic, my first reaction was to use linear regression. Of course, this is the simp ...

Added by justin.nethers on Tue, 08 Mar 2022 15:56:41 +0200

Illustrated big data | Hands-on case: Hive setup and application

Author: [Han Xinzi](https://github.com/HanXinzi-AI) @ [ShowMeAI](http://www.showmeai.tech/). [Tutorial address](http://www.showmeai.tech/tutorials/84): http://www.showmeai.tech/tutorials/84. [Address of this article](http://www.showmeai.tech/article-detail/171): http://www.showmeai.tech/article-detail/171. Notice: All Rights Reserved. Plea ...

Added by Rupo on Tue, 08 Mar 2022 12:35:20 +0200

Linux disk storage management LVM logical volumes

Basic concepts of LVM logical volumes: LVM logically combines different hard disks or partitions into a unified volume group. A volume group (VG) is equivalent to one large logical hard disk. A logical volume (LV) is equivalent to a partition and takes a certain amount of space from the volume group. The creation and management of logical volumes and hard disk partitions are si ...

Added by harkonenn on Tue, 08 Mar 2022 09:26:47 +0200

Build the Linux virtual machine cluster environment required for learning big data

There is a table of contents on the right side of the page that lets you jump to the section you want by title. If it is not on the right, look on the left. This article contains my study notes for Hadoop 3.1.x. Video resource address: https://www.bilibili.com/video/BV1Qp4y1n7EN?p=34&spm_id_from=pageDriver 1. Three virtual machines ...

Added by factoring2117 on Mon, 07 Mar 2022 20:39:41 +0200

01 Hadoop learning notes - Hadoop cluster construction

1 Introduction 1. What is it? Hadoop is open source software from Apache for big data storage and computing. 2. What can it do? a. Big data storage b. Distributed computing c. (Computer) resource scheduling 3. Features: high performance, low cost, high efficiency, and reliability. It is general-purpose and simple to use. 4. Versions 4.1 Open source communi ...

Added by j.bouwers on Mon, 07 Mar 2022 12:52:31 +0200

The 32nd day of learning big data - loops and dates

The 32nd day of learning big data - loops and dates. for loop. Format 1: for((i=1;i<=j;i++)) do loop body done Format 2: for i in {start..end} # two dots in the middle do loop body done Format 3: for i in $(seq end) do loop body done For ...

Added by fatfrank on Sun, 06 Mar 2022 16:39:34 +0200

WholeStageCodegenExec in Spark (whole-stage code generation)

Background: In a previous article, "Analysis and solution of DataSourceScanExec NullPointerException caused by Spark DPP", we skipped over the step where dynamic code generation failed. This time, let's analyze it; the SQL is the same as in that article. Analysis: After running the SQL, we can see the following physical plan: We can see ...

Added by sgoldenb on Sat, 05 Mar 2022 12:43:09 +0200

How to reallocate data after adding nodes to a Kafka cluster

This redistribution works by adding partitions on the specified nodes so that they share the load. For details of this scheme, see the link: "Pitfall avoidance guide: summary of schemes for rapid expansion of a Kafka cluster" (Java theory and practice, CSDN blog). Steps to add a node: Connect to the server of another node. After copying t ...

Added by rodneykm on Sat, 05 Mar 2022 09:02:03 +0200

Small file processing topics

Small file processing topics. I. MapReduce 1.1 Problems caused by small files: On HDFS, each file occupies 150 bytes of memory on the NameNode. If there are too many small files, they occupy a lot of NameNode memory, and metadata lookups become very slow. During MapReduce processing, each small file should be started ...

Added by zeus1 on Sat, 05 Mar 2022 06:27:32 +0200

Flink_08_SQL (personal summary)

Statement: 1. *** 2. Because this is a personal summary, the article is written as concisely as possible. 3. If there are any mistakes or inaccuracies, please point them out. FlinkSQL & ...

Added by adrianl on Sat, 05 Mar 2022 04:36:46 +0200