Introduction to Spark: deployment, principles, and development environment setup
Introduction to Spark
Spark is a fast, general-purpose, scalable, in-memory engine for big data analysis and computation.
It is a general-purpose in-memory parallel computing framework developed by the AMP Laboratory (Algorithms, Machines, and People Lab) at the U ...
Added by benreisner on Mon, 03 Jan 2022 22:14:19 +0200
Hadoop distributed file system (HDFS)
Brief introduction
HDFS (Hadoop Distributed File System) is a core component of Hadoop and a distributed storage service.
Distributed file systems can span multiple computers. They have broad application prospects in the era of big data. They provide the required scalability for storing and processing s ...
Added by ajaybuilder on Mon, 03 Jan 2022 16:43:34 +0200
58. Build a Hadoop HA (high availability) cluster on Ubuntu (from scratch)
Environment preparation
number | host name | type        | user | IP
1      | master    | Master node | root | 192.168.231.247
2      | slave1    | Slave node  | root | 192.168.231.248
3      | slave2    | Slave node  | root | 192.168.231.249
Environment setup
1, Basic configuration
1. Install VMware tools
Copy it to the desktop
Note: Press 'Enter' when prompted, and enter ye ...
Added by jesirose on Mon, 03 Jan 2022 01:13:15 +0200
2, Build a Hadoop cluster
1, Create a template machine
1.1. Modify the IP settings in the configuration file
vim /etc/sysconfig/network-scripts/ifcfg-ens33
# Modify the following settings:
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.150.211
NETMASK=255.255.255.0
GATEWAY=192.168.150.2
DNS1=192.168.150.2
1.2 Change the hostname to hadoop01
vim /etc/hostname
1.3 Restart network servic ...
Added by SoccerGloves on Fri, 31 Dec 2021 05:15:31 +0200
6 - click stream data analysis project - log collection to HDFS
Reference: https://blog.csdn.net/tianjun2012/article/details/62424486
The previous section introduced the basics of logs, so they are not explained in detail here. Only the basic methods for generating and collecting logs are provided.
...
Added by ron8000 on Thu, 30 Dec 2021 07:23:26 +0200
Hive tuning ideas - knowledge summary
Hive tuning:
Choosing an appropriate "storage format" and "compression method" for the data being analyzed can improve Hive's analysis efficiency.
Data compression format:
When selecting a compression algorithm, you need to consider whether it is splittable. If splitting is not supported (the integrity of a pi ...
Added by ZHarvey on Thu, 30 Dec 2021 02:06:19 +0200
4 - website log analysis cases - log data statistical analysis
1, Environment preparation and data import
1. Start Hadoop
If it is enabled in a virtual environment such as lsn, you need to perform formatting first
hadoop namenode -format
Start Hadoop
start-dfs.sh
start-yarn.sh
Check that the daemons have started
jps
2. Import data
Upload ...
Added by D_tunisia on Wed, 29 Dec 2021 17:51:55 +0200
[software engineering practice] Hive research - Blog13
2021SC@SDUSC
Research content introduction
I am responsible for converting the query block (QB) into a logical query plan (OP Tree). The following code is from apache-hive-3.1.2-src/ql/src/java/org/apache/hadoop/hive/ql/plan, which is the code I am analyzing. In Blog9-12, ...
Added by Tryfan on Wed, 29 Dec 2021 13:51:26 +0200
009 Optimization & new features & HA
1, Hadoop data compression
Compression algorithm | Original file size | Compressed file size | Compression speed | Decompression speed | Built into Hadoop | Splittable | Program changes required
gzip                  | 8.3GB              | 1.8GB                | 17.5MB/s          | 58MB/s              | yes               | no         | no
bzip2                 | 8.3GB              | 1.1GB                | 2.4MB/s           | 9.5MB/s             | yes               | yes        | no
LZO                   | 8.3GB              | 2.9GB                | 49.3MB/s          | 74.6MB/s            | no                | yes        | yes
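The figures in the table can be compared with a little arithmetic. The following is a minimal Python sketch, not part of the original post: the dictionary layout and the `compression_ratio` helper are hypothetical names chosen for illustration, and the numbers are copied from the table as-is.

```python
# Figures copied from the table above (sizes in GB, speeds in MB/s).
# The structure and names here are illustrative assumptions.
codecs = {
    "gzip":  {"original_gb": 8.3, "compressed_gb": 1.8, "compress_mb_s": 17.5},
    "bzip2": {"original_gb": 8.3, "compressed_gb": 1.1, "compress_mb_s": 2.4},
    "LZO":   {"original_gb": 8.3, "compressed_gb": 2.9, "compress_mb_s": 49.3},
}


def compression_ratio(original_gb, compressed_gb):
    """Compressed size as a fraction of the original size (smaller is better)."""
    return compressed_gb / original_gb


ratios = {
    name: compression_ratio(row["original_gb"], row["compressed_gb"])
    for name, row in codecs.items()
}

# Codec with the smallest output, and codec with the highest compression speed.
best_ratio = min(ratios, key=ratios.get)
fastest = max(codecs, key=lambda name: codecs[name]["compress_mb_s"])

for name, ratio in ratios.items():
    print(f"{name}: compressed to {ratio:.1%} of the original size")
print("smallest output:", best_ratio)
print("fastest compression:", fastest)
```

On these figures the codec with the smallest output is also the slowest to compress, and the fastest codec produces the largest output.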
Input compression: (Hadoop uses the file extension to determine whether ...
Added by prbrowne on Mon, 27 Dec 2021 20:14:25 +0200
Hadoop data compression
1, Overview
1) Advantages and disadvantages of compression
Advantages of compression: reduces disk I/O and disk storage space. Disadvantage of compression: increases CPU overhead.
2) Compression principle
(1) Compute-intensive jobs: use compression sparingly.
(2) I/O-intensive jobs: use compression more.
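The trade-off between saved storage and extra CPU time can be observed outside Hadoop as well. The following is a minimal sketch using only Python's standard `gzip` and `bz2` modules; it does not use Hadoop's codec classes, and the `measure` helper and the sample data are hypothetical, made up for illustration.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data):
    """Compress and decompress `data`, recording sizes and CPU time used."""
    start = time.process_time()
    packed = compress(data)
    compress_cpu = time.process_time() - start

    start = time.process_time()
    restored = decompress(packed)
    decompress_cpu = time.process_time() - start

    if restored != data:
        raise ValueError(f"{name}: round trip did not restore the input")
    return {
        "name": name,
        "original_bytes": len(data),
        "compressed_bytes": len(packed),
        "compress_cpu_s": compress_cpu,
        "decompress_cpu_s": decompress_cpu,
    }


# Hypothetical sample: repetitive, log-like text (about 2.4 MB).
sample = b"2021-12-27 09:56:33 INFO task finished in 42 ms\n" * 50_000

results = [
    measure("gzip", gzip.compress, gzip.decompress, sample),
    measure("bz2", bz2.compress, bz2.decompress, sample),
]

for r in results:
    print(
        f'{r["name"]}: {r["original_bytes"]} -> {r["compressed_bytes"]} bytes, '
        f'compress CPU {r["compress_cpu_s"]:.3f}s, '
        f'decompress CPU {r["decompress_cpu_s"]:.3f}s'
    )
```

Fewer bytes mean less disk and network I/O, while the CPU seconds are the price paid for it, which is why the balance differs between compute-intensive and I/O-intensive jobs. Actual numbers depend on the data and the machine.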
2, Compression codecs supported by MR
1 ...
Added by madhukar_garg on Mon, 27 Dec 2021 09:56:33 +0200