Spark introduction and Spark deployment, principles, and development environment setup

Introduction to Spark: Spark is a fast, general-purpose, and scalable in-memory engine for big data analysis and computation. It is a general in-memory parallel computing framework developed by the AMP Lab (Algorithms, Machines, and People Lab) at the U ...

Added by benreisner on Mon, 03 Jan 2022 22:14:19 +0200

Hadoop Distributed File System (HDFS)

Brief introduction: HDFS (Hadoop Distributed File System) is a core component of Hadoop and provides distributed storage. Distributed file systems can span multiple machines and are widely applicable in the era of big data. They provide the scalability required for storing and processing s ...

Added by ajaybuilder on Mon, 03 Jan 2022 16:43:34 +0200

58. Building Hadoop HA (high availability) on Ubuntu (from scratch)

Environment preparation
No.  Host name  Type         User  IP
1    master     Master node  root  192.168.231.247
2    slave1     Slave node   root  192.168.231.248
3    slave2     Slave node   root  192.168.231.249
Environment setup
1. Basic configuration
(1) Install VMware Tools: copy it to the desktop. Note: press "Enter" when prompted, and enter ye ...

Added by jesirose on Mon, 03 Jan 2022 01:13:15 +0200

2. Build a Hadoop cluster

1. Create the template machine
1.1 Modify the IP settings in the configuration file:
vim /etc/sysconfig/network-scripts/ifcfg-ens33
# Modify:
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.150.211
NETMASK=255.255.255.0
GATEWAY=192.168.150.2
DNS1=192.168.150.2
1.2 Change the host name to hadoop01:
vim /etc/hostname
1.3 Restart the network servic ...

Added by SoccerGloves on Fri, 31 Dec 2021 05:15:31 +0200
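
The static-IP settings from the ifcfg-ens33 step in the cluster-build excerpt above can be sanity-checked before the network service is restarted. A minimal Python sketch, assuming the file contains plain KEY=value lines; `parse_ifcfg` and `gateway_in_subnet` are hypothetical helper names, not part of any CentOS tool. It uses only the standard `ipaddress` module to confirm that GATEWAY lies inside the subnet defined by IPADDR and NETMASK.

```python
import ipaddress

# Settings copied from the ifcfg-ens33 excerpt (hadoop01 template machine).
IFCFG_TEXT = """\
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.150.211
NETMASK=255.255.255.0
GATEWAY=192.168.150.2
DNS1=192.168.150.2
"""


def parse_ifcfg(text):
    """Hypothetical helper: parse KEY=value lines, skipping blanks and # comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        # ifcfg values may be quoted, e.g. IPADDR="192.168.150.211"
        settings[key.strip()] = value.strip().strip('"')
    return settings


def gateway_in_subnet(settings):
    """Hypothetical helper: True if GATEWAY is inside the IPADDR/NETMASK subnet."""
    iface = ipaddress.IPv4Interface(f"{settings['IPADDR']}/{settings['NETMASK']}")
    return ipaddress.IPv4Address(settings["GATEWAY"]) in iface.network


if __name__ == "__main__":
    cfg = parse_ifcfg(IFCFG_TEXT)
    iface = ipaddress.IPv4Interface(f"{cfg['IPADDR']}/{cfg['NETMASK']}")
    print("network:", iface.network)
    print("gateway ok:", gateway_in_subnet(cfg))
```

A gateway outside the computed network (for example 192.168.151.2 with this netmask) would make `gateway_in_subnet` return False, which is usually a sign of a typo in the file.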

6 - Clickstream data analysis project - log collection to HDFS

Reference: https://blog.csdn.net/tianjun2012/article/details/62424486. The basics of logs were introduced in the previous section and are not repeated here; only the basic methods for generating and collecting logs are given. ...

Added by ron8000 on Thu, 30 Dec 2021 07:23:26 +0200

Hive tuning ideas - knowledge summary

Hive tuning: choosing an appropriate storage format and compression method for the data being analyzed can improve Hive's analysis efficiency. Data compression format: when selecting a compression algorithm, you need to consider whether the compressed file is splittable. If splitting is not supported (the integrity of a pi ...

Added by ZHarvey on Thu, 30 Dec 2021 02:06:19 +0200
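
Why splittability matters can be illustrated by counting how many input splits (and hence map tasks) a single compressed file yields. A simplified Python model, assuming a 128 MB block size and mimicking the 10% slop rule of Hadoop's FileInputFormat; `count_splits` is a hypothetical name, and real jobs are also influenced by settings such as the minimum and maximum split size. The file sizes reuse the gzip (1.8 GB) and bzip2 (1.1 GB) outputs quoted in the compression table further down this list.

```python
MB = 1024 ** 2
GB = 1024 ** 3
SPLIT_SLOP = 1.1  # keep cutting splits only while more than 1.1 blocks remain


def count_splits(file_size, block_size=128 * MB, splittable=True):
    """Simplified model of FileInputFormat split planning for one file."""
    if file_size == 0 or not splittable:
        # A non-splittable file (e.g. gzip) is read whole by a single map task.
        return 1
    splits = 0
    remaining = file_size
    while remaining / block_size > SPLIT_SLOP:
        splits += 1
        remaining -= block_size
    if remaining > 0:
        splits += 1  # the tail (at most 1.1 blocks) becomes the last split
    return splits


if __name__ == "__main__":
    gzip_file = int(1.8 * GB)   # gzip output size from the table
    bzip2_file = int(1.1 * GB)  # bzip2 output size from the table
    print("gzip  (not splittable):", count_splits(gzip_file, splittable=False))
    print("bzip2 (splittable):    ", count_splits(bzip2_file, splittable=True))
```

Under these assumptions the 1.8 GB gzip file is processed by one map task, while the 1.1 GB bzip2 file can be spread over nine, which is the parallelism that a non-splittable format gives up.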

4 - Website log analysis case - statistical analysis of log data

1. Environment preparation and data import
1) Start Hadoop
If it is started in a virtual environment such as lsn, format the NameNode first:
hadoop namenode -format
Start Hadoop:
start-dfs.sh
start-yarn.sh
Check whether it has started:
jps
2) Import data
Upload ...

Added by D_tunisia on Wed, 29 Dec 2021 17:51:55 +0200

[software engineering practice] Hive research - Blog13

2021SC@SDUSC Research content: I am responsible for converting the query block (QB) into a logical query plan (OP Tree). The following code comes from apache-hive-3.1.2-src/ql/src/java/org/apache/hadoop/hive/ql/plan and is the code I analyze. In Blog9-12, ...

Added by Tryfan on Wed, 29 Dec 2021 13:51:26 +0200

009 Optimization & new features & HA

1. Hadoop data compression
Algorithm  Original size  Compressed size  Compression speed  Decompression speed  Built into Hadoop  Splittable  Program changes needed
gzip       8.3 GB         1.8 GB           17.5 MB/s          58 MB/s              yes                no          no
bzip2      8.3 GB         1.1 GB           2.4 MB/s           9.5 MB/s             yes                yes         no
LZO        8.3 GB         2.9 GB           49.3 MB/s          74.6 MB/s            no                 yes         yes
Input compression: (Hadoop uses the file extension to determine whether ...

Added by prbrowne on Mon, 27 Dec 2021 20:14:25 +0200
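
The figures in the compression table of the optimization post above can be turned into compression ratios and rough timing estimates. A small Python sketch; it assumes 1 GB = 1024 MB and that the quoted compression speed refers to uncompressed input bytes per second, neither of which the table states. The numbers themselves are copied from the table, and `summarize` is a hypothetical helper.

```python
# Figures copied from the compression table:
# (codec, original GB, compressed GB, compression speed in MB/s)
CODECS = [
    ("gzip", 8.3, 1.8, 17.5),
    ("bzip2", 8.3, 1.1, 2.4),
    ("LZO", 8.3, 2.9, 49.3),
]
MB_PER_GB = 1024  # assumption: binary units


def summarize(name, original_gb, compressed_gb, compress_mb_s):
    """Return (codec, compressed/original ratio, estimated seconds to compress)."""
    ratio = compressed_gb / original_gb
    # Assumption: the speed is measured against the uncompressed input.
    seconds = original_gb * MB_PER_GB / compress_mb_s
    return name, ratio, seconds


if __name__ == "__main__":
    for row in CODECS:
        name, ratio, seconds = summarize(*row)
        print(f"{name:6s} compressed size: {ratio:5.1%} of original, "
              f"compress time: ~{seconds:.0f} s")
```

The output makes the trade-off in the table concrete: bzip2 gives the smallest files but is by far the slowest, while LZO compresses fastest at the cost of the largest output.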

Hadoop data compression

1. Overview
1) Advantages and disadvantages of compression
Advantage: reduces disk I/O and disk storage space.
Disadvantage: increases CPU overhead.
2) Compression principles
(1) For compute-intensive jobs, use compression sparingly.
(2) For I/O-intensive jobs, use compression more.
2. Compression codecs supported by MR
1 ...

Added by madhukar_garg on Mon, 27 Dec 2021 09:56:33 +0200
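
The trade-off described in the compression overview above (less disk I/O and storage, more CPU) can be observed on a small scale with Python's standard `gzip` module. This is only an illustration on synthetic, highly repetitive data, not a Hadoop benchmark; the sample lines and the `measure` helper are hypothetical, and real ratios and timings depend heavily on the data and hardware.

```python
import gzip
import time


def measure(data, level=6):
    """Compress and decompress `data`; return sizes and CPU seconds spent."""
    start = time.process_time()
    packed = gzip.compress(data, compresslevel=level)
    cpu_compress = time.process_time() - start

    start = time.process_time()
    unpacked = gzip.decompress(packed)
    cpu_decompress = time.process_time() - start

    assert unpacked == data  # gzip is lossless
    return len(data), len(packed), cpu_compress, cpu_decompress


if __name__ == "__main__":
    # Hypothetical log-like lines; repetitive text compresses very well.
    sample = b"".join(
        b"INFO request id=%d status=200\n" % i for i in range(100_000)
    )
    for level in (1, 6, 9):
        raw, packed, c_cpu, d_cpu = measure(sample, level)
        print(f"level {level}: {raw} -> {packed} bytes ({packed / raw:.1%}), "
              f"compress {c_cpu:.3f}s CPU, decompress {d_cpu:.3f}s CPU")
```

Higher levels usually shrink the output a little more while costing more CPU time, which is why the overview suggests compressing heavily for I/O-bound jobs and sparingly for compute-bound ones.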