Zookeeper client operation
1. Client command line operation
1.1 Basic syntax
Command      Explanation
help         Display all operation commands
ls path      View the child nodes of the current node (can be watched)
  -w         Watch for changes in child nodes
  -s         Show additional secondary information
create       Create a normal node
  -s         Create a sequential node
  -e         Create an ephemeral node (disappears on restart or timeout)
get path     Obtain the value ...
Added by prueba123a on Thu, 30 Dec 2021 11:04:02 +0200
Hive tuning idea - knowledge summary
Hive tuning:
Choosing an appropriate storage format and compression method for the data being analyzed can improve Hive's analysis efficiency.
Data compression format:
When selecting a compression algorithm, you need to consider whether it is splittable. If splitting is not supported (the integrity of a pi ...
Added by ZHarvey on Thu, 30 Dec 2021 02:06:19 +0200
4 - website log analysis cases - log data statistical analysis
1, Environment preparation and data import
1. Start hadoop
If Hadoop is running in a virtual environment such as lsn, format the NameNode first
hadoop namenode -format
Start Hadoop
start-dfs.sh
start-yarn.sh
Check that the daemons have started
jps
2. Import data
Upload ...
Added by D_tunisia on Wed, 29 Dec 2021 17:51:55 +0200
26 data analysis cases -- the second stop: Civil Aviation Customer Value Analysis Based on Hive
Environment required for experiment
• Python: Python 3.x; • Hadoop 2.7.2 environment; • Hive 2.2.0
Experimental background
People have more and more travel options, such as aircraft, high-speed rail, cars, and ships. In particular, aircraft ...
Added by abhic on Wed, 29 Dec 2021 16:51:45 +0200
Big data -- Introduction to Algorithms in Spark GraphX
1, ConnectedComponents algorithm
ConnectedComponents, the connected components algorithm, labels each connected component in the graph with an id, using the smallest vertex id in the component as the component's id.
When the graph is as follows:
// Create vertices
val vertexRDD: RDD[(VertexId, (String,Int)) ...
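The labelling rule described above (every vertex receives the smallest vertex id in its connected component) can be sketched in plain Python. This is only an illustration of the rule using a simple traversal, not GraphX's distributed implementation; the function name `connected_components` and the sample graph are hypothetical and do not come from the original article.

```python
from collections import defaultdict


def connected_components(vertices, edges):
    """Map every vertex id to the smallest vertex id in its component."""
    # Build an undirected adjacency list from the edge pairs.
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    labels = {}
    # Visiting vertices in ascending order guarantees that the first
    # unlabelled vertex we meet is the smallest id of its component.
    for start in sorted(vertices):
        if start in labels:
            continue
        labels[start] = start
        stack = [start]
        while stack:
            vertex = stack.pop()
            for neighbour in adjacency[vertex]:
                if neighbour not in labels:
                    labels[neighbour] = start
                    stack.append(neighbour)
    return labels


# Hypothetical sample graph: two components with edges and one isolated vertex.
vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (4, 5)]
labels = connected_components(vertices, edges)
print(labels)
```

Vertices 1, 2 and 3 share the label 1, vertices 4 and 5 share the label 4, and the isolated vertex 6 keeps its own id.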
Added by nvee on Wed, 29 Dec 2021 05:09:54 +0200
Introduction to ElasticSearch and its deployment, principle and use
Chapter 1: Introduction to Elasticsearch
Elasticsearch is a Lucene-based search server. It provides a distributed, multi-user full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the Apache license ter ...
Added by parijat_php on Tue, 28 Dec 2021 09:46:24 +0200
009 Optimization & new features & HA
1, Hadoop data compression
Compression algorithm   Original file size   Compressed file size   Compression speed   Decompression speed   Built into Hadoop   Splittable   Requires program changes
gzip                    8.3GB                1.8GB                  17.5MB/s            58MB/s                yes                 no           no
bzip2                   8.3GB                1.1GB                  2.4MB/s             9.5MB/s               yes                 yes          no
LZO                     8.3GB                2.9GB                  49.3MB/s            74.6MB/s              no                  yes          yes
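To make the size figures in the comparison above easier to compare, the compressed size can be expressed as a fraction of the original size. The short sketch below does only that arithmetic; the sizes are taken from the comparison, while the variable and function names are illustrative.

```python
# Sizes in GB, taken from the comparison above.
original_gb = 8.3
compressed_gb = {"gzip": 1.8, "bzip2": 1.1, "LZO": 2.9}


def compression_ratio(compressed, original):
    """Return the compressed size as a fraction of the original size."""
    return compressed / original


ratios = {
    name: compression_ratio(size, original_gb)
    for name, size in compressed_gb.items()
}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1%} of the original size")
```

On these figures bzip2 produces the smallest output and LZO the largest, which is the reverse of their ordering by compression speed.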
Input compression: (Hadoop uses the file extension to determine whether ...
Added by prbrowne on Mon, 27 Dec 2021 20:14:25 +0200
Hadoop data compression
1, Overview
1) Advantages and disadvantages of compression
Advantage of compression: it reduces disk I/O and disk storage space. Disadvantage of compression: it increases CPU overhead.
2) Principles for using compression
(1) Compute-intensive jobs should use compression sparingly. (2) I/O-intensive jobs should use compression heavily.
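The trade-off above (less data to store and move, at the cost of CPU time) can be observed with a small experiment. The sketch below uses Python's standard `gzip` and `bz2` modules rather than Hadoop's codecs, and the sample payload is a hypothetical, highly repetitive log line, so the absolute numbers are illustrative only.

```python
import bz2
import gzip
import time


def measure(name, compress, data):
    """Compress data once and report the output size and elapsed time."""
    start = time.perf_counter()
    compressed = compress(data)
    elapsed = time.perf_counter() - start
    return {"codec": name, "size": len(compressed), "seconds": elapsed}


# Hypothetical sample payload: one log line repeated many times.
data = b"2021-12-27 09:56:33 INFO request handled in 12 ms\n" * 20000

results = [
    measure("gzip", gzip.compress, data),
    measure("bzip2", bz2.compress, data),
]

print(f"original: {len(data)} bytes")
for result in results:
    print(f"{result['codec']}: {result['size']} bytes in {result['seconds']:.4f} s")
```

The byte counts show the storage and I/O saving, and the elapsed time is the CPU cost paid for it. How the two balance out depends on the data and on whether the job is compute-bound or I/O-bound.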
2, Compression codecs supported by MR
1 ...
Added by madhukar_garg on Mon, 27 Dec 2021 09:56:33 +0200
CDH 6.2: a foolproof walkthrough of installation and configuration (beginner's version)
The software download link is at the bottom
Thanks to: CSDN expert Travel through IT; bilibili expert amoscloud2013
1. Preliminary preparation
Five 8G virtual machines are used, named CDH1, cdh2, cdh3, CDH4 and cdh5. JDK is installed on all of them
2. Modify IP and host name
Select CentOS 7 for cluster deployment. All three vir ...
Added by thefollower on Mon, 27 Dec 2021 05:46:50 +0200
Detailed explanation of Elasticsearch Template
In ES, we can set an Index Template and a Dynamic Template to better manage and configure indexes and mappings.
1, Index Template
For example, suppose we use ES for log management. Log volumes are very large, so saving all log data in a single index may cause performance problems. We can automaticall ...
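To illustrate the idea of an index template for log indexes, the sketch below builds a template body as a Python dictionary and checks which index names it would apply to. It assumes the composable template format (`index_patterns`, `priority`, `template`) used by the `_index_template` API in Elasticsearch 7.8 and later; older versions use the legacy `_template` API with `settings` and `mappings` at the top level. The pattern `logs-*`, the field names, and the helper `template_applies` are hypothetical and do not come from the original article.

```python
from fnmatch import fnmatchcase

# Hypothetical template body for daily log indexes (assumed names and values).
log_template = {
    "index_patterns": ["logs-*"],
    "priority": 100,
    "template": {
        "settings": {"number_of_shards": 1, "number_of_replicas": 1},
        "mappings": {
            "properties": {
                "@timestamp": {"type": "date"},
                "level": {"type": "keyword"},
                "message": {"type": "text"},
            }
        },
    },
}


def template_applies(template, index_name):
    """Approximate Elasticsearch's wildcard matching with shell-style globs."""
    return any(
        fnmatchcase(index_name, pattern)
        for pattern in template["index_patterns"]
    )


for name in ["logs-2021.12.26", "metrics-2021.12.26"]:
    print(name, template_applies(log_template, name))
```

With a template like this registered, any newly created index whose name matches the pattern would pick up the same settings and mappings, so each daily log index does not have to be configured by hand.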
Added by TubeRev on Sun, 26 Dec 2021 23:24:49 +0200