Elasticsearch related knowledge points
Brief introduction
Elasticsearch is an open source search engine built on Apache Lucene(TM). Because Lucene is complex to use directly, Elasticsearch aims to make full-text search simple through a RESTful API.
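To illustrate the RESTful API mentioned above, here is a minimal sketch that builds (but does not send) a full-text match query for the _search endpoint. The host http://localhost:9200, the index name articles, and the field name title are assumptions made for illustration; they do not come from the original article.

```python
import json
from urllib.request import Request


def build_search_request(base_url, index, field, text):
    """Build a POST request carrying a full-text match query (nothing is sent)."""
    body = {"query": {"match": {field: text}}}
    return Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical host, index, and field names, used only for illustration.
req = build_search_request(
    "http://localhost:9200", "articles", "title", "full-text search"
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing req to urllib.request.urlopen would send the query to a running cluster; the sketch stops short of that so it can be read and run without one.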
Basic concepts
1. Near real time (NRT)
Full-text search is not truly real time; there is usually a delay. Each search engine has its own core search latency. T ...
Added by [ArcanE] on Mon, 17 Jan 2022 18:54:32 +0200
Elasticsearch 7.1.1 cluster construction
1 Prepare the installation environment
1.1 Installing the JDK
Elasticsearch 7.1.1 runs with Java 8 or Java 11
1.2 Change system resource configuration
Modify the /etc/sysctl.conf file and add vm.max_map_count=262144 at the end of the file. Note: after the modification, run sysctl -p to load system parameters from the specified file. If ...
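As a quick check of the setting described above, the following sketch parses the text of a sysctl configuration file and reports whether vm.max_map_count is at least 262144. The function name and the sample strings are hypothetical; the sketch only inspects text and does not change any system setting.

```python
def max_map_count_ok(sysctl_conf_text, required=262144):
    """Return True if the last vm.max_map_count entry is >= required."""
    value = None
    for line in sysctl_conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == "vm.max_map_count":
            try:
                value = int(raw.strip())  # a later entry overrides an earlier one
            except ValueError:
                value = None  # treat an unparsable value as missing
    return value is not None and value >= required


# Hypothetical file contents, used only for illustration.
print(max_map_count_ok("vm.max_map_count=262144\n"))
print(max_map_count_ok("# defaults only\nvm.swappiness = 10\n"))
```

To check a real machine, the file contents could be read with open("/etc/sysctl.conf").read() and passed to the function.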
Added by Wabin on Mon, 17 Jan 2022 06:25:44 +0200
Elasticsearch: retrieve password - Password Recovery
If you have any questions about how to set up security for an Elasticsearch cluster, please read my previous article “Elasticsearch: set Elastic account security”. Security is very important for Elasticsearch; without it, our cluster is running unprotected. Before the following exercises, it is recommended to refer to the article“ ...
Added by Cagecrawler on Mon, 17 Jan 2022 04:27:01 +0200
Python beginner crawler - crawl UIBE Academic Affairs Office (requests+bs4)
The most basic crawler: crawling the UIBE Academic Affairs Office with Python requests+bs4
1. Tools used
1. Python 3.x
2. Third-party libraries: requests, bs4
3. A browser
2. Approach
The UIBE Academic Affairs Office website is highly open and has no anti-crawler measures, so the most basic crawling techniques are enough. Use the requests libr ...
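The fetch-then-parse idea can be sketched as follows. Note that this sketch deliberately swaps in the standard library (urllib and html.parser) for requests and bs4 so that it runs without third-party packages; it is not the article's code. The sample HTML and its paths are hypothetical and do not reflect the real structure of the UIBE site.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (link text, href) pairs from <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._text).strip(), self._href))
            self._href = None


def extract_links(html):
    """Parse an HTML string and return all links found in it."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url, timeout=10):
    """Download a page and decode it as text (not called in this sketch)."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


# Hypothetical page fragment, used only for illustration.
sample = '<ul><li><a href="/notice/1">Exam schedule</a></li></ul>'
print(extract_links(sample))
```

With requests and bs4 installed, fetch would typically become a requests.get call and extract_links a BeautifulSoup find_all("a") loop; the overall flow stays the same.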
Added by isam4m on Sun, 16 Jan 2022 19:02:27 +0200
MapReduce Performance Optimization -- data skew problem
Let's analyze a scenario. Suppose we have a file with 10 million (1000W) records. The values are mainly the numbers 1 through 10, and we want to count the number of occurrences of each number.
In fact, we already know the general shape of this data: of the 10 million records, about 9.1 million (910W) have the value 5, and there are only 9 ...
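One common way to mitigate this kind of skew is two-stage aggregation with salted keys; the excerpt is cut off before it names a fix, so this is an assumption about a typical approach, not necessarily the one the article uses. The sketch below simulates it in plain Python on a scaled-down dataset: stage 1 counts (value, salt) pairs so the hot value is spread over several partial keys, and stage 2 strips the salt and merges the partial counts. The function name, the number of salts, and the sample data are all hypothetical.

```python
import random
from collections import Counter


def salted_count(values, num_salts=10, seed=0):
    """Count values in two stages, spreading hot keys across num_salts buckets."""
    rng = random.Random(seed)
    # Stage 1: count (value, salt) pairs; a hot value is split into partial keys.
    partial = Counter((v, rng.randrange(num_salts)) for v in values)
    # Stage 2: drop the salt and merge the partial counts per value.
    total = Counter()
    for (value, _salt), n in partial.items():
        total[value] += n
    return partial, total


# Scaled-down, hypothetical data: the value 5 dominates, as in the scenario.
data = [5] * 910 + [n for n in (1, 2, 3, 4, 6, 7, 8, 9, 10) for _ in range(10)]
partial, total = salted_count(data)
print(total[5], max(n for (v, _s), n in partial.items() if v == 5))
```

In a real MapReduce job the salt would be appended to the key in a first job and removed in a second, so that no single reducer receives almost all of the records.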
Added by .Stealth on Sun, 16 Jan 2022 04:23:03 +0200
Chapter 2 Hive installation
2.1 Hive installation addresses
1. Hive official website address
http://hive.apache.org/
2. Document viewing address
https://cwiki.apache.org/confluence/display/Hive/GettingStarted
3. Download address
http://archive.apache.org/dist/hive/
4. GitHub address
https://github.com/apache/hive
2.2 Hive installation a ...
Added by weknowtheworld on Sun, 16 Jan 2022 01:17:49 +0200
hive sql calculates the total number and average age of all users and active users
The log is as follows. Write code to compute the total number and average age of all users and of active users. (Active users are users who have access records on two consecutive days.)
date,user,age
2019-02-11,test_1,23
2019-02-11,test_2,19
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-02-11,test_3,39
2019-02-11,test_1,23
2019-0 ...
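Before writing the Hive SQL, the required logic can be sketched in plain Python: deduplicate users, mark a user as active when two of their access dates are one day apart, then compute the count and average age for each group. The function name is hypothetical, and the sample rows for 2019-02-12 are invented for illustration because the log above is truncated.

```python
from datetime import date, timedelta


def user_stats(rows):
    """Return ((total users, avg age), (active users, avg age)) from log rows."""
    days = {}  # user -> set of distinct access dates
    ages = {}  # user -> age
    for day, user, age in rows:
        days.setdefault(user, set()).add(date.fromisoformat(day))
        ages[user] = age

    one_day = timedelta(days=1)
    # Active: the user has some access date whose next day is also present.
    active = {u for u, ds in days.items() if any(d + one_day in ds for d in ds)}

    def summarize(users):
        users = list(users)
        average = sum(ages[u] for u in users) / len(users) if users else None
        return len(users), average

    return summarize(ages), summarize(active)


# The 2019-02-12 row is hypothetical; it is not part of the original log.
rows = [
    ("2019-02-11", "test_1", 23),
    ("2019-02-11", "test_2", 19),
    ("2019-02-11", "test_3", 39),
    ("2019-02-11", "test_1", 23),
    ("2019-02-12", "test_1", 23),
]
all_users, active_users = user_stats(rows)
print(all_users, active_users)
```

In Hive the same steps would usually be expressed with a deduplicating subquery followed by a self-join or window function on the date column.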
Added by hinchcliffe on Fri, 14 Jan 2022 23:13:51 +0200
Spark performance optimization guide - train of thought
Preface
Spark job optimization is a broad topic: a job can be slow for many reasons, and each calls for a different fix. I want to lay out every aspect of optimization so that an overall optimization plan can be worked out systematically.
Sorting out optimization ideas
How should we approach a so-called slow job? I have sorted the causes out as follows:
themeresou ...
Added by jber on Fri, 14 Jan 2022 22:46:36 +0200
Installing MinIO on Docker: the most detailed guide on the web, covering the pitfalls of the latest version (highly recommended to bookmark)
Preface
In enterprises, we usually store some pictures, videos, documents and other related data in object storage. Common object storage services include Alibaba cloud OSS object storage, FastDFS distributed file system and the company's private cloud platform, so as to facilitate data storage and rapid access. However, with the rapid d ...
Added by sunilj20 on Fri, 14 Jan 2022 22:38:55 +0200
Hadoop3.3.1 compilation, installation and deployment tutorial
Preface
It's best to recompile the source code when building Hadoop, because some Hadoop functions must coordinate Java class files with library files generated from native code through JNI. To run native code on a Linux system, first compile it into a [.so] file for the target CPU architecture. Different processor architectures n ...
Added by roxki on Fri, 14 Jan 2022 13:21:03 +0200