Elasticsearch-related knowledge points

Brief introduction: Elasticsearch is an open source search engine based on Apache Lucene(TM). Because Lucene is complex to use directly, Elasticsearch aims to make full-text search simple through a RESTful API. Basic concepts: 1. Near real time (NRT): full-text search is not truly real time, so there is usually a delay before new data becomes searchable, and each search engine has its own characteristic delay. T ...

Added by [ArcanE] on Mon, 17 Jan 2022 18:54:32 +0200

Elasticsearch 7.1.1 cluster construction

1 Prepare the installation environment. 1.1 Install the JDK: configure Java 8 or Java 11 for Elasticsearch 7.1.1. 1.2 Change the system resource configuration: edit the /etc/sysctl.conf file and add vm.max_map_count=262144 at the end of the file. Note: after the change, run sysctl -p to load system parameters from the specified file. If ...

Added by Wabin on Mon, 17 Jan 2022 06:25:44 +0200
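
The sysctl step in the excerpt above can be sanity-checked with a short script. This is only a minimal sketch: the helper names `parse_sysctl` and `max_map_count_ok` are made up for illustration, and the sample text stands in for the contents of /etc/sysctl.conf.

```python
REQUIRED_MAX_MAP_COUNT = 262144  # value quoted in the excerpt


def parse_sysctl(text):
    """Parse sysctl.conf-style text into a dict of key -> value strings."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def max_map_count_ok(text, required=REQUIRED_MAX_MAP_COUNT):
    """Return True if vm.max_map_count is set to at least `required`."""
    value = parse_sysctl(text).get("vm.max_map_count")
    return value is not None and value.isdigit() and int(value) >= required


# Hypothetical file contents; a real check would read /etc/sysctl.conf instead.
sample = "# kernel tuning\nvm.swappiness = 1\nvm.max_map_count=262144\n"
print(max_map_count_ok(sample))
```

The script only inspects the file text; the value still has to be loaded with sysctl -p, as the excerpt notes.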

Elasticsearch: retrieve password - Password Recovery

If you have any questions about how to set up security for an Elasticsearch cluster, please read my previous article “Elasticsearch: set Elastic account security”. Security is very important for Elasticsearch; without it, our cluster is running unprotected. Before the following exercises, it is recommended to refer to the article “ ...

Added by Cagecrawler on Mon, 17 Jan 2022 04:27:01 +0200

Python beginner crawler - crawling the UIBE Academic Affairs Office (requests+bs4)

The most basic crawler: Python requests+bs4 crawling the UIBE Academic Affairs Office. 1. Tools used: 1. Python 3.x 2. Third-party libraries requests and bs4 3. A browser. 2. Approach: The website of the UIBE Academic Affairs Office is highly open and has no anti-crawler measures, so the most basic crawling techniques are enough. Use the requests libr ...

Added by isam4m on Sun, 16 Jan 2022 19:02:27 +0200

MapReduce Performance Optimization -- data skew problem

Let's analyze a scenario: suppose we have a file with 10 million (1000W) records. The values in it are mainly the numbers 1,2,3,4,5,6,7,8,9,10, and we want to count the number of occurrences of each number. In fact, we already know roughly how this data is distributed: among the 10 million records, about 9.1 million (910W) have a value of 5, and there are only 9 ...

Added by .Stealth on Sun, 16 Jan 2022 04:23:03 +0200
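
The skewed count in the MapReduce excerpt above can be illustrated with a small simulation. This is a sketch of key salting, a commonly used way to spread a hot key over several reducers; it is not necessarily the method the linked article uses. The function names, the number of salts, and the scaled-down data (in particular how the non-hot records are distributed) are assumptions for illustration.

```python
import random
from collections import Counter


def salted_key(value, hot_keys, num_salts, rng):
    """Append a random suffix to hot keys so they spread across reducers."""
    if value in hot_keys:
        return f"{value}_{rng.randrange(num_salts)}"
    return str(value)


def count_with_salting(values, hot_keys, num_salts=10, seed=0):
    rng = random.Random(seed)
    # Stage 1: count by salted key; each salted key models one reducer's load.
    partial = Counter(salted_key(v, hot_keys, num_salts, rng) for v in values)
    # Stage 2: strip the salt and merge the partial counts per original key.
    final = Counter()
    for key, count in partial.items():
        final[int(key.split("_")[0])] += count
    return partial, final


# Scaled-down stand-in for the scenario: 910 of 1000 records have value 5.
# The even split of the remaining 90 records is an assumption.
values = [5] * 910 + [v for v in (1, 2, 3, 4, 6, 7, 8, 9, 10) for _ in range(10)]
partial, final = count_with_salting(values, hot_keys={5})
print("count for 5:", final[5])
print("largest single-reducer load:", max(partial.values()))
```

Without salting, one reducer would receive all 910 records for the key 5; with salting, the first stage splits them into smaller groups at the cost of a second aggregation pass.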

Chapter 2 Hive installation

Chapter 2 Hive installation. 2.1 Hive installation addresses: 1. Hive official website: http://hive.apache.org/ 2. Documentation: https://cwiki.apache.org/confluence/display/Hive/GettingStarted 3. Download: http://archive.apache.org/dist/hive/ 4. GitHub: https://github.com/apache/hive 2.2 Hive installation a ...

Added by weknowtheworld on Sun, 16 Jan 2022 01:17:49 +0200

Hive SQL: calculating the total number and average age of all users and active users

The log is as follows. Write code to get the total number and average age of all users and of active users (active users are users who have access records on two consecutive days). Columns: date, user, age. 2019-02-11,test_1,23 2019-02-11,test_2,19 2019-02-11,test_3,39 2019-02-11,test_1,23 2019-02-11,test_3,39 2019-02-11,test_1,23 2019-0 ...

Added by hinchcliffe on Fri, 14 Jan 2022 23:13:51 +0200
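
The logic of the exercise above can be sketched in plain Python before writing it in Hive SQL. This is a minimal sketch, not the linked article's solution: the function name `summarize` is made up, the first four log rows come from the excerpt, and the last two rows are hypothetical, added only so that one user has visits on consecutive days.

```python
from datetime import date, timedelta


def summarize(lines):
    """Return ((count, avg_age) for all users, (count, avg_age) for active users)."""
    days_by_user = {}
    age_by_user = {}
    for line in lines:
        day_text, user, age_text = line.strip().split(",")
        days_by_user.setdefault(user, set()).add(date.fromisoformat(day_text))
        age_by_user[user] = int(age_text)

    one_day = timedelta(days=1)
    # A user is active if some visit day is followed by a visit the next day.
    active = {
        user
        for user, days in days_by_user.items()
        if any(day + one_day in days for day in days)
    }

    def stats(users):
        users = list(users)
        if not users:
            return 0, None
        return len(users), sum(age_by_user[u] for u in users) / len(users)

    return stats(age_by_user), stats(active)


log = [
    "2019-02-11,test_1,23",
    "2019-02-11,test_2,19",
    "2019-02-11,test_3,39",
    "2019-02-11,test_1,23",
    # Hypothetical rows, not from the excerpt:
    "2019-02-12,test_2,19",
    "2019-02-13,test_1,23",
]
all_stats, active_stats = summarize(log)
print("all users:", all_stats)
print("active users:", active_stats)
```

Duplicate rows for the same user and day are collapsed by the set, so repeated visits on one day do not count as consecutive days.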

Spark performance optimization guide - train of thought

Preface: Spark job optimization is a broad topic, because jobs can be slow for very different reasons and the solutions differ accordingly. I want to lay out all aspects of optimization so that an overall optimization plan can be formulated systematically. Sorting out optimization ideas: how should we treat a so-called "slow" job? I organized the problem as follows: themeresou ...

Added by jber on Fri, 14 Jan 2022 22:46:36 +0200

The most detailed guide on the web to installing MinIO on Docker, covering the pitfalls of the latest version (highly recommended, worth bookmarking)

Preface: In enterprises, we usually store pictures, videos, documents and other related data in object storage. Common object storage services include Alibaba Cloud OSS, the FastDFS distributed file system and a company's private cloud platform, which make data easy to store and quick to access. However, with the rapid d ...

Added by sunilj20 on Fri, 14 Jan 2022 22:38:55 +0200

Hadoop 3.3.1 compilation, installation and deployment tutorial

Preface: It is best to recompile the source code when setting up Hadoop, because some Hadoop features rely on JNI to connect Java class files with library files generated from native code. To run native code on a Linux system, the native code must first be compiled into a [.so] file for the target CPU architecture. Different processor architectures n ...

Added by roxki on Fri, 14 Jan 2022 13:21:03 +0200