Hadoop principles and tuning

Hadoop principles 1. HDFS write process 1. The client uses the DistributedFileSystem module to ask the NameNode to upload a file. The NameNode checks whether the target file already exists, whether the path is correct, and whether the user has permission. 2. The NameNode tells the client whether it may upload, and returns three items at the ...

Added by kee1108 on Wed, 23 Feb 2022 10:12:13 +0200

Spark low level API RDD learning notes

What are RDDs? The full English name is Resilient Distributed Datasets (literally, "elastic distributed datasets"). Spark: The Definitive Guide describes them as follows: an RDD represents an immutable, partitioned collection of records that can be operated on in parallel. In my personal understanding, an RDD is a kind of distributed object collection ...

Added by greenhorn666 on Tue, 22 Feb 2022 14:25:12 +0200
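To make the definition quoted in the RDD entry above concrete, here is a minimal pure-Python sketch of an immutable, partitioned collection whose partitions are transformed in parallel. It is not the Spark API: `MiniRDD`, `parallelize`, `map`, and `collect` are hypothetical names chosen for illustration, and unlike Spark the sketch evaluates eagerly rather than lazily.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Tuple


class MiniRDD:
    """Hypothetical toy stand-in for an RDD; this is NOT the Spark API."""

    def __init__(self, partitions: Iterable[Iterable]):
        # Stored as a tuple of tuples, so partitions and records cannot be mutated.
        self._partitions: Tuple[tuple, ...] = tuple(tuple(p) for p in partitions)

    @classmethod
    def parallelize(cls, records: Iterable, num_partitions: int) -> "MiniRDD":
        items = list(records)
        # Round-robin split: partition i gets items i, i + n, i + 2n, ...
        return cls(items[i::num_partitions] for i in range(num_partitions))

    def map(self, fn: Callable) -> "MiniRDD":
        # Each partition is transformed independently, so the work can run in parallel.
        with ThreadPoolExecutor() as pool:
            new_parts = list(
                pool.map(lambda part: tuple(fn(x) for x in part), self._partitions)
            )
        # A transformation returns a new MiniRDD; the original is left untouched.
        return MiniRDD(new_parts)

    def collect(self) -> list:
        # Gather all records back into one list, partition by partition.
        return [x for part in self._partitions for x in part]


numbers = MiniRDD.parallelize(range(6), num_partitions=3)
squares = numbers.map(lambda x: x * x)
```

Calling `squares.collect()` returns the squared records grouped by partition, while `numbers` still holds the original values. That mirrors the "immutable" part of the definition.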

Oracle database operation and maintenance scheme and optimization

Oracle database operation and maintenance scheme and optimization. Operation and maintenance optimization: this article explains in detail how to operate and maintain an Oracle database, covering all aspects. Preface: In the last article, we talked about performance optimization of Oracle databases. In ...

Added by osram on Tue, 22 Feb 2022 00:04:58 +0200

A small Hive case: combining window functions, conditional statements, date conversion, and mean-time calculation

Goal: from a full table, calculate the average start and end times of a task over seven days. 1. Introducing the data: The data table is a full-synchronization table partitioned by date. It contains the start time, end time, total seconds of start time (seconds since midnight), and total seconds of end ...

Added by quicknik on Mon, 21 Feb 2022 16:58:37 +0200
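The arithmetic behind the Hive entry above (averaging a time of day through its seconds since midnight) can be sketched in a few lines of Python. This is only an illustration of the calculation, not the article's HQL. The function names and the seven sample values are hypothetical, and the simple mean assumes no run crosses midnight.

```python
from typing import Sequence


def seconds_to_hms(total_seconds: float) -> str:
    """Format seconds since midnight as HH:MM:SS, rounded to the nearest second."""
    total = int(round(total_seconds))
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def mean_time_of_day(seconds_since_midnight: Sequence[float]) -> str:
    """Average several times of day given as seconds since midnight."""
    mean_seconds = sum(seconds_since_midnight) / len(seconds_since_midnight)
    return seconds_to_hms(mean_seconds)


# Hypothetical start times of one task on seven consecutive days,
# expressed as seconds since midnight (values invented for illustration).
start_seconds = [3600, 3660, 3720, 3540, 3600, 3780, 3300]

average_start = mean_time_of_day(start_seconds)
```

Averaging the numeric seconds and converting back avoids having to average formatted time strings directly.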

Sqoop shallow in and shallow out

Sqoop: a tool for efficient data transfer between Hadoop and relational databases. Latest stable version: 1.4.7 (Sqoop2 is not recommended for production). Graduated from Apache. In essence, it is just a command-line tool. In production, data import and export are basically done by assembling Sqoop commands. Underlying working mechanism: ...

Added by ursvmg on Mon, 21 Feb 2022 12:40:21 +0200

Summary of pitfalls when setting up Hive on Windows

Preface: Hive is a Hadoop-based data warehouse tool that operates on Hadoop data storage (HDFS, etc.) using HQL, an SQL-like language. Hadoop therefore needs to be set up before installing Hive locally on Windows. The previous article roughly covered the environment setup and a summary of pitfalls, so here is still only the basic insta ...

Added by Ben Cleary on Mon, 21 Feb 2022 03:58:02 +0200

Run the latest versions of Elasticsearch 8 and Kibana 8 on CentOS 7

Background: I have previously built and used Elasticsearch 7.x services and clusters, but they ran in an intranet environment at the time, and I did not configure the X-Pack settings related to authentication. I remember the suggestion I wrote back then: because Elasticsearch does not enable the built-in security defense mechani ...

Added by daveyboy on Sun, 20 Feb 2022 18:19:51 +0200

GEE dataset: ERA5 daily summary - the latest climate reanalysis dataset produced by ECMWF / Copernicus Climate Change Service

1. Dataset introduction: ERA5 Daily Aggregates - Latest Climate Reanalysis Produced by ECMWF / Copernicus Climate Change Service. ERA5 is the fifth-generation ECMWF atmospheric reanalysis of the global climate. Reanalysis combines model data wit ...

Added by buddhi225 on Sun, 20 Feb 2022 12:00:30 +0200

[Flink from getting started to mastering 01] DataStream API

In the previous article, we introduced the installation, deployment, and basic concepts of Flink. Today, let's learn about the DataStream API, one of Flink's core components. 01 Distributed stream processing fundamentals: In the figure above, we divide the code into three parts, which form the basic model of distributed stream processing: SourceTrans ...

Added by c-o-d-e on Sun, 20 Feb 2022 10:32:54 +0200

Leveling up: a big data journey for beginners <Advanced object-oriented Java: thread safety and the wake-up mechanism>

Xiaobai's big data journey (27): Advanced object-oriented Java, thread safety and the wake-up mechanism. Review of the last issue: In the last issue, we learned the concept of multithreading and its basic usage. This chapter covers the remaining multithreading topics: thread safety and its solutions, and the locking mechanism. Aft ...

Added by nick314 on Sun, 20 Feb 2022 09:58:57 +0200
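The thread-safety idea mentioned in the multithreading entry above can be illustrated with a lock-protected counter. The article itself is about Java. The sketch below uses Python's standard `threading` module instead, and `SafeCounter` and `run_workers` are hypothetical names introduced only for this example.

```python
import threading


class SafeCounter:
    """Counter whose read-modify-write step is guarded by a lock."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # Only one thread at a time may execute the update below.
        with self._lock:
            self._value += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


def run_workers(counter: SafeCounter, num_threads: int = 4,
                increments_per_thread: int = 10_000) -> int:
    """Increment the shared counter from several threads and return the total."""

    def work() -> None:
        for _ in range(increments_per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()  # Wait until every worker has finished.
    return counter.value


total = run_workers(SafeCounter())
```

Because every increment happens while holding the lock, no update is lost and the final total equals the number of threads times the increments per thread.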