2021-10-21 VirtualBox-based Hadoop Cluster Installation and Configuration Tutorial

This article follows http://dblab.xmu.edu.cn/blog/2775-2/ through the process of building a Hadoop distributed cluster. Prerequisite: a pseudo-distributed Hadoop system has already been configured on a virtual machine. One virtual machine acts as the master and runs the NameNode, while three virtual machines data1, data2, and data3 (all with Ubuntu installed) act as DataNodes. Netwo ...
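As a rough sketch of how that layout is usually wired together (the IP addresses below are placeholders, not values from the tutorial):

    # /etc/hosts on every node (example addresses; substitute your own)
    192.168.1.100  master
    192.168.1.101  data1
    192.168.1.102  data2
    192.168.1.103  data3

    # $HADOOP_HOME/etc/hadoop/workers on master (named "slaves" in Hadoop 2.x)
    data1
    data2
    data3

With this in place, running start-dfs.sh on master brings up the NameNode there and a DataNode on each worker.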

Added by php_user13 on Thu, 21 Oct 2021 20:39:33 +0300

Cloudera series: using DataFrames and Schemas

1. Create DataFrames from Data Sources. 1. Data sources for a DataFrame: DataFrames read data from, and write data to, a data source. Spark SQL supports a wide range of data source types and formats: text files (CSV, JSON, plain text); binary-format files (Apache Parquet, Apache ORC, Apache Avro); tables (Hive ...
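A minimal PySpark sketch of those source types (the file paths and table name are invented for illustration):

    # Read the formats listed above; writing works symmetrically via df.write.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("sources").enableHiveSupport().getOrCreate()

    csv_df     = spark.read.option("header", "true").csv("/data/people.csv")
    json_df    = spark.read.json("/data/people.json")
    parquet_df = spark.read.parquet("/data/people.parquet")
    hive_df    = spark.table("default.people")   # a Hive table as a data source

    # Write a DataFrame back out, here as Parquet
    csv_df.write.mode("overwrite").parquet("/data/people_parquet")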

Added by ruach on Thu, 21 Oct 2021 17:47:06 +0300

HDFS basic operations

1. Viewing storage system information. hdfs dfsadmin -report [-live] [-dead] [-decommissioning] outputs basic information and usage statistics for the file system. Output basic information and statistics for the whole file system: [root@master ~]# hdfs dfsadmin -report. Output basic information and statistics for the online (live) nodes in the file system: [root@master ~]# hdfs dfsadmin -report ...
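For reference, the three filters from the synopsis above map onto the command like this:

    # Whole file system: configured capacity, DFS used/remaining, per-DataNode detail
    hdfs dfsadmin -report

    # Limit the per-DataNode section to a subset of nodes
    hdfs dfsadmin -report -live              # live (online) DataNodes only
    hdfs dfsadmin -report -dead              # dead DataNodes only
    hdfs dfsadmin -report -decommissioning   # DataNodes currently being decommissioned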

Added by elklabone on Wed, 20 Oct 2021 08:55:17 +0300

Experiment 3: getting familiar with common HBase operations

1. Experimental purpose: (1) understand the role of HDFS in the Hadoop architecture; (2) become proficient with the common shell commands for operating HDFS; (3) become familiar with the Java APIs commonly used in HDFS operations. 2. Experimental platform: operating system: Linux (CentOS recommended); Hadoop version: 3.2.2; HBase version: 2.3.6; JDK version: 1.7 or abo ...
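A short HBase shell session of the kind such an experiment practices (the table and column-family names are made up):

    hbase shell
    create 'student', 'info'                     # table with one column family
    put 'student', 'row1', 'info:name', 'Alice'  # insert a cell
    get 'student', 'row1'                        # read one row
    scan 'student'                               # read the whole table
    disable 'student'
    drop 'student'                               # a table must be disabled before dropping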

Added by pido on Sat, 16 Oct 2021 10:08:47 +0300

[Hadoop] build a fully distributed cluster based on Docker

Reference material: http://dblab.xmu.edu.cn/blog/1233/. Note: the experiment in this blog requires a Docker image with a Hadoop cluster environment. Operating environment: Ubuntu 20.04, Hadoop 3.3.1, JDK 8. 1. Start three containers with Docker. Nodes used in this test: master (master node), slave1 (worker node), slave2 (worker ...
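Starting those containers might look roughly like this (the image name hadoop-cluster is a placeholder for whatever image you built):

    docker network create hadoop-net
    docker run -itd --name master --hostname master --network hadoop-net hadoop-cluster
    docker run -itd --name slave1 --hostname slave1 --network hadoop-net hadoop-cluster
    docker run -itd --name slave2 --hostname slave2 --network hadoop-net hadoop-cluster
    docker exec -it master bash   # enter master to format HDFS and start the cluster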

Added by yendor on Wed, 13 Oct 2021 04:32:50 +0300

Building a Hive environment + reading ES data into internal tables

Scenario: the project calls for performance optimization, and we need to compare, over the same data, whether it is more efficient to query from Hive or from ES. Therefore we need to synchronize all the data of one ES index to HDFS, then query the HDFS data through Hive and compare the efficiency of the two. Step 1: preliminary pre ...
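One common route for that synchronization is the elasticsearch-hadoop connector: expose the ES index as an external Hive table, then copy it into an internal table. A hedged sketch (the index name, fields, jar path, and node address are placeholders):

    ADD JAR /path/to/elasticsearch-hadoop.jar;

    CREATE EXTERNAL TABLE es_source (id STRING, payload STRING)
    STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
    TBLPROPERTIES ('es.resource' = 'my_index', 'es.nodes' = 'es-host:9200');

    -- materialize the data onto HDFS as an internal (managed) table
    CREATE TABLE es_copy AS SELECT * FROM es_source;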

Added by Naez on Wed, 13 Oct 2021 00:02:23 +0300

Enterprise architecture case for Flume learning

Advances in Flume learning: Flume transactions. Their primary purpose is to ensure data consistency: a batch either succeeds or fails as a whole. Transaction schematics; Flume Agent internal principles. To summarize: events collected by a Source do not go directly to the Channel; instead they are handed to a ChannelProcessor, and this processor sends each event ...
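To ground the two transactions in a concrete agent, here is a minimal config sketch; the put transaction covers Source-to-Channel, the take transaction covers Channel-to-Sink (the names and port are illustrative):

    a1.sources  = r1
    a1.channels = c1
    a1.sinks    = k1

    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100   # max events per put/take transaction

    a1.sinks.k1.type = logger
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1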

Added by Jax2 on Mon, 11 Oct 2021 19:39:19 +0300

Learning to use Hadoop

1. What is Hadoop and what is it for? Hadoop is an open-source framework for writing and running distributed applications that process large-scale data. It is designed for offline and larg ...

Added by wittanthony on Tue, 05 Oct 2021 00:56:46 +0300

[Docker x Hadoop] Use Docker to build a Hadoop cluster (from scratch)

0. Background. The online tutorials simulate a cluster by cloning multiple virtual machines, but on a real server that approach turned out not to work. That is when I thought of Docker: I had not really put Docker into practice since finishing learning it, so this was a good opportunity. The implementa ...

Added by landavia on Sun, 03 Oct 2021 22:12:07 +0300

Introduction and usage of Apache Doris dynamic partition

1. Introduction. In some usage scenarios, users partition a table by day and run routine tasks on it every day. They then have to manage the partitions manually; otherwise a data import may fail because the required partition was not created, which brings extra maintenance cost. Through the ...
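A hedged sketch of such a day-partitioned table, with the dynamic_partition properties doing the partition management (the database, table, columns, and retention window are invented):

    CREATE TABLE example_db.user_log (
        event_day DATE,
        user_id   BIGINT,
        action    VARCHAR(64)
    )
    DUPLICATE KEY(event_day, user_id)
    PARTITION BY RANGE(event_day) ()
    DISTRIBUTED BY HASH(user_id) BUCKETS 8
    PROPERTIES (
        "dynamic_partition.enable"    = "true",
        "dynamic_partition.time_unit" = "DAY",  -- create one partition per day
        "dynamic_partition.start"     = "-7",   -- drop partitions older than 7 days
        "dynamic_partition.end"       = "3",    -- pre-create the next 3 days
        "dynamic_partition.prefix"    = "p",
        "dynamic_partition.buckets"   = "8"
    );

With this in place, Doris creates tomorrow's partitions ahead of time and retires expired ones, so routine daily imports no longer depend on anyone creating partitions by hand.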

Added by lost305 on Tue, 28 Sep 2021 08:33:08 +0300