2021-10-21 VirtualBox-based Hadoop Cluster Installation and Configuration Tutorial
This article follows http://dblab.xmu.edu.cn/blog/2775-2/ for the process of building a Hadoop distributed cluster.
Foreword
A pseudo-distributed Hadoop system has already been configured on a virtual machine. One virtual machine, master, acts as the NameNode, and three virtual machines, data1, data2, and data3 (all with Ubuntu installed), act as DataNodes.
netwo ...
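As a sketch of the wiring such a setup needs (the IP addresses and file locations below are assumptions, not taken from the article), every machine must resolve the four hostnames, and master's Hadoop workers file (named slaves in Hadoop 2.x) lists the DataNodes:
# On every machine: map hostnames to the cluster's static IPs
# (the 192.168.56.x addresses are placeholders; use your host-only network's IPs)
cat >> /etc/hosts <<'EOF'
192.168.56.100 master
192.168.56.101 data1
192.168.56.102 data2
192.168.56.103 data3
EOF
# On master: tell Hadoop which hosts run DataNodes
cat > $HADOOP_HOME/etc/hadoop/workers <<'EOF'
data1
data2
data3
EOF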
Added by php_user13 on Thu, 21 Oct 2021 20:39:33 +0300
Cloudera series: using DataFrames and Schemas
1, Create DataFrames from Data Sources
1. Data sources for DataFrames
A DataFrame reads data from a data source and writes data back to a data source. Spark SQL supports a wide range of data source types and formats:
Text files: CSV, JSON, plain text
Binary format files: Apache Parquet, Apache ORC, Apache Avro data formats
Tables: Hive ...
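As a minimal sketch of touching several of these formats at once (the HDFS paths are hypothetical), Spark SQL can query files in place by naming the format before a backquoted path:
# Query files directly with the spark-sql CLI, no table registration needed
spark-sql -e "SELECT * FROM csv.\`hdfs:///data/people.csv\` LIMIT 10"
spark-sql -e "SELECT * FROM json.\`hdfs:///data/events.json\` LIMIT 10"
spark-sql -e "SELECT * FROM parquet.\`hdfs:///data/users.parquet\` LIMIT 10"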
Added by ruach on Thu, 21 Oct 2021 17:47:06 +0300
HDFS basic operation
1, Viewing storage system information
hdfs dfsadmin -report [-live] [-dead] [-decommissioning]
Outputs the basic information and related data statistics of the file system:
[root@master ~]# hdfs dfsadmin -report
Outputs the basic information and related data statistics of the live nodes in the file system:
[root@master ~]# hdfs dfsadmin -report ...
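For example, the optional flags from the usage line above narrow the report to one class of DataNodes, and grep pulls single statistics out of the summary:
[root@master ~]# hdfs dfsadmin -report -live    # only DataNodes currently in service
[root@master ~]# hdfs dfsadmin -report -dead    # only DataNodes marked dead
[root@master ~]# hdfs dfsadmin -report | grep "DFS Remaining"    # one line of the summary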
Added by elklabone on Wed, 20 Oct 2021 08:55:17 +0300
Experiment 3: getting familiar with common HBase operations
1, Experimental purpose
(1) Understand the role of HDFS in the Hadoop architecture;
(2) Become proficient with the common shell commands for operating HDFS (a short sketch follows this list);
(3) Become familiar with the Java APIs commonly used to operate HDFS.
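As a quick illustration of purpose (2), a typical round trip through the HDFS shell looks like the sketch below (the paths and file name are hypothetical):
hdfs dfs -mkdir -p /user/hadoop/input          # create a directory in HDFS
hdfs dfs -put local.txt /user/hadoop/input     # upload a local file
hdfs dfs -ls /user/hadoop/input                # list the directory
hdfs dfs -cat /user/hadoop/input/local.txt     # print the file contents
hdfs dfs -rm -r /user/hadoop/input             # remove the directory recursively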
2, Experimental platform
Operating system: Linux (CentOS recommended); Hadoop version: 3.2.2; HBase version: 2.3.6; JDK version: 1.7 or abo ...
Added by pido on Sat, 16 Oct 2021 10:08:47 +0300
[Hadoop] build a fully distributed cluster based on Docker
Reference material: http://dblab.xmu.edu.cn/blog/1233/
Note: the experiment in this post requires a Docker image with a Hadoop cluster environment already set up.
Operating environment
Ubuntu 20.04
Hadoop 3.3.1
JDK 8
1. Open three containers with Docker
Nodes used in this test:
Node name    Role
master       Master node
slave1       Secondary node
slave2       Secondary ...
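A minimal sketch of step 1 (hadoop-cluster stands in for whatever image name the note above refers to): create a user-defined bridge network so the containers can reach each other by hostname, then start the three nodes:
docker network create hadoop-net
docker run -itd --name master --hostname master --network hadoop-net hadoop-cluster
docker run -itd --name slave1 --hostname slave1 --network hadoop-net hadoop-cluster
docker run -itd --name slave2 --hostname slave2 --network hadoop-net hadoop-cluster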
Added by yendor on Wed, 13 Oct 2021 04:32:50 +0300
Building a Hive environment + reading ES data into internal tables
Scenario:
The project needs a performance optimization: for the same data, determine which is more efficient to query, Hive or ES. Therefore, we need to synchronize all the data of one ES index to HDFS and query it through Hive to compare their efficiency.
Step 1: preliminary pre ...
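One common way to do this synchronization (a sketch assuming the elasticsearch-hadoop connector; the jar path, index name, and columns are hypothetical) is to map the ES index as a Hive external table and copy it into an internal table on HDFS:
hive -e "
ADD JAR /opt/jars/elasticsearch-hadoop.jar;
CREATE EXTERNAL TABLE es_orders (id STRING, amount DOUBLE)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES ('es.resource' = 'orders', 'es.nodes' = 'es-host:9200');
-- the internal copy lives on HDFS, so comparison queries never touch ES
CREATE TABLE orders_hdfs AS SELECT * FROM es_orders;
"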
Added by Naez on Wed, 13 Oct 2021 00:02:23 +0300
Enterprise architecture case for Flume learning
Advanced Flume learning
Flume Transactions
The primary purpose is to ensure data consistency: a transaction either succeeds as a whole or fails as a whole.
Transaction schematics
Flume Agent Internal Principles
To summarize: an event collected by the Source is not written directly to the channel. Instead, it is handed to a ChannelProcessor, and this processor sends the event ...
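To see these pieces in runnable form (the agent, source, channel, and sink names are arbitrary examples), the classic netcat-to-logger configuration below exercises the same Source to ChannelProcessor to Channel path; transactionCapacity caps how many events one channel transaction may carry:
cat > example.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sinks = k1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sinks.k1.type = logger
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
EOF
flume-ng agent --conf conf --conf-file example.conf --name a1 -Dflume.root.logger=INFO,console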
Added by Jax2 on Mon, 11 Oct 2021 19:39:19 +0300
Learning to use Hadoop
1, What is the role of Hadoop?
What is Hadoop?
Hadoop is an open-source framework for writing and running distributed applications that process large-scale data. It is designed for offline and larg ...
Added by wittanthony on Tue, 05 Oct 2021 00:56:46 +0300
[Docker x Hadoop] use Docker to build Hadoop clusters (from scratch)
0. Background
The online tutorials mostly clone multiple virtual machines to simulate a cluster, but on a real server this method turned out not to work. That is when I thought of Docker: I had not really used it in practice since learning it, so this was a good opportunity.
The implementa ...
Added by landavia on Sun, 03 Oct 2021 22:12:07 +0300
Introduction and usage of Apache Doris dynamic partitioning
1. Introduction
In some scenarios, the user partitions a table by day and runs routine tasks every day. The user then has to manage the partitions manually: otherwise a data import may fail because the required partition has not been created, which adds extra maintenance cost.
Through the ...
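A minimal sketch of what this looks like (database, table, and column names are hypothetical; the FE is assumed to listen on its default MySQL-protocol port 9030): with dynamic_partition properties set, Doris pre-creates the next few daily partitions and drops expired ones on its own:
mysql -h fe-host -P 9030 -uroot -e "
CREATE TABLE example_db.user_events (
    event_day DATE,
    user_id   BIGINT
)
DUPLICATE KEY (event_day)
PARTITION BY RANGE (event_day) ()
DISTRIBUTED BY HASH (user_id) BUCKETS 8
PROPERTIES (
    'dynamic_partition.enable'    = 'true',
    'dynamic_partition.time_unit' = 'DAY',
    'dynamic_partition.start'     = '-7',
    'dynamic_partition.end'       = '3',
    'dynamic_partition.prefix'    = 'p'
);"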
Added by lost305 on Tue, 28 Sep 2021 08:33:08 +0300