Hadoop environment configuration (Linux virtual machine)
This semester I am taking a course on big data management and analysis, which mainly uses the Hadoop framework for data analysis and application development. First, I will configure the environment.
Note
It is best to put the JDK and Hadoop under /usr/local. When adding environment ...
Added by stargate03 on Mon, 28 Feb 2022 13:06:21 +0200
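For the environment configuration entry above, a minimal sketch of the environment variable lines that such a setup usually needs. The install directories /usr/local/jdk and /usr/local/hadoop and the helper name build_profile_exports are assumptions for illustration; the original excerpt is truncated and does not give the exact paths.

```python
from typing import List


def build_profile_exports(java_home: str, hadoop_home: str) -> List[str]:
    """Return shell export lines suitable for appending to a profile file."""
    return [
        f"export JAVA_HOME={java_home}",
        f"export HADOOP_HOME={hadoop_home}",
        # The $VARS are left unexpanded here so the shell resolves them at login.
        "export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin",
    ]


# Hypothetical install directories under /usr/local.
lines = build_profile_exports("/usr/local/jdk", "/usr/local/hadoop")
print("\n".join(lines))
```

The printed lines can be appended to a profile file and reloaded; the script itself does not modify any file.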
[CentOS] install HBase components
Prerequisites:
Hadoop fully distributed cluster environment
HBase installation package: https://archive.apache.org/dist/hbase/
1. Unzip the HBase installation package
Upload the local installation package:
Unzip and rename:
2. System environment variable configuration
Configure the environment variables a ...
Added by ManOnScooter on Fri, 25 Feb 2022 12:06:45 +0200
Hadoop cluster entry configuration
Hadoop overview
Hadoop composition
HDFS Architecture Overview
Hadoop Distributed File System (HDFS for short) is a distributed file system.
NameNode (nn): stores file metadata, such as the file name, the directory structure, file attributes (creation time, number of replicas, file permissions), the block list of each file, DataNo ...
Added by Irap on Thu, 24 Feb 2022 08:51:49 +0200
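For the NameNode description in the entry above, a toy sketch of the kind of per-file metadata it lists (name, directory, attributes, block list). The class names FileMetadata and ToyNameNode and all field names are hypothetical; this is an in-memory illustration, not the actual HDFS data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FileMetadata:
    """Illustrative per-file record; field names are assumptions."""
    name: str
    directory: str
    created_at: float          # creation time as a Unix timestamp
    replication: int           # number of replicas
    permissions: str           # e.g. "rw-r--r--"
    block_ids: List[str] = field(default_factory=list)


class ToyNameNode:
    """Keeps metadata only; file contents would live on DataNodes."""

    def __init__(self) -> None:
        self._files: Dict[str, FileMetadata] = {}

    def register(self, meta: FileMetadata) -> str:
        # Build the full path from directory and file name.
        path = f"{meta.directory.rstrip('/')}/{meta.name}"
        self._files[path] = meta
        return path

    def blocks_of(self, path: str) -> List[str]:
        return list(self._files[path].block_ids)


nn = ToyNameNode()
path = nn.register(
    FileMetadata("part-0000.txt", "/data/logs", 1645689109.0, 3,
                 "rw-r--r--", ["blk_1", "blk_2"])
)
print(path, nn.blocks_of(path))
```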
Hadoop in simple terms -- getting started
Hadoop learning
1. Hadoop overview
Infrastructure for distributed systems
It mainly solves the problems of massive data storage and distributed computing
1.1 Three major distributions of Hadoop
The original Apache version was released in 2006
Cloudera integrates many big data frameworks internally, and the corresponding product is CDH releas ...
Added by birwin on Wed, 23 Feb 2022 18:37:40 +0200
Hadoop principle and tuning
Hadoop principle
1. HDFS write process
1. The client asks the NameNode, through the DistributedFileSystem module, to upload a file. The NameNode checks whether the target file already exists, whether the path is correct, and whether the user has permission.
2. The NameNode tells the client whether it can upload, and returns three items at the ...
Added by kee1108 on Wed, 23 Feb 2022 10:12:13 +0200
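For step 1 of the HDFS write process in the entry above, a minimal sketch of the three checks it names: whether the path is valid, whether the target file already exists, and whether the user has permission. The function check_upload, its parameters, and the in-memory sets are hypothetical stand-ins, not the NameNode's real implementation.

```python
import posixpath
from typing import Dict, Set, Tuple


def check_upload(path: str, user: str,
                 existing_files: Set[str],
                 dir_writers: Dict[str, Set[str]]) -> Tuple[bool, str]:
    """Return (allowed, reason) for an upload request to a toy namespace."""
    # Path check: must be absolute and must name a file, not a directory.
    if not path.startswith("/") or path.endswith("/"):
        return False, "invalid path"
    parent = posixpath.dirname(path)
    if parent not in dir_writers:
        return False, "parent directory does not exist"
    # Existence check: refuse to overwrite an existing target.
    if path in existing_files:
        return False, "target file already exists"
    # Permission check: the user must be allowed to write in the parent.
    if user not in dir_writers[parent]:
        return False, "permission denied"
    return True, "ok"


existing = {"/data/a.txt"}
writers = {"/data": {"alice"}}
print(check_upload("/data/b.txt", "alice", existing, writers))
print(check_upload("/data/a.txt", "alice", existing, writers))
```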
Sqoop shallow in and shallow out
Sqoop
A tool for efficiently transferring data between Hadoop and relational databases
Latest stable version: 1.4.7 (Sqoop2 is not recommended for production)
Graduated from the Apache Incubator
In essence, it is just a command-line tool
In production, data import and export are mostly done by assembling Sqoop commands
Underlying working mechanism: ...
Added by ursvmg on Mon, 21 Feb 2022 12:40:21 +0200
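For the Sqoop entry above, a small sketch of what assembling a Sqoop command can look like in practice. The helper build_sqoop_import, the JDBC URL, the table name, and the paths are all hypothetical; the flags used (--connect, --username, --password-file, --table, --target-dir, --num-mappers) are standard options of the Sqoop 1 import tool.

```python
import shlex
from typing import List


def build_sqoop_import(jdbc_url: str, username: str, password_file: str,
                       table: str, target_dir: str,
                       num_mappers: int = 1) -> List[str]:
    """Assemble a `sqoop import` command as an argument list."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,  # avoids a password on the command line
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical connection details, for illustration only.
args = build_sqoop_import(
    "jdbc:mysql://db.example.com:3306/shop", "etl_user",
    "/user/etl/.db_password", "orders", "/warehouse/orders", num_mappers=4,
)
# Quote each argument so the printed command is safe to paste into a shell.
print(" ".join(shlex.quote(a) for a in args))
```

Building the command as a list rather than one concatenated string keeps each value as a separate argument, which avoids quoting mistakes if it is later passed to a process launcher.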
Summary of pitfalls when setting up Hive on Windows
Preface:
Hive is a data warehouse tool based on Hadoop that operates on Hadoop data storage (HDFS, etc.) using HQL, an SQL-like language. Therefore, Hadoop must be set up locally on Windows before Hive is installed. The previous article roughly covered that environment setup and its pitfalls, so here is still only the basic insta ...
Added by Ben Cleary on Mon, 21 Feb 2022 03:58:02 +0200
Big data tool Hive (basic)
1. Definition of Hive
Hive is a data warehouse tool based on Hadoop that maps structured data files to tables and can read, write, and manage those files in an SQL-like way. This Hive SQL is abbreviated as HQL. Hive's execution engine can be MR, Spark, or Tez.
Essence: Hive essentially converts HQL into MapReduce task ...
Added by gwood_25 on Fri, 18 Feb 2022 20:55:12 +0200
Hive tuning example analysis
Tuning the use of DISTRIBUTE BY for grouping in Hive
Grouping by fields in the table
set hive.auto.convert.join=true;
set hive.auto.convert.join.noconditionaltask=true;
set hive.auto.convert.join.noconditionaltask.size=10000000;
set hive.mapjoin.smalltable.filesize=200000000;
set hive.merge.mapfiles = true;
set hive.merge.mapredfiles = false; --MR Small ...
Added by jaydeee on Thu, 17 Feb 2022 20:24:42 +0200
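For the Hive tuning entry above, a short sketch that keeps the listed session settings in one place and renders them as `set` statements to prepend to a query script. The helper render_hive_settings is hypothetical; the property names and values are the ones shown in the excerpt.

```python
from typing import Dict, List


def render_hive_settings(settings: Dict[str, object]) -> List[str]:
    """Render a mapping of Hive properties as `set key=value;` statements."""
    statements = []
    for key, value in settings.items():
        # Hive expects lowercase true/false rather than Python's True/False.
        if isinstance(value, bool):
            value = "true" if value else "false"
        statements.append(f"set {key}={value};")
    return statements


# Values copied from the excerpt above.
tuning = {
    "hive.auto.convert.join": True,
    "hive.auto.convert.join.noconditionaltask": True,
    "hive.auto.convert.join.noconditionaltask.size": 10000000,
    "hive.mapjoin.smalltable.filesize": 200000000,
    "hive.merge.mapfiles": True,
    "hive.merge.mapredfiles": False,
}
statements = render_hive_settings(tuning)
print("\n".join(statements))
```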
Hadoop cluster construction (super detailed)
This article is a little long, so please bear with me!
Required installation packages:
jdk-8u162-linux-x64.tar.gz (extraction code: 6k1i)
hadoop-3.1.3.tar.gz (extraction code: 07p6)
1. Cluster planning
Install VMware and build the cluster from three Ubuntu 18.04 virtual machines. The following is the plan for each virtual mach ...
Added by ronnie88 on Thu, 17 Feb 2022 18:55:29 +0200