Hadoop environment configuration (Linux virtual machine)

This semester I am taking a course on big data management and analysis, which mainly uses the Hadoop framework for data analysis and application development. First, I will configure the environment. Be careful: it is better to put the JDK and Hadoop under /usr/local. When adding environment ...
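A minimal sketch of the environment-variable setup the article describes, assuming the JDK and Hadoop were unpacked under /usr/local (the directory names and versions here are assumptions; adjust them to match your actual downloads):

```shell
# Append to ~/.bashrc or /etc/profile (paths are assumptions).
export JAVA_HOME=/usr/local/jdk1.8.0_162
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

# Then reload the profile and verify both tools are on the PATH:
#   source ~/.bashrc
#   java -version
#   hadoop version
```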

Added by stargate03 on Mon, 28 Feb 2022 13:06:21 +0200

[CentOS] install HBase components

Prerequisite environment: a fully distributed Hadoop cluster. HBase installation package: https://archive.apache.org/dist/hbase/ 1. Unzip the HBase installation package: upload the local installation package, then unzip and rename it. 2. System environment variable configuration: configure the environment variables a ...
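A hedged sketch of steps 1 and 2 above (the archive name, version, and target paths are assumptions, not taken from the article):

```shell
# Step 1: unzip the uploaded package and rename the directory
# (version 2.4.9 is only an example).
tar -zxvf hbase-2.4.9-bin.tar.gz -C /usr/local
mv /usr/local/hbase-2.4.9 /usr/local/hbase

# Step 2: append to /etc/profile, then run `source /etc/profile`.
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
```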

Added by ManOnScooter on Fri, 25 Feb 2022 12:06:45 +0200

Hadoop cluster entry configuration

Hadoop overview, Hadoop composition, HDFS architecture overview. The Hadoop Distributed File System (HDFS for short) is a distributed file system. NameNode (nn): stores the metadata of files, such as the file name, directory structure, file attributes (creation time, number of replicas, permissions), and the block list of each file. DataNo ...

Added by Irap on Thu, 24 Feb 2022 08:51:49 +0200

Hadoop in simple terms -- getting started

Hadoop learning. 1. Hadoop overview: Hadoop is infrastructure for distributed systems; it mainly solves the problems of massive data storage and distributed computing. 1.1 The three major Hadoop distributions: the original Apache version was released in 2006. Cloudera integrates many big data frameworks internally, and the corresponding product is the CDH releas ...

Added by birwin on Wed, 23 Feb 2022 18:37:40 +0200

Hadoop principle and tuning

Hadoop principles. 1. The HDFS write process: 1. The client requests to upload a file to the NameNode through the DistributedFileSystem module; the NameNode checks whether the target file already exists, whether the path is valid, and whether the user has permission. 2. The NameNode returns to the client whether the upload is allowed, and returns three items at the ...
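The write flow described above is what happens under the hood when a client uploads a file; from the command line it looks like this (file and directory names are invented for illustration):

```shell
# Ask the NameNode to create a directory (pure metadata operation).
hdfs dfs -mkdir -p /user/demo

# Upload a file: the client asks the NameNode for permission and
# block placements, then streams block data to the DataNodes.
hdfs dfs -put local.txt /user/demo/

# List the file: the listing comes from NameNode metadata.
hdfs dfs -ls /user/demo
```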

Added by kee1108 on Wed, 23 Feb 2022 10:12:13 +0200

A shallow dive into Sqoop

Sqoop is a tool for efficient data transfer between Hadoop and relational databases. Latest stable version: 1.4.7 (Sqoop2 is not recommended for production). It has graduated as an Apache top-level project. In essence it is just a command-line tool; in production, data import and export are mostly done by assembling Sqoop commands. Underlying working mechanism: ...
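A hypothetical example of the "assembled command" style mentioned above; the host, database, table, and credential paths are all placeholders, not details from the article:

```shell
# Import a MySQL table into HDFS with four parallel map tasks.
sqoop import \
  --connect jdbc:mysql://dbhost:3306/sales \
  --username etl_user \
  --password-file /user/etl/.dbpass \
  --table orders \
  --target-dir /user/etl/orders \
  --num-mappers 4
```

Under the hood, Sqoop generates a map-only MapReduce job that splits the table by its primary key across the mappers, which is the "bottom working mechanism" the article goes on to describe.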

Added by ursvmg on Mon, 21 Feb 2022 12:40:21 +0200

Summary of pitfalls when building Hive under Windows

Preface: Hive is a data warehouse tool built on Hadoop, which operates the Hadoop data warehouse (HDFS, etc.) with HQL, an SQL-like statement language. Therefore, Hadoop must be set up before installing Hive locally on Windows. The previous article roughly covered environment setup and a summary of common pitfalls, so here is only the basic insta ...

Added by Ben Cleary on Mon, 21 Feb 2022 03:58:02 +0200

Big data tool Hive (basic)

1. Definition of Hive: Hive is a data warehouse tool based on Hadoop that can map structured data files to tables and can read, write, and manage data files in an SQL-like way. This Hive SQL is abbreviated as HQL. Hive's execution engine can be MR, Spark, or Tez. Essence: Hive essentially converts HQL into MapReduce task ...
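A minimal sketch of "mapping a structured data file to a table," as described above; the table name, schema, and file path are invented for illustration:

```shell
# Define a table over comma-separated data, load a file into it,
# and run an SQL-like aggregation that Hive compiles to MapReduce.
hive -e "
CREATE TABLE IF NOT EXISTS logs (ip STRING, ts STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
LOAD DATA INPATH '/user/demo/logs.csv' INTO TABLE logs;
SELECT url, COUNT(*) FROM logs GROUP BY url;
"
```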

Added by gwood_25 on Fri, 18 Feb 2022 20:55:12 +0200

hive tuning example analysis

Tuning Hive distribute by / group by applications: group by fields in the table. set hive.auto.convert.join=true; set hive.auto.convert.join.noconditionaltask=true; set hive.auto.convert.join.noconditionaltask.size=10000000; set hive.mapjoin.smalltable.filesize=200000000; set hive.merge.mapfiles=true; set hive.merge.mapredfiles=false; -- MR small ...
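The settings quoted above, collected into one session sketch: the first group enables automatic map-side joins for tables below the given size thresholds, and the merge settings control combining small output files (the query and table names are hypothetical; the parameter values are as quoted in the excerpt):

```shell
hive -e "
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask=true;
SET hive.auto.convert.join.noconditionaltask.size=10000000;   -- ~10 MB
SET hive.mapjoin.smalltable.filesize=200000000;               -- ~200 MB
SET hive.merge.mapfiles=true;
SET hive.merge.mapredfiles=false;
-- hypothetical query that benefits from a map join on a small dim table:
SELECT f.id, d.name FROM fact f JOIN dim d ON f.k = d.k;
"
```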

Added by jaydeee on Thu, 17 Feb 2022 20:24:42 +0200

Hadoop cluster construction (super detailed)

This article is a little long; I hope you will bear with me! Required installation packages: jdk-8u162-linux-x64.tar.gz (extraction code: 6k1i), hadoop-3.1.3.tar.gz (extraction code: 07p6). 1. Cluster planning: install VMware and build the cluster with three Ubuntu 18.04 virtual machines. The following is the plan for each virtual mach ...

Added by ronnie88 on Thu, 17 Feb 2022 18:55:29 +0200