Installation and introduction of Hadoop
1 Big data
1.1 The concept of big data
Big data is an IT industry term for a collection of data that cannot be captured, managed, and processed within a reasonable time using conventional software tools.
It is a massive, fast-growing, and diversified information asset that needs new processing mode t ...
Added by blackbeard on Thu, 16 Jan 2020 17:20:28 +0200
Yarn error: Could not create the Java Virtual Machine
Error message when running yarn:
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
1. Check the yarn version (successful)
D:\me\angular\ng-app>yarn version
Hadoop 2.8.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b3fe56402d908019d99af1f1f4fc65c ...
Added by tibo on Tue, 17 Dec 2019 20:44:40 +0200
Small problems with Scala multithreading in a Spark process
This time we modified the ThriftServer source code to add some services, and ran into the following problem along the way. We wanted tasks that are submitted asynchronously to run on multiple threads. At first we used Scala's Actor and passed it the sqlcontext and the sql. We found that every sparkSessionId cha ...
Added by 88fingers on Wed, 11 Dec 2019 17:15:14 +0200
MapReduce practice: a handwritten WordCount case
Requirement: count the total number of occurrences of each word in a given set of text files.
The figure below shows the analysis of WordCount in MapReduce:
The map stage reads the data from the file; the line number is the key and the content of each line is the value. Each key/value pair is ou ...
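The data flow described above can be illustrated without a cluster. The following is a minimal sketch in plain Java that only simulates the map and reduce stages. It does not use the Hadoop API, and the class and method names (WordCountSketch, map, reduce, wordCount) are hypothetical names chosen for illustration, not the article's code.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of the WordCount data flow (assumption: no Hadoop classes involved).
class WordCountSketch {

    // Map stage: for one input line, emit a (word, 1) pair per word.
    // The key (line position) is accepted but not needed for counting.
    static List<Map.Entry<String, Integer>> map(long lineKey, String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return out;
    }

    // Shuffle and reduce stage: group the pairs by word and sum their counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            totals.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return totals;
    }

    // Driver: run the map stage over every line, then reduce all emitted pairs.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            pairs.addAll(map(i, lines.get(i)));
        }
        return reduce(pairs);
    }

    public static void main(String[] args) {
        // Prints {hadoop=1, hello=2, world=1}
        System.out.println(wordCount(Arrays.asList("hello hadoop", "hello world")));
    }
}
```

In a real job the framework performs the grouping between the two stages; here the reduce method does both steps in one pass to keep the sketch short.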
Added by jber on Tue, 10 Dec 2019 22:02:08 +0200
Hive lateral view and explode
explode (official website link)
explode is a UDTF (table-generating function) that converts a single input row into multiple output rows. It is generally used in combination with lateral view, mainly in two ways:
Input type | Usage | Description
T | explode(ARRAY<T> a) | Decompose the array into multiple rows, return a single column a ...
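To make the row-expansion behaviour concrete, here is a small plain-Java sketch of what explode on an array does when combined with a lateral view: each array element becomes its own output row, joined back to the source row. This only simulates the semantics and is not Hive code; the names ExplodeSketch and lateralViewExplode, and the sample values, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simulates the effect of LATERAL VIEW explode(array_column) on a single source row.
class ExplodeSketch {

    // Returns one {id, element} row per array element.
    // An empty array produces no rows, which matches a lateral view without OUTER.
    static List<String[]> lateralViewExplode(String id, List<String> items) {
        List<String[]> rows = new ArrayList<>();
        for (String item : items) {
            rows.add(new String[] {id, item});
        }
        return rows;
    }

    public static void main(String[] args) {
        // One input row with a three-element array becomes three output rows.
        for (String[] row : lateralViewExplode("row1", Arrays.asList("a", "b", "c"))) {
            System.out.println(row[0] + "\t" + row[1]);
        }
    }
}
```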
Added by leony on Sun, 08 Dec 2019 13:56:09 +0200
Submit Hadoop jobs using the old Java API
Copyright notice: This is the blogger's original article. It may not be reproduced without the blogger's permission. https://blog.csdn.net/qq1010885678/article/details/43735491
We again use the earlier word count example.
Custom Mapper class
import java.io.IOException;
import org.apache.h ...
Added by Loryman on Sun, 08 Dec 2019 09:36:31 +0200
Big data case: map-side table join in MapReduce (DistributedCache)
Code download address: https://github.com/tazhigang/big-data-github.git
I. Preliminary preparation
This case is an optimization of case 6, so refer to case 6 for the requirements and for the input and output data. Before the first run, copy the pd.txt file to the root directory of the J: drive on the local computer so that the job can reference it.
...
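The general idea of a map-side join is to load the small table into memory once and then look up each record of the large table against it, so that no reduce stage is needed. The sketch below shows that idea in plain Java without the Hadoop API. The record layouts (tab-separated productId and productName for the small table, tab-separated orderId, productId and amount for the large one) and all names are assumptions for illustration; the actual formats are defined in case 6, which is not reproduced here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a map-side join (assumption: hypothetical record layouts).
class MapSideJoinSketch {

    // Build the in-memory lookup from the small table (the role pd.txt plays).
    // Assumed line format: productId<TAB>productName
    static Map<String, String> loadSmallTable(List<String> pdLines) {
        Map<String, String> products = new HashMap<>();
        for (String line : pdLines) {
            String[] fields = line.split("\t");
            if (fields.length >= 2) {
                products.put(fields[0], fields[1]);
            }
        }
        return products;
    }

    // Join one record of the large table against the lookup.
    // Assumed line format: orderId<TAB>productId<TAB>amount
    static String joinRecord(String orderLine, Map<String, String> products) {
        String[] fields = orderLine.split("\t");
        String name = products.getOrDefault(fields[1], "UNKNOWN");
        return fields[0] + "\t" + name + "\t" + fields[2];
    }

    public static void main(String[] args) {
        Map<String, String> products = loadSmallTable(Arrays.asList("01\tphone", "02\tlaptop"));
        System.out.println(joinRecord("1001\t01\t3", products));
    }
}
```

In a Hadoop job the lookup would typically be built once per map task from the cached file, and joinRecord would correspond to the body of the map method.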
Added by powerpants on Sat, 02 Nov 2019 05:21:35 +0200
X. NameNode working mechanism of HDFS
I. fsimage and edits files
1. Basic concepts
txid: the NameNode assigns a unique id, called the txid, to each operation event (add, delete, or change). Generally, txids start from 0 and increase by 1 with each new operation.
fsimage: a mirror file of the metadata of the NameNode i ...
Added by p3rk5 on Thu, 17 Oct 2019 00:57:57 +0300
Hive UDF development under IDEA
Code environment: Windows 10 + Idea19-01 + spring-boot 2.1.6 + JDK1.8
Jar package running environment: CentOS virtual machine + Hadoop 3.1.1 + hive3.1.1 + JDK1.8
Create a new spring-boot project in IDEA with only the basics. This project contains only the web package. The pom.xml is as follows:
The ...
Added by tonyw on Wed, 09 Oct 2019 00:54:29 +0300
Installing a Hadoop pseudo-distributed learning environment on CentOS 7
A Hadoop pseudo-distributed environment is built on a virtual machine to simulate a small cluster for learning.
Install a CentOS 7 system in the virtual machine:
IP | Host name
192.168.158.30 | hadoop.master
1. Install the Java environment. I installed JDK 1.8.
Installation method: https://blo ...
Added by weaselandalf on Sun, 06 Oct 2019 19:14:29 +0300