Installation and introduction of Hadoop

1 Big data 1.1 Big data concept: Big data is an IT industry term for a collection of data that cannot be captured, managed, and processed within a certain period of time using conventional software tools. It is a massive, fast-growing, and diversified information asset that needs new processing mode t ...

Added by blackbeard on Thu, 16 Jan 2020 17:20:28 +0200

Yarn error: Could not create the Java Virtual Machine

Running yarn produces this error message: Error: Could not create the Java Virtual Machine. Error: A fatal exception has occurred. Program will exit. 1. Check the yarn version (succeeds): D:\me\angular\ng-app>yarn version Hadoop 2.8.3 Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r b3fe56402d908019d99af1f1f4fc65c ...

Added by tibo on Tue, 17 Dec 2019 20:44:40 +0200

A small problem with Scala multithreading in a Spark process

This time, we modified the ThriftServer source code and added some services. Along the way, we ran into a problem: when submitting tasks asynchronously, we wanted to make them multithreaded. At first, we used Scala's Actor and passed it the sqlcontext and sql. We found that every sparkSessionId cha ...

Added by 88fingers on Wed, 11 Dec 2019 17:15:14 +0200

MapReduce practice: a handwritten WordCount case

Requirement: count the total number of occurrences of each word in a given set of text files. The figure below shows how MapReduce computes WordCount: the map stage reads data from the file, with the position of each line as the key (the byte offset of the line under Hadoop's default TextInputFormat, often loosely called the line number) and the content of the line as the value. Each key/value pair is ou ...
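The flow described in this excerpt can be sketched in plain Python. This is only an illustration of the usual WordCount pattern, not the post's Hadoop code: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and the key is simply the line's index in an in-memory list.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: input is (line position, line text); output is (word, 1) pairs."""
    for _position, line in enumerate(lines):  # key = position, value = line
        for word in line.split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the list of counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["hello hadoop", "hello mapreduce"]))
```

In a real job the framework performs the shuffle between the Mapper and the Reducer; it is written out here only to make the grouping step visible.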

Added by jber on Tue, 10 Dec 2019 22:02:08 +0200

Hive lateral view and explode

explode (official website link): explode is a UDTF (user-defined table-generating function) that converts a single input row into multiple output rows. It is generally used together with lateral view, mainly in two ways. Input type / Usage / Description: T / explode(ARRAY<T> a) / Decomposes the array into multiple rows and returns a single column a ...
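To make the row-multiplying behavior concrete, here is a small plain-Python imitation of explode over an array column combined with a lateral view. It is a sketch of the semantics only, not Hive code: `explode` and `lateral_view_explode` are hypothetical helper names, and rows are modeled as dictionaries.

```python
def explode(array):
    """Yield one output value per array element (one input row, many output rows)."""
    for element in array:
        yield element


def lateral_view_explode(rows, array_column, alias):
    """Join every source row with the rows generated from its array column.

    As with a lateral view without OUTER, a source row whose array is
    empty produces no output rows.
    """
    for row in rows:
        for element in explode(row[array_column]):
            out = dict(row)       # keep the source row's columns
            out[alias] = element  # add the generated column under its alias
            yield out


if __name__ == "__main__":
    table = [
        {"id": 1, "tags": ["a", "b"]},
        {"id": 2, "tags": []},
    ]
    for result_row in lateral_view_explode(table, "tags", "tag"):
        print(result_row)
```

Row 1 appears twice in the output, once per tag, while row 2 disappears because its array is empty.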

Added by leony on Sun, 08 Dec 2019 13:56:09 +0200

Submit Hadoop jobs using the old Java API

Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq1010885678/article/details/43735491 We again use the earlier word count example. Custom Mapper class: import java.io.IOException; import org.apache.h ...

Added by Loryman on Sun, 08 Dec 2019 09:36:31 +0200

Big data case: MapReduce map-side table merge (DistributedCache)

Code download address: https://github.com/tazhigang/big-data-github.git I. Preliminary preparation: Since this case is an optimization of case 6, please refer to case 6 for the requirements and the input and output data. First, copy the pd.txt file to the root directory of the J drive on the local computer for reference. ...

Added by powerpants on Sat, 02 Nov 2019 05:21:35 +0200

X. NameNode working mechanism of HDFS

I. fsimage and edits files 1. Basic concepts. txid: the NameNode assigns a unique id, called the txid, to each operation event (add, delete, or modify). Generally, txid starts at 0 and increases by 1 with each additional operation. fsimage: a mirror file of the NameNode's metadata i ...
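The txid numbering can be illustrated with a toy edit log in Python. This is a simplified model under stated assumptions, not NameNode code: the `EditLog` class, its `record` method, and the `replay` helper are hypothetical, and the namespace is reduced to a set of paths.

```python
class EditLog:
    """Toy edit log: every recorded operation gets the next txid, starting at 0."""

    def __init__(self):
        self.next_txid = 0
        self.entries = []  # list of (txid, op, path)

    def record(self, op, path):
        txid = self.next_txid
        self.entries.append((txid, op, path))
        self.next_txid += 1  # each additional operation increases txid by 1
        return txid


def replay(entries, namespace=None):
    """Apply logged operations in txid order to a starting set of paths."""
    paths = set(namespace or ())
    for _txid, op, path in sorted(entries):
        if op == "add":
            paths.add(path)
        elif op == "delete":
            paths.discard(path)
    return paths


if __name__ == "__main__":
    log = EditLog()
    log.record("add", "/data/a.txt")
    log.record("add", "/data/b.txt")
    log.record("delete", "/data/a.txt")
    print(log.entries)
    print(replay(log.entries))
```

Replaying the entries over a saved snapshot of the metadata is the general idea behind combining an image file with the operations logged after it.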

Added by p3rk5 on Thu, 17 Oct 2019 00:57:57 +0300

Hive UDF operation in IDEA

Development environment: Windows 10 + Idea19-01 + spring-boot 2.1.6 + JDK1.8. Jar runtime environment: CentOS virtual machine + Hadoop 3.1.1 + hive3.1.1 + JDK1.8. Create a new spring-boot project in IDEA that includes only the basics. This project contains only one web package, as follows: pom.xml The ...

Added by tonyw on Wed, 09 Oct 2019 00:54:29 +0300

Installing a Hadoop pseudo-distributed learning environment on CentOS 7

A Hadoop pseudo-distributed environment is built on a virtual machine to simulate a small-scale cluster for learning. Install a CentOS 7 system in the virtual machine. IP: 192.168.158.30, host name: hadoop.master. 1. Installing the Java environment: I installed JDK 1.8. Installation method: https://blo ...

Added by weaselandalf on Sun, 06 Oct 2019 19:14:29 +0300