DAGScheduler: The Running Process of Spark Jobs
DAGScheduler: stage division, stage creation, and stage submission
In this article, I start from the execution of a Spark job and connect all the steps involved in running it, including the division of the DAG into stages, the creation of task sets, resource allocation, task serialization, task distribution to executors, task execut ...
Added by tester2 on Sat, 01 Jun 2019 22:10:09 +0300
Configuring multiple NameNodes (a federation cluster) in Hadoop 2.7.3
http://blog.csdn.net/wild46cat/article/details/53423472
First of all, configuring multiple NameNodes in a cluster and using a Secondary NameNode in a cluster are two completely different things. I will write up a translation of the official Hadoop documentation later, explaining the ...
Added by Grunge on Tue, 21 May 2019 21:37:27 +0300
Xiaobai's Reasoning About Hive Database Optimization
Xiaobai previously used relational databases such as Oracle and summed up the key to relational database optimization: read the explain plan. Oracle is a mature product, and its explain plans come in many kinds, real and virtual. By examining the different kinds of explain plan data, we can grasp the vast majority of SQL data from inp ...
Added by Mirkules on Sun, 19 May 2019 14:22:20 +0300
Big Data Tutorial (14.2) Website Data Analysis
The previous article introduced the business background of the website clickstream data analysis project; this post continues with the relevant background on website analysis.
I. Overall technical process and architecture
1.1. Data Processing Flow
This project is a pure data analysis project, and its overall process is basically b ...
Added by gonsman on Wed, 15 May 2019 19:12:18 +0300
Hadoop Installation and Configuration in Ubuntu
Tencent Cloud, Ubuntu 16.04.1 LTS 64-bit
Linux operations
Change the root password
sudo passwd root
Log out the current user
logout
Disable the firewall
ufw disable
Remove iptables
apt-get remove iptables
Install vim (for text editing)
apt-get install vim
Change the console font
sudo dpkg-reconfigure console-setu ...
Added by EXiT on Tue, 14 May 2019 16:44:32 +0300
Storage formats for Hive tables; using the ORC format
Hive tables support several file storage formats:
1. TEXTFILE
This is the default format, used when no format is specified at table creation. When data is imported, the data files are copied directly to HDFS as-is. The source files can be viewed directly with hadoop fs -cat
2. SEQUENCEFILE is a binary ...
Added by poppy28 on Sun, 12 May 2019 10:09:32 +0300