DAGScheduler: The Running Process of Spark Jobs

DAGScheduler: stage partitioning, stage creation, and stage submission. In this article, I start from the execution of a Spark job and connect all the steps involved in running it, including the division of the DAG, the creation of task sets, resource allocation, task serialization, task distribution to executors, task execut ...

Added by tester2 on Sat, 01 Jun 2019 22:10:09 +0300

Configuring Multiple NameNodes (Federation) in a Hadoop 2.7.3 Cluster

http://blog.csdn.net/wild46cat/article/details/53423472 First of all, configuring multiple NameNodes in a cluster and using a Secondary NameNode in a cluster are two completely different things. I will write a translation of the official Hadoop documentation later, explaining the ...

Added by Grunge on Tue, 21 May 2019 21:37:27 +0300

Xiaobai's Reasoning About Hive Database Optimization

Xiaobai previously worked with relational databases such as Oracle and summed up the key to relational database optimization: read the execution plan. Oracle is a mature product. Its execution plans fall into several categories, both actual and estimated. By examining the different kinds of execution plan data, we can follow the vast majority of SQL data from inp ...

Added by Mirkules on Sun, 19 May 2019 14:22:20 +0300

Big Data Tutorial (14.2) Website Data Analysis

The previous article introduced the business background of the website clickstream data analysis project; this post continues with more on website analysis. I. Overall technical process and architecture. 1.1. Data processing flow. This project is a pure data analysis project, and its overall process is basically b ...

Added by gonsman on Wed, 15 May 2019 19:12:18 +0300

Hadoop Installation and Configuration on Ubuntu

Environment: Tencent Cloud, Ubuntu 16.04.1 LTS 64-bit. Linux operations: change the root password: sudo passwd root; log out the current user: logout; disable the firewall: ufw disable; uninstall iptables: apt-get remove iptables; install vim (for text editing): apt-get install vim; change the console text settings: sudo dpkg-reconfigure console-setu ...

Added by EXiT on Tue, 14 May 2019 16:44:32 +0300

Storage Formats of Hive Tables; Using the ORC Format

Hive tables support several source file storage formats: 1. TEXTFILE: the default format, used when no format is specified at table creation. When data is imported, the data files are copied directly to HDFS as they are. The source files can be viewed directly with hadoop fs -cat. 2. SEQUENCEFILE: a binary ...

Added by poppy28 on Sun, 12 May 2019 10:09:32 +0300