Spark standalone cluster (overview only) and how Spark runs on YARN

Spark standalone cluster (overview only), and how Spark runs on YARN in cluster mode. This is just a record of how Spark Standalone (independent cluster) mode is set up. Standalone mode is generally not used in companies, because a company usually already runs YARN and has no need to maintain two resource management frameworks. So there is n ...

Added by PHPSpirit on Thu, 10 Mar 2022 13:35:11 +0200

Spark13: Spark program performance optimization 01: high-performance serialization libraries, persistence or checkpointing, JVM garbage collection tuning, increasing parallelism, and data locality

1. Performance optimization analysis. The execution of a computing task mainly depends on CPU, memory, and bandwidth. Spark is a memory-based computing engine, so for Spark the biggest factor is usually memory. When our tasks hit performance bottlenecks, most of the time they are memory problems. Of course, CPU and bandwidth may also affect th ...

Added by matthewst on Wed, 09 Mar 2022 04:30:50 +0200
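The serialization point in the excerpt above is usually applied through Spark configuration. A minimal configuration sketch (the registered class is a hypothetical placeholder, and these keys come from Spark's standard tuning options, not from the linked article itself):

```scala
import org.apache.spark.SparkConf

// Configuration sketch: switch from Java serialization to the faster,
// more compact Kryo serializer. `MyRecord` below is a hypothetical
// application class, shown only to illustrate class registration.
val conf = new SparkConf()
  .setAppName("perf-tuning-demo")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
// .registerKryoClasses(Array(classOf[MyRecord]))
```

Registering application classes with Kryo avoids storing full class names with each serialized record, which matters most for shuffled or cached data.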

Illustrated big data | Spark machine learning: workflows and feature engineering

Author: Han Xinzi@ShowMeAI. Tutorial address: http://www.showmeai.tech/tutorials/84 Article address: http://www.showmeai.tech/article-detail/180 Notice: All rights reserved. Please contact the platform and the author before reprinting, and indicate the source. 1. Spark machine learning workflow 1) Spark MLlib and ML: Spark also has MLlib/ML for big d ...

Added by nunomira on Tue, 08 Mar 2022 18:14:48 +0200

Illustrated big data | comprehensive case: mining music album data with Spark

Author: Han Xinzi@ShowMeAI. Tutorial address: http://www.showmeai.tech/tutorials/84 Article address: http://www.showmeai.tech/article-detail/178 Notice: All rights reserved. Please contact the platform and the author before reprinting, and indicate the source. Introduction: This is one of the most widely used cases of audio and video data processing on HDFS, ...

Added by Spoiler on Tue, 08 Mar 2022 17:26:31 +0200

Illustrated big data | COVID-19 case: analysis of epidemic data using Spark

Author: Han Xinzi@ShowMeAI. Tutorial address: http://www.showmeai.tech/tutorials/84 Article address: http://www.showmeai.tech/article-detail/176 Notice: All rights reserved. Please contact the platform and the author before reprinting, and indicate the source. Introduction: Since 2020, COVID-19 has changed the world and affected everyone's life. This case comb ...

Added by subwayman on Tue, 08 Mar 2022 16:24:06 +0200

Scala basic syntax

Since learning Spark requires Scala, here is some basic Scala syntax. Note: Scala does not need a semicolon at the end of a line. 1. Variable types: val is immutable. It must be initialized at the time of declaration and cannot be reassigned after initialization. var is mutable. It also needs to be initialized when declared. After ...

Added by fxb9500 on Sun, 06 Mar 2022 10:19:54 +0200
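The val/var distinction described in that excerpt can be shown in a few lines of plain Scala (a minimal sketch; the object and member names are illustrative):

```scala
object ScalaBasics {
  // val: immutable reference, must be initialized at declaration
  // and cannot be reassigned afterwards
  val greeting: String = "hello" // no semicolon needed at end of line

  // var: mutable, initialized at declaration but reassignable later
  def sumTo(n: Int): Int = {
    var total = 0
    for (i <- 1 to n) total += i // reassignment is legal for a var
    total
  }
}
```

Trying to reassign `greeting` would be a compile-time error ("reassignment to val"), which is why val is preferred wherever the value never changes.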

WholeStageCodegenExec in Spark (whole-stage code generation)

Background: In the previous article, Analysis and solution of DataSourceScanExec NullPointerException caused by Spark DPP, we skipped over the step where dynamic code generation fails. This time, let's analyze it; the SQL is still the one mentioned in that article. Analysis: After running the SQL, we can see the following physical plan: We can see ...

Added by sgoldenb on Sat, 05 Mar 2022 12:43:09 +0200

Big data: a Douban TV series crawler with anti-scraping proxy IPs, Spark cleaning, and visualization with the Flask framework

Full steps of the Douban movie big data project. 1. Douban crawler: When I started writing the Douban TV series crawler, I thought it would be simple, but in practice my IP kept getting banned, which frustrated me for a long time. Now I have finally gotten it working. Without further ado, here is the code: The run function is ...

Added by gregor171 on Fri, 04 Mar 2022 14:51:48 +0200

Passenger express logistics big data project: initializing the Spark streaming program

Contents: Initializing the Spark streaming program. 1. SparkSQL parameter tuning settings: 1. Set the session time zone 2. Set the maximum number of bytes a single partition can hold when reading a file 3. Set the threshold for merging small files 4. Set the number of partitions to use when shuffling data for join or aggregate 5. Set the maximum ...

Added by ratcateme on Wed, 02 Mar 2022 22:37:24 +0200
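The tuning knobs listed in that table of contents correspond to standard SparkSQL configuration keys. A configuration sketch, assuming the mapping below; the values are illustrative, not recommendations:

```scala
import org.apache.spark.sql.SparkSession

// Configuration sketch for the parameters named in the contents above.
// Values are placeholders for illustration only.
val spark = SparkSession.builder()
  .appName("logistics-streaming")
  .config("spark.sql.session.timeZone", "Asia/Shanghai")    // 1. session time zone
  .config("spark.sql.files.maxPartitionBytes", "134217728") // 2. max bytes per partition when reading files (128 MB)
  .config("spark.sql.files.openCostInBytes", "4194304")     // 3. small-file merge threshold (4 MB)
  .config("spark.sql.shuffle.partitions", "200")            // 4. partitions for join/aggregate shuffles
  .getOrCreate()
```

Lowering `spark.sql.shuffle.partitions` (default 200) helps small datasets avoid scheduling overhead, while large shuffles may need more partitions.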

Spark low-level API: RDD learning notes

What are RDDs? The full name is Resilient Distributed Datasets. Spark: The Definitive Guide describes them as follows: "RDD represents an immutable, partitioned collection of records that can be operated on in parallel." In my personal understanding, an RDD is a kind of distributed object collection ...

Added by greenhorn666 on Tue, 22 Feb 2022 14:25:12 +0200
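The "immutable, partitioned collection operated on in parallel" idea from that definition can be sketched in a few lines, assuming a local Spark runtime is available on the classpath (this is an illustrative sketch, not code from the linked notes):

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: requires Spark on the classpath to actually run.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("rdd-demo")
  .getOrCreate()
val sc = spark.sparkContext

val rdd = sc.parallelize(1 to 10, numSlices = 4) // a partitioned collection
val doubled = rdd.map(_ * 2)                     // transformations return a NEW RDD (immutability)
println(doubled.reduce(_ + _))                   // actions execute in parallel across partitions

spark.stop()
```

Note that `map` does not modify `rdd`; it produces a new lineage step, which is what makes RDDs recomputable after partition loss.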