Spark standalone (independent) cluster, and how Spark runs on YARN
Cluster mode
This is just a record of how Spark Standalone (independent cluster) mode is set up.
Standalone mode is rarely used in companies, because most companies already run YARN and have no need to maintain two resource-management frameworks.
So there is n ...
Added by PHPSpirit on Thu, 10 Mar 2022 13:35:11 +0200
Spark 13: Spark program performance optimization 01: high-performance serialization libraries, persistence and checkpointing, JVM garbage-collection tuning, increasing parallelism, and data locality
1. Performance optimization analysis
The execution of a computing task depends mainly on CPU, memory, and bandwidth.
Spark is a memory-based computing engine, so memory usually has the biggest impact on it. When our tasks hit performance bottlenecks, most of them turn out to be memory problems, although CPU and bandwidth may also affect th ...
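As a sketch of the serialization tuning the title mentions: Spark can be switched from the default Java serialization to the faster, more compact Kryo library purely through configuration. The keys below are standard Spark configuration properties; the buffer value is illustrative, not a recommendation from the article.

```properties
# spark-defaults.conf: use Kryo instead of Java serialization
spark.serializer org.apache.spark.serializer.KryoSerializer
# maximum per-object serialization buffer (illustrative value)
spark.kryoserializer.buffer.max 64m
```

Registering frequently serialized classes with Kryo (via `spark.kryo.classesToRegister`) can shrink the output further, since class names then no longer need to be written with every record.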
Added by matthewst on Wed, 09 Mar 2022 04:30:50 +0200
Illustrating big data | Spark machine learning: workflow and feature engineering
Author: Han Xinzi@ShowMeAI. Tutorial address: http://www.showmeai.tech/tutorials/84. Article address: http://www.showmeai.tech/article-detail/180. Notice: All rights reserved. Please contact the platform and the author for reprinting and indicate the source.
1.Spark machine learning workflow
1) Spark MLlib and ML
Spark also has MLlib/ML for big d ...
Added by nunomira on Tue, 08 Mar 2022 18:14:48 +0200
Illustrating big data | comprehensive case: mining music album data with Spark
Author: Han Xinzi@ShowMeAI. Tutorial address: http://www.showmeai.tech/tutorials/84. Article address: http://www.showmeai.tech/article-detail/178. Notice: All rights reserved. Please contact the platform and the author for reprinting and indicate the source. Introduction: This is one of the most widely used cases of video and audio data processing on HDFS, ...
Added by Spoiler on Tue, 08 Mar 2022 17:26:31 +0200
Illustrating big data | COVID-19 case: analyzing epidemic data with Spark
Author: Han Xinzi@ShowMeAI. Tutorial address: http://www.showmeai.tech/tutorials/84. Article address: http://www.showmeai.tech/article-detail/176. Notice: All rights reserved. Please contact the platform and the author for reprinting and indicate the source. Introduction: Since 2020, COVID-19 has changed the world and affected everyone's life; this case comb ...
Added by subwayman on Tue, 08 Mar 2022 16:24:06 +0200
Scala basic syntax
Since learning Spark requires Scala, here is some basic Scala syntax.
Note: Scala does not require a semicolon at the end of a line.
1 Variable types
val is immutable: it must be initialized at the time of declaration and cannot be reassigned after initialization. var is mutable: it also needs to be initialized when declared. After ...
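The val/var distinction above can be sketched in a few lines (names here are just illustrative):

```scala
// val: immutable reference; must be initialized at declaration
val threshold: Int = 10
// threshold = 20        // would not compile: reassignment to val

// var: mutable reference; initialized at declaration, reassignable later
var counter: Int = 0
counter = counter + 1

// type annotations are optional; the compiler infers them
val engine = "Spark"     // inferred as String
```

Idiomatic Scala prefers val wherever possible, falling back to var only when mutation is genuinely needed.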
Added by fxb9500 on Sun, 06 Mar 2022 10:19:54 +0200
WholeStageCodegenExec in Spark (whole-stage code generation)
Background
In the previous article, Analysis and solution of the DataSourceScanExec NullPointerException caused by Spark DPP, we skipped over the step where dynamic code generation fails. This time let's analyze it; the SQL is still the one mentioned in that article.
Analysis
After running the SQL, we can see the following physical plan: We can see ...
Added by sgoldenb on Sat, 05 Mar 2022 12:43:09 +0200
Big data: Douban TV-series crawler with anti-scraping proxy IPs, Spark cleaning, and Flask-based visualization
Full steps of Douban movie big data project
1. Douban crawler:
When I started writing the Douban TV-series crawler I thought it would be simple, but in practice my IP kept getting banned, which troubled me for a long time; now I have finally finished it.
Without further ado, here is the code:
The run function is ...
Added by gregor171 on Fri, 04 Mar 2022 14:51:48 +0200
Passenger express logistics big data project: initializing the Spark streaming program
Contents
Initialize Spark streaming program
1. SparkSQL parameter tuning settings
1. Set the session time zone
2. Set the maximum number of bytes a single partition can hold when reading a file
3. Set the threshold for merging small files
4. Set the number of partitions to use when shuffling data for a join or aggregation
5. Set the maximum ...
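The tuning items listed above map onto standard Spark SQL configuration keys. A hedged sketch follows: the key names are real Spark SQL properties, the values are illustrative defaults rather than the article's recommendations, and mapping item 3 to `spark.sql.files.openCostInBytes` is my assumption.

```properties
# 1. session time zone
spark.sql.session.timeZone UTC
# 2. max bytes per partition when reading files (128 MB)
spark.sql.files.maxPartitionBytes 134217728
# 3. estimated cost of opening a file, used when packing small files together (4 MB)
spark.sql.files.openCostInBytes 4194304
# 4. number of shuffle partitions for joins and aggregations
spark.sql.shuffle.partitions 200
```

The same keys can also be set at runtime via `spark.conf.set(...)` on a SparkSession.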
Added by ratcateme on Wed, 02 Mar 2022 22:37:24 +0200
Spark low-level API: RDD learning notes
What are RDDs?
The full name is Resilient Distributed Dataset. Spark: The Definitive Guide describes it as follows: an RDD represents an immutable, partitioned collection of records that can be operated on in parallel. In my personal understanding, an RDD is a kind of distributed object collection ...
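A minimal sketch of that definition, assuming a local SparkContext (this needs Spark on the classpath, so it is not runnable standalone; names are illustrative):

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("rdd-sketch").setMaster("local[2]")
val sc = new SparkContext(conf)

// an RDD: immutable, partitioned, processed in parallel
val rdd = sc.parallelize(1 to 10, numSlices = 4) // 4 partitions
val doubled = rdd.map(_ * 2)                     // transformation: returns a new RDD
val total = doubled.reduce(_ + _)                // action: triggers the computation

sc.stop()
```

Note the split between lazy transformations (`map`), which only describe a new RDD, and actions (`reduce`), which actually run the job across partitions.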
Added by greenhorn666 on Tue, 22 Feb 2022 14:25:12 +0200