Spark standalone cluster (awareness only), and how Spark runs on YARN
This is just a record of how to set up a cluster in Spark Standalone mode, i.e. Spark's own independent cluster mode.
Standalone mode is generally not used in companies, because companies usually already run YARN and do not need to run two resource management frameworks.
So there is n ...
Added by PHPSpirit on Thu, 10 Mar 2022 13:35:11 +0200
1. Performance optimization analysis
The execution of a computing task mainly depends on CPU, memory and bandwidth.
Spark is a memory-based computing engine, so memory is usually the factor with the biggest impact on its performance. When our jobs hit performance bottlenecks, most of them turn out to be memory problems. Of course, CPU and bandwidth may also affect th ...
Added by matthewst on Wed, 09 Mar 2022 04:30:50 +0200
Author: Han Xinzi@ShowMeAI Tutorial address: http://www.showmeai.tech/tutorials/84 Article address: http://www.showmeai.tech/article-detail/180 Notice: All Rights Reserved. Please contact the platform and the author for reprint and indicate the source
1. Spark machine learning workflow
1) Spark MLlib and ML
Spark also has MLlib/ML for big d ...
Added by nunomira on Tue, 08 Mar 2022 18:14:48 +0200
Author: Han Xinzi@ShowMeAI Tutorial address: http://www.showmeai.tech/tutorials/84 Article address: http://www.showmeai.tech/article-detail/178 Notice: All Rights Reserved. Please contact the platform and the author for reprint and indicate the source
Introduction
This is one of the most widely used cases of video and audio data processing of HDFS, ...
Added by Spoiler on Tue, 08 Mar 2022 17:26:31 +0200
Author: Han Xinzi@ShowMeAI Tutorial address: http://www.showmeai.tech/tutorials/84 Article address: http://www.showmeai.tech/article-detail/176 Notice: All Rights Reserved. Please contact the platform and the author for reprint and indicate the source
Introduction
Since COVID-19 changed the world in 2020 and has affected everyone's life, this case comb ...
Added by subwayman on Tue, 08 Mar 2022 16:24:06 +0200
Since learning Spark requires Scala, here is some basic Scala syntax.
Note: Scala doesn't need a semicolon at the end of a line.
1. Variable types
val is immutable. It must be initialized when it is declared, and it cannot be reassigned after initialization.
var is mutable. It also needs to be initialized when it is declared. After ...
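A minimal sketch of the val/var difference described above (the object name `ValVarDemo` and its members are purely illustrative). Both bindings are initialized at declaration; only the `var` is reassigned, and the commented-out line shows the reassignment the compiler would reject for a `val`.

```scala
object ValVarDemo {
  // val: must be initialized when declared and can never be reassigned
  val greeting: String = "hello"

  // var: must also be initialized when declared, but may be reassigned later
  var counter: Int = 0

  // No statement in this object ends with a semicolon
  def increment(): Int = {
    counter += 1 // allowed, because counter is a var
    // greeting = "hi" // would NOT compile: reassignment to val
    counter
  }
}
```

Semicolons are still needed when several statements share one line, e.g. `val a = 1; val b = 2`.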
Added by fxb9500 on Sun, 06 Mar 2022 10:19:54 +0200
In the previous article, Analysis and solution of DataSourceScanExec NullPointerException caused by spark DPP, we skipped over the step where dynamic code generation failed. This time, let's analyze that failure, still using the SQL from the article mentioned above.
After running the SQL, we can see the following physical plan: We can see ...
Added by sgoldenb on Sat, 05 Mar 2022 12:43:09 +0200
Complete steps of the Douban movie big data project
1. Douban crawler:
When I started writing the Douban TV series crawler, I thought it would be very simple, but in practice my IP got banned, which troubled me for a long time. I have finally finished it.
Without further ado, here is the code:
The run function is ...
Added by gregor171 on Fri, 04 Mar 2022 14:51:48 +0200
Initialize Spark streaming program
1. Spark SQL parameter tuning settings
1. Set the session time zone
2. Set the maximum number of bytes a single partition can hold when reading files
3. Set the threshold for merging small files
4. Set the number of partitions to use when shuffling data for joins or aggregations
5. Set the maximum ...
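A sketch of how settings 1 to 4 could be collected and passed to `spark-submit` through its `--conf key=value` flag. The keys are standard Spark SQL configuration names that I am assuming correspond to the items above; item 3 in particular is an assumption (`spark.sql.files.openCostInBytes` is the estimated per-file open cost Spark uses when packing several small files into one read partition). The values are illustrative only; to my knowledge the byte values and `200` match Spark's defaults. The object `SqlTuningConf` is hypothetical.

```scala
object SqlTuningConf {
  // Example values only, not recommendations; adjust them for your cluster.
  val settings: Seq[(String, String)] = Seq(
    "spark.sql.session.timeZone"        -> "UTC",       // 1. session time zone
    "spark.sql.files.maxPartitionBytes" -> "134217728", // 2. max bytes per read partition (128 MiB)
    "spark.sql.files.openCostInBytes"   -> "4194304",   // 3. assumed small-file packing knob (4 MiB)
    "spark.sql.shuffle.partitions"      -> "200"        // 4. partitions for join/aggregation shuffles
  )

  // Renders the settings as spark-submit arguments: --conf key=value ...
  def toSubmitArgs(conf: Seq[(String, String)]): Seq[String] =
    conf.flatMap { case (key, value) => Seq("--conf", s"$key=$value") }
}
```

Inside a running application, SQL settings like these can also be changed with `spark.conf.set(key, value)` or a `SET key=value` statement.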
Added by ratcateme on Wed, 02 Mar 2022 22:37:24 +0200
What are RDDs
RDD stands for Resilient Distributed Dataset. Spark: The Definitive Guide describes it as follows: an RDD represents an immutable, partitioned collection of records that can be operated on in parallel. In my personal understanding, an RDD is a kind of distributed object collection ...
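To make that definition concrete without a cluster, here is a plain-Scala toy model. `ToyRDD` and its methods are hypothetical and are not Spark's API. Records are split into partitions, and a transformation such as `map` processes the partitions concurrently with `Future`s and returns a new collection instead of changing the old one. Real RDDs also evaluate lazily, track lineage for fault tolerance, and spread partitions across machines, none of which this sketch does.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical toy model for illustration only; this is NOT Spark's RDD class.
// Records live in partitions; the case class and its Vectors are immutable.
final case class ToyRDD[A](partitions: Vector[Vector[A]]) {

  // Runs f on every partition concurrently and returns a NEW ToyRDD,
  // leaving this one untouched (mirroring RDD immutability).
  def map[B](f: A => B): ToyRDD[B] = {
    val perPartition: Vector[Future[Vector[B]]] =
      partitions.map(part => Future(part.map(f)))
    ToyRDD(Await.result(Future.sequence(perPartition), 10.seconds))
  }

  // Gathers all records back, in partition order.
  def collect(): Vector[A] = partitions.flatten

  def numPartitions: Int = partitions.size
}

object ToyRDD {
  // Splits data into at most numSlices chunks of roughly equal size.
  def parallelize[A](data: Seq[A], numSlices: Int): ToyRDD[A] = {
    require(numSlices > 0, "numSlices must be positive")
    val chunk = math.max(1, math.ceil(data.size.toDouble / numSlices).toInt)
    ToyRDD(data.grouped(chunk).map(_.toVector).toVector)
  }
}
```

For example, `ToyRDD.parallelize(1 to 10, 4).map(_ * 2).collect()` yields the doubled values while the source `ToyRDD` still holds 1 to 10. In Spark itself the comparable chain would be `sc.parallelize(1 to 10, 4).map(_ * 2).collect()` on a `SparkContext`.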
Added by greenhorn666 on Tue, 22 Feb 2022 14:25:12 +0200