PySpark learning -- 2. Trying to run PySpark
PySpark learning -- 2. Ways of running PySpark and various sample code
Operation method
Running in PyCharm
Running Spark on the system: spark-submit
Starting a Spark job
Sample code
Streaming text processing: StreamingContext
Stream text word count
Error reporting summary
Operati ...
Added by lemming_ie on Sat, 08 Feb 2020 10:38:57 +0200
Big data learning: finding common friends with MapReduce, and a JobControl implementation of a directed acyclic graph
Note:
Please note that this is just an MR training project. In practice, do not use MR for friend recommendation or for directed acyclic graph logic: MR has to write intermediate results to disk, and the disk I/O greatly reduces efficiency.
Hadoop is a bit b ...
Added by Artiom on Mon, 27 Jan 2020 12:45:06 +0200
Spark on K8S (spark on kubernetes operator) FAQ
Spark on K8S (spark on kubernetes operator) environment setup and demo walkthrough (2)
Common problems when running the Spark demo (2)
How to persist logs in Spark's executor/driver
How to configure the Spark history server so that it takes effect
What the xxxxx webhook under the spark operator namespace does
...
Added by diggysmalls on Fri, 17 Jan 2020 14:10:41 +0200
Resolving a jackson version conflict in a Spark application
In a Spark program, jackson is used for JSON serialization and deserialization of Scala objects. At runtime the program throws java.lang.NoClassDefFoundError and java.lang.AbstractMethodError. A search online shows that the cause is a version conflict between the application's jackson/guava and the versions used by Spark.
1. In idea, by adjusting the order of dependencie ...
Added by DocSeuss on Sun, 15 Dec 2019 17:06:15 +0200
Spark case study: finding how long a mobile phone number stays under each base station and the phone's current location
1. Business requirements
From the logs of a mobile phone number's dwell time at each base station, together with the base station information, calculate (base station, dwell time) and (current longitude, current latitude) for that number
The log records generated when the mobile phone connects to a base station look like the following: ...
Added by compguru910 on Tue, 10 Dec 2019 21:16:10 +0200
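The requirement described in this entry can be sketched roughly as follows. This is plain Python rather than Spark, so that it runs without a cluster, and it rests on several assumptions that are not in the excerpt: the record layout `phone,unix_seconds,station_id,event` (with `1` for connect and `0` for disconnect), the function names, and the choice to treat the station with the longest total stay as the phone's location are all hypothetical.

```python
from collections import defaultdict


def dwell_times(log_lines):
    """Sum dwell seconds per (phone, station) from connect/disconnect events."""
    # Hypothetical record layout: phone,unix_seconds,station_id,event
    # where event "1" means connect and "0" means disconnect.
    totals = defaultdict(int)
    for line in log_lines:
        phone, ts, station, event = line.strip().split(",")
        # A connect contributes -t and a disconnect +t, so every
        # connect/disconnect pair adds up to the duration of that stay.
        signed = -int(ts) if event == "1" else int(ts)
        totals[(phone, station)] += signed
    return dict(totals)


def longest_stay(log_lines, station_coords):
    """Map each phone to (station, dwell seconds, (longitude, latitude))."""
    result = {}
    for (phone, station), seconds in dwell_times(log_lines).items():
        # Assumption: the station with the longest total stay is taken
        # as the phone's location.
        if phone not in result or seconds > result[phone][1]:
            result[phone] = (station, seconds, station_coords[station])
    return result
```

In a Spark job the same idea would typically map each record to a ((phone, station), signed time) pair, reduce by key, and then join against the base station table.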
Spark source code analysis: Master registration mechanism
Master registration mechanism
Application registration
The previous article analyzed the initialization of SparkContext, which ends by sending a registration message of type RegisterApplication to the Master
Now let's see how the Master responds when it receives this message
First, the Master class inhe ...
Added by Muddy_Funster on Mon, 02 Dec 2019 11:41:42 +0200
Spark custom external data source
Background: sometimes we need to define an external data source and process it with Spark SQL. This has two benefits:
(1) Once the external data source is defined, it is very simple to use and the software architecture is clear; it can be queried directly through SQL.
(2) it is easy to divide modules into layers and build them up layer by lay ...
Added by kinaski on Tue, 12 Nov 2019 21:13:18 +0200
Best practice | RDS & POLARDB archiving to X-Pack Spark computing
Through external computing resources, the X-Pack Spark service gives the Redis, Cassandra, MongoDB, HBase and RDS storage services complex analysis, stream processing, warehousing and machine learning capabilities, so as to better handle users' data processing scenarios.
RDS & POLARDB sub-table archiving to X-Pack S ...
Added by cyber_ghost on Thu, 07 Nov 2019 08:55:51 +0200
About big file upload
Approach
Use JavaScript to read the file selected in the form, calculate the file's MD5 value, upload the MD5 to the server, and check whether the file has already been uploaded (similar to an instant-upload feature)
If the file has not been uploaded, cut it into 1MB blocks according to its size. If it is smaller than 1MB ...
Added by greenberry on Tue, 05 Nov 2019 22:27:51 +0200
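A minimal sketch of the hash-then-chunk idea in this entry, written in Python rather than the JavaScript the article uses. The function names and the `already_uploaded` set (a stand-in for the server-side check) are hypothetical, and because the excerpt is cut off before it says what happens to files under 1 MB, sending such a file as a single block is an assumption here.

```python
import hashlib
import io

CHUNK_SIZE = 1024 * 1024  # 1 MB, the block size mentioned in the entry


def file_md5(stream, buf_size=CHUNK_SIZE):
    """Return the hex MD5 digest of a binary stream, read piece by piece."""
    digest = hashlib.md5()
    for piece in iter(lambda: stream.read(buf_size), b""):
        digest.update(piece)
    return digest.hexdigest()


def split_into_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (index, bytes) pairs of at most chunk_size bytes each."""
    index = 0
    while True:
        piece = stream.read(chunk_size)
        if not piece:
            break
        yield index, piece
        index += 1


def plan_upload(stream, already_uploaded):
    """Return (md5, chunks_to_send) for a seekable binary stream.

    `already_uploaded` is a hypothetical set of MD5 strings that stands
    in for asking the server whether the file exists.
    """
    md5 = file_md5(stream)
    if md5 in already_uploaded:
        return md5, []  # nothing to send: the server already has the file
    stream.seek(0)  # rewind after hashing so chunking starts at byte 0
    return md5, list(split_into_chunks(stream))
```

For example, `plan_upload(io.BytesIO(data), set())` returns the digest and every block, while passing a set that already contains the digest returns an empty block list.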
A detailed explanation of Spark operators
Explanation of Spark (2): operators
Table of contents
I. wordcount
II. Programming model
III. use of RDD data sets and operators
1. Three necessary operators
2. Common operators (Cartesian, cogroup, join)
3. Sorting and aggregation calculation
I. wordcount
B ...
Added by Cragsterboy on Mon, 28 Oct 2019 08:52:55 +0200