PySpark learning -- 2. Trying to run PySpark

PySpark learning -- 2. Attempts at running PySpark and at various sample programs. Ways to run: running Spark from PyCharm; running on the system with spark-submit; starting and running a Spark task. Sample code: streaming text processing with StreamingContext; streaming text word count. Error summary. Operati ...

Added by lemming_ie on Sat, 08 Feb 2020 10:38:57 +0200
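
The streaming word count mentioned in this entry applies `flatMap`, `map` and `reduceByKey` to each micro-batch of a DStream created from a `StreamingContext`. As a minimal sketch that runs without Spark, the plain Python below imitates that per-batch logic. The function names and the list-of-batches input are assumptions for illustration, not the article's code. Like a default DStream, it keeps no state between batches.

```python
from collections import Counter
from typing import Dict, Iterable, List


def batch_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Count words in one micro-batch.

    Plain-Python stand-in for the DStream chain
    flatMap(split) -> map(word -> (word, 1)) -> reduceByKey(add).
    """
    counts: Counter = Counter()
    for line in lines:              # flatMap: each line becomes many words
        for word in line.split():
            counts[word] += 1       # map to (word, 1) and sum per key
    return dict(counts)


def run_stream(batches: List[List[str]]) -> List[Dict[str, int]]:
    """Process each batch independently; no counts carry over between batches."""
    return [batch_word_count(batch) for batch in batches]


if __name__ == "__main__":
    # Two hypothetical micro-batches of text lines
    for i, counts in enumerate(run_stream([["hello spark", "hello world"], ["spark streaming"]])):
        print(i, counts)
```

To accumulate counts across batches in Spark Streaming, you would need a stateful operation such as `updateStateByKey`, which this sketch does not model.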

Big data learning: finding common friends with MapReduce, and implementing a directed acyclic graph of jobs with JobControl

Note: this is only a MapReduce (MR) training project. In real applications, do not use MR for friend recommendation or for programs whose logic forms a directed acyclic graph: MR has to write intermediate results to disk, and the disk I/O greatly reduces efficiency. Hadoop is a bit b ...

Added by Artiom on Mon, 27 Jan 2020 12:45:06 +0200
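
A common way to find common friends with MapReduce, and the reason JobControl comes in, is a chain of two jobs in which the second consumes the output of the first. The sketch below simulates both jobs in plain Python, with dictionaries standing in for the shuffle. The `person:friend1,friend2` input format and all function names are assumptions for illustration, not necessarily what the article uses.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple


def parse(lines: List[str]) -> Dict[str, List[str]]:
    """Hypothetical input format: 'A:B,C,D' means A's friends are B, C and D."""
    records: Dict[str, List[str]] = {}
    for line in lines:
        person, friends = line.split(":")
        records[person] = friends.split(",")
    return records


def job1(records: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Job 1: invert the adjacency list.
    map:    (person, friends) -> (friend, person) for each friend
    reduce: friend -> sorted list of people who have that friend
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for person, friends in records.items():
        for friend in friends:
            grouped[friend].append(person)      # the shuffle groups by key
    return {friend: sorted(people) for friend, people in grouped.items()}


def job2(inverted: Dict[str, List[str]]) -> Dict[Tuple[str, str], List[str]]:
    """Job 2: collect the shared friend for every pair of people.
    map:    (friend, people) -> ((p1, p2), friend) for each pair with p1 < p2
    reduce: (p1, p2) -> sorted list of common friends
    """
    grouped: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for friend, people in inverted.items():
        for pair in combinations(people, 2):    # people is sorted, so p1 < p2
            grouped[pair].append(friend)
    return {pair: sorted(fs) for pair, fs in grouped.items()}


if __name__ == "__main__":
    data = ["A:B,C,D", "B:A,C", "C:A,B,D", "D:A,C"]
    for pair, common in sorted(job2(job1(parse(data))).items()):
        print(pair, common)
```

In Hadoop, JobControl would run job 1 first and then job 2, declared as depending on job 1; that two-job dependency is the simplest directed acyclic graph of jobs.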

Spark on K8S (Spark on Kubernetes Operator) FAQ

Spark on K8S (Spark on Kubernetes Operator) environment setup and demo walkthrough (2). Common problems when running the Spark demo (2): how to persist the Spark executor/driver logs; how to configure the Spark history server so that it takes effect; what the xxxxx webhook under the spark operator namespace does ...

Added by diggysmalls on Fri, 17 Jan 2020 14:10:41 +0200

Resolving a Jackson version conflict in a Spark application

In a Spark program, Jackson is used for JSON serialization and deserialization of Scala objects. At runtime, java.lang.NoClassDefFoundError and java.lang.AbstractMethodError are thrown. A search online showed that the cause is a version conflict between Jackson/Guava and Spark. 1. In IDEA, by adjusting the order of dependencie ...

Added by DocSeuss on Sun, 15 Dec 2019 17:06:15 +0200

A Spark case study: how long a mobile phone number stays at each base station, and the phone's current location

1. Business requirements: using the log of a phone number's dwell time at each base station together with the base station information, compute for that phone number (base station, dwell time) and (current longitude, current latitude). The log generated when the phone connects to a base station looks similar to the following: ...

Added by compguru910 on Tue, 10 Dec 2019 21:16:10 +0200
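
The dwell-time part of this requirement is a sum per key, which maps naturally onto `reduceByKey`. Because the article's log format is cut off here, the sketch below assumes a hypothetical record `(phone, timestamp, station_id, event)`, with `event` 1 for connect and 0 for disconnect, and a hypothetical station table mapping `station_id` to `(longitude, latitude)`. It negates connect timestamps so that the per-key sum equals the total dwell time, which only holds if every connect has a matching disconnect.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical log record: (phone, timestamp_seconds, station_id, event),
# where event == 1 means "connected" and event == 0 means "disconnected".
LogRecord = Tuple[str, int, str, int]


def dwell_times(logs: List[LogRecord]) -> Dict[Tuple[str, str], int]:
    """Total seconds each phone spent at each station.

    Connect timestamps are negated and disconnect timestamps kept, so the
    per-(phone, station) sum is sum(disconnect) - sum(connect): the total
    dwell time, without pairing events up. Assumes matched connect/disconnect.
    """
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for phone, ts, station, event in logs:
        totals[(phone, station)] += -ts if event == 1 else ts
    return dict(totals)


def attach_location(
    dwell: Dict[Tuple[str, str], int],
    stations: Dict[str, Tuple[float, float]],
) -> List[Tuple[str, str, int, float, float]]:
    """Inner join of dwell times with station coordinates on station_id."""
    result = []
    for (phone, station), seconds in dwell.items():
        if station in stations:             # stations without coordinates are dropped
            lng, lat = stations[station]
            result.append((phone, station, seconds, lng, lat))
    return sorted(result)
```

In Spark, the first function corresponds to mapping each record to `((phone, station), signed_timestamp)` and reducing by key with addition; the second corresponds to a `join` keyed on `station_id`.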

Spark source code analysis: Master registration mechanism

Master registration mechanism. Application registration: the previous article analyzed the initialization of SparkContext, which ends by sending a RegisterApplication registration message to the Master. Now let's see how the Master responds when it receives these messages. First, the Master class inhe ...

Added by Muddy_Funster on Mon, 02 Dec 2019 11:41:42 +0200

Custom external data sources in Spark

Background: sometimes we need to define an external data source and process it with Spark SQL. This has two benefits: (1) once the external data source is defined, it is very simple to use, the software architecture is clear, and the data can be queried directly with SQL; (2) it is easy to divide modules into layers and build them up layer by lay ...

Added by kinaski on Tue, 12 Nov 2019 21:13:18 +0200

Best practice | Archiving RDS & POLARDB data to X-Pack Spark for computing

The X-Pack Spark service uses external computing resources to give the Redis, Cassandra, MongoDB, HBase and RDS storage services complex analysis, stream processing, data warehousing and machine learning capabilities, to better address users' data-processing scenarios. RDS & POLARDB sharded-table archiving to X-Pack S ...

Added by cyber_ghost on Thu, 07 Nov 2019 08:55:51 +0200

About large file uploads

Approach to large file uploads: use JS to read the file selected in the form, compute the file's MD5 value, send the MD5 value to the server, and check whether the file has already been uploaded (similar to an "instant upload" feature). If the file has not been uploaded, split it into 1MB chunks according to its size. If it is smaller than 1MB ...

Added by greenberry on Tue, 05 Nov 2019 22:27:51 +0200
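
The MD5 check and 1MB chunking described in this entry do not depend on the language. The article does this in JS in the browser; the Python sketch below shows the same steps on an in-memory byte string, with a set of already-uploaded MD5 values standing in for the server-side check. The function names are hypothetical, and since the excerpt cuts off how files smaller than 1MB are handled, this sketch simply treats such a file as a single chunk.

```python
import hashlib
from typing import List, Set, Tuple

CHUNK_SIZE = 1024 * 1024  # 1 MB per chunk, as described in the entry


def file_md5(data: bytes) -> str:
    """Hex MD5 of the whole file, used as its identity on the server."""
    return hashlib.md5(data).hexdigest()


def split_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Fixed-size chunks; the last one may be shorter.
    In this sketch a file under 1 MB (or an empty file) yields one chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return chunks or [b""]


def plan_upload(data: bytes, uploaded_md5s: Set[str]) -> Tuple[str, List[bytes]]:
    """Return the file's MD5 and the chunks that still need to be sent.
    If the server already knows the MD5 ("instant upload"), nothing is sent."""
    digest = file_md5(data)
    if digest in uploaded_md5s:
        return digest, []
    return digest, split_chunks(data)
```

Sending the MD5 first is what makes the "instant upload" check cheap: the server compares a 32-character string instead of receiving the file.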

A detailed explanation of Spark operators

Spark explained (2): operators. Contents: I. wordcount; II. Programming model; III. Using RDD datasets and operators: 1. three essential operators, 2. common operators (cartesian, cogroup, join), 3. sorting and aggregation. I. wordcount B ...

Added by Cragsterboy on Mon, 28 Oct 2019 08:52:55 +0200
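
To make the semantics of the common pair-RDD operators listed in this entry concrete, the plain-Python sketch below mimics `cartesian`, `cogroup` and `join` on lists of `(key, value)` tuples. It is an illustration, not Spark's implementation: Python lists stand in for RDDs and for the iterables that Spark's `cogroup` returns, and results are sorted by key only to make them deterministic.

```python
from collections import defaultdict
from itertools import product
from typing import Any, Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Any]


def cartesian(a: List[Any], b: List[Any]) -> List[Tuple[Any, Any]]:
    """Every element of a paired with every element of b."""
    return list(product(a, b))


def cogroup(left: List[Pair], right: List[Pair]) -> List[Tuple[Hashable, Tuple[List[Any], List[Any]]]]:
    """For every key present in EITHER input: (key, (left_values, right_values)).
    A key missing on one side gets an empty list for that side."""
    groups: Dict[Hashable, Tuple[List[Any], List[Any]]] = defaultdict(lambda: ([], []))
    for k, v in left:
        groups[k][0].append(v)
    for k, w in right:
        groups[k][1].append(w)
    return sorted(groups.items(), key=lambda kv: kv[0])


def join(left: List[Pair], right: List[Pair]) -> List[Tuple[Hashable, Tuple[Any, Any]]]:
    """Inner join built on cogroup: one (key, (v, w)) per matching pair.
    Keys found on only one side produce no output."""
    out = []
    for k, (vs, ws) in cogroup(left, right):
        for v in vs:
            for w in ws:
                out.append((k, (v, w)))
    return out


if __name__ == "__main__":
    left = [("a", 1), ("a", 2), ("b", 3)]
    right = [("a", "x"), ("c", "y")]
    print(cogroup(left, right))
    print(join(left, right))
```

Building `join` on `cogroup` shows why a key with several values on each side multiplies out: each left value is paired with each right value.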