Spark implements TopN
Related Classes and Operators
textFile(*name*, *minPartitions=None*, *use_unicode=True*)
Added by jdiver on Sun, 07 Jun 2020 05:11:10 +0300
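As a rough illustration of the TopN entry above, the sketch below mimics in plain Python the two-stage approach a distributed top-N typically takes: each partition keeps only its own N largest values, and the small per-partition results are then merged. It deliberately uses no Spark API. The partition lists and the helper name `top_n` are hypothetical and do not come from the original post.

```python
import heapq
from functools import reduce


def top_n(partitions, n):
    """Return the n largest values across all partitions, in descending order."""
    # Stage 1: each partition independently keeps only its n largest values.
    local_tops = [heapq.nlargest(n, part) for part in partitions]
    # Stage 2: merge the small per-partition results pairwise.
    return reduce(lambda a, b: heapq.nlargest(n, a + b), local_tops, [])


# Hypothetical data, standing in for the partitions of a distributed dataset.
partitions = [[5, 1, 9], [7, 3], [8, 2, 6]]
result = top_n(partitions, 3)
print(result)
```

Keeping only N values per partition means the merge step never handles more than 2N values at a time, however large the partitions are.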
1. Project Background
The traditional data warehouse structure is designed for OLAP (Online Analytical Processing) on offline data. The common way to import data is to use Sqoop or scheduled Spark jobs to import business database data into the warehouse one by one. With the increasing requirement for real-time data analysis, hou ...
Added by robh76 on Sun, 19 Apr 2020 03:03:03 +0300
PySpark learning (2): trying out PySpark's running methods and various code samples
Running Spark on the system: spark-submit
Starting and running a Spark task
Streaming text processing: StreamingContext
Stream text word count
Error reporting summary
Added by lemming_ie on Sat, 08 Feb 2020 10:38:57 +0200
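A minimal sketch of the word-count logic behind the streaming entry above, written in plain Python rather than against the Spark Streaming API. Each inner list stands in for one micro-batch of text lines. The batch contents and the function name `count_words` are hypothetical.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words in one batch of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


# Hypothetical micro-batches, standing in for data arriving on a stream.
batches = [
    ["hello spark", "hello stream"],
    ["spark streaming word count"],
]

running_total = Counter()
for batch in batches:
    batch_counts = count_words(batch)   # result for this batch only
    running_total.update(batch_counts)  # running total across batches
    print(dict(batch_counts))
```

The per-batch counts correspond to a stateless streaming word count, while `running_total` shows what a stateful variant would accumulate.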
Please note that this is just an MR training project. In practice, do not use MR for friend recommendation or for any computation with directed-acyclic-graph logic: MR has to write intermediate results to disk, and the disk I/O greatly reduces efficiency.
Hadoop is a bit b ...
Added by Artiom on Mon, 27 Jan 2020 12:45:06 +0200
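For the friend-recommendation entry above, one common way to phrase the problem as map and reduce steps can be sketched in plain Python. This is not the original post's code: the friend lists, the 0/1 flag convention, and the function names `map_phase` and `reduce_phase` are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def map_phase(user, friends):
    """Emit (pair, flag) records: 0 marks a direct friendship, 1 a shared friend."""
    for friend in friends:
        yield tuple(sorted((user, friend))), 0
    # Any two friends of the same user share that user as a common friend.
    for a, b in combinations(sorted(friends), 2):
        yield (a, b), 1


def reduce_phase(pair, flags):
    """Return the number of shared friends, or None if the pair are already friends."""
    if 0 in flags:
        return None
    return sum(flags)


# Hypothetical friend lists (user -> direct friends).
graph = {
    "tom": ["cat", "hello", "hadoop"],
    "cat": ["tom", "hive"],
    "hive": ["cat", "hadoop"],
    "hadoop": ["tom", "hive"],
    "hello": ["tom"],
}

# Shuffle step: group the mapper output by key, as the MR framework would.
grouped = defaultdict(list)
for user, friends in graph.items():
    for pair, flag in map_phase(user, friends):
        grouped[pair].append(flag)

recommendations = {}
for pair, flags in grouped.items():
    score = reduce_phase(pair, flags)
    if score is not None:
        recommendations[pair] = score

print(recommendations)
```

In a real MR job the grouped records would be written to disk between the two phases, which is the overhead the note above warns about.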
Spark on K8S (spark on kubernetes operator) environment construction and demo process (2)
Common problems in the Spark demo process (2)
How to persist logs from Spark's executor/driver
How to configure the Spark history server so that it takes effect
What does xxxxx webhook do under spark operator namespace
Added by diggysmalls on Fri, 17 Jan 2020 14:10:41 +0200
In the Spark program, Jackson is used for JSON serialization and deserialization of Scala objects. At runtime it fails with java.lang.NoClassDefFoundError and java.lang.AbstractMethodError. A search shows that the cause is a version conflict between the application's jackson/guava and the versions bundled with Spark.
1. In IDEA, by adjusting the order of dependencie ...
Added by DocSeuss on Sun, 15 Dec 2019 17:06:15 +0200
1. Business requirements
For each cell phone number, compute (base station, dwell time) and (current longitude, current latitude) from the phone's dwell-time logs at each base station and the base station information.
The log information generated by connecting the mobile phone to the base station is similar to the following: ...
Added by compguru910 on Tue, 10 Dec 2019 21:16:10 +0200
Master registration mechanism
The previous article analyzed the initialization process of SparkContext, which ends by sending a registration message of type RegisterApplication to the Master.
Now let's see how the Master responds after receiving these messages
First, the Master class inhe ...
Added by Muddy_Funster on Mon, 02 Dec 2019 11:41:42 +0200
Background: sometimes we need to define an external data source and process it with Spark SQL. There are two benefits:
(1) Once the external data source is defined, it is very simple to use and the software architecture stays clear. It can be queried directly through SQL.
(2) it is easy to divide modules into layers and build them up layer by lay ...
Added by kinaski on Tue, 12 Nov 2019 21:13:18 +0200
The X-Pack Spark service uses external computing resources to give the Redis, Cassandra, MongoDB, HBase and RDS storage services the ability to do complex analysis, stream processing, warehousing and machine learning, so as to better solve users' data processing problems.
RDS & PolarDB sub table archiving to X-Pack S ...
Added by cyber_ghost on Thu, 07 Nov 2019 08:55:51 +0200