PySpark in Practice II: A Python Library for Big Data Processing

Creating Spark RDDs with PySpark: each RDD can be divided into multiple partitions. Each partition is a fragment of the data set and can be stored on a different node in the Spark cluster. An RDD is fault-tolerant and read-only: it cannot be modified in place, and a new RDD can only be produced by applying a transformation to an existing one. An RDD can be proces ...

Added by pete07920 on Sun, 30 Jan 2022 16:23:19 +0200