1. Introduction
At the end of the previous chapter, we finally generated the MultibandTileLayerRDD[SpatialKey] object. All of that was in preparation for the most important step: the calculation.
2. Calculate NDVI
First, let's review the calculation code:
val ndviTiledRDD: TileLayerRDD[SpatialKey] = tiledRDD.withContext { rdd =>
  rdd.mapValues { tile =>
    tile.convert(DoubleConstantNoDataCellType).combineDouble(0, 1) { (r, nir) =>
      (nir - r) / (nir + r)
    }
  }.reduceByKey(_.localMax(_)) // Spark calculation optimization; can be omitted
}
We know that the NDVI formula is (nir - r) / (nir + r), but how do we apply it in GeoTrellis? In other words, we need to know what the formula actually operates on.
2.1 Separating the Tile
By reading the official documentation, we can identify two important points about Spark parallel computing in GeoTrellis:
- Tile is the core unit of our fine-grained operations
- Metadata must be preserved during computation so that spatial information is not lost
Therefore, our first task is to locate the Tile in the result above.
In the previous step, we generated a MultibandTileLayerRDD[SpatialKey] object, that is, an RDD[(SpatialKey, MultibandTile)] with Metadata[TileLayerMetadata[SpatialKey]]. To get at the MultibandTile, we need two steps:
- Strip the RDD[K, V] with Metadata down to a plain RDD[K, V]
- Extract the V we need from that RDD[K, V], namely the MultibandTile (in this case actually its subclass, ArrayMultibandTile)
Combined with the documentation's requirement to preserve metadata, we arrive at the code skeleton used in this article:
val ndviTiledRDD: TileLayerRDD[SpatialKey] = tiledRDD.withContext { rdd: RDD[(SpatialKey, MultibandTile)] =>
  // Once we have the plain RDD, we can use Spark's mapValues method directly to extract each Tile
  rdd.mapValues { tile: MultibandTile =>
    // Tile-specific operations
  }
}
The main body of this part is the withContext method. The code is as follows:
[code in spark/Implicit.scala]
implicit class WithContextWrapper[K, V, M](val rdd: RDD[(K, V)] with Metadata[M]) {
  def withContext[K2, V2](f: RDD[(K, V)] => RDD[(K2, V2)]) =
    new ContextRDD(f(rdd), rdd.metadata)
}
As we can see, this is just a fairly simple implicit conversion: it extracts the plain RDD from the TileLayerRDD-typed data so it can be computed on separately, and finally repackages the result together with the original metadata into a ContextRDD object.
In the end we get a familiar ContextRDD object, which can be written as ContextRDD[SpatialKey, Tile, TileLayerMetadata[SpatialKey]], as RDD[(SpatialKey, Tile)] with Metadata[TileLayerMetadata[SpatialKey]], or, most conveniently, as TileLayerRDD[SpatialKey].
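To make the effect of withContext concrete, here is a minimal sketch of what the ndviTiledRDD pipeline is equivalent to when written without the implicit wrapper. The helper name ndviManually is hypothetical, the imports assume a GeoTrellis 2.x-style package layout, and the reduceByKey(_.localMax(_)) optimization is omitted since the original comment notes it can be:
import geotrellis.raster._
import geotrellis.spark._
import org.apache.spark.rdd.RDD

// Hypothetical helper: the same NDVI pipeline, written without withContext
def ndviManually(tiledRDD: MultibandTileLayerRDD[SpatialKey]): TileLayerRDD[SpatialKey] = {
  // Step 1: operate on the plain RDD[(SpatialKey, MultibandTile)]
  val ndvi: RDD[(SpatialKey, Tile)] =
    tiledRDD.mapValues { tile =>
      tile.convert(DoubleConstantNoDataCellType).combineDouble(0, 1) { (r, nir) =>
        (nir - r) / (nir + r)
      }
    }
  // Step 2: repackage the result with the original metadata, which is exactly what withContext does
  new ContextRDD(ndvi, tiledRDD.metadata)
}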
2.2 Tile and the computing core
Back to the core NDVI calculation code:
tile.convert(DoubleConstantNoDataCellType)
  .combineDouble(0, 1) { (r, nir) => (nir - r) / (nir + r) }
As we can see, the code mainly consists of two steps:
- Type conversion: convert
- Band combination as specified: combineDouble
Let's take these in order, starting with type conversion.
2.2.1 Type conversion
In this step, we convert the original data type to DoubleConstantNoDataCellType. Why do we do this conversion?
- First, we cannot be sure of the original data type, and computing NDVI requires floating-point arithmetic, so we must guarantee a floating-point cell type; in practice we choose Double or Float according to actual needs
- Second, we want to rely on GeoTrellis's default per-type NoData value as the strategy for identifying NoData cells, so we choose DoubleConstantNoDataCellType as the target type (a short sketch follows this list)
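To make the NoData strategy concrete, here is a short sketch of my own (using GeoTrellis's standard NODATA constant and the isNoData helper) showing that the default Int NoData marker survives a convert to DoubleConstantNoDataCellType as NaN:
import geotrellis.raster._

// NODATA is the default Int NoData marker (Int.MinValue); after converting to
// DoubleConstantNoDataCellType that cell becomes Double.NaN, which isNoData recognizes
val intTile = IntArrayTile(Array(100, NODATA, 300, 400), 2, 2)
val dblTile = intTile.convert(DoubleConstantNoDataCellType)

println(isNoData(dblTile.getDouble(1, 0)))  // true: the NoData cell survived the conversion
println(dblTile.getDouble(0, 0))            // 100.0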
The main body of this part is the Tile.convert method. The relevant code, tidied up, is as follows:
[code in ArrayMultibandTile.scala]
def convert(newCellType: CellType): MultibandTile = {
  // Create an array to hold the new bands
  val newBands = Array.ofDim[Tile](bandCount)
  cfor(0)(_ < bandCount, _ + 1) { i =>
    // Convert each band one by one
    newBands(i) = band(i).convert(newCellType)
  }
  ArrayMultibandTile(newBands)
}

def band(bandIndex: Int): Tile = {
  _bands(bandIndex)
}

// _bands is passed in when ArrayMultibandTile is defined
class ArrayMultibandTile(_bands: Array[Tile]) extends MultibandTile with MacroMultibandCombiners {
  val bandCount = _bands.size
  ...
}
From the code we can see that the core of type conversion is the convert operation on each band. Looking back at the definition of ArrayMultibandTile, we can see that a band is a [CellType]ArrayTile:
[code in GeoTiffMultibandTile.scala]
case class Chip(
  window: GridBounds[Int],
  bands: Array[MutableArrayTile],
  intersectingSegments: Int,
  var segmentsBurned: Int = 0
)
...
val bands = Array.fill(bandSubsetLength)(ArrayTile.empty(cellType, window.width, window.height))
val chip = Chip(window, bands, segments.length)
The core of the above code is the ArrayTile.empty method:
[code in ArrayTile.scala]
def empty(t: CellType, cols: Int, rows: Int): MutableArrayTile = t match {
  // Dispatch to the appropriate class depending on the actual cell type
  case _: BitCells => BitArrayTile.empty(cols, rows)
  case ct: ByteCells => ByteArrayTile.empty(cols, rows, ct)
  ...
}
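As a quick aside, here is a tiny sketch of my own showing the dispatch above in action; IntConstantNoDataCellType is simply another of the match arms elided by the "...":
import geotrellis.raster._

// Each factory call returns a MutableArrayTile backed by the matching [CellType]ArrayTile,
// so the resulting cellType is the one we asked for
val doubleTile = ArrayTile.empty(DoubleConstantNoDataCellType, 4, 4)
val intTile    = ArrayTile.empty(IntConstantNoDataCellType, 4, 4)

println(doubleTile.cellType == DoubleConstantNoDataCellType)  // true
println(intTile.cellType == IntConstantNoDataCellType)        // true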
As mentioned above, the core of type conversion is the convert operation on a band. Now that we know how a [CellType]ArrayTile (i.e. a band) is created, let's see how its convert is implemented:
[code in ArrayTile.scala]
def convert(targetCellType: CellType): ArrayTile = {
  // Create an empty ArrayTile of the specified type
  val tile = ArrayTile.alloc(targetCellType, cols, rows)
  // Choose the integer or floating-point path and copy the values
  if(!cellType.isFloatingPoint) {
    cfor(0)(_ < rows, _ + 1) { row =>
      cfor(0)(_ < cols, _ + 1) { col =>
        tile.set(col, row, get(col, row))
      }
    }
  } else {
    cfor(0)(_ < rows, _ + 1) { row =>
      cfor(0)(_ < cols, _ + 1) { col =>
        tile.setDouble(col, row, getDouble(col, row))
      }
    }
  }
  tile
}

// Predefined method; applyDouble is implemented in the concrete subclasses
def getDouble(col: Int, row: Int) = applyDouble(row * cols + col)

// Creates an empty ArrayTile of the specified type
def alloc(t: CellType, cols: Int, rows: Int): MutableArrayTile = t match {
  case _: BitCells => BitArrayTile.ofDim(cols, rows)
  ...
  case ct: DoubleCells => DoubleArrayTile.ofDim(cols, rows, ct)
}
The main body of this code is the ofDim method. Let's assume the source data is a DoubleArrayTile and see how it is implemented:
[code in DoubleArrayTile.scala]
// Choose the NoData policy according to the concrete cell type
def ofDim(cols: Int, rows: Int, cellType: DoubleCells with NoDataHandling): DoubleArrayTile =
  cellType match {
    // Default (raw) policy
    case DoubleCellType =>
      new DoubleRawArrayTile(Array.ofDim[Double](cols * rows), cols, rows)
    // Use the default NoData value
    case DoubleConstantNoDataCellType =>
      new DoubleConstantNoDataArrayTile(Array.ofDim[Double](cols * rows), cols, rows)
    // Use a user-defined NoData value
    case udct: DoubleUserDefinedNoDataCellType =>
      new DoubleUserDefinedNoDataArrayTile(Array.ofDim[Double](cols * rows), cols, rows, udct)
  }

// The corresponding applyDouble implementations under the different policies
final case class DoubleConstantNoDataArrayTile(arr: Array[Double], val cols: Int, val rows: Int)
    extends DoubleArrayTile(arr, cols, rows) {
  ...
  // No special processing is required when a NoData value is encountered
  def applyDouble(i: Int): Double = arr(i)
  ...
}

final case class DoubleUserDefinedNoDataArrayTile(arr: Array[Double], val cols: Int, val rows: Int, val cellType: DoubleUserDefinedNoDataCellType)
    extends DoubleArrayTile(arr, cols, rows)
    with UserDefinedDoubleNoDataConversions {
  ...
  // When the cell equals the user-defined value, treat it as NoData
  def applyDouble(i: Int): Double = udd2d(arr(i))
  ...
}
From the above code we can see that the logic of type conversion is quite simple: each band of the multiband image is re-valued according to the user-specified cell type and placed into a new container. This also reveals how multiband image data is stored.
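To visualize that storage structure, here is a small sketch of my own that builds a two-band ArrayMultibandTile from two single-band ArrayTiles and converts it, mirroring the code paths above (the literal values are arbitrary):
import geotrellis.raster._

// A MultibandTile is essentially an Array[Tile] of single-band tiles
val red = IntArrayTile(Array(10, 20, 30, 40), 2, 2)
val nir = IntArrayTile(Array(50, 60, 70, 80), 2, 2)
val multiband: MultibandTile = ArrayMultibandTile(red, nir)

// convert walks the bands one by one and re-creates each in the target cell type
val asDouble = multiband.convert(DoubleConstantNoDataCellType)

println(asDouble.bandCount)                                      // 2
println(asDouble.band(0).cellType == DoubleConstantNoDataCellType)  // true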
Next, we can analyze the most critical task: merging bands in a specified way (here, the NDVI formula).
2.2.2 Combining bands
The main body of this part is the combineDouble method:
[code in ArrayMultibandTile.scala]
def combineDouble(b0: Int, b1: Int)(f: (Double, Double) => Double): Tile = {
  val band1 = band(b0)
  val band2 = band(b1)
  // Result container
  val result = ArrayTile.empty(cellType, cols, rows)
  cfor(0)(_ < rows, _ + 1) { row =>
    cfor(0)(_ < cols, _ + 1) { col =>
      // Compute, then assign
      result.setDouble(col, row, f(band1.getDouble(col, row), band2.getDouble(col, row)))
    }
  }
  result
}
As we can see, the logic here is very simple, but there is a deeper question: does it meet our actual needs?
Obviously, in practice we may need to fuse more than two bands, so supporting the combination of only two bands is not enough. Looking further through the code, we find that GeoTrellis provides a more general band-merging method:
[code in ArrayMultibandTile.scala]
def combineDouble(subset: Seq[Int])(f: Seq[Double] => Double): Tile = {
  subset.foreach({ b =>
    require(0 <= b && b < bandCount, "All elements of subset must be present")
  })
  val subsetSize = subset.size
  val subsetArray = subset.toArray
  val result = ArrayTile.empty(cellType, cols, rows)
  val values: Array[Double] = Array.ofDim(subsetSize)
  // Traverse the specified bands
  cfor(0)(_ < rows, _ + 1) { row =>
    cfor(0)(_ < cols, _ + 1) { col =>
      cfor(0)(_ < subsetSize, _ + 1) { i =>
        values(i) = _bands(subsetArray(i)).getDouble(col, row)
      }
      result.setDouble(col, row, f(values))
    }
  }
  result
}
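For example, assuming band 0 is red and band 1 is near-infrared as before, NDVI could also be written against this Seq-based overload (a sketch, not the form this article actually uses):
// `tile` is assumed to be the MultibandTile from the mapValues call earlier,
// already converted to DoubleConstantNoDataCellType
val ndviTile: Tile = tile.combineDouble(Seq(0, 1)) { values =>
  val r   = values(0)
  val nir = values(1)
  (nir - r) / (nir + r)
}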
However, when actually using GeoTrellis, we find that bands can also be merged like this:
tile.combineDouble(0, 1, 2, 3) { (b0, b1, b2, b3) => /* Specific operation */ }
Even more bands are actually supported. This calling style is clearly more elegant, yet we cannot find the definitions of these methods anywhere in the regular source code: they come from the traits mixed into ArrayMultibandTile.
3. The traits of ArrayMultibandTile and MultibandTile
The inheritance structure of ArrayMultibandTile is as follows:
In the diagram:
- Green indicates class inheritance
- Red indicates trait implementation
However, the corresponding source files for three of these traits cannot be found, because their code is generated from text templates before compilation, as defined in project/Boilerplate.scala. We don't need to know how the template works; we only need to know what the template finally generates, and then analyze it in the usual way.
3.1 The MacroCombinableMultibandTile and MacroCombineFunctions traits
First, let's look at the generated MacroCombinableMultibandTile trait:
[the code will be generated in geotrellis/macros/MacroCombinableMultibandTile.scala]
package geotrellis.macros

trait MacroCombinableMultibandTile[T] {
  def combineIntTileCombiner(combiner: IntTileCombiner3): T
  def combineDoubleTileCombiner(combiner: DoubleTileCombiner3): T
  // ... and so on
  def combineIntTileCombiner(combiner: IntTileCombiner10): T
  def combineDoubleTileCombiner(combiner: DoubleTileCombiner10): T
}
As we can see, the MacroCombinableMultibandTile trait pre-declares combineIntTileCombiner / combineDoubleTileCombiner overloads accepting TileCombiner types of arities 3 through 10, for both Int and Double. It is also clear why templates are used: this is highly repetitive code, and generating it is simply the more convenient approach.
The [Type]TileCombiner[arity] types in the code also come from the template (taking the Double case as the example; the Int case is analogous):
[the code will be generated in geotrellis/macros/DoubleTileCombiners.scala]
package geotrellis.macros

trait DoubleTileCombiner3 {
  val b0: Int; val b1: Int; val b2: Int
  def apply(b0: Double, b1: Double, b2: Double): Double
}
// ... and so on
trait DoubleTileCombiner10 {
  val b0: Int; val b1: Int; val b2: Int; val b3: Int; val b4: Int
  val b5: Int; val b6: Int; val b7: Int; val b8: Int; val b9: Int
  def apply(b0: Double, b1: Double, b2: Double, b3: Double, b4: Double,
            b5: Double, b6: Double, b7: Double, b8: Double, b9: Double): Double
}
Next, let's look at the generated MacroCombineFunctions trait:
[the code will be generated in geotrellis/macros/MacroCombineFunctions.scala]
package geotrellis.macros

import scala.language.experimental.macros

trait MacroCombineFunctions[T, MBT <: MacroCombinableMultibandTile[T]] {
  def combine(b0: Int, b1: Int, b2: Int)(f: (Int, Int, Int) => Int): T =
    macro MultibandTileMacros.intCombine3_impl[T, MBT]
  def combineDouble(b0: Int, b1: Int, b2: Int)(f: (Double, Double, Double) => Double): T =
    macro MultibandTileMacros.doubleCombine3_impl[T, MBT]
  // ... and so on
  def combine(b0: Int, b1: Int, b2: Int, b3: Int, b4: Int, b5: Int, b6: Int, b7: Int, b8: Int, b9: Int)(f: (Int, Int, Int, Int, Int, Int, Int, Int, Int, Int) => Int): T =
    macro MultibandTileMacros.intCombine10_impl[T, MBT]
  def combineDouble(b0: Int, b1: Int, b2: Int, b3: Int, b4: Int, b5: Int, b6: Int, b7: Int, b8: Int, b9: Int)(f: (Double, Double, Double, Double, Double, Double, Double, Double, Double, Double) => Double): T =
    macro MultibandTileMacros.doubleCombine10_impl[T, MBT]
}
The [type]Combine[arity]_impl methods in the code likewise come from the template:
[the code will be generated in geotrellis/macros/MultibandTileMacros.scala]
package geotrellis.macros

import spire.macros.InlineUtil
import scala.reflect.macros.whitebox.Context
import scala.language.experimental.macros

object MultibandTileMacros {
  def intCombine3_impl[T, MBT <: MacroCombinableMultibandTile[T]](c: Context)(b0: c.Expr[Int], b1: c.Expr[Int], b2: c.Expr[Int])(f: c.Expr[(Int, Int, Int) => Int]): c.Expr[T] = {
    import c.universe._
    val self = c.Expr[MacroCombinableMultibandTile[T]](c.prefix.tree)
    val tree =
      q"""$self.combineIntTileCombiner(new geotrellis.macros.IntTileCombiner3 {
        val b0 = $b0; val b1 = $b1; val b2 = $b2
        def apply(b0: Int, b1: Int, b2: Int): Int = $f(b0, b1, b2)
      })"""
    new InlineUtil[c.type](c).inlineAndReset[T](tree)
  }
  // ... and so on
  def doubleCombine10_impl[T, MBT <: MacroCombinableMultibandTile[T]](c: Context)(b0: c.Expr[Int], b1: c.Expr[Int], b2: c.Expr[Int], b3: c.Expr[Int], b4: c.Expr[Int], b5: c.Expr[Int], b6: c.Expr[Int], b7: c.Expr[Int], b8: c.Expr[Int], b9: c.Expr[Int])(f: c.Expr[(Double, Double, Double, Double, Double, Double, Double, Double, Double, Double) => Double]): c.Expr[T] = {
    import c.universe._
    val self = c.Expr[MacroCombinableMultibandTile[T]](c.prefix.tree)
    val tree =
      q"""$self.combineDoubleTileCombiner(new geotrellis.macros.DoubleTileCombiner10 {
        val b0 = $b0; val b1 = $b1; val b2 = $b2; val b3 = $b3; val b4 = $b4
        val b5 = $b5; val b6 = $b6; val b7 = $b7; val b8 = $b8; val b9 = $b9
        def apply(b0: Double, b1: Double, b2: Double, b3: Double, b4: Double, b5: Double, b6: Double, b7: Double, b8: Double, b9: Double): Double =
          $f(b0, b1, b2, b3, b4, b5, b6, b7, b8, b9)
      })"""
    new InlineUtil[c.type](c).inlineAndReset[T](tree)
  }
}
At this point, we can also infer what these two traits do:
- The MacroCombineFunctions trait provides the following capability: through macros it generates the overloads of combine / combineDouble that support fusing anywhere from 3 to 10 bands
- Each macro is implemented as follows:
  - Restrict the context (the call's prefix) to a class that implements MacroCombinableMultibandTile
  - Call the combineIntTileCombiner / combineDoubleTileCombiner method in that context, building the combiner argument it requires
  - Splice this call into the corresponding combine / combineDouble overload via the macro
- The macro definition therefore needs a combineIntTileCombiner / combineDoubleTileCombiner method that accepts the corresponding combiner
  - These methods are declared in MacroCombinableMultibandTile
  - The combiner types of the various arities are defined in IntTileCombiners.scala / DoubleTileCombiners.scala
In other words, MacroCombineFunctions defines the behavior while MacroCombinableMultibandTile defines the required type contract; that is why these two traits must be mixed in together by the same class.
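Putting the pieces together, here is roughly what a three-band combineDouble call expands into at compile time, following the pattern of intCombine3_impl above (a simplified illustration, not the exact generated tree; tile is assumed to be a MultibandTile):
// The call
//   tile.combineDouble(0, 1, 2) { (b0, b1, b2) => (b0 + b1 + b2) / 3.0 }
// expands, approximately, into a call to combineDoubleTileCombiner with an
// anonymous DoubleTileCombiner3 that wraps the user function
val result: Tile = tile.combineDoubleTileCombiner(new geotrellis.macros.DoubleTileCombiner3 {
  val b0 = 0; val b1 = 1; val b2 = 2
  def apply(b0: Double, b1: Double, b2: Double): Double = (b0 + b1 + b2) / 3.0
})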
3.2 The MacroMultibandCombiners trait
From the code of MacroCombinableMultibandTile and MacroCombineFunctions, we can see that a class mixing in these two traits only needs to implement the concrete combineIntTileCombiner / combineDoubleTileCombiner methods in order to obtain the combine / combineDouble overloads.
The MacroMultibandCombiners trait is precisely what implements those combineIntTileCombiner / combineDoubleTileCombiner methods. It is therefore what gives ArrayMultibandTile combine methods that can fuse up to 10 bands at a time:
[the code will be generated in geotrellis/raster/MacroMultibandCombiners.scala]
package geotrellis.raster

import geotrellis.macros._
import spire.syntax.cfor._

trait MacroMultibandCombiners { self: MultibandTile =>
  def combineIntTileCombiner(combiner: IntTileCombiner3): Tile = {
    val band0 = band(combiner.b0); val band1 = band(combiner.b1); val band2 = band(combiner.b2)
    val result = ArrayTile.empty(targetCellType, cols, rows)
    val arr = Array.ofDim[Int](bandCount)
    cfor(0)(_ < rows, _ + 1) { row =>
      cfor(0)(_ < cols, _ + 1) { col =>
        result.set(col, row, combiner(band0.get(col, row), band1.get(col, row), band2.get(col, row)))
      }
    }
    result
  }
  // ... and so on
  def combineDoubleTileCombiner(combiner: DoubleTileCombiner10): Tile = {
    val band0 = band(combiner.b0); val band1 = band(combiner.b1); val band2 = band(combiner.b2)
    val band3 = band(combiner.b3); val band4 = band(combiner.b4); val band5 = band(combiner.b5)
    val band6 = band(combiner.b6); val band7 = band(combiner.b7); val band8 = band(combiner.b8)
    val band9 = band(combiner.b9)
    val result = ArrayTile.empty(targetCellType, cols, rows)
    val arr = Array.ofDim[Int](bandCount)
    cfor(0)(_ < rows, _ + 1) { row =>
      cfor(0)(_ < cols, _ + 1) { col =>
        result.setDouble(col, row, combiner(
          band0.getDouble(col, row), band1.getDouble(col, row), band2.getDouble(col, row),
          band3.getDouble(col, row), band4.getDouble(col, row), band5.getDouble(col, row),
          band6.getDouble(col, row), band7.getDouble(col, row), band8.getDouble(col, row),
          band9.getDouble(col, row)))
      }
    }
    result
  }
}
It is not hard to see that the principle is the same as the two-band fusion discussed above, except that a [Type]TileCombiner[arity] takes the place of the function f(b0, b1).
Why do we need to define a new type to wrap the function f(b0, b1, ...)? The reason is actually simple: the concrete f(b0, b1, ...) is only supplied at the call site, whereas the macro is expanded at compile time, so the code the macro generates needs a pre-defined wrapper type it can instantiate. Without such a dedicated wrapper, the generated code could not be compiled; hence a dedicated wrapper object (the combiner) is defined for exactly this purpose.
4. Summary
In this article, we started from computing NDVI and examined how GeoTrellis implements its basic computation functionality.
In the next article, we will continue by analyzing how GeoTrellis implements more complex operations, such as band convolution.