Spark ML Interaction (feature interaction, Cartesian product)

Description: Interaction is a Transformer.

It takes vector or double-valued columns and generates a single vector column that contains the product of all combinations of one value from each input column. For example, if the input consists of two vector columns with three dimensions each, the output column is a 9-dimensional vector.
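The combination rule described above can be sketched in plain Scala, without Spark. This is only an illustration: `InteractionSketch` and `interact` are hypothetical names, not part of the Spark API, and the element ordering (last input column varying fastest) is an assumption chosen to match the results shown in this article.

```scala
object InteractionSketch {
  // Hypothetical helper, not part of Spark: multiplies one value from each
  // input column, with the last column varying fastest. A scalar column is
  // represented as a one-element sequence.
  def interact(columns: Seq[Seq[Double]]): Seq[Double] =
    columns.foldLeft(Seq(1.0)) { (acc, col) =>
      for (a <- acc; b <- col) yield a * b
    }

  def main(args: Array[String]): Unit = {
    val scalar = Seq(1.0)          // a double-valued column
    val vec1 = Seq(1.0, 2.0, 3.0)  // a 3-dimensional vector column
    val vec2 = Seq(8.0, 4.0, 5.0)  // another 3-dimensional vector column
    // prints [8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]
    println(interact(Seq(scalar, vec1, vec2)).mkString("[", ",", "]"))
  }
}
```

The output has 1 x 3 x 3 = 9 elements, one per combination of input values.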

Parameter      Type            Description
setInputCols   Array[String]   Names of the DataFrame columns to interact. Each column must be numeric or vector.
setOutputCol   String          Name of the output column. The output type is vector.

Program example:

import org.apache.spark.ml.feature.{Interaction, VectorAssembler}
import org.apache.spark.sql.{DataFrame, SparkSession}

// getSparkSession() is assumed to be provided by the enclosing class (not shown).
def getDataFrame(sparkSession: SparkSession = this.getSparkSession()): DataFrame = {
    sparkSession.createDataFrame(Seq(
        (1, 1, 2, 3, 8, 4, 5),
        (2, 4, 3, 8, 7, 9, 8),
        (3, 6, 1, 9, 2, 3, 6),
        (4, 10, 8, 6, 9, 4, 5),
        (5, 9, 2, 7, 10, 7, 3),
        (6, 1, 1, 4, 2, 8, 4)
    )).toDF("id1", "id2", "id3", "id4", "id5", "id6", "id7")
}

def execute(dataFrame: DataFrame): Unit = {
    // Preprocessing: assemble the scalar columns into two 3-dimensional vectors
    val assembler1 = new VectorAssembler().setInputCols(Array("id2", "id3", "id4")).setOutputCol("vec1")
    val assembler2 = new VectorAssembler().setInputCols(Array("id5", "id6", "id7")).setOutputCol("vec2")
    val assembled = assembler2.transform(assembler1.transform(dataFrame))
    // Feature interaction (Cartesian product) of id1, vec1 and vec2
    val interaction = new Interaction()
        .setInputCols(Array("id1", "vec1", "vec2"))
        .setOutputCol("interactedCol")
    // Apply the transformation
    val interacted = interaction.transform(assembled)
    // Show the raw data, the transformed data and the resulting schema
    dataFrame.show()
    interacted.show(truncate = false)
    interacted.printSchema()
}

Raw data:

+---+---+---+---+---+---+---+
|id1|id2|id3|id4|id5|id6|id7|
+---+---+---+---+---+---+---+
|  1|  1|  2|  3|  8|  4|  5|
|  2|  4|  3|  8|  7|  9|  8|
|  3|  6|  1|  9|  2|  3|  6|
|  4| 10|  8|  6|  9|  4|  5|
|  5|  9|  2|  7| 10|  7|  3|
|  6|  1|  1|  4|  2|  8|  4|
+---+---+---+---+---+---+---+

Data results:

+---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+
|id1|id2|id3|id4|id5|id6|id7|vec1          |vec2          |interactedCol                                         |
+---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+
|1  |1  |2  |3  |8  |4  |5  |[1.0,2.0,3.0] |[8.0,4.0,5.0] |[8.0,4.0,5.0,16.0,8.0,10.0,24.0,12.0,15.0]            |
|2  |4  |3  |8  |7  |9  |8  |[4.0,3.0,8.0] |[7.0,9.0,8.0] |[56.0,72.0,64.0,42.0,54.0,48.0,112.0,144.0,128.0]     |
|3  |6  |1  |9  |2  |3  |6  |[6.0,1.0,9.0] |[2.0,3.0,6.0] |[36.0,54.0,108.0,6.0,9.0,18.0,54.0,81.0,162.0]        |
|4  |10 |8  |6  |9  |4  |5  |[10.0,8.0,6.0]|[9.0,4.0,5.0] |[360.0,160.0,200.0,288.0,128.0,160.0,216.0,96.0,120.0]|
|5  |9  |2  |7  |10 |7  |3  |[9.0,2.0,7.0] |[10.0,7.0,3.0]|[450.0,315.0,135.0,100.0,70.0,30.0,350.0,245.0,105.0] |
|6  |1  |1  |4  |2  |8  |4  |[1.0,1.0,4.0] |[2.0,8.0,4.0] |[12.0,48.0,24.0,12.0,48.0,24.0,48.0,192.0,96.0]       |
+---+---+---+---+---+---+---+--------------+--------------+------------------------------------------------------+
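To read a single element of interactedCol, it helps to map its position back to the input values that produced it. The sketch below is plain Scala and does not use Spark. `InteractionIndex` and `sourceIndices` are hypothetical names introduced for illustration, and the layout (last input column varying fastest) is an assumption inferred from the results above.

```scala
object InteractionIndex {
  // Hypothetical helper, not part of Spark: maps a position in the output
  // vector back to one index per input column, assuming the last input
  // column varies fastest.
  def sourceIndices(position: Int, dims: Seq[Int]): Seq[Int] =
    dims.foldRight((position, List.empty[Int])) { case (dim, (rest, acc)) =>
      (rest / dim, (rest % dim) :: acc)
    }._2

  def main(args: Array[String]): Unit = {
    val dims = Seq(1, 3, 3)          // sizes of id1 (scalar), vec1, vec2
    println(dims.product)            // output dimension: prints 9
    println(sourceIndices(4, dims))  // prints List(0, 1, 1)
  }
}
```

For the row with id1 = 4, position 4 of interactedCol is 128.0, which is id1 x vec1(1) x vec2(1) = 4 x 8.0 x 4.0.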

Practical application example:

None.

Added by bur147 on Fri, 15 Nov 2019 16:08:48 +0200