5, GO programming mode: MAP-REDUCE

In this article, we will learn about the three very important operations of map, Reduce and Filter in functional programming. These three operations can make us carry out some data processing very conveniently and flexibly - in most cases, our programs are used to invert data, especially for some business scenarios that need statistics, Map/Reduce/Filter is a very common way to play. Here are some examples:

Basic example

Map example

In the following program code, we write two Map functions, which require two parameters,

  • One is string array [] string, which indicates a string of data to be processed
  • The other is a function func(s string) string or func(s string) int
func MapStrToStr(arr []string, fn func(s string) string) []string {
    var newArray = []string{}
    for _, it := range arr {
        newArray = append(newArray, fn(it))
    }
    return newArray
}

func MapStrToInt(arr []string, fn func(s string) int) []int {
    var newArray = []int{}
    for _, it := range arr {
        newArray = append(newArray, fn(it))
    }
    return newArray
}

The entire Map function runs in a very similar way. The function body is traversing the array of the first parameter. Then, the function of second parameters is called, then the value is combined into another array to return.

So we can use these two functions like this:

var list = []string{"Hao", "Chen", "MegaEase"}

x := MapStrToStr(list, func(s string) string {
    return strings.ToUpper(s)
})
fmt.Printf("%v\n", x)
//["HAO", "CHEN", "MEGAEASE"]

y := MapStrToInt(list, func(s string) int {
    return len(s)
})
fmt.Printf("%v\n", y)
//[3, 4, 8]

We can see that we passed the function to the first MapStrToStr() to convert it to uppercase, so the resulting array becomes fully uppercase. We passed the length to MapStrToInt(), so the resulting array is the length of each string.

Let's take another look at the functions of Reduce and Filter.

Reduce example
func Reduce(arr []string, fn func(s string) int) int {
    sum := 0
    for _, it := range arr {
        sum += fn(it)
    }
    return sum
}

var list = []string{"Hao", "Chen", "MegaEase"}

x := Reduce(list, func(s string) int {
    return len(s)
})
fmt.Printf("%v\n", x)
// 15
Filter example
func Filter(arr []int, fn func(n int) bool) []int {
    var newArray = []int{}
    for _, it := range arr {
        if fn(it) {
            newArray = append(newArray, it)
        }
    }
    return newArray
}

var intset = []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
out := Filter(intset, func(n int) bool {
   return n%2 == 1
})
fmt.Printf("%v\n", out)

out = Filter(intset, func(n int) bool {
    return n > 5
})
fmt.Printf("%v\n", out)

The following figure is a metaphor that vividly illustrates the business semantics of map reduce, which is very useful in data processing.

Business example

Through the above examples, you may understand that Map/Reduce/Filter is just a kind of control logic, and the real business logic is defined by the data and function passed to them. Yes, this is a classic programming mode of separating and decoupling "business logic" and "control logic". Let's take a look at a code with business significance to strengthen your understanding of what is the separation of "control logic" and business logic.

Employee information

First, we have an employee object and some data

type Employee struct {
    Name     string
    Age      int
    Vacation int
    Salary   int
}

var list = []Employee{
    {"Hao", 44, 0, 8000},
    {"Bob", 34, 10, 5000},
    {"Alice", 23, 5, 9000},
    {"Jack", 26, 0, 4000},
    {"Tom", 48, 9, 7500},
    {"Marry", 29, 0, 6000},
    {"Mike", 32, 8, 4000},
}
Related Reduce/Fitler functions

Then, we have the following functions:

func EmployeeCountIf(list []Employee, fn func(e *Employee) bool) int {
    count := 0
    for i, _ := range list {
        if fn(&list[i]) {
            count += 1
        }
    }
    return count
}

func EmployeeFilterIn(list []Employee, fn func(e *Employee) bool) []Employee {
    var newList []Employee
    for i, _ := range list {
        if fn(&list[i]) {
            newList = append(newList, list[i])
        }
    }
    return newList
}

func EmployeeSumIf(list []Employee, fn func(e *Employee) int) int {
    var sum = 0
    for i, _ := range list {
        sum += fn(&list[i])
    }
    return sum
}

Briefly explain:

  • Employeecontutif and EmployeeSumIf are respectively used to calculate the number or total number of conditions. They are the semantics of Filter + Reduce.
  • EmployeeFilterIn is to filter according to certain conditions. It's the semantics of Fitler.
    Various customized statistical examples

So we can have the following code.

1) Count how many employees are over 40 years old

old := EmployeeCountIf(list, func(e *Employee) bool {
    return e.Age > 40
})
fmt.Printf("old people: %d\n", old)
//old people: 2

2) Count how many employees earn more than 6000

high_pay := EmployeeCountIf(list, func(e *Employee) bool {
    return e.Salary >= 6000
})
fmt.Printf("High Salary people: %d\n", high_pay)
//High Salary people: 4

3) List employees with or without leave

no_vacation := EmployeeFilterIn(list, func(e *Employee) bool {
    return e.Vacation == 0
})
fmt.Printf("People no vacation: %v\n", no_vacation)
//People no vacation: [{Hao 44 0 8000} {Jack 26 0 4000} {Marry 29 0 6000}]

4) Calculate the total salary of all employees

total_pay := EmployeeSumIf(list, func(e *Employee) int {
    return e.Salary
})

fmt.Printf("Total Salary: %d\n", total_pay)
//Total Salary: 43500

5) Calculate the total salary of employees under the age of 30

younger_pay := EmployeeSumIf(list, func(e *Employee) int {
    if e.Age < 30 {
        return e.Salary
    } 
    return 0
})

Generic map reduce

We can see that the above map reduce needs to write different versions of map reduce because of different types of data to be processed, although their codes look very similar. Therefore, generic programming will be brought out here. Go language does not support generics at the time of writing this article (Note: Russ Cox, technical director of go development team, confirmed that go generics (type parameter) will be implemented in go version 1.18, i.e. February 2022) in an e mail on go Lang dev on November 21, 2012).

Simple version of Generic Map

Therefore, at present, the generic type of Go language can only be completed by interface{} + reflect. Interface {} can be understood as void * in C, Object in Java, and reflect is the reflection mechanism package of Go, which is used to check the type at run time.

Let's take a look at how to write a very simple generic Map function without any type checking.

func Map(data interface{}, fn interface{}) []interface{} {
    vfn := reflect.ValueOf(fn)
    vdata := reflect.ValueOf(data)
    result := make([]interface{}, vdata.Len())

    for i := 0; i < vdata.Len(); i++ {
        result[i] = vfn.Call([]reflect.Value{vdata.Index(i)})[0].Interface()
    }
    return result
}

In the code above,

  • Through reflect Valueof() to obtain the value of interface {}, one of which is the data vdata and the other is the function vfn,
  • Then through VFN Call () method to call the function through [] refelct Value {Vdata. Index (I)} to get the data.
    The grammar of reflection in Go language is still a little puzzling, but you can still understand it by simply reading the manual. I don't talk about reflection in this article, so please use Google related tutorials for relevant basic knowledge.

So, we can have the following code -- different types of data can use the same logical Map() code.

square := func(x int) int {
  return x * x
}
nums := []int{1, 2, 3, 4}

squared_arr := Map(nums,square)
fmt.Println(squared_arr)
//[1 4 9 16]



upcase := func(s string) string {
  return strings.ToUpper(s)
}
strs := []string{"Hao", "Chen", "MegaEase"}
upstrs := Map(strs, upcase);
fmt.Println(upstrs)
//[HAO CHEN MEGAEASE]

But because reflection is a runtime matter, if something goes wrong with the type, there will be runtime errors. For example:

x := Map(5, 5)
fmt.Println(x)

The above code can be easily compiled, but there is a problem at runtime. It is still a panic error

panic: reflect: call of reflect.Value.Len on int Value

goroutine 1 [running]:
reflect.Value.Len(0x10b5240, 0x10eeb58, 0x82, 0x10716bc)
        /usr/local/Cellar/go/1.15.3/libexec/src/reflect/value.go:1162 +0x185
main.Map(0x10b5240, 0x10eeb58, 0x10b5240, 0x10eeb60, 0x1, 0x14, 0x0)
        /Users/chenhao/.../map.go:12 +0x16b
main.main()
        /Users/chenhao/.../map.go:42 +0x465
exit status 2
Robust version of Generic Map

Therefore, if we want to write a robust program, we need to do type checking ourselves for this "excessive genericity" with interface {}. The following is a Map code with type check:

func Transform(slice, function interface{}) interface{} {
  return transform(slice, function, false)
}

func TransformInPlace(slice, function interface{}) interface{} {
  return transform(slice, function, true)
}

func transform(slice, function interface{}, inPlace bool) interface{} {

  //check the <code data-enlighter-language="raw" class="EnlighterJSRAW">slice</code> type is Slice
  sliceInType := reflect.ValueOf(slice)
  if sliceInType.Kind() != reflect.Slice {
    panic("transform: not slice")
  }

  //check the function signature
  fn := reflect.ValueOf(function)
  elemType := sliceInType.Type().Elem()
  if !verifyFuncSignature(fn, elemType, nil) {
    panic("trasform: function must be of type func(" + sliceInType.Type().Elem().String() + ") outputElemType")
  }

  sliceOutType := sliceInType
  if !inPlace {
    sliceOutType = reflect.MakeSlice(reflect.SliceOf(fn.Type().Out(0)), sliceInType.Len(), sliceInType.Len())
  }
  for i := 0; i < sliceInType.Len(); i++ {
    sliceOutType.Index(i).Set(fn.Call([]reflect.Value{sliceInType.Index(i)})[0])
  }
  return sliceOutType.Interface()

}

func verifyFuncSignature(fn reflect.Value, types ...reflect.Type) bool {

  //Check it is a funciton
  if fn.Kind() != reflect.Func {
    return false
  }
  // NumIn() - returns a function type's input parameter count.
  // NumOut() - returns a function type's output parameter count.
  if (fn.Type().NumIn() != len(types)-1) || (fn.Type().NumOut() != 1) {
    return false
  }
  // In() - returns the type of a function type's i'th input parameter.
  for i := 0; i < len(types)-1; i++ {
    if fn.Type().In(i) != types[i] {
      return false
    }
  }
  // Out() - returns the type of a function type's i'th output parameter.
  outType := types[len(types)-1]
  if outType != nil && fn.Type().Out(0) != outType {
    return false
  }
  return true
}

The above code is complicated at once. It can be seen that complex code is where exceptions are handled. I'm not going to Walk through all the codes. Don't look at too many codes, but you can still understand them. Here are some key points in the codes:

  • The Map function is not used in the code. Because it conflicts with the data structure and key meaning, Transform is used, which comes from the naming in the C++ STL library.
  • There are two versions of the function, one is to return a new array – Transform(), and the other is to "complete in place" – TransformInPlace()
  • In the main function, use the Kind() method to check whether the data type is Slice and the function type is Func
  • Checking the parameters and return types of the function is done through verifyFuncSignature(), where:
    • NumIn() – used to check the "input parameter" of the function
    • NumOut() is used to check the "return value" of the function
  • If you need to generate a Slice, you will use reflect Makeslice().

Well, with the above code, our code can be used happily:

Can be used for string arrays

list := []string{"1", "2", "3", "4", "5", "6"}
result := Transform(list, func(a string) string{
    return a +a +a
})
//{"111","222","333","444","555","666"}

Can be used to shape arrays

list := []int{1, 2, 3, 4, 5, 6, 7, 8, 9}
TransformInPlace(list, func (a int) int {
  return a*3
})
//{3, 6, 9, 12, 15, 18, 21, 24, 27}

Can be used for structures

var list = []Employee{
    {"Hao", 44, 0, 8000},
    {"Bob", 34, 10, 5000},
    {"Alice", 23, 5, 9000},
    {"Jack", 26, 0, 4000},
    {"Tom", 48, 9, 7500},
}

result := TransformInPlace(list, func(e Employee) Employee {
    e.Salary += 1000
    e.Age += 1
    return e
})
Robust version of Generic Reduce

Similarly, the Reduce code of the generic version is as follows:

func Reduce(slice, pairFunc, zero interface{}) interface{} {
  sliceInType := reflect.ValueOf(slice)
  if sliceInType.Kind() != reflect.Slice {
    panic("reduce: wrong type, not slice")
  }

  len := sliceInType.Len()
  if len == 0 {
    return zero
  } else if len == 1 {
    return sliceInType.Index(0)
  }

  elemType := sliceInType.Type().Elem()
  fn := reflect.ValueOf(pairFunc)
  if !verifyFuncSignature(fn, elemType, elemType, elemType) {
    t := elemType.String()
    panic("reduce: function must be of type func(" + t + ", " + t + ") " + t)
  }

  var ins [2]reflect.Value
  ins[0] = sliceInType.Index(0)
  ins[1] = sliceInType.Index(1)
  out := fn.Call(ins[:])[0]

  for i := 2; i < len; i++ {
    ins[0] = out
    ins[1] = sliceInType.Index(i)
    out = fn.Call(ins[:])[0]
  }
  return out.Interface()
}
Robust version of Generic Filter

Similarly, the Filter code of the generic version is as follows (also divided into two versions of "local calculation":

func Filter(slice, function interface{}) interface{} {
  result, _ := filter(slice, function, false)
  return result
}

func FilterInPlace(slicePtr, function interface{}) {
  in := reflect.ValueOf(slicePtr)
  if in.Kind() != reflect.Ptr {
    panic("FilterInPlace: wrong type, " +
      "not a pointer to slice")
  }
  _, n := filter(in.Elem().Interface(), function, true)
  in.Elem().SetLen(n)
}

var boolType = reflect.ValueOf(true).Type()

func filter(slice, function interface{}, inPlace bool) (interface{}, int) {

  sliceInType := reflect.ValueOf(slice)
  if sliceInType.Kind() != reflect.Slice {
    panic("filter: wrong type, not a slice")
  }

  fn := reflect.ValueOf(function)
  elemType := sliceInType.Type().Elem()
  if !verifyFuncSignature(fn, elemType, boolType) {
    panic("filter: function must be of type func(" + elemType.String() + ") bool")
  }

  var which []int
  for i := 0; i < sliceInType.Len(); i++ {
    if fn.Call([]reflect.Value{sliceInType.Index(i)})[0].Bool() {
      which = append(which, i)
    }
  }

  out := sliceInType

  if !inPlace {
    out = reflect.MakeSlice(sliceInType.Type(), len(which), len(which))
  }
  for i := range which {
    out.Index(i).Set(sliceInType.Index(which[i]))
  }

  return out.Interface(), len(which)
}

Postscript

There are several outstanding matters:

1) Using reflection to do these things, there will be a problem, that is, the performance of the code will be very poor. Therefore, the above code cannot be used where you need high performance. How to solve this problem will be discussed in the next article in this series.

2) The above code refers to the version of Rob Pike. His code is in github.com/robpike/filter

3) Is it difficult For programmers to write in the official map / pike library all over the world? Do you want our official to write it For you? I wrote this code many years ago, but I've never used it once. I still like to use "For loop". I think you'd better use "For loop" with me.

Personally, I think Map/Reduce is still very useful in data processing. Rob Pike may not write much code of "business logic" at ordinary times, so he may not know how often business changes are

Of course, it's up to you to judge whether it's good or not, but learning more programming modes is helpful to yourself and is also very helpful.

(end of the full text) this article is not made by myself. It reprints the left ear mouse blog and its source Cool shell – CoolShell

Keywords: Go

Added by supermerc on Sun, 06 Feb 2022 20:32:39 +0200