Go Generics in functional style

30 May 2022 // Erik Lupander

I recently needed to group, aggregate and filter some form data from our yearly tech radar (2021 installment, Swedish). Instead of coping with an ever-increasing amount of frustration with Google Sheets, I decided to massage the data using Go 1.18 and the lodash-inspired lo library.

1. Introduction

The situation was as follows: we use Google Forms to snapshot which technologies and methodologies Callista consultants are currently using. While Google Forms has some basic charting capabilities, any more advanced use of the collected form data requires an external tool. The most readily available one is the “Export to Google Sheets” button. Voilà, all the data neatly arranged in a Google Sheets sheet!

However, I’m probably the worst Excel/Numbers/Sheets user in northern Europe. Apart from the most basic of use-cases, I tend to just mess up my data or produce ugly-looking charts with deficiencies such as bars in bar charts not being sorted… at all.

So I either have to manually copy-paste, re-order and perform other more-or-less occult rituals with the data, or I can use a general-purpose programming language such as Go to map/aggregate/reduce/sort the data to my heart’s liking. The neatly arranged data can then be charted using some charting library, or even imported (as CSV etc.) back into Sheets.

Read on for the Go-based solution, utilizing Go 1.18 generics and some functional-style coding - this little fellow was definitely happy about it!

(Gopher image generated using https://gopherize.me)

2. Solution

The data I wanted to massage came from two columns, where each row represented a respondent’s answer to “Main programming language” and their “Satisfaction” (1-10) with using said programming language. The data looks like this:

Java,9
Javascript,6
Java,9
Typescript,6
Java,8
Prolog,9
Swift,7
Javascript,5
Go,8

The objective is to produce a table with each language and its average satisfaction score, sorted by score in descending order, e.g.:

| Language       | Score |
| Swift          | 9.6   |
| Prolog         | 6.8   |
| Cobol          | 5.133 |
| VBScript       | 1.5   |

(just kidding, no one uses Cobol or VBScript at our company… I hope.)

So, armed with my IDE of choice, a fresh Go 1.18.2 installation and the latest version of the lo library, which offers lodash-like functionality for working with slices and maps in a functional style (including generics support), I set out on a fun little foray into not-so-Go-like Go code!

To keep this blog post short, let’s skip the data procurement part and assume we obtain the data as a CSV-formatted string:

data := `
Java,9
Javascript,6
Java,9
Java,8
Typescript,8
Kotlin,9` // truncated for brevity

2.1 Parse the CSV

This step involves nothing new or fancy; we just stuff the data into a csv.Reader:

reader := csv.NewReader(bytes.NewBuffer([]byte(data)))
records, err := reader.ReadAll()
if err != nil {
    panic(err.Error())
}

This step produces a [][]string matrix where each “row” represents the respondent’s main language and the corresponding satisfaction score using that language.
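For reference, here is a self-contained sketch of the parsing step. It is a variant, not the original code: it uses strings.NewReader instead of wrapping the string in a bytes.Buffer, sets FieldsPerRecord so that malformed rows are reported as errors, and returns the error instead of panicking. The helper name parseRecords is hypothetical. Note that csv.Reader skips empty lines, which is why the leading newline in the raw string literal does no harm.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseRecords (hypothetical helper) reads CSV text into a [][]string matrix.
// csv.Reader skips empty lines, such as the leading newline in a raw string literal.
func parseRecords(data string) ([][]string, error) {
	reader := csv.NewReader(strings.NewReader(data))
	// Require exactly two fields per row: language and score.
	reader.FieldsPerRecord = 2
	return reader.ReadAll()
}

func main() {
	data := `
Java,9
Javascript,6
Go,8`
	records, err := parseRecords(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(records), records[0]) // 3 [Java 9]
}
```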

2.2 Old-school implementation

In all honesty - while experimenting with functional-style libraries and generics is great fun, it took me quite some time to come up with something I was even semi-happy with. So, for reference, I started out by implementing a solution in plain ol’ Go:

// Result keeps track of number of respondents and accumulated score
// per programming language, with storage for the average.
type Result struct {
    Name  string
    Count int // number of respondents for this particular language
    Sum   int // the total "score" sum
    Avg   float64
}

// Simple map of programming language => Result
resultsMap := make(map[string]Result)

// Iterate over the [][]string (records)
for _, row := range records {

    // key == language, val == score
    key := row[0]
    val, _ := strconv.Atoi(row[1])

    // Check if present in map or not. If present, update the existing entry.
    existing, found := resultsMap[key]
    if !found {
        resultsMap[key] = Result{key, 1, val, 0.0}
    } else {
        resultsMap[key] = Result{key, existing.Count + 1, existing.Sum + val, 0.0}
    }
}

// Prepare a slice for the results, so we can sort it
finalResults := make([]Result, 0, len(resultsMap))

// Iterate over the map and calculate the Avg
// using the stored sum and counts.
for _, v := range resultsMap {
    v.Avg = float64(v.Sum) / float64(v.Count)
    finalResults = append(finalResults, v)
}

// Sort by average, descending
sort.Slice(finalResults, func(i, j int) bool {
    return finalResults[i].Avg > finalResults[j].Avg
})

// Print the results using some Printf wizardry
for _, v := range finalResults {
    fmt.Printf("| %-12s | %-12.2f|\n", v.Name, v.Avg)
}

Output:

| Scala        | 10.00       |
| Kotlin       | 9.00        |
| Swift        | 9.00        |
| Go           | 8.50        |
| Typescript   | 8.25        |
| Java         | 7.25        |
| Javascript   | 6.67        |

Well. The solution above does its job fine, and it only took me a few minutes to hack together. However, non-Gophers may find the code somewhat ugly and verbose: the map is updated by hand while processing the CSV data, the values have to be copied manually from the map to a slice, etc.
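One detail worth noting about the old-school implementation: Go randomizes map iteration order and sort.Slice is not stable, so languages with equal averages (Kotlin and Swift both score 9.00 in the output) may be printed in either order from run to run. A minimal sketch of a deterministic variant, assuming a secondary sort on the language name is acceptable, could look like this (the helper name sortResults is hypothetical):

```go
package main

import (
	"fmt"
	"sort"
)

// Result mirrors the struct from the old-school implementation.
type Result struct {
	Name  string
	Count int
	Sum   int
	Avg   float64
}

// sortResults (hypothetical helper) orders by average score, descending,
// and breaks ties by language name, ascending, so every run prints the same order.
func sortResults(results []Result) {
	sort.Slice(results, func(i, j int) bool {
		if results[i].Avg != results[j].Avg {
			return results[i].Avg > results[j].Avg
		}
		return results[i].Name < results[j].Name
	})
}

func main() {
	results := []Result{
		{Name: "Swift", Avg: 9.0},
		{Name: "Go", Avg: 8.5},
		{Name: "Kotlin", Avg: 9.0},
	}
	sortResults(results)
	for _, r := range results {
		fmt.Printf("| %-12s | %-12.2f|\n", r.Name, r.Avg)
	}
}
```

With the tie-break in place, Kotlin always precedes Swift.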

2.3 Functional and generics-style approach

A functional-style solution with lo and generics ends up somewhat stuttering, since the lo API doesn’t support fluent-style (method-chaining) code. Many similar libraries allow a style such as:

// Pseudo-code!!!
var result = Stream.from(data).
  GroupBy(...).
  Filter(...).
  Reduce(...)

lo is more about running one function on your data at a time and storing the result in a new variable (or possibly inlining). We start by using Map to process the CSV [][]string:

// Use Map to convert from [][]string to
// []lo.Tuple2[string, int], i.e. {A: name, B: score}
tupleList := lo.Map(records, func(row []string, _ int) lo.Tuple2[string, int] {
    val, _ := strconv.Atoi(row[1])
    return lo.T2(row[0], val)
})

tupleList is now of type []lo.Tuple2[string,int]. Notice the use of generics here, where the “output” type of the Map operation is the lo.Tuple2 type that takes two type arguments within the square brackets, i.e. lo.Tuple2[string, int] in our case. Type-safe tuples, yay!

The example above also relies heavily on type inference. Without type inference, the same code becomes quite verbose:

tupleList := lo.Map[[]string, lo.Tuple2[string, int]](records, func(row []string, _ int) lo.Tuple2[string, int] {
    val, _ := strconv.Atoi(row[1])
    return lo.Tuple2[string, int]{A: row[0], B: val}
})

See how lo.Map now takes explicit “in” and “out” type arguments. The return statement is also more verbose, using a composite literal where the type arguments are required. The first example instead uses the lo.T2(..) helper function, which lets the compiler infer the type arguments from the values passed to it.
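To make the inference point concrete without depending on lo, here is a from-scratch sketch. Tuple2, T2 and Map below are written only to mirror the shapes used in this post (a pair with fields A and B, a constructor helper, and a callback receiving item and index); they are not lo's actual source, and toTuples is a hypothetical helper. Both calls in main produce the same result, one with inferred and one with explicit type arguments.

```go
package main

import (
	"fmt"
	"strconv"
)

// Tuple2 mirrors the shape of lo.Tuple2: a pair with typed fields A and B.
type Tuple2[A any, B any] struct {
	A A
	B B
}

// T2 lets the compiler infer both type arguments from the values passed in.
func T2[A any, B any](a A, b B) Tuple2[A, B] {
	return Tuple2[A, B]{A: a, B: b}
}

// Map applies iteratee to every element, passing the element and its index.
func Map[T any, R any](collection []T, iteratee func(item T, index int) R) []R {
	result := make([]R, len(collection))
	for i, item := range collection {
		result[i] = iteratee(item, i)
	}
	return result
}

// toTuples (hypothetical helper) uses inference: T = []string, R = Tuple2[string, int].
func toTuples(records [][]string) []Tuple2[string, int] {
	return Map(records, func(row []string, _ int) Tuple2[string, int] {
		val, _ := strconv.Atoi(row[1])
		return T2(row[0], val)
	})
}

func main() {
	records := [][]string{{"Java", "9"}, {"Go", "8"}}
	inferred := toTuples(records)
	// Same operation with explicit type arguments and a composite literal.
	explicit := Map[[]string, Tuple2[string, int]](records, func(row []string, _ int) Tuple2[string, int] {
		val, _ := strconv.Atoi(row[1])
		return Tuple2[string, int]{A: row[0], B: val}
	})
	fmt.Println(inferred, explicit) // [{Java 9} {Go 8}] [{Java 9} {Go 8}]
}
```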

Next, we group the tupleList slice by its string part into a map[string][]lo.Tuple2[string, int], where the map key is the name of the programming language and its value is a slice of lo.Tuple2 with all the entries for the given key.

groupedTuples := lo.GroupBy(tupleList, func(in lo.Tuple2[string, int]) string {
    return in.A
})
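Conceptually, a generic GroupBy only needs a comparable key type to use as the map key. The following sketch is written from scratch to illustrate the idea and is not lo's implementation; to stay dependency-free it groups a small hypothetical pair struct instead of lo.Tuple2. Items keep their input order within each group.

```go
package main

import "fmt"

// GroupBy collects items into a map keyed by whatever iteratee returns.
// The key type U must be comparable so it can be used as a map key.
func GroupBy[T any, U comparable](collection []T, iteratee func(item T) U) map[U][]T {
	result := make(map[U][]T)
	for _, item := range collection {
		key := iteratee(item)
		result[key] = append(result[key], item)
	}
	return result
}

// pair is a hypothetical stand-in for lo.Tuple2[string, int].
type pair struct {
	Lang  string
	Score int
}

func main() {
	answers := []pair{{"Java", 9}, {"Go", 8}, {"Java", 8}}
	grouped := GroupBy(answers, func(p pair) string { return p.Lang })
	fmt.Println(grouped["Java"], grouped["Go"]) // [{Java 9} {Java 8}] [{Go 8}]
}
```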

The next step is to perform a reduce operation to sum the “score” for each language and then calculate the average, returning the result as a new map from language name to lo.Tuple2[string,float64]:

// MapValues takes the map[string][]lo.Tuple2 and an inlined function that executes the mapping
// per value in the map.
langWithAvgList := lo.MapValues(groupedTuples, func(items []lo.Tuple2[string, int], langName string) lo.Tuple2[string, float64] {
    // Reduce iterates over items, summing the score. The third accumulator
    // argument is the element index, which is not needed here.
    sum := lo.Reduce(items, func(agg int, item lo.Tuple2[string, int], _ int) int {
        return agg + item.B
    }, 0)

    // The average score is calculated by dividing the sum by the number of items. The type conversions to float64 could be prettier...
    avg := float64(sum) / float64(len(items))

    // Each MapValues iteration yields a new lo.Tuple2, with the langName + avg score.
    return lo.T2(langName, avg)
})

I think it’s pretty neat! Go function literals could definitely use a short-hand syntax with even more type inference, such as the one explored in an experiment posted by Robert Griesemer, one of the core members of the Go team. Nevertheless, for people coming from a more functional-style programming language, this Map/Reduce-esque style may be more readable than plain old Go with for statements.
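The MapValues + Reduce combination can also be sketched from scratch to show what each step does. These functions only mirror the callback shapes used in this post (Reduce's accumulator receives the element index as its third argument, MapValues' iteratee receives value and key); they are not lo's source, and averages is a hypothetical helper. Dividing by len(items) is safe here because groups produced by GroupBy are never empty.

```go
package main

import "fmt"

// Reduce folds collection into a single value, starting from initial.
// The accumulator also receives the element's index.
func Reduce[T any, R any](collection []T, accumulator func(agg R, item T, index int) R, initial R) R {
	agg := initial
	for i, item := range collection {
		agg = accumulator(agg, item, i)
	}
	return agg
}

// MapValues builds a new map with the same keys and transformed values.
func MapValues[K comparable, V any, R any](in map[K]V, iteratee func(value V, key K) R) map[K]R {
	out := make(map[K]R, len(in))
	for k, v := range in {
		out[k] = iteratee(v, k)
	}
	return out
}

// averages (hypothetical helper) turns language => scores into language => average.
// Assumes every slice is non-empty, as GroupBy guarantees.
func averages(scores map[string][]int) map[string]float64 {
	return MapValues(scores, func(items []int, _ string) float64 {
		sum := Reduce(items, func(agg int, item int, _ int) int { return agg + item }, 0)
		return float64(sum) / float64(len(items))
	})
}

func main() {
	avg := averages(map[string][]int{"Go": {8, 9}, "Java": {9, 9, 8, 3}})
	fmt.Println(avg["Go"], avg["Java"]) // 8.5 7.25
}
```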

The final step is to extract the values from the map[string]lo.Tuple2[string,float64], sort them and finally print them. lo does not contain any sort functionality, but the experimental golang.org/x/exp module comes to our rescue with its brand-new generic slices package:

// Map values into slice.
values := lo.Values(langWithAvgList)

// Sort slice by score, descending. Modifies in-place, not very functional.
slices.SortFunc(values, func(a, b lo.Tuple2[string, float64]) bool {
    return a.B > b.B
})

// Use ForEach and some Printf tricks to output an ASCII table!
lo.ForEach(values, func(t lo.Tuple2[string, float64], _ int) {
    fmt.Printf("| %-12s | %-12.2f|\n", t.A, t.B)
})

The final output is as follows:

| Scala        | 10.00       |
| Kotlin       | 9.00        |
| Swift        | 9.00        |
| Go           | 8.50        |
| Typescript   | 8.25        |
| Java         | 7.25        |
| Javascript   | 6.67        |

That’s all! A functional-style aggregation using Go generics. Now I can take the data above and create a chart without unnecessary use of explicit language.
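A note on the sorting step: since Go 1.21 a generic slices package is part of the standard library, and its SortFunc takes a comparison function returning an int (negative, zero or positive) rather than the bool-returning less function used with golang.org/x/exp/slices in this post. A minimal sketch with the standard library version, assuming Go 1.21 or later, using a small hypothetical langScore struct in place of lo.Tuple2 and adding a name tie-break so equal scores sort deterministically:

```go
package main

import (
	"cmp"
	"fmt"
	"slices"
)

// langScore is a hypothetical stand-in for lo.Tuple2[string, float64].
type langScore struct {
	Name string
	Avg  float64
}

// sortByScore sorts in place: highest average first, ties by name.
func sortByScore(values []langScore) {
	slices.SortFunc(values, func(a, b langScore) int {
		// Comparing b to a (instead of a to b) yields descending order.
		if c := cmp.Compare(b.Avg, a.Avg); c != 0 {
			return c
		}
		return cmp.Compare(a.Name, b.Name)
	})
}

func main() {
	values := []langScore{{"Go", 8.5}, {"Swift", 9}, {"Scala", 10}, {"Kotlin", 9}}
	sortByScore(values)
	for _, v := range values {
		fmt.Printf("| %-12s | %-12.2f|\n", v.Name, v.Avg)
	}
}
```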

3. Conclusion

As far as programming tasks go, this little problem is a very simple one. I’m certain an Excel guru could have solved it just as easily as I did using plain Go. SQL should have been a breeze as well.

As for these programmatic solutions - which is the “best” way? The functional style with generics, or the idiomatic Go way of for loops and maps? I guess many hardcore gophers would write some variant of the old-school solution, probably with some more cleverness added. On the other hand, the functional-style approach may appeal to some due to its more declarative nature. After all, Map, GroupBy, MapValues, Reduce, Values, ForEach etc. are more-or-less universally well-known operations on “collections of values” in many programming languages and libraries.

What about generics? Well - I really love the type-safe lo.Tuple2[A, B] type. I wrote this code in VSCode, which does a pretty good job of suggesting where required type arguments go, and also helps remove unnecessary type arguments once a function call has been fully written, since the compiler can only infer type arguments once the full context is known. I guess one learns to write linter-friendly generic code quite quickly.

Go generics have some well-known limitations (see “The current generics implementation has the following known limitations:” in the Go 1.18 release notes), some of which make it slightly more difficult to produce certain API styles, such as fluent DSLs. For the particular use-case of functional-style Map/Reduce code, I think the lack of short-hand inline functions I touched on earlier may be an even bigger issue. While Go’s explicitness, orthogonality and conservative use of type inference are seen by many as key strengths, I do think that having to write out full func(..){} literals often reduces readability compared to, for example, the arrow-style declarations ( (a,b) => a+b ) seen in many other languages.

Thanks for reading! Perhaps I’ll expand upon this blog post at some point with direct integration with the Google Sheets API and some charting library! // Erik Lupander
