One day I was given the task of writing a word-counter algorithm on a whiteboard. The requirements were:

  • Print to the console every word and the number of times it appears in a 40,000-line input.
  • Make it as fast as possible.

To start, let's write it as simply as possible, and then see whether we can make it faster:

package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
)

func main() {
	fmt.Println(wordCounter())
}

func wordCounter() map[string]int {
	b, err := os.ReadFile("input.txt")
	if err != nil {
		panic(err)
	}
	inputText := string(b)
	mostFrequent := make(map[string]int)
	removeSpecial := regexp.MustCompile(`[^a-z]`)

	// strings.Fields splits on any whitespace (spaces and newlines),
	// so a word at the end of a line is not merged with the next line.
	for _, w := range strings.Fields(inputText) {
		w = strings.ToLower(w)
		w = removeSpecial.ReplaceAllString(w, "")
		mostFrequent[w]++
	}

	return mostFrequent
}

If I had to explain the code, it would go like this:

  1. Read the file into a string variable.
  2. Create the map structure that will hold a key (word) and a value (count).
  3. Create a regular expression that removes every character that is not a lowercase letter (punctuation like , ! ?).
  4. For each word in the text (splitting on whitespace):
  5. Lowercase it, run the regex over it, and increment that word's count in the map.
  6. When we finish, we print the map to the console.
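
For example, the per-word transformation on its own behaves like this (a standalone snippet, not part of the original post):

package main

import (
	"fmt"
	"regexp"
	"strings"
)

func main() {
	removeSpecial := regexp.MustCompile(`[^a-z]`)
	w := strings.ToLower("Hello!")            // "hello!"
	w = removeSpecial.ReplaceAllString(w, "") // "hello"
	fmt.Println(w)                            // counted under the key "hello"
}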

This works fine, but we are using Go, and every post you read is probably talking about goroutines and how easy it is to achieve concurrency. So what if I do the following?

package main

import (
	"fmt"
	"os"
	"regexp"
	"strings"
	"sync"
)

func main() {
	fmt.Println(wordCounterConcurrent())
}

func wordCounterConcurrent() map[string]int {
	b, err := os.ReadFile("input.txt")
	if err != nil {
		panic(err)
	}
	inputText := string(b)
	mostFrequent := make(map[string]int)
	removeSpecial := regexp.MustCompile(`[^a-z]`)

	wg := sync.WaitGroup{}

	for _, w := range strings.Fields(inputText) {
		wg.Add(1)
		go func(w2 string) {
			defer wg.Done()
			w2 = strings.ToLower(w2)
			w2 = removeSpecial.ReplaceAllString(w2, "")
			mostFrequent[w2]++ // unsynchronized map access from many goroutines
		}(w)
	}

	wg.Wait()
	return mostFrequent
}

The differences between this snippet and the first one are the following:

  1. We defined a WaitGroup that allows us to:
    • Increment a counter every time a goroutine is fired.
    • Block the main goroutine with wg.Wait() until all the other goroutines have finished.
    • Decrement the counter by calling wg.Done() every time a goroutine finishes.
  2. We moved the per-word work into an anonymous function, so we can use the go keyword to execute that function in a separate goroutine.
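
In isolation, the WaitGroup mechanics look like this (a minimal standalone sketch, not from the original post):

package main

import (
	"fmt"
	"sync"
)

func main() {
	wg := sync.WaitGroup{}
	for i := 0; i < 3; i++ {
		wg.Add(1) // increment the counter before firing the goroutine
		go func(n int) {
			defer wg.Done() // decrement the counter when this goroutine finishes
			fmt.Println("goroutine", n)
		}(i)
	}
	wg.Wait() // block until the counter drops back to zero
	fmt.Println("all done")
}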

Does our concurrent version work? Unfortunately, no. If we run it:

> go run -race main.go
fatal error: concurrent map read and map write

The runtime is telling us that several goroutines can execute this line at the same moment:

mostFrequent[w2] = mostFrequent[w2] + 1

Since the map must not be written by two goroutines at the same time, we need some sort of queue whose values are consumed by a single goroutine. Fortunately, Go has us covered with channels.
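
As an aside, another standard fix, not the one taken in this post, is to guard the map with a sync.Mutex; a hypothetical change to the goroutine body would look like this:

var mu sync.Mutex // guards mostFrequent

go func(w2 string) {
	defer wg.Done()
	w2 = strings.ToLower(w2)
	w2 = removeSpecial.ReplaceAllString(w2, "")
	mu.Lock()
	mostFrequent[w2]++ // only one goroutine at a time can touch the map
	mu.Unlock()
}(w)

Here, though, let's take the channel route.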

Doing the following modifications should work:

package main

import (
	"fmt"
	"os"
	"regexp"
	"runtime"
	"strings"
	"sync"
)

func main() {
	fmt.Println(wordCounterConcurrent())
}

func wordCounterConcurrent() map[string]int {
	runtime.GOMAXPROCS(runtime.NumCPU()) // use every CPU (already the default since Go 1.5)
	b, err := os.ReadFile("input.txt")
	if err != nil {
		panic(err)
	}
	inputText := string(b)
	mostFrequent := make(map[string]int)
	removeSpecial := regexp.MustCompile(`[^a-z]`)

	doneChan := make(chan bool)
	wordsChan := make(chan string)

	// The single consumer goroutine: the only one that touches the map.
	go func() {
		for {
			select {
			case w := <-wordsChan:
				mostFrequent[w]++
			case <-doneChan:
				return
			}
		}
	}()

	wg := sync.WaitGroup{}

	for _, w := range strings.Fields(inputText) {
		wg.Add(1)
		go func(w1 string) {
			defer wg.Done()
			w1 = strings.ToLower(w1)
			w1 = removeSpecial.ReplaceAllString(w1, "")
			wordsChan <- w1
		}(w)
	}

	wg.Wait()
	doneChan <- true
	return mostFrequent
}

But now the code has grown quite a bit, so let's look at it piece by piece.

To declare the channels we use the following syntax:

doneChan := make(chan bool)
wordsChan := make(chan string)

Now, if we look at the word goroutines, we can see that instead of operating on the map directly, we just push each value into a channel.

wordsChan <- w1

The map operations are then performed when we read a value from the channel (a blocking operation, since the channel is unbuffered):

w := <-wordsChan

For that we have the following consumer goroutine:

go func() {
    for {
        select {
        case w := <-wordsChan:
            mostFrequent[w] = mostFrequent[w] + 1
        case <-doneChan:
            return
        }
    }
}()

This piece of code reads from wordsChan and performs the map operation; it also reads from doneChan to know when to stop the goroutine.

The message on doneChan is only sent after all the word goroutines have finished:

wg.Wait()
doneChan <- true
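
As an aside, a hypothetical variation not explored in this post: making wordsChan buffered would let senders proceed without waiting for the consumer, but then wg.Wait() would no longer guarantee the buffer is drained, so the doneChan signal could cut the consumer off early and lose counts. The usual pattern in that case is to close the channel and range over it:

wordsChan := make(chan string, 1024) // buffered: senders rarely block
done := make(chan struct{})

go func() {
	for w := range wordsChan { // exits once wordsChan is closed and drained
		mostFrequent[w]++
	}
	close(done) // signal that every word has been counted
}()

// ...fire the word goroutines exactly as before...

wg.Wait()
close(wordsChan) // no more sends are coming
<-done           // wait for the consumer to finish draining
return mostFrequent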

Now if we run this algorithm, we will see that it no longer fails. Let's move on to benchmarking.

Benchmark

We are going to do this in two different ways: writing an idiomatic Go benchmark, and timing the binary with the command-line utility time.
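
The post doesn't show the benchmark file itself; a minimal sketch matching the output below (assuming both functions live in package main, in a file such as main_test.go) could be:

package main

import "testing"

func BenchmarkWordCounter(b *testing.B) {
	for i := 0; i < b.N; i++ {
		wordCounter()
	}
}

func BenchmarkWordCounterConcurrent(b *testing.B) {
	for i := 0; i < b.N; i++ {
		wordCounterConcurrent()
	}
}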

> go test -run=XXX -bench=.
BenchmarkWordCounter-8                       100          16685557 ns/op
BenchmarkWordCounterConcurrent-8              30          46959704 ns/op

This tells us that the sync version ran 100 iterations at roughly 16.7 ms (16,685,557 ns) each, while in the same time budget the concurrent version only managed 30 iterations, because each one took roughly 47 ms (46,959,704 ns).

Wait, What?

Let's double-check with the time utility:

> go build
> time ./go-word-counter
(SYNC)
./go-word-counter  0.02s user 0.00s system 57% cpu 0.048 total
(CONCURRENT)
./go-word-counter  0.31s user 0.04s system 362% cpu 0.096 total

So the sync version takes approximately 0.048 s at 57% CPU, and the concurrent one 0.096 s at 362% CPU.

We are sure the function is running concurrently (hence the CPU usage above 100%), but shouldn't that make it run faster? No.

If we look at the code, the work executed inside each word goroutine costs less than all the orchestration needed to spawn the goroutine and push the word through a channel.
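
One hypothetical way to tip that balance, not explored in the original post, is to amortize the orchestration: hand each goroutine a chunk of words and a private map, so there is one channel send per chunk instead of one per word. A sketch, reusing the same imports:

func wordCounterChunked(inputText string, chunkSize int) map[string]int {
	removeSpecial := regexp.MustCompile(`[^a-z]`)
	words := strings.Fields(inputText)

	partials := make(chan map[string]int)
	chunks := 0
	for start := 0; start < len(words); start += chunkSize {
		end := start + chunkSize
		if end > len(words) {
			end = len(words)
		}
		chunks++
		go func(chunk []string) {
			local := make(map[string]int) // private to this goroutine: no race
			for _, w := range chunk {
				w = removeSpecial.ReplaceAllString(strings.ToLower(w), "")
				local[w]++
			}
			partials <- local // one send per chunk, not per word
		}(words[start:end])
	}

	// Merge the partial maps in a single goroutine.
	mostFrequent := make(map[string]int)
	for i := 0; i < chunks; i++ {
		for w, n := range <-partials {
			mostFrequent[w] += n
		}
	}
	return mostFrequent
}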

Things would be different if we added a sleep to the per-word work (in both versions), simulating more expensive processing:

w1 = strings.ToLower(w1)
w1 = removeSpecial.ReplaceAllString(w1, "")
time.Sleep(200 * time.Nanosecond) // a bare 200 also means 200 ns, since Duration's unit is the nanosecond

And then the benchmark:

BenchmarkWordCounter-8                        10         171650568 ns/op
BenchmarkWordCounterConcurrent-8              20          54229310 ns/op

In this scenario we can clearly see how the concurrent way outperforms the sync one.

So we started with a single-goroutine algorithm, evolved it to run on multiple goroutines with the orchestration needed to make that work, and finally benchmarked and compared both versions.

The takeaways: avoid premature optimization, keep it simple (KISS); and, most importantly, concurrency only pays off when the work inside each goroutine is CPU- or time-intensive enough to outweigh the orchestration overhead.

πŸ‘‰ Repository with source files used πŸ‘ˆ