'Go CountBits exercise: n strings slices performing a lot better than log2n math package calls
I'm trying to solve an exercise where you have to count the number of '1' bits of any number. I came up with 4 main ideas:
- Make a recursive function countBitsString, where at each iteration it gets the string corresponding to the base2 number removing the first character. If this is 1, increment a counter and continue.
- The same as above but with a pointer to that string instead of the value (TODO)
- Use logarithm logic to understand how many power of 2 are inside this number, for each of them we have a '1' bit up so increment the counter. Increment one more if the number is odd (countBitsSmart)
- Work with binary logic to check the bit and then shift left (TODO)
Now, here's my current code and later the results
package main
import (
"fmt"
"strconv"
"math/rand"
"math"
"time"
)
/*
* implement a function that counts the number of set of bits in binary representation
* Complete the 'countBits' function below.
* The function is expected to return an int32.
* The function accepts unit32 num as parameter.
*/
func countBitsString(num uint32) int32 {
num_64 := int64(num)
num_2 := strconv.FormatInt(num_64, 2)
_, counter := recursiveCount(num_2, 0)
return counter
}
func recursiveCount(s string, counter int32) (string, int32) {
if len(s) < 1 {
return "", counter
}
if s[:1] == "1" {
counter+=1
}
//fmt.Printf("Current counter: %d\n", counter)
//fmt.Printf("Current char %s\n", s[:1])
return recursiveCount(s[1:], counter)
}
func countBitsBinary(num uint32) int32 {
fmt.Println("TODO")
// convert uint23 in binary form
// for binary_number > 0
// if current bit == 1 => counter++
// shift left binary_number
return 0
}
func countBitsSmart(num uint32) int32{
var upBits int32
var highestPower uint32 = math.MaxUint32
if num % 2 >0 {
upBits+=1
// fmt.Printf("\tUP bits = %d\n", upBits)
}
for ; (num > 2 && highestPower > 1); upBits++ {
highestPower = uint32(math.Log2(float64(num)))
num = num - uint32(math.Pow(2, float64(highestPower)))
// fmt.Printf("\tlog2 = %d\n", highestPower)
// fmt.Printf("\tnum %d rest %d\n", num, num%2)
// fmt.Printf("\tUP bits = %d\n", upBits)
}
return upBits
}
// Profiling with execution time and calling
func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
fmt.Println("\n=========================================================")
start := time.Now()
for _, numTemp := range numInput{
// fmt.Printf("> Number: %d (%s)\tUp bits: %d\n", numTemp, strconv.FormatInt(int64(numTemp), 2), countFunc( uint32(numTemp) ))
countFunc( uint32(numTemp))
}
fmt.Printf("Time elapsed for %v func= %vs", funcName, time.Since(start))
}
func main() {
const testSize = 10000000
var numInput = make([]uint32, testSize)
for i:=0; i<testSize; i++ {
numInput[i] = uint32(rand.Intn(500))
}
fmt.Printf("\n Test size = %d\n", testSize)
invoker(numInput, "countBitsString", countBitsString)
invoker(numInput, "countBitsSmart", countBitsSmart)
fmt.Println()
}
How can the second function perform 5 times worse than the previous one? Is it just for the calls to math package?
Thanks
Solution 1:[1]
Using floating point Log2/Pow calculations is a very expensive way to calculate bits. Converting to a binary string is much faster since it can be done with bit shifts/integer calculations (it's still expensive). The Wikipedia Hamming Weight page has some explanations of very fast popcount calculations via bitmasks.
You can see the relative performance via Go's builtin benchmark support. It can help understand how code performs:
// main_test.go
package main
import (
"math/bits"
"testing"
)
// Simple benchmarks to get started.
// You could also try benchmarking different numbers to understand
// performance differences. There is some inherit bias in benchmarking
// sequential numbers starting from 0. You could try a list of 256 preset
// random numbers. There are many options..
func BenchmarkCountBitsString(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt32 = countBitsString(uint32(i))
}
}
func BenchmarkCountBitsSmart(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt32 = countBitsSmart(uint32(i))
}
}
func BenchmarkOnesCount(b *testing.B) {
for i := 0; i < b.N; i++ {
dummyInt = bits.OnesCount32(uint32(i))
}
}
// Ensure some optimisations don't occur by assigning to a global.
var (
dummyInt int
dummyInt32 int32
)
Then you can easily understand the relative performance:
$ go test -bench .
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8 10388419 99.97 ns/op
BenchmarkCountBitsSmart-8 2036588 617.6 ns/op
BenchmarkOnesCount-8 1000000000 0.5540 ns/op
PASS
ok stack/bench 3.625s
Inspecting performance with pprof requires some extra parameters:
- Disable tests with
-run XXX(XXX doesn't match any tests) - Limit to a single function benchmark, otherwise the faster function will be run more often until it consumes as much time as the slower function (~5s in the final run).
- Use a fixed number of iterations for the benchmark time so the profiles are comparable between functions
- Save CPU profile. This also causes the test binary to be kept as
DIR.test(bench.testin this example).
Benchmark details for countBitsSmart show it spends ~55% time in math.Log2 and ~38% time in math.Pow:
$ go test -bench CountBitsSmart -run XXX -benchtime 10000000x -cpuprofile cpu.prof
goos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsSmart-8 10000000 703.0 ns/op
PASS
ok stack/bench 7.226s
$ go tool pprof -top -cum bench.test cpu.prof
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:19pm (AEST)
Duration: 7.22s, Total samples = 7.01s (97.10%)
Showing nodes accounting for 7s, 99.86% of 7.01s total
Dropped 1 node (cum <= 0.04s)
flat flat% sum% cum cum%
0.01s 0.14% 0.14% 7.01s 100% stack/bench.BenchmarkCountBitsSmart
0 0% 0.14% 7.01s 100% testing.(*B).launch
0 0% 0.14% 7.01s 100% testing.(*B).runN
0.43s 6.13% 6.28% 7s 99.86% stack/bench.countBitsSmart
0.07s 1% 7.28% 3.88s 55.35% math.Log2 (inline)
0.42s 5.99% 13.27% 3.81s 54.35% math.log2
0.02s 0.29% 13.55% 3.07s 43.79% math.Log (inline)
3.05s 43.51% 57.06% 3.05s 43.51% math.archLog
0.15s 2.14% 59.20% 2.69s 38.37% math.Pow (inline)
1.20s 17.12% 76.32% 2.54s 36.23% math.pow
0.07s 1% 77.32% 0.57s 8.13% math.Frexp (inline)
[...]
$ go tool pprof -list bench.countBitsSmart bench.test cpu.prof
Total: 7.01s
ROUTINE ======================== stack/bench.countBitsSmart in /home/.../stack/bench/main.go
430ms 7s (flat, cum) 99.86% of Total
. . 54: if num%2 > 0 {
. . 55: upBits += 1
. . 56: // fmt.Printf("\tUP bits = %d\n", upBits)
. . 57: }
. . 58:
40ms 40ms 59: for ; num > 2 && highestPower > 1; upBits++ {
250ms 4.13s 60: highestPower = uint32(math.Log2(float64(num)))
90ms 2.78s 61: num = num - uint32(math.Pow(2, float64(highestPower)))
. . 62: // fmt.Printf("\tlog2 = %d\n", highestPower)
. . 63: // fmt.Printf("\tnum %d rest %d\n", num, num%2)
. . 64: // fmt.Printf("\tUP bits = %d\n", upBits)
. . 65: }
. . 66:
50ms 50ms 67: return upBits
. . 68:}
. . 69:
. . 70:// Profiling with execution time and calling
. . 71:func invoker(numInput []uint32, funcName string, countFunc func(num uint32) int32) {
. . 72: fmt.Println("\n=========================================================")
The countBitsString benchmark shows about half the time is spent in recursiveCount and half in strconv.FormatInt. This indicates recursiveCount could be a good target for optimisation.
$ go test -bench CountBitsString -run XXX -benchtime 10000000x -cpuprofile cpu.profgoos: linux
goarch: amd64
pkg: stack/bench
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
BenchmarkCountBitsString-8 10000000 99.13 ns/op
PASS
ok stack/bench 1.111s
$ go tool pprof -top -cum bench.test cpu.prof
File: bench.test
Type: cpu
Time: Apr 17, 2022 at 11:15pm (AEST)
Duration: 1.10s, Total samples = 1.05s (95.18%)
Showing nodes accounting for 1.05s, 100% of 1.05s total
flat flat% sum% cum cum%
0.02s 1.90% 1.90% 1.01s 96.19% stack/bench.BenchmarkCountBitsString
0 0% 1.90% 1.01s 96.19% testing.(*B).launch
0 0% 1.90% 1.01s 96.19% testing.(*B).runN
0.01s 0.95% 2.86% 0.99s 94.29% stack/bench.countBitsString
0.51s 48.57% 51.43% 0.51s 48.57% stack/bench.recursiveCount
0 0% 51.43% 0.47s 44.76% strconv.FormatInt
0.21s 20.00% 71.43% 0.47s 44.76% strconv.formatBits
0.03s 2.86% 74.29% 0.26s 24.76% runtime.slicebytetostring
0.14s 13.33% 87.62% 0.22s 20.95% runtime.mallocgc
0.04s 3.81% 91.43% 0.04s 3.81% runtime.nextFreeFast (inline)
[...]
$ go tool pprof -list bench.countBitsString bench.test cpu.prof
Total: 1.05s
ROUTINE ======================== stack/bench.countBitsString in /home/.../stack/bench/main.go
10ms 990ms (flat, cum) 94.29% of Total
. . 14: * implement a function that counts the number of set of bits in binary representation
. . 15: * Complete the 'countBits' function below.
. . 16: * The function is expected to return an int32.
. . 17: * The function accepts unit32 num as parameter.
. . 18: */
10ms 10ms 19:func countBitsString(num uint32) int32 {
. . 20:
. . 21: num_64 := int64(num)
. 470ms 22: num_2 := strconv.FormatInt(num_64, 2)
. 510ms 23: _, counter := recursiveCount(num_2, 0)
. . 24: return counter
. . 25:}
. . 26:
. . 27:func recursiveCount(s string, counter int32) (string, int32) {
. . 28: if len(s) < 1 {
These benchmarks show math.Log2+math.Pow (6.57s) is ~14x slower than strconv.FormatInt (0.47s).
I also highly recommend using the pprof web interface to interactively explore performance. This is a little non-obvious, but can be started with:
$ go tool pprof -http : bench.test cpu.prof
Dave Cheney also has an instructive blog on benchmarking with Go: https://dave.cheney.net/2013/06/30/how-to-write-benchmarks-in-go
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |

