'Why is using integers much slower than using strings in this problem dealing with binary strings?

I solved the following leet code problem https://leetcode.com/problems/find-kth-bit-in-nth-binary-string/ by using strings (class Solution) and integers (class Solution2). I thought the integer solution should be a lot faster and using much less memory. But actually it takes a lot longer. With n=18 and k=200 it took 10.1 seconds compared to 0.13 seconds of the string solution.

import time

class Solution:
    def findKthBit(self, n: int, k: int) -> str:
        i = 0
        Sn = "0"
        Sn = self.findR(Sn, i, n)
        return Sn[k-1]

    def findR(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSn(Sn, i)
        return self.findR(newSn, i+1, n)

    def calcNewSn(self, Sn, i):
        inverted = ""
        for c in Sn:
            inverted += "1" if c == "0" else "0"
        newSn = Sn + "1" + (inverted)[::-1]
        return newSn


class Solution2:
    def findKthBitBin(self, n: int, k: int) -> str:
        i = 0
        # MSB (S1) has to be 1 for bit operations but is actually always 0
        Sn = 1
        Sn = self.findRBin(Sn, i, n)
        lenSn = (2**(n+1)) - 1
        return "0" if k == 1 else str((Sn >> (lenSn - k)) & 1)

    def findRBin(self, Sn, i, n):
        if i == n:
            return Sn
        newSn = self.calcNewSnBin(Sn, i)
        return self.findRBin(newSn, i+1, n)

    def calcNewSnBin(self, Sn, i):
        lenSn = 2**(i+1) - 1
        newSn = (Sn << 1) | 1
        inverted = (~Sn | (1 << (lenSn - 1))) & self.getMask(lenSn)
        newSn = self.reverseBits(newSn, inverted, lenSn)
        return newSn

    def getMask(self, lenSn):
        mask = 0
        for i in range(lenSn):
            mask |= (1 << i)
        return mask

    def reverseBits(self, newSn, inverted, lenSn):
        for i in range(lenSn):
            newSn <<= 1
            newSn |= inverted & 1
            inverted >>= 1
        return newSn
 
sol = Solution()
sol2 = Solution2()

start = time.time()
print(sol.findKthBit(18, 200))
end = time.time()
print(f"time using strings: {end - start}")
start = time.time()
print(sol2.findKthBitBin(18, 200))
end = time.time()
print(f"time using large integers: {end - start}")

Why is the solution using integers so slow? Are the implemetations of big ints faster in other languages compared to strings?



Solution 1:[1]

The problem isn't how fast or slow the integer implementation is. The problem is that you've written an algorithm that wants a random-access mutable sequence data type, like a list, and integers are not a random-access mutable sequence.

For example, look at your reverseBits implementation:

def reverseBits(self, newSn, inverted, lenSn):
    for i in range(lenSn):
        newSn <<= 1
        newSn |= inverted & 1
        inverted >>= 1
    return newSn

With a list, you could just call list.reverse(). Instead, your code shifts newSn and inverted over and over, copying almost all the data involved repeatedly on every iteration.

Strings aren't a random-access mutable sequence data type either, but they're closer, and the standard implementation of Python has a weird optimization that actually does try to mutate strings in loops like the one you've written:

def calcNewSn(self, Sn, i):
    inverted = ""
    for c in Sn:
        inverted += "1" if c == "0" else "0"
    newSn = Sn + "1" + (inverted)[::-1]
    return newSn

Instead of building a new string and copying all the data for the +=, Python tries to resize the string in place. When that works, it avoids most of the unnecessary copying you're doing with the integer-based implementation.


The right way to write your algorithm would be to use lists, instead of strings or integers. However, there's an even better way, with a different algorithm that doesn't need to build a bit sequence at all. You don't need the whole bit sequence. You just need to compute a single bit. Figure out how to compute just that bit.

Solution 2:[2]

The bulk of the time is spent in reverseBits(). Overall, you're creating hundreds of thousands of new integer objects in that, up to 262143 bits long. It's essentially quadratic time. The corresponding operation for the string form is just (inverted)[::-1], which runs entirely at "C speed", and is worst-case linear time.

Low Level, High Level

You can get a factor of 8 speedup just by replacing reverseBits()

    def reverseBits(self, newSn, inverted, lenSn):
        inverted = int(f"{inverted:0{lenSn}b}"[::-1], 2)
        return (newSn << lenSn) | inverted

That's linear time. It converts inverted to a bit string, reverses that string, and converts the result back to an int. Other tricks can be used to speed other overly-slow bigint algorithms.

But, as the other answer (which you should accept!) said, it's missing the forest for the trees: the whole approach is off the best track. It's possible, and even with simpler code, to write a function such that all these run blazing fast (O(n) worst case per call), and require only trivial working memory:

>>> [findKthBit(4, i) for i in range(1, 16)] == list("011100110110001")
True
>>> findKthBit(500, 2**20)
'1'
>>> findKthBit(500, 2**20 - 1)
'1'
>>> findKthBit(500, 2**20 + 1)
'0'

There's not a computer in the world with enough memory to build a string with 2**500-1 characters (or a bigint with that many bits).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 user2357112
Solution 2