'how can I get the first 3 digits of a given ***unsigned long long*** without converting it to string and without knowing is the length

how can I get the first 3 digits of a given unsigned long long without converting it to string and there is now the constant length of the number.

without using the naive approach of dividing with / 10.

like this

int first_3_digits(unsigned long long number)
{
    unsigned long long n = number;
    while(n >= 1000){ 
        n = n/10;
    }

    return n;
}

I had an interview and they said that this solution wasn't good enough the interviewer said the solution resembles a binary search. I know how binary search work but I don't know how to connect it to this problem.



Solution 1:[1]

Modern processors are able to compute the log2 of an integer in only few cycles using specific low-level instructions (eg. bsr on mainstream x86-64 processors). Based on this great previous post one can compute the log10 of an integer very quickly. The idea is to use a lookup table so to do the translation between the log2 and log10. Once the log10 has been computed, one can just use another lookup table to perform the division by 10 ** log10(number). However, non-constant 64-bit divisions are very expensive on almost all processors. An alternative solution is to use a switch with all the possible case so the compiler can generate an efficient code for all cases and use a fast jump table. Indeed, a division by a constant can be optimized by the compiler in few fast instructions (ie. multiplications and shifts) that are much faster than a non-constant division. The resulting code is not very beautiful/simple though. Here it is:

#include <math.h>
#include <assert.h>

static inline unsigned int baseTwoDigits(unsigned long long x) {
    return x ? 64 - __builtin_clzll(x) : 0;
}

static inline unsigned int baseTenDigits(unsigned long long x) {
    static const unsigned char guess[65] = {
        0,  0,  0,  0,  1,  1,  1,  2,  2,  2,  3,  3,  3,  3,  4,  4,  4,
        5,  5,  5,  6,  6,  6,  6,  7,  7,  7,  8,  8,  8,  9,  9,  9,  9,
       10, 10, 10, 11, 11, 11, 12, 12, 12, 12, 13, 13, 13, 14, 14, 14, 15,
       15, 15, 15, 16, 16, 16, 17, 17, 17, 18, 18, 18, 18, 19
    };
    static const unsigned long long tenToThe[] = {
        1ull, 10ull, 100ull, 1000ull, 10000ull, 100000ull, 1000000ull, 10000000ull, 100000000ull,
        1000000000ull, 10000000000ull, 100000000000ull, 1000000000000ull,
        10000000000000ull, 100000000000000ull, 1000000000000000ull,
        10000000000000000ull, 100000000000000000ull, 1000000000000000000ull,
        10000000000000000000ull
    };
    unsigned int digits = guess[baseTwoDigits(x)];
    return digits + (x >= tenToThe[digits]);
}

inline int optimized(unsigned long long number)
{
    const unsigned int intLog = baseTenDigits(number);
    switch(intLog)
    {
        case  0: return number;
        case  1: return number;
        case  2: return number;
        case  3: return number;
        case  4: return number / 10ull;
        case  5: return number / 100ull;
        case  6: return number / 1000ull;
        case  7: return number / 10000ull;
        case  8: return number / 100000ull;
        case  9: return number / 1000000ull;
        case 10: return number / 10000000ull;
        case 11: return number / 100000000ull;
        case 12: return number / 1000000000ull;
        case 13: return number / 10000000000ull;
        case 14: return number / 100000000000ull;
        case 15: return number / 1000000000000ull;
        case 16: return number / 10000000000000ull;
        case 17: return number / 100000000000000ull;
        case 18: return number / 1000000000000000ull;
        case 19: return number / 10000000000000000ull;
        case 20: return number / 100000000000000000ull;
        default: assert(0); return 0;
    }
}

Note that this code use the non-standard compiler built-in __builtin_clzll available on both Clang and GCC. For MSVC, please read this post.

[Update] The previous benchmark did not inline the code of the proposed function as opposed to the others resulting in a slower execution (especially on Clang). Using static+inline helped the compiler to properly inline the function calls.

Results

Here is the results of the methods on QuickBench respectively on GCC and Clang (with -O3). Note that the input distribution is chosen so that the logarithm are quite uniform and pseudo-random (ie. logarithm uniform). This choice as been made since the interviewer said a binary-search was a good solution and this distribution is a best one for such algorithm.

GCC benchmark

Clang benchmark

One can see that this solution is the fastest. The one of Yakov Khodorkovski and qqNade give wrong results for big values due to floating point rounding. The one of qqNade is not presented in the benchmark for sake of clarity as it is >10 time slower than the original one.

The reason why the solution of Gerhardh is so fast with Clang is that the compiler is able to partially generate fast conditional moves instead of slow conditional branches. This optimization is insanely clever since this is only possible on 32-bit integers (and also if the optimization about the division by a constant is performed before), but Clang is able to know that n is small enough after the 2 first conditions! That being said, this optimization is fragile since few changes in the code often appears to break it.

One can note that the original code is surprisingly quite fast (especially on GCC). This is due to branch prediction. Modern processors execute many iterations without checking they should be executed (and roll back if needed). Each iteration is very fast since the division by a constant is optimized: it only takes 2 cycle/iterations on my machine. On modern x86-64 Intel processors a branch misprediction takes 14 cycles while a well-predicted branch takes only 1-2 cycles (similar on AMD Zen processors). The average number of iteration is ~9 and only the last iteration is expensive. The solution of Gerhardh results in far less instructions executed, but it can result in up to 4 miss-predicted branch with GCC and up to 2 with Clang. The proposed solution results in only 1 indirect miss-predicted branch (that processors can less easily execute efficiently though). Since the optimized implementations runs in only ~10 cycles in average on Quickbench, the effect of branch misprediction is huge.

Note that using other input distribution have an impact on results though the overall trend remains the same. Here are results for a uniform distribution: GCC and Clang. The original algorithm is significantly slower since the average number of digits is twice bigger (17~18 instead of ~9) so is the number of iterations. The speed of the other algorithms is not very different compared to the previous distribution and the trend is left unchanged overall for them.

Conclusion

To conclude, the solution of Gerhardh is relatively portable, simple and pretty fast. The new proposed solution is more complex, but it is the fastest on both GCC and Clang. Thus, the solution of Gerhardh should be preferred unless the performance of this function is very important.

Solution 2:[2]

Finding the "best" solution often depends on the criteria you define to matter. If you want smallest or simplest solution, your approach is not bad.

If you want fasted solution (and got the hint about "binary search" from the interviewer), then you might try something like this (not tested):

int first_3_digits(unsigned long long number)
{
    unsigned long long n = number;
    unsigned long long chopped;

    // n has up to 20 digits. We need to chop up to 17 digits.

    // Chop 8 digits
    chopped = n / 100000000ull;
    if (chopped >= 100)
    {
        n = chopped;
    }
    // chopped has up to 12 digits,
    // If we use old n we have up to 11 digits
    // 9 more to go...

    // Chop 4 digits
    chopped = n / 10000ull;
    if (chopped >= 100)
    {
        n = chopped;
    }
    // chopped has up to 8 digits,
    // If we use old n we have up to 7 digits
    // 5 more to go...

    // Chop 2 digits
    chopped = n / 100ull;
    if (chopped >= 100)
    {
        n = chopped;
    }
    // chopped has up to 6 digits,
    // If we use old n we have up to 5 digits
    // 3 more to go...

    // Chop 2 digits again
    chopped = n / 100ull;
    if (chopped >= 100)
    {
        n = chopped;
    }
    // chopped has up to 4 digits,
    // If we use old n we have up to 3 digits
    // 1 more to go...

    // Chop last digit if required.
    if (n >= 1000)
    {
        n /= 10;
    }

    return n;
}

For 64 bit values, maximum number is 18446744073709551615, i.e. 20 digits. We have to remove at most 17 digits. As this is not a perfect number to use powers of 2 for number of digits to chop, we repeat the step to chop 2 digits.

That solution might be a bit faster but is likely to take more code.

Solution 3:[3]

When the interviewer says:

the solution resembles a binary search

that's evidence that the interviewer has in mind a particular distribution of inputs for this function, and that distribution is not a uniform distribution over the range of unsigned long long values. At that point, it's necessary to ask the interviewer what input distribution they expect, since it's not possible to optimise algorithms like this without knowing that.

In particular, if the inputs were selected from a uniform sample of the range of 64-bit unsigned values, then the following simple function would be close to optimal:

/* See Note 1 for this constant */
#define MAX_POWER (10ULL * 1000 * 1000 * 1000 * 1000 * 1000)

int first_3_digits(unsigned long long n) {
  if (n >= 1000) { /* Almost always true but needed for correctness */
    while (n < MAX_POWER * 100) n *= 10;
    if (n >= MAX_POWER * 1000) n /= 10;
    n /= MAX_POWER;
  }
  return n;
}

I hope this demonstrates how much difference the expected input distribution makes. The above solution optimises for the case where the input is close to the maximum number of digits, which is almost always the case with a uniform distribution. In that case, the while loop will hardly ever execute (more precisely, it will execute about 1/1844 times).

That solution also has the advantage that the while loop, even if it does execute, only does multiplications by 10, not divisions by 10. Both GCC and Clang understand how to optimise divisions by constants into multiplication and shift, but a multiplication is still faster, and multiplication by 10 is particularly easy to optimise. [Note 2]


Notes

  1. Note that the constant MAX_POWER was precomputed for the case where unsigned long long is a 64-bit value. That's not guaranteed, although it's common, and it's also the minimum possible size of an unsigned long long. Computing the correct value with the C preprocessor is possible, at least up to a certain point, but it's tedious, so I left it out. The value needed is the largest power of 10 no greater than ULLONG_MAX / 1000. (Since unsigned long long is always a power of 2, it cannot be a power of 10 and the test could equally be for the largest power of 10 less than ULLONG_MAX / 1000, if that were more convenient.)

  2. In the benchmark provided by Jérôme Richard, whose sample inputs are chosen so that their logarithms are roughly uniform --a very different input distribution--, this comes out a bit slower than his optimised solution, although it's within the margin of error of the benchmark tool. (On the other hand, the code is quite a bit simpler.) On Jérôme's second benchmark, with a uniform sample, it comes out a lot faster.

Solution 4:[4]

  • The hint for a binary search implies the last few tests and divides should be as below, using powers of 10: ..., 8, 4, 2, 1

    // Pseudo code
    if (big enough)
      divide by 100,000,000
    if (big enough)
      divide by 10,000 
    if (big enough)
      divide by 100 
    if (big enough)
      divide by 10
    
  • Working this backwards we need up to 5 divisions for a 64-bit unsigned long long.

  • As unsigned long long may be wider than 64, add tests for that.

  • Provide not only a solution, but a test harness to demo the correctness.

Example:

#include <errno.h>
#include <inttypes.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

unsigned first3(unsigned long long x) {
  static const unsigned long long ten18 = 1000000ull * 1000000 * 1000000;
  static const unsigned long long ten16 = 10000ull * 1000000 * 1000000;
  static const unsigned long long ten10 = 10000ull * 1000000;
  static const uint32_t ten8 = 100ul * 1000000;
  static const uint32_t ten6 = 1000000u;
  static const uint32_t ten4 = 10000u;
  static const uint32_t ten3 = 1000u;
  static const uint32_t ten2 = 100u;

  // while loop used in case unsigned long long is more than 64-bit
  // We could use macro-magic to make this an `if` for common 64-bit.
  while (x >= ten18) {
    x /= ten16;
  }

  if (x >= ten10) {
    x /= ten8;
  }
  if (x >= ten6) {
    x /= ten4;
  }
  uint32_t x32 = (uint32_t) x;  // Let us switch to narrow math
  if (x32 >= ten4) {
    x32 /= ten2;
  }
  if (x32 >= ten3) {
    x32 /= 10;
  }
  return (unsigned) x32;
}

Test code

int test_first3(void) {
  char buf[sizeof(unsigned long long) * CHAR_BIT];
  for (size_t i = 1;; i++) {
    for (int dig = '0'; dig <= '9'; dig += 9) {
      memset(buf, dig, i);
      if (dig == '0') {
        buf[0]++;
      }
      buf[i] = '\0';
      errno = 0;
      unsigned long long x = strtoull(buf, 0, 10);
      if (errno) {
        puts("Success!");
        return 0;
      }
      unsigned f3 = first3(x);
      char buf3[sizeof(unsigned) * CHAR_BIT];
      int len = sprintf(buf3, "%u", f3);
      printf("%2zu <%s> <%s>\n", i, buf3, buf);
      if (len > 3) {
        printf("i:%zu dig:%c\n", i, dig);
        return -1;
      }
      if (strncmp(buf, buf3, 3) != 0) {
        return -1;
      }
    }
  }
}

int main(void) {
  test_first3();
}

Output

 1 <1> <1>
 1 <9> <9>
 2 <10> <10>
 2 <99> <99>
 3 <100> <100>
 3 <999> <999>
 4 <100> <1000>
 4 <999> <9999>
 5 <100> <10000>
 5 <999> <99999>
 ...
17 <100> <10000000000000000>
17 <999> <99999999999999999>
18 <100> <100000000000000000>
18 <999> <999999999999999999>
19 <100> <1000000000000000000>
19 <999> <9999999999999999999>
20 <100> <10000000000000000000>
Success!

Solution 5:[5]

here a short solution using log10 function :

int first_n_digits(unsigned long long number){
    return number < 1000 ? number =  (int)number : number = (int)(number/pow(10,(int)(log10(number)+1)-3));
}

Solution 6:[6]

You can calculate the length of a number (its power) using the decimal logarithm, then use the decimal exponent to get a divisor less than three orders of magnitude and integer divide the number by it:

#include <assert.h>
#include <math.h>

int first_3_digits(unsigned long long number)
{
     if(number < 1000)
         return number;
     int number_length = int(floor(log10l(number)))+1;
     assert(number_length > 3);
     unsigned long long divider = exp10l(number_length - 3);
     return number/divider;
}

int main()
{
     assert(first_3_digits(0)==0);
     assert(first_3_digits(999)==999);
     assert(first_3_digits(1234)==123);
     assert(first_3_digits(9876543210123456789ull)==987);
     return 0;
}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3
Solution 4
Solution 5 Yakov Khodorkovski
Solution 6