'Given a string 's' of length n, calculate the number of occurrences of a character 'c' of s in all substrings of s

Consider the string aba.

All of its substrings are: [a,b,a,ab,ba,aba].

In the above list, character 'a' appears 6 times and b appears 4 times.

Is there any formula for calculating this? that is the number of occurrences of a character 'c' of s in all substrings of s.



Solution 1:[1]

In a string of length n, there are n(n+1)/2 substrings.

The character at index i appears once in every substring, except for the substrings that end at character i-1 or before, and except for the substrings that start at character i+1 or after. Note that here I'm only counting this specific occurrence of the character at index i.

Let me rephrase that: in a string S c T, where S and T are two substrings and c is a character, this occurrence of character c appears once in every substring of S c T, except in substrings of S and in substrings of T.

Thus the number of substrings in which appears the character at index i in a string of length n is:

k_i = n(n+1)/2 - i(i+1)/2 - (n-i-1)(n-i)/2

If we expand, cancel out terms, and refactor:

k_i = n(n+1)/2 - i(i+1)/2 - (n-i-1)(n-i)/2
k_i = (n(n+1) - i(i+1) - (n-i-1)(n-i)) / 2
k_i = (n²+n -i²-i -n²+ni+in-i²+n-i) / 2
k_i = (2n + 2ni -2i - 2i²) / 2
k_i = n + ni - i - i²
k_i = n(i+1) - i(i+1)
k_i = (i+1) (n-i)

We can sum on all occurrences of this character. In python, this gives the following function:

def count_occurences_in_substrings(s, c):
    n = len(s)
    return sum(
        (i + 1) * (n - i)
        for i, a in enumerate(s) if a == c
    )

Testing by comparing against a bruteforce function that actually generates the substrings:

from itertools import combinations

def bruteforce(s, c):
    return ''.join(s[i:j] for i,j in combinations(range(len(s)+1), 2)).count(c)

for s in ['aba', 'abcdefg', 'bcdaefg']:
    k1 = count_occurences_in_substrings(s, 'a')
    k2 = bruteforce(s, 'a')
    print('{:8s}: {:2d},{:2d}'.format(s, k1, k2))
# aba     :  6, 6
# abcdefg :  7, 7
# bcdaefg : 16,16

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1