'What's the worst case complexity for KMP when the goal is to find all occurrences of a certain string?

I would also like to know which algorithm has the worst case complexity of all for finding all occurrences of a string in another. Seems like Boyer–Moore's algorithm has a linear time complexity.



Solution 1:[1]

There is a long article on KMP at http://en.wikipedia.org/wiki/Knuth-morris-pratt which ends with saying

Since the two portions of the algorithm have, respectively, complexities of O(k) and O(n), the complexity of the overall algorithm is O(n + k).

These complexities are the same, no matter how many repetitive patterns are in W or S. (end quote)

So the total cost of a KMP search is linear in the number of characters of string and pattern. I think this holds even if you need to find multiple occurrences of the pattern in the string - and if not, just consider searching for patternQ, where Q is a character that does not occur in the text, and noting down where the KMP state shows that it has matched everything up to the Q.

Solution 2:[2]

You can count Pi function for a string in O(length). KMP builds a special string that has length n+m+1, and counts Pi function on it, so in any case complexity will be O(n+m+1)=O(n+m)

Solution 3:[3]

If you think about it, the worst case for matching the pattern is the one in which you've to visit each index of the LPS array, when mismatch occurs. For example, pattern "aaaa" which creates LPS arrays as [0,1,2,3] makes it possible.

Now, for the worst case matching in the text, we want to maximize the such mismatches that forces us to visit all the indices of the LPS array. That would be a text with repeated pattern, but with the last character as a mismatch. For example, "aaabaaacaaabaaacaaabaaac".

Let the length of the text be n and that of pattern be m. Number of the occurences of such pattern in the text is n/m. And for each of these occurences, we are performing m comparisions. Not to forget that we are also traversing n characters of the text.

Therefore, the worst case time for KMP matching would be O(n + (n/m)*m), which is basically O(n).

Total worst case time complexity, including LPS creation, would be O(n+m).

KMP Code (for reference):

void createLPS(char[] pattern,int[] lps){
        int m = pattern.length;
        int i=1;
        int j=0;
        lps[j]=0;
        while(i<m){
            if(pattern[j]==pattern[i]){
                lps[i]=j+1;
                i++;
                j++;
            }else{
                if(j!=0){
                    j = lps[j-1];
                }else{
                    lps[i]=0;
                    i++;
                }
            }
        }
    }

 List<Integer> match(char[] str, char[] pattern, int[] lps){
        int m = pattern.length;
        int n = str.length;
        int i=0, j=0;
        List<Integer> idxs = new ArrayList<>();
        while(i<n){
            if(pattern[j]==str[i]){
                j++;
                i++;
            }else{
                if(j!=0){
                    j = lps[j-1];
                }else{
                    i++;
                }
            }
            if(j==m){
                idxs.add(i-m);             
                j = lps[j-1];
            }
        }
        return idxs;
    }

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mcdowella
Solution 2 kilotaras
Solution 3 Aryan