Number of ways to recreate a given string using a given list of words

Given a String word and a String array book, the program should output the number of ways to create word using only elements of book. An element can be used as many times as we want, and the program must terminate in under 6 seconds.

For example, input:

String word = "stackoverflow";

String[] book = new String[9];
book[0] = "st";
book[1] = "ck";
book[2] = "CAG";
book[3] = "low";
book[4] = "TC";
book[5] = "rf";
book[6] = "ove";
book[7] = "a";
book[8] = "sta";

The output should be 2, since we can create "stackoverflow" in two ways:

1: "st" + "a" + "ck" + "ove" + "rf" + "low"

2: "sta" + "ck" + "ove" + "rf" + "low"

My implementation of the program only terminates in the required time if word is relatively small (<15 characters). However, as I mentioned before, the running time limit for the program is 6 seconds and it should be able to handle very large word strings (>1000 characters). Here is an example of a large input.

Here is my code:

1) the actual method:

input: a String word and a String[] book

output: the number of ways word can be written only using strings in book

public static int optimal(String word, String[] book){
    int count = 0;

    List<List<String>> allCombinations = allSubstrings(word);

    List<String> empty = new ArrayList<>();

    List<String> wordList = Arrays.asList(book);

    for (int i = 0; i < allCombinations.size(); i++) {

        allCombinations.get(i).retainAll(wordList);

        if (!sumUp(allCombinations.get(i), word)) {
            allCombinations.remove(i);
            allCombinations.add(i, empty);
        }
        else count++;
    }

    return count;
}

2) allSubstrings():

input: a String input

output: A list of lists, each containing a combination of substrings that add up to input

static List<List<String>> allSubstrings(String input) {

    if (input.length() == 1) return Collections.singletonList(Collections.singletonList(input));

    List<List<String>> result = new ArrayList<>();

    for (List<String> temp : allSubstrings(input.substring(1))) {

        List<String> firstList = new ArrayList<>(temp);
        firstList.set(0, input.charAt(0) + firstList.get(0));
        if (input.startsWith(firstList.get(0), 0)) result.add(firstList);

        List<String> l = new ArrayList<>(temp);
        l.add(0, input.substring(0, 1));
        if (input.startsWith(l.get(0), 0)) result.add(l);
    }

    return result;
}

3) sumUp():

input: A String list input and a String expected

output: true if the elements in input add up to expected

public static boolean sumUp (List<String> input, String expected) {

    String x = "";

    for (int i = 0; i < input.size(); i++) {
        x = x + input.get(i);
    }
    if (expected.equals(x)) return true;
    return false;
}


Solution 1:[1]

I've figured out what I was doing wrong in my previous answer: I wasn't using memoization, so I was redoing an awful lot of unnecessary work.

Consider a book array {"a", "aa", "aaa"}, and a target word "aaa". There are four ways to construct this target:

"a" + "a" + "a"
"aa" + "a"
"a" + "aa"
"aaa"

My previous attempt would have walked through all four separately. But instead, one can observe that:

  • There is 1 way to construct "a"
  • You can construct "aa" in 2 ways, either "a" + "a" or using "aa" directly.
  • You can construct "aaa" either by using "aaa" directly (1 way); or "aa" + "a" (2 ways, since there are 2 ways to construct "aa"); or "a" + "aa" (1 way).

Note that the third step here only adds a single additional string to a previously-constructed string, for which we know the number of ways it can be constructed.

This suggests that if we count the number of ways in which a prefix of word can be constructed, we can use that to trivially calculate the number of ways to construct a longer prefix by adding just one more string from book.

I defined a simple trie class, so you can quickly look up prefixes of the book words that match at any given position in word:

class TrieNode {
  boolean word;
  Map<Character, TrieNode> children = new HashMap<>();

  void add(String s, int i) {
    if (i == s.length()) {
      word = true;
    } else {
      children.computeIfAbsent(s.charAt(i), k -> new TrieNode()).add(s, i + 1);
    }
  }
}

For each character in s, add creates a TrieNode if one does not already exist and recurses into it for the remaining characters; the node reached at the end of s is marked as the end of a word.
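To make the lookup concrete, here is a minimal sketch that repeats the TrieNode class above and adds a hypothetical helper, matchLengths (not part of the answer, as is the class name TrieDemo). It lists the lengths of all book words that match word starting at a given position.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TrieDemo {
  // Same structure as the TrieNode class above.
  static class TrieNode {
    boolean word;
    Map<Character, TrieNode> children = new HashMap<>();

    void add(String s, int i) {
      if (i == s.length()) {
        word = true;
      } else {
        children.computeIfAbsent(s.charAt(i), k -> new TrieNode()).add(s, i + 1);
      }
    }
  }

  // Hypothetical helper: lengths of all book words matching word at position start.
  static List<Integer> matchLengths(TrieNode root, String word, int start) {
    List<Integer> lengths = new ArrayList<>();
    TrieNode node = root;
    for (int n = 0; node != null; ++n) {
      if (node.word) {
        lengths.add(n);
      }
      if (start + n == word.length()) {
        break; // no characters left to follow
      }
      node = node.children.get(word.charAt(start + n));
    }
    return lengths;
  }

  public static void main(String[] args) {
    TrieNode root = new TrieNode();
    for (String b : new String[] {"st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"}) {
      root.add(b, 0);
    }
    System.out.println(matchLengths(root, "stackoverflow", 0)); // [2, 3] ("st", "sta")
    System.out.println(matchLengths(root, "stackoverflow", 2)); // [1] ("a")
  }
}
```

A single walk down the trie finds every matching book word at that position, which is what the method below relies on.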

static long method(String word, String[] book) {
  // Construct a trie from all the words in book.
  TrieNode t = new TrieNode();
  for (String b : book) {
    t.add(b, 0);
  }

  // Construct an array to memoize the number of ways to construct
  // prefixes of a given length: result[i] is the number of ways to
  // construct a prefix of length i.
  long[] result = new long[word.length() + 1];

  // There is only 1 way to construct a prefix of length zero.
  result[0] = 1;

  for (int m = 0; m < word.length(); ++m) {
    if (result[m] == 0) {
      // If there are no ways to construct a prefix of this length,
      // then just skip it.
      continue;
    }

    // Walk the trie, taking the branch which matches the character
    // of word at position (n + m).
    TrieNode tt = t;
    for (int n = 0; tt != null && n + m <= word.length(); ++n) {
      if (tt.word) {
        // We have reached the end of a word: we can reach a prefix
        // of length (n + m) from a prefix of length (m).
        // Increment the number of ways to reach (n+m) by the number
        // of ways to reach (m).
        // (Increment, because there may be other ways).
        result[n + m] += result[m];
      }
      if (n + m == word.length()) {
        // No characters of word are left, so stop before charAt
        // reads past the end (even when no book word ends here).
        break;
      }
      tt = tt.children.get(word.charAt(n + m));
    }
  }

  // The number of ways to reach a prefix of length (word.length())
  // is now stored in the last element of the array.
  return result[word.length()];
}

For the very long input given by OP, this gives output:

$ time java Ideone

2217093120

real    0m0.126s
user    0m0.146s
sys 0m0.036s

Quite a bit faster than the required 6 seconds - and this includes JVM startup time too.


Edit: in fact, the trie isn't necessary. You can simply replace the "Walk the trie" loop with:

for (String b : book) {
  if (word.regionMatches(m, b, 0, b.length())) {
    result[m + b.length()] += result[m];
  }
}

This runs slower, but still well under 6 seconds:

2217093120

real    0m0.173s
user    0m0.226s
sys 0m0.033s
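Putting the pieces together, a self-contained version of the trie-free variant could look like the sketch below. The class name CountWays and the method name countWays are made up for illustration, and the isEmpty check is an added assumption to keep an empty string in book from counting a prefix against itself.

```java
class CountWays {
  static long countWays(String word, String[] book) {
    // result[i] is the number of ways to construct the prefix of length i.
    long[] result = new long[word.length() + 1];
    result[0] = 1;

    for (int m = 0; m < word.length(); ++m) {
      if (result[m] == 0) {
        continue; // this prefix cannot be constructed, nothing to extend
      }
      for (String b : book) {
        // regionMatches returns false if b would run past the end of word.
        if (!b.isEmpty() && word.regionMatches(m, b, 0, b.length())) {
          result[m + b.length()] += result[m];
        }
      }
    }
    return result[word.length()];
  }

  public static void main(String[] args) {
    String[] book = {"st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"};
    System.out.println(countWays("stackoverflow", book)); // 2
    System.out.println(countWays("aaa", new String[] {"a", "aa", "aaa"})); // 4
  }
}
```

The two calls in main reproduce the examples from the question and from the start of this answer.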

Solution 2:[2]

A few observations:

x = x + input.get(i);

Concatenating with String + inside a loop isn't a good idea: each iteration copies everything built so far. Use a StringBuilder, append to it within the loop, and call builder.toString() at the end. Or follow the idea from Andy: there is no need to merge strings at all, because you already know the target word. See below.
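A minimal sketch of the StringBuilder suggestion, assuming the same sumUp signature as in the question (the class name SumUpDemo is made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

class SumUpDemo {
  static boolean sumUp(List<String> input, String expected) {
    // Append into one buffer instead of creating a new String per iteration.
    StringBuilder builder = new StringBuilder();
    for (String part : input) {
      builder.append(part);
    }
    return expected.equals(builder.toString());
  }

  public static void main(String[] args) {
    System.out.println(sumUp(Arrays.asList("sta", "ck"), "stack")); // true
    System.out.println(sumUp(Arrays.asList("st", "ck"), "stack"));  // false
  }
}
```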

Then: List implies that adding/removing elements might be costly. So see if you can get rid of that part, and if it would be possible to use maps, sets instead.

Finally: the real point would be to look into your algorithm. I would try to work "backwards". Meaning: first identify those array elements that actually occur in your target word. You can ignore all others right from start.

Then: look at all array entries that start your search word. In your example you can notice that there are just two array elements that fit. And then work your way from there.
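As a rough sketch of this filtering idea (the class and method names below are hypothetical, not from the question), you could first drop the entries that never occur in word, then pick the ones that word starts with:

```java
import java.util.ArrayList;
import java.util.List;

class FilterDemo {
  // Keep only the book entries that occur somewhere in word.
  static List<String> occurringIn(String word, String[] book) {
    List<String> kept = new ArrayList<>();
    for (String b : book) {
      if (word.contains(b)) {
        kept.add(b);
      }
    }
    return kept;
  }

  // Of the remaining candidates, keep those that word starts with.
  static List<String> startersOf(String word, List<String> candidates) {
    List<String> starters = new ArrayList<>();
    for (String c : candidates) {
      if (word.startsWith(c)) {
        starters.add(c);
      }
    }
    return starters;
  }

  public static void main(String[] args) {
    String word = "stackoverflow";
    String[] book = {"st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"};
    List<String> candidates = occurringIn(word, book);
    System.out.println(candidates);                   // [st, ck, low, rf, ove, a, sta]
    System.out.println(startersOf(word, candidates)); // [st, sta]
  }
}
```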

Solution 3:[3]

My first observation would be that you don't actually need to build anything: you know what string you are trying to construct (e.g. stackoverflow), so all you really need to keep track of is how much of that string you have matched so far. Call this m.

Next, having matched m characters, provided m < word.length(), you need to choose a next string from book which matches the portion of word from m to m + nextString.length().

You could do this by checking each string in turn:

if (word.regionMatches(m, nextString, 0, nextString.length())) { ... }

But you can do better, by determining strings that can't match in advance: the next string you append will have the following properties:

  1. word.charAt(m) == nextString.charAt(0) (the next characters match)
  2. m + nextString.length() <= word.length() (adding the next string shouldn't make the constructed string longer than word)

So, you can cut down the potential words from book that you might check by constructing a map of letters to words that start with that (point 1); and if you store the words with the same starting letter in increasing length order, you can stop checking that letter as soon as the length gets too big (point 2).

You can construct a map once and reuse:

Map<Character, List<String>> prefixMap =
    Arrays.asList(book).stream()
        .collect(groupingBy(
            s -> s.charAt(0),
            collectingAndThen(
                toList(),
                ss -> {
                  ss.sort(comparingInt(String::length));
                  return ss;
                })));

You can count the number of ways recursively, without constructing any additional objects (*):

int method(String word, String[] book) {
  return method(word, 0, /* construct map as above */);
}

int method(String word, int m, Map<Character, List<String>> prefixMap) {
  if (m == word.length()) {
    return 1;
  }

  int result = 0;
  for (String nextString : prefixMap.getOrDefault(word.charAt(m), emptyList())) {
    if (m + nextString.length() > word.length()) {
      break;
    }

    // Start at m+1, because you already know they match at m.
    if (word.regionMatches(m + 1, nextString, 1, nextString.length()-1)) {
      // This is a potential match!
      // Make a recursive call.
      result += method(word, m + nextString.length(), prefixMap);
    }
  }
  return result;
}

(*) This may construct new instances of Character, because of the boxing of the word.charAt(m): cached instances are guaranteed to be used for chars in the range 0-127 only. There are ways to work around this, but they would only clutter the code.
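The recursion above revisits the same position m many times when word is long. One possible way to avoid that, sketched here under the assumption that a long[] cache indexed by m is acceptable, is to memoize the result per position. This is a variation, not the code from this answer: the names MemoizedCount and count are made up, the map is built with a plain loop instead of the stream, and the whole region is compared for simplicity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MemoizedCount {
  static long count(String word, String[] book) {
    // Group book words by first character, shortest first (empty strings skipped).
    Map<Character, List<String>> prefixMap = new HashMap<>();
    for (String b : book) {
      if (!b.isEmpty()) {
        prefixMap.computeIfAbsent(b.charAt(0), k -> new ArrayList<>()).add(b);
      }
    }
    for (List<String> words : prefixMap.values()) {
      words.sort(Comparator.comparingInt(String::length));
    }
    long[] memo = new long[word.length() + 1];
    Arrays.fill(memo, -1); // -1 marks "not computed yet"
    return count(word, 0, prefixMap, memo);
  }

  static long count(String word, int m, Map<Character, List<String>> prefixMap, long[] memo) {
    if (m == word.length()) {
      return 1;
    }
    if (memo[m] >= 0) {
      return memo[m]; // already know how many ways the rest can be built
    }
    long result = 0;
    for (String next : prefixMap.getOrDefault(word.charAt(m), Collections.emptyList())) {
      if (m + next.length() > word.length()) {
        break; // sorted by length, so all later words are too long as well
      }
      if (word.regionMatches(m, next, 0, next.length())) {
        result += count(word, m + next.length(), prefixMap, memo);
      }
    }
    memo[m] = result;
    return result;
  }

  public static void main(String[] args) {
    String[] book = {"st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"};
    System.out.println(count("stackoverflow", book)); // 2
  }
}
```

With the cache, each position m is expanded at most once, so the work is bounded by the number of positions times the number of candidate words per position.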

Solution 4:[4]

I think you are already doing a pretty good job at optimizing your application. In addition to the answer by GhostCat here are a few suggestions of my own:

public static int optimal(String word, String[] book){

    int count = 0;

    List<List<String>> allCombinations = allSubstrings(word);
    List<String> wordList = Arrays.asList(book);

    for (int i = 0; i < allCombinations.size(); i++)
    {
        /*
         * allCombinations.get(i).retainAll(wordList);
         * 
         * There is no need to retrieve the list element
         * twice, just set it in a local variable
         */
        java.util.List<String> combination = allCombinations.get(i);
        combination.retainAll(wordList);
        /*
         * Since we are only interested in the count here
         * there is no need to remove and add list elements
         */
        if (sumUp(combination, word)) 
        {
            /*allCombinations.remove(i);
            allCombinations.add(i, empty);*/
            count++;
        }
        /*else count++;*/
    }
    return count;
}

public static boolean sumUp (List<String> input, String expected) {

    String x = "";

    for (int i = 0; i < input.size(); i++) {
        x = x + input.get(i);
    }
    // No need for if block here, just return comparison result
    /*if (expected.equals(x)) return true;
    return false;*/
    return expected.equals(x);
}

And since you are interested in seeing the execution time of your method I would recommend implementing a benchmarking system of some sort. Here is a quick mock-up:

private static long benchmarkOptima(int cycles, String word, String[] book) {

    long totalTime = 0;
    for (int i = 0; i < cycles; i++)
    {
        long startTime = System.currentTimeMillis();

        int a = optimal(word, book);

        long executionTime = System.currentTimeMillis() - startTime;
        totalTime += executionTime;
    }
    return totalTime / cycles;
}

public static void main(String[] args)
{
    String word = "stackoverflow";
    String[] book = new String[] {
            "st", "ck", "CAG", "low", "TC",
            "rf", "ove", "a", "sta"
    };

    int result = optimal(word, book);

    final int cycles = 50;
    long averageTime = benchmarkOptima(cycles, word, book);

    System.out.println("Optimal result: " + result);
    System.out.println("Average execution time - " + averageTime + " ms");
}

Output

2
Average execution time - 6 ms

Solution 5:[5]

Note: The implementation is getting stuck in the test case mentioned by @user1221, working on it.

What I could think of is a Trie based approach that is O(sum of length of words in dict) space. Time is not optimal.

Procedure:

  1. Build a Trie of all the words in the dictionary. This is a pre-processing task that will take O(sum of lengths of all strings in dict).
  2. We try finding the string that you want to make in the trie, with a twist. We start with searching a prefix of the string. If we get a prefix in the trie, we start the search from the top recursively and continue to look for more prefixes.
  3. When we reach the end of our string, i.e. stackoverflow, we check whether we have arrived at the end of any string in the dict. If yes, we have reached a valid combination of this string, and we count it while going back up the recursion.

E.g., in the above case we use the dict {"st", "sta", "a", "ck"}. We construct our trie ($ is the sentinel char, i.e. a char which is not in the dict):

$___s___t.___a.
|___a.
|___c___k.

The . indicates that a word in the dict ends at that position. We try to find the number of constructions of stack.

We start searching stack in the trie.

depth=0
$___s(*)___t.___a.
|___a.
|___c___k.

We see that we are at the end of one word, we start a new search with the remaining string ack from the top.

depth=0
$___s___t(*).___a.
|___a.
|___c___k.

Again we are at the end of one word in the dict. We start a new search for ck.

depth=1
$___s___t.___a.
|___a(*).
|___c___k.
depth=2
$___s___t.___a.
|___a.
|___c(*)___k.

We reach the end of stack and end of a word in the dict, hence we have 1 valid representation of stack.

depth=2
$___s___t.___a.
|___a.
|___c___k(*).

We go back to the caller of depth=2

No next char is available, we return to the caller of depth=1.

depth=1
$___s___t.___a.
|___a(*, 1).
|___c___k.
depth=0
$___s___t(*, 1).___a.
|___a.
|___c___k.

We move to next char. We see that we reached the end of one word in the dict, we launch a new search for ck in the dict.

depth=0
$___s___t.___a(*, 1).
|___a.
|___c___k.
depth=1
$___s___t.___a.
|___a.
|___c(*)___k.

We reach the end of stack and the end of a word in the dict, so this is another valid representation. We go back to the caller of depth=1.

depth=1
$___s___t.___a.
|___a.
|___c___k(*, 1).

There are no more chars to proceed, we return with the result 2.

depth=0
$___s___t.___a(*, 2).
|___a.
|___c___k.

Note: The implementation is in C++; it shouldn't be too hard to convert to Java. It assumes that all chars are lowercase, and it's trivial to extend it to both cases.

Sample code (full version):

/**
Node *base: head of the trie
Node *h   : current node in the trie
string s  : string to search
int idx   : the current position in the string
*/
int count(Node *base, Node *h, string s, int idx) {
    // step 3: found a valid combination.
    if (idx == s.size()) return h->end;

    int res = 0;
    // step 2: we recursively start a new search.
    if (h->end) {
        res += count(base, base, s, idx);
    }
    // move ahead in the trie.
    if (h->next[s[idx] - 'a'] != NULL) { 
        res += count(base, h->next[s[idx] - 'a'], s, idx + 1);
    }

    return res;
}
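For reference, a rough Java translation of the function above might look like this. The Node class, the insert helper and the class name TrieCount are assumptions (the original Node definition is not shown here). Like the C++ version it assumes lowercase a-z only and does no memoization, so it can be expected to be just as slow on large inputs.

```java
class TrieCount {
  // Assumed node layout: one child slot per lowercase letter.
  static class Node {
    boolean end;
    Node[] next = new Node[26];
  }

  // Hypothetical helper to build the trie (step 1); empty strings are skipped.
  static void insert(Node base, String s) {
    if (s.isEmpty()) {
      return;
    }
    Node h = base;
    for (int i = 0; i < s.length(); i++) {
      int c = s.charAt(i) - 'a';
      if (h.next[c] == null) {
        h.next[c] = new Node();
      }
      h = h.next[c];
    }
    h.end = true;
  }

  static int count(Node base, Node h, String s, int idx) {
    // step 3: found a valid combination if a dict word ends here.
    if (idx == s.length()) {
      return h.end ? 1 : 0;
    }
    int res = 0;
    // step 2: a dict word ends here, so restart the search from the top.
    if (h.end) {
      res += count(base, base, s, idx);
    }
    // move ahead in the trie.
    Node child = h.next[s.charAt(idx) - 'a'];
    if (child != null) {
      res += count(base, child, s, idx + 1);
    }
    return res;
  }

  public static void main(String[] args) {
    Node base = new Node();
    for (String w : new String[] {"st", "sta", "a", "ck"}) {
      insert(base, w);
    }
    System.out.println(count(base, base, "stack", 0)); // 2
  }
}
```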

Solution 6:[6]

def cancons(target, wordbank, memo=None):
    # Use a fresh memo per top-level call; a mutable default argument
    # would be shared between calls with different word banks.
    if memo is None:
        memo = {}
    if target in memo:
        return memo[target]
    if target == '':
        return 1
    total_count = 0
    for word in wordbank:
        if target.startswith(word):
            # Count the ways to build what remains after this word.
            total_count += cancons(target[len(word):], wordbank, memo)
    memo[target] = total_count
    return total_count


if __name__ == '__main__':
    word = "stackoverflow"
    book = ["st", "ck", "CAG", "low", "TC", "rf", "ove", "a", "sta"]
    print(cancons(word, book))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3
Solution 4 Matthew
Solution 5
Solution 6 murli