'Is there a way to decrease space utilization and potentially improve time performance for string manipulation in my C++ code?

I'm working on a string manipulation code in C++ for my assignment, where I have to find a substring within a string and essentially replace it with a newly computed sequence based on other inputs (here, the string vector 'patterns'). I have been trying to find a better way to manipulate the string in order to increase efficiency (reduce space cost and possibly also reduce time complexity further). This is a sample of what I've done so far(Sorry if the strings look a bit weird, this was the best simple example I could come up with at the moment):

// some input variables
std::string& originalString ("__  BB  ++  AA  __  AA  __  CC  __  DD");
std::map<std::string, std::vector<std::string>> patternMap = {
                                       {"AA", {"aaa","AAA"}}, 
                                       {"BB", {"bbb","BBB"}},
                                   };

std::size_t location = 0;
while ((location = originalString.find("__", location)) != std::string::npos) {
     const std::string subString = originalString.substr(location+4, location+6);
     std::map<std::string, std::vector<std::string>>::iterator patternsIterator = patternMap.find(subString);
    if (patternsIterator != patternMap.end()) {
          const std::vector<std::string> patterns = patternsIterator -> second;
          std::string alternateSubString("*");
          for (const std::string& item : patterns) {
            alternateSubString.append(item + "__" );
          }
          alternateSubString.replace(alternateSubString.size() - 5, 5,"*");
          originalString.replace(location+4, 2, alternateSubString);
          location += alternateSubString.size();
      }
    ++location;
  }// Expected  value of the original string should change to: "__  *bbb__BBB*  ++  AA  __  *aaa__AAA*  __  CC  __  DD"

'''

Any tips regarding a better way, maybe do away with the variable alternateSubString? Trying to achieve some space/time optimization here.

c++


Solution 1:[1]

Your algorithm seems to be:

  • Scan the string until the next occurrence of __ is found.

  • When found, the next word (e.g. "BB") is replaced by building a new string using the vector of words in patternMap

  • This vector of words involves concatenating each word found in patternMap with __ and the entire replacement string that is built gets prefixed and postfixed with *. (e.g. {"bbb","BBB"} => "*bbb__BBB*" )

Feedback

Your implementation has a couple of instances of "backtracking" using the replace method to cover up a mistake made during the initial substitution loop.

It also has a few hardcoded length values that assumes spacing will be consistent between tokens. All of this makes it slightly hard to read.

unordered_map will be faster than map. Since you don't need enumeration order, you can use unordered_map. This is a micro-optimization if the size of your map is really small.

I propose reading the initial string into tokens and keeping track if the next token needs to be replaced with the words in the patternMap.

Consider this:

#include <algorithm>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

using namespace std;

typedef unordered_map<string, vector<string>> PatternMap;


// getNext returns a pair of strings:
// The leading whitespace preceeding a word and the word itself
pair<string, string> getNext(const string& s, size_t& location);

// makeMagicWord converts a vector of words 
// into the associated magic replacement pattern
// e.g. {"AA", "BB", CC"} => "*AA__BB__CC*"
string makeMagicWord(const vector<string>& vec);

string ApplyPatterns(const std::string& original, const PatternMap& patterns)
{
    string newString;
    size_t location = 0;
    bool replacing = false;

    while (true) {
        auto p = getNext(original, location);
        string whitespace = p.first;
        string token = p.second;

        newString += whitespace;

        if (token.empty()) {
            break;
        }

        if (replacing) {
            auto itor = patterns.find(token);
            if (itor != patterns.end()) {
                token = makeMagicWord(itor->second); // itor->second = patterns[token]
            }
            replacing = false;
        }
        else if (token == "__") {
            replacing = true;
        }
        newString += token;
    }
    return newString;
}

The above code reads the whitespace/word tokens with a helper function, applies substitution as needed, and uses another helper function for making the replacement word. The implementation of those two helper functions is below:

Then the implementation of our two helpers is below:

pair<string, string> getNext(const string& s, size_t& location) {
    string whitespace, token;

    // consume leading whitespace
    while (location < s.size() && ::isspace(s[location])) {
        whitespace += s[location];
        location++;
    }

    // consume the next token
    while (location < s.size() && !::isspace(s[location])) {
        token += s[location];
        location++;
    }

    return { whitespace, token };
}

string makeMagicWord(const vector<string>& vec) {
    std::string s = "*";
    for (size_t i = 0; i < vec.size(); i++) {
        s += ((i > 0) ? "__" : "");
        s += vec[i];
    }
    s += "*";
    return s;
}

Let's test it all out:

int main() {
    string originalString("__  BB  ++  AA  __  AA  __  CC  __  DD");
    unordered_map<std::string, std::vector<std::string>> patternMap = {
                                           {"AA", {"aaa","AAA"}},
                                           {"BB", {"bbb","BBB"}}};

    string replacement = ApplyPatterns(originalString, patternMap);
    cout << originalString << endl;
    cout << replacement << endl;
    return 0;
}

And when run, the above prints out:

__  BB  ++  AA  __  AA  __  CC  __  DD
__  *bbb__BBB*  ++  AA  __  *aaa__AAA*  __  CC  __  DD

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1