Identify the start of a token in boost tokenizer

std::string s = "this string will be modified";
boost::tokenizer<> tok(s);
for (auto it = tok.begin(), it_end = tok.end(); it != it_end; ++it)
{
    std::string::difference_type const offset = it.base() - s.begin() - it->size();
    // do some operations on string s
}

I need to find the start of each token and then, for example, delete 3 characters from that token. This is repeated for the whole string. The offset calculated this way is no longer correct once the string has been modified. Is there another way?



Solution 1:[1]

You can compute each token's start offset from it.base():

#include <iostream>
#include <boost/tokenizer.hpp>

int main()
{
    typedef boost::tokenizer<> tok_t;

    std::string s = "this string will be modified"; 
    tok_t const tok(s);
    for (tok_t::const_iterator it = tok.begin(), it_end = tok.end(); it != it_end; ++it)
    {
        std::string::difference_type const offset = it.base() - s.begin() - it->size();
        std::cout << offset << "\t::\t" << *it << '\n';
    }
}

Output:

0   ::  this
5   ::  string
12  ::  will
17  ::  be
20  ::  modified
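
The loop above only reads s. Erasing characters from s inside that loop is a problem because the tokenizer holds iterators into s, and std::string::erase invalidates iterators at or after the erase point. One possible workaround, sketched below, is to record every offset first and then apply the erasures from the last token back to the first, so each erase only shifts text that has already been handled. The helper names token_ranges and trim_token_starts are hypothetical. To keep the sketch free of dependencies, token_ranges is a hand-written scan for alphanumeric runs that stands in for the boost::tokenizer loop; it is not claimed to match Boost's default splitting rules exactly, and the offsets could equally be collected with the loop shown above.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns (start offset, length) for each run of
// alphanumeric characters. It stands in for the boost::tokenizer loop.
std::vector<std::pair<std::size_t, std::size_t>> token_ranges(const std::string& s)
{
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    std::size_t i = 0;
    while (i < s.size())
    {
        // Skip separator characters.
        while (i < s.size() && !std::isalnum(static_cast<unsigned char>(s[i])))
            ++i;
        std::size_t const start = i;
        // Consume one token.
        while (i < s.size() && std::isalnum(static_cast<unsigned char>(s[i])))
            ++i;
        if (i > start)
            ranges.emplace_back(start, i - start);
    }
    return ranges;
}

// Hypothetical helper: erases up to n characters from the start of each token.
std::string trim_token_starts(std::string s, std::size_t n)
{
    // Pass 1: collect offsets while the string is still unmodified.
    auto const ranges = token_ranges(s);

    // Pass 2: erase from the back, so the offsets of earlier tokens stay valid.
    for (auto it = ranges.rbegin(); it != ranges.rend(); ++it)
        s.erase(it->first, std::min(n, it->second));

    return s;
}
```

With the string from the question, trim_token_starts("this string will be modified", 3) returns "s ing l  ified": the token "be" is shorter than 3 characters, so it is removed entirely and the two spaces around it remain.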

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Wiktor Stribiżew