Regex pattern issue: removing specific digits
I'm trying to use a regex to extract only the date part of a string, in this format: "01 Apr 2022". But I'm having trouble getting rid of the time digits "07:28:00".
std::string test = "Fri, 01 Apr 2022 07:28:00 GMT";

std::string get_date(std::string str) {
    static std::vector<std::regex> patterns = {
        std::regex{"Fri,(.+)([0-9]+)GMT"},
    };
    for (auto& regex : patterns) {
        std::smatch m;
        if (std::regex_search(str, m, regex)) {
            return m[1];
        }
    }
    return str;
}
Solution 1:[1]
Here is a regex that will do the job: std::regex reg{R"(\d{2} \w+ \d{4})"};. This pattern has no capture group, so the date is the whole match: in your code, return m[0] instead of m[1].
But if your format is stable (and it certainly looks like it is), you don't need a regex at all. The date starts at index 5 and is 11 characters long, so just do something like this: str.substr(5, 11) or std::string(str.begin() + 5, str.begin() + 16).
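Both suggestions can be sketched as small helpers. The function names `date_by_regex` and `date_by_offset` are made up for illustration, and the offset variant assumes the input always has the exact layout shown in the question (three-letter weekday, comma, space, then an 11-character date).

```cpp
#include <regex>
#include <string>

// Regex variant: the pattern has no capture group, so the date is the
// whole match, m[0].
std::string date_by_regex(const std::string& str) {
    static const std::regex date_re{R"(\d{2} \w+ \d{4})"};
    std::smatch m;
    if (std::regex_search(str, m, date_re)) {
        return m[0].str();
    }
    return str;  // no match: fall back to the input, as the question's code does
}

// Fixed-offset variant: assumes the date starts at index 5 and is
// 11 characters long ("01 Apr 2022").
std::string date_by_offset(const std::string& str) {
    if (str.size() < 16) {
        return str;  // too short to hold a date at the assumed position
    }
    return str.substr(5, 11);
}
```

For the string in the question, both helpers should return the date without the time. The offset variant is cheaper but breaks silently if the layout ever changes, whereas the regex variant tolerates a different prefix.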
Solution 2:[2]
I would (strongly) advise against using a regex for this purpose.
The C++ standard library already has std::get_time to handle tasks like this, and I'd advise simply using it. In this case, the format you've shown seems to fit a get_time format string like: "%a, %d %b %Y %T".
Demo code:
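#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

std::string test = "Fri, 01 Apr 2022 07:28:00 GMT";

int main() {
    std::istringstream buffer { test };
    std::tm t = {};  // zero-initialize so fields get_time does not set are well defined
    buffer >> std::get_time(&t, "%a, %d %b %Y %T");
    if (buffer.fail()) {
        std::cerr << "Could not parse: " << test << "\n";
        return 1;
    }
    std::cout << "Hour: " << t.tm_hour
              << ", Minute: " << t.tm_min
              << ", Second: " << t.tm_sec << "\n";
}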
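The demo prints the time fields, but the question asks for the date. One way to get it, sketched here, is to parse with std::get_time and write the result back out with std::put_time. The helper name `reformat_date` is hypothetical, and the sketch assumes the default "C" locale, so that weekday and month names are parsed and printed in English.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Parse the full timestamp, then print only the date fields.
std::string reformat_date(const std::string& str) {
    std::istringstream in{str};
    std::tm t = {};  // zero-initialize before parsing
    in >> std::get_time(&t, "%a, %d %b %Y %T");
    if (in.fail()) {
        return str;  // parsing failed: return the input unchanged
    }
    std::ostringstream out;
    out << std::put_time(&t, "%d %b %Y");  // day, abbreviated month, year
    return out.str();
}
```

With the string from the question, this should produce "01 Apr 2022". Because the input is actually parsed, malformed timestamps are rejected rather than sliced blindly, and changing the output layout only means changing the put_time format string.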
Solution 3:[3]
You can use
std::regex{R"(^[a-zA-Z]{3},\s*(.*?)\s*\d{2}(?::\d{2}){2})"}
See the regex demo. Details:
^ - start of string
[a-zA-Z]{3} - three letters
, - a comma
\s* - zero or more whitespaces
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\s*\d{2}(?::\d{2}){2} - zero or more whitespaces, then two digits, :, two digits, : and two digits.
See the C++ demo:
#include <iostream>
#include <regex>
#include <string>
#include <vector>

std::string get_date(std::string str) {
    static std::vector<std::regex> patterns = {
        std::regex{R"(^[a-zA-Z]{3},\s*(.*?)\s*\d{2}(?::\d{2}){2})"},
    };
    for (auto& regex : patterns) {
        std::smatch m;
        if (std::regex_search(str, m, regex)) {
            return m[1];  // Group 1 holds the date between the weekday and the time
        }
    }
    return str;
}

int main() {
    std::cout << get_date("Fri, 01 Apr 2022 07:28:00 GMT") << std::endl;
    return 0;
}
Output:
01 Apr 2022
Solution 4:[4]
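If you give the pattern like this (written as a raw string literal so the backslashes reach the regex engine unchanged):
R"((Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+(\d{1,})\s+(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+(\d{4})\s+(\d{2}:\d{2}:\d{2})\s+GMT)"
then the 5th group, m[5], gives you the time (hh:mm:ss) part, since m[0] is the whole match and group numbering starts at 1. Groups m[2], m[3], and m[4] give the day, month, and year that make up the date.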
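A minimal sketch of using this kind of pattern follows. The helper name `split_http_date` and its out-parameters are made up for illustration. The sketch lists all twelve months in the alternation and relies on std::smatch numbering: m[0] is the whole match and the five capture groups are m[1] through m[5].

```cpp
#include <regex>
#include <string>

// Splits a timestamp such as "Fri, 01 Apr 2022 07:28:00 GMT" into its
// date and time parts. Returns false if the input does not match.
bool split_http_date(const std::string& str, std::string& date, std::string& time) {
    // Adjacent raw string literals are concatenated into one pattern.
    static const std::regex re{
        R"((Mon|Tue|Wed|Thu|Fri|Sat|Sun),\s+(\d{1,})\s+)"
        R"((Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\s+)"
        R"((\d{4})\s+(\d{2}:\d{2}:\d{2})\s+GMT)"};
    std::smatch m;
    if (!std::regex_search(str, m, re)) {
        return false;
    }
    // m[1] weekday, m[2] day, m[3] month, m[4] year, m[5] time
    date = m[2].str() + " " + m[3].str() + " " + m[4].str();
    time = m[5].str();
    return true;
}
```

Compared with a looser pattern, this one validates the weekday and month names, at the cost of being tied to English abbreviations and the GMT suffix.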
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Jerry Coffin |
| Solution 3 | Wiktor Stribiżew |
| Solution 4 | Rajarshi Ghosh |
