Grep and extract specific data in multiple log files
I have multiple log files in a directory and am trying to extract just the timestamp and one part of each log line, namely the value of the fulltext query parameter. Query parameters in a request are separated by an ampersand (&), as shown below.
Input
30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&
31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&
Intended Output
30/Mar/2022:00:27:36 -> 798
31/Mar/2022:00:27:36 -> Dyson+V7
I've got this command to recursively search over all the files in the directory.
grep -rn "/libs/granite/omnisearch" ~/Downloads/ReqLogs/ > output.txt
This prints the entire log line, prefixed with the file path and the line number, like so
/Users/****/Downloads/ReqLogs/logfile1_2022-03-31.log:6020:31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%4
How do I manipulate this to achieve the intended output?
Solution 1:[1]
grep can return the whole line or the string which matched. For extracting a different piece of data from the matching lines, turn to sed or Awk.
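To illustrate the "string which matched" case, here is a minimal sketch using grep's -o option, which GNU and BSD grep both support although it is not part of POSIX. The variable names log and matches are only for illustration, and the two sample lines are copied from the question.

```shell
# Two sample lines in the format shown in the question.
log='30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&
31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&'

# -o prints only the part of each line that matched the pattern,
# here "fulltext=" followed by everything up to the next ampersand.
matches=$(printf '%s\n' "$log" | grep -o 'fulltext=[^&]*')
printf '%s\n' "$matches"
```

This prints fulltext=798 and fulltext=Dyson+V7, but the timestamp is gone, which is why the commands below switch to Awk or sed.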
awk -v search="/libs/granite/omnisearch" '$0 ~ search { s = $0; sub(/.*fulltext=/, "", s); sub(/&.*/, "", s); print $1, "->", s }' ~/Downloads/ReqLogs/*
or
sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ -> \1/p' ~/Downloads/ReqLogs/*
The sed version is more succinct, but also somewhat more oblique.
\%...% uses the alternate delimiter % so that we can use literal slashes in our search expression.
The s/ .../ -> \1/p then says to take everything after the first space on the matching lines, capturing whatever sits between fulltext= and the next &, replace it with " -> " followed by the captured substring, and print the resulting line. (With a bare \1 as the replacement, the value would be glued directly onto the timestamp.)
The -n flag turns off the default printing action, so that we only print the lines where the search expression matched.
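As a quick check of how the address, the substitution and -n interact, here is a small sketch that feeds three lines to a sed command which writes " -> " between the timestamp and the value. The first two lines are from the question; the third, with the path /content/dam/example.png, is made up to show a line the address skips. The variable names log and out are illustrative.

```shell
# Two lines from the question plus one invented non-matching line.
log='30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&
30/Mar/2022:00:28:01 +0000 [59823] -> GET /content/dam/example.png
31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&'

# \%...% selects lines containing the omnisearch path; the s command keeps
# the timestamp (everything before the first space) and replaces the rest
# with " -> " plus the captured fulltext value; p prints only those lines.
out=$(printf '%s\n' "$log" |
  sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ -> \1/p')
printf '%s\n' "$out"
```

The invented line does not appear in the output because -n suppresses everything that the p flag does not print explicitly. Note that this pattern expects an & after the fulltext value, so a request where fulltext is the last parameter would not be printed.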
The wildcard ~/Downloads/ReqLogs/* matches all files in that directory; if you really need to traverse subdirectories, too, perhaps add find to the mix.
find ~/Downloads/ReqLogs -type f -exec sed -n '\%/libs/granite/omnisearch%s/ .*fulltext=\([^&]*\)&.*/ -> \1/p' {} +
or similarly with the Awk command after -exec. The placeholder {} tells find where to add the name of the found file(s) and + says to put as many as possible in one go, rather than running a separate -exec for each found file. (If you want that, use \; instead of +.)
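For the Awk variant, a rough sketch of the find invocation could look like the following. It builds a throwaway directory with mktemp so it can be run anywhere; the file names a.log and sub/b.log are hypothetical stand-ins for the real log files, and the extra /fulltext=/ condition is my addition so that omnisearch requests without that parameter are skipped rather than printed whole.

```shell
# Throwaway tree standing in for ~/Downloads/ReqLogs (hypothetical layout).
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf '%s\n' '30/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=798&savedSearches%40Delete=&' > "$dir/a.log"
printf '%s\n' '31/Mar/2022:00:27:36 +0000 [59823] -> GET /libs/granite/omnisearch?p.guessTotal=1000&fulltext=Dyson+V7&savedSearches%40Delete=&' > "$dir/sub/b.log"

# find passes as many files as possible to a single awk run ({} +).
# sort makes the order independent of how find walks the tree.
out=$(find "$dir" -type f -exec awk -v search="/libs/granite/omnisearch" '
  $0 ~ search && /fulltext=/ {
    s = $0
    sub(/.*fulltext=/, "", s)   # drop everything up to the value
    sub(/&.*/, "", s)           # drop the parameters after the value
    print $1, "->", s           # timestamp, arrow, value
  }' {} + | sort)
printf '%s\n' "$out"

rm -rf "$dir"
```

Unlike the sed version, the second sub() simply does nothing when no & follows the value, so this also copes with fulltext being the last parameter.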
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
