Want a correct regex to extract text from a server response and run JSON.parse on the extracted text
await page.on("response", async (response) => {
  const request = await response.request();
  if (
    request.url().includes("https://www.jobs.abbott/us/en/search-results")
  ) {
    const text = await response.text();
    const root = await parse(text);
    root.querySelectorAll("script").map(async function (n) {
      if (n.rawText.includes("eagerLoadRefineSearch")) {
        const text = await n.rawText.match(
          /"eagerLoadRefineSearch":(\{.*\})\,/,
        );
        const refinedtext = await text[0].match(/\[{.*}\]/);
        //console.log(refinedtext);
        console.log(JSON.parse(refinedtext[0]));
      }
    });
  }
});
The snippet above receives data in text form. I want to use a regex to extract eagerLoadRefineSearch : {} (along with its contents) as text, then run JSON.parse on the extracted text so that I end up with a JSON object for "eagerLoadRefineSearch" : {}.
I am using Puppeteer to intercept the response. I just need a correct regex that captures the whole object text of "eagerLoadRefineSearch" : {} (with its contents).
The response text from the server is shared at https://codeshare.io/bvjzJA .
Solution 1:[1]
Context
Silly mistakes
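The text you are parsing has no double quotes around eagerLoadRefineSearch, so a pattern that starts with "eagerLoadRefineSearch" never matches. The object also spans several lines, and . does not match a newline, so use [\s\S] in its place (or add the s flag). The m flag only changes how ^ and $ behave; it does not make . cross line breaks. Refer to how-to-use-javascript-regex-over-multiple-lines.
Also, don't use await on the string method match: it is synchronous and returns its result directly.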
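The point about . and line breaks can be checked with a small sketch. The sample string below is made up for illustration; it is not the actual response text.

```javascript
// Hypothetical sample: an unquoted key whose object spans several lines.
const sample = 'eagerLoadRefineSearch: {\n  status: 200,\n  hits: 10\n},';

// `.` stops at line breaks, so this pattern cannot reach the closing brace.
const dotMatch = sample.match(/eagerLoadRefineSearch: \{.*\}/);

// [\s\S] matches every character, line breaks included.
const anyCharMatch = sample.match(/eagerLoadRefineSearch: \{[\s\S]*\}/);

console.log(dotMatch); // null
console.log(anyCharMatch[0]);
```

Note that neither call is awaited: match returns its result synchronously.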
Matching the closing brace
A quick search on this topic led me to this link and, as I suspected, matching a balanced closing brace with a regex is complicated. To simplify the problem, I assumed that the text is consistently indented. With that assumption, we can find the closing brace by matching the same indentation as the opening line, using this pattern.
/(?<indent>[\s]+)\{[\s\S]+\k<indent>\}/gm
This works only if the opening and closing braces sit at the same indentation level. That is not quite our case, because eagerLoadRefineSearch: sits between the indentation and the opening brace, but we can account for that.
const reMatchObject = /(?<indent>[\s]+)eagerLoadRefineSearch: \{[\s\S]+?\k<indent>\}/gm
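To see how the back-reference to the indentation picks the right closing brace, here is a minimal sketch. The input is a made-up fragment (the name phApp.ddo and all values are hypothetical), indented the way this answer assumes the real page source is.

```javascript
// Hypothetical input: consistently indented, with a nested object inside.
const source = [
  'phApp.ddo = {',
  '  eagerLoadRefineSearch: {',
  '    status: 200,',
  '    data: {',
  '      jobs: [],',
  '    },',
  '  },',
  '  other: {},',
  '};',
].join('\n');

const reMatchObject = /(?<indent>[\s]+)eagerLoadRefineSearch: \{[\s\S]+?\k<indent>\}/gm;

// With the g flag, match() returns an array of all matched strings.
const [matched] = source.match(reMatchObject);
console.log(matched);
```

The nested closing brace of data is skipped because it is indented more deeply than the captured indent, so the lazy [\s\S]+? keeps going until a } appears directly after the same whitespace that preceded the key.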
Valid JSON
As mentioned earlier, the keys lack surrounding double quotes, so let's replace every key with "key".
const reMatchKeys = /(\w+):/gm
const impure = 'hello: { name: "nammu", age: 18, subjects: { first: "english", second: "mythology"}}'
const pure = impure.replace(reMatchKeys, '"$1":')
console.log(pure)
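Then we get rid of the trailing commas, which JSON does not allow. Here's the regex that worked for this example; it matches a comma only when whitespace and then a closing ] or } follow.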
const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm
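Once both replacements are applied in sequence, the data is ready for JSON.parse. Because the match begins at the key rather than at a brace, the final code also wraps the string in { and } first.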
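The two replacements can be tried together on a small fragment before wiring them into Puppeteer. This is only a sketch: the fragment and its values are invented, and the key pattern is the stricter variant from the full code (it requires whitespace after the colon), which keeps https:// inside a URL from being treated as a key.

```javascript
// Hypothetical fragment, shaped like what the object regex would return.
const fragment = [
  'eagerLoadRefineSearch: {',
  '  status: 200,',
  '  data: {',
  '    jobs: [',
  '      { title: "Engineer", url: "https://example.com/1", },',
  '    ],',
  '  },',
  '}',
].join('\n');

const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm;
const reMatchKeys = /(\w+):\s/g;

// Strip trailing commas, quote the keys, then wrap in braces so the
// leading key becomes a property of a top-level object.
const validJSONString =
  '{' +
  fragment
    .replace(reMatchTrailingCommas, '')
    .replace(reMatchKeys, '"$1":') +
  '}';

const result = JSON.parse(validJSONString);
console.log(result.eagerLoadRefineSearch.data.jobs[0].title); // prints: Engineer
```

One caveat of this approach: a string value that itself contains a word followed by a colon and a space would also be rewritten, so it is worth checking the output against the real data.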
Code
// page.on() and response.request() return synchronously, so they need no await.
page.on('response', async (response) => {
  const request = response.request();
  if (!request.url().includes('https://www.jobs.abbott/us/en/search-results')) {
    return;
  }
  const text = await response.text();
  const root = await parse(text);

  const reMatchObject = /(?<indent>[\s]+)eagerLoadRefineSearch: \{[\s\S]+?\k<indent>\}/gm;
  const reMatchKeys = /(\w+):\s/g;
  const reMatchTrailingCommas = /,(?=\s+[\]\}])/gm;

  for (const n of root.querySelectorAll('script')) {
    const data = n.rawText;
    if (!data.includes('eagerLoadRefineSearch')) continue;
    // match() returns null when nothing matches, so fall back to an empty array.
    for (const parsed of data.match(reMatchObject) ?? []) {
      const noTrailingCommas = parsed.replace(reMatchTrailingCommas, '');
      // Quote the keys and wrap in braces so the result is a valid JSON object.
      const validJSONString = '{' + noTrailingCommas.replace(reMatchKeys, '"$1":') + '}';
      console.log(JSON.parse(validJSONString));
    }
  }
});
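The indentation-based pattern depends on the page source being consistently indented. If that assumption does not hold, one alternative (not part of the original answer) is to drop the regex for this step and count braces with a small scanner. The helper name sliceObjectAfterKey is hypothetical, and this sketch only understands double-quoted strings.

```javascript
// Returns the text of the first {...} object that follows `key`,
// or null if the key is missing or the braces never balance.
function sliceObjectAfterKey(text, key) {
  const keyIndex = text.indexOf(key);
  if (keyIndex === -1) return null;
  const start = text.indexOf('{', keyIndex);
  if (start === -1) return null;

  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === '\\') i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') {
      inString = true;
    } else if (ch === '{') {
      depth++;
    } else if (ch === '}') {
      depth--;
      if (depth === 0) return text.slice(start, i + 1);
    }
  }
  return null;
}

// Made-up input: a brace inside a string and a nested object.
const sample = 'x = {eagerLoadRefineSearch: {a: "}", b: {c: 1}}, other: {}};';
console.log(sliceObjectAfterKey(sample, 'eagerLoadRefineSearch'));
```

The returned text still has unquoted keys, so it would need the same key-quoting and trailing-comma replacements before JSON.parse.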
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nikhil Devadiga |
