Remove all duplicate lines from HTML data returned by regex
I'm using regex in Apps Script to scrape data from a website.
I tried this code:
const name = /(?<=<span class="(.*?)">)(.*?)(?=<\/span>)/gi; // work Great
for (var i = 0; i < 9; i++) {
  var names = data[i].match(name)[0];
  Logger.log(names);
}
This code works fine but gives me duplicate lines:
1:56:22 PM Notice Execution started
1:56:35 PM Info john
1:56:35 PM Info ara
1:56:35 PM Info john
1:56:35 PM Info anita
1:56:35 PM Info ara
1:56:35 PM Info fabian
1:56:35 PM Info ara
1:56:35 PM Info john
1:56:35 PM Info fabian
1:56:37 PM Notice Execution completed
I want to remove all duplicate names and see a result like this:
1:56:22 PM Notice Execution started
1:56:35 PM Info john
1:56:35 PM Info ara
1:56:35 PM Info anita
1:56:35 PM Info fabian
1:56:37 PM Notice Execution completed
Solution 1:[1]
Description
First, I would collect all the names in an array. Then, using [...new Set()], I would create an array of unique names.
Script
function spanTest() {
  try {
    const name = /(?<=<span class="(.*?)">)(.*?)(?=<\/span>)/gi; // work Great
    let data = ['<=<span class="test">john</span>',
                '<=<span class="test">ara</span>',
                '<=<span class="test">john</span>',
                '<=<span class="test">anita</span>',
                '<=<span class="test">ara</span>',
                '<=<span class="test">fabian</span>',
                '<=<span class="test">ara</span>',
                '<=<span class="test">john</span>',
                '<=<span class="test">fabian</span>'];
    let names = [...new Set(data.map( span => span.match(name)[0]) )];
    console.log(names);
  }
  catch(err) {
    console.log(err);
  }
}
7:39:23 AM Notice Execution started
7:39:23 AM Info [ 'john', 'ara', 'anita', 'fabian' ]
7:39:23 AM Notice Execution completed
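One thing to watch for with scraped data: match() returns null when a fragment contains no matching span, so indexing [0] would throw. The following is a minimal sketch of a more defensive variant, not the answer's original code. The helper name uniqueSpanTexts and the sample fragments are hypothetical, and the pattern is a simplified assumption (no "g" flag, and [^"]* for the class value).

```javascript
// Hypothetical helper: extract the first span text from each HTML fragment,
// skip fragments without a match, and keep only the first occurrence of each text.
function uniqueSpanTexts(fragments) {
  // No "g" flag: only the first match per fragment is needed.
  const pattern = /(?<=<span class="[^"]*">)(.*?)(?=<\/span>)/i;
  const seen = new Set();
  for (const fragment of fragments) {
    const match = fragment.match(pattern);
    if (match === null) continue; // fragment contains no matching span
    seen.add(match[0]);           // a Set silently ignores repeated values
  }
  return [...seen];
}

// Made-up sample data for illustration only.
const sample = [
  '<span class="test">john</span>',
  '<span class="test">ara</span>',
  '<p>no span here</p>',
  '<span class="test">john</span>',
];
console.log(uniqueSpanTexts(sample)); // [ 'john', 'ara' ]
```

In Apps Script, Logger.log can be used in place of console.log.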
Solution 2:[2]
Set
You can use a Set (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set) in order to do that.
names = Array.from(new Set(names));
We don't know your final goal. Here you simply log your data, so you may not need to convert your Set back to an Array :)
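As a small illustration of that last point, a Set can be iterated and queried directly without converting it back to an Array. This is only a sketch; the variable names and sample values are made up.

```javascript
// Made-up sample values; the duplicate 'john' is dropped by the Set.
const uniqueNames = new Set(['john', 'ara', 'john', 'anita']);

// Iterate the Set directly, in insertion order.
const logged = [];
for (const n of uniqueNames) {
  console.log(n);
  logged.push(n);
}

console.log(uniqueNames.size);       // 3
console.log(uniqueNames.has('ara')); // true
```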
Sort
Another solution would be to sort your array, and then iterate over it to remove duplicates more easily, since equal values end up next to each other.
array.sort(); // sorts in place
const unique = array.filter((el, index) => index < array.length && el !== array[index + 1]); // filter returns a new array
Test in my browser:
let a = [1,1,2,3,4,4,5,6,7,7];
a.filter((el, index) => index < a.length && el !== a[index + 1]);
Array(7) [ 1, 2, 3, 4, 5, 6, 7 ];
This solution does not preserve the original order, whereas the first one does: a Set iterates over its elements in insertion order.
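To make the difference in ordering concrete, here is a short side-by-side sketch using some of the names from the question. The variable names are illustrative, and the sort is done on a copy so the original array is left untouched.

```javascript
const names = ['john', 'ara', 'john', 'anita', 'ara', 'fabian'];

// Set-based: keeps the order of first occurrence.
const byInsertion = [...new Set(names)];

// Sort-based: sort a copy, then drop every element equal to its successor.
const sorted = [...names].sort();
const bySort = sorted.filter((el, index) => el !== sorted[index + 1]);

console.log(byInsertion); // [ 'john', 'ara', 'anita', 'fabian' ]
console.log(bySort);      // [ 'anita', 'ara', 'fabian', 'john' ]
```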
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | TheWizEd |
| Solution 2 |
