Remove all duplicate lines from returned HTML data by regex

I'm using regex in Google Apps Script to scrape data from a website:

I tried this code:

const name = /(?<=<span class="(.*?)">)(.*?)(?=<\/span>)/gi; // works great

for (var i = 0; i < 9; i++) {
  var names = data[i].match(name)[0];
  Logger.log(names);
}

This code works fine, but it gives me duplicate lines:

1:56:22 PM  Notice  Execution started
1:56:35 PM  Info    john
1:56:35 PM  Info    ara
1:56:35 PM  Info    john
1:56:35 PM  Info    anita
1:56:35 PM  Info    ara
1:56:35 PM  Info    fabian
1:56:35 PM  Info    ara
1:56:35 PM  Info    john
1:56:35 PM  Info    fabian
1:56:37 PM  Notice  Execution completed

I want to remove all duplicate names and get a result like this:

1:56:22 PM  Notice  Execution started
1:56:35 PM  Info    john
1:56:35 PM  Info    ara
1:56:35 PM  Info    anita
1:56:35 PM  Info    fabian
1:56:37 PM  Notice  Execution completed


Solution 1:[1]

Description

First, I would collect all the names in an array. Then, using [...new Set()], create an array of unique names.

Script

function spanTest() {
  try {
    // Matches the text between <span class="..."> and </span>
    const name = /(?<=<span class="(.*?)">)(.*?)(?=<\/span>)/gi;
    // Sample fragments standing in for the scraped HTML
    let data = ['<span class="test">john</span>',
                '<span class="test">ara</span>',
                '<span class="test">john</span>',
                '<span class="test">anita</span>',
                '<span class="test">ara</span>',
                '<span class="test">fabian</span>',
                '<span class="test">ara</span>',
                '<span class="test">john</span>',
                '<span class="test">fabian</span>'];

    // Take the first match from each fragment; the Set drops the duplicates
    let names = [...new Set(data.map(span => span.match(name)[0]))];
    console.log(names);
  }
  catch(err) {
    console.log(err);
  }
}

7:39:23 AM  Notice  Execution started
7:39:23 AM  Info    [ 'john', 'ara', 'anita', 'fabian' ]
7:39:23 AM  Notice  Execution completed
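To apply the same idea directly to the loop from the question, the names can be added to a Set inside the loop and then logged one per line, which gives the log format asked for above. A minimal sketch, assuming data is an array of HTML fragments like the ones in the script; uniqueSpanTexts and sample are hypothetical names, and the null check (skipping fragments without a matching span) is an addition not present in either answer:

```javascript
// Hypothetical helper: returns the text of the first <span class="..."> in each
// HTML fragment, with duplicates removed and first-seen order kept.
function uniqueSpanTexts(data) {
  // Same pattern as in the question
  const spanText = /(?<=<span class="(.*?)">)(.*?)(?=<\/span>)/gi;
  const seen = new Set();
  for (let i = 0; i < data.length; i++) {
    const matches = data[i].match(spanText);
    if (matches) seen.add(matches[0]); // skip fragments without a matching <span>
  }
  return [...seen];
}

// Sample fragments standing in for the scraped data (assumption)
const sample = [
  '<span class="a">john</span>',
  '<span class="a">ara</span>',
  '<span class="a">john</span>',
  '<span class="a">anita</span>',
  '<p>no span here</p>',
  '<span class="a">fabian</span>',
];

// One name per line, like the desired log output
for (const n of uniqueSpanTexts(sample)) {
  console.log(n); // in Apps Script, Logger.log(n) works as well
}
```

Adding to the Set inside the loop avoids building a second array first; the Set keeps the order in which names were first seen.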

Solution 2:[2]

Set

You can use a Set (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Set) in order to do that.

names = Array.from(new Set(names));

We don't know your final goal: here you simply log your data, so you may not even need to convert your Set back to an array :)
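For example, if you keep the Set, it can be used as-is: it is iterable, and size and has() answer the usual "how many" and "is it there" questions without any conversion. A short sketch, assuming names already holds the extracted names (the list below is sample data):

```javascript
// Sample names with duplicates (assumption: collected as in Solution 1)
const names = ['john', 'ara', 'john', 'anita', 'ara', 'fabian', 'ara', 'john', 'fabian'];
const unique = new Set(names);

// A Set is iterable, so it can be logged element by element without Array.from
for (const n of unique) {
  console.log(n);
}

console.log(unique.size);         // 4
console.log(unique.has('anita')); // true
```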

Sort

Another solution would be to sort your array and then iterate over it: once it is sorted, duplicates sit next to each other, which makes them easy to remove.

array.sort();

// filter returns a new array; keep an element only if it differs from the next one
const unique = array.filter((el, index) => el !== array[index + 1]);

Testing it in my browser:

let a = [1,1,2,3,4,4,5,6,7,7];

a.filter((el, index) => index < a.length && el !== a[index + 1]);

Array(7) [ 1, 2, 3, 4, 5, 6, 7 ]
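The same sort-then-filter idea can be wrapped in a small helper that sorts a copy, so the caller's array is not mutated by sort(). A sketch only; uniqueSorted is a hypothetical name, not part of the answer:

```javascript
// Hypothetical helper: sort a copy, then keep each element that differs
// from its right-hand neighbour. The input array is left untouched.
function uniqueSorted(values) {
  const sorted = [...values].sort(); // default sort compares values as strings
  return sorted.filter((el, index) => el !== sorted[index + 1]);
}

console.log(uniqueSorted(['john', 'ara', 'john', 'anita', 'ara', 'fabian']));
// → [ 'anita', 'ara', 'fabian', 'john' ]
```

Note that the default sort() compares elements as strings, so [10, 9, 10] sorts to [10, 10, 9]. Equal values still end up adjacent, so the duplicates are removed, but pass a comparator such as (a, b) => a - b if you also want numeric order.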

This solution obviously does not preserve the original order, whereas the first one does: a Set iterates over its values in insertion order, which the ECMAScript specification guarantees, so this is not specific to Firefox.
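If you need the original order but would rather not use a Set, a common alternative (not from either answer) is to keep each element only at the index where it first appears. A minimal sketch; uniqueInOrder is a hypothetical helper name:

```javascript
// Keeps an element only at its first occurrence, so the original order is preserved.
// indexOf scans the array each time, making this O(n^2): fine for short lists,
// but the Set approach scales better for large ones.
function uniqueInOrder(values) {
  return values.filter((el, index) => values.indexOf(el) === index);
}

console.log(uniqueInOrder(['john', 'ara', 'john', 'anita', 'ara', 'fabian', 'ara', 'john', 'fabian']));
// → [ 'john', 'ara', 'anita', 'fabian' ]
```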

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 TheWizEd
Solution 2