'Google Sheets - Filtering out values in a range from another range

-- EDIT #2 -- Updated the Google Sheet again with a solution which is painfully close. Best formula I've had so far is below. --

=ARRAYFORMULA(SPLIT(UNIQUE({ARRAYFORMULA(QUERY(IF(COUNTIF(G4:G&"|||"&H4:H,ARRAYFORMULA(A4:A&"|||"&B4:B))>0,REGEXREPLACE(A4:A&"|||"&B4:B,".*",""),A4:A&"|||"&B4:B),"SELECT * WHERE Col1 IS NOT NULL"));ARRAYFORMULA(QUERY(IF(COUNTIF(G4:G&"|||"&H4:H,ARRAYFORMULA(D4:D&"|||"&E4:E))>0,REGEXREPLACE(D4:D&"|||"&E4:E,".*",""),D4:D&"|||"&E4:E),"SELECT * WHERE Col1 IS NOT NULL"))}),"|||"))

-- EDIT -- Updated the Google Sheet to more closely reflect my use case --

Pretty confident someone's asked this before but I've been Googling for a few hours now and I'm starting to lose hair. I think I've got to use a QUERY function but not 100% on that.

Demo sheet here: https://docs.google.com/spreadsheets/d/1p_hqk9WydcyXQZT4bIm4DSZZnaPKbngtnZ0-laYHwk8/edit?usp=sharing

What I want to do: I want to combine the ranges under DATA 1 and DATA 2, but I want to exclude and rows which start with the values in DATA 3.

RESULT 1 doesn't add value but shows how I was adding DATA 1 and DATA 2 together.

RESULT 2 shows the result I'm trying to get.

RESULT 3 hidden but where I got to (and doesn't add value again, sorry). I can get it mostly working, but I'd have to manually specify in the QUERY which combinations I'm looking for... and frankly my dataset is HUGE. That formula currently looks like this:

=QUERY(UNIQUE({FILTER(A4:B,NOT(ISBLANK(A4:A)));FILTER(D4:E,NOT(ISBLANK(D4:D)))}),"SELECT * WHERE NOT Col1 STARTS WITH 'a' OR NOT Col2 STARTS WITH 'v'",-1)

Hope someone can help me out! You're my only hope.



Solution 1:[1]

This works for your example data. It uses ARRAYFORMULA and MATCH to concatenate the two columns and LEFT to only use the first character in the match function. You might need to find a slightly smarter way to do the starts with element depending on your actual data.

=UNIQUE({FILTER(A4:B,not(isblank(A4:A)),iserror(MATCH(ARRAYFORMULA(A4:A&left(B4:B,1)),ARRAYFORMULA(G4:G&H4:H),0)));FILTER(D4:E,not(isblank(D4:D)),iserror(MATCH(ARRAYFORMULA(D4:D&left(E4:E,1)),ARRAYFORMULA(G4:G&H4:H),0)))})

Documentation:

MATCH: here

ARRAYFORMULA: here

Solution 2:[2]

Was also looking for how to find the set-difference between two ranges. All my values were in one range but I reworked my solution to fit your case of needing to join ranges too.

Finding the set-difference between two ranges:

// given A1:A9 and C1:C9
// return all values in A1:A10 excluding those in C1:C10
=filter(A1:A9, not(iferror( match(A1:A9,C1:C9,0), false )), A1:A9<>"")

Finding the set-difference between two joined ranges and a third range:

// given A1:A9, B1:B9, and C1:C9
// return all values in A1:A10 and B1:B10 excluding those in C1:C10
=filter({A1:A9;B1:B9}, not(iferror( match({A1:A9;B1:B9},C1:C9,0), false )), {A1:A9;B1:B9}<>"")

Breakdown:

Goal is to get an output range that is the input range excluding all values in an "exclusion" range

MATCH: match all values that are in both your input range and your exclusion range (remaining values will produce an error)

NOT + IFERROR : convert matches to false and errors to true

FILTER: filter the input down to only the true values (i.e. whats not matched, a.k.a. remove the excluded values), and also add a 2nd condition to remove blanks

tips:

{X;Y}: unions two ranges, ; adds rows , adds columns

X<>"": true for all non-blank values in range X

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 a-burge
Solution 2 Keego