'Remove duplicate string from cell but keep last instance of duplicate

using VBA on excel to remove duplicates strings (whole words) from a cell, but keep the last instance of the duplicate.

Example

hello hi world hello => hi world hello
this is hello my hello world => this is my hello world

Iam originally a python developer so excuse my lack of syntax in VBA, I have edited a piece of code found online with the following logic:

''' Function RemoveDupeWordsEnd(text As String, Optional delimiter As String = " ") As String Dim dictionary As Object Dim x, part, endword

Set dictionary = CreateObject("Scripting.Dictionary")
dictionary.CompareMode = vbTextCompare
For Each x In Split(text, delimiter)
    part = Trim(x)
    
    If part <> "" And Not dictionary.exists(part) Then
        dictionary.Add part, Nothing
    End If

    '' COMMENT
    '' if the word exists in dictionary remove previous instance and add the latest instance
    If part <> "" And dictionary.exists(part) Then
            dictionary.Del part, Nothing
            endword = part
            dictionary.Add endword, Nothing
    End If
    
   
Next

If dictionary.Count > 0 Then
    RemoveDupeWordsEnd = Join(dictionary.keys, delimiter)
Else
    RemoveDupeWordsEnd = ""
End If

Set dictionary = Nothing

End Function

'''

Thanks all help and guidance would be very much appreciated



Solution 1:[1]

Use VBA's replace in a while loop that terminates when the occurrences of the string drop below 2. Replace takes an optional argument for the number of matches to replace.

Function keepLast(raw As String, r As String) As String
  
  While (Len(raw) - Len(Replace(raw, r, ""))) / Len(r) > 1
    raw = Replace(raw, r, "", , 1)
    Wend
    
  keepLast = Trim(Replace(raw, "  ", " "))
    
End Function

I use Trim and Replace any double spaces with a single space to avoid extraneous white space that is left by the removal of the target string. You could avoid the loop by just counting the number of occurrences and passing that minus 1 straight to replace:

Function keepLast(raw As String, r As String) As String
  
  keepLast = raw
  
  Dim cnt As Integer
  cnt = (Len(raw) - Len(Replace(raw, r, ""))) / Len(r)
  If cnt < 2 Then Exit Function
  
  raw = Replace(raw, r, "", , cnt - 1)
  keepLast = Trim(Replace(raw, "  ", " "))

End Function

Bear in mind that this method is very susceptible to partial matches. If your raw string was "hello that Othello is a good play hello there", then you'll end up with "that O is a good play hello there", which I don't think is exactly what you want. You might use regex to address this, if it's necessary:

Function keepLast(raw As String, r As String) As String
  
  Dim parser As Object
  Set parser = CreateObject("vbscript.regexp")
  parser.Global = True
  parser.Pattern = "\b" & r & "\b"
  
  While parser.Execute(raw).Count > 1
    raw = parser.Replace(raw, "")
    Wend
  
  keepLast = Trim(Replace(raw, "  ", " "))

End Function

The regexp object has a property to ignore case, if you need to handle "hello" and "Hello". You would set that like this:

parser.ignoreCase = true

Solution 2:[2]

Late to the party, but try:

Function RemoveDups(inp As String) As String

With CreateObject("vbscript.regexp")
    .Global = True
    .Pattern = "(?:^| )(\S+)(?= |$)(?=.* \1(?: |$))"
    RemoveDups = Application.Trim(.Replace(inp, ""))
End With

End Function

Unfortunately VBA does not support word-boundaries which would make for a much easier pattern. The idea however is to match 1+ non-whitespace characters from and upto a space/start-line/end-line and use this match with a backreference to check the word is repeated again.

enter image description here

Formula in B1:

=RemoveDups(A1)

Note: This is currently case-sensitive. So use the appropriate regex object properties and add: RegExp.IgnoreCase = False in case you want to use case-insensitive matching.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2