'Remove duplicate string from cell but keep last instance of duplicate
using VBA on excel to remove duplicates strings (whole words) from a cell, but keep the last instance of the duplicate.
Example
hello hi world hello => hi world hello
this is hello my hello world => this is my hello world
Iam originally a python developer so excuse my lack of syntax in VBA, I have edited a piece of code found online with the following logic:
''' Function RemoveDupeWordsEnd(text As String, Optional delimiter As String = " ") As String Dim dictionary As Object Dim x, part, endword
Set dictionary = CreateObject("Scripting.Dictionary")
dictionary.CompareMode = vbTextCompare
For Each x In Split(text, delimiter)
part = Trim(x)
If part <> "" And Not dictionary.exists(part) Then
dictionary.Add part, Nothing
End If
'' COMMENT
'' if the word exists in dictionary remove previous instance and add the latest instance
If part <> "" And dictionary.exists(part) Then
dictionary.Del part, Nothing
endword = part
dictionary.Add endword, Nothing
End If
Next
If dictionary.Count > 0 Then
RemoveDupeWordsEnd = Join(dictionary.keys, delimiter)
Else
RemoveDupeWordsEnd = ""
End If
Set dictionary = Nothing
End Function
'''
Thanks all help and guidance would be very much appreciated
Solution 1:[1]
Use VBA's replace in a while loop that terminates when the occurrences of the string drop below 2. Replace takes an optional argument for the number of matches to replace.
Function keepLast(raw As String, r As String) As String
While (Len(raw) - Len(Replace(raw, r, ""))) / Len(r) > 1
raw = Replace(raw, r, "", , 1)
Wend
keepLast = Trim(Replace(raw, " ", " "))
End Function
I use Trim and Replace any double spaces with a single space to avoid extraneous white space that is left by the removal of the target string. You could avoid the loop by just counting the number of occurrences and passing that minus 1 straight to replace:
Function keepLast(raw As String, r As String) As String
keepLast = raw
Dim cnt As Integer
cnt = (Len(raw) - Len(Replace(raw, r, ""))) / Len(r)
If cnt < 2 Then Exit Function
raw = Replace(raw, r, "", , cnt - 1)
keepLast = Trim(Replace(raw, " ", " "))
End Function
Bear in mind that this method is very susceptible to partial matches. If your raw string was "hello that Othello is a good play hello there", then you'll end up with "that O is a good play hello there", which I don't think is exactly what you want. You might use regex to address this, if it's necessary:
Function keepLast(raw As String, r As String) As String
Dim parser As Object
Set parser = CreateObject("vbscript.regexp")
parser.Global = True
parser.Pattern = "\b" & r & "\b"
While parser.Execute(raw).Count > 1
raw = parser.Replace(raw, "")
Wend
keepLast = Trim(Replace(raw, " ", " "))
End Function
The regexp object has a property to ignore case, if you need to handle "hello" and "Hello". You would set that like this:
parser.ignoreCase = true
Solution 2:[2]
Late to the party, but try:
Function RemoveDups(inp As String) As String
With CreateObject("vbscript.regexp")
.Global = True
.Pattern = "(?:^| )(\S+)(?= |$)(?=.* \1(?: |$))"
RemoveDups = Application.Trim(.Replace(inp, ""))
End With
End Function
Unfortunately VBA does not support word-boundaries which would make for a much easier pattern. The idea however is to match 1+ non-whitespace characters from and upto a space/start-line/end-line and use this match with a backreference to check the word is repeated again.
Formula in B1:
=RemoveDups(A1)
Note: This is currently case-sensitive. So use the appropriate regex object properties and add: RegExp.IgnoreCase = False in case you want to use case-insensitive matching.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |

