'Unique count of words from text string
I have a dataset that is multiple strings and I want a unique count of the occurrences so I can review and refine my datasets. I've been unable to do this using formulas so went over to VBA, but hit a roadblock as I'm an amateur.
My data looks like this...
I want it to return this...
I've tried parsing it with text to columns, but in large datasets I have 60 columns with 100s of hits in my string. Therefore, transposing it then trying to get a count of uniques would be daunting.
Therefore, I was hoping VBA would help, but I can only seem to get a function and not with a Sub and Function to transpose then count. Something like below...
Sub Main()
Dim filename As String
Dim WorksheetName As String
Dim CellRange As String
Sheets.Add.Name = "ParsedOutput"
'==============================================================
' CHANGE THESE VALUES FOR YOUR SHEET
WorksheetName =
CellRange =
'==============================================================
' Get range
Dim Range
Set Range = ThisWorkbook.Worksheets(WorksheetName).Range(CellRange)
' Copy range to avoid overwrite
Range.Copy _
Destination:=ThisWorkbook.Worksheets("ParsedOutput").Range("A1")
' Get copied exclusions
Dim Copy
Set Copy = ThisWorkbook.Worksheets("ParsedOutput").Range("A:A")
' Parse and overwrite
Copy.TextToColumns _
Destination:=Range("A:A"), _
DataType:=xlDelimited, _
ConsecutiveDelimiter:=True, _
Comma:=True
End Sub
Option Explicit
Public Function Counter(InputRange As Range) As String
Dim CellValue As Variant, UniqueValues As New Collection
Application.Volatile
'For error Handling On Error Resume Next
'Looping through all the cell in the defined range For Each CellValue In InputRange
UniqueValues.Add CellValue, CStr(CellValue) ' add the unique item Next
'Returning the count of number of unique values CountUniqueValues = UniqueValues.Count
End Function
Solution 1:[1]
For the sake of simplicity, I will take minimal data to demostrate how to achieve what you want. Feel free to change the code to suit your needs.
Excel Sheet
Let's say our worksheet looks like this
Logic:
- Find last row and last column as shown HERE and construct your range.
- Store the values of that range in an array.
- Loop through each item in that array and extract words based of
,as a delimiter and store it in the collection. If the delimiter doesnt exist then store the entire word in the collection. To create a unique collection, we useOn Error Resume Nextas shown in the code below. - Based on the count of words in the collection, we create an 2D array for output. One part of the array will hold the word and the other part will hold the count of occurences.
- Use .Find and .FindNext to count the occurence of a word in the range and then store it in array.
- Write the array in one go to the relevant cell. For demonstration purpose, I will write to Column D
Code
I have commented the code so you should not have a problem understanding it but if you do then simply ask.
Option Explicit
Sub Sample()
Dim ws As Worksheet
'~~> Change this to relevant sheet
Set ws = Sheet1
Dim LastRow As Long, LastColumn As Long
Dim i As Long, j As Long, k As Long
Dim col As New Collection
Dim itm As Variant, myAr As Variant, tmpAr As Variant
Dim OutputAr() As String
Dim aCell As Range, bCell As Range, rng As Range
Dim countOfOccurences As Long
With ws
'~~> Find last row
LastRow = .Cells.Find(What:="*", _
After:=.Range("A1"), _
Lookat:=xlPart, _
LookIn:=xlFormulas, _
SearchOrder:=xlByRows, _
SearchDirection:=xlPrevious, _
MatchCase:=False).Row
'~~> Find last column
LastColumn = .Cells.Find(What:="*", _
After:=.Range("A1"), _
Lookat:=xlPart, _
LookIn:=xlFormulas, _
SearchOrder:=xlByColumns, _
SearchDirection:=xlPrevious, _
MatchCase:=False).Column
'~~> Construct your range
Set rng = .Range(.Cells(1, 1), .Cells(LastRow, LastColumn))
'~~> Store the value in an array
myAr = rng.Value2
'~~> Create a unique collection
For i = LBound(myAr) To UBound(myAr)
For j = LBound(myAr) To UBound(myAr)
If Len(Trim(myAr(i, j))) <> 0 Then
'~~> Check data has "," delimiter
If InStr(1, myAr(i, j), ",") Then
tmpAr = Split(myAr(i, j), ",")
For k = LBound(tmpAr) To UBound(tmpAr)
On Error Resume Next
col.Add tmpAr(k), CStr(tmpAr(k))
On Error GoTo 0
Next k
Else
On Error Resume Next
col.Add myAr(i, j), CStr(myAr(i, j))
On Error GoTo 0
End If
End If
Next j
Next i
'~~> Count the number of items in the collection
i = col.Count
'~~> Create output array for storage
ReDim OutputAr(1 To i, 1 To 2)
i = 1
'~~> Loop through unique collection
For Each itm In col
OutputAr(i, 1) = Trim(itm)
countOfOccurences = 0
'~~> Use .Find and .Findnext to count for occurences
Set aCell = rng.Find(What:=OutputAr(i, 1), LookIn:=xlValues, _
Lookat:=xlPart, SearchOrder:=xlByRows, SearchDirection:=xlNext, _
MatchCase:=False, SearchFormat:=False)
If Not aCell Is Nothing Then
Set bCell = aCell
countOfOccurences = countOfOccurences + 1
Do
Set aCell = rng.FindNext(After:=aCell)
If Not aCell Is Nothing Then
If aCell.Address = bCell.Address Then Exit Do
countOfOccurences = countOfOccurences + 1
Else
Exit Do
End If
Loop
End If
'~~> Store count in array
OutputAr(i, 2) = countOfOccurences
i = i + 1
Next itm
'~~> Output it to relevant cell
.Range("D1").Resize(UBound(OutputAr), 2).Value = OutputAr
End With
End Sub
Output
Solution 2:[2]
Try,
It is convenient to use Dictionary to extract duplicate items.
Sub test()
Dim Ws As Worksheet, wsResult As Worksheet
Dim vDB, vSplit, v
Dim Dic As Object 'Scripting.Dictionary
Dim i As Long, n As Long
Set Dic = CreateObject("Scripting.Dictionary")
Set Ws = Sheets(1) 'ActiveSheet
vDB = Ws.Range("a1").CurrentRegion
For i = 2 To UBound(vDB, 1)
vSplit = Split(vDB(i, 3), ",")
For Each v In vSplit
If Dic.Exists(v) Then
Dic(v) = Dic.Item(v) + 1
Else
Dic.Add v, 1
End If
Next v
Next i
Set wsResult = Sheets(2)
n = Dic.Count
With wsResult
.UsedRange.Clear
.Range("a1").Resize(n) = WorksheetFunction.Transpose(Dic.Keys)
.Range("b1").Resize(n) = WorksheetFunction.Transpose(Dic.Items)
End With
End Sub
Solution 3:[3]
For all who won't use VBA. Here a solution with PowerQuery:
Quelle = Excel.CurrentWorkbook(){[Name="tbl_Source"]}[Content],
Change_Type = Table.TransformColumnTypes(Quelle,{{"ID", Int64.Type}, {"Record", type text}, {"Hits", type text}}),
Split_Hits = Table.ExpandListColumn(Table.TransformColumns(Change_Type, {{"Hits", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Hits"),
Clean_Spaces = Table.ReplaceValue(Split_Hits," ","",Replacer.ReplaceText,{"Hits"}),
Group_Rows = Table.Group(Clean_Spaces, {"Hits"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
Group_Rows
Solution 4:[4]
Approach simulating newer TextJoin and Unique functions
In order to complete the above solutions, I demonstrate an approach using
[1]a)a replacement of theTextJoinfunction (available since vers. 2019, MS 365 ~> the newer function code is commented out,btw),[1]b)theFilterXML()function to get unique words (available since vers. 2013+) and[3]a)a negative filtering to calculate results
Sub wordCounts()
'[0]define data range
With Sheet3
Dim lastRow As Long: lastRow = .Range("A" & .Rows.Count).End(xlUp).Row
Dim rng As Range: Set rng = .Range("A2:A" & lastRow)
End With
With WorksheetFunction
'[1]split a) available and b) unique words into arrays
' Dim words: words = Split(.TextJoin(",", True, rng), ",") ' (available vers. 2019+ or MS 365)
Dim words: words = Split(Join(.Transpose(rng), ","), ",") '
Dim uniques: uniques = UniqueXML(words) ' (already since vers. 2013+)
'[2]provide for calculation
'fill temporary array with words
Dim tmp: tmp = words
'declare cnt array for counting results
Dim cnt: ReDim cnt(0 To UBound(uniques), 0 To 0)
Dim old As Long: old = UBound(tmp) + 1 ' remember original size
'[3]get word counts
Dim elem
For Each elem In uniques
'a) filter out current elem
tmp = Filter(tmp, elem, False)
Dim curr As Long: curr = UBound(tmp) + 1
'b) count number of words (as difference of filtered tmp boundaries) ...
Dim n As Long: n = old - curr
' ... and remember latest array boundary
old = curr
'c) assign results to array cnt
Dim i As Long: cnt(i, 0) = n
i = i + 1 ' increment counter
Next elem
'[4]write word counts to target
rng.Offset(0, 2).Resize(UBound(uniques), 1) = .Transpose(uniques)
rng.Offset(0, 3).Resize(UBound(cnt), 1) = cnt
End With
End Sub
Help function UniqueXML()
Function UniqueXML(arr, Optional Delim As String = ",", Optional ZeroBased As Boolean = False)
' Purp: return unique list of array items
' Note: optional argument Delim defaulting to colon (",")
' Help: https://docs.microsoft.com/de-de/office/vba/api/excel.worksheetfunction.filterxml
' [1] get array data to xml node structure (including root element)
Dim wellformed As String
wellformed = "<root><i>" & Join(arr, "</i><i>") & "</i></root>"
' ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
' [2] define XPath string searching unique item values
' Note: c.f. udf: https://stackoverflow.com/questions/58677041/vba-excel-how-to-display-non-equal-values-in-an-excel-array/58685756#58685756
' ------------------------------------------------
' //i ... all <i> node values after the DocumentElement
' [not( .=preceding::i)] ... only if not preceded by siblings of the same node value
' ------------------------------------------------
Dim myXPath As String
myXPath = "//i[not( .=preceding::i)]"
' [3] get "flat" 1-dim array (~> one-based!)
Dim tmp As Variant
tmp = Application.Transpose(WorksheetFunction.FilterXML(wellformed, myXPath))
' [3a] optional redim as zero-based array
If ZeroBased Then ReDim Preserve tmp(LBound(tmp) - 1 To UBound(tmp) - 1)
' [4] return function result
UniqueXML = tmp
End Function
Solution 5:[5]
I didn't understand the problem you have between sub or function; however, this is a function that counts the unique values in a range
Public Function Counter(InputRange As Variant) As Variant
Dim UniqueValues As New Collection
Dim Val As Variant
Dim Cell As Range
Dim I As Long
Application.Volatile
On Error Resume Next
For Each Cell In InputRange
Val = Split(Cell, ",")
If IsArray(Val) Then
For I = LBound(Val) To UBound(Val)
UniqueValues.Add Val(I), CStr(Val(I))
Next I
Else
UniqueValues.Add Val, CStr(Val)
End If
Next Cell
On Error GoTo 0
Counter = UniqueValues.Count
End Function
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Siddharth Rout |
| Solution 2 | Dy.Lee |
| Solution 3 | Chris |
| Solution 4 | T.M. |
| Solution 5 | Zer0Kelvin |





