'VBA optimizing array parsing and matching strings
I built a VBA Sub that matches an input string with the equivalent string within a large array and returns a specific string which is bounded to the matched string.
However, while the code is working well with some 100 entries, around 12sec. Around 1000 entries take 1min and 1500 entries might take 3min.
So, I was wondering if there anything I could improve to make the code run faster with a large amount of entries.
The VBA Function:
Sub searchISIN()
Dim StartTime As Double
StartTime = Timer
lRow = getlastrow(ws_universe, 1)
Dim rngISIN As Range: Set rngISIN = ws_universe.Range("A2:A" & lRow)
Dim z As Long: z = 1
Dim i As Long: i = 1
Dim j As Long
For Each cell In rngISIN
z = z + 1
For j = LBound(MatchingArr) To UBound(MatchingArr)
If InStr(1, CStr(MatchingArr(j)), CStr(cell.Value), vbTextCompare) Then
ws_universe.Cells(z, 2).Value = Left(MatchingArr(j), 18)
i = i + 1
GoTo NextIteration
End If
Next j
ws_universe.Cells(z, 2).Value = "k.A."
i = i + 1
NextIteration:
Next cell
MsgBox "Search ISINs: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
End Sub
The array that gets parsed has around 150k entries and each entry is a string which looks like the following:
"IID00XXXXXXXXXXXX|Magna International Inc.|US55922PF576;US559222AQ72;CA559222AT14;US559222AV67;US55922PRV75;US55922PF329;CA5592224011;XS1689185426;US55922PUW12;US559222AR55"
The code takes an input string, for example CA559222AT14, uses the built-in InStr function and returns the first 18 characters of the current array entry. In this example the return value would be "IID00XXXXXXXXXXXX"
I'm open for any idea to improve the code runtime. There are no constrains, rearranging the array layout, rearranging the complete code or whatsoever.
Solution 1:[1]
I think what you're doing is a good use case for a dictionary (instead of an array).
Dim MatchingArr as new Dictionary
If you're looking for matches between the semi-colons, then you should split those out when you build the dictionary instead of trying to match after the fact. You don't show what the search strings are supposed to be, so I'm just guessing.
For example, if you're looking for the following matches:
US55922PF576 US559222AQ72 CA559222AT14 to return IID00XXXXXXXXXXXX, then they should be separate dictionary entries:
MatchingArr.Add "US55922PF576", "IID00XXXXXXXXXXXX"
MatchingArr.Add "US559222AQ72", "IID00XXXXXXXXXXXX"
MatchingArr.Add "CA559222AT14", "IID00XXXXXXXXXXXX"
^ I'm not sure how you need to build the dictionary, but you can loop through your strings, split things out, and add the matches to it if needed.
Then, dump your range to an array and use your dictionary. Create a "paste range" array and fill it out as you loop:
Dim rngISIN As Range: Set rngISIN = ws_universe.Range("A2:A" & lRow)
Dim rngArr() as Variant: rngArr = rngISIN
Dim pasteArr() as Variant: pasteArr = rngArr
Dim x
For x = LBound(rngArr,1) to UBound(rngArr,1)
If MatchingArr.Exists(rngArr(x,1)) Then pasteArr(x,1) = MatchingArr(rngArr(x,1))
Next x
Finally just set the range next to your original range = to the paste range array.
ws_universe.Range("B2:B" & lRow) = pasteArr
The following code runs through 500,000 rows in about 10 seconds (this includes building the dictionary).
If the dictionary is already built, it's faster.
The results of the first run with 1,048,576 rows was 00:00:56 using this code:
Dim myDict As New Dictionary
Sub TestBigRange()
Dim StartTime As Double
StartTime = Timer
Dim rangeArr() As Variant
rangeArr = Range("A1:A1048576")
Dim x As Long
'Build dictionary if needed
If myDict.Count = 0 Then
For x = LBound(rangeArr, 1) To UBound(rangeArr, 1)
myDict.Add x, "A" & x
Next x
End If
Dim pasteRng() As Variant
pasteRng = rangeArr
For x = LBound(rangeArr, 1) To UBound(rangeArr, 1)
If myDict.Exists(rangeArr(x, 1)) Then pasteRng(x, 1) = myDict(rangeArr(x, 1))
Next x
Range("A1:A1048576") = rangeArr
Range("B1:B1048576") = pasteRng
MsgBox "Search Dictionary: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
End Sub
Without having to rebuild the dictionary after:
Building the dictionary with 1,048,576 entries took 24 seconds.
Reading/Writing the dictionary to/from a text file is pretty quick as well:
'If we write to separate lines, we don't have to split:
Sub WriteDictionary()
Open ThisWorkbook.Path & "\DictionaryFile.txt" For Output As #1
Dim x As Long
For x = 1 To 1048576
Print #1, x & ""
Print #1, "A" & x
Next x
Close #1
End Sub
Sub ReadDictionary()
Set myDict = New Dictionary
Dim StartTime As Double
StartTime = Timer
Open ThisWorkbook.Path & "\DictionaryFile.txt" For Input As #1
Dim key, val
Do Until EOF(1)
Line Input #1, key
Line Input #1, val
myDict.Add key, val
Loop
Close #1
MsgBox "Read Dictionary from File [1048576 rows]: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
End Sub
EDIT: Per Tim Williams, keeping your dictionaries smaller results in even faster creation/lookup time.
Here we do the complete build dictionary/lookup on 1048576 rows in 31 seconds!
Dim aDicts() As New Dictionary
Sub BuildDictionaries(ARange As Range, Optional MaxSize = 100000)
'100,000 is arbitrary, but seems to be a pretty good number
'Feel free to experiment: too small/big = slower.
ReDim aDicts(Int(ARange.Cells.Count / MaxSize))
Dim x As Long, r() As Variant, curDict As Integer
curDict = 0: r = ARange
For x = LBound(r, 1) To UBound(r, 1)
If aDicts(curDict).Count < MaxSize Then
aDicts(curDict).Add x, "A" & x
Else
curDict = curDict + 1
aDicts(curDict).Add x, "A" & x
End If
Next x
End Sub
The code to search through each dictionary:
For x = LBound(rangeArr, 1) To UBound(rangeArr, 1)
For Each z In aDicts
If z.Exists(rangeArr(x, 1)) Then
pasteRng(x, 1) = z(rangeArr(x, 1))
Exit For
End If
Next z
Next x
Here's the output from my last round of testing (reading in dictionary from text file):
Need 11 dictionaries [100,000 split]
Dictionary(0) size: 100000
Dictionary(1) size: 100000
Dictionary(2) size: 100000
Dictionary(3) size: 100000
Dictionary(4) size: 100000
Dictionary(5) size: 100000
Dictionary(6) size: 100000
Dictionary(7) size: 100000
Dictionary(8) size: 100000
Dictionary(9) size: 100000
Dictionary(10) size: 48576
Read Dictionary from File [1048576 rows]: 00:00:02
Search Dictionary [100000]: 00:00:02
Search Dictionary [200000]: 00:00:03
Search Dictionary [300000]: 00:00:04
Search Dictionary [400000]: 00:00:05
Search Dictionary [500000]: 00:00:07
Search Dictionary [600000]: 00:00:10
Search Dictionary [700000]: 00:00:13
Search Dictionary [800000]: 00:00:17
Search Dictionary [900000]: 00:00:22
Search Dictionary [1000000]: 00:00:27
Search Dictionary [Finished]: 00:00:31
Solution 2:[2]
Looping Through Arrays Instead of Ranges
- Not tested. It will fail if
lRowis less than 3.
Option Explicit
Sub searchISIN()
Dim StartTime As Double: StartTime = Timer
lRow = getlastrow(ws_universe, 1)
Dim rngISIN As Range: Set rngISIN = ws_universe.Range("A2:A" & lRow)
Dim aData As Variant: aData = rngISIN.Value
Dim bData As Variant: bData = rngISIN.EntireRow.Columns("B").Value
Dim aOffset As Long: aOffset = 1 - LBound(MatchingArr)
Dim aIndex As Variant
Dim a As Long
Dim i As Long: i = 1
For a = 1 To UBound(aData, 1)
aIndex = Application.Match("*" & CStr(aData(a, 1)) & "*", MatchingArr, 0)
If IsNumeric(aIndex) Then
bData(a, 1).Value = Left(MatchingArr(aIndex - aOffset), 18)
i = i + 1
Else
bData(a, 1) = "k.A."
i = i + 1
End If
Next a
rngISIN.EntireRow.Columns("B").Value = bData
MsgBox "Search ISINs: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
End Sub
Sub searchISINFirst()
Dim StartTime As Double: StartTime = Timer
lRow = getlastrow(ws_universe, 1)
Dim rngISIN As Range: Set rngISIN = ws_universe.Range("A2:A" & lRow)
Dim aData As Variant: aData = rngISIN.Value
Dim bData As Variant: bData = rngISIN.EntireRow.Columns("B").Value
Dim a As Long
Dim i As Long: i = 1
Dim j As Long
Dim jFound As Boolean
For a = 1 To UBound(aData, 1)
For j = LBound(MatchingArr) To UBound(MatchingArr)
If InStr(1, CStr(MatchingArr(j)), CStr(aData(a, 1)), vbTextCompare) Then
bData(a, 1).Value = Left(MatchingArr(j), 18)
i = i + 1
jFound = True
Exit For
End If
Next j
If jFound Then
jFound = False
Else
bData(a, 1) = "k.A."
i = i + 1
End If
Next a
rngISIN.EntireRow.Columns("B").Value = bData
MsgBox "Search ISINs: " & Format((Timer - StartTime) / 86400, "hh:mm:ss")
End Sub
Solution 3:[3]
Not an answer but just to illustrate a point about loading a Scripting Dictionary if you have a lot of data...
The chart below compares actual load times vs number of entries with a linear extrapolation based on time to load the first 500k entries. It's clear that when you get past a few hundred thousand entries the load time gets very long.
As commented above, splitting your data over multiple dictionaries (stored in a Collection for example) may result in faster run times (depending on your exact use case).
Also worth noting that the Dictionary can accept [pretty much] any datatype as keys, and some types load faster than others (eg. Long keys can be added about 2-3x faster than String keys)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 |





