How do I detect groups of common strings in filenames?
I'm trying to figure out a way to detect groups of files. For instance, suppose a given directory has the following files:
- Birthday001.jpg
- Birthday002.jpg
- Birthday003.jpg
- Picknic1.jpg
- Picknic2.jpg
- Afternoon.jpg

I would like to condense the listing to something like:
- Birthday ( 3 pictures )
- Picknic ( 2 pictures )
- Afternoon ( 1 picture )
How should I go about detecting the groups?
Solution 1:[1]
Build a histogram whose keys are the filenames with any trailing digits and the extension stripped by a regex:
<?php
# input
$filenames = array("Birthday001.jpg", "Birthday002.jpg", "Birthday003.jpg", "Picknic1.jpg", "Picknic2.jpg", "Afternoon.jpg");

# create histogram
$histogram = array();
foreach ($filenames as $filename) {
    # strip optional trailing digits plus the extension;
    # \d* (not \d+) so that names without digits, like "Afternoon.jpg", also lose their extension
    $name = preg_replace('/\d*\.[^.]*$/', '', $filename);
    if (isset($histogram[$name])) {
        $histogram[$name]++;
    } else {
        $histogram[$name] = 1;
    }
}

# output, with singular/plural handling
foreach ($histogram as $name => $count) {
    if ($count == 1) {
        echo "$name ($count picture)\n";
    } else {
        echo "$name ($count pictures)\n";
    }
}
?>
Solution 2:[2]
Build an array of filler words to ignore, such as "my" (curating this array will be very important; "my" is the only one in your example), and strip those words out of all the filenames. Also strip out all numbers and punctuation; extensions should already be gone by this point. Once this is done, put all of the unique results into an array. You can then use that array as a fairly reliable source of keywords to search for any stragglers that the earlier processing didn't catch.
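The steps above could be sketched roughly as follows. This is only one possible reading of the answer, not its author's code: the `$stopWords` list, the `extractKeyword` helper, and the rule of splitting names at capital letters are all assumptions made for illustration, and the sample filenames are the ones from the question.

```php
<?php
// Assumed stop-word list; "my" is the only entry the answer names.
$stopWords = array("my");

// Hypothetical helper: reduce a filename to a bare keyword.
function extractKeyword($filename, $stopWords) {
    // Drop the directory and extension, e.g. "MyBirthday001.jpg" -> "MyBirthday001".
    $base = pathinfo($filename, PATHINFO_FILENAME);
    // Split on digits/punctuation and before capital letters (assumes CamelCase names).
    $words = preg_split('/(?=[A-Z])|[^A-Za-z]+/', $base, -1, PREG_SPLIT_NO_EMPTY);
    $kept = array();
    foreach ($words as $word) {
        // Keep every word that is not in the stop-word list.
        if (!in_array(strtolower($word), $stopWords)) {
            $kept[] = $word;
        }
    }
    return implode('', $kept);
}

$filenames = array("Birthday001.jpg", "Birthday002.jpg", "Birthday003.jpg",
                   "Picknic1.jpg", "Picknic2.jpg", "Afternoon.jpg");

// First pass: collect the unique keywords.
$keywords = array();
foreach ($filenames as $filename) {
    $keywords[] = extractKeyword($filename, $stopWords);
}
$keywords = array_values(array_unique($keywords));

// Second pass: count every filename that contains a keyword, case-insensitively.
$counts = array_fill_keys($keywords, 0);
foreach ($filenames as $filename) {
    foreach ($keywords as $keyword) {
        if ($keyword !== '' && stripos($filename, $keyword) !== false) {
            $counts[$keyword]++;
            break; // the first matching keyword wins
        }
    }
}

print_r($counts);
```

Matching whole words against the stop list, rather than removing the substring "my" anywhere, avoids mangling names that merely contain those letters. The second pass uses a plain substring search, so a short keyword can also match inside a longer, unrelated name; whether that is acceptable depends on the filenames involved.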
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Alex S |
