Find groups of files that end with the same 17 characters
I'm working with files whose names have a unique part and a common part, and I'm trying to match on the common part. I'm currently trying with bash, but I can use Python or whatever.
file1_02_01_2021_002244.mp4
file2_02_01_2021_002244.mp4
file3_02_01_2021_002244.mp4
# _02_01_2021_002244.mp4 should be the 'match all files that contain this string'
file1_03_01_2021_092200.mp4
file2_03_01_2021_092200.mp4
file3_03_01_2021_092200.mp4
# _03_01_2021_092200.mp4 is the match
...
file201_01_01_2022_112230.mp4
file202_01_01_2022_112230.mp4
file203_01_01_2022_112230.mp4
# _01_01_2022_112230.mp4 is the match
The goal is to find all files that match from the very end of the file name back to the first unique character, then move them into a folder. The actionable part will be easy. I just need help with the matching.
find -type f $("all that match the same last 17 characters of the file name"); do
do things
done
This is my example directory:
total 28480
drwxr-xr-x 2 user user 64B Feb 24 10:49 dir1
drwxr-xr-x 2 user user 64B Feb 24 10:49 dir2
-rw-r--r-- 2 user user 6.8M Feb 24 08:59 file1_02_01_2021_002244.mp4
-rw-r--r-- 2 user user 468K Feb 24 09:06 file1_03_01_2021_092200.mp4
-rw-r--r-- 2 user user 4.5M Feb 24 08:59 file2_02_01_2021_002244.mp4
-rw-r--r-- 2 user user 665K Feb 24 09:06 file2_03_01_2021_092200.mp4
-rw-r--r-- 1 user user 0B Feb 24 10:49 otherfile1
-rw-r--r-- 1 user user 0B Feb 24 10:49 otherfile2
I've got it to work with suggestions from the answer marked as correct. The Python method could probably work better (especially with file names that have spaces in them), but I'm not proficient enough with Python to make it do everything I want. The script in full is found below:
#!/usr/local/bin/bash
# this is my solution
# create array with patterns
aPATTERN=($(find . -type f -name "*.mp4" | sed 's/^[^_]*//'|sort -u ))
# iterate through all patterns, do things
for each in ${aPATTERN[@]}; do
    # create a temp working directory for files that match the pattern
    vDIR=`gmktemp -d -p $(pwd)`
    # create array of all files found matching the pattern
    aFIND+=(`find . -mindepth 1 -maxdepth 1 -type f -iname \*$each`)
    # move all files that match the pattern to the working temp directory
    for file in ${aFIND[@]}; do
        mv -iv $file $vDIR
    done
    # reset the found files array, get ready for next pattern
    aFIND=()
done
Solution 1:[1]
In python:
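import os
os.chdir("folder_path")  # placeholder: the folder that holds the files
# Pair each file name with its last 22 characters, e.g. "_02_01_2021_002244.mp4"
data = [[file[-22:], file] for file in os.listdir()]
output = {}
for pattern, filename in data:
    # Collect all file names that share the same suffix
    output.setdefault(pattern, []).append(filename)
print(output)
This creates a dict that maps each pattern (the last 22 characters of the file name) to the list of files that end with it.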
Output:
{
'_03_01_2021_092200.mp4': ['file1_03_01_2021_092200.mp4', 'file3_03_01_2021_092200.mp4', 'file2_03_01_2021_092200.mp4'],
'_01_01_2022_112230.mp4': ['file202_01_01_2022_112230.mp4', 'file201_01_01_2022_112230.mp4', 'file203_01_01_2022_112230.mp4'],
'_02_01_2021_002244.mp4': ['file1_02_01_2021_002244.mp4', 'file2_02_01_2021_002244.mp4', 'file3_02_01_2021_002244.mp4']
}
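Since the question also asks to move each group into a folder, the grouping above could be extended roughly as follows. This is only a sketch under a few assumptions: the suffix is always 22 characters long, only `.mp4` files are of interest, and each target folder is named after the timestamp part of the suffix. The names `group_by_suffix` and `move_groups` are made up for this illustration.

```python
import shutil
from collections import defaultdict
from pathlib import Path

SUFFIX_LEN = 22  # assumed length of a suffix such as "_02_01_2021_002244.mp4"


def group_by_suffix(folder, suffix_len=SUFFIX_LEN):
    """Map each shared suffix to the list of .mp4 files that end with it."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        # Skip directories, non-.mp4 files, and names with no prefix before the suffix.
        if path.is_file() and path.suffix == ".mp4" and len(path.name) > suffix_len:
            groups[path.name[-suffix_len:]].append(path)
    return dict(groups)


def move_groups(folder):
    """Move every group into a subfolder named after its timestamp."""
    for suffix, paths in group_by_suffix(folder).items():
        # "_02_01_2021_002244.mp4" -> "02_01_2021_002244"
        target = Path(folder) / suffix[1:-len(".mp4")]
        target.mkdir(exist_ok=True)
        for path in paths:
            shutil.move(str(path), str(target / path.name))
```

Calling `move_groups("folder_path")` would then sort the files into one subfolder per timestamp. Because the file names are handled as `Path` objects rather than split on whitespace, names containing spaces should not need special treatment.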
Solution 2:[2]
There are several ways to approach this, including writing a bash script, but if it were me, I'd take the quick and easy road. Use grep and read:
PATTERN=_02_01_2021_002244.mp4
find . -name '*.mp4' | grep -F -- "$PATTERN" | while IFS= read -r A; do echo "$A"; done
There are probably better ways that I haven't thought of but this gets the job done.
Solution 3:[3]
Try this:
#!/bin/bash
while IFS= read -r line
do
if [[ "$line" == *_+([0-9])_+([0-9])_+([0-9])_+([0-9])\.mp4 ]]
then
echo "MATCH: $line"
else
echo "no match: $line"
fi
done < <(/bin/ls -c1)
Remember that this uses globbing, not regex, when you build your pattern.
That is why I did not use [0-9]{2} to match two digits: in globbing, {} does not act as a repetition count the way it does in regex.
To use regex, use:
#!/bin/bash
while IFS= read -r line
do
if [[ $(echo "$line" | grep -cE '_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$') -ne 0 ]]
then
echo "MATCH: $line"
else
echo "no match: $line"
fi
done < <(/bin/ls -c1)
This is a more precise match since you can specify how many digits to accept in each sub-pattern.
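For comparison, the same digit-count check could be written in Python with the standard `re` module. This is a minimal sketch that assumes the suffix layout shown in the question (two, two, four, and six digits, followed by `.mp4`); the helper name `matching_suffix` is made up for this illustration.

```python
import re

# Two digits, two digits, four digits, six digits, then ".mp4" at the end of the name.
SUFFIX_RE = re.compile(r"_[0-9]{2}_[0-9]{2}_[0-9]{4}_[0-9]{6}\.mp4$")


def matching_suffix(filename):
    """Return the timestamp suffix of filename, or None if the name does not match."""
    match = SUFFIX_RE.search(filename)
    return match.group(0) if match else None


for name in ["file1_02_01_2021_002244.mp4", "otherfile1"]:
    print(name, "->", matching_suffix(name))
```

The returned suffix can serve directly as a grouping key, so the pattern validates the name and extracts the common part in one step.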
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Cubix48 |
| Solution 2 | challinan |
| Solution 3 |
