How to bulk rename files based on their content in Python

I'm trying to prepend information from each file's content to the names of a set of .html files, but I get a traceback, and the same date ends up prepended to all of the files processed.

The intended outcome is to prepend the relevant date from each file's content to the name of that html file:

2021-03-27-x.html
2021-03-28-x.html
2021-03-29-x.html
etc

The script:

import os

dir = os.listdir(".")

files = []

for file in dir:
    if file[-5:] == '.html':
        files.insert(0, file)

for fileName in files:
    file = open(fileName)
    content = file.read()
    file.close()

    datetime = content.partition('datetime="')[-1][:10]

    [os.rename(f, datetime + "-" + str(f)) for f in os.listdir('.') 
    if ((not f.startswith('.')) and f.endswith(".html"))]

    print(datetime)

The error:

Traceback (most recent call last):
  File ".../TEST.io/3prefixdate.py", line 25, in <module>
    file = open(fileName)

The code works for bulk processing files, so I think it's likely the os.rename lines that are broken?



Solution 1:[1]

The main problem I see is that you're looping through each of the files you want to process, but then you seem to be looping again, trying to rename all of the matching files in the directory once for each file that you process in the outer loop. This doesn't make sense to me, especially given what you say you want as a result.
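To make that concrete, here's a rough reproduction of the question's loop structure in a scratch directory. The helper name reproduce_nested_rename and the sample file contents are made up for illustration. After the first pass of the outer loop, every .html file already carries the first file's date, so the next open() looks for a name that no longer exists. That would be consistent with the traceback pointing at open(fileName), although the question doesn't show the final error line, so treat it as a likely explanation rather than a confirmed one.

```python
import os
import tempfile


def reproduce_nested_rename(contents):
    """Run the question's loop structure in a temporary directory.

    `contents` maps file name -> file text (hypothetical sample data).
    Returns (sorted names left in the directory, exception or None).
    """
    with tempfile.TemporaryDirectory() as tmp:
        for name, text in contents.items():
            with open(os.path.join(tmp, name), "w") as f:
                f.write(text)

        files = sorted(n for n in os.listdir(tmp) if n.endswith(".html"))
        error = None
        try:
            for file_name in files:
                with open(os.path.join(tmp, file_name)) as f:
                    content = f.read()
                date = content.partition('datetime="')[-1][:10]
                # Inner loop as in the question: renames EVERY .html file,
                # not just the one that was read above.
                for other in os.listdir(tmp):
                    if not other.startswith(".") and other.endswith(".html"):
                        os.rename(os.path.join(tmp, other),
                                  os.path.join(tmp, date + "-" + other))
        except FileNotFoundError as exc:
            error = exc
        return sorted(os.listdir(tmp)), error


names, error = reproduce_nested_rename({
    "a.html": '<time datetime="2021-03-27">',
    "b.html": '<time datetime="2021-03-28">',
})
print(names)                  # both files carry the first file's date
print(type(error).__name__)   # FileNotFoundError
```

Both files end up prefixed with 2021-03-27, and the second open() fails because b.html was already renamed.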

I'm guessing that this is kinda what you want. I took the liberties of simplifying things a bit and using some more favorable Python idioms:

import os

entries = os.listdir(".")

files = []

for entry in entries:
    # Skip hidden files and keep only .html files
    if entry[0] != '.' and entry.endswith('.html'):
        files.insert(0, entry)

for fileName in files:
    with open(fileName) as f:
        content = f.read()
    # Take the 10 characters that follow datetime=" (e.g. 2021-03-27)
    datetime = content.partition('datetime="')[-1][:10]
    # Rename only the file that was just read
    os.rename(fileName, datetime + "-" + fileName)
    print(datetime + "-" + fileName)

If you're getting an error, I assume that it involves the extraction of the date from the file's content. That's a fairly hairy expression, and of course, I have no way of knowing if it's right without knowing what the content of your files looks like.
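One thing worth knowing about that expression: if 'datetime="' never appears in the content, partition() returns an empty tail, so the date comes out as an empty string and the file would be renamed to "-" plus its old name. Here's a minimal sketch of a stricter extraction using a regular expression. It assumes the files contain an attribute shaped like datetime="YYYY-MM-DD..."; the name extract_date and the sample strings are mine, not from the question.

```python
import re

# Assumed shape of the attribute: datetime="2021-03-27..." somewhere in the HTML.
DATE_RE = re.compile(r'datetime="(\d{4}-\d{2}-\d{2})')


def extract_date(content):
    """Return the first YYYY-MM-DD found in a datetime="..." attribute, or None."""
    match = DATE_RE.search(content)
    return match.group(1) if match else None


print(extract_date('<time datetime="2021-03-27T10:00">'))  # 2021-03-27
print(extract_date('<p>no date here</p>'))                 # None
```

With something like this you can skip the rename whenever the result is None instead of producing an oddly named file.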

A few comments on changes I made:

  1. Don't use dir as a variable name. It's a built-in Python function. Note how in the display of the code in your question, the dirs are red. That's why.
  2. Learn to use with for working on files. It prevents you from forgetting to close a file when you're done using it, and is visually superior too IMO.
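  3. <string>.endswith() is preferable here because it states the intent directly and doesn't depend on counting the suffix length by hand. Slicing with <string>[-5:] won't raise an exception on a short name (it just returns the whole string), but it's easy to get the number wrong when the suffix changes.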
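
As a small illustration of the file-name filter, here's a sketch that pulls the check into a helper. The name is_visible_html and the sample names are hypothetical. Both startswith() and endswith() cope fine with very short names.

```python
def is_visible_html(name):
    """True for names that end in .html and do not start with a dot."""
    return not name.startswith(".") and name.endswith(".html")


# Hypothetical directory listing
names = ["a.html", ".hidden.html", "x", "notes.txt", ".html"]
print([n for n in names if is_visible_html(n)])  # ['a.html']
```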

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1