Linux: how to add missing file extensions or move files without extensions

I have over 7,000 files; some of them are HTML and some are PDFs, as seen below:

[Screenshot: mixture of HTML and PDFs]

I would now like to use the terminal to move all PDFs to a new directory. Unfortunately, when I do

mv *.pdf ../new_directory/

I get "No such file or directory", because none of the files has a file extension (neither .pdf nor .html). However, when I run the following on a random file:

file -b \[2022\]\ NZHC\ 1

it reports either "PDF document" or "HTML document, ASCII text", so the system recognises the right format. So how can I move all the PDFs or all the HTML files? Is there a way to add the appropriate extension (.pdf or .html) to the file names?



Solution 1:[1]

file -b --extension [filename] prints the extension (or a slash-separated list of extensions) that file considers valid for the detected type of the given file.

For example:

file -b --extension random_pdf.pdf # prints 'pdf'

For an HTML file, however, file has no extension to report, so the command prints ??? instead.

file -b --extension random_html.html # prints '???' because no extension is known for this type

This is not specific to HTML: file prints ??? for every file whose extension it cannot determine.

However, assuming your folder contains only PDF and HTML files, we can treat every file for which the command reports ??? as an HTML file.
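Since everything below rests on that assumption, it may be worth checking it first. The following is a minimal sketch, not part of the original answer: list_unknown is a hypothetical helper name, and it simply prints what plain file -b says about every file that gets ???, so that anything which is not HTML stands out.

```shell
# Hypothetical helper: for every regular file in the current directory whose
# extension file(1) cannot determine, print "name: description".
list_unknown() {
    for f in *; do
        [ -f "$f" ] || continue          # skip directories and other non-regular files
        if [ "$(file -b --extension "$f")" = "???" ]; then
            printf '%s: %s\n' "$f" "$(file -b "$f")"
        fi
    done
}
```

Running list_unknown in the folder and skimming the descriptions should show only HTML documents if the assumption holds.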

So we will use sed to change ??? to html like so:

file -b --extension "$filename" | sed 's/???/html/'

Now, as long as you use this line only on PDF and HTML files, it will report the correct extension for each file.
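If you would rather not rely on that assumption, one possible alternative (a sketch, not what this answer uses) is to ask file for the MIME type and map it to an extension explicitly. The helper name guess_ext and the two-entry mapping are made up for illustration; file -b --mime-type is a standard option.

```shell
# Hypothetical helper: map the MIME type reported by file(1) to an extension.
# Only the two types from the question are handled; extend the case as needed.
guess_ext() {
    case "$(file -b --mime-type "$1")" in
        application/pdf) echo pdf ;;
        text/html)       echo html ;;
        *)               return 1 ;;     # unknown type: print nothing, signal failure
    esac
}
```

For example, ext=$(guess_ext "$filename") && echo "$filename.$ext" prints a new name only when the type is one of the two listed, so files of any other type are left alone instead of being labelled html.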

Using command substitution, we could make a new filename which contains the extension.

echo "$filename.$(file -b --extension "$filename" | sed 's/???/html/')"

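Combining this with mv and a bash for loop, we get:

for i in *; do
    [ -f "$i" ] || continue  # skip directories and other non-regular files
    mv -- "$i" "$i.$(file -b --extension "$i" | sed 's/???/html/')"  # -- protects names starting with '-'
done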

And voilà, all of the files in the folder should now be renamed with the proper extensions!

P.S. Before running the rename loop for real, test that it works as intended by putting echo in front of mv, so that each mv command is only printed and not executed.
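If the goal is only to move the PDFs (or the HTML files) into another directory, as in the first part of the question, renaming is not strictly needed. Here is a rough sketch under my own assumptions: move_by_type and the DRY_RUN variable are hypothetical names, and the selection uses file -b --mime-type rather than --extension.

```shell
# Hypothetical helper: move every regular file in the current directory whose
# MIME type equals $1 into the existing directory $2, without renaming it.
# If DRY_RUN is set to 1, the mv commands are printed instead of executed.
move_by_type() {
    want=$1
    dest=$2
    for f in *; do
        [ -f "$f" ] || continue                                   # regular files only
        [ "$(file -b --mime-type "$f")" = "$want" ] || continue   # wrong type: skip
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo mv -- "$f" "$dest/"
        else
            mv -- "$f" "$dest/"
        fi
    done
}
```

With ../new_directory already created, move_by_type application/pdf ../new_directory would move the PDFs, and text/html would select the HTML files. Setting DRY_RUN=1 beforehand gives the same kind of preview as the echo trick above.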

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 KokoseiJ