Regular expression for valid filename

I have already gone through some questions on Stack Overflow about this, but none of them helped much in my case.

I want to restrict users to filenames that contain only alphanumeric characters, -, _, . and spaces.

I'm not good at regular expressions; so far I have come up with ^[a-zA-Z0-9.-_]$. Can somebody help me?



Solution 1:[1]

To validate a file name, I would suggest using the function provided by .NET rather than a regex:

if (filename.IndexOfAny(System.IO.Path.GetInvalidFileNameChars()) != -1)
{
    // filename contains at least one character that is not allowed in file names
}
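A minimal sketch of how that check might be wrapped in a reusable helper. The names InvalidCharCheck and HasOnlyValidChars are hypothetical. Keep in mind that the array returned by Path.GetInvalidFileNameChars() depends on the platform and only covers characters the OS forbids, so this check is more permissive than the whitelist the question asks for.

```csharp
using System;
using System.IO;

public static class InvalidCharCheck
{
    // Hypothetical helper: returns true when the name is non-empty and
    // contains none of the characters the current platform reports as
    // invalid in file names.
    public static bool HasOnlyValidChars(string filename)
    {
        if (string.IsNullOrEmpty(filename))
        {
            return false;
        }

        return filename.IndexOfAny(Path.GetInvalidFileNameChars()) == -1;
    }
}
```

With this sketch, InvalidCharCheck.HasOnlyValidChars("report.txt") returns true, while a name containing a NUL character is rejected.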

Solution 2:[2]

While what the OP asks for is close to what the currently accepted answer uses (^[\w\-. ]+$), others who find this question might have even more specific constraints.

First off, \w is not limited to a-z, A-Z, 0-9 and _. In .NET it matches Unicode word characters by default (RegexOptions.ECMAScript restricts it to ASCII), so it will allow a wide range of characters from other languages that the OP's constraints exclude.

Secondly, if the file extension is included in the name, this allows all sorts of weird looking, though valid, filenames like file .txt or file...txt.

Thirdly, if you're simply uploading the files to your file system, you might want a blacklist of files and/or extensions like these:

web.config, hosts, .gitignore, httpd.conf, .htaccess

However, that is considerably out of scope for this question; good guidance on the security issues would require all sorts of information about the setup. I thought I should raise the matter nonetheless.

So for a solution where the user can input the full file name, I would go with something like this:

^[a-zA-Z0-9](?:[a-zA-Z0-9 ._-]*[a-zA-Z0-9])?\.[a-zA-Z0-9_-]+$

It ensures that only ASCII letters and digits (plus space, ., _ and -) are used, that the name has no leading or trailing spaces, and that there is a file extension at least one character long with no whitespace.

I've tested this on Regex101, but for future reference, this was my "test-suite":

## THE BELOW SHOULD MATCH
web.config
httpd.conf
test.txt
1.1
my long file name.txt

## THE BELOW SHOULD NOT MATCH - THOUGH VALID
æøå.txt
hosts
.gitignore
.htaccess
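As a rough illustration, the test suite above could be reproduced in C# like this. The pattern is copied from the answer; the class and method names (StrictAsciiFileName, IsMatch) are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class StrictAsciiFileName
{
    // Pattern copied verbatim from the answer above.
    private static readonly Regex Pattern = new Regex(
        @"^[a-zA-Z0-9](?:[a-zA-Z0-9 ._-]*[a-zA-Z0-9])?\.[a-zA-Z0-9_-]+$");

    // Returns true when the whole name satisfies the pattern.
    public static bool IsMatch(string filename)
    {
        return Pattern.IsMatch(filename);
    }
}
```

Under this sketch, StrictAsciiFileName.IsMatch("my long file name.txt") is true, while "hosts", ".gitignore", "file .txt" and "file...txt" are all rejected.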

Solution 3:[3]

In case someone else needs to validate filenames (including Windows reserved words and such), here's a full expression:

\A(?!(?:COM[0-9]|CON|LPT[0-9]|NUL|PRN|AUX|com[0-9]|con|lpt[0-9]|nul|prn|aux)|[\s\.])[^\\\/:*"?<>|]{1,254}\z

Extended expression (don't allow filenames starting with 2 dots, don't allow filenames ending in dots or whitespace):

\A(?!(?:COM[0-9]|CON|LPT[0-9]|NUL|PRN|AUX|com[0-9]|con|lpt[0-9]|nul|prn|aux)|\s|[\.]{2,})[^\\\/:*"?<>|]{1,254}(?<![\s\.])\z

Edit: For the interested, here's a link to Windows file naming conventions: https://msdn.microsoft.com/en-us/library/windows/desktop/aa365247(v=vs.85).aspx
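A hedged sketch of how the extended expression might be used from C#. The pattern is the one from this answer, with the double quote doubled as a verbatim string requires; the names WindowsFileName and IsValid are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WindowsFileName
{
    // Extended expression from the answer above. Inside a verbatim
    // string the double quote is written as "".
    private static readonly Regex Pattern = new Regex(
        @"\A(?!(?:COM[0-9]|CON|LPT[0-9]|NUL|PRN|AUX|com[0-9]|con|lpt[0-9]|nul|prn|aux)|\s|[\.]{2,})[^\\\/:*""?<>|]{1,254}(?<![\s\.])\z");

    // Returns true when the name passes the expression.
    public static bool IsValid(string filename)
    {
        return Pattern.IsMatch(filename);
    }
}
```

With this sketch, "report.txt" passes, while "CON", "..hidden", "name." and "a:b" are rejected. One side effect worth knowing: the lookahead only checks how the name starts, so a name such as "CONTACT.txt" is rejected as well.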

Solution 4:[4]

Use this regular expression: ^[a-zA-Z0-9._ -]+$
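To see why the attempt in the question falls short, the following sketch compares the two patterns. In the question's character class, .-_ is read as a range from '.' (0x2E) to '_' (0x5F), which lets in characters such as '<' but not '-' itself, and without a quantifier the pattern matches only a single character. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WhitelistFileName
{
    // The attempt from the question: ".-_" is a range, and there is no
    // quantifier, so only one-character inputs can match.
    private static readonly Regex Original = new Regex(@"^[a-zA-Z0-9.-_]$");

    // The expression from this answer: the hyphen is last in the class,
    // so it is literal, and "+" allows one or more characters.
    private static readonly Regex Fixed = new Regex(@"^[a-zA-Z0-9._ -]+$");

    public static bool MatchesOriginal(string s) => Original.IsMatch(s);

    public static bool MatchesFixed(string s) => Fixed.IsMatch(s);
}
```

Here MatchesOriginal("<") is true and MatchesOriginal("report.txt") is false, whereas MatchesFixed("my file-1.txt") is true and MatchesFixed("a<b") is false.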

Solution 5:[5]

This is a minor change to Engineer's answer.

string regex = @"^[\w\- ]+[\w\-. ]*$";

This blocks ".txt", which isn't valid.

The trouble is that it also blocks "..txt", which is valid.

Solution 6:[6]

For the full character set (Unicode), use ^[\p{L}0-9_\-.~]+$

or perhaps ^[\p{L}\p{N}_\-.~]+$ would be more accurate if we are talking about Unicode.

I added a '~' simply because I have some files using that character.
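A small sketch of the second Unicode variant in C#, assuming the names UnicodeFileName and IsMatch (both hypothetical). Note that this class does not include the space character, so names with spaces are rejected.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeFileName
{
    // \p{L} matches any Unicode letter and \p{N} any Unicode number.
    private static readonly Regex Pattern = new Regex(@"^[\p{L}\p{N}_\-.~]+$");

    // Returns true when every character is a letter, number, _, -, . or ~.
    public static bool IsMatch(string filename)
    {
        return Pattern.IsMatch(filename);
    }
}
```

With this sketch, "æøå.txt" and "backup~1.txt" match, while "my file.txt" does not.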

Solution 7:[7]

I've just created this. It rejects consecutive dots and a dot at the beginning or end. Note, though, that it allows exactly one dot in the whole name, so a name such as a.b.txt is rejected too.

^([a-zA-Z0-9_]+)\.(?!\.)([a-zA-Z0-9]{1,5})(?<!\.)$
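A quick sketch to make the behaviour of this expression concrete. The pattern is copied from the answer; the names SingleDotFileName and IsMatch are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class SingleDotFileName
{
    // Pattern copied verbatim from the answer above: a base name of
    // letters, digits and underscores, one dot, and an extension of
    // one to five letters or digits.
    private static readonly Regex Pattern = new Regex(
        @"^([a-zA-Z0-9_]+)\.(?!\.)([a-zA-Z0-9]{1,5})(?<!\.)$");

    public static bool IsMatch(string filename)
    {
        return Pattern.IsMatch(filename);
    }
}
```

Under this sketch, "file.txt" matches, while ".txt", "file.", "file..txt", "a.b.txt", "my-file.txt" and "archive.backup" (extension longer than five characters) do not.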

Solution 8:[8]

I may be saying something stupid here, but it seems to me that these answers aren't correct. Firstly, are we talking Linux or Windows here (or another OS)?

Secondly, in Windows it is (I believe) perfectly legitimate to include a "$" in a filename, not to mention Unicode in general. It certainly seems possible.

I tried to get a definitive source on this and ended up at the Wikipedia Filename page. In particular, the section "Reserved characters and words" seems relevant: it is, clearly, a list of things which you are NOT allowed to put in.

I'm in the Java world. And I naturally assumed that Apache Commons would have something like validateFilename, maybe in FilenameUtils... but it appears not (if it had done, this would still be potentially useful to C# programmers, as the code is usually pretty easy to understand, and could therefore be translated). I did do an experiment, though, using the method normalize: to my disappointment it allowed perfectly invalid characters (?, etc.) to "pass".

The part of the Wikipedia Filename page referenced above shows that this question depends on the OS you're using... but it should be possible to concoct some simple regex for Linux and Windows at least.

Then I found a Java way (at least):

Path path = java.nio.file.FileSystems.getDefault().getPath("bobb??::mouse.blip");

output:

java.nio.file.InvalidPathException: Illegal char <?> at index 4: bobb??::mouse.blip

Presumably, different FileSystem objects will have different validation rules.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vinoth
Solution 2 Sora.
Solution 3
Solution 4
Solution 5 Eric
Solution 6 robs
Solution 7 luky
Solution 8 John Lord