Python regular expression again - match url

I have this regexp:

 re.compile(r"((https?):((//)|(\\\\))+[\w\d:#@%/;$()~_?\+-=\\\.&]*)", re.MULTILINE|re.UNICODE)

But that doesn't match hashbangs (#!). What do I need to change to get it working? I know I can add ! to the character class alongside #@% etc., but then it would also match the trailing exclamation marks in something like

Check this out: http://example.com/something/!!!

and I want to avoid that.



Solution 1:[1]

The pattern is long, but in practice it works pretty well for me. Please try this one: ((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*

It matches all of the examples below:

http://wwww.stackoverflow.com
abc.com
http://test.test-75.1474.stackoverflow.com/
stackoverflow.com/
stackoverflow.com
[email protected]
http://www.example.com/etcetc
www.example.com/etcetc
example.com/etcetc
user:[email protected]/etcetc
(www.itmag.com)
example.com/etcetc?query=aasd
example.com/etcetc?query=aasd&dest=asds
http://stackoverflow.com/questions/6427530/regular-expression-pattern-to-match-url-with
www/[email protected]
[email protected].
[email protected] 
[email protected]     
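
As a quick check, the pattern above can be compiled with Python's re module and run against a few of the listed examples. This is only a minimal sketch: the names URL_RE and find_url are made up for illustration, and just a handful of inputs are tried.

```python
import re

# The pattern from this answer, unchanged (split over two lines for readability).
URL_RE = re.compile(
    r"((http|https)\:\/\/)?[a-zA-Z0-9\.\/\?\:@\-_=#]+"
    r"\.([a-zA-Z]){2,6}([a-zA-Z0-9\.\&\/\?\:@\-_=#])*"
)


def find_url(text):
    """Hypothetical helper: return the first substring matched by URL_RE, or None."""
    match = URL_RE.search(text)
    return match.group(0) if match else None


for sample in ["http://www.example.com/etcetc", "abc.com", "(www.itmag.com)"]:
    print(sample, "->", find_url(sample))
```

Since ! is in neither character class, a hashbang URL such as http://example.com/#!/page appears to be cut off right after the #, so this pattern on its own does not cover the hashbang case from the question.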

Solution 2:[2]

This is a common problem; use the standard library.

For Python, use urlparse (urllib.parse in Python 3).
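
One possible way to apply this to free text is sketched below, assuming Python 3. It splits the text on whitespace and lets urlparse decide which tokens are http(s) URLs. The helper names looks_like_http_url and extract_urls are hypothetical, and stripping trailing "!", "." and "," is an assumption about what counts as sentence punctuation, not something urlparse does by itself.

```python
from urllib.parse import urlparse


def looks_like_http_url(candidate):
    """Hypothetical helper: True if candidate parses as an http(s) URL with a host."""
    try:
        parts = urlparse(candidate)
    except ValueError:  # raised for some malformed inputs, e.g. a bad IPv6 host
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def extract_urls(text):
    """Hypothetical helper: split on whitespace and keep tokens that look like URLs."""
    urls = []
    for token in text.split():
        # Assumption: trailing "!", "." and "," are sentence punctuation, not URL parts.
        token = token.rstrip("!.,")
        if looks_like_http_url(token):
            urls.append(token)
    return urls


print(extract_urls("Check this out: http://example.com/something/!!!"))
print(extract_urls("Hashbang: http://example.com/#!/page"))
```

With this approach the hashbang stays intact because the ! sits in the middle of the token, while the trailing !!! from the question is stripped before parsing.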

Solution 3:[3]

I'll admit that I'm a little bit worried about an application that requires a regex like that to match URLs. That said, this seems to work for me:

((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)
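
To see how this behaves on the two cases from the question, the pattern can be dropped into re.compile as is. The sketch below is only a spot check on two inputs; URL_RE and first_url are illustrative names, not part of the answer.

```python
import re

# The pattern from this answer, unchanged.
URL_RE = re.compile(r"((https?):((//)|(\\\\))+([\w\d:#@%/;$()~_?\+-=\\\.&](#!)?)*)")


def first_url(text):
    """Hypothetical helper: return the first match of URL_RE in text, or None."""
    match = URL_RE.search(text)
    return match.group(0) if match else None


print(first_url("Check this out: http://example.com/something/!!!"))
print(first_url("App page: http://example.com/#!/inbox"))
```

The idea is that ! is accepted only as part of the literal pair #!, so a bare run of exclamation marks at the end of a sentence is left out of the match.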

Solution 4:[4]

We can use the third-party validators library.

For example:

import validators

# validators.url returns True for a valid URL and a falsy failure object otherwise
valid = validators.url('https://codespeedy.com/')
if valid:
    print("Url is valid")
else:
    print("Invalid url")

Solution 5:[5]

This is the most complete pattern I use:

URL_PATTERN = r'[A-Za-z0-9]+://[A-Za-z0-9%-_]+(/[A-Za-z0-9%-_])*(#|\\?)[A-Za-z0-9%-_&=]*'

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Asad
Solution 2 Alireza Mazochi
Solution 3 tsm
Solution 4 Alireza Mazochi
Solution 5 Leto Atreides