'Equivalent JavaScript functions for Python's urllib.parse.quote() and urllib.parse.unquote()

Are there any equivalent JavaScript functions for Python's urllib.parse.quote() and urllib.parse.unquote()?

The closest I've come across are encodeURI()/encodeURIComponent() and escape() (and their corresponding un-encoding functions), but they don't encode/decode the same set of special characters as far as I can tell.



Solution 1:[1]

OK, I think I'm going to go with a hybrid custom set of functions:

Encode: Use encodeURIComponent(), then put slashes back in.
Decode: Decode any %hex values found.

Here's a more complete variant of what I ended up using (it handles Unicode properly, too):

function quoteUrl(url, safe) {
    if (typeof(safe) !== 'string') {
        safe = '/';    // Don't escape slashes by default
    }

    url = encodeURIComponent(url);

    // Unescape characters that were in the safe list
    toUnencode = [  ];
    for (var i = safe.length - 1; i >= 0; --i) {
        var encoded = encodeURIComponent(safe[i]);
        if (encoded !== safe.charAt(i)) {    // Ignore safe char if it wasn't escaped
            toUnencode.push(encoded);
        }
    }

    url = url.replace(new RegExp(toUnencode.join('|'), 'ig'), decodeURIComponent);

    return url;
}


var unquoteUrl = decodeURIComponent;    // Make alias to have symmetric function names

Note that if you don't need "safe" characters when encoding ('/' by default in Python), then you can just use the built-in encodeURIComponent() and decodeURIComponent() functions directly.

Also, if there are Unicode characters (i.e. characters with codepoint >= 128) in the string, then to maintain compatibility with JavaScript's encodeURIComponent(), the Python quote_url() would have to be:

def quote_url(url, safe):
    """URL-encodes a string (either str (i.e. ASCII) or unicode);
    uses de-facto UTF-8 encoding to handle Unicode codepoints in given string.
    """
    return urllib.quote(unicode(url).encode('utf-8'), safe)

And unquote_url() would be:

def unquote_url(url):
    """Decodes a URL that was encoded using quote_url.
    Returns a unicode instance.
    """
    return urllib.unquote(url).decode('utf-8')

Solution 2:[2]

JavaScript               |  Python
----------------------------------- 
encodeURI(str)           |  urllib.parse.quote(str, safe='~@#$&()*!+=:;,?/\'');
-----------------------------------
encodeURIComponent(str)  |  urllib.parse.quote(str, safe='~()*!\'')

On Python 3.7+ you can remove ~ from safe=.

Solution 3:[3]

The requests library is a bit more popular if you don't mind the extra dependency

from requests.utils import quote
quote(str)

Solution 4:[4]

Python: urllib.quote

Javascript:unescape

I haven't done extensive testing but for my purposes it works most of the time. I guess you have some specific characters that don't work. Maybe if I use some Asian text or something it will break :)

This came up when I googled so I put this in for all the others, if not specifically for the original question.

Solution 5:[5]

Try a regex. Something like this:

mystring.replace(/[\xFF-\xFFFF]/g, "%" + "$&".charCodeAt(0));

That will replace any character above ordinal 255 with its corresponding %HEX representation.

Solution 6:[6]

decodeURIComponent() is similar to unquote

const unquote = decodeURIComponent
const unquote_plus = (s) => decodeURIComponent(s.replace(/\+/g, ' '))

except that Python is much more forgiving. If one of the two characters after a % is not a hex digit (or there's not two characters after a %), JavaScript will throw a URIError: URI malformed error, whereas Python will just leave the % as is.

encodeURIComponent() is not quite the same as quote, you need to percent encode a few more characters and un-escape /:

const quoteChar = (c) => '%' + c.charCodeAt(0).toString(16).padStart(2, '0').toUpperCase()
const quote = (s) => encodeURIComponent(s).replace(/[()*!']/g, quoteChar).replace(/%2F/g, '/')

const quote_plus = (s) => quote(s).replace(/%20/g, '+')

The characters that Python's quote doesn't escape is documented here and is listed as (on Python 3.7+) "Letters, digits, and the characters '_.-~' are never quoted. By default, this function is intended for quoting the path section of a URL. The optional safe parameter specifies additional ASCII characters that should not be quoted — its default value is '/'"

The characters that JavaScript's encodeURIComponent doesn't encode is documented here and is listed as uriAlpha (upper and lowercase ASCII letters), DecimalDigit and uriMark, which are - _ . ! ~ * ' ( ).

Solution 7:[7]

Here are implementations based on a implementation on github repo purescript-python:

import urllib.parse as urllp
def encodeURI(s): return urllp.quote(s, safe="~@#$&()*!+=:;,.?/'")
def decodeURI(s): return urllp.unquote(s, errors="strict")
def encodeURIComponent(s): return urllp.quote(s, safe="~()*!.'")
def decodeURIComponent(s): return urllp.unquote(s, errors="strict")

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Boris Verkhovskiy
Solution 3
Solution 4
Solution 5 jiggy
Solution 6
Solution 7 Timothy C. Quinn