'Equivalent JavaScript functions for Python's urllib.parse.quote() and urllib.parse.unquote()
Are there any equivalent JavaScript functions for Python's urllib.parse.quote() and urllib.parse.unquote()?
The closest I've come across are encodeURI()/encodeURIComponent() and escape() (and their corresponding un-encoding functions), but they don't encode/decode the same set of special characters as far as I can tell.
Solution 1:[1]
OK, I think I'm going to go with a hybrid custom set of functions:
Encode: Use encodeURIComponent(), then put slashes back in.
Decode: Decode any %hex values found.
Here's a more complete variant of what I ended up using (it handles Unicode properly, too):
function quoteUrl(url, safe) {
if (typeof(safe) !== 'string') {
safe = '/'; // Don't escape slashes by default
}
url = encodeURIComponent(url);
// Unescape characters that were in the safe list
toUnencode = [ ];
for (var i = safe.length - 1; i >= 0; --i) {
var encoded = encodeURIComponent(safe[i]);
if (encoded !== safe.charAt(i)) { // Ignore safe char if it wasn't escaped
toUnencode.push(encoded);
}
}
url = url.replace(new RegExp(toUnencode.join('|'), 'ig'), decodeURIComponent);
return url;
}
var unquoteUrl = decodeURIComponent; // Make alias to have symmetric function names
Note that if you don't need "safe" characters when encoding ('/' by default in Python), then you can just use the built-in encodeURIComponent() and decodeURIComponent() functions directly.
Also, if there are Unicode characters (i.e. characters with codepoint >= 128) in the string, then to maintain compatibility with JavaScript's encodeURIComponent(), the Python quote_url() would have to be:
def quote_url(url, safe):
"""URL-encodes a string (either str (i.e. ASCII) or unicode);
uses de-facto UTF-8 encoding to handle Unicode codepoints in given string.
"""
return urllib.quote(unicode(url).encode('utf-8'), safe)
And unquote_url() would be:
def unquote_url(url):
"""Decodes a URL that was encoded using quote_url.
Returns a unicode instance.
"""
return urllib.unquote(url).decode('utf-8')
Solution 2:[2]
JavaScript | Python
-----------------------------------
encodeURI(str) | urllib.parse.quote(str, safe='~@#$&()*!+=:;,?/\'');
-----------------------------------
encodeURIComponent(str) | urllib.parse.quote(str, safe='~()*!\'')
On Python 3.7+ you can remove ~ from safe=.
Solution 3:[3]
The requests library is a bit more popular if you don't mind the extra dependency
from requests.utils import quote
quote(str)
Solution 4:[4]
Python: urllib.quote
Javascript:unescape
I haven't done extensive testing but for my purposes it works most of the time. I guess you have some specific characters that don't work. Maybe if I use some Asian text or something it will break :)
This came up when I googled so I put this in for all the others, if not specifically for the original question.
Solution 5:[5]
Try a regex. Something like this:
mystring.replace(/[\xFF-\xFFFF]/g, "%" + "$&".charCodeAt(0));
That will replace any character above ordinal 255 with its corresponding %HEX representation.
Solution 6:[6]
decodeURIComponent() is similar to unquote
const unquote = decodeURIComponent
const unquote_plus = (s) => decodeURIComponent(s.replace(/\+/g, ' '))
except that Python is much more forgiving. If one of the two characters after a % is not a hex digit (or there's not two characters after a %), JavaScript will throw a URIError: URI malformed error, whereas Python will just leave the % as is.
encodeURIComponent() is not quite the same as quote, you need to percent encode a few more characters and un-escape /:
const quoteChar = (c) => '%' + c.charCodeAt(0).toString(16).padStart(2, '0').toUpperCase()
const quote = (s) => encodeURIComponent(s).replace(/[()*!']/g, quoteChar).replace(/%2F/g, '/')
const quote_plus = (s) => quote(s).replace(/%20/g, '+')
The characters that Python's quote doesn't escape is documented here and is listed as (on Python 3.7+) "Letters, digits, and the characters '_.-~' are never quoted. By default, this function is intended for quoting the path section of a URL. The optional safe parameter specifies additional ASCII characters that should not be quoted — its default value is '/'"
The characters that JavaScript's encodeURIComponent doesn't encode is documented here and is listed as uriAlpha (upper and lowercase ASCII letters), DecimalDigit and uriMark, which are - _ . ! ~ * ' ( ).
Solution 7:[7]
Here are implementations based on a implementation on github repo purescript-python:
import urllib.parse as urllp
def encodeURI(s): return urllp.quote(s, safe="~@#$&()*!+=:;,.?/'")
def decodeURI(s): return urllp.unquote(s, errors="strict")
def encodeURIComponent(s): return urllp.quote(s, safe="~()*!.'")
def decodeURIComponent(s): return urllp.unquote(s, errors="strict")
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Boris Verkhovskiy |
| Solution 3 | |
| Solution 4 | |
| Solution 5 | jiggy |
| Solution 6 | |
| Solution 7 | Timothy C. Quinn |
