Check if a string is hexadecimal
I know the easiest way is using a regular expression, but I wonder if there are other ways to do this check.
Why do I need this? I am writing a Python script that reads text messages (SMS) from a SIM card. In some situations hex messages arrive and I need to process them, so I need to check whether a received message is hexadecimal.
When I send the following SMS:
Hello world!
my script receives:
00480065006C006C006F00200077006F0072006C00640021
But in some situations I receive normal text messages (not hex), so I need an "is it hex?" check.
I am using Python 2.6.5.
UPDATE:
The cause of the problem is that (somehow) messages I send are received as hex, while messages sent by the operator (info messages and ads) are received as normal strings. So I decided to add a check to ensure that I have the message in the correct string format.
Some extra details: I am using a Huawei 3G modem and PyHumod to read data from the SIM card.
Possible best solution to my situation:
The best way to handle such strings is using a2b_hex (a.k.a. unhexlify) and utf-16 big endian encoding (as @JonasWielicki mentioned):
from binascii import unhexlify # unhexlify is another name of a2b_hex
mystr = "00480065006C006C006F00200077006F0072006C00640021"
unhexlify(mystr).decode("utf-16-be")
>> u'Hello world!'
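The check and the decoding can also be folded into one step. The following is only a minimal sketch, assuming every hex message is UTF-16 big endian as in the example above; decode_sms is a hypothetical helper name, not part of PyHumod or binascii.

```python
from binascii import unhexlify


def decode_sms(message):
    """Return readable text for a message that may be hex-encoded UTF-16-BE.

    Hypothetical helper: falls back to the original message when it is
    not valid hex or does not decode as UTF-16 big endian.
    """
    try:
        raw = unhexlify(message)          # fails on odd length or non-hex digits
        return raw.decode("utf-16-be")    # fails on truncated or invalid UTF-16
    except (ValueError, TypeError):       # TypeError covers Python 2's unhexlify
        return message


print(decode_sms("00480065006C006C006F00200077006F0072006C00640021"))  # Hello world!
print(decode_sms("Hello world!"))                                      # Hello world!
```

One caveat of this approach: a plain-text message that happens to consist only of hex digits (for example "cafe") would also be decoded, so it is a heuristic rather than a guarantee.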
Solution 1:[1]
You can:
- test whether the string contains only hexadecimal digits (0…9,A…F)
- try to convert the string to integer and see whether it fails.
Here is the code:
import string

def is_hex(s):
    hex_digits = set(string.hexdigits)
    # if s is long, then it is faster to check against a set
    return all(c in hex_digits for c in s)

def is_hex(s):
    try:
        int(s, 16)
        return True
    except ValueError:
        return False
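The two approaches do not accept exactly the same inputs. The sketch below gives them distinct names (is_hex_digits and is_hex_int, chosen here only for illustration) and runs both on a few edge cases.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)


def is_hex_digits(s):
    # Character test: every character must be a hex digit.
    return all(c in HEX_DIGITS for c in s)


def is_hex_int(s):
    # Conversion test: int() must be able to parse s in base 16.
    try:
        int(s, 16)
        return True
    except ValueError:
        return False


for sample in ["", "0x1f", " ff ", "-ff"]:
    print(repr(sample), is_hex_digits(sample), is_hex_int(sample))
# ''      True  False   (all() is True for an empty string)
# '0x1f'  False True    (int() accepts the 0x prefix)
# ' ff '  False True    (int() strips surrounding whitespace)
# '-ff'   False True    (int() accepts a sign)
```

Which behaviour is preferable depends on whether prefixes, signs and whitespace should count as valid input.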
Solution 2:[2]
I know the op mentioned regular expressions, but I wanted to contribute such a solution for completeness' sake:
import re

def is_hex(s):
    # the + quantifier is required; without it only a single character can match
    return re.fullmatch(r"[0-9a-fA-F]+", s or "") is not None
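Note that re.fullmatch only exists from Python 3.4 onward, while the question mentions Python 2.6.5. A minimal sketch of an equivalent check that should also work on older versions uses re.match with a \Z anchor; the name is_hex_re is made up for this example.

```python
import re

# \Z anchors at the very end of the string; unlike $, it does not
# tolerate a trailing newline.
_HEX_PATTERN = re.compile(r"[0-9a-fA-F]+\Z")


def is_hex_re(s):
    # re.match anchors at the start, \Z at the end, so the whole
    # string must consist of one or more hex digits.
    return _HEX_PATTERN.match(s or "") is not None


print(is_hex_re("00480065006C"))  # True
print(is_hex_re("Hello"))         # False
print(is_hex_re(""))              # False
print(is_hex_re("ff\n"))          # False
```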
Performance
In order to evaluate the performance of the different solutions proposed here, I used Python's timeit module. The input strings are generated randomly for three different lengths, 10, 100, 1000:
s=''.join(random.choice('0123456789abcdef') for _ in range(10))
Levon's solutions:
# int(s, 16)
10: 0.257451018987922
100: 0.40081690801889636
1000: 1.8926858339982573
# all(_ in string.hexdigits for _ in s)
10: 1.2884491360164247
100: 10.047717947978526
1000: 94.35805322701344
Other answers are variations of these two. Using a regular expression:
# re.fullmatch(r'^[0-9a-fA-F]$', s or '')
10: 0.725040541990893
100: 0.7184272820013575
1000: 0.7190397029917222
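Picking the right solution thus depends on the length of the input string and whether exceptions can be handled safely. In these measurements the regular expression handles large strings much faster (and never raises a ValueError), while int() is the winner for shorter strings. Note, however, that the pattern as timed has no + quantifier, so with fullmatch it can only match a single character and rejects longer inputs right after the first one, which likely explains why its timings stay flat.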
Solution 3:[3]
One more simple and short solution based on transformation of string to set and checking for subset (doesn't check for '0x' prefix):
import string

def is_hex_str(s):
    return set(s).issubset(string.hexdigits)
Solution 4:[4]
Another option:
def is_hex(s):
    # note: lowercase digits only; uppercase input such as 'FF' is rejected
    hex_digits = set("0123456789abcdef")
    for char in s:
        if not (char in hex_digits):
            return False
    return True
Solution 5:[5]
Most of the solutions proposed above do not take into account that any decimal integer can also be decoded as hex, because the set of decimal digits is a subset of the set of hex digits. So Python will happily take 123 and assume it's 0123 hex:
>>> int('123',16)
291
This may sound obvious, but in most cases you'll be looking for something that was actually hex-encoded, e.g. a hash, and not just anything that can be hex-decoded. So a more robust solution should probably also check that the hex string has an even length:
In [1]: def is_hex(s):
   ...:     try:
   ...:         int(s, 16)
   ...:     except ValueError:
   ...:         return False
   ...:     return len(s) % 2 == 0
   ...:
In [2]: is_hex('123')
Out[2]: False
In [3]: is_hex('f123')
Out[3]: True
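If the goal is to recognise hex-encoded bytes specifically, another option is to let binascii.unhexlify do the validation, since it already requires an even number of hex digits. This is only a sketch; is_hex_encoded is a hypothetical name, and an empty string is treated as not hex-encoded by choice.

```python
import binascii


def is_hex_encoded(s):
    """True if s is a non-empty, even-length string of hex digits."""
    if not s:
        return False                     # design choice: reject empty input
    try:
        binascii.unhexlify(s)            # rejects odd length and non-hex digits
    except (ValueError, TypeError):      # TypeError covers Python 2
        return False
    return True


print(is_hex_encoded("123"))    # False (odd length)
print(is_hex_encoded("f123"))   # True
print(is_hex_encoded("0x12"))   # False ('x' is not a hex digit)
```

Unlike int(s, 16), this rejects prefixes, signs and surrounding whitespace.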
Solution 6:[6]
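This will cover the case where the string starts with '0x' or '0X', i.e. the pattern (0x|0X)?[0-9a-fA-F]+. Note that the check below is looser than that pattern, because it accepts 'x' or 'X' anywhere in the string:
import string
d = '0X12a'
all(c in 'xX' + string.hexdigits for c in d)
True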
Solution 7:[7]
In Python 3, I tried:
def is_hex(s):
    # returns the decoded printable text, or '' if s is not valid hex or UTF-8
    try:
        tmp = bytes.fromhex(s).decode('utf-8')
        return ''.join([i for i in tmp if i.isprintable()])
    except ValueError:
        return ''
This should work better than the int(x, 16) approach.
Solution 8:[8]
If you are looking to determine True or False in Python, I would use eumero's is_hex method over Levon's first method. The following code contains a gotcha...
if int(input_string, 16):
    print 'it is hex'
else:
    print 'it is not hex'
It incorrectly reports the string '00' as not hex because zero evaluates to False.
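A sketch of the same test without the gotcha: rely on whether int() raises, not on the truthiness of the value it returns. The function name describe is made up for illustration.

```python
def describe(input_string):
    # Success or failure of the conversion decides the answer,
    # so a value of zero is no longer mistaken for "not hex".
    try:
        int(input_string, 16)
    except ValueError:
        return "it is not hex"
    return "it is hex"


print(describe("00"))  # it is hex
print(describe("zz"))  # it is not hex
```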
Solution 9:[9]
Since all the regular expression timings above were about the same, I would guess that most of the time went into converting the pattern string into a regular expression. Below is the data I got when pre-compiling the regular expression.
int_hex
0.000800 ms 10
0.001300 ms 100
0.008200 ms 1000
all_hex
0.003500 ms 10
0.015200 ms 100
0.112000 ms 1000
fullmatch_hex
0.001800 ms 10
0.001200 ms 100
0.005500 ms 1000
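The answer does not show its benchmark code. One way such a comparison could be set up is sketched below; the function names mirror the labels in the table, but the bodies, the benchmark helper and its parameters are assumptions, and the timings will differ from machine to machine.

```python
import random
import re
import string
import timeit

HEX_RE = re.compile(r"[0-9a-fA-F]+")      # compiled once, outside the timed code
HEX_DIGITS = frozenset(string.hexdigits)


def int_hex(s):
    try:
        int(s, 16)
        return True
    except ValueError:
        return False


def all_hex(s):
    return all(c in HEX_DIGITS for c in s)


def fullmatch_hex(s):
    return HEX_RE.fullmatch(s) is not None  # Pattern.fullmatch needs Python 3.4+


def benchmark(length, number=1000):
    """Return total seconds per function for `number` calls on one random string."""
    s = "".join(random.choice("0123456789abcdef") for _ in range(length))
    results = {}
    for func in (int_hex, all_hex, fullmatch_hex):
        results[func.__name__] = timeit.timeit(lambda: func(s), number=number)
    return results


for n in (10, 100, 1000):
    print(n, benchmark(n))
```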
Solution 10:[10]
A simple solution in case you need a pattern to validate prefixed hex or binary along with decimal:
\b(0x[\da-fA-F]+|[\d]+|0b[01]+)\b
Sample: https://regex101.com/r/cN4yW7/14
Then calling int('0x00480065006C006C006F00200077006F0072006C00640021', 0) in Python gives
6896377547970387516320582441726837832153446723333914657
The base 0 invokes prefix guessing behaviour. This has saved me a lot of hassle. Hope it helps!
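A small sketch of the base-0 behaviour, wrapped in a hypothetical helper named parse_int_literal that returns None when the text is not a valid integer literal:

```python
def parse_int_literal(text):
    """Parse text as an integer, guessing the base from its prefix."""
    try:
        return int(text, 0)   # base 0: 0x = hex, 0b = binary, no prefix = decimal
    except ValueError:
        return None


print(parse_int_literal("0x1f"))   # 31
print(parse_int_literal("0b101"))  # 5
print(parse_int_literal("42"))     # 42
print(parse_int_literal("1f"))     # None (hex digits without a 0x prefix)
```

With base 0, hex digits are only accepted after a 0x prefix, so an unprefixed string such as the SMS payload in the question would need the prefix added first.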
Solution 11:[11]
Most of the solutions do not properly handle strings with the 0x prefix:
>>> is_hex_string("0xaaa")
False
>>> is_hex_string("0x123")
False
>>> is_hex_string("0xfff")
False
>>> is_hex_string("fff")
True
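The is_hex_string function used in the session above is not shown in the answer. A minimal sketch of a prefix-aware check could look like the following; is_hex_with_prefix is a hypothetical name, and an optional leading 0x or 0X is the only extra syntax it accepts.

```python
import string

HEX_DIGITS = frozenset(string.hexdigits)


def is_hex_with_prefix(s):
    # Drop one optional leading 0x / 0X, then require at least one
    # character and only hex digits in the remainder.
    body = s[2:] if s[:2] in ("0x", "0X") else s
    return len(body) > 0 and all(c in HEX_DIGITS for c in body)


print(is_hex_with_prefix("0xaaa"))  # True
print(is_hex_with_prefix("fff"))    # True
print(is_hex_with_prefix("0x"))     # False (prefix with no digits)
print(is_hex_with_prefix("12x"))    # False ('x' outside the prefix)
```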
Solution 12:[12]
Here's my solution:
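import string

def to_decimal(s):
    '''input should be int10 or hex'''
    isString = isinstance(s, str)
    if isString:
        # caution: an all-digit string such as '12' also passes this test,
        # so it is parsed as base 16 (giving 18)
        isHex = all(c in string.hexdigits + 'xX' for c in s)
        return int(s, 16) if isHex else int(s)
    else:
        return int(hex(s), 16)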
a = to_decimal(12)
b = to_decimal(0x10)
c = to_decimal('12')
d = to_decimal('0x10')
print(a, b, c, d)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
