Cleaning a binary NumPy array by removing elements that match a condition
I'm trying to load a binary file into NumPy, drop some values that I don't need, then reshape the array and use it for some calculations.
Here is my code for reference:
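import sys
import numpy as np

def read_binary_data(filename, buffer_size):
    # Print the whole array, with every integer shown in hex
    np.set_printoptions(threshold=sys.maxsize)
    np.set_printoptions(formatter={'int': hex})
    with open(filename, "rb") as f:
        # Read unsigned bytes, skipping the first 3 bytes of the file
        binary_array = np.fromfile(f, dtype='B', offset=3)
    print(binary_array)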
And here is the (truncated) result:
...
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xd2 0xf4
0xff 0xff 0x0 0x0 0x0 0x0 0x4e 0x44 0x0 0x0 0x8 0x0 0x0 0x0 0xcf 0xf4
0xff 0xff 0x0 0x0 0x0 0x0]
Say, for instance, I want to remove all occurrences of the pair 0x4e 0x44, but not 0x4e or 0x44 on their own; it's the combination of the two that I'm interested in. If I have, say, 0x4e 0x54, I want to keep it intact.
How would I be able to do that?
Thank you for your help.
Solution 1:[1]
Thank you everyone for your input.
I figured out a way to achieve this as efficiently as I could. There may well be a better way to do this, but for now this works :D
np.set_printoptions(formatter={'int': hex})
with open(filename, "rb") as f:
    binary_array = np.fromfile(f, dtype='B')
# Clean array
to_remove = []
# Search all but the last byte so that index + 1 below stays in bounds
indexes = np.where(binary_array[:-1] == 0x4e)
# find all occurrences of 4E54
x54_mask = binary_array[indexes[0] + 1] == 0x54
to_remove = [*to_remove, *list(indexes[0][x54_mask])]
# find all occurrences of 4E53
x53_mask = binary_array[indexes[0] + 1] == 0x53
to_remove = [*to_remove, *list(indexes[0][x53_mask])]
# removing unwanted values: each match drops the 0x4e and the byte after it
to_remove_f = []
for i in to_remove:
    to_remove_f.append(i)
    to_remove_f.append(i + 1)
binary_array = np.delete(binary_array, to_remove_f)
The only for loop runs over the 'to_remove' list, which in my case contains fewer than 10 values.
Peace :D
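For what it's worth, the same idea can probably be written without any Python-level loop by building a boolean mask. This is only a minimal sketch: `remove_pairs` is a hypothetical helper name, the sample bytes are made up for illustration, and it assumes the first byte of the pair is not itself one of the allowed second bytes (so matches cannot overlap).

```python
import numpy as np

def remove_pairs(arr, first, seconds):
    """Hypothetical helper: drop every two-byte pair (first, s) with s in `seconds`."""
    arr = np.asarray(arr, dtype=np.uint8)
    if arr.size < 2:
        return arr.copy()
    # starts[i] is True when arr[i] == first and arr[i + 1] is one of `seconds`
    starts = (arr[:-1] == first) & np.isin(arr[1:], seconds)
    keep = np.ones(arr.size, dtype=bool)
    keep[:-1] &= ~starts  # drop the first byte of each matched pair
    keep[1:] &= ~starts   # drop the second byte of each matched pair
    return arr[keep]

# Made-up sample: contains 4E54 and 4E53 (removed), 4E44 and a trailing 4E (kept)
sample = np.array([0xff, 0x4e, 0x54, 0x00, 0x4e, 0x53, 0x4e, 0x44, 0x4e],
                  dtype=np.uint8)
cleaned = remove_pairs(sample, 0x4e, [0x54, 0x53])
print(cleaned.tolist())  # [255, 0, 78, 68, 78]
```

Comparing `arr[:-1]` against `arr[1:]` also means a 0x4e sitting in the very last position never triggers an out-of-bounds lookup.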
Solution 2:[2]
Note that although your array prints as hexadecimal, the values themselves are still integers. Anyway, here's one way to find and delete pairs of 0x4e 0x44, though probably not the most efficient:
indices_to_delete = []
for i in range(len(binary_array) - 1):
    # Check if the current value and the one right after are a pair
    # If so, they will need to be deleted
    if binary_array[i] == 0x4e and binary_array[i + 1] == 0x44:
        indices_to_delete += [i, i + 1]
# np.delete works directly on the 1-D array, so no extra axis is needed
binary_array = np.delete(binary_array, indices_to_delete)
Your binary array now has no pairs of 0x4e 0x44, though any singular instances of 0x4e or 0x44 have been left alone.
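If the loop turns out to be too slow on a large file, one alternative (not what the answer above does) is to go through Python's `bytes.replace` and convert back to an array. The sketch below is only an illustration: `drop_byte_pattern` is a hypothetical helper name and the sample bytes are made up.

```python
import numpy as np

def drop_byte_pattern(arr, pattern):
    """Hypothetical helper: remove every non-overlapping occurrence of `pattern` (bytes)."""
    raw = np.asarray(arr, dtype=np.uint8).tobytes()
    cleaned = raw.replace(pattern, b"")
    # np.frombuffer returns a read-only view of the bytes; copy to get a writable array
    return np.frombuffer(cleaned, dtype=np.uint8).copy()

# Made-up sample: one 4E44 pair (removed), plus 4E54 and lone 0x44 / 0x4e bytes (kept)
sample = np.array([0xff, 0x4e, 0x44, 0x00, 0x4e, 0x54, 0x44, 0x4e], dtype=np.uint8)
result = drop_byte_pattern(sample, bytes([0x4e, 0x44]))
print([hex(v) for v in result.tolist()])
# ['0xff', '0x0', '0x4e', '0x54', '0x44', '0x4e']
```

Like the loop, this makes a single pass over the original data, so a new 0x4e 0x44 pair formed by the bytes on either side of a removed pair (for example in 0x4e 0x4e 0x44 0x44) is left in place.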
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mebarek Zouakh |
| Solution 2 | AJH |
