How to convert a byte array back into a string?
I have written code for Huffman compression. My padded encoded string is converted into a byte array using the following code:
def make_byte_array(self, padded_text):
    byte_array = bytearray()
    for i in range(0, len(padded_text), 8):
        byte_array.append(int(padded_text[i:i + 8], 2))
    return byte_array
How can I convert the byte array back into the original string of my padded encoded text?
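One possible way to reverse this step, sketched here under the assumption that every byte was built from exactly 8 bits as in `make_byte_array` above, is to format each byte as an 8-character binary string and join the pieces. The name `bytes_to_bit_string` is hypothetical and not part of the original code; `make_byte_array` is repeated as a plain function only so the sketch runs on its own.

```python
def bytes_to_bit_string(byte_array):
    """Hypothetical inverse of make_byte_array: 8 bits per byte."""
    # format(byte, "08b") keeps leading zeros, which bin(byte) would drop.
    return "".join(format(byte, "08b") for byte in byte_array)


def make_byte_array(padded_text):
    # Same logic as the method above, written as a plain function for this demo.
    byte_array = bytearray()
    for i in range(0, len(padded_text), 8):
        byte_array.append(int(padded_text[i:i + 8], 2))
    return byte_array


padded_text = "0000010111111111"
restored = bytes_to_bit_string(make_byte_array(padded_text))
print(restored == padded_text)  # True
```

The zero-padded width matters: a byte such as 5 has to come back as `00000101`, not `101`, or the bit positions of everything after it shift.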
Edit:
Here is some more context that might make answering the question a bit easier. The padded compressed text I am converting into a byte array is saved into a binary file. When I read the binary file this is the output I get:
b'\xd8\xd2.\xfdc\xa9\xfd\xc4\xa2R\xf8\xack\xb4\xfe\x07&@'
The bytes object above is what I need to convert back into the padded compressed text.
To explain my code further: I have only written the compression part so far, using Huffman coding. Below is the code for my Huffman compression:
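As a rough illustration of what that output contains: a `bytes` object prints printable ASCII values as characters (such as `.` or `c`) and everything else as `\x..` escapes, but iterating over it yields plain integers from 0 to 255. The sketch below copies the value shown above into a variable named `data` (a name chosen for this example) and expands each integer into 8 bits.

```python
data = b'\xd8\xd2.\xfdc\xa9\xfd\xc4\xa2R\xf8\xack\xb4\xfe\x07&@'

# Iterating over bytes gives ints, regardless of how the bytes are displayed.
print(list(data[:3]))  # [216, 210, 46]

# Expand every byte into exactly 8 binary digits and concatenate them.
bits = "".join(f"{byte:08b}" for byte in data)

print(len(data), len(bits))  # 18 144
print(bits[:24])  # 110110001101001000101110
```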
import heapq

# HeapNode is not shown in the question; it is assumed to store char, freq,
# left and right, and to be comparable by freq so heapq can order it.


class HuffmanCoding:
    def __init__(self, text_to_compress):
        self.text_to_compress = text_to_compress  # text that will be compressed
        self.heap = []
        self.codes = {}  # will store the Huffman code of each character
        self.decompress_map = {}

    def get_frequency(self):  # method to find frequency of each character in text
        frequency_Dictionary = {}  # creates an empty dictionary where frequency of each character will be stored
        for character in self.text_to_compress:  # iterates through the text to be compressed
            if character in frequency_Dictionary:
                # if character already exists in dictionary, its value is increased by 1
                frequency_Dictionary[character] = frequency_Dictionary[character] + 1
            else:
                frequency_Dictionary[character] = 1  # if character is not present in dictionary, its value is set to 1
        return frequency_Dictionary

    def make_queue(self, frequency):  # creates the priority queue of each character and its associated frequency
        for key in frequency:
            node = HeapNode(key, frequency[key])  # create node (character) and store its frequency alongside it
            heapq.heappush(self.heap, node)  # push the node into the heap

    def merge_nodes(self):
        # creates the Huffman tree by getting the two minimum nodes and merging
        # them together, until there's only one node left
        while len(self.heap) > 1:
            node1 = heapq.heappop(self.heap)  # pop node from top of heap
            node2 = heapq.heappop(self.heap)  # pop next node which is now at the top of heap
            merged = HeapNode(None, node1.freq + node2.freq)  # merge the two nodes we popped out from heap
            merged.left = node1
            merged.right = node2
            heapq.heappush(self.heap, merged)  # push merged node back into the heap

    def make_codes(self, root, current_code):  # creates Huffman code for each character
        if root is None:
            return
        if root.char is not None:
            self.codes[root.char] = current_code
            self.decompress_map[current_code] = root.char
        self.make_codes(root.left, current_code + "0")  # every time you traverse left, add a 0 - recursive call
        self.make_codes(root.right, current_code + "1")  # every time you traverse right, add a 1 - recursive call
        # once every character has a code, save the code table for decompression
        if len(self.decompress_map) == len(self.get_frequency()):
            with open("Codes.txt", mode="w") as codeDict:
                codeDict.write(str(self.decompress_map))

    def assignCodes(self):  # assigns codes to each character
        root = heapq.heappop(self.heap)  # extracts root node from heap
        current_code = ""
        self.make_codes(root, current_code)

    def get_compressed_text(self, text):  # replaces characters in original text with codes
        compressed_text = ""
        for character in text:
            compressed_text += self.codes[character]
        return compressed_text

    def pad_encoded_text(self, compressed_text):
        extra_padding = 8 - len(compressed_text) % 8  # works out how much extra padding is required
        for i in range(extra_padding):
            compressed_text += "0"  # adds the amount of 0's that are required
        return compressed_text

    def make_byte_array(self, padded_text):
        byte_array = bytearray()
        for i in range(0, len(padded_text), 8):
            byte_array.append(int(padded_text[i:i + 8], 2))
        return byte_array

    def show_compressed_text(self):
        frequency = self.get_frequency()
        self.make_queue(frequency)
        self.merge_nodes()
        self.assignCodes()
        encoded_text = self.get_compressed_text(self.text_to_compress)
        padded_encoded_text = self.pad_encoded_text(encoded_text)
        byte_array = self.make_byte_array(padded_encoded_text)
        return bytes(byte_array)
The class HuffmanCoding takes in the text to be compressed. The show_compressed_text method then returns my compressed text as bytes, which I write to a binary file. I want to open this binary file and convert the bytes inside back into the string of 0's and 1's that represents my padded compressed text, so that I can work on decompressing it. Hopefully that makes sense.
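A minimal sketch of that round trip, assuming the file holds nothing but the bytes returned by show_compressed_text: the file is opened in `"rb"` mode and each byte is expanded to 8 bits. The file name `compressed.bin`, the helper `bytes_to_bit_string`, and the two stand-in bytes are all hypothetical and used only for illustration.

```python
import os
import tempfile


def bytes_to_bit_string(data):
    """Hypothetical helper: turn bytes into a string of 0's and 1's."""
    return "".join(format(byte, "08b") for byte in data)


# Stand-in for the bytes that show_compressed_text() would return.
compressed = bytes([0b10110100, 0b01100000])

# Write the bytes to a binary file (hypothetical name, in a temp directory).
path = os.path.join(tempfile.mkdtemp(), "compressed.bin")
with open(path, "wb") as f:
    f.write(compressed)

# Read the file back in binary mode and rebuild the padded bit string.
with open(path, "rb") as f:
    data = f.read()

padded_bits = bytes_to_bit_string(data)
print(padded_bits)  # 1011010001100000
```

The string recovered this way still includes the trailing zeros added by pad_encoded_text. As written, that method does not record how many zeros it appended, so removing them before decoding would need that count to be stored somewhere, for example in an extra header byte.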
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
