How to get PyMongo's bson to decode everything

I'm trying to get some data stored in a .bson file into a jupyter notebook.

Per this answer and this answer, the accepted approach is to use the bson module that ships with PyMongo, together with the following code:

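import bson  # the bson module installed with PyMongo

FILE = "file.bson"
# BSON is a binary format, so open the file in binary mode
with open(FILE, 'rb') as f:
    data = bson.decode_all(f.read())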

Now, data is a list of length 1.

data[0] is a dictionary.

The first key in this dictionary is a

data[0]["a"] is a dictionary with keys tag and data, and

data[0]["a"]["data"] is exactly what it should be: a list of integers that I can work with in Python.

On the other hand, the second key in this dictionary is b

but now data[0]["b"] is a dictionary with keys tag, type, size, and data

and

data[0]["b"]["data"] is type bytes, and I'm not sure how to work with it.
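
To make the difference between the two entries easier to see, here is a minimal sketch that walks a decoded document and reports the type and length of each nested value. The helper name describe and the mock dictionary are hypothetical; the mock (including its tag strings) only mimics the shape described above and is not the real file content.

```python
def describe(value, depth=0, max_depth=3):
    """Return lines summarising the type (and length) of each nested value."""
    pad = "  " * depth
    if isinstance(value, dict) and depth < max_depth:
        lines = []
        for key, item in value.items():
            lines.append(f"{pad}{key}: {type(item).__name__}")
            lines.extend(describe(item, depth + 1, max_depth))
        return lines
    if isinstance(value, (list, bytes)):
        # Only the length is reported; the contents are not interpreted here.
        return [f"{pad}len={len(value)}"]
    return []


# Hypothetical stand-in with the same shape as data[0]; all values are made up.
mock = {
    "a": {"tag": "example", "data": [1, 2, 3]},
    "b": {"tag": "example", "type": {}, "size": [2], "data": b"\x00" * 16},
}

for line in describe(mock):
    print(line)
```

Calling describe(data[0]) on the real decoded document should show which fields came back as lists and which came back as bytes.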

I have never worked with bson before, so any input is appreciated. My specific questions are:

  1. Does anyone have a good reference on how to work with bson in Python?
  2. Does anyone know why a gets read in a readable way (not bytes), but b gets read in with more keys and in a form that is not readable (bytes as opposed to integers)?
  3. I was really hoping decode_all would take care of everything; does anyone know why it doesn't, or what I should do differently? I've tried applying decode_all again to the part that is still in bytes, but I get the error message InvalidBSON: invalid message size
  4. Does anyone have a solution for my goal of getting the information from data[0]["b"]["data"] into a usable format (i.e. a list of integers)?
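
One possibility for question 4, sketched under a strong assumption: if the bytes are a packed array of fixed-width integers (for example little-endian signed 64-bit values), they can be unpacked with the standard-library struct module. The element width, signedness, and byte order are guesses here, not facts about the file; the type and size entries next to data may indicate the real element type and count. The helper name unpack_ints is hypothetical.

```python
import struct


def unpack_ints(raw, fmt_char="q", byteorder="<"):
    """Interpret raw bytes as a packed sequence of fixed-width integers.

    fmt_char is a struct format character ("q" = signed 64-bit,
    "i" = signed 32-bit, ...); byteorder "<" means little-endian.
    Both defaults are assumptions about how the data was written.
    """
    item_size = struct.calcsize(byteorder + fmt_char)
    if len(raw) % item_size != 0:
        raise ValueError(
            f"{len(raw)} bytes is not a multiple of the item size {item_size}"
        )
    count = len(raw) // item_size
    return list(struct.unpack(f"{byteorder}{count}{fmt_char}", raw))


# Stand-in for data[0]["b"]["data"]: three 64-bit integers packed by hand.
sample = struct.pack("<3q", 1, 2, 3)
print(unpack_ints(sample))  # prints [1, 2, 3]
```

If the resulting numbers look implausible, or the length check raises, the assumed format is probably wrong and a different fmt_char or byteorder is worth trying.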


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source