How to get a specific file version from a git repository using Python

I have a local git repo and I'm trying to find a way to get a specific version of my xlsx file into my Python code so I can process it using pandas.

I found the GitPython library, but I'm not sure how to use it correctly.

from git import Repo

repo = Repo(path_to_repo)
commit = repo.commit(sha)
targetfile = commit.tree / 'dataset.xlsx'

I don't know what to do next. I tried loading it into pandas by path, but of course that just loads the latest version from my working tree.

How can I load a previous version of the xlsx file into pandas?



Solution 1:[1]

When you ask for commit.tree / 'dataset.xlsx', you get back a git.Blob object:

>>> targetfile
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">

If you want to read the contents of the object, you can use its data_stream property, which returns a file-like object:

>>> data = targetfile.data_stream.read()
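
To get from those bytes to a DataFrame, one option is to wrap them in an in-memory buffer and hand that to pandas instead of a path. The following is a minimal sketch, not a definitive implementation: the helper names blob_to_buffer and load_xlsx_at_commit are made up for illustration, and it assumes pandas is installed together with an Excel engine such as openpyxl.

```python
import io


def blob_to_buffer(blob):
    """Return the blob's bytes wrapped in a seekable in-memory buffer."""
    # data_stream is file-like; reading it fully gives the raw file contents.
    return io.BytesIO(blob.data_stream.read())


def load_xlsx_at_commit(repo_path, sha, file_path="dataset.xlsx"):
    """Load file_path as it existed at commit sha into a DataFrame.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Third-party imports are kept local so blob_to_buffer stays stdlib-only.
    import pandas as pd
    from git import Repo

    repo = Repo(repo_path)
    blob = repo.commit(sha).tree / file_path
    # read_excel accepts a binary file-like object as well as a path.
    return pd.read_excel(blob_to_buffer(blob))
```

With the variables from the question, this would be called as load_xlsx_at_commit(path_to_repo, sha).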

Or you can use the stream_data method (don't look at me, I didn't name them), which writes data into a file-like object:

>>> import io
>>> buf = io.BytesIO()
>>> targetfile.stream_data(buf)
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">
>>> buf.getvalue()
b'The contents of the file...'
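
Because stream_data writes into any file-like object, it can also target a real file on disk, which may help when a tool insists on a path rather than a buffer. A rough sketch under that assumption follows; blob_as_tempfile is a hypothetical helper name, not part of GitPython.

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def blob_as_tempfile(blob, suffix=".xlsx"):
    """Yield the path of a temporary file holding the blob's contents."""
    # delete=False lets us close the handle before anything else opens the path.
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        with tmp:
            # GitPython writes the blob's bytes into the open file.
            blob.stream_data(tmp)
        yield tmp.name
    finally:
        # Remove the temporary copy once the caller is done with it.
        os.remove(tmp.name)
```

Inside a "with blob_as_tempfile(targetfile) as path:" block, path can be passed to pandas.read_excel like any ordinary file; the temporary copy is deleted when the block exits.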

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1