How to get a specific file version from a git repository using Python
I have a local git repo and I'm trying to find a way to get a specific version of my xlsx file into my Python code so I can process it using pandas.
I found the GitPython library, but I'm not sure how to use it correctly.
from git import Repo
repo = Repo(path_to_repo)
commit = repo.commit(sha)
targetfile = commit.tree / 'dataset.xlsx'
I don't know what to do next. I tried loading it into pandas by path, but of course that just loads the latest version from my working tree.
How can I load a previous version of the xlsx file into pandas?
Solution 1:[1]
When you ask for commit.tree / 'dataset.xlsx', you get back a git.Blob object:
>>> targetfile
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">
If you want to read the contents of the object, you can use its data_stream attribute, which gives you a file-like object:
>>> data = targetfile.data_stream.read()
Or you can use the stream_data method (don't look at me, I didn't name them), which writes data into a file-like object:
>>> import io
>>> buf = io.BytesIO()
>>> targetfile.stream_data(buf)
<git.Blob "3137d9443f54325b8ad8a263b13053fee47fbff2">
>>> buf.getvalue()
b'The contents of the file...'
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
