How do I efficiently clone a subset of files across a range of commits with git?
Using git, I would like to clone a subset of files for every commit between two commits on a branch. I know the timestamp of the first commit, and the second commit is near the tip of the branch.
Initially the --filter=sparse:oid=<blob-ish> option for clone looked promising, but the server doesn't support it.
So far, I've fetched only the commits and trees for the range, without any blobs and without a checkout:
$ git clone --no-checkout --filter=blob:none --shallow-since=<first_commit_timestamp> --branch <branch> --single-branch <remote_url>
Then I set up sparse checkout:
$ git sparse-checkout set --stdin < file_list.txt
Beyond this, I've only come up with two unsatisfactory ways to fetch the blobs.
Check out each commit in the range
I can get the list of commits in the range and then iteratively call:
$ git checkout <commit>
But there could be a lot of commits, and this is inefficient: each checkout fetches its missing blobs separately, so it costs one round trip to the server per commit.
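For comparison, the per-commit approach can be sketched as a small shell function. This is a sketch rather than a tested recipe: checkout_each is a hypothetical name, and its two arguments stand in for <first_commit_sha> and <last_commit_sha>. It visits the first commit and then each later first-parent commit, oldest first; in a blobless partial clone every checkout that needs blobs triggers its own fetch, which is where the per-commit round trips come from.

```shell
#!/bin/sh
# Hypothetical helper: check out every commit from $1 to $2 (inclusive),
# oldest first, following first parents only.
# In a blobless partial clone, each checkout fetches the blobs it is missing
# on demand, so this costs roughly one round trip to the remote per commit.
checkout_each() {
  first=$1
  last=$2
  # The first commit itself, then the commits after it up to $last.
  commits=$(git rev-parse --verify "$first^{commit}" &&
            git rev-list --reverse --first-parent "$first..$last") || return 1
  for commit in $commits; do
    git checkout --quiet --detach "$commit" || return 1
  done
}
```

Usage, with the question's placeholders: checkout_each <first_commit_sha> <last_commit_sha>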
Hacky solution
I can get the IDs of the objects that are missing for all commits after the first one with:
$ git rev-list --missing=print --first-parent --objects <first_commit_sha>..<last_commit_sha> | grep '?'
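With --missing=print, rev-list reports each missing object as a line made of ? followed by the object ID, with no path, while present objects are printed as the ID optionally followed by a path. grep '?' keeps the ? prefix, which has to be removed before the IDs can be used as a plain list. A minimal sketch of that filtering step, where missing_oids is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: read "git rev-list --missing=print" output on stdin
# and print only the IDs of missing objects, one per line.
# Missing objects appear as "?<oid>"; present ones as "<oid> [<path>]".
missing_oids() {
  # -n suppresses default output; the p flag prints a line only when the
  # leading "?" was actually removed, i.e. only for missing objects.
  sed -n 's/^?//p'
}

# Example, with the question's placeholders:
#   git rev-list --missing=print --first-parent --objects \
#       <first_commit_sha>..<last_commit_sha> | missing_oids > object_ids.txt
```

Anchoring the match at the start of the line also avoids picking up a ? that happens to occur inside a path.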
I can then find the object IDs for the first commit with:
$ git ls-tree -r <first_commit_sha>
and filtering the output down to the files in my list.
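The filtering step can be sketched with awk. git ls-tree -r prints one entry per line as "<mode> <type> <object>", a tab, then the path. The sketch assumes file_list.txt holds exact paths, one per line, rather than sparse-checkout patterns, and that no path needs quoting (ls-tree quotes unusual paths unless -z is used); blobs_for_paths is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the blob IDs from "git ls-tree -r" output
# (read on stdin) whose paths appear verbatim in the list file given as $1.
# Assumes exact paths, one per line, and no quoted paths in the input.
blobs_for_paths() {
  list_file=$1
  awk -F '\t' '
    FILENAME == ARGV[1] { want[$0] = 1; next }   # first input: wanted paths
    ($2 in want) {
      split($1, meta, " ")                       # "<mode> <type> <object>"
      if (meta[2] == "blob") print meta[3]       # skip submodule entries
    }
  ' "$list_file" -
}

# Example, with the question's placeholder:
#   git ls-tree -r <first_commit_sha> | blobs_for_paths file_list.txt
```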
Now the hacky part: I couldn't find a way to fetch a list of objects directly, but after combining both lists into object_ids.txt I managed to abuse pack-objects for this purpose:
$ git pack-objects --stdout > /dev/null < object_ids.txt
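Put together, the hacky approach amounts to a single function. This is a sketch under stated assumptions, not a recommended method: prefetch_range is a hypothetical name, the list file is assumed to hold exact paths, and the comment about why pack-objects triggers the fetch describes the observed behavior in a blobless partial clone rather than a documented fetch API.

```shell
#!/bin/sh
# Hypothetical helper: build the object list described above and hand it to
# pack-objects in one go. Arguments: first commit, last commit, a file of
# exact paths (one per line), and the output file for the collected IDs.
prefetch_range() {
  first=$1 last=$2 list_file=$3 ids_file=$4
  {
    # Objects missing from commits after the first one ("?<oid>" lines).
    git rev-list --missing=print --first-parent --objects "$first..$last" |
      sed -n 's/^?//p'
    # Blobs of the listed files as of the first commit.
    git ls-tree -r "$first" |
      awk -F '\t' '
        FILENAME == ARGV[1] { want[$0] = 1; next }
        ($2 in want) { split($1, m, " "); if (m[2] == "blob") print m[3] }
      ' "$list_file" -
  } | sort -u > "$ids_file" || return 1
  # pack-objects has to read every listed object; in a blobless partial
  # clone this appears to be what makes Git fetch the missing ones.
  # The pack itself is discarded.
  git pack-objects --stdout > /dev/null < "$ids_file"
}
```

Usage: prefetch_range <first_commit_sha> <last_commit_sha> file_list.txt object_ids.txt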
Is there a better way to achieve the same result?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
