Which commits does git rebase omit?
The git documentation says the following:
The commits that were previously saved into the temporary area are then reapplied to the current branch, one by one, in order. Note that any commits in HEAD which introduce the same textual changes as a commit in HEAD..upstream are omitted (i.e., a patch already accepted upstream with a different commit message or timestamp will be skipped).
Which seemed a bit confounding to me. Does this simply mean that any commit in the branch being rebased that doesn't change anything in the branch being rebased onto is omitted from the set of commits to be copied?
If so:
- What if new_base is specified? Does this change the set of commits from HEAD..upstream to HEAD..new_base?
- Why use a range? Why not just say "any commits in HEAD which introduce the same textual changes as a commit in upstream are omitted"?
Solution 1:[1]
Does this simply mean that any commit in the branch being rebased that doesn't change anything in the branch being rebased onto is omitted from the set of commits to be copied?
No. To get down to brass tacks, you can see what commits a plain git rebase regards as candidates with
git rev-list --reverse --no-merges @{upstream}..
and the commits it's checking against, to avoid reapplying already-applied commits, with
git rev-list --reverse --no-merges ..@{upstream}
The checking uses git patch-id. To see what git's looking at,
git rev-list --reverse --first-parent --no-merges @{upstream}.. \
| git diff-tree --patch --stdin \
| git patch-id
and
git rev-list --reverse --no-merges ..@{upstream} \
| git diff-tree --patch --stdin \
| git patch-id
except that this sequence happens inside the git format-patch --right-only @{u}... that the rebase really runs to get its information.
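To make the patch-id comparison concrete, here is a minimal sketch in a throwaway repository. The branch names (main, feature), file names, and commit messages are invented for the demonstration. It records the same textual change on both branches under different messages, compares the two patch IDs, and then counts how many commits the rebase actually replays.

```shell
#!/bin/sh
# Sketch only: every name below is hypothetical and the repo is disposable.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main      # start on a branch called "main"
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

# feature: the fix plus one unrelated commit
git checkout -q -b feature
echo fix >> file.txt
git commit -q -am "fix, as written on feature"
echo extra > other.txt
git add other.txt
git commit -q -m "extra work"

# main: the same textual change, recorded with a different message
git checkout -q main
echo fix >> file.txt
git commit -q -am "fix, as accepted upstream"

# Same diff, so the patch IDs should match even though the commit IDs differ
id_feature=$(git diff-tree --patch feature~1 | git patch-id | cut -d' ' -f1)
id_main=$(git diff-tree --patch main | git patch-id | cut -d' ' -f1)
echo "patch-id on feature: $id_feature"
echo "patch-id on main:    $id_main"

# Rebase feature onto main and count what was replayed
git checkout -q feature
git rebase -q main
replayed=$(git rev-list --count main..feature)
echo "commits replayed onto main: $replayed"
```

If the duplicate is recognized, only "extra work" should remain in main..feature after the rebase.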
Solution 2:[2]
Besides jthill's answer, which gets into some of the details of git rev-list's trickiness with using --no-merges and git patch-id, I'll add the following notes:
- --no-merges is suppressed if you use --rebase-merges.
- With --fork-point, which is sometimes the default, the rebase will omit commits that would be listed in upstream..HEAD based on data contained in the reflogs for the upstream.
The latter has gone through multiple changes over the years. At first, fork-point mode was implemented only in the git pull script. Then it was moved to git rebase proper, with the implementation tweaked, and now you can run git merge-base --fork-point to locate the fork-point commit.
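As a rough illustration of what git merge-base --fork-point reports, the sketch below rewinds a branch after another branch has forked from it. The branch names (upstream, topic) and the empty commits are hypothetical. It assumes branch reflogs are being recorded, which is the usual default in a non-bare repository and is set explicitly here.

```shell
#!/bin/sh
# Sketch only: hypothetical branch names in a disposable repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Demo"
git config user.email "demo@example.com"
git config core.logAllRefUpdates true      # make sure reflogs are written

git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "B"
old_tip=$(git rev-parse HEAD)              # topic will fork from here

git checkout -q -b topic
git commit -q --allow-empty -m "T"

# upstream is rewritten: B is dropped and replaced by B2
git checkout -q upstream
git reset -q --hard HEAD~1
git commit -q --allow-empty -m "B2"

plain=$(git merge-base upstream topic)               # sees only the current graph
fork=$(git merge-base --fork-point upstream topic)   # also consults upstream's reflog

echo "plain merge base: $(git log -1 --format=%s "$plain")"
echo "fork point:       $(git log -1 --format=%s "$fork")"
```

The plain merge base should be commit A, while the fork point should be the old upstream tip B, which is now reachable from upstream only through its reflog.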
Using the "fork point" is meant to help when dealing with an upstream rebase. That is, suppose that you have your clone of the Git repository over at origin. You've sent commits to whoever controls that repository. They may or may not have taken some of your commits. They may or may not have taken some other commits from other users. At some point, though, they also ran git rebase --interactive themselves and dropped from their set of commits some commits that they had at some point, that you rebased onto at some point.
Let's draw a sample situation. They started with this:
...--o--o--* <-- main
You cloned the repository and created three commits of your own, which we'll call E-F-G for no obvious reason:
...--o--o--* <-- main, origin/main
            \
             E--F--G <-- feature-X
They picked up four new commits, none matching yours yet, so that when you ran git fetch you got:
             A--B--C--D <-- origin/feature-X
            /
...--o--o--* <-- origin/main
            \
             E--F--G <-- feature-X
You then rebased your feature-X atop their feature-X (your origin/feature-X) to get:
                        E'-F'-G' <-- feature-X
                       /
             A--B--C--D <-- origin/feature-X
            /
...--o--o--* <-- origin/main
            \
             E--F--G [abandoned]
They then decide that commit C is bad, so they rewrite their feature-X to drop C and replace their D with a new commit D'. When you run git fetch, you get:
                  C--D--E'-F'-G' <-- feature-X
                 /
             A--B--D' <-- origin/feature-X
            /
...--o--o--* <-- origin/main
            \
             E--F--G [abandoned]
They then decide they like your commit G or G' (whichever it is) so much that they incorporate this into their feature-X, so that if you git fetch again you get:
                  C--D--E'-F'-G' <-- feature-X
                 /
             A--B--D'-G" <-- origin/feature-X
            /
...--o--o--* <-- origin/main
            \
             E--F--G [abandoned]
where their G" is their copy of your G or G': it introduces the same changes as your commit G' does to your commit F', but the line numbers don't match up.
Ideally you would like git rebase to somehow automatically determine that their commits C and D, which now look like they're your commits, are their commits and were dropped in favor of their D', and that their commit G" is "as good as" your G'. So you would want git rebase to produce this:
                  C--D--E'-F'-G' [abandoned]
                 /
             A--B--D'-G" <-- origin/feature-X
            /           \
...--o--o--*             E"-F" <-- feature-X
            \
             E--F--G [abandoned]
That is, you want your git rebase to:
- not copy commit C, even though you have one and they don't;
- not copy commit D, which they replaced with D'; and
- not copy commit G'.
The patch-ID tricks that git rebase uses might cope with D and G' here but would not correctly omit C. The fork-point code will correctly omit C, provided your origin/feature-X branch's reflog has the right information in it. That will generally be true as long as all of this activity has occurred within the last 90 days or so.
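A reduced version of this situation can be tried locally. The sketch below is a simplification under stated assumptions: a local branch named upstream stands in for origin/feature-X, the commit names are invented, and each commit adds a file named after itself. Two topic branches fork from the same commit C, upstream then drops C, and each topic is rebased with a different setting.

```shell
#!/bin/sh
# Sketch only: local stand-in for the upstream-rebase scenario described above.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/upstream
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: one commit that adds a file named after the commit
mk() { echo "$1" > "$1.txt"; git add "$1.txt"; git commit -q -m "$1"; }

mk A
mk C
git branch topic-fp            # both topics fork from C
git branch topic-nofp
git checkout -q topic-fp
mk E1
git checkout -q topic-nofp
mk E2

# upstream is rewritten: C is dropped and D2 takes its place
git checkout -q upstream
git reset -q --hard HEAD~1
mk D2

git checkout -q topic-fp
git rebase -q --fork-point upstream       # uses upstream's reflog to find C
git checkout -q topic-nofp
git rebase -q --no-fork-point upstream    # uses only the current commit graph

with_fp=$(git rev-list --count upstream..topic-fp)
without_fp=$(git rev-list --count upstream..topic-nofp)
echo "replayed with --fork-point:    $with_fp"
echo "replayed with --no-fork-point: $without_fp"
```

With --fork-point only E1 should be replayed, so C.txt disappears from topic-fp. With --no-fork-point the dropped commit C is carried along as if it were your own work.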
For more on the --fork-point option, see Git rebase - commit select in fork-point mode and (of course) the git rebase documentation.
Solution 3:[3]
Which seemed a bit confounding to me. Does this simply mean that any commit in the branch being rebased that doesn't change anything in the branch being rebased onto is omitted from the set of commits to be copied?
Yes, mainly because in a normal situation, Git would never record an empty commit unless explicitly told to. What you would get instead here is a message "your working directory is clean" returned by git status.
What if new_base is specified? Does this change the set of commits from HEAD..upstream to HEAD..new_base?
I believe this is what they meant by "upstream" indeed.
Why use a range? Why not just say "any commits in HEAD which introduce the same textual changes as a commit in upstream are omitted"?
Because that's the way Git actually distinguishes two branches. The bottom commit of this range is the point where your two branches started to diverge.
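One way to look at the two sides of such a range is git log with --left-right and --cherry-mark on the symmetric difference. This is a sketch in a disposable repository with invented branch names (main, feature), not a description of what rebase runs internally. Commits reachable only from main are marked <, commits reachable only from feature are marked >, and commits whose patch has an equivalent on the other side are marked =.

```shell
#!/bin/sh
# Sketch only: hypothetical names in a disposable repository.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

echo base > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
echo fix >> file.txt
git commit -q -am "fix (feature wording)"
echo extra > other.txt
git add other.txt
git commit -q -m "extra work"

git checkout -q main
echo fix >> file.txt
git commit -q -am "fix (upstream wording)"
echo docs > docs.txt
git add docs.txt
git commit -q -m "docs"

# %m prints the mark for each commit: <, > or =
marks=$(git log --left-right --cherry-mark --format='%m %s' main...feature)
echo "$marks"

same=$(echo "$marks" | grep -c '^=')
left=$(echo "$marks" | grep -c '^<')
right=$(echo "$marks" | grep -c '^>')
echo "equivalent: $same, main only: $left, feature only: $right"
```

Only commits outside the shared history appear at all, which is why the comparison is phrased in terms of a range rather than everything reachable from upstream.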
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jthill |
| Solution 2 | torek |
| Solution 3 | Obsidian |
