Message ID | 20190725174611.14802-4-newren@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Cleanup merge API | expand |
Hi Elijah, On Thu, 25 Jul 2019, Elijah Newren wrote: > This is the bug that just won't die; there always seems to be another > form of it somewhere. See the commit message of 55f39cf7551b ("merge: > fix misleading pre-merge check documentation", 2018-06-30) for a more > detailed explanation), but in short: > > <quick summary> > > builtin/merge.c contains this important requirement for merge > strategies: > > ...the index must be in sync with the head commit. The strategies are > responsible to ensure this. > > This condition is important to enforce because there are two likely > failure cases when the index isn't in sync with the head commit: > > * we silently throw away changes the user had staged before the merge > > * we accidentally (and silently) include changes in the merge that > were not part of either of the branches/trees being merged > > Discarding users' work and mis-merging are both bad outcomes, especially > when done silently, so naturally this rule was stated sternly -- but, > unfortunately totally ignored in practice unless and until actual bugs > were found. But, fear not: the bugs from this were fixed in commit > ee6566e8d70d ("[PATCH] Rewrite read-tree", 2005-09-05) > through a rewrite of read-tree (again, commit 55f39cf7551b has a more > detailed explanation of how this affected merge). And it was fixed > again in commit > 160252f81626 ("git-merge-ours: make sure our index matches HEAD", 2005-11-03) > ...and it was fixed again in commit > 3ec62ad9ffba ("merge-octopus: abort if index does not match HEAD", 2016-04-09) > ...and again in commit > 65170c07d466 ("merge-recursive: avoid incorporating uncommitted changes in a merge", 2017-12-21) > ...and again in commit > eddd1a411d93 ("merge-recursive: enforce rule that index matches head before merging", 2018-06-30) > > ...with multiple testcases added to the testsuite that could be > enumerated in even more commits. > > Then, finally, in a patch in the same series as the last fix above, the > documentation about this requirement was fixed in commit 55f39cf7551b > ("merge: fix misleading pre-merge check documentation", 2018-06-30), and > we all lived happily ever after... > > </quick summary> Whoa. What a story. > Unfortunately, "ever after" apparently denotes a limited time and it > expired today. The merge-recursive rule to enforce that index matches > head was at the beginning of merge_trees() and would only trigger when > opt->call_depth was 0. Since merge_recursive() doesn't call > merge_trees() until after returning from recursing, this meant that the > check wasn't triggered by merge_recursive() until it had first finished > all the intermediate merges to create virtual merge bases. That is a > potentially HUGE amount of computation (and writing of intermediate > merge results into the .git/objects directory) before it errors out and > says, in effect, "Sorry, I can't do any merging because you have some > local changes that would be overwritten." > > Trying to enforce that all of merge_trees(), merge_recursive(), and > merge_recursive_generic() checked the index == head condition earlier > resulted in a bunch of broken tests. It turns out that > merge_recursive() has code to drop and reload the cache while recursing > to create intermediate virtual merge bases, but unfortunately that code > runs even when no recursion is necessary. This unconditional dropping > and reloading of the cache masked a few bugs: > > * builtin/merge-recursive.c: didn't even bother loading the index. > > * builtin/stash.c: feels like a fake 'builtin' because it repeatedly > invokes git subprocesses all over the place, mixed with other > operations. In particular, invoking "git reset" will reset the > index on disk, but the parent process that invoked it won't > automatically have its in-memory index updated. Yep, the idea was to move fast to a built-in, and then continue by converting all those process spawns to proper calls to libgit "API" functions. Sadly, that did not happen yet. And you're absolutely right that failing to re-read the index after spawning a `reset --hard` causes problems. IIRC I fixed them all, though, see https://public-inbox.org/git/nycvar.QRO.7.76.6.1902191127420.41@tvgsbejvaqbjf.bet/ > So, load the index in builtin/merge-recursive.c, reload the in-memory > index in builtin/stash.c, and modify the t3030 testcase to correctly > setup the index and make sure that the test fails in the expected way > (meaning it reports a rename/rename conflict). Makes sense to me. > diff --git a/builtin/merge-recursive.c b/builtin/merge-recursive.c > index 5b910e351e..a4bfd8fc51 100644 > --- a/builtin/merge-recursive.c > +++ b/builtin/merge-recursive.c > @@ -1,3 +1,4 @@ > +#include "cache.h" > #include "builtin.h" > #include "commit.h" > #include "tag.h" > @@ -63,6 +64,9 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix) > if (argc - i != 3) /* "--" "<head>" "<remote>" */ > die(_("not handling anything other than two heads merge.")); > > + if (repo_read_index_unmerged(the_repository)) > + die_resolve_conflict("merge"); For a moment I was unsure whether `_unmerged()` is the right thing to do here, as it specifically allows to read the index even when there are conflict stages. But I guess it does not matter too much here. I probably would have opted for `repo_read_index()` instead, though. > + > o.branch1 = argv[++i]; > o.branch2 = argv[++i]; > > diff --git a/builtin/stash.c b/builtin/stash.c > index fde6397caa..bec011c1bb 100644 > --- a/builtin/stash.c > +++ b/builtin/stash.c > @@ -427,6 +427,8 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, > return error(_("could not save index tree")); > > reset_head(); > + discard_cache(); > + read_cache(); I was honestly puzzled why this is necessary, at first. The preceding context expands to this: discard_cache(); read_cache(); if (write_cache_as_tree(&index_tree, 0, NULL)) return error(_("could not save index tree")); So basically, we already discard the index, read it again, then write it as a tree, then reset and then we have to discard the index again? But of course, if there are uncommitted changes, this would write a tree different from HEAD, then reset the index to match HEAD, so indeed, this discard/read dance is necessary. So this hunk is good. > } > } > > diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh > index ff641b348a..a37bcc58a0 100755 > --- a/t/t3030-merge-recursive.sh > +++ b/t/t3030-merge-recursive.sh > @@ -667,15 +667,22 @@ test_expect_success 'merging with triple rename across D/F conflict' ' > test_expect_success 'merge-recursive remembers the names of all base trees' ' > git reset --hard HEAD && > > + # make the index match $c1 so that merge-recursive below does not > + # fail early > + git diff --binary HEAD $c1 -- | git apply --cached && > + > # more trees than static slots used by oid_to_hex() > for commit in $c0 $c2 $c4 $c5 $c6 $c7 > do > git rev-parse "$commit^{tree}" > done >trees && > > - # ignore the return code -- it only fails because the input is weird > + # ignore the return code; it only fails because the input is weird... > test_must_fail git -c merge.verbosity=5 merge-recursive $(cat trees) -- $c1 $c3 >out && > > + # ...but make sure it fails in the expected way > + test_i18ngrep CONFLICT.*rename/rename out && > + This is obviously a good change: it strengthens the test case by fixing a subtle bug. Thanks, Dscho > # merge-recursive prints in reverse order, but we do not care > sort <trees >expect && > sed -n "s/^virtual //p" out | sort >actual && > -- > 2.22.0.559.g28a8880890.dirty > >
Hi Dscho, On Thu, Jul 25, 2019 at 12:41 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Elijah, > > On Thu, 25 Jul 2019, Elijah Newren wrote: > <snip> > > ...And it was fixed again in commit > > 160252f81626 ("git-merge-ours: make sure our index matches HEAD", 2005-11-03) > > ...and it was fixed again in commit > > 3ec62ad9ffba ("merge-octopus: abort if index does not match HEAD", 2016-04-09) > > ...and again in commit > > 65170c07d466 ("merge-recursive: avoid incorporating uncommitted changes in a merge", 2017-12-21) > > ...and again in commit > > eddd1a411d93 ("merge-recursive: enforce rule that index matches head before merging", 2018-06-30) > > > > ...with multiple testcases added to the testsuite that could be > > enumerated in even more commits. > > > > Then, finally, in a patch in the same series as the last fix above, the > > documentation about this requirement was fixed in commit 55f39cf7551b > > ("merge: fix misleading pre-merge check documentation", 2018-06-30), and > > we all lived happily ever after... > > > > </quick summary> > > Whoa. What a story. I know, right? > > diff --git a/builtin/merge-recursive.c b/builtin/merge-recursive.c > > index 5b910e351e..a4bfd8fc51 100644 > > --- a/builtin/merge-recursive.c > > +++ b/builtin/merge-recursive.c > > @@ -1,3 +1,4 @@ > > +#include "cache.h" > > #include "builtin.h" > > #include "commit.h" > > #include "tag.h" > > @@ -63,6 +64,9 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix) > > if (argc - i != 3) /* "--" "<head>" "<remote>" */ > > die(_("not handling anything other than two heads merge.")); > > > > + if (repo_read_index_unmerged(the_repository)) > > + die_resolve_conflict("merge"); > > For a moment I was unsure whether `_unmerged()` is the right thing to do > here, as it specifically allows to read the index even when there are > conflict stages. But I guess it does not matter too much here. I > probably would have opted for `repo_read_index()` instead, though. The names repo_read_index() and repo_read_index_unmerged() actually seem slightly misleading to me; they seem to do the opposite of what you'd think they do. repo_read_index() reads in an index and allows unmerged entries and returns istate->cache_nr. repo_read_index_unmerged() calls repo_read_index(), then checks to see if any of the entries are unmerged and returns whether or not any unmerged entries were found. So, the way to disallow conflict stages isn't to use repo_read_index(), but to use repo_read_index_unmerged(), as I did. Counter-intuitive, I know. <snip> > But of course, if there are uncommitted changes, this would write a tree > different from HEAD, then reset the index to match HEAD, so indeed, this > discard/read dance is necessary. > > So this hunk is good. > <snip> > > This is obviously a good change: it strengthens the test case by fixing > a subtle bug. > > Thanks, > Dscho Thanks for taking a look!
Hi Elijah, On Thu, 25 Jul 2019, Elijah Newren wrote: > On Thu, Jul 25, 2019 at 12:41 PM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > On Thu, 25 Jul 2019, Elijah Newren wrote: > > > > > diff --git a/builtin/merge-recursive.c b/builtin/merge-recursive.c > > > index 5b910e351e..a4bfd8fc51 100644 > > > --- a/builtin/merge-recursive.c > > > +++ b/builtin/merge-recursive.c > > > @@ -1,3 +1,4 @@ > > > +#include "cache.h" > > > #include "builtin.h" > > > #include "commit.h" > > > #include "tag.h" > > > @@ -63,6 +64,9 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix) > > > if (argc - i != 3) /* "--" "<head>" "<remote>" */ > > > die(_("not handling anything other than two heads merge.")); > > > > > > + if (repo_read_index_unmerged(the_repository)) > > > + die_resolve_conflict("merge"); > > > > For a moment I was unsure whether `_unmerged()` is the right thing to do > > here, as it specifically allows to read the index even when there are > > conflict stages. But I guess it does not matter too much here. I > > probably would have opted for `repo_read_index()` instead, though. > > The names repo_read_index() and repo_read_index_unmerged() actually > seem slightly misleading to me; they seem to do the opposite of what > you'd think they do. > > repo_read_index() reads in an index and allows unmerged entries and > returns istate->cache_nr. > > repo_read_index_unmerged() calls repo_read_index(), then checks to see > if any of the entries are unmerged and returns whether or not any > unmerged entries were found. > > So, the way to disallow conflict stages isn't to use > repo_read_index(), but to use repo_read_index_unmerged(), as I did. > Counter-intuitive, I know. Whoa. No wonder I get it wrong all the time. Thanks for educating me, Dscho
diff --git a/builtin/merge-recursive.c b/builtin/merge-recursive.c index 5b910e351e..a4bfd8fc51 100644 --- a/builtin/merge-recursive.c +++ b/builtin/merge-recursive.c @@ -1,3 +1,4 @@ +#include "cache.h" #include "builtin.h" #include "commit.h" #include "tag.h" @@ -63,6 +64,9 @@ int cmd_merge_recursive(int argc, const char **argv, const char *prefix) if (argc - i != 3) /* "--" "<head>" "<remote>" */ die(_("not handling anything other than two heads merge.")); + if (repo_read_index_unmerged(the_repository)) + die_resolve_conflict("merge"); + o.branch1 = argv[++i]; o.branch2 = argv[++i]; diff --git a/builtin/stash.c b/builtin/stash.c index fde6397caa..bec011c1bb 100644 --- a/builtin/stash.c +++ b/builtin/stash.c @@ -427,6 +427,8 @@ static int do_apply_stash(const char *prefix, struct stash_info *info, return error(_("could not save index tree")); reset_head(); + discard_cache(); + read_cache(); } } diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh index ff641b348a..a37bcc58a0 100755 --- a/t/t3030-merge-recursive.sh +++ b/t/t3030-merge-recursive.sh @@ -667,15 +667,22 @@ test_expect_success 'merging with triple rename across D/F conflict' ' test_expect_success 'merge-recursive remembers the names of all base trees' ' git reset --hard HEAD && + # make the index match $c1 so that merge-recursive below does not + # fail early + git diff --binary HEAD $c1 -- | git apply --cached && + # more trees than static slots used by oid_to_hex() for commit in $c0 $c2 $c4 $c5 $c6 $c7 do git rev-parse "$commit^{tree}" done >trees && - # ignore the return code -- it only fails because the input is weird + # ignore the return code; it only fails because the input is weird... test_must_fail git -c merge.verbosity=5 merge-recursive $(cat trees) -- $c1 $c3 >out && + # ...but make sure it fails in the expected way + test_i18ngrep CONFLICT.*rename/rename out && + # merge-recursive prints in reverse order, but we do not care sort <trees >expect && sed -n "s/^virtual //p" out | sort >actual &&
This is the bug that just won't die; there always seems to be another form of it somewhere. See the commit message of 55f39cf7551b ("merge: fix misleading pre-merge check documentation", 2018-06-30) for a more detailed explanation), but in short: <quick summary> builtin/merge.c contains this important requirement for merge strategies: ...the index must be in sync with the head commit. The strategies are responsible to ensure this. This condition is important to enforce because there are two likely failure cases when the index isn't in sync with the head commit: * we silently throw away changes the user had staged before the merge * we accidentally (and silently) include changes in the merge that were not part of either of the branches/trees being merged Discarding users' work and mis-merging are both bad outcomes, especially when done silently, so naturally this rule was stated sternly -- but, unfortunately totally ignored in practice unless and until actual bugs were found. But, fear not: the bugs from this were fixed in commit ee6566e8d70d ("[PATCH] Rewrite read-tree", 2005-09-05) through a rewrite of read-tree (again, commit 55f39cf7551b has a more detailed explanation of how this affected merge). And it was fixed again in commit 160252f81626 ("git-merge-ours: make sure our index matches HEAD", 2005-11-03) ...and it was fixed again in commit 3ec62ad9ffba ("merge-octopus: abort if index does not match HEAD", 2016-04-09) ...and again in commit 65170c07d466 ("merge-recursive: avoid incorporating uncommitted changes in a merge", 2017-12-21) ...and again in commit eddd1a411d93 ("merge-recursive: enforce rule that index matches head before merging", 2018-06-30) ...with multiple testcases added to the testsuite that could be enumerated in even more commits. Then, finally, in a patch in the same series as the last fix above, the documentation about this requirement was fixed in commit 55f39cf7551b ("merge: fix misleading pre-merge check documentation", 2018-06-30), and we all lived happily ever after... </quick summary> Unfortunately, "ever after" apparently denotes a limited time and it expired today. The merge-recursive rule to enforce that index matches head was at the beginning of merge_trees() and would only trigger when opt->call_depth was 0. Since merge_recursive() doesn't call merge_trees() until after returning from recursing, this meant that the check wasn't triggered by merge_recursive() until it had first finished all the intermediate merges to create virtual merge bases. That is a potentially HUGE amount of computation (and writing of intermediate merge results into the .git/objects directory) before it errors out and says, in effect, "Sorry, I can't do any merging because you have some local changes that would be overwritten." Trying to enforce that all of merge_trees(), merge_recursive(), and merge_recursive_generic() checked the index == head condition earlier resulted in a bunch of broken tests. It turns out that merge_recursive() has code to drop and reload the cache while recursing to create intermediate virtual merge bases, but unfortunately that code runs even when no recursion is necessary. This unconditional dropping and reloading of the cache masked a few bugs: * builtin/merge-recursive.c: didn't even bother loading the index. * builtin/stash.c: feels like a fake 'builtin' because it repeatedly invokes git subprocesses all over the place, mixed with other operations. In particular, invoking "git reset" will reset the index on disk, but the parent process that invoked it won't automatically have its in-memory index updated. * t3030-merge-recursive.h: this test has always been broken in that it didn't make sure to make index match head before running. But, it didn't care about the index or even the merge result, just the verbose output while running. While commit eddd1a411d93 ("merge-recursive: enforce rule that index matches head before merging", 2018-06-30) should have uncovered this broken test, it used a test_must_fail wrapper around the merge-recursive call because it was known that the merge resulted in a rename/rename conflict. Thus, that fix only made this test fail for a different reason, and since the index == head check didn't happen until after coming all the way back out of the recursion, the testcase had enough information to pass the one check that it did perform. So, load the index in builtin/merge-recursive.c, reload the in-memory index in builtin/stash.c, and modify the t3030 testcase to correctly setup the index and make sure that the test fails in the expected way (meaning it reports a rename/rename conflict). Signed-off-by: Elijah Newren <newren@gmail.com> --- builtin/merge-recursive.c | 4 ++++ builtin/stash.c | 2 ++ t/t3030-merge-recursive.sh | 9 ++++++++- 3 files changed, 14 insertions(+), 1 deletion(-)