
[v2,6/6] merge: do not exit restore_state() prematurely

Message ID: 0783b48c121fe74051c13e7d9118d1a5b7cb9aa9.1655621424.git.gitgitgadget@gmail.com
State: New, archived
Series: Fix merge restore state

Commit Message

Elijah Newren June 19, 2022, 6:50 a.m. UTC
From: Elijah Newren <newren@gmail.com>

Previously, if:

* The user had no local changes before starting the merge
* A merge strategy made changes to the working tree/index but returned
  with exit status 2

then we'd call restore_state() to clean up the changes and either let
the next merge strategy run (if there is one), or exit telling the user
that no merge strategy could handle the merge.  Unfortunately,
restore_state() did not clean up the changes as expected; that function
was a no-op if the stash was null, and the stash would be null if
there were no local changes before starting the merge.  So, instead of
"Rewinding the tree to pristine..." as the code claimed, restore_state()
would leave garbage around in the index and working tree (possibly
including conflicts) for either the next merge strategy or for the user
after aborting the merge.  And in the case of aborting the merge, the
user would be unable to run "git merge --abort" to get rid of the
unintended leftover conflicts, because the merge control files were not
written as it was presumed that we had restored to a clean state
already.

Fix the main problem by making sure that restore_state() only skips the
stash application if the stash is null rather than skipping the whole
function.

However, there is a secondary problem -- since merge.c forks
subprocesses to do the cleanup, the in-memory index is left out-of-sync.
While there was a refresh_cache(REFRESH_QUIET) call that attempted to
correct that, that function would not handle cases where the previous
merge strategy added conflicted entries.  We need to drop the index and
re-read it to handle such cases.

(Alternatively, we could stop forking subprocesses and instead call some
appropriate function to do the work which would update the in-memory
index automatically.  For now, just do the simple fix.)

Reported-by: ZheNing Hu <adlternative@gmail.com>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
 builtin/merge.c        | 10 ++++++----
 t/t7607-merge-state.sh | 25 +++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 4 deletions(-)
 create mode 100755 t/t7607-merge-state.sh
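
The control-flow change described in the commit message can be modeled with a toy shell sketch. This is not the patch itself: `reset_hard`, `apply_stash`, and the `tree` and `stash` variables are hypothetical stand-ins for the real C code in builtin/merge.c, and an empty `stash` plays the role of the null stash.

```shell
#!/bin/sh
# Toy model: "tree" holds the working tree/index state; "stash" is empty
# when there were no local changes before the merge started.
reset_hard () { tree=pristine; }
apply_stash () { tree="pristine+$1"; }

# Old behaviour: the early return skips the whole function.
restore_state_old () {
	test -n "$stash" || return 0	# nothing gets cleaned up
	reset_hard
	apply_stash "$stash"
}

# New behaviour: only the stash application is skipped.
restore_state_new () {
	reset_hard			# always undo the failed strategy's changes
	test -n "$stash" || return 0
	apply_stash "$stash"
}

stash=
tree=garbage; restore_state_old; old_result=$tree
tree=garbage; restore_state_new; new_result=$tree
echo "old=$old_result new=$new_result"
```

In this model the old function leaves `tree` untouched when the stash is empty, while the new one always rewinds it first.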

Comments

ZheNing Hu July 17, 2022, 4:44 p.m. UTC | #1
Elijah Newren via GitGitGadget <gitgitgadget@gmail.com> wrote on Sunday, June 19, 2022 at 14:50:
>
> From: Elijah Newren <newren@gmail.com>
>
> @@ -398,7 +398,9 @@ static void restore_state(const struct object_id *head,
>          */
>         run_command_v_opt(args, RUN_GIT_CMD);
>
> -       refresh_cache(REFRESH_QUIET);
> +refresh_cache:
> +       if (discard_cache() < 0 || read_cache() < 0)
> +               die(_("could not read index"));
>  }
>

We don't need to check the return value of discard_cache();
it always returns zero.

>  /* This is called when no merge was necessary. */
> diff --git a/t/t7607-merge-state.sh b/t/t7607-merge-state.sh
> new file mode 100755
> index 00000000000..655478cd0b3
> --- /dev/null
> +++ b/t/t7607-merge-state.sh
> @@ -0,0 +1,25 @@
> +#!/bin/sh
> +
> +test_description="Test that merge state is as expected after failed merge"
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +. ./test-lib.sh
> +
> +test_expect_success 'set up custom strategy' '
> +       test_commit --no-tag "Initial" base base &&
> +git show-ref &&
> +
> +       for b in branch1 branch2 branch3
> +       do
> +               git checkout -b $b main &&
> +               test_commit --no-tag "Change on $b" base $b
> +       done &&
> +
> +       git checkout branch1 &&
> +       test_must_fail git merge branch2 branch3 &&
> +       git diff --exit-code --name-status &&
> +       test_path_is_missing .git/MERGE_HEAD
> +'
> +

Small typo: is a tab missing before "git show-ref"?

> +test_done
> --
> gitgitgadget
Junio C Hamano July 19, 2022, 11:13 p.m. UTC | #2
"Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Fix the main problem by making sure that restore_state() only skips the
> stash application if the stash is null rather than skipping the whole
> function.

OK.


> However, there is a secondary problem -- since merge.c forks
> subprocesses to do the cleanup, the in-memory index is left out-of-sync.
> While there was a refresh_cache(REFRESH_QUIET) call that attempted to
> correct that, that function would not handle cases where the previous
> merge strategy added conflicted entries.  We need to drop the index and
> re-read it to handle such cases.

Absolutely right.

> diff --git a/builtin/merge.c b/builtin/merge.c
> index aaee8f6a553..a21dece1b55 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -385,11 +385,11 @@ static void restore_state(const struct object_id *head,
>  {
>  	const char *args[] = { "stash", "apply", "--index", NULL, NULL };
>  
> -	if (is_null_oid(stash))
> -		return;
> -
>  	reset_hard(head, 1);
>  
> +	if (is_null_oid(stash))
> +		goto refresh_cache;
> +
>  	args[3] = oid_to_hex(stash);
>  
>  	/*
> @@ -398,7 +398,9 @@ static void restore_state(const struct object_id *head,
>  	 */
>  	run_command_v_opt(args, RUN_GIT_CMD);
>  
> -	refresh_cache(REFRESH_QUIET);
> +refresh_cache:
> +	if (discard_cache() < 0 || read_cache() < 0)
> +		die(_("could not read index"));

Don't we need refresh_cache() after re-reading the on-disk index, or
do we have nothing to do further after restore_state() returns and
the stat-info being stale does not matter?  Given that [3/6] exists,
I suspect that we do want to make sure the in-core index is refreshed
before we go ahead and run the next merge, no?

>  }
>  
>  /* This is called when no merge was necessary. */

> diff --git a/t/t7607-merge-state.sh b/t/t7607-merge-state.sh
> new file mode 100755

As long as we are adding a brand-new script for new tests, probably we
should add tests for other steps (like [4/6]) here, perhaps?

> index 00000000000..655478cd0b3
> --- /dev/null
> +++ b/t/t7607-merge-state.sh
> @@ -0,0 +1,25 @@
> +#!/bin/sh
> +
> +test_description="Test that merge state is as expected after failed merge"
> +
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> +. ./test-lib.sh
> +
> +test_expect_success 'set up custom strategy' '
> +	test_commit --no-tag "Initial" base base &&
> +git show-ref &&

Is this part of the test, or a leftover debugging aid?

> +
> +	for b in branch1 branch2 branch3
> +	do
> +		git checkout -b $b main &&
> +		test_commit --no-tag "Change on $b" base $b
> +	done &&
> +
> +	git checkout branch1 &&
> +	test_must_fail git merge branch2 branch3 &&
> +	git diff --exit-code --name-status &&
> +	test_path_is_missing .git/MERGE_HEAD
> +'

Hmph, I am not sure if the new behaviour is not too pessimistic.
When octopus fails after successfully merging branch2 and then
failing the merge of branch3 (i.e. the last one) due to conflict,
I think octopus users are used to being able to resolve it manually
and make a commit.  Are we making it impossible by doing the
reset-restore dance here?

I do not use, and more importantly, I do not recommend others to
use, Octopus anymore, and from that point of view, it is a good move
to make Octopus harder to use on any non-trivial merge, but those
who still like Octopus may disagree.

Thanks.

> +test_done
Eric Sunshine July 20, 2022, 12:09 a.m. UTC | #3
[replying to Junio's email since I don't have the original available...]

On Tue, Jul 19, 2022 at 7:22 PM Junio C Hamano <gitster@pobox.com> wrote:
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > +     for b in branch1 branch2 branch3
> > +     do
> > +             git checkout -b $b main &&
> > +             test_commit --no-tag "Change on $b" base $b
> > +     done &&

Let's break out of the loop with `|| return 1` if something in the
loop body fails.

    for b in branch1 branch2 branch3
    do
        git checkout -b $b main &&
        test_commit --no-tag "Change on $b" base $b || return 1
    done &&
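
A minimal sketch of why the `|| return 1` matters, independent of git's test library: a `for` loop's exit status is that of its last iteration only, so a failure in an earlier iteration is silently lost. The `step` function below is a hypothetical stand-in for the loop body, rigged to fail only for `branch2`.

```shell
#!/bin/sh
# Hypothetical stand-in for a loop body step: fails only for "branch2".
step () {
	test "$1" != branch2
}

# Without "|| return 1": only the last iteration's status survives.
loop_unchecked () {
	for b in branch1 branch2 branch3
	do
		step "$b"
	done &&
	echo "unchecked: reached the end"
}

# With "|| return 1": the first failing iteration aborts the function.
loop_checked () {
	for b in branch1 branch2 branch3
	do
		step "$b" || return 1
	done &&
	echo "checked: reached the end"
}

loop_unchecked; unchecked_status=$?
loop_checked; checked_status=$?
echo "unchecked=$unchecked_status checked=$checked_status"
```

The unchecked variant reports success even though the `branch2` step failed. `return` is used here because the loops live inside shell functions; git's test bodies are likewise evaluated inside a function, which is what makes the idiom usable there.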
Elijah Newren July 21, 2022, 2:03 a.m. UTC | #4
On Tue, Jul 19, 2022 at 5:09 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>
> [replying to Junio's email since I don't have the original available...]
>
> On Tue, Jul 19, 2022 at 7:22 PM Junio C Hamano <gitster@pobox.com> wrote:
> > "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> > > +     for b in branch1 branch2 branch3
> > > +     do
> > > +             git checkout -b $b main &&
> > > +             test_commit --no-tag "Change on $b" base $b
> > > +     done &&
>
> Let's break out of the loop with `|| return 1` if something in the
> loop body fails.
>
>     for b in branch1 branch2 branch3
>     do
>         git checkout -b $b main &&
>         test_commit --no-tag "Change on $b" base $b || return 1
>     done &&

Okay, will do.
Elijah Newren July 21, 2022, 3:27 a.m. UTC | #5
On Tue, Jul 19, 2022 at 4:13 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Fix the main problem by making sure that restore_state() only skips the
> > stash application if the stash is null rather than skipping the whole
> > function.
>
> OK.
>
>
> > However, there is a secondary problem -- since merge.c forks
> > subprocesses to do the cleanup, the in-memory index is left out-of-sync.
> > While there was a refresh_cache(REFRESH_QUIET) call that attempted to
> > correct that, that function would not handle cases where the previous
> > merge strategy added conflicted entries.  We need to drop the index and
> > re-read it to handle such cases.
>
> Absolutely right.
>
> > diff --git a/builtin/merge.c b/builtin/merge.c
> > index aaee8f6a553..a21dece1b55 100644
> > --- a/builtin/merge.c
> > +++ b/builtin/merge.c
> > @@ -385,11 +385,11 @@ static void restore_state(const struct object_id *head,
> >  {
> >       const char *args[] = { "stash", "apply", "--index", NULL, NULL };
> >
> > -     if (is_null_oid(stash))
> > -             return;
> > -
> >       reset_hard(head, 1);
> >
> > +     if (is_null_oid(stash))
> > +             goto refresh_cache;
> > +
> >       args[3] = oid_to_hex(stash);
> >
> >       /*
> > @@ -398,7 +398,9 @@ static void restore_state(const struct object_id *head,
> >        */
> >       run_command_v_opt(args, RUN_GIT_CMD);
> >
> > -     refresh_cache(REFRESH_QUIET);
> > +refresh_cache:
> > +     if (discard_cache() < 0 || read_cache() < 0)
> > +             die(_("could not read index"));
>
> Don't we need refresh_cache() after re-reading the on-disk index, or
> do we have nothing to do further after restore_state() returns and
> the stat-info being stale does not matter?  Given that [3/6] exists,
> I suspect that we do want to make sure the in-core index is refreshed
> before we go ahead and run the next merge, no?

I don't think so; the situation for [3/6] is different.  The basic
timeline is as follows:
    1. <User does lots of stuff over weeks and months>
    2. User decides to merge one or more branches
    3. merge does save_state() [i.e. "git stash create"]
    4. The first strategy fails
    5. We restore the state before trying the next strategy

The current code is dealing with step 5.  Patch [3/6] was to prevent
failures in step 3 from users creating stat-dirty files in step 1.
Once step 3 runs, the only way to become stat-dirty again is if the
user simultaneously messes with their checkout while the "git merge"
command is running.  Attempting to preventatively handle users
modifying the working tree simultaneously with concurrent git commands
like `git merge` seems like a losing proposition to me; it'd be a huge
can of worms and have a million holes.  I don't think that's worth it.

> >  }
> >
> >  /* This is called when no merge was necessary. */
>
> > diff --git a/t/t7607-merge-state.sh b/t/t7607-merge-state.sh
> > new file mode 100755
>
> As long as we are adding a brand-new script for new tests, probably we
> should add tests for other steps (like [4/6]) here, perhaps?

Yes.

>
> > index 00000000000..655478cd0b3
> > --- /dev/null
> > +++ b/t/t7607-merge-state.sh
> > @@ -0,0 +1,25 @@
> > +#!/bin/sh
> > +
> > +test_description="Test that merge state is as expected after failed merge"
> > +
> > +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> > +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
> > +. ./test-lib.sh
> > +
> > +test_expect_success 'set up custom strategy' '
> > +     test_commit --no-tag "Initial" base base &&
> > +git show-ref &&
>
> Is this part of the test, or a leftover debugging aid?

Looks like part of a leftover debugging aid; sorry about that.  Will clean up.

> > +
> > +     for b in branch1 branch2 branch3
> > +     do
> > +             git checkout -b $b main &&
> > +             test_commit --no-tag "Change on $b" base $b
> > +     done &&
> > +
> > +     git checkout branch1 &&
> > +     test_must_fail git merge branch2 branch3 &&
> > +     git diff --exit-code --name-status &&
> > +     test_path_is_missing .git/MERGE_HEAD
> > +'
>
> Hmph, I am not sure if the new behaviour is not too pessimistic.
> When octopus fails after successfully merging branch2 and then
> failing the merge of branch3 (i.e. the last one) due to conflict,

That's not what's happening here.  It is not failing due to a conflict;
octopus is reporting that it cannot even leave things in a conflicted
state for the user, and is actually incapable of handling this
particular type of merge.  Part of the output seen when attempting
this merge includes:
    fatal: merge program failed
    Should not be doing an octopus.

See previous discussion at
https://lore.kernel.org/git/xmqq35hdd205.fsf@gitster.g/.  To make this
clearer, perhaps I should use "test_expect_code 2" instead of
test_must_fail, and also grep the output/error for the above messages.

> I think octopus users are used to being able to resolve it manually
> and make a commit.  Are we making it impossible by doing the
> reset-restore dance here?

No, we are not changing what octopus can handle here.  The code above
is not triggered when octopus returns that it ran into conflicts for
the user to resolve.  It is only triggered when octopus says it cannot
handle the merge in question.  See your commit 98efc8f3d8 ("octopus:
allow manual resolve on the last round.", 2006-01-13), and note that
the "exit 2" code path is the one that we are hitting.  I'll add
some comments to the testcase and rewrite the commit message to try to
make this clearer.
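
The distinction drawn above can be pictured with a simplified shell model. This is only a sketch under assumptions: `try_strategy` and `handle_status` are hypothetical names, the mapping of status 1 to "conflicts left for the user" is an assumption about the strategy's exit codes, and the real dispatch lives in builtin/merge.c.

```shell
#!/bin/sh
# Hypothetical stand-in for running one merge strategy; it simply
# returns the status it is given.
try_strategy () {
	return "$1"
}

# Decide what to do with a strategy's exit status (simplified model).
handle_status () {
	try_strategy "$1"
	case $? in
	0) echo "clean merge" ;;
	1) echo "conflicts left for the user to resolve" ;;
	2) echo "strategy cannot handle this merge; restore state" ;;
	*) echo "unexpected status" ;;
	esac
}

conflict_action=$(handle_status 1)
unhandled_action=$(handle_status 2)
echo "$conflict_action"
echo "$unhandled_action"
```

Only the status 2 branch corresponds to the restore_state() path touched by this patch; the conflict branch is left alone.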

Patch

diff --git a/builtin/merge.c b/builtin/merge.c
index aaee8f6a553..a21dece1b55 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -385,11 +385,11 @@ static void restore_state(const struct object_id *head,
 {
 	const char *args[] = { "stash", "apply", "--index", NULL, NULL };
 
-	if (is_null_oid(stash))
-		return;
-
 	reset_hard(head, 1);
 
+	if (is_null_oid(stash))
+		goto refresh_cache;
+
 	args[3] = oid_to_hex(stash);
 
 	/*
@@ -398,7 +398,9 @@ static void restore_state(const struct object_id *head,
 	 */
 	run_command_v_opt(args, RUN_GIT_CMD);
 
-	refresh_cache(REFRESH_QUIET);
+refresh_cache:
+	if (discard_cache() < 0 || read_cache() < 0)
+		die(_("could not read index"));
 }
 
 /* This is called when no merge was necessary. */
diff --git a/t/t7607-merge-state.sh b/t/t7607-merge-state.sh
new file mode 100755
index 00000000000..655478cd0b3
--- /dev/null
+++ b/t/t7607-merge-state.sh
@@ -0,0 +1,25 @@ 
+#!/bin/sh
+
+test_description="Test that merge state is as expected after failed merge"
+
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
+. ./test-lib.sh
+
+test_expect_success 'set up custom strategy' '
+	test_commit --no-tag "Initial" base base &&
+git show-ref &&
+
+	for b in branch1 branch2 branch3
+	do
+		git checkout -b $b main &&
+		test_commit --no-tag "Change on $b" base $b
+	done &&
+
+	git checkout branch1 &&
+	test_must_fail git merge branch2 branch3 &&
+	git diff --exit-code --name-status &&
+	test_path_is_missing .git/MERGE_HEAD
+'
+
+test_done