diff mbox series

cherry-pick: refuse cherry-pick sequence if index is dirty

Message ID pull.1535.git.1684830767336.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series cherry-pick: refuse cherry-pick sequence if index is dirty | expand

Commit Message

Tao Klerks May 23, 2023, 8:32 a.m. UTC
From: Tao Klerks <tao@klerks.biz>

Cherry-pick, like merge or rebase, refuses to run when there are changes
in the index. However, if a cherry-pick sequence is requested, this
refusal happens "too late": when the cherry-pick sequence has already
started, and an "--abort" or "--quit" is needed to resume normal
operation.

Normally, when an operation is "in-progress" and you want to go back to
where you were before, "--abort" is the right thing to run. If you run
"git cherry-pick --abort" in this specific situation, however, your
staged changes are destroyed as part of the abort! Generally speaking,
the abort process assumes any changes in the index are part of the
operation to be aborted.

Add an earlier check in the cherry-pick sequence process to ensure that
the index is clean, reusing the already-generalized method used for
rebase. Also add a test.

Signed-off-by: Tao Klerks <tao@klerks.biz>
---
    cherry-pick: refuse cherry-pick sequence if index is dirty
    
    I encountered this data-loss bug (performing normally-safe operations
    results in loss of local staged changes) while exploring
    largely-unrelated changes as per thread
    CAPMMpogFHnX2YPA4VmffmA0pku=43CQJ8iebCOkFm4ravBVTeg@mail.gmail.com on
    "git checkout --force", and on the possibility of supporting same-commit
    switch while preserving merge metadata.
    
    I believe it is best handled as a standalone issue, it looks like a
    simple bugfix to me, hence this separate patch.

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1535%2FTaoK%2Ftao-cherry-pick-sequence-safety-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1535/TaoK/tao-cherry-pick-sequence-safety-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/1535

 builtin/revert.c                |  2 +-
 sequencer.c                     | 14 +++++++++-----
 sequencer.h                     |  3 ++-
 t/t3510-cherry-pick-sequence.sh | 10 ++++++++++
 4 files changed, 22 insertions(+), 7 deletions(-)


base-commit: 4a714b37029a4b63dbd22f7d7ed81f7a0d693680

Comments

Tao Klerks May 23, 2023, 4:01 p.m. UTC | #1
On Tue, May 23, 2023 at 10:32 AM Tao Klerks via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Tao Klerks <tao@klerks.biz>
>
> Cherry-pick, like merge or rebase, refuses to run when there are changes
> in the index. However, if a cherry-pick sequence is requested, this
> refusal happens "too late": when the cherry-pick sequence has already
> started, and an "--abort" or "--quit" is needed to resume normal
> operation.
>
> Normally, when an operation is "in-progress" and you want to go back to
> where you were before, "--abort" is the right thing to run. If you run
> "git cherry-pick --abort" in this specific situation, however, your
> staged changes are destroyed as part of the abort! Generally speaking,
> the abort process assumes any changes in the index are part of the
> operation to be aborted.
>
> Add an earlier check in the cherry-pick sequence process to ensure that
> the index is clean, reusing the already-generalized method used for
> rebase. Also add a test.
>
> Signed-off-by: Tao Klerks <tao@klerks.biz>
> ---

My apologies for the premature submission: I've now realized I used
the wrong existing check. "git rebase" checks for a clean *worktree*
(ignoring untracked files), and that is what I reused here. What git
merge and git cherry-pick check for, and what I should have added a
check for here, is a clean *index*.

The current implementation of this patch is far too restrictive. It
doesn't break any tests (and maybe I should add one now that I know),
but it's doing the wrong thing.

Tao
Junio C Hamano May 24, 2023, 12:06 a.m. UTC | #2
Tao Klerks <tao@klerks.biz> writes:

> The current implementation of this patch is far too restrictive. It
> doesn't break any tests (and maybe I should add one now that I know),
> but it's doing the wrong thing.

I am ambivalent.  What do we want to see in a multi-pick sequence
that is different from rebase?  A single-step cherry-pick can fail
safely before it touches the index or the working tree files, but if
two-step cherry-pick, whose first step succeeds, finds that it
cannot safely carry out its second step without clobbering the local
changes made to the working tree files, what should happen?  Are we
OK if we stopped in the state just after the first step has already
been done?

My (tentative) answer to that question is "yes", but the recovery
options of "cherry-pick" may want to work differently from what we
have seen them traditionally do.  Namely, the user accepts that the
first step is already done, and stopping "cherry-pick", be it called
"--abort" or something else, should just remove the sequencer state
and behave as if the single-pick cherry-pick on the first step only
has just finished and leave such a state in the index and the
working tree.  If that is what we are going to do, then it would
make sense to adopt the same safety semantics we use for "git merge"
and "git checkout" to ensure only that the index is clean, relying
on the unpack-trees machinery that stops before clobbering a locally
modified working tree files.  But if we are to aim for "all-or-none"
semantics people expect from aborting "git rebase", I suspect that
it would be way too complicated to allow random changes in the
working tree files that we may only discover to be problems after
starting the sequence of replaying commits one-by-one, and "too
restrictive" check may be justified.  To put it differently, if it
is too restrictive for multi-pick, then we would want to loosen it
for "git rebase" as well, as the issues are likely to be the same.

Thanks.
Tao Klerks May 24, 2023, 9:33 a.m. UTC | #3
On Wed, May 24, 2023 at 2:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Tao Klerks <tao@klerks.biz> writes:
>
> > The current implementation of this patch is far too restrictive. It
> > doesn't break any tests (and maybe I should add one now that I know),
> > but it's doing the wrong thing.
>
> I am ambivalent.  What do we want to see in a multi-pick sequence
> that is different from rebase?

I would argue there are primarily three things that are different:
1. The checkout of the new base (and checkout of the original in an "--abort")
2. The support for and/or more-common expectation of "messing" with
commits as you go, eg squash, edit
3. The (partial) support for rebasing/recreating merge commits

I'm not sure to what extent any of these justify having tighter
restrictions on when we allow a rebase to start, though.

> A single-step cherry-pick can fail
> safely before it touches the index or the working tree files, but if
> two-step cherry-pick, whose first step succeeds, finds that it
> cannot safely carry out its second step without clobbering the local
> changes made to the working tree files, what should happen?  Are we
> OK if we stopped in the state just after the first step has already
> been done?

This is the current behavior: it stops before the specific pick that
is going to affect local unstaged changes, or if there are *any*
staged changes (in which case it stops as it's about to do the first
pick - the first time this check runs). The reasoning for this
behavior, as I understand it, is that the "--abort" strategy,
intending to "undo whatever I started doing here, including a conflict
resolution", resets the index. So as long as there is nothing you want
to keep in the index, and as long as we know that any previous picks
haven't impacted any files with unstaged changes, we're good.

The bug that I want to fix is that we only end up checking whether
there are changes in the index *after* we've already committed to
resetting the index upon later "--abort". It's a kind of catch-22:
we've detected that aborting would destroy your work, so we leave you
in a state where the most obvious thing to do is abort, so we destroy
your work... Of course, if you understand what's going on you can
choose to "--quit" and *not* lose your work... but this is completely
antithetical to the general intent of "--abort".

There's another, smaller flaw here I think, common to Merge,
Single-Cherry-Pick, and Sequence-Cherry-Pick, which is that *if* you
start with unstaged changes, and you end up in a conflict resolution
or "--no-commit" pause, and you then "git add" your unstaged changes
during that pause/resolution, and you *then* later "--abort"... then
your originally-unstaged changes are destroyed by the "--abort" - so
it has *not* taken you back to where you were before the operation
started. This is, to me as a user, non-obvious, and could potentially
lead to data loss. The only way I see to fix that, is to have *all* of
these operations refuse to operate on dirty worktrees altogether -
like rebase already does.

I suspect this level of "strictness" would be welcome to newcomers,
and less welcome to existing experienced users.

>
> My (tentative) answer to that question is "yes", but the recovery
> options of "cherry-pick" may want to work differently from what we
> have seen them traditionally do.  Namely, the user accepts that the
> first step is already done, and stopping "cherry-pick", be it called
> "--abort" or something else, should just remove the sequencer state
> and behave as if the single-pick cherry-pick on the first step only
> has just finished and leave such a state in the index and the
> working tree.

This behavior exists, and is called "--quit", right?

The semantics as I understand it are:
--quit: I know what I'm doing, just remove any "ongoing operation"
metadata and let me work with the current index and worktree.
--abort: This was a bad idea, please take me back to where I was
before I started this operation (without losing any work I had
ongoing, pls!)

>  If that is what we are going to do, then it would
> make sense to adopt the same safety semantics we use for "git merge"
> and "git checkout" to ensure only that the index is clean, relying
> on the unpack-trees machinery that stops before clobbering a locally
> modified working tree files.

Yep

> But if we are to aim for "all-or-none"
> semantics people expect from aborting "git rebase", I suspect that
> it would be way too complicated to allow random changes in the
> working tree files that we may only discover to be problems after
> starting the sequence of replaying commits one-by-one, and "too
> restrictive" check may be justified.

I don't think I understand this argument. If we want to support both
sets of semantics, then that's exactly what "--quit" and "--abort"
achieve, right? (as long as we check for the dirty index *before*
committing to destroying the index in case of "--abort")

>  To put it differently, if it
> is too restrictive for multi-pick, then we would want to loosen it
> for "git rebase" as well, as the issues are likely to be the same.

My argument for only changing "sequence-cherry-pick" here, and having
it (continue to) use the index-safety-only semantics of
single-cherry-pick and merge, is that *this is not a change in
cabability* - it is only a bugfix. Switching to the worktree-safety
semantics of rebase would be a substantial change in behavior beyond
the bugfix.

I, personally, would prefer to see the worktree-safety semantics of
rebase be used in *all* these operations, so I could no longer shoot
myself in the foot by starting a merge, accidentally staging some
previously-unstaged changes during conflict resolution, and then
losing those changes by "--abort"ing. But I expect that this kind of
change would need to be behind a config option of some sort, trading
off safety against low friction.

I could imagine a setting like "core.OperationWorktreeSafety", with
settings "default" (current behavior - rebase disallows dirty
worktrees, the others disallow dirty index), "strict" (all behave like
current rebase) and "lax" (all behave like merge).

As discussed elsewhere, I would also like to (have an option to) treat
untracked files as "worktree dirtiness"/unstaged changes in exactly
the same way as changes to tracked files - but that's another topic :)

I'll prepare a v2 with index-safety-only for sequence-cherry-pick for
now, please let me know if a (better-named)
"core.OperationWorktreeSafety" option is something that you'd be
interested in / that would make sense to you.

Thanks!
Phillip Wood May 30, 2023, 1:01 p.m. UTC | #4
On 24/05/2023 10:33, Tao Klerks wrote:
> On Wed, May 24, 2023 at 2:06 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Tao Klerks <tao@klerks.biz> writes:
>>
>>> The current implementation of this patch is far too restrictive. It
>>> doesn't break any tests (and maybe I should add one now that I know),
>>> but it's doing the wrong thing.
>>
>> I am ambivalent.  What do we want to see in a multi-pick sequence
>> that is different from rebase?
> 
> I would argue there are primarily three things that are different:
> 1. The checkout of the new base (and checkout of the original in an "--abort")
> 2. The support for and/or more-common expectation of "messing" with
> commits as you go, eg squash, edit
> 3. The (partial) support for rebasing/recreating merge commits
> 
> I'm not sure to what extent any of these justify having tighter
> restrictions on when we allow a rebase to start, though.
> 
>> A single-step cherry-pick can fail
>> safely before it touches the index or the working tree files, but if
>> two-step cherry-pick, whose first step succeeds, finds that it
>> cannot safely carry out its second step without clobbering the local
>> changes made to the working tree files, what should happen?  Are we
>> OK if we stopped in the state just after the first step has already
>> been done?
> 
> This is the current behavior: it stops before the specific pick that
> is going to affect local unstaged changes, or if there are *any*
> staged changes (in which case it stops as it's about to do the first
> pick - the first time this check runs). The reasoning for this
> behavior, as I understand it, is that the "--abort" strategy,
> intending to "undo whatever I started doing here, including a conflict
> resolution", resets the index. So as long as there is nothing you want
> to keep in the index, and as long as we know that any previous picks
> haven't impacted any files with unstaged changes, we're good.

"cherry-pick --abort" uses "reset --merge" so is slightly more 
complicated than just resetting the index.

> The bug that I want to fix is that we only end up checking whether
> there are changes in the index *after* we've already committed to
> resetting the index upon later "--abort". It's a kind of catch-22:
> we've detected that aborting would destroy your work, so we leave you
> in a state where the most obvious thing to do is abort, so we destroy
> your work... Of course, if you understand what's going on you can
> choose to "--quit" and *not* lose your work... but this is completely
> antithetical to the general intent of "--abort".
> 
> There's another, smaller flaw here I think, common to Merge,
> Single-Cherry-Pick, and Sequence-Cherry-Pick, which is that *if* you
> start with unstaged changes, and you end up in a conflict resolution
> or "--no-commit" pause, and you then "git add" your unstaged changes
> during that pause/resolution, and you *then* later "--abort"... then
> your originally-unstaged changes are destroyed by the "--abort" - so
> it has *not* taken you back to where you were before the operation
> started. This is, to me as a user, non-obvious, and could potentially
> lead to data loss. The only way I see to fix that, is to have *all* of
> these operations refuse to operate on dirty worktrees altogether -
> like rebase already does.
> 
> I suspect this level of "strictness" would be welcome to newcomers,
> and less welcome to existing experienced users.

Indeed, being able to cherry-pick when the worktree is dirty is 
occasionally useful but it is confusing when "cherry-pick --abort" does 
not restore the pre-cherry-pick state (and as Junio says below it would 
be complicated to preserve the initial changes when aborting).

>> My (tentative) answer to that question is "yes", but the recovery
>> options of "cherry-pick" may want to work differently from what we
>> have seen them traditionally do.  Namely, the user accepts that the
>> first step is already done, and stopping "cherry-pick", be it called
>> "--abort" or something else, should just remove the sequencer state
>> and behave as if the single-pick cherry-pick on the first step only
>> has just finished and leave such a state in the index and the
>> working tree.
> 
> This behavior exists, and is called "--quit", right?

I think so though I'm not sure what what "--quit" does to CHERRY_PICK_HEAD.

> The semantics as I understand it are:
> --quit: I know what I'm doing, just remove any "ongoing operation"
> metadata and let me work with the current index and worktree.
> --abort: This was a bad idea, please take me back to where I was
> before I started this operation (without losing any work I had
> ongoing, pls!)
> 
>>   If that is what we are going to do, then it would
>> make sense to adopt the same safety semantics we use for "git merge"
>> and "git checkout" to ensure only that the index is clean, relying
>> on the unpack-trees machinery that stops before clobbering a locally
>> modified working tree files.
> 
> Yep
> 
>> But if we are to aim for "all-or-none"
>> semantics people expect from aborting "git rebase", I suspect that
>> it would be way too complicated to allow random changes in the
>> working tree files that we may only discover to be problems after
>> starting the sequence of replaying commits one-by-one, and "too
>> restrictive" check may be justified.
 >
> I don't think I understand this argument. If we want to support both
> sets of semantics, then that's exactly what "--quit" and "--abort"
> achieve, right? (as long as we check for the dirty index *before*
> committing to destroying the index in case of "--abort")
> 
>>   To put it differently, if it
>> is too restrictive for multi-pick, then we would want to loosen it
>> for "git rebase" as well, as the issues are likely to be the same.
> 
> My argument for only changing "sequence-cherry-pick" here, and having
> it (continue to) use the index-safety-only semantics of
> single-cherry-pick and merge, is that *this is not a change in
> cabability* - it is only a bugfix. Switching to the worktree-safety
> semantics of rebase would be a substantial change in behavior beyond
> the bugfix.
> 
> I, personally, would prefer to see the worktree-safety semantics of
> rebase be used in *all* these operations, so I could no longer shoot
> myself in the foot by starting a merge, accidentally staging some
> previously-unstaged changes during conflict resolution, and then
> losing those changes by "--abort"ing. But I expect that this kind of
> change would need to be behind a config option of some sort, trading
> off safety against low friction.
> 
> I could imagine a setting like "core.OperationWorktreeSafety", with
> settings "default" (current behavior - rebase disallows dirty
> worktrees, the others disallow dirty index), "strict" (all behave like
> current rebase) and "lax" (all behave like merge).
> 
> As discussed elsewhere, I would also like to (have an option to) treat
> untracked files as "worktree dirtiness"/unstaged changes in exactly
> the same way as changes to tracked files - but that's another topic :)
> 
> I'll prepare a v2 with index-safety-only for sequence-cherry-pick for
> now, please let me know if a (better-named)
> "core.OperationWorktreeSafety" option is something that you'd be
> interested in / that would make sense to you.

I think it's worth considering a config option. One problem though is 
that if the user has to actively enable it to get greater safety it wont 
be much help to new users who don't know about it and if it is on by 
default we'll be breaking someone's existing workflow.

Best Wishes

Phillip
> Thanks!
diff mbox series

Patch

diff --git a/builtin/revert.c b/builtin/revert.c
index 0240ec8593b..91ebba38eaa 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -224,7 +224,7 @@  static int run_sequencer(int argc, const char **argv, const char *prefix,
 		return sequencer_rollback(the_repository, opts);
 	if (cmd == 's')
 		return sequencer_skip(the_repository, opts);
-	return sequencer_pick_revisions(the_repository, opts);
+	return sequencer_pick_revisions(the_repository, opts, me);
 }
 
 int cmd_revert(int argc, const char **argv, const char *prefix)
diff --git a/sequencer.c b/sequencer.c
index b553b49fbb6..9abea3b97fc 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3162,7 +3162,7 @@  static int walk_revs_populate_todo(struct todo_list *todo_list,
 	return 0;
 }
 
-static int create_seq_dir(struct repository *r)
+static int create_seq_dir(struct repository *r, const char *requested_action)
 {
 	enum replay_action action;
 	const char *in_progress_error = NULL;
@@ -3194,6 +3194,9 @@  static int create_seq_dir(struct repository *r)
 				advise_skip ? "--skip | " : "");
 		return -1;
 	}
+	if (require_clean_work_tree(r, requested_action,
+				    _("Please commit or stash them."), 1, 1))
+		return -1;
 	if (mkdir(git_path_seq_dir(), 0777) < 0)
 		return error_errno(_("could not create sequencer directory '%s'"),
 				   git_path_seq_dir());
@@ -5169,7 +5172,8 @@  static int single_pick(struct repository *r,
 }
 
 int sequencer_pick_revisions(struct repository *r,
-			     struct replay_opts *opts)
+			     struct replay_opts *opts,
+			     const char *action)
 {
 	struct todo_list todo_list = TODO_LIST_INIT;
 	struct object_id oid;
@@ -5223,12 +5227,12 @@  int sequencer_pick_revisions(struct repository *r,
 
 	/*
 	 * Start a new cherry-pick/ revert sequence; but
-	 * first, make sure that an existing one isn't in
-	 * progress
+	 * first, make sure that the index is clean and that
+	 * an existing one isn't in progress.
 	 */
 
 	if (walk_revs_populate_todo(&todo_list, opts) ||
-			create_seq_dir(r) < 0)
+			create_seq_dir(r, action) < 0)
 		return -1;
 	if (repo_get_oid(r, "HEAD", &oid) && (opts->action == REPLAY_REVERT))
 		return error(_("can't revert as initial commit"));
diff --git a/sequencer.h b/sequencer.h
index 913a0f652d9..1b39325c52c 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -159,7 +159,8 @@  void todo_list_filter_update_refs(struct repository *r,
 /* Call this to setup defaults before parsing command line options */
 void sequencer_init_config(struct replay_opts *opts);
 int sequencer_pick_revisions(struct repository *repo,
-			     struct replay_opts *opts);
+			     struct replay_opts *opts,
+			     const char *action);
 int sequencer_continue(struct repository *repo, struct replay_opts *opts);
 int sequencer_rollback(struct repository *repo, struct replay_opts *opts);
 int sequencer_skip(struct repository *repo, struct replay_opts *opts);
diff --git a/t/t3510-cherry-pick-sequence.sh b/t/t3510-cherry-pick-sequence.sh
index 3b0fa66c33d..e8f4138bf89 100755
--- a/t/t3510-cherry-pick-sequence.sh
+++ b/t/t3510-cherry-pick-sequence.sh
@@ -47,6 +47,16 @@  test_expect_success 'cherry-pick persists data on failure' '
 	test_path_is_file .git/sequencer/opts
 '
 
+test_expect_success 'cherry-pick sequence refuses to run on dirty index' '
+	pristine_detach initial &&
+	touch localindexchange &&
+	git add localindexchange &&
+	echo picking &&
+	test_must_fail git cherry-pick initial..picked &&
+	test_path_is_missing .git/sequencer &&
+	test_must_fail git cherry-pick --abort
+'
+
 test_expect_success 'cherry-pick mid-cherry-pick-sequence' '
 	pristine_detach initial &&
 	test_must_fail git cherry-pick base..anotherpick &&