diff mbox series

[v2] sequencer: fix edit handling for cherry-pick and revert messages

Message ID pull.988.v2.git.git.1617070174458.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series [v2] sequencer: fix edit handling for cherry-pick and revert messages | expand

Commit Message

Elijah Newren March 30, 2021, 2:09 a.m. UTC
From: Elijah Newren <newren@gmail.com>

save_opts() should save any non-default values.  It was intended to do
this, but since most options in struct replay_opts default to 0, it only
saved non-zero values.  Unfortunately, this does not always work for
options.edit.  Roughly speaking, options.edit had a default value of 0
for cherry-pick but a default value of 1 for revert.  Make save_opts()
record a value whenever it differs from the default.

options.edit was also overly simplistic; we had more than two cases.
The behavior that previously existed was as follows:

                       Non-conflict commits    Right after Conflict
    revert             Edit iff isatty(0)      Edit (ignore isatty(0))
    cherry-pick        No edit                 See above
    Specify --edit     Edit (ignore isatty(0)) See above
    Specify --no-edit  (*)                     See above

    (*) Before stopping for conflicts, No edit is the behavior.  After
        stopping for conflicts, the --no-edit flag is not saved so see
        the first two rows.

However, the expected behavior is:

                       Non-conflict commits    Right after Conflict
    revert             Edit iff isatty(0)      Edit iff isatty(0)
    cherry-pick        No edit                 Edit iff isatty(0)
    Specify --edit     Edit (ignore isatty(0)) Edit (ignore isatty(0))
    Specify --no-edit  No edit                 No edit

In order to get the expected behavior, we need to change options.edit
to a tri-state: unspecified, false, or true.  When specified, we follow
what it says.  When unspecified, we need to check whether the current
commit being created is resolving a conflict as well as consulting
options.action and isatty(0).  While at it, add a should_edit() utility
function that compresses options.edit down to a boolean based on the
additional information for the non-conflict case.

continue_single_pick() is the function responsible for resuming after
conflict cases, regardless of whether there is one commit being picked
or many.  Make this function stop assuming edit behavior in all cases,
so that it can correctly handle !isatty(0) and specific requests to not
edit the commit message.

Reported-by: Renato Botelho <garga@freebsd.org>
Signed-off-by: Elijah Newren <newren@gmail.com>
---
    sequencer: fix edit handling for cherry-pick and revert messages
    
    save_opts() should save any non-default values. It was intended to do
    this, but since most options in struct replay_opts default to 0, it only
    saved non-zero values...
    
    Changes since v1:
    
     * fixed a few typos
     * Clarified that continue_single_pick() is used for resuming from
       conflict regardless of the number of commits being picked.
    
    Reported-by: Renato Botelho garga@freebsd.org Signed-off-by: Elijah
    Newren newren@gmail.com

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-988%2Fnewren%2Ffix-sequencer-no-edit-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-988/newren/fix-sequencer-no-edit-v2
Pull-Request: https://github.com/git/git/pull/988

Range-diff vs v1:

 1:  30cdc3ce215e ! 1:  376799bd3d99 sequencer: fix edit handling for cherry-pick and revert messages
     @@ Commit message
      
          save_opts() should save any non-default values.  It was intended to do
          this, but since most options in struct replay_opts default to 0, it only
     -    saved non-zero values.  Unfortunatley, this does not always work for
     +    saved non-zero values.  Unfortunately, this does not always work for
          options.edit.  Roughly speaking, options.edit had a default value of 0
          for cherry-pick but a default value of 1 for revert.  Make save_opts()
          record a value whenever it differs from the default.
     @@ Commit message
          function that compresses options.edit down to a boolean based on the
          additional information for the non-conflict case.
      
     -    Make continue_single_pick() (which is the function responsible for
     -    resuming after conflict cases) stop assuming edit behavior in all cases,
     +    continue_single_pick() is the function responsible for resuming after
     +    conflict cases, regardless of whether there is one commit being picked
     +    or many.  Make this function stop assuming edit behavior in all cases,
          so that it can correctly handle !isatty(0) and specific requests to not
          edit the commit message.
      
     @@ sequencer.c: static void record_in_rewritten(struct object_id *oid,
      +	assert(opts->edit >= -1 && opts->edit <= 1);
      +	if (opts->edit == -1)
      +		/*
     -+		 * Note the we only handle the case of non-conflicted
     ++		 * Note that we only handle the case of non-conflicted
      +		 * commits; continue_single_pick() handles the conflicted
      +		 * commits itself instead of calling this function.
      +		 */


 builtin/revert.c                |  4 +--
 sequencer.c                     | 55 ++++++++++++++++++++++++++-------
 sequencer.h                     |  6 ++--
 t/t3510-cherry-pick-sequence.sh | 32 ++++++++++++++++++-
 4 files changed, 80 insertions(+), 17 deletions(-)


base-commit: 98164e9585e02e31dcf1377a553efe076c15f8c6

Comments

Johannes Schindelin March 30, 2021, 10:13 a.m. UTC | #1
Hi Elijah,

On Tue, 30 Mar 2021, Elijah Newren via GitGitGadget wrote:

> From: Elijah Newren <newren@gmail.com>
>
> save_opts() should save any non-default values.  It was intended to do
> this, but since most options in struct replay_opts default to 0, it only
> saved non-zero values.  Unfortunately, this does not always work for
> options.edit.  Roughly speaking, options.edit had a default value of 0
> for cherry-pick but a default value of 1 for revert.  Make save_opts()
> record a value whenever it differs from the default.
>
> options.edit was also overly simplistic; we had more than two cases.
> The behavior that previously existed was as follows:
>
>                        Non-conflict commits    Right after Conflict
>     revert             Edit iff isatty(0)      Edit (ignore isatty(0))
>     cherry-pick        No edit                 See above
>     Specify --edit     Edit (ignore isatty(0)) See above
>     Specify --no-edit  (*)                     See above
>
>     (*) Before stopping for conflicts, No edit is the behavior.  After
>         stopping for conflicts, the --no-edit flag is not saved so see
>         the first two rows.
>
> However, the expected behavior is:
>
>                        Non-conflict commits    Right after Conflict
>     revert             Edit iff isatty(0)      Edit iff isatty(0)
>     cherry-pick        No edit                 Edit iff isatty(0)
>     Specify --edit     Edit (ignore isatty(0)) Edit (ignore isatty(0))
>     Specify --no-edit  No edit                 No edit
>
> In order to get the expected behavior, we need to change options.edit
> to a tri-state: unspecified, false, or true.  When specified, we follow
> what it says.  When unspecified, we need to check whether the current
> commit being created is resolving a conflict as well as consulting
> options.action and isatty(0).  While at it, add a should_edit() utility
> function that compresses options.edit down to a boolean based on the
> additional information for the non-conflict case.
>
> continue_single_pick() is the function responsible for resuming after
> conflict cases, regardless of whether there is one commit being picked
> or many.  Make this function stop assuming edit behavior in all cases,
> so that it can correctly handle !isatty(0) and specific requests to not
> edit the commit message.

Nicely explained!

I'll allow myself one tangent: the subject of the sequencer's Unix shell
script heritage seems to come up with an increasing frequency, in
particular the awful "let's write out one file per setting" strategy.

I would _love_ for `save_opts()` to write a JSON instead (or an INI via
the `git_config_*()` family of functions, as is done already by the
cherry-pick/revert stuff), now that we no longer have any shell script
backend (apart from `--preserve-merges`, but that one is on its way out
anyway).

The one thing that concerns me with this idea is that I know for a fact
that some enterprisey users play games with those files inside
`<GIT_DIR>/rebase-merge` that should be considered internal implementation
details. Not sure how to deprecate that properly, I don't think we have a
sane way to detect whether users rely on these implementation details
other than breaking their expectations, which is not really a gentle way
to ask them to update their scripts.

> diff --git a/builtin/revert.c b/builtin/revert.c
> index 314a86c5621b..81441020231a 100644
> --- a/builtin/revert.c
> +++ b/builtin/revert.c
> @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
>  				"--signoff", opts->signoff,
>  				"--no-commit", opts->no_commit,
>  				"-x", opts->record_origin,
> -				"--edit", opts->edit,
> +				"--edit", opts->edit == 1,

Honestly, I'd prefer `> 0` here.

>  				NULL);
>
>  	if (cmd) {
> @@ -230,8 +230,6 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
>  	struct replay_opts opts = REPLAY_OPTS_INIT;
>  	int res;
>
> -	if (isatty(0))
> -		opts.edit = 1;
>  	opts.action = REPLAY_REVERT;
>  	sequencer_init_config(&opts);
>  	res = run_sequencer(argc, argv, &opts);
> diff --git a/sequencer.c b/sequencer.c
> index 848204d3dc3f..d444c778a097 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -1860,14 +1860,26 @@ static void record_in_rewritten(struct object_id *oid,
>  		flush_rewritten_pending();
>  }
>
> +static int should_edit(struct replay_opts *opts) {
> +	assert(opts->edit >= -1 && opts->edit <= 1);

Do we really want to introduce more of these useless `assert()`s? I know
that we stopped converting them to `BUG()`, but I really dislike
introducing new ones: they have very little effect, being no-ops by
default in most setups.

> +	if (opts->edit == -1)

Maybe `< 0`, as we do elsewhere for "not specified"?

> +		/*
> +		 * Note that we only handle the case of non-conflicted
> +		 * commits; continue_single_pick() handles the conflicted
> +		 * commits itself instead of calling this function.
> +		 */
> +		return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0;

Apart from the extra parentheses, that makes sense to me.

> +	return opts->edit;
> +}
> +
>  static int do_pick_commit(struct repository *r,
>  			  enum todo_command command,
>  			  struct commit *commit,
>  			  struct replay_opts *opts,
>  			  int final_fixup, int *check_todo)
>  {
> -	unsigned int flags = opts->edit ? EDIT_MSG : 0;
> -	const char *msg_file = opts->edit ? NULL : git_path_merge_msg(r);
> +	unsigned int flags = should_edit(opts) ? EDIT_MSG : 0;
> +	const char *msg_file = should_edit(opts) ? NULL : git_path_merge_msg(r);
>  	struct object_id head;
>  	struct commit *base, *next, *parent;
>  	const char *base_label, *next_label;
> @@ -3101,9 +3113,9 @@ static int save_opts(struct replay_opts *opts)
>  	if (opts->no_commit)
>  		res |= git_config_set_in_file_gently(opts_file,
>  					"options.no-commit", "true");
> -	if (opts->edit)
> -		res |= git_config_set_in_file_gently(opts_file,
> -					"options.edit", "true");
> +	if (opts->edit != -1)

s/!= -1/>= 0/

> +		res |= git_config_set_in_file_gently(opts_file, "options.edit",
> +						     opts->edit ? "true" : "false");
>  	if (opts->allow_empty)
>  		res |= git_config_set_in_file_gently(opts_file,
>  					"options.allow-empty", "true");
> @@ -4077,7 +4089,7 @@ static int pick_commits(struct repository *r,
>  	prev_reflog_action = xstrdup(getenv(GIT_REFLOG_ACTION));
>  	if (opts->allow_ff)
>  		assert(!(opts->signoff || opts->no_commit ||
> -			 opts->record_origin || opts->edit ||
> +			 opts->record_origin || should_edit(opts) ||
>  			 opts->committer_date_is_author_date ||
>  			 opts->ignore_date));
>  	if (read_and_refresh_cache(r, opts))
> @@ -4370,14 +4382,35 @@ static int pick_commits(struct repository *r,
>  	return sequencer_remove_state(opts);
>  }
>
> -static int continue_single_pick(struct repository *r)
> +static int continue_single_pick(struct repository *r, struct replay_opts *opts)
>  {
> -	const char *argv[] = { "commit", NULL };
> +	struct strvec argv = STRVEC_INIT;
> +	int want_edit;

Do we really want that extra `want_edit` variable? I think the code would
be easier to read without it, and still be obvious enough.

> +	int ret;
>
>  	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
>  	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
>  		return error(_("no cherry-pick or revert in progress"));
> -	return run_command_v_opt(argv, RUN_GIT_CMD);
> +
> +	strvec_push(&argv, "commit");
> +
> +	/*
> +	 * continue_single_pick() handles the case of recovering from a
> +	 * conflict.  should_edit() doesn't handle that case; for a conflict,
> +	 * we want to edit if the user asked for it, or if they didn't specify
> +	 * and stdin is a tty.
> +	 */
> +	want_edit = (opts->edit == 1) || ((opts->edit == -1) && isatty(0));
> +	if (!want_edit)

Here is what I would prefer:

	if (!opts->edit || (opts->edit < 0 && !isatty(0)))

The rest looks good, and the comments are _really_ helpful.

And the remainder of the patch also looks good, so I will spare readers
time by not even quoting it.

Thank you!
Dscho
Junio C Hamano March 30, 2021, 6:47 p.m. UTC | #2
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
>>  				"--signoff", opts->signoff,
>>  				"--no-commit", opts->no_commit,
>>  				"-x", opts->record_origin,
>> -				"--edit", opts->edit,
>> +				"--edit", opts->edit == 1,
>
> Honestly, I'd prefer `> 0` here.

Unless somebody (including Elijah) is trying to soon introduce yet
another value to .edit member, I'd agree 100%.  If it is a tristate
(unspecified, no, yes), I think "is it positive" should be the way
to ask "does the user definitely wants it?", "is it zero" should be
the way to ask "does the user definitely declines it?" and "is it
non-negative" (and "is it negative") the way to ask "does the user
care (or not care)?".  Using that consistently is good.

>> +static int should_edit(struct replay_opts *opts) {
>> +	assert(opts->edit >= -1 && opts->edit <= 1);
>
> Do we really want to introduce more of these useless `assert()`s? I know
> that we stopped converting them to `BUG()`, but I really dislike
> introducing new ones: they have very little effect, being no-ops by
> default in most setups.

Yeah, in a new code in flux where programmers can easily make
errors, "if (...) BUG()" may not be a bad thing to add (but then we
may want to see if we can make the codepaths involved less error
prone), but I agree with your view that assert() is mostly useless.
A comment that explains the expectation and why that expectation is
there would be more useful.


>> +	if (opts->edit == -1)
>
> Maybe `< 0`, as we do elsewhere for "not specified"?

Yup.

>> +		/*
>> +		 * Note that we only handle the case of non-conflicted
>> +		 * commits; continue_single_pick() handles the conflicted
>> +		 * commits itself instead of calling this function.
>> +		 */
>> +		return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0;
>
> Apart from the extra parentheses, that makes sense to me.

I can take it either way (but personally I think this particular one
is easier to see as written---this is subjective).

> ...
> The rest looks good, and the comments are _really_ helpful.

Yup, I agree.

Thanks for a review.
Elijah Newren March 30, 2021, 7:37 p.m. UTC | #3
On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> Hi Elijah,
>
> On Tue, 30 Mar 2021, Elijah Newren via GitGitGadget wrote:
>
> > From: Elijah Newren <newren@gmail.com>
> >
> > save_opts() should save any non-default values.  It was intended to do
> > this, but since most options in struct replay_opts default to 0, it only
> > saved non-zero values.  Unfortunately, this does not always work for
> > options.edit.  Roughly speaking, options.edit had a default value of 0
> > for cherry-pick but a default value of 1 for revert.  Make save_opts()
> > record a value whenever it differs from the default.
> >
> > options.edit was also overly simplistic; we had more than two cases.
> > The behavior that previously existed was as follows:
> >
> >                        Non-conflict commits    Right after Conflict
> >     revert             Edit iff isatty(0)      Edit (ignore isatty(0))
> >     cherry-pick        No edit                 See above
> >     Specify --edit     Edit (ignore isatty(0)) See above
> >     Specify --no-edit  (*)                     See above
> >
> >     (*) Before stopping for conflicts, No edit is the behavior.  After
> >         stopping for conflicts, the --no-edit flag is not saved so see
> >         the first two rows.
> >
> > However, the expected behavior is:
> >
> >                        Non-conflict commits    Right after Conflict
> >     revert             Edit iff isatty(0)      Edit iff isatty(0)
> >     cherry-pick        No edit                 Edit iff isatty(0)
> >     Specify --edit     Edit (ignore isatty(0)) Edit (ignore isatty(0))
> >     Specify --no-edit  No edit                 No edit
> >
> > In order to get the expected behavior, we need to change options.edit
> > to a tri-state: unspecified, false, or true.  When specified, we follow
> > what it says.  When unspecified, we need to check whether the current
> > commit being created is resolving a conflict as well as consulting
> > options.action and isatty(0).  While at it, add a should_edit() utility
> > function that compresses options.edit down to a boolean based on the
> > additional information for the non-conflict case.
> >
> > continue_single_pick() is the function responsible for resuming after
> > conflict cases, regardless of whether there is one commit being picked
> > or many.  Make this function stop assuming edit behavior in all cases,
> > so that it can correctly handle !isatty(0) and specific requests to not
> > edit the commit message.
>
> Nicely explained!
>
> I'll allow myself one tangent: the subject of the sequencer's Unix shell
> script heritage seems to come up with an increasing frequency, in
> particular the awful "let's write out one file per setting" strategy.
>
> I would _love_ for `save_opts()` to write a JSON instead (or an INI via
> the `git_config_*()` family of functions, as is done already by the
> cherry-pick/revert stuff), now that we no longer have any shell script
> backend (apart from `--preserve-merges`, but that one is on its way out
> anyway).
>
> The one thing that concerns me with this idea is that I know for a fact
> that some enterprisey users play games with those files inside
> `<GIT_DIR>/rebase-merge` that should be considered internal implementation
> details. Not sure how to deprecate that properly, I don't think we have a
> sane way to detect whether users rely on these implementation details
> other than breaking their expectations, which is not really a gentle way
> to ask them to update their scripts.

Ooh, I'm glad you took this tangent.  May I follow it for a second?
I'd really, really like this too, for three reasons:

1) I constantly get confused about the massive duplication and
difference in control structures and which ones affect which type of
operations in sequencer.c.  Having them both use an ini-file would
allow us to remove lots of that duplication.  I'm sure there'll still
be some rebase-specific or cherry-pick-specific options, but we don't
need a separate parallel structure for every piece of config.

2) In my merge-ort optimization work, rebasing 35 commits in linux.git
across a massive set of 26K upstream renames has dropped rename
detection time from the top spot.  And it also dropped several other
things in the merge machinery from their spots as well.  Do you know
what's the slowest now?  Wasted time from sequencer.c: the unnecessary
process forking and all the useless disk writing (which I suspect is
mostly updating the index and working directory but also writing all
the individual control files under .git/rebase-merge/).  And this
stuff from sequencer.c is not just barely the slowest part, it's the
slowest by a wide margin.

3) I also want to allow cherry-picking or rebasing branches that
aren't even checked out (assuming no conflicts are triggered;
otherwise an error can be shown with the user asked to repeat with the
operation connected to an active working directory).  For such an
operation, the difference between "cherry-pick" and "rebase" is nearly
irrelevant so you'd expect the code to be the same; every time I look
at the code, though, it seems that the control structures are
separating these two types of operations in more areas than just the
reading and writing of the config.

> > diff --git a/builtin/revert.c b/builtin/revert.c
> > index 314a86c5621b..81441020231a 100644
> > --- a/builtin/revert.c
> > +++ b/builtin/revert.c
> > @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
> >                               "--signoff", opts->signoff,
> >                               "--no-commit", opts->no_commit,
> >                               "-x", opts->record_origin,
> > -                             "--edit", opts->edit,
> > +                             "--edit", opts->edit == 1,
>
> Honestly, I'd prefer `> 0` here.

WFM; I'll change it.

>
> >                               NULL);
> >
> >       if (cmd) {
> > @@ -230,8 +230,6 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
> >       struct replay_opts opts = REPLAY_OPTS_INIT;
> >       int res;
> >
> > -     if (isatty(0))
> > -             opts.edit = 1;
> >       opts.action = REPLAY_REVERT;
> >       sequencer_init_config(&opts);
> >       res = run_sequencer(argc, argv, &opts);
> > diff --git a/sequencer.c b/sequencer.c
> > index 848204d3dc3f..d444c778a097 100644
> > --- a/sequencer.c
> > +++ b/sequencer.c
> > @@ -1860,14 +1860,26 @@ static void record_in_rewritten(struct object_id *oid,
> >               flush_rewritten_pending();
> >  }
> >
> > +static int should_edit(struct replay_opts *opts) {
> > +     assert(opts->edit >= -1 && opts->edit <= 1);
>
> Do we really want to introduce more of these useless `assert()`s? I know
> that we stopped converting them to `BUG()`, but I really dislike
> introducing new ones: they have very little effect, being no-ops by
> default in most setups.

What do you mean by "useless"?  They serve two purposes:
  * It's a very concise comment understood by readers of the code
  * Many asserts are helpful guardrails for future people attempting
to change the code and assumptions they were written under

Neither of these have anything to do with users running code in
production -- if that mattered, I would have used BUG().  Since
production use isn't relevant here, I could use code comments, but
those tend to be much more verbose, and less helpful in catching areas
where assumptions were important to the code for any future folks
(possibly myself in a few years) who want to change the code in ways
that might call into question previous assumptions.

> > +     if (opts->edit == -1)
>
> Maybe `< 0`, as we do elsewhere for "not specified"?

Makes sense; will change.

> > +             /*
> > +              * Note that we only handle the case of non-conflicted
> > +              * commits; continue_single_pick() handles the conflicted
> > +              * commits itself instead of calling this function.
> > +              */
> > +             return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0;
>
> Apart from the extra parentheses, that makes sense to me.

The extra parentheses make it more readable to me; since the statement
still makes sense to you, I'll leave them in.  :-)

> > +     return opts->edit;
> > +}
> > +
> >  static int do_pick_commit(struct repository *r,
> >                         enum todo_command command,
> >                         struct commit *commit,
> >                         struct replay_opts *opts,
> >                         int final_fixup, int *check_todo)
> >  {
> > -     unsigned int flags = opts->edit ? EDIT_MSG : 0;
> > -     const char *msg_file = opts->edit ? NULL : git_path_merge_msg(r);
> > +     unsigned int flags = should_edit(opts) ? EDIT_MSG : 0;
> > +     const char *msg_file = should_edit(opts) ? NULL : git_path_merge_msg(r);
> >       struct object_id head;
> >       struct commit *base, *next, *parent;
> >       const char *base_label, *next_label;
> > @@ -3101,9 +3113,9 @@ static int save_opts(struct replay_opts *opts)
> >       if (opts->no_commit)
> >               res |= git_config_set_in_file_gently(opts_file,
> >                                       "options.no-commit", "true");
> > -     if (opts->edit)
> > -             res |= git_config_set_in_file_gently(opts_file,
> > -                                     "options.edit", "true");
> > +     if (opts->edit != -1)
>
> s/!= -1/>= 0/

Will fix.

> > +             res |= git_config_set_in_file_gently(opts_file, "options.edit",
> > +                                                  opts->edit ? "true" : "false");
> >       if (opts->allow_empty)
> >               res |= git_config_set_in_file_gently(opts_file,
> >                                       "options.allow-empty", "true");
> > @@ -4077,7 +4089,7 @@ static int pick_commits(struct repository *r,
> >       prev_reflog_action = xstrdup(getenv(GIT_REFLOG_ACTION));
> >       if (opts->allow_ff)
> >               assert(!(opts->signoff || opts->no_commit ||
> > -                      opts->record_origin || opts->edit ||
> > +                      opts->record_origin || should_edit(opts) ||
> >                        opts->committer_date_is_author_date ||
> >                        opts->ignore_date));
> >       if (read_and_refresh_cache(r, opts))
> > @@ -4370,14 +4382,35 @@ static int pick_commits(struct repository *r,
> >       return sequencer_remove_state(opts);
> >  }
> >
> > -static int continue_single_pick(struct repository *r)
> > +static int continue_single_pick(struct repository *r, struct replay_opts *opts)
> >  {
> > -     const char *argv[] = { "commit", NULL };
> > +     struct strvec argv = STRVEC_INIT;
> > +     int want_edit;
>
> Do we really want that extra `want_edit` variable? I think the code would
> be easier to read without it, and still be obvious enough.
>
> > +     int ret;
> >
> >       if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
> >           !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
> >               return error(_("no cherry-pick or revert in progress"));
> > -     return run_command_v_opt(argv, RUN_GIT_CMD);
> > +
> > +     strvec_push(&argv, "commit");
> > +
> > +     /*
> > +      * continue_single_pick() handles the case of recovering from a
> > +      * conflict.  should_edit() doesn't handle that case; for a conflict,
> > +      * we want to edit if the user asked for it, or if they didn't specify
> > +      * and stdin is a tty.
> > +      */
> > +     want_edit = (opts->edit == 1) || ((opts->edit == -1) && isatty(0));
> > +     if (!want_edit)
>
> Here is what I would prefer:
>
>         if (!opts->edit || (opts->edit < 0 && !isatty(0)))

Given the changes elsewhere to use >0 as true, <0 as unspecified, and
==0 for false, this change makes perfect sense.  I'll include it.

> The rest looks good, and the comments are _really_ helpful.
>
> And the remainder of the patch also looks good, so I will spare readers
> time by not even quoting it.
>
> Thank you!
> Dscho
Elijah Newren March 30, 2021, 8:16 p.m. UTC | #4
On Tue, Mar 30, 2021 at 11:47 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
> >> @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
> >>                              "--signoff", opts->signoff,
> >>                              "--no-commit", opts->no_commit,
> >>                              "-x", opts->record_origin,
> >> -                            "--edit", opts->edit,
> >> +                            "--edit", opts->edit == 1,
> >
> > Honestly, I'd prefer `> 0` here.
>
> Unless somebody (including Elijah) is trying to soon introduce yet
> another value to .edit member, I'd agree 100%.  If it is a tristate
> (unspecified, no, yes), I think "is it positive" should be the way
> to ask "does the user definitely wants it?", "is it zero" should be
> the way to ask "does the user definitely declines it?" and "is it
> non-negative" (and "is it negative") the way to ask "does the user
> care (or not care)?".  Using that consistently is good.

Sounds good; I'll switch it over.

> >> +static int should_edit(struct replay_opts *opts) {
> >> +    assert(opts->edit >= -1 && opts->edit <= 1);
> >
> > Do we really want to introduce more of these useless `assert()`s? I know
> > that we stopped converting them to `BUG()`, but I really dislike
> > introducing new ones: they have very little effect, being no-ops by
> > default in most setups.
>
> Yeah, in a new code in flux where programmers can easily make
> errors, "if (...) BUG()" may not be a bad thing to add (but then we
> may want to see if we can make the codepaths involved less error
> prone), but I agree with your view that assert() is mostly useless.
> A comment that explains the expectation and why that expectation is
> there would be more useful.

Since you both don't like this assert, I'll remove it.  But I strongly
disagree that assert is useless in general.  If you two have such a
strong reaction to assert statements, though, would you two prefer
that I add a new affirm() function that is ALSO compiled out in
production?  Because I really want to use one of those.  My operating
assumptions with asserts are the following:

1) If the check is relevant for production, assert() statements should
NOT be used; if(...) BUG() should be used instead.
2) assert statements will be compiled out in production, almost always
  2a) NOTE: don't make asserts expensive, since a few production users
will keep them
3) assert statements will be active for future me and some other folks
doing active git development

Do you two disagree with any of those operating assumptions?  I find
asserts very valuable because:

* It's a _concise_ code comment that is readily understood.  Any
attempt to word in English the same thing that an assert statement
checks always takes longer
* It helps future folks tweaking the rules to catch additional
locations where assumptions were made about the old rules.  In the
development of merge-ort, for example, asserts shortened my debugging
cycles as I found and attempted new optimizations or added new
features or changed data structures and so on.  The checks were _only_
assists while developing; once the code is right, the checks could be
removed.  But future development might occur, so it'd be nice to have
a way to keep the checks in the code just for those future developers
while production users remove them.

In particular, for merge-ort, I think the second point is very
helpful.  What can achieve the "remove these now-unnecessary checks
from the code for production, but keep them there for future
development"?  I thought assert() was created exactly for this
purpose.  Would you rather I created an affirm() that does essentially
the same thing and is compiled out unless DEVELOPER=1?  That would
allow us to declare all assert() calls in the code as buggy, but I'm
not sure affirm() is as readily understood by developers reading the
code as "ooh, a reminder I get to assume these statements are true
while I'm reading the rest of the code".

> >> +    if (opts->edit == -1)
> >
> > Maybe `< 0`, as we do elsewhere for "not specified"?
>
> Yup.
>
> >> +            /*
> >> +             * Note that we only handle the case of non-conflicted
> >> +             * commits; continue_single_pick() handles the conflicted
> >> +             * commits itself instead of calling this function.
> >> +             */
> >> +            return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0;
> >
> > Apart from the extra parentheses, that makes sense to me.
>
> I can take it either way (but personally I think this particular one
> is easier to see as written---this is subjective).
>
> > ...
> > The rest looks good, and the comments are _really_ helpful.
>
> Yup, I agree.
>
> Thanks for a review.

Indeed; and thanks to you as well Junio for all your time reviewing.
Johannes Schindelin March 31, 2021, 1:48 p.m. UTC | #5
Hi,

On Tue, 30 Mar 2021, Elijah Newren wrote:

> On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
>
> > I'll allow myself one tangent: the subject of the sequencer's Unix
> > shell script heritage seems to come up with an increasing frequency,
> > in particular the awful "let's write out one file per setting"
> > strategy.
> >
> > I would _love_ for `save_opts()` to write a JSON instead (or an INI
> > via the `git_config_*()` family of functions, as is done already by
> > the cherry-pick/revert stuff), now that we no longer have any shell
> > script backend (apart from `--preserve-merges`, but that one is on its
> > way out anyway).
> >
> > The one thing that concerns me with this idea is that I know for a
> > fact that some enterprisey users play games with those files inside
> > `<GIT_DIR>/rebase-merge` that should be considered internal
> > implementation details. Not sure how to deprecate that properly, I
> > don't think we have a sane way to detect whether users rely on these
> > implementation details other than breaking their expectations, which
> > is not really a gentle way to ask them to update their scripts.
>
> Ooh, I'm glad you took this tangent.  May I follow it for a second?
> I'd really, really like this too, for three reasons:
>
> 1) I constantly get confused about the massive duplication and
> difference in control structures and which ones affect which type of
> operations in sequencer.c.  Having them both use an ini-file would
> allow us to remove lots of that duplication.  I'm sure there'll still
> be some rebase-specific or cherry-pick-specific options, but we don't
> need a separate parallel structure for every piece of config.
>
> 2) In my merge-ort optimization work, rebasing 35 commits in linux.git
> across a massive set of 26K upstream renames has dropped rename
> detection time from the top spot.  And it also dropped several other
> things in the merge machinery from their spots as well.  Do you know
> what's the slowest now?  Wasted time from sequencer.c: the unnecessary
> process forking and all the useless disk writing (which I suspect is
> mostly updating the index and working directory but also writing all
> the individual control files under .git/rebase-merge/).  And this
> stuff from sequencer.c is not just barely the slowest part, it's the
> slowest by a wide margin.
>
> 3) I also want to allow cherry-picking or rebasing branches that
> aren't even checked out (assuming no conflicts are triggered;
> otherwise an error can be shown with the user asked to repeat with the
> operation connected to an active working directory).  For such an
> operation, the difference between "cherry-pick" and "rebase" is nearly
> irrelevant so you'd expect the code to be the same; every time I look
> at the code, though, it seems that the control structures are
> separating these two types of operations in more areas than just the
> reading and writing of the config.

Excellent, we're in agreement, then.

The remaining question is: how do we want to go about it? Do we just want
to codify the notion that these are internal details that are nobody's
business, and if they are using the exact file system layout of the
current sequencer, then it's their responsibility to adapt?

Ciao,
Dscho
Ævar Arnfjörð Bjarmason March 31, 2021, 5:36 p.m. UTC | #6
On Tue, Mar 30 2021, Elijah Newren wrote:

> On Tue, Mar 30, 2021 at 11:47 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>>
>> >> @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
>> >>                              "--signoff", opts->signoff,
>> >>                              "--no-commit", opts->no_commit,
>> >>                              "-x", opts->record_origin,
>> >> -                            "--edit", opts->edit,
>> >> +                            "--edit", opts->edit == 1,
>> >
>> > Honestly, I'd prefer `> 0` here.
>>
>> Unless somebody (including Elijah) is trying to soon introduce yet
>> another value to .edit member, I'd agree 100%.  If it is a tristate
>> (unspecified, no, yes), I think "is it positive" should be the way
>> to ask "does the user definitely wants it?", "is it zero" should be
>> the way to ask "does the user definitely declines it?" and "is it
>> non-negative" (and "is it negative") the way to ask "does the user
>> care (or not care)?".  Using that consistently is good.
>
> Sounds good; I'll switch it over.
>
>> >> +static int should_edit(struct replay_opts *opts) {
>> >> +    assert(opts->edit >= -1 && opts->edit <= 1);
>> >
>> > Do we really want to introduce more of these useless `assert()`s? I know
>> > that we stopped converting them to `BUG()`, but I really dislike
>> > introducing new ones: they have very little effect, being no-ops by
>> > default in most setups.
>>
>> Yeah, in a new code in flux where programmers can easily make
>> errors, "if (...) BUG()" may not be a bad thing to add (but then we
>> may want to see if we can make the codepaths involved less error
>> prone), but I agree with your view that assert() is mostly useless.
>> A comment that explains the expectation and why that expectation is
>> there would be more useful.
>
> Since you both don't like this assert, I'll remove it.  But I strongly
> disagree that assert is useless in general.  If you two have such a
> strong reaction to assert statements, though, would you two prefer
> that I add a new affirm() function that is ALSO compiled out in
> production?  Because I really want to use one of those.  My operating
> assumptions with asserts are the following:
>
> 1) If the check is relevant for production, assert() statements should
> NOT be used; if(...) BUG() should be used instead.
> 2) assert statements will be compiled out in production, almost always
>   2a) NOTE: don't make asserts expensive, since a few production users
> will keep them
> 3) assert statements will be active for future me and some other folks
> doing active git development
>
> Do you two disagree with any of those operating assumptions?  I find
> asserts very valuable because:
>
> * It's a _concise_ code comment that is readily understood.  Any
> attempt to word in English the same thing that an assert statement
> checks always takes longer
> * It helps future folks tweaking the rules to catch additional
> locations where assumptions were made about the old rules.  In the
> development of merge-ort, for example, asserts shortened my debugging
> cycles as I found and attempted new optimizations or added new
> features or changed data structures and so on.  The checks were _only_
> assists while developing; once the code is right, the checks could be
> removed.  But future development might occur, so it'd be nice to have
> a way to keep the checks in the code just for those future developers
> while production users remove them.
>
> In particular, for merge-ort, I think the second point is very
> helpful.  What can achieve the "remove these now-unnecessary checks
> from the code for production, but keep them there for future
> development"?  I thought assert() was created exactly for this
> purpose.  Would you rather I created an affirm() that does essentially
> the same thing and is compiled out unless DEVELOPER=1?  That would
> allow us to declare all assert() calls in the code as buggy, but I'm
> not sure affirm() is as readily understood by developers reading the
> code as "ooh, a reminder I get to assume these statements are true
> while I'm reading the rest of the code".

I don't mind the asserts, or to have them in the default build.

But if you'd like to submit patches for asserts and can't otherwise get
them accepted, then can we please not make DEVELOPER a thing that you
can't turn on in production without thinking twice? Per my
https://lore.kernel.org/git/87wnusj6gt.fsf@evledraar.gmail.com/

>> >> +    if (opts->edit == -1)
>> >
>> > Maybe `< 0`, as we do elsewhere for "not specified"?
>>
>> Yup.
>>
>> >> +            /*
>> >> +             * Note that we only handle the case of non-conflicted
>> >> +             * commits; continue_single_pick() handles the conflicted
>> >> +             * commits itself instead of calling this function.
>> >> +             */
>> >> +            return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0;
>> >
>> > Apart from the extra parentheses, that makes sense to me.
>>
>> I can take it either way (but personally I think this particular one
>> is easier to see as written---this is subjective).
>>
>> > ...
>> > The rest looks good, and the comments are _really_ helpful.
>>
>> Yup, I agree.
>>
>> Thanks for a review.
>
> Indeed; and thanks to you as well Junio for all your time reviewing.
Elijah Newren March 31, 2021, 5:52 p.m. UTC | #7
On Wed, Mar 31, 2021 at 10:36 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
>
> On Tue, Mar 30 2021, Elijah Newren wrote:
>
> > In particular, for merge-ort, I think the second point is very
> > helpful.  What can achieve the "remove these now-unnecessary checks
> > from the code for production, but keep them there for future
> > development"?  I thought assert() was created exactly for this
> > purpose.  Would you rather I created an affirm() that does essentially
> > the same thing and is compiled out unless DEVELOPER=1?  That would
> > allow us to declare all assert() calls in the code as buggy, but I'm
> > not sure affirm() is as readily understood by developers reading the
> > code as "ooh, a reminder I get to assume these statements are true
> > while I'm reading the rest of the code".
>
> I don't mind the asserts, or to have them in the default build.
>
> But if you'd like to submit patches for asserts and can't otherwise get
> them accepted, then can we please not make DEVELOPER a thing that you
> can't turn on in production without thinking twice? Per my
> https://lore.kernel.org/git/87wnusj6gt.fsf@evledraar.gmail.com/

Fair enough; if we have to go the affirm() route, I should probably
just make it depend on NDEBUG.  :-)
Junio C Hamano March 31, 2021, 6:01 p.m. UTC | #8
Elijah Newren <newren@gmail.com> writes:

> ... would you two prefer
> that I add a new affirm() function that is ALSO compiled out in
> production?

The reason I do not think affirm would fly is because it shares the
most problematic trait with assert(): it can be comipled out.

There was at least one bug we found and fixed in a piece of code
that came later in a codepath that had an assert() in it.  The buggy
code (IIRC, it was added way after the assert() was written in an
unrelated patch) was unknowingly depending on a side-effect of an
innocuous function call (i.e. assert(func_tion(args) == expected))
made in the assert condition, and the bug did not reproduce in the
developer's environment.
Elijah Newren April 1, 2021, 4:31 p.m. UTC | #9
On Wed, Mar 31, 2021 at 11:01 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> > ... would you two prefer
> > that I add a new affirm() function that is ALSO compiled out in
> > production?
>
> The reason I do not think affirm would fly is because it shares the
> most problematic trait with assert(): it can be comipled out.
>
> There was at least one bug we found and fixed in a piece of code
> that came later in a codepath that had an assert() in it.  The buggy
> code (IIRC, it was added way after the assert() was written in an
> unrelated patch) was unknowingly depending on a side-effect of an
> innocuous function call (i.e. assert(func_tion(args) == expected))
> made in the assert condition, and the bug did not reproduce in the
> developer's environment.

Ah, now that's a much different argument than claiming assert is
useless.  I had to debug one of those once about a decade ago, in a
different project, put in by some other developer, and yeah it's
painful.

If that causes assert() to be frowned upon, then I can understand
based on this reason.  If so, perhaps a BUG_ON() function would be
useful?  The "if (...) BUG()" construct causes line coverage problems
and I find it an onerous way to signal to future code readers "hey,
just as a reminder, these things should be true at this point".  If
you need to signal something more than that or explain why the things
being false is a problem or what might cause it, then the "if (...)
BUG()" construct is useful and there are certainly many places for
that, but it just doesn't fit all the cases.
Phillip Wood April 2, 2021, 11:28 a.m. UTC | #10
Hi Dscho and Elijah

On 31/03/2021 14:48, Johannes Schindelin wrote:
> Hi,
> 
> On Tue, 30 Mar 2021, Elijah Newren wrote:
> 
>> On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin
>> <Johannes.Schindelin@gmx.de> wrote:
>>
>>> I'll allow myself one tangent: the subject of the sequencer's Unix
>>> shell script heritage seems to come up with an increasing frequency,
>>> in particular the awful "let's write out one file per setting"
>>> strategy.
>>>
>>> I would _love_ for `save_opts()` to write a JSON instead (or an INI
>>> via the `git_config_*()` family of functions, as is done already by
>>> the cherry-pick/revert stuff), now that we no longer have any shell
>>> script backend (apart from `--preserve-merges`, but that one is on its
>>> way out anyway).
>>>
>>> The one thing that concerns me with this idea is that I know for a
>>> fact that some enterprisey users play games with those files inside
>>> `<GIT_DIR>/rebase-merge` that should be considered internal
>>> implementation details. Not sure how to deprecate that properly, I
>>> don't think we have a sane way to detect whether users rely on these
>>> implementation details other than breaking their expectations, which
>>> is not really a gentle way to ask them to update their scripts.

I think it depends if users are just reading the files or writing to 
them. If they are reading them (which my scripts do) then we could 
continue to write the important files along side the new single-file 
that git actually reads. I think there is a distinction between the 
files such as git-rebase-todo which hold state and the one-line files 
which hold the options passed by the user. For example I have scripts 
that read git-rebase-todo, rewritten-pending, rewritten-list, amend-head 
and check that author-script exists. If a script starts a rebase then it 
should know what options are in effect without reading them from 
.git/rebase-merge.

>> Ooh, I'm glad you took this tangent.  May I follow it for a second?
>> I'd really, really like this too, for three reasons:
>>
>> 1) I constantly get confused about the massive duplication and
>> difference in control structures and which ones affect which type of
>> operations in sequencer.c.  Having them both use an ini-file would
>> allow us to remove lots of that duplication.  I'm sure there'll still
>> be some rebase-specific or cherry-pick-specific options, but we don't
>> need a separate parallel structure for every piece of config.
>>
>> 2) In my merge-ort optimization work, rebasing 35 commits in linux.git
>> across a massive set of 26K upstream renames has dropped rename
>> detection time from the top spot.  And it also dropped several other
>> things in the merge machinery from their spots as well.  Do you know
>> what's the slowest now?  Wasted time from sequencer.c: the unnecessary
>> process forking and all the useless disk writing (which I suspect is
>> mostly updating the index and working directory but also writing all
>> the individual control files under .git/rebase-merge/).  And this
>> stuff from sequencer.c is not just barely the slowest part, it's the
>> slowest by a wide margin.

I think we do a lot of needless writing which is unrelated to whether we 
write to one file or may files. For example from memory picking a commit 
involves writing the 'message', 'author-script', 'rewritten-pending' 
(which is often immediately deleted), 'rewritten-list' (we append to 
that one) 'CHERRY_PICK_HEAD' (which is immediately deleted unless the 
pick has become empty), 'git-rebase-todo' is completely rewritten, and 
'done' is appended to. None of this is necessary. For rewording and 
merges the only thing that needs to be written is the commit message for 
the external process to use. Fixup and squash add a couple more files 
into the mix.

I think we would save a lot by only syncing the state to disk when we 
stop or run an exec command (the state needs to be synced so exec 
commands can alter the todo list). In those cases we need to write the 
index and possibly run an external process so writing a couple of files 
is probably insignificant.

Where I think we can usefully consolidate is the one-line files which 
store the options rather than state - these are read an written much 
less frequently so I don't think they have much of a performance hit but 
it would be much nicer to just serialize the options to a single file.

>>
>> 3) I also want to allow cherry-picking or rebasing branches that
>> aren't even checked out (assuming no conflicts are triggered;
>> otherwise an error can be shown with the user asked to repeat with the
>> operation connected to an active working directory).

Exciting!

>>  For such an
>> operation, the difference between "cherry-pick" and "rebase" is nearly
>> irrelevant so you'd expect the code to be the same; every time I look
>> at the code, though, it seems that the control structures are
>> separating these two types of operations in more areas than just the
>> reading and writing of the config.

Yes this can be confusing, for example rebase and cherry-pick handle the 
todo list differently. Rebase removes the command before trying to pick 
the commit and adds it back if the pick fails for a non-conflict reason, 
cherry-pick only removes the command if the pick is successful.

Best Wishes

Phillip

> Excellent, we're in agreement, then.
> 
> The remaining question is: how do we want to go about it? Do we just want
> to codify the notion that these are internal details that are nobody's
> business, and if they are using the exact file system layout of the
> current sequencer, then it's their responsibility to adapt?
> 
> Ciao,
> Dscho
>
Phillip Wood April 2, 2021, 1:10 p.m. UTC | #11
one more thought below...

On 02/04/2021 12:28, Phillip Wood wrote:
> Hi Dscho and Elijah
> 
> On 31/03/2021 14:48, Johannes Schindelin wrote:
>> Hi,
>>
>> On Tue, 30 Mar 2021, Elijah Newren wrote:
>>
>>> On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin
>>> <Johannes.Schindelin@gmx.de> wrote:
>>>
>>>> I'll allow myself one tangent: the subject of the sequencer's Unix
>>>> shell script heritage seems to come up with an increasing frequency,
>>>> in particular the awful "let's write out one file per setting"
>>>> strategy.
>>>>
>>>> I would _love_ for `save_opts()` to write a JSON instead (or an INI
>>>> via the `git_config_*()` family of functions, as is done already by
>>>> the cherry-pick/revert stuff), now that we no longer have any shell
>>>> script backend (apart from `--preserve-merges`, but that one is on its
>>>> way out anyway).
>>>>
>>>> The one thing that concerns me with this idea is that I know for a
>>>> fact that some enterprisey users play games with those files inside
>>>> `<GIT_DIR>/rebase-merge` that should be considered internal
>>>> implementation details. Not sure how to deprecate that properly, I
>>>> don't think we have a sane way to detect whether users rely on these
>>>> implementation details other than breaking their expectations, which
>>>> is not really a gentle way to ask them to update their scripts.
> 
> I think it depends if users are just reading the files or writing to 
> them. If they are reading them (which my scripts do) then we could 
> continue to write the important files along side the new single-file 
> that git actually reads. I think there is a distinction between the 
> files such as git-rebase-todo which hold state and the one-line files 
> which hold the options passed by the user. For example I have scripts 
> that read git-rebase-todo, rewritten-pending, rewritten-list, amend-head 
> and check that author-script exists. If a script starts a rebase then it 
> should know what options are in effect without reading them from 
> .git/rebase-merge.
> 
>>> Ooh, I'm glad you took this tangent.  May I follow it for a second?
>>> I'd really, really like this too, for three reasons:
>>>
>>> 1) I constantly get confused about the massive duplication and
>>> difference in control structures and which ones affect which type of
>>> operations in sequencer.c.  Having them both use an ini-file would
>>> allow us to remove lots of that duplication.  I'm sure there'll still
>>> be some rebase-specific or cherry-pick-specific options, but we don't
>>> need a separate parallel structure for every piece of config.

One thing to bear in mind is that you can cherry-pick or revert a 
sequence of commits while rebasing - I think that means we need to store 
the state for rebase in a separate location to that for cherry-pick/revert

Best Wishes

Phillip

>>> 2) In my merge-ort optimization work, rebasing 35 commits in linux.git
>>> across a massive set of 26K upstream renames has dropped rename
>>> detection time from the top spot.  And it also dropped several other
>>> things in the merge machinery from their spots as well.  Do you know
>>> what's the slowest now?  Wasted time from sequencer.c: the unnecessary
>>> process forking and all the useless disk writing (which I suspect is
>>> mostly updating the index and working directory but also writing all
>>> the individual control files under .git/rebase-merge/).  And this
>>> stuff from sequencer.c is not just barely the slowest part, it's the
>>> slowest by a wide margin.
> 
> I think we do a lot of needless writing which is unrelated to whether we 
> write to one file or may files. For example from memory picking a commit 
> involves writing the 'message', 'author-script', 'rewritten-pending' 
> (which is often immediately deleted), 'rewritten-list' (we append to 
> that one) 'CHERRY_PICK_HEAD' (which is immediately deleted unless the 
> pick has become empty), 'git-rebase-todo' is completely rewritten, and 
> 'done' is appended to. None of this is necessary. For rewording and 
> merges the only thing that needs to be written is the commit message for 
> the external process to use. Fixup and squash add a couple more files 
> into the mix.
> 
> I think we would save a lot by only syncing the state to disk when we 
> stop or run an exec command (the state needs to be synced so exec 
> commands can alter the todo list). In those cases we need to write the 
> index and possibly run an external process so writing a couple of files 
> is probably insignificant.
> 
> Where I think we can usefully consolidate is the one-line files which 
> store the options rather than state - these are read an written much 
> less frequently so I don't think they have much of a performance hit but 
> it would be much nicer to just serialize the options to a single file.
> 
>>>
>>> 3) I also want to allow cherry-picking or rebasing branches that
>>> aren't even checked out (assuming no conflicts are triggered;
>>> otherwise an error can be shown with the user asked to repeat with the
>>> operation connected to an active working directory).
> 
> Exciting!
> 
>>>  For such an
>>> operation, the difference between "cherry-pick" and "rebase" is nearly
>>> irrelevant so you'd expect the code to be the same; every time I look
>>> at the code, though, it seems that the control structures are
>>> separating these two types of operations in more areas than just the
>>> reading and writing of the config.
> 
> Yes this can be confusing, for example rebase and cherry-pick handle the 
> todo list differently. Rebase removes the command before trying to pick 
> the commit and adds it back if the pick fails for a non-conflict reason, 
> cherry-pick only removes the command if the pick is successful.
> 
> Best Wishes
> 
> Phillip
> 
>> Excellent, we're in agreement, then.
>>
>> The remaining question is: how do we want to go about it? Do we just want
>> to codify the notion that these are internal details that are nobody's
>> business, and if they are using the exact file system layout of the
>> current sequencer, then it's their responsibility to adapt?
>>
>> Ciao,
>> Dscho
>>
>
Junio C Hamano April 2, 2021, 9:01 p.m. UTC | #12
Phillip Wood <phillip.wood123@gmail.com> writes:

> I think we would save a lot by only syncing the state to disk when we
> stop or run an exec command (the state needs to be synced so exec 
> commands can alter the todo list). In those cases we need to write the
> index and possibly run an external process so writing a couple of
> files is probably insignificant.

The optimization opportunity of this may be a lot smaller than you
would think---you must cater to not just exec but hook scripts that
are run while a new commit is made, which means every step you'd
need to write anyway.

> Where I think we can usefully consolidate is the one-line files which
> store the options rather than state - these are read an written much 
> less frequently so I don't think they have much of a performance hit
> but it would be much nicer to just serialize the options to a single
> file.

Would that break external scripts, hooks, etc.?  I am not sure if we
even have any rough consensus for allowing other people to peek into
the .git/rebase-*/ directories, but I am inclined to say that it
sounds more like a solution looking for a problem than a good idea
to solve some concrete problem.
Elijah Newren April 2, 2021, 10:18 p.m. UTC | #13
On Fri, Apr 2, 2021 at 2:01 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Phillip Wood <phillip.wood123@gmail.com> writes:
>
> > I think we would save a lot by only syncing the state to disk when we
> > stop or run an exec command (the state needs to be synced so exec
> > commands can alter the todo list). In those cases we need to write the
> > index and possibly run an external process so writing a couple of
> > files is probably insignificant.
>
> The optimization opportunity of this may be a lot smaller than you
> would think---you must cater to not just exec but hook scripts that
> are run while a new commit is made, which means every step you'd
> need to write anyway.

From Documentation/git-rebase.txt, right after discussing how the
backends disagree on which hooks are called and how they are called:

...In each case, the calling of these hooks was by accident of
implementation rather than by design (both backends were originally
implemented as shell scripts and happened to invoke other commands
like 'git checkout' or 'git commit' that would call the hooks).  Both
backends should have the same behavior, though it is not entirely
clear which, if any, is correct.  We will likely make rebase stop
calling either of these hooks in the future.


Even if others now disagree with the above, I know I can get a huge
speedup by changing sequencer to stop per-commit wasteful work (stop
forking processes like "git commit", don't write control structures if
the rebase hasn't ended or hit a conflict, don't update the working
copy or index or reflog).  It's enough of a speedup that if backward
compatibility won't allow such a method to be used by default, I'd
still make yet another backend that could be optionally used.  And I'd
have the default rebase and cherry-pick start printing annoying
deprecation notices so that users become aware of a faster option.
Should I go that route?  t/helpers/test-fast-rebase.c already has a
good start...

> > Where I think we can usefully consolidate is the one-line files which
> > store the options rather than state - these are read an written much
> > less frequently so I don't think they have much of a performance hit
> > but it would be much nicer to just serialize the options to a single
> > file.
>
> Would that break external scripts, hooks, etc.?  I am not sure if we
> even have any rough consensus for allowing other people to peek into
> the .git/rebase-*/ directories, but I am inclined to say that it
> sounds more like a solution looking for a problem than a good idea
> to solve some concrete problem.

To be honest, my bigger complaint with the non-unified backend config
is how far it bled into the code and how difficult it is to determine
how things are controlled and which sections of code are relevant for
which types of operations.  And how much of a pain I'm worried it'd
make the "allow rebasing and cherry-picking in a bare repository (or
for an unchecked-out branch)" functionality.  Perhaps the code can be
cleaned up without changing these on-disk structures, in which case my
concern for this point would lessen.
Junio C Hamano April 2, 2021, 10:27 p.m. UTC | #14
Elijah Newren <newren@gmail.com> writes:

> Even if others now disagree with the above, I know I can get a huge
> speedup by changing sequencer to stop per-commit wasteful work (stop
> forking processes like "git commit", don't write control structures if
> the rebase hasn't ended or hit a conflict, don't update the working
> copy or index or reflog).  It's enough of a speedup that if backward
> compatibility won't allow such a method to be used by default, I'd
> still make yet another backend that could be optionally used.  And I'd
> have the default rebase and cherry-pick start printing annoying
> deprecation notices so that users become aware of a faster option.

A faster and less powerful interface is good; I doubt deprecation
would work well. If a workflow depends on things like post-commit
hook, the affected users deserve some way to migrate to --exec or
whatever method to compensate the loss of functionality.
Johannes Schindelin April 8, 2021, 2:40 a.m. UTC | #15
Hi,

On Fri, 2 Apr 2021, Junio C Hamano wrote:

> Elijah Newren <newren@gmail.com> writes:
>
> > Even if others now disagree with the above, I know I can get a huge
> > speedup by changing sequencer to stop per-commit wasteful work (stop
> > forking processes like "git commit", don't write control structures if
> > the rebase hasn't ended or hit a conflict, don't update the working
> > copy or index or reflog).  It's enough of a speedup that if backward
> > compatibility won't allow such a method to be used by default, I'd
> > still make yet another backend that could be optionally used.  And I'd
> > have the default rebase and cherry-pick start printing annoying
> > deprecation notices so that users become aware of a faster option.
>
> A faster and less powerful interface is good; I doubt deprecation
> would work well. If a workflow depends on things like post-commit
> hook, the affected users deserve some way to migrate to --exec or
> whatever method to compensate the loss of functionality.

I could imagine that there is opportunity to "persist on disk only when
needed". For example, if no `pre-commit` hook is installed that needs to
be run, there is no need to update the worktree nor HEAD until the rebase
is done.

And this type of `only write to disk when needed` functionality could
probably be abstracted away so much as to make the rest of the code
look elegant again.

Ciao,
Dscho
Junio C Hamano April 8, 2021, 5:45 p.m. UTC | #16
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> > ...  It's enough of a speedup that if backward
>> > compatibility won't allow such a method to be used by default, I'd
>> > still make yet another backend that could be optionally used.  And I'd
>> > have the default rebase and cherry-pick start printing annoying
>> > deprecation notices so that users become aware of a faster option.
>>
>> A faster and less powerful interface is good; I doubt deprecation
>> would work well. If a workflow depends on things like post-commit
>> hook, the affected users deserve some way to migrate to --exec or
>> whatever method to compensate the loss of functionality.
>
> I could imagine that there is opportunity to "persist on disk only when
> needed". For example, if no `pre-commit` hook is installed that needs to
> be run, there is no need to update the worktree nor HEAD until the rebase
> is done.
>
> And this type of `only write to disk when needed` functionality could
> probably be abstracted away so much as to make the rest of the code
> look elegant again.

Yes, we are on the same page.  What Elijah proposed as "another
backend" would unfortunately be different, and needs to be adjusted
with such an "only when needed" tweak.  The check hopefully needs to
be done only once (e.g. do we have this hook enabled?  do we have
that hook enabled? do we have a commit with this trait in the range
being worked on? etc. etc.) upfront and for certain workflows may
not require any persistence.

And until that happens, "annoying deprecation notices" will never be
a viable step to take.
Christian Couder April 8, 2021, 7:58 p.m. UTC | #17
Hi,

On Thu, Apr 8, 2021 at 2:22 PM Johannes Schindelin
<Johannes.Schindelin@gmx.de> wrote:
>
> On Fri, 2 Apr 2021, Junio C Hamano wrote:
>
> > Elijah Newren <newren@gmail.com> writes:
> >
> > > Even if others now disagree with the above, I know I can get a huge
> > > speedup by changing sequencer to stop per-commit wasteful work (stop
> > > forking processes like "git commit", don't write control structures if
> > > the rebase hasn't ended or hit a conflict, don't update the working
> > > copy or index or reflog).

It would be interesting to know which of these updates (working copy,
index or reflog) is the most costly. Of course the best would be to
compare the costs when updates are done at each step vs only at the
end.

Some years ago I worked on the split index as some tests showed that,
for really big repos with a big index, it could significantly speed up
rebases. So I have the, perhaps wrong, impression that most of the
gain would be related to the index.

> > > It's enough of a speedup that if backward
> > > compatibility won't allow such a method to be used by default, I'd
> > > still make yet another backend that could be optionally used.  And I'd
> > > have the default rebase and cherry-pick start printing annoying
> > > deprecation notices so that users become aware of a faster option.
> >
> > A faster and less powerful interface is good; I doubt deprecation
> > would work well. If a workflow depends on things like post-commit
> > hook, the affected users deserve some way to migrate to --exec or
> > whatever method to compensate the loss of functionality.
>
> I could imagine that there is opportunity to "persist on disk only when
> needed". For example, if no `pre-commit` hook is installed that needs to
> be run, there is no need to update the worktree nor HEAD until the rebase
> is done.
>
> And this type of `only write to disk when needed` functionality could
> probably be abstracted away so much as to make the rest of the code
> look elegant again.

Yeah, I agree with this approach too.

If the split index could also become the default one day, I guess we
could be doing pretty well even when some hooks are installed.
Johannes Schindelin April 9, 2021, 1:53 p.m. UTC | #18
Hi Christian,

On Thu, 8 Apr 2021, Christian Couder wrote:

> On Thu, Apr 8, 2021 at 2:22 PM Johannes Schindelin
> <Johannes.Schindelin@gmx.de> wrote:
> >
> > On Fri, 2 Apr 2021, Junio C Hamano wrote:
> >
> > > Elijah Newren <newren@gmail.com> writes:
> > >
> > > > Even if others now disagree with the above, I know I can get a huge
> > > > speedup by changing sequencer to stop per-commit wasteful work (stop
> > > > forking processes like "git commit", don't write control structures if
> > > > the rebase hasn't ended or hit a conflict, don't update the working
> > > > copy or index or reflog).
>
> It would be interesting to know which of these updates (working copy,
> index or reflog) is the most costly. Of course the best would be to
> compare the costs when updates are done at each step vs only at the
> end.

I would bet that the worktree updates are the most costly thing.

Note: there is also the step where new objects are written to the object
database, but that is a step that we can _not_ skip.

> If the split index could also become the default one day, I guess we
> could be doing pretty well even when some hooks are installed.

Hmm. My opinion about the split index does not match yours.

I'd much rather see us tackle the challenge of allowing truly incremental
index updates. That "split index" feature feels like a bandaid that tries
to do sort of an incremental update, and fails at it because it is not
truly incremental. Granted, that would look much more like a real database
and is much, much harder than what the split index code does (think
e.g. locking).

Ciao,
Dscho
diff mbox series

Patch

diff --git a/builtin/revert.c b/builtin/revert.c
index 314a86c5621b..81441020231a 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -182,7 +182,7 @@  static int run_sequencer(int argc, const char **argv, struct replay_opts *opts)
 				"--signoff", opts->signoff,
 				"--no-commit", opts->no_commit,
 				"-x", opts->record_origin,
-				"--edit", opts->edit,
+				"--edit", opts->edit == 1,
 				NULL);
 
 	if (cmd) {
@@ -230,8 +230,6 @@  int cmd_revert(int argc, const char **argv, const char *prefix)
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int res;
 
-	if (isatty(0))
-		opts.edit = 1;
 	opts.action = REPLAY_REVERT;
 	sequencer_init_config(&opts);
 	res = run_sequencer(argc, argv, &opts);
diff --git a/sequencer.c b/sequencer.c
index 848204d3dc3f..d444c778a097 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -1860,14 +1860,26 @@  static void record_in_rewritten(struct object_id *oid,
 		flush_rewritten_pending();
 }
 
+static int should_edit(struct replay_opts *opts) {
+	assert(opts->edit >= -1 && opts->edit <= 1);
+	if (opts->edit == -1)
+		/*
+		 * Note that we only handle the case of non-conflicted
+		 * commits; continue_single_pick() handles the conflicted
+		 * commits itself instead of calling this function.
+		 */
+		return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0;
+	return opts->edit;
+}
+
 static int do_pick_commit(struct repository *r,
 			  enum todo_command command,
 			  struct commit *commit,
 			  struct replay_opts *opts,
 			  int final_fixup, int *check_todo)
 {
-	unsigned int flags = opts->edit ? EDIT_MSG : 0;
-	const char *msg_file = opts->edit ? NULL : git_path_merge_msg(r);
+	unsigned int flags = should_edit(opts) ? EDIT_MSG : 0;
+	const char *msg_file = should_edit(opts) ? NULL : git_path_merge_msg(r);
 	struct object_id head;
 	struct commit *base, *next, *parent;
 	const char *base_label, *next_label;
@@ -3101,9 +3113,9 @@  static int save_opts(struct replay_opts *opts)
 	if (opts->no_commit)
 		res |= git_config_set_in_file_gently(opts_file,
 					"options.no-commit", "true");
-	if (opts->edit)
-		res |= git_config_set_in_file_gently(opts_file,
-					"options.edit", "true");
+	if (opts->edit != -1)
+		res |= git_config_set_in_file_gently(opts_file, "options.edit",
+						     opts->edit ? "true" : "false");
 	if (opts->allow_empty)
 		res |= git_config_set_in_file_gently(opts_file,
 					"options.allow-empty", "true");
@@ -4077,7 +4089,7 @@  static int pick_commits(struct repository *r,
 	prev_reflog_action = xstrdup(getenv(GIT_REFLOG_ACTION));
 	if (opts->allow_ff)
 		assert(!(opts->signoff || opts->no_commit ||
-			 opts->record_origin || opts->edit ||
+			 opts->record_origin || should_edit(opts) ||
 			 opts->committer_date_is_author_date ||
 			 opts->ignore_date));
 	if (read_and_refresh_cache(r, opts))
@@ -4370,14 +4382,35 @@  static int pick_commits(struct repository *r,
 	return sequencer_remove_state(opts);
 }
 
-static int continue_single_pick(struct repository *r)
+static int continue_single_pick(struct repository *r, struct replay_opts *opts)
 {
-	const char *argv[] = { "commit", NULL };
+	struct strvec argv = STRVEC_INIT;
+	int want_edit;
+	int ret;
 
 	if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") &&
 	    !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD"))
 		return error(_("no cherry-pick or revert in progress"));
-	return run_command_v_opt(argv, RUN_GIT_CMD);
+
+	strvec_push(&argv, "commit");
+
+	/*
+	 * continue_single_pick() handles the case of recovering from a
+	 * conflict.  should_edit() doesn't handle that case; for a conflict,
+	 * we want to edit if the user asked for it, or if they didn't specify
+	 * and stdin is a tty.
+	 */
+	want_edit = (opts->edit == 1) || ((opts->edit == -1) && isatty(0));
+	if (!want_edit)
+		/*
+		 * Include --cleanup=strip as well because we don't want the
+		 * "# Conflicts:" messages.
+		 */
+		strvec_pushl(&argv, "--no-edit", "--cleanup=strip", NULL);
+
+	ret = run_command_v_opt(argv.v, RUN_GIT_CMD);
+	strvec_clear(&argv);
+	return ret;
 }
 
 static int commit_staged_changes(struct repository *r,
@@ -4547,7 +4580,7 @@  int sequencer_continue(struct repository *r, struct replay_opts *opts)
 			goto release_todo_list;
 		}
 	} else if (!file_exists(get_todo_path(opts)))
-		return continue_single_pick(r);
+		return continue_single_pick(r, opts);
 	else if ((res = read_populate_todo(r, &todo_list, opts)))
 		goto release_todo_list;
 
@@ -4556,7 +4589,7 @@  int sequencer_continue(struct repository *r, struct replay_opts *opts)
 		if (refs_ref_exists(get_main_ref_store(r),
 				    "CHERRY_PICK_HEAD") ||
 		    refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) {
-			res = continue_single_pick(r);
+			res = continue_single_pick(r, opts);
 			if (res)
 				goto release_todo_list;
 		}
diff --git a/sequencer.h b/sequencer.h
index f8b2e4ab8527..d57d8ea23d7a 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -31,8 +31,10 @@  enum commit_msg_cleanup_mode {
 struct replay_opts {
 	enum replay_action action;
 
-	/* Boolean options */
+	/* Tri-state options: unspecified, false, or true */
 	int edit;
+
+	/* Boolean options */
 	int record_origin;
 	int no_commit;
 	int signoff;
@@ -71,7 +73,7 @@  struct replay_opts {
 	/* Only used by REPLAY_NONE */
 	struct rev_info *revs;
 };
-#define REPLAY_OPTS_INIT { .action = -1, .current_fixups = STRBUF_INIT }
+#define REPLAY_OPTS_INIT { .edit = -1, .action = -1, .current_fixups = STRBUF_INIT }
 
 /*
  * Note that ordering matters in this enum. Not only must it match the mapping
diff --git a/t/t3510-cherry-pick-sequence.sh b/t/t3510-cherry-pick-sequence.sh
index b76cb6de91d0..49010aa9469d 100755
--- a/t/t3510-cherry-pick-sequence.sh
+++ b/t/t3510-cherry-pick-sequence.sh
@@ -65,7 +65,7 @@  test_expect_success 'cherry-pick persists opts correctly' '
 	# gets interrupted, use a high-enough number that is larger
 	# than the number of parents of any commit we have created
 	mainline=4 &&
-	test_expect_code 128 git cherry-pick -s -m $mainline --strategy=recursive -X patience -X ours initial..anotherpick &&
+	test_expect_code 128 git cherry-pick -s -m $mainline --strategy=recursive -X patience -X ours --edit initial..anotherpick &&
 	test_path_is_dir .git/sequencer &&
 	test_path_is_file .git/sequencer/head &&
 	test_path_is_file .git/sequencer/todo &&
@@ -84,6 +84,36 @@  test_expect_success 'cherry-pick persists opts correctly' '
 	ours
 	EOF
 	git config --file=.git/sequencer/opts --get-all options.strategy-option >actual &&
+	test_cmp expect actual &&
+	echo "true" >expect &&
+	git config --file=.git/sequencer/opts --get-all options.edit >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'revert persists opts correctly' '
+	pristine_detach initial &&
+	# to make sure that the session to revert a sequence
+	# gets interrupted, revert commits that are not in the history
+	# of HEAD.
+	test_expect_code 1 git revert -s --strategy=recursive -X patience -X ours --no-edit picked yetanotherpick &&
+	test_path_is_dir .git/sequencer &&
+	test_path_is_file .git/sequencer/head &&
+	test_path_is_file .git/sequencer/todo &&
+	test_path_is_file .git/sequencer/opts &&
+	echo "true" >expect &&
+	git config --file=.git/sequencer/opts --get-all options.signoff >actual &&
+	test_cmp expect actual &&
+	echo "recursive" >expect &&
+	git config --file=.git/sequencer/opts --get-all options.strategy >actual &&
+	test_cmp expect actual &&
+	cat >expect <<-\EOF &&
+	patience
+	ours
+	EOF
+	git config --file=.git/sequencer/opts --get-all options.strategy-option >actual &&
+	test_cmp expect actual &&
+	echo "false" >expect &&
+	git config --file=.git/sequencer/opts --get-all options.edit >actual &&
 	test_cmp expect actual
 '