Message ID | pull.988.v2.git.git.1617070174458.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] sequencer: fix edit handling for cherry-pick and revert messages | expand |
Hi Elijah, On Tue, 30 Mar 2021, Elijah Newren via GitGitGadget wrote: > From: Elijah Newren <newren@gmail.com> > > save_opts() should save any non-default values. It was intended to do > this, but since most options in struct replay_opts default to 0, it only > saved non-zero values. Unfortunately, this does not always work for > options.edit. Roughly speaking, options.edit had a default value of 0 > for cherry-pick but a default value of 1 for revert. Make save_opts() > record a value whenever it differs from the default. > > options.edit was also overly simplistic; we had more than two cases. > The behavior that previously existed was as follows: > > Non-conflict commits Right after Conflict > revert Edit iff isatty(0) Edit (ignore isatty(0)) > cherry-pick No edit See above > Specify --edit Edit (ignore isatty(0)) See above > Specify --no-edit (*) See above > > (*) Before stopping for conflicts, No edit is the behavior. After > stopping for conflicts, the --no-edit flag is not saved so see > the first two rows. > > However, the expected behavior is: > > Non-conflict commits Right after Conflict > revert Edit iff isatty(0) Edit iff isatty(0) > cherry-pick No edit Edit iff isatty(0) > Specify --edit Edit (ignore isatty(0)) Edit (ignore isatty(0)) > Specify --no-edit No edit No edit > > In order to get the expected behavior, we need to change options.edit > to a tri-state: unspecified, false, or true. When specified, we follow > what it says. When unspecified, we need to check whether the current > commit being created is resolving a conflict as well as consulting > options.action and isatty(0). While at it, add a should_edit() utility > function that compresses options.edit down to a boolean based on the > additional information for the non-conflict case. > > continue_single_pick() is the function responsible for resuming after > conflict cases, regardless of whether there is one commit being picked > or many. Make this function stop assuming edit behavior in all cases, > so that it can correctly handle !isatty(0) and specific requests to not > edit the commit message. Nicely explained! I'll allow myself one tangent: the subject of the sequencer's Unix shell script heritage seems to come up with an increasing frequency, in particular the awful "let's write out one file per setting" strategy. I would _love_ for `save_opts()` to write a JSON instead (or an INI via the `git_config_*()` family of functions, as is done already by the cherry-pick/revert stuff), now that we no longer have any shell script backend (apart from `--preserve-merges`, but that one is on its way out anyway). The one thing that concerns me with this idea is that I know for a fact that some enterprisey users play games with those files inside `<GIT_DIR>/rebase-merge` that should be considered internal implementation details. Not sure how to deprecate that properly, I don't think we have a sane way to detect whether users rely on these implementation details other than breaking their expectations, which is not really a gentle way to ask them to update their scripts. > diff --git a/builtin/revert.c b/builtin/revert.c > index 314a86c5621b..81441020231a 100644 > --- a/builtin/revert.c > +++ b/builtin/revert.c > @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts) > "--signoff", opts->signoff, > "--no-commit", opts->no_commit, > "-x", opts->record_origin, > - "--edit", opts->edit, > + "--edit", opts->edit == 1, Honestly, I'd prefer `> 0` here. > NULL); > > if (cmd) { > @@ -230,8 +230,6 @@ int cmd_revert(int argc, const char **argv, const char *prefix) > struct replay_opts opts = REPLAY_OPTS_INIT; > int res; > > - if (isatty(0)) > - opts.edit = 1; > opts.action = REPLAY_REVERT; > sequencer_init_config(&opts); > res = run_sequencer(argc, argv, &opts); > diff --git a/sequencer.c b/sequencer.c > index 848204d3dc3f..d444c778a097 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -1860,14 +1860,26 @@ static void record_in_rewritten(struct object_id *oid, > flush_rewritten_pending(); > } > > +static int should_edit(struct replay_opts *opts) { > + assert(opts->edit >= -1 && opts->edit <= 1); Do we really want to introduce more of these useless `assert()`s? I know that we stopped converting them to `BUG()`, but I really dislike introducing new ones: they have very little effect, being no-ops by default in most setups. > + if (opts->edit == -1) Maybe `< 0`, as we do elsewhere for "not specified"? > + /* > + * Note that we only handle the case of non-conflicted > + * commits; continue_single_pick() handles the conflicted > + * commits itself instead of calling this function. > + */ > + return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0; Apart from the extra parentheses, that makes sense to me. > + return opts->edit; > +} > + > static int do_pick_commit(struct repository *r, > enum todo_command command, > struct commit *commit, > struct replay_opts *opts, > int final_fixup, int *check_todo) > { > - unsigned int flags = opts->edit ? EDIT_MSG : 0; > - const char *msg_file = opts->edit ? NULL : git_path_merge_msg(r); > + unsigned int flags = should_edit(opts) ? EDIT_MSG : 0; > + const char *msg_file = should_edit(opts) ? NULL : git_path_merge_msg(r); > struct object_id head; > struct commit *base, *next, *parent; > const char *base_label, *next_label; > @@ -3101,9 +3113,9 @@ static int save_opts(struct replay_opts *opts) > if (opts->no_commit) > res |= git_config_set_in_file_gently(opts_file, > "options.no-commit", "true"); > - if (opts->edit) > - res |= git_config_set_in_file_gently(opts_file, > - "options.edit", "true"); > + if (opts->edit != -1) s/!= -1/>= 0/ > + res |= git_config_set_in_file_gently(opts_file, "options.edit", > + opts->edit ? "true" : "false"); > if (opts->allow_empty) > res |= git_config_set_in_file_gently(opts_file, > "options.allow-empty", "true"); > @@ -4077,7 +4089,7 @@ static int pick_commits(struct repository *r, > prev_reflog_action = xstrdup(getenv(GIT_REFLOG_ACTION)); > if (opts->allow_ff) > assert(!(opts->signoff || opts->no_commit || > - opts->record_origin || opts->edit || > + opts->record_origin || should_edit(opts) || > opts->committer_date_is_author_date || > opts->ignore_date)); > if (read_and_refresh_cache(r, opts)) > @@ -4370,14 +4382,35 @@ static int pick_commits(struct repository *r, > return sequencer_remove_state(opts); > } > > -static int continue_single_pick(struct repository *r) > +static int continue_single_pick(struct repository *r, struct replay_opts *opts) > { > - const char *argv[] = { "commit", NULL }; > + struct strvec argv = STRVEC_INIT; > + int want_edit; Do we really want that extra `want_edit` variable? I think the code would be easier to read without it, and still be obvious enough. > + int ret; > > if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") && > !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) > return error(_("no cherry-pick or revert in progress")); > - return run_command_v_opt(argv, RUN_GIT_CMD); > + > + strvec_push(&argv, "commit"); > + > + /* > + * continue_single_pick() handles the case of recovering from a > + * conflict. should_edit() doesn't handle that case; for a conflict, > + * we want to edit if the user asked for it, or if they didn't specify > + * and stdin is a tty. > + */ > + want_edit = (opts->edit == 1) || ((opts->edit == -1) && isatty(0)); > + if (!want_edit) Here is what I would prefer: if (!opts->edit || (opts->edit < 0 && !isatty(0))) The rest looks good, and the comments are _really_ helpful. And the remainder of the patch also looks good, so I will spare readers time by not even quoting it. Thank you! Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts) >> "--signoff", opts->signoff, >> "--no-commit", opts->no_commit, >> "-x", opts->record_origin, >> - "--edit", opts->edit, >> + "--edit", opts->edit == 1, > > Honestly, I'd prefer `> 0` here. Unless somebody (including Elijah) is trying to soon introduce yet another value to .edit member, I'd agree 100%. If it is a tristate (unspecified, no, yes), I think "is it positive" should be the way to ask "does the user definitely wants it?", "is it zero" should be the way to ask "does the user definitely declines it?" and "is it non-negative" (and "is it negative") the way to ask "does the user care (or not care)?". Using that consistently is good. >> +static int should_edit(struct replay_opts *opts) { >> + assert(opts->edit >= -1 && opts->edit <= 1); > > Do we really want to introduce more of these useless `assert()`s? I know > that we stopped converting them to `BUG()`, but I really dislike > introducing new ones: they have very little effect, being no-ops by > default in most setups. Yeah, in a new code in flux where programmers can easily make errors, "if (...) BUG()" may not be a bad thing to add (but then we may want to see if we can make the codepaths involved less error prone), but I agree with your view that assert() is mostly useless. A comment that explains the expectation and why that expectation is there would be more useful. >> + if (opts->edit == -1) > > Maybe `< 0`, as we do elsewhere for "not specified"? Yup. >> + /* >> + * Note that we only handle the case of non-conflicted >> + * commits; continue_single_pick() handles the conflicted >> + * commits itself instead of calling this function. >> + */ >> + return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0; > > Apart from the extra parentheses, that makes sense to me. I can take it either way (but personally I think this particular one is easier to see as written---this is subjective). > ... > The rest looks good, and the comments are _really_ helpful. Yup, I agree. Thanks for a review.
On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > Hi Elijah, > > On Tue, 30 Mar 2021, Elijah Newren via GitGitGadget wrote: > > > From: Elijah Newren <newren@gmail.com> > > > > save_opts() should save any non-default values. It was intended to do > > this, but since most options in struct replay_opts default to 0, it only > > saved non-zero values. Unfortunately, this does not always work for > > options.edit. Roughly speaking, options.edit had a default value of 0 > > for cherry-pick but a default value of 1 for revert. Make save_opts() > > record a value whenever it differs from the default. > > > > options.edit was also overly simplistic; we had more than two cases. > > The behavior that previously existed was as follows: > > > > Non-conflict commits Right after Conflict > > revert Edit iff isatty(0) Edit (ignore isatty(0)) > > cherry-pick No edit See above > > Specify --edit Edit (ignore isatty(0)) See above > > Specify --no-edit (*) See above > > > > (*) Before stopping for conflicts, No edit is the behavior. After > > stopping for conflicts, the --no-edit flag is not saved so see > > the first two rows. > > > > However, the expected behavior is: > > > > Non-conflict commits Right after Conflict > > revert Edit iff isatty(0) Edit iff isatty(0) > > cherry-pick No edit Edit iff isatty(0) > > Specify --edit Edit (ignore isatty(0)) Edit (ignore isatty(0)) > > Specify --no-edit No edit No edit > > > > In order to get the expected behavior, we need to change options.edit > > to a tri-state: unspecified, false, or true. When specified, we follow > > what it says. When unspecified, we need to check whether the current > > commit being created is resolving a conflict as well as consulting > > options.action and isatty(0). While at it, add a should_edit() utility > > function that compresses options.edit down to a boolean based on the > > additional information for the non-conflict case. > > > > continue_single_pick() is the function responsible for resuming after > > conflict cases, regardless of whether there is one commit being picked > > or many. Make this function stop assuming edit behavior in all cases, > > so that it can correctly handle !isatty(0) and specific requests to not > > edit the commit message. > > Nicely explained! > > I'll allow myself one tangent: the subject of the sequencer's Unix shell > script heritage seems to come up with an increasing frequency, in > particular the awful "let's write out one file per setting" strategy. > > I would _love_ for `save_opts()` to write a JSON instead (or an INI via > the `git_config_*()` family of functions, as is done already by the > cherry-pick/revert stuff), now that we no longer have any shell script > backend (apart from `--preserve-merges`, but that one is on its way out > anyway). > > The one thing that concerns me with this idea is that I know for a fact > that some enterprisey users play games with those files inside > `<GIT_DIR>/rebase-merge` that should be considered internal implementation > details. Not sure how to deprecate that properly, I don't think we have a > sane way to detect whether users rely on these implementation details > other than breaking their expectations, which is not really a gentle way > to ask them to update their scripts. Ooh, I'm glad you took this tangent. May I follow it for a second? I'd really, really like this too, for three reasons: 1) I constantly get confused about the massive duplication and difference in control structures and which ones affect which type of operations in sequencer.c. Having them both use an ini-file would allow us to remove lots of that duplication. I'm sure there'll still be some rebase-specific or cherry-pick-specific options, but we don't need a separate parallel structure for every piece of config. 2) In my merge-ort optimization work, rebasing 35 commits in linux.git across a massive set of 26K upstream renames has dropped rename detection time from the top spot. And it also dropped several other things in the merge machinery from their spots as well. Do you know what's the slowest now? Wasted time from sequencer.c: the unnecessary process forking and all the useless disk writing (which I suspect is mostly updating the index and working directory but also writing all the individual control files under .git/rebase-merge/). And this stuff from sequencer.c is not just barely the slowest part, it's the slowest by a wide margin. 3) I also want to allow cherry-picking or rebasing branches that aren't even checked out (assuming no conflicts are triggered; otherwise an error can be shown with the user asked to repeat with the operation connected to an active working directory). For such an operation, the difference between "cherry-pick" and "rebase" is nearly irrelevant so you'd expect the code to be the same; every time I look at the code, though, it seems that the control structures are separating these two types of operations in more areas than just the reading and writing of the config. > > diff --git a/builtin/revert.c b/builtin/revert.c > > index 314a86c5621b..81441020231a 100644 > > --- a/builtin/revert.c > > +++ b/builtin/revert.c > > @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts) > > "--signoff", opts->signoff, > > "--no-commit", opts->no_commit, > > "-x", opts->record_origin, > > - "--edit", opts->edit, > > + "--edit", opts->edit == 1, > > Honestly, I'd prefer `> 0` here. WFM; I'll change it. > > > NULL); > > > > if (cmd) { > > @@ -230,8 +230,6 @@ int cmd_revert(int argc, const char **argv, const char *prefix) > > struct replay_opts opts = REPLAY_OPTS_INIT; > > int res; > > > > - if (isatty(0)) > > - opts.edit = 1; > > opts.action = REPLAY_REVERT; > > sequencer_init_config(&opts); > > res = run_sequencer(argc, argv, &opts); > > diff --git a/sequencer.c b/sequencer.c > > index 848204d3dc3f..d444c778a097 100644 > > --- a/sequencer.c > > +++ b/sequencer.c > > @@ -1860,14 +1860,26 @@ static void record_in_rewritten(struct object_id *oid, > > flush_rewritten_pending(); > > } > > > > +static int should_edit(struct replay_opts *opts) { > > + assert(opts->edit >= -1 && opts->edit <= 1); > > Do we really want to introduce more of these useless `assert()`s? I know > that we stopped converting them to `BUG()`, but I really dislike > introducing new ones: they have very little effect, being no-ops by > default in most setups. What do you mean by "useless"? They serve two purposes: * It's a very concise comment understood by readers of the code * Many asserts are helpful guardrails for future people attempting to change the code and assumptions they were written under Neither of these have anything to do with users running code in production -- if that mattered, I would have used BUG(). Since production use isn't relevant here, I could use code comments, but those tend to be much more verbose, and less helpful in catching areas where assumptions were important to the code for any future folks (possibly myself in a few years) who want to change the code in ways that might call into question previous assumptions. > > + if (opts->edit == -1) > > Maybe `< 0`, as we do elsewhere for "not specified"? Makes sense; will change. > > + /* > > + * Note that we only handle the case of non-conflicted > > + * commits; continue_single_pick() handles the conflicted > > + * commits itself instead of calling this function. > > + */ > > + return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0; > > Apart from the extra parentheses, that makes sense to me. The extra parentheses make it more readable to me; since the statement still makes sense to you, I'll leave them in. :-) > > + return opts->edit; > > +} > > + > > static int do_pick_commit(struct repository *r, > > enum todo_command command, > > struct commit *commit, > > struct replay_opts *opts, > > int final_fixup, int *check_todo) > > { > > - unsigned int flags = opts->edit ? EDIT_MSG : 0; > > - const char *msg_file = opts->edit ? NULL : git_path_merge_msg(r); > > + unsigned int flags = should_edit(opts) ? EDIT_MSG : 0; > > + const char *msg_file = should_edit(opts) ? NULL : git_path_merge_msg(r); > > struct object_id head; > > struct commit *base, *next, *parent; > > const char *base_label, *next_label; > > @@ -3101,9 +3113,9 @@ static int save_opts(struct replay_opts *opts) > > if (opts->no_commit) > > res |= git_config_set_in_file_gently(opts_file, > > "options.no-commit", "true"); > > - if (opts->edit) > > - res |= git_config_set_in_file_gently(opts_file, > > - "options.edit", "true"); > > + if (opts->edit != -1) > > s/!= -1/>= 0/ Will fix. > > + res |= git_config_set_in_file_gently(opts_file, "options.edit", > > + opts->edit ? "true" : "false"); > > if (opts->allow_empty) > > res |= git_config_set_in_file_gently(opts_file, > > "options.allow-empty", "true"); > > @@ -4077,7 +4089,7 @@ static int pick_commits(struct repository *r, > > prev_reflog_action = xstrdup(getenv(GIT_REFLOG_ACTION)); > > if (opts->allow_ff) > > assert(!(opts->signoff || opts->no_commit || > > - opts->record_origin || opts->edit || > > + opts->record_origin || should_edit(opts) || > > opts->committer_date_is_author_date || > > opts->ignore_date)); > > if (read_and_refresh_cache(r, opts)) > > @@ -4370,14 +4382,35 @@ static int pick_commits(struct repository *r, > > return sequencer_remove_state(opts); > > } > > > > -static int continue_single_pick(struct repository *r) > > +static int continue_single_pick(struct repository *r, struct replay_opts *opts) > > { > > - const char *argv[] = { "commit", NULL }; > > + struct strvec argv = STRVEC_INIT; > > + int want_edit; > > Do we really want that extra `want_edit` variable? I think the code would > be easier to read without it, and still be obvious enough. > > > + int ret; > > > > if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") && > > !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) > > return error(_("no cherry-pick or revert in progress")); > > - return run_command_v_opt(argv, RUN_GIT_CMD); > > + > > + strvec_push(&argv, "commit"); > > + > > + /* > > + * continue_single_pick() handles the case of recovering from a > > + * conflict. should_edit() doesn't handle that case; for a conflict, > > + * we want to edit if the user asked for it, or if they didn't specify > > + * and stdin is a tty. > > + */ > > + want_edit = (opts->edit == 1) || ((opts->edit == -1) && isatty(0)); > > + if (!want_edit) > > Here is what I would prefer: > > if (!opts->edit || (opts->edit < 0 && !isatty(0))) Given the changes elsewhere to use >0 as true, <0 as unspecified, and ==0 for false, this change makes perfect sense. I'll include it. > The rest looks good, and the comments are _really_ helpful. > > And the remainder of the patch also looks good, so I will spare readers > time by not even quoting it. > > Thank you! > Dscho
On Tue, Mar 30, 2021 at 11:47 AM Junio C Hamano <gitster@pobox.com> wrote: > > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: > > >> @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts) > >> "--signoff", opts->signoff, > >> "--no-commit", opts->no_commit, > >> "-x", opts->record_origin, > >> - "--edit", opts->edit, > >> + "--edit", opts->edit == 1, > > > > Honestly, I'd prefer `> 0` here. > > Unless somebody (including Elijah) is trying to soon introduce yet > another value to .edit member, I'd agree 100%. If it is a tristate > (unspecified, no, yes), I think "is it positive" should be the way > to ask "does the user definitely wants it?", "is it zero" should be > the way to ask "does the user definitely declines it?" and "is it > non-negative" (and "is it negative") the way to ask "does the user > care (or not care)?". Using that consistently is good. Sounds good; I'll switch it over. > >> +static int should_edit(struct replay_opts *opts) { > >> + assert(opts->edit >= -1 && opts->edit <= 1); > > > > Do we really want to introduce more of these useless `assert()`s? I know > > that we stopped converting them to `BUG()`, but I really dislike > > introducing new ones: they have very little effect, being no-ops by > > default in most setups. > > Yeah, in a new code in flux where programmers can easily make > errors, "if (...) BUG()" may not be a bad thing to add (but then we > may want to see if we can make the codepaths involved less error > prone), but I agree with your view that assert() is mostly useless. > A comment that explains the expectation and why that expectation is > there would be more useful. Since you both don't like this assert, I'll remove it. But I strongly disagree that assert is useless in general. If you two have such a strong reaction to assert statements, though, would you two prefer that I add a new affirm() function that is ALSO compiled out in production? Because I really want to use one of those. My operating assumptions with asserts are the following: 1) If the check is relevant for production, assert() statements should NOT be used; if(...) BUG() should be used instead. 2) assert statements will be compiled out in production, almost always 2a) NOTE: don't make asserts expensive, since a few production users will keep them 3) assert statements will be active for future me and some other folks doing active git development Do you two disagree with any of those operating assumptions? I find asserts very valuable because: * It's a _concise_ code comment that is readily understood. Any attempt to word in English the same thing that an assert statement checks always takes longer * It helps future folks tweaking the rules to catch additional locations where assumptions were made about the old rules. In the development of merge-ort, for example, asserts shortened my debugging cycles as I found and attempted new optimizations or added new features or changed data structures and so on. The checks were _only_ assists while developing; once the code is right, the checks could be removed. But future development might occur, so it'd be nice to have a way to keep the checks in the code just for those future developers while production users remove them. In particular, for merge-ort, I think the second point is very helpful. What can achieve the "remove these now-unnecessary checks from the code for production, but keep them there for future development"? I thought assert() was created exactly for this purpose. Would you rather I created an affirm() that does essentially the same thing and is compiled out unless DEVELOPER=1? That would allow us to declare all assert() calls in the code as buggy, but I'm not sure affirm() is as readily understood by developers reading the code as "ooh, a reminder I get to assume these statements are true while I'm reading the rest of the code". > >> + if (opts->edit == -1) > > > > Maybe `< 0`, as we do elsewhere for "not specified"? > > Yup. > > >> + /* > >> + * Note that we only handle the case of non-conflicted > >> + * commits; continue_single_pick() handles the conflicted > >> + * commits itself instead of calling this function. > >> + */ > >> + return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0; > > > > Apart from the extra parentheses, that makes sense to me. > > I can take it either way (but personally I think this particular one > is easier to see as written---this is subjective). > > > ... > > The rest looks good, and the comments are _really_ helpful. > > Yup, I agree. > > Thanks for a review. Indeed; and thanks to you as well Junio for all your time reviewing.
Hi, On Tue, 30 Mar 2021, Elijah Newren wrote: > On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > I'll allow myself one tangent: the subject of the sequencer's Unix > > shell script heritage seems to come up with an increasing frequency, > > in particular the awful "let's write out one file per setting" > > strategy. > > > > I would _love_ for `save_opts()` to write a JSON instead (or an INI > > via the `git_config_*()` family of functions, as is done already by > > the cherry-pick/revert stuff), now that we no longer have any shell > > script backend (apart from `--preserve-merges`, but that one is on its > > way out anyway). > > > > The one thing that concerns me with this idea is that I know for a > > fact that some enterprisey users play games with those files inside > > `<GIT_DIR>/rebase-merge` that should be considered internal > > implementation details. Not sure how to deprecate that properly, I > > don't think we have a sane way to detect whether users rely on these > > implementation details other than breaking their expectations, which > > is not really a gentle way to ask them to update their scripts. > > Ooh, I'm glad you took this tangent. May I follow it for a second? > I'd really, really like this too, for three reasons: > > 1) I constantly get confused about the massive duplication and > difference in control structures and which ones affect which type of > operations in sequencer.c. Having them both use an ini-file would > allow us to remove lots of that duplication. I'm sure there'll still > be some rebase-specific or cherry-pick-specific options, but we don't > need a separate parallel structure for every piece of config. > > 2) In my merge-ort optimization work, rebasing 35 commits in linux.git > across a massive set of 26K upstream renames has dropped rename > detection time from the top spot. And it also dropped several other > things in the merge machinery from their spots as well. Do you know > what's the slowest now? Wasted time from sequencer.c: the unnecessary > process forking and all the useless disk writing (which I suspect is > mostly updating the index and working directory but also writing all > the individual control files under .git/rebase-merge/). And this > stuff from sequencer.c is not just barely the slowest part, it's the > slowest by a wide margin. > > 3) I also want to allow cherry-picking or rebasing branches that > aren't even checked out (assuming no conflicts are triggered; > otherwise an error can be shown with the user asked to repeat with the > operation connected to an active working directory). For such an > operation, the difference between "cherry-pick" and "rebase" is nearly > irrelevant so you'd expect the code to be the same; every time I look > at the code, though, it seems that the control structures are > separating these two types of operations in more areas than just the > reading and writing of the config. Excellent, we're in agreement, then. The remaining question is: how do we want to go about it? Do we just want to codify the notion that these are internal details that are nobody's business, and if they are using the exact file system layout of the current sequencer, then it's their responsibility to adapt? Ciao, Dscho
On Tue, Mar 30 2021, Elijah Newren wrote: > On Tue, Mar 30, 2021 at 11:47 AM Junio C Hamano <gitster@pobox.com> wrote: >> >> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> >> >> @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts) >> >> "--signoff", opts->signoff, >> >> "--no-commit", opts->no_commit, >> >> "-x", opts->record_origin, >> >> - "--edit", opts->edit, >> >> + "--edit", opts->edit == 1, >> > >> > Honestly, I'd prefer `> 0` here. >> >> Unless somebody (including Elijah) is trying to soon introduce yet >> another value to .edit member, I'd agree 100%. If it is a tristate >> (unspecified, no, yes), I think "is it positive" should be the way >> to ask "does the user definitely wants it?", "is it zero" should be >> the way to ask "does the user definitely declines it?" and "is it >> non-negative" (and "is it negative") the way to ask "does the user >> care (or not care)?". Using that consistently is good. > > Sounds good; I'll switch it over. > >> >> +static int should_edit(struct replay_opts *opts) { >> >> + assert(opts->edit >= -1 && opts->edit <= 1); >> > >> > Do we really want to introduce more of these useless `assert()`s? I know >> > that we stopped converting them to `BUG()`, but I really dislike >> > introducing new ones: they have very little effect, being no-ops by >> > default in most setups. >> >> Yeah, in a new code in flux where programmers can easily make >> errors, "if (...) BUG()" may not be a bad thing to add (but then we >> may want to see if we can make the codepaths involved less error >> prone), but I agree with your view that assert() is mostly useless. >> A comment that explains the expectation and why that expectation is >> there would be more useful. > > Since you both don't like this assert, I'll remove it. But I strongly > disagree that assert is useless in general. If you two have such a > strong reaction to assert statements, though, would you two prefer > that I add a new affirm() function that is ALSO compiled out in > production? Because I really want to use one of those. My operating > assumptions with asserts are the following: > > 1) If the check is relevant for production, assert() statements should > NOT be used; if(...) BUG() should be used instead. > 2) assert statements will be compiled out in production, almost always > 2a) NOTE: don't make asserts expensive, since a few production users > will keep them > 3) assert statements will be active for future me and some other folks > doing active git development > > Do you two disagree with any of those operating assumptions? I find > asserts very valuable because: > > * It's a _concise_ code comment that is readily understood. Any > attempt to word in English the same thing that an assert statement > checks always takes longer > * It helps future folks tweaking the rules to catch additional > locations where assumptions were made about the old rules. In the > development of merge-ort, for example, asserts shortened my debugging > cycles as I found and attempted new optimizations or added new > features or changed data structures and so on. The checks were _only_ > assists while developing; once the code is right, the checks could be > removed. But future development might occur, so it'd be nice to have > a way to keep the checks in the code just for those future developers > while production users remove them. > > In particular, for merge-ort, I think the second point is very > helpful. What can achieve the "remove these now-unnecessary checks > from the code for production, but keep them there for future > development"? I thought assert() was created exactly for this > purpose. Would you rather I created an affirm() that does essentially > the same thing and is compiled out unless DEVELOPER=1? That would > allow us to declare all assert() calls in the code as buggy, but I'm > not sure affirm() is as readily understood by developers reading the > code as "ooh, a reminder I get to assume these statements are true > while I'm reading the rest of the code". I don't mind the asserts, or to have them in the default build. But if you'd like to submit patches for asserts and can't otherwise get them accepted, then can we please not make DEVELOPER a thing that you can't turn on in production without thinking twice? Per my https://lore.kernel.org/git/87wnusj6gt.fsf@evledraar.gmail.com/ >> >> + if (opts->edit == -1) >> > >> > Maybe `< 0`, as we do elsewhere for "not specified"? >> >> Yup. >> >> >> + /* >> >> + * Note that we only handle the case of non-conflicted >> >> + * commits; continue_single_pick() handles the conflicted >> >> + * commits itself instead of calling this function. >> >> + */ >> >> + return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0; >> > >> > Apart from the extra parentheses, that makes sense to me. >> >> I can take it either way (but personally I think this particular one >> is easier to see as written---this is subjective). >> >> > ... >> > The rest looks good, and the comments are _really_ helpful. >> >> Yup, I agree. >> >> Thanks for a review. > > Indeed; and thanks to you as well Junio for all your time reviewing.
On Wed, Mar 31, 2021 at 10:36 AM Ævar Arnfjörð Bjarmason <avarab@gmail.com> wrote: > > On Tue, Mar 30 2021, Elijah Newren wrote: > > > In particular, for merge-ort, I think the second point is very > > helpful. What can achieve the "remove these now-unnecessary checks > > from the code for production, but keep them there for future > > development"? I thought assert() was created exactly for this > > purpose. Would you rather I created an affirm() that does essentially > > the same thing and is compiled out unless DEVELOPER=1? That would > > allow us to declare all assert() calls in the code as buggy, but I'm > > not sure affirm() is as readily understood by developers reading the > > code as "ooh, a reminder I get to assume these statements are true > > while I'm reading the rest of the code". > > I don't mind the asserts, or to have them in the default build. > > But if you'd like to submit patches for asserts and can't otherwise get > them accepted, then can we please not make DEVELOPER a thing that you > can't turn on in production without thinking twice? Per my > https://lore.kernel.org/git/87wnusj6gt.fsf@evledraar.gmail.com/ Fair enough; if we have to go the affirm() route, I should probably just make it depend on NDEBUG. :-)
Elijah Newren <newren@gmail.com> writes: > ... would you two prefer > that I add a new affirm() function that is ALSO compiled out in > production? The reason I do not think affirm would fly is because it shares the most problematic trait with assert(): it can be comipled out. There was at least one bug we found and fixed in a piece of code that came later in a codepath that had an assert() in it. The buggy code (IIRC, it was added way after the assert() was written in an unrelated patch) was unknowingly depending on a side-effect of an innocuous function call (i.e. assert(func_tion(args) == expected)) made in the assert condition, and the bug did not reproduce in the developer's environment.
On Wed, Mar 31, 2021 at 11:01 AM Junio C Hamano <gitster@pobox.com> wrote: > > Elijah Newren <newren@gmail.com> writes: > > > ... would you two prefer > > that I add a new affirm() function that is ALSO compiled out in > > production? > > The reason I do not think affirm would fly is because it shares the > most problematic trait with assert(): it can be comipled out. > > There was at least one bug we found and fixed in a piece of code > that came later in a codepath that had an assert() in it. The buggy > code (IIRC, it was added way after the assert() was written in an > unrelated patch) was unknowingly depending on a side-effect of an > innocuous function call (i.e. assert(func_tion(args) == expected)) > made in the assert condition, and the bug did not reproduce in the > developer's environment. Ah, now that's a much different argument than claiming assert is useless. I had to debug one of those once about a decade ago, in a different project, put in by some other developer, and yeah it's painful. If that causes assert() to be frowned upon, then I can understand based on this reason. If so, perhaps a BUG_ON() function would be useful? The "if (...) BUG()" construct causes line coverage problems and I find it an onerous way to signal to future code readers "hey, just as a reminder, these things should be true at this point". If you need to signal something more than that or explain why the things being false is a problem or what might cause it, then the "if (...) BUG()" construct is useful and there are certainly many places for that, but it just doesn't fit all the cases.
Hi Dscho and Elijah On 31/03/2021 14:48, Johannes Schindelin wrote: > Hi, > > On Tue, 30 Mar 2021, Elijah Newren wrote: > >> On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin >> <Johannes.Schindelin@gmx.de> wrote: >> >>> I'll allow myself one tangent: the subject of the sequencer's Unix >>> shell script heritage seems to come up with an increasing frequency, >>> in particular the awful "let's write out one file per setting" >>> strategy. >>> >>> I would _love_ for `save_opts()` to write a JSON instead (or an INI >>> via the `git_config_*()` family of functions, as is done already by >>> the cherry-pick/revert stuff), now that we no longer have any shell >>> script backend (apart from `--preserve-merges`, but that one is on its >>> way out anyway). >>> >>> The one thing that concerns me with this idea is that I know for a >>> fact that some enterprisey users play games with those files inside >>> `<GIT_DIR>/rebase-merge` that should be considered internal >>> implementation details. Not sure how to deprecate that properly, I >>> don't think we have a sane way to detect whether users rely on these >>> implementation details other than breaking their expectations, which >>> is not really a gentle way to ask them to update their scripts. I think it depends if users are just reading the files or writing to them. If they are reading them (which my scripts do) then we could continue to write the important files along side the new single-file that git actually reads. I think there is a distinction between the files such as git-rebase-todo which hold state and the one-line files which hold the options passed by the user. For example I have scripts that read git-rebase-todo, rewritten-pending, rewritten-list, amend-head and check that author-script exists. If a script starts a rebase then it should know what options are in effect without reading them from .git/rebase-merge. >> Ooh, I'm glad you took this tangent. May I follow it for a second? >> I'd really, really like this too, for three reasons: >> >> 1) I constantly get confused about the massive duplication and >> difference in control structures and which ones affect which type of >> operations in sequencer.c. Having them both use an ini-file would >> allow us to remove lots of that duplication. I'm sure there'll still >> be some rebase-specific or cherry-pick-specific options, but we don't >> need a separate parallel structure for every piece of config. >> >> 2) In my merge-ort optimization work, rebasing 35 commits in linux.git >> across a massive set of 26K upstream renames has dropped rename >> detection time from the top spot. And it also dropped several other >> things in the merge machinery from their spots as well. Do you know >> what's the slowest now? Wasted time from sequencer.c: the unnecessary >> process forking and all the useless disk writing (which I suspect is >> mostly updating the index and working directory but also writing all >> the individual control files under .git/rebase-merge/). And this >> stuff from sequencer.c is not just barely the slowest part, it's the >> slowest by a wide margin. I think we do a lot of needless writing which is unrelated to whether we write to one file or may files. For example from memory picking a commit involves writing the 'message', 'author-script', 'rewritten-pending' (which is often immediately deleted), 'rewritten-list' (we append to that one) 'CHERRY_PICK_HEAD' (which is immediately deleted unless the pick has become empty), 'git-rebase-todo' is completely rewritten, and 'done' is appended to. None of this is necessary. For rewording and merges the only thing that needs to be written is the commit message for the external process to use. Fixup and squash add a couple more files into the mix. I think we would save a lot by only syncing the state to disk when we stop or run an exec command (the state needs to be synced so exec commands can alter the todo list). In those cases we need to write the index and possibly run an external process so writing a couple of files is probably insignificant. Where I think we can usefully consolidate is the one-line files which store the options rather than state - these are read an written much less frequently so I don't think they have much of a performance hit but it would be much nicer to just serialize the options to a single file. >> >> 3) I also want to allow cherry-picking or rebasing branches that >> aren't even checked out (assuming no conflicts are triggered; >> otherwise an error can be shown with the user asked to repeat with the >> operation connected to an active working directory). Exciting! >> For such an >> operation, the difference between "cherry-pick" and "rebase" is nearly >> irrelevant so you'd expect the code to be the same; every time I look >> at the code, though, it seems that the control structures are >> separating these two types of operations in more areas than just the >> reading and writing of the config. Yes this can be confusing, for example rebase and cherry-pick handle the todo list differently. Rebase removes the command before trying to pick the commit and adds it back if the pick fails for a non-conflict reason, cherry-pick only removes the command if the pick is successful. Best Wishes Phillip > Excellent, we're in agreement, then. > > The remaining question is: how do we want to go about it? Do we just want > to codify the notion that these are internal details that are nobody's > business, and if they are using the exact file system layout of the > current sequencer, then it's their responsibility to adapt? > > Ciao, > Dscho >
one more thought below... On 02/04/2021 12:28, Phillip Wood wrote: > Hi Dscho and Elijah > > On 31/03/2021 14:48, Johannes Schindelin wrote: >> Hi, >> >> On Tue, 30 Mar 2021, Elijah Newren wrote: >> >>> On Tue, Mar 30, 2021 at 3:13 AM Johannes Schindelin >>> <Johannes.Schindelin@gmx.de> wrote: >>> >>>> I'll allow myself one tangent: the subject of the sequencer's Unix >>>> shell script heritage seems to come up with an increasing frequency, >>>> in particular the awful "let's write out one file per setting" >>>> strategy. >>>> >>>> I would _love_ for `save_opts()` to write a JSON instead (or an INI >>>> via the `git_config_*()` family of functions, as is done already by >>>> the cherry-pick/revert stuff), now that we no longer have any shell >>>> script backend (apart from `--preserve-merges`, but that one is on its >>>> way out anyway). >>>> >>>> The one thing that concerns me with this idea is that I know for a >>>> fact that some enterprisey users play games with those files inside >>>> `<GIT_DIR>/rebase-merge` that should be considered internal >>>> implementation details. Not sure how to deprecate that properly, I >>>> don't think we have a sane way to detect whether users rely on these >>>> implementation details other than breaking their expectations, which >>>> is not really a gentle way to ask them to update their scripts. > > I think it depends if users are just reading the files or writing to > them. If they are reading them (which my scripts do) then we could > continue to write the important files along side the new single-file > that git actually reads. I think there is a distinction between the > files such as git-rebase-todo which hold state and the one-line files > which hold the options passed by the user. For example I have scripts > that read git-rebase-todo, rewritten-pending, rewritten-list, amend-head > and check that author-script exists. If a script starts a rebase then it > should know what options are in effect without reading them from > .git/rebase-merge. > >>> Ooh, I'm glad you took this tangent. May I follow it for a second? >>> I'd really, really like this too, for three reasons: >>> >>> 1) I constantly get confused about the massive duplication and >>> difference in control structures and which ones affect which type of >>> operations in sequencer.c. Having them both use an ini-file would >>> allow us to remove lots of that duplication. I'm sure there'll still >>> be some rebase-specific or cherry-pick-specific options, but we don't >>> need a separate parallel structure for every piece of config. One thing to bear in mind is that you can cherry-pick or revert a sequence of commits while rebasing - I think that means we need to store the state for rebase in a separate location to that for cherry-pick/revert Best Wishes Phillip >>> 2) In my merge-ort optimization work, rebasing 35 commits in linux.git >>> across a massive set of 26K upstream renames has dropped rename >>> detection time from the top spot. And it also dropped several other >>> things in the merge machinery from their spots as well. Do you know >>> what's the slowest now? Wasted time from sequencer.c: the unnecessary >>> process forking and all the useless disk writing (which I suspect is >>> mostly updating the index and working directory but also writing all >>> the individual control files under .git/rebase-merge/). And this >>> stuff from sequencer.c is not just barely the slowest part, it's the >>> slowest by a wide margin. > > I think we do a lot of needless writing which is unrelated to whether we > write to one file or may files. For example from memory picking a commit > involves writing the 'message', 'author-script', 'rewritten-pending' > (which is often immediately deleted), 'rewritten-list' (we append to > that one) 'CHERRY_PICK_HEAD' (which is immediately deleted unless the > pick has become empty), 'git-rebase-todo' is completely rewritten, and > 'done' is appended to. None of this is necessary. For rewording and > merges the only thing that needs to be written is the commit message for > the external process to use. Fixup and squash add a couple more files > into the mix. > > I think we would save a lot by only syncing the state to disk when we > stop or run an exec command (the state needs to be synced so exec > commands can alter the todo list). In those cases we need to write the > index and possibly run an external process so writing a couple of files > is probably insignificant. > > Where I think we can usefully consolidate is the one-line files which > store the options rather than state - these are read an written much > less frequently so I don't think they have much of a performance hit but > it would be much nicer to just serialize the options to a single file. > >>> >>> 3) I also want to allow cherry-picking or rebasing branches that >>> aren't even checked out (assuming no conflicts are triggered; >>> otherwise an error can be shown with the user asked to repeat with the >>> operation connected to an active working directory). > > Exciting! > >>> For such an >>> operation, the difference between "cherry-pick" and "rebase" is nearly >>> irrelevant so you'd expect the code to be the same; every time I look >>> at the code, though, it seems that the control structures are >>> separating these two types of operations in more areas than just the >>> reading and writing of the config. > > Yes this can be confusing, for example rebase and cherry-pick handle the > todo list differently. Rebase removes the command before trying to pick > the commit and adds it back if the pick fails for a non-conflict reason, > cherry-pick only removes the command if the pick is successful. > > Best Wishes > > Phillip > >> Excellent, we're in agreement, then. >> >> The remaining question is: how do we want to go about it? Do we just want >> to codify the notion that these are internal details that are nobody's >> business, and if they are using the exact file system layout of the >> current sequencer, then it's their responsibility to adapt? >> >> Ciao, >> Dscho >> >
Phillip Wood <phillip.wood123@gmail.com> writes: > I think we would save a lot by only syncing the state to disk when we > stop or run an exec command (the state needs to be synced so exec > commands can alter the todo list). In those cases we need to write the > index and possibly run an external process so writing a couple of > files is probably insignificant. The optimization opportunity of this may be a lot smaller than you would think---you must cater to not just exec but hook scripts that are run while a new commit is made, which means every step you'd need to write anyway. > Where I think we can usefully consolidate is the one-line files which > store the options rather than state - these are read an written much > less frequently so I don't think they have much of a performance hit > but it would be much nicer to just serialize the options to a single > file. Would that break external scripts, hooks, etc.? I am not sure if we even have any rough consensus for allowing other people to peek into the .git/rebase-*/ directories, but I am inclined to say that it sounds more like a solution looking for a problem than a good idea to solve some concrete problem.
On Fri, Apr 2, 2021 at 2:01 PM Junio C Hamano <gitster@pobox.com> wrote: > > Phillip Wood <phillip.wood123@gmail.com> writes: > > > I think we would save a lot by only syncing the state to disk when we > > stop or run an exec command (the state needs to be synced so exec > > commands can alter the todo list). In those cases we need to write the > > index and possibly run an external process so writing a couple of > > files is probably insignificant. > > The optimization opportunity of this may be a lot smaller than you > would think---you must cater to not just exec but hook scripts that > are run while a new commit is made, which means every step you'd > need to write anyway. From Documentation/git-rebase.txt, right after discussing how the backends disagree on which hooks are called and how they are called: ...In each case, the calling of these hooks was by accident of implementation rather than by design (both backends were originally implemented as shell scripts and happened to invoke other commands like 'git checkout' or 'git commit' that would call the hooks). Both backends should have the same behavior, though it is not entirely clear which, if any, is correct. We will likely make rebase stop calling either of these hooks in the future. Even if others now disagree with the above, I know I can get a huge speedup by changing sequencer to stop per-commit wasteful work (stop forking processes like "git commit", don't write control structures if the rebase hasn't ended or hit a conflict, don't update the working copy or index or reflog). It's enough of a speedup that if backward compatibility won't allow such a method to be used by default, I'd still make yet another backend that could be optionally used. And I'd have the default rebase and cherry-pick start printing annoying deprecation notices so that users become aware of a faster option. Should I go that route? t/helpers/test-fast-rebase.c already has a good start... > > Where I think we can usefully consolidate is the one-line files which > > store the options rather than state - these are read an written much > > less frequently so I don't think they have much of a performance hit > > but it would be much nicer to just serialize the options to a single > > file. > > Would that break external scripts, hooks, etc.? I am not sure if we > even have any rough consensus for allowing other people to peek into > the .git/rebase-*/ directories, but I am inclined to say that it > sounds more like a solution looking for a problem than a good idea > to solve some concrete problem. To be honest, my bigger complaint with the non-unified backend config is how far it bled into the code and how difficult it is to determine how things are controlled and which sections of code are relevant for which types of operations. And how much of a pain I'm worried it'd make the "allow rebasing and cherry-picking in a bare repository (or for an unchecked-out branch)" functionality. Perhaps the code can be cleaned up without changing these on-disk structures, in which case my concern for this point would lessen.
Elijah Newren <newren@gmail.com> writes: > Even if others now disagree with the above, I know I can get a huge > speedup by changing sequencer to stop per-commit wasteful work (stop > forking processes like "git commit", don't write control structures if > the rebase hasn't ended or hit a conflict, don't update the working > copy or index or reflog). It's enough of a speedup that if backward > compatibility won't allow such a method to be used by default, I'd > still make yet another backend that could be optionally used. And I'd > have the default rebase and cherry-pick start printing annoying > deprecation notices so that users become aware of a faster option. A faster and less powerful interface is good; I doubt deprecation would work well. If a workflow depends on things like post-commit hook, the affected users deserve some way to migrate to --exec or whatever method to compensate the loss of functionality.
Hi, On Fri, 2 Apr 2021, Junio C Hamano wrote: > Elijah Newren <newren@gmail.com> writes: > > > Even if others now disagree with the above, I know I can get a huge > > speedup by changing sequencer to stop per-commit wasteful work (stop > > forking processes like "git commit", don't write control structures if > > the rebase hasn't ended or hit a conflict, don't update the working > > copy or index or reflog). It's enough of a speedup that if backward > > compatibility won't allow such a method to be used by default, I'd > > still make yet another backend that could be optionally used. And I'd > > have the default rebase and cherry-pick start printing annoying > > deprecation notices so that users become aware of a faster option. > > A faster and less powerful interface is good; I doubt deprecation > would work well. If a workflow depends on things like post-commit > hook, the affected users deserve some way to migrate to --exec or > whatever method to compensate the loss of functionality. I could imagine that there is opportunity to "persist on disk only when needed". For example, if no `pre-commit` hook is installed that needs to be run, there is no need to update the worktree nor HEAD until the rebase is done. And this type of `only write to disk when needed` functionality could probably be abstracted away so much as to make the rest of the code look elegant again. Ciao, Dscho
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes: >> > ... It's enough of a speedup that if backward >> > compatibility won't allow such a method to be used by default, I'd >> > still make yet another backend that could be optionally used. And I'd >> > have the default rebase and cherry-pick start printing annoying >> > deprecation notices so that users become aware of a faster option. >> >> A faster and less powerful interface is good; I doubt deprecation >> would work well. If a workflow depends on things like post-commit >> hook, the affected users deserve some way to migrate to --exec or >> whatever method to compensate the loss of functionality. > > I could imagine that there is opportunity to "persist on disk only when > needed". For example, if no `pre-commit` hook is installed that needs to > be run, there is no need to update the worktree nor HEAD until the rebase > is done. > > And this type of `only write to disk when needed` functionality could > probably be abstracted away so much as to make the rest of the code > look elegant again. Yes, we are on the same page. What Elijah proposed as "another backend" would unfortunately be different, and needs to be adjusted with such an "only when needed" tweak. The check hopefully needs to be done only once (e.g. do we have this hook enabled? do we have that hook enabled? do we have a commit with this trait in the range being worked on? etc. etc.) upfront and for certain workflows may not require any persistence. And until that happens, "annoying deprecation notices" will never be a viable step to take.
Hi, On Thu, Apr 8, 2021 at 2:22 PM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > > On Fri, 2 Apr 2021, Junio C Hamano wrote: > > > Elijah Newren <newren@gmail.com> writes: > > > > > Even if others now disagree with the above, I know I can get a huge > > > speedup by changing sequencer to stop per-commit wasteful work (stop > > > forking processes like "git commit", don't write control structures if > > > the rebase hasn't ended or hit a conflict, don't update the working > > > copy or index or reflog). It would be interesting to know which of these updates (working copy, index or reflog) is the most costly. Of course the best would be to compare the costs when updates are done at each step vs only at the end. Some years ago I worked on the split index as some tests showed that, for really big repos with a big index, it could significantly speed up rebases. So I have the, perhaps wrong, impression that most of the gain would be related to the index. > > > It's enough of a speedup that if backward > > > compatibility won't allow such a method to be used by default, I'd > > > still make yet another backend that could be optionally used. And I'd > > > have the default rebase and cherry-pick start printing annoying > > > deprecation notices so that users become aware of a faster option. > > > > A faster and less powerful interface is good; I doubt deprecation > > would work well. If a workflow depends on things like post-commit > > hook, the affected users deserve some way to migrate to --exec or > > whatever method to compensate the loss of functionality. > > I could imagine that there is opportunity to "persist on disk only when > needed". For example, if no `pre-commit` hook is installed that needs to > be run, there is no need to update the worktree nor HEAD until the rebase > is done. > > And this type of `only write to disk when needed` functionality could > probably be abstracted away so much as to make the rest of the code > look elegant again. Yeah, I agree with this approach too. If the split index could also become the default one day, I guess we could be doing pretty well even when some hooks are installed.
Hi Christian, On Thu, 8 Apr 2021, Christian Couder wrote: > On Thu, Apr 8, 2021 at 2:22 PM Johannes Schindelin > <Johannes.Schindelin@gmx.de> wrote: > > > > On Fri, 2 Apr 2021, Junio C Hamano wrote: > > > > > Elijah Newren <newren@gmail.com> writes: > > > > > > > Even if others now disagree with the above, I know I can get a huge > > > > speedup by changing sequencer to stop per-commit wasteful work (stop > > > > forking processes like "git commit", don't write control structures if > > > > the rebase hasn't ended or hit a conflict, don't update the working > > > > copy or index or reflog). > > It would be interesting to know which of these updates (working copy, > index or reflog) is the most costly. Of course the best would be to > compare the costs when updates are done at each step vs only at the > end. I would bet that the worktree updates are the most costly thing. Note: there is also the step where new objects are written to the object database, but that is a step that we can _not_ skip. > If the split index could also become the default one day, I guess we > could be doing pretty well even when some hooks are installed. Hmm. My opinion about the split index does not match yours. I'd much rather see us tackle the challenge of allowing truly incremental index updates. That "split index" feature feels like a bandaid that tries to do sort of an incremental update, and fails at it because it is not truly incremental. Granted, that would look much more like a real database and is much, much harder than what the split index code does (think e.g. locking). Ciao, Dscho
diff --git a/builtin/revert.c b/builtin/revert.c index 314a86c5621b..81441020231a 100644 --- a/builtin/revert.c +++ b/builtin/revert.c @@ -182,7 +182,7 @@ static int run_sequencer(int argc, const char **argv, struct replay_opts *opts) "--signoff", opts->signoff, "--no-commit", opts->no_commit, "-x", opts->record_origin, - "--edit", opts->edit, + "--edit", opts->edit == 1, NULL); if (cmd) { @@ -230,8 +230,6 @@ int cmd_revert(int argc, const char **argv, const char *prefix) struct replay_opts opts = REPLAY_OPTS_INIT; int res; - if (isatty(0)) - opts.edit = 1; opts.action = REPLAY_REVERT; sequencer_init_config(&opts); res = run_sequencer(argc, argv, &opts); diff --git a/sequencer.c b/sequencer.c index 848204d3dc3f..d444c778a097 100644 --- a/sequencer.c +++ b/sequencer.c @@ -1860,14 +1860,26 @@ static void record_in_rewritten(struct object_id *oid, flush_rewritten_pending(); } +static int should_edit(struct replay_opts *opts) { + assert(opts->edit >= -1 && opts->edit <= 1); + if (opts->edit == -1) + /* + * Note that we only handle the case of non-conflicted + * commits; continue_single_pick() handles the conflicted + * commits itself instead of calling this function. + */ + return (opts->action == REPLAY_REVERT && isatty(0)) ? 1 : 0; + return opts->edit; +} + static int do_pick_commit(struct repository *r, enum todo_command command, struct commit *commit, struct replay_opts *opts, int final_fixup, int *check_todo) { - unsigned int flags = opts->edit ? EDIT_MSG : 0; - const char *msg_file = opts->edit ? NULL : git_path_merge_msg(r); + unsigned int flags = should_edit(opts) ? EDIT_MSG : 0; + const char *msg_file = should_edit(opts) ? NULL : git_path_merge_msg(r); struct object_id head; struct commit *base, *next, *parent; const char *base_label, *next_label; @@ -3101,9 +3113,9 @@ static int save_opts(struct replay_opts *opts) if (opts->no_commit) res |= git_config_set_in_file_gently(opts_file, "options.no-commit", "true"); - if (opts->edit) - res |= git_config_set_in_file_gently(opts_file, - "options.edit", "true"); + if (opts->edit != -1) + res |= git_config_set_in_file_gently(opts_file, "options.edit", + opts->edit ? "true" : "false"); if (opts->allow_empty) res |= git_config_set_in_file_gently(opts_file, "options.allow-empty", "true"); @@ -4077,7 +4089,7 @@ static int pick_commits(struct repository *r, prev_reflog_action = xstrdup(getenv(GIT_REFLOG_ACTION)); if (opts->allow_ff) assert(!(opts->signoff || opts->no_commit || - opts->record_origin || opts->edit || + opts->record_origin || should_edit(opts) || opts->committer_date_is_author_date || opts->ignore_date)); if (read_and_refresh_cache(r, opts)) @@ -4370,14 +4382,35 @@ static int pick_commits(struct repository *r, return sequencer_remove_state(opts); } -static int continue_single_pick(struct repository *r) +static int continue_single_pick(struct repository *r, struct replay_opts *opts) { - const char *argv[] = { "commit", NULL }; + struct strvec argv = STRVEC_INIT; + int want_edit; + int ret; if (!refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") && !refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) return error(_("no cherry-pick or revert in progress")); - return run_command_v_opt(argv, RUN_GIT_CMD); + + strvec_push(&argv, "commit"); + + /* + * continue_single_pick() handles the case of recovering from a + * conflict. should_edit() doesn't handle that case; for a conflict, + * we want to edit if the user asked for it, or if they didn't specify + * and stdin is a tty. + */ + want_edit = (opts->edit == 1) || ((opts->edit == -1) && isatty(0)); + if (!want_edit) + /* + * Include --cleanup=strip as well because we don't want the + * "# Conflicts:" messages. + */ + strvec_pushl(&argv, "--no-edit", "--cleanup=strip", NULL); + + ret = run_command_v_opt(argv.v, RUN_GIT_CMD); + strvec_clear(&argv); + return ret; } static int commit_staged_changes(struct repository *r, @@ -4547,7 +4580,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts) goto release_todo_list; } } else if (!file_exists(get_todo_path(opts))) - return continue_single_pick(r); + return continue_single_pick(r, opts); else if ((res = read_populate_todo(r, &todo_list, opts))) goto release_todo_list; @@ -4556,7 +4589,7 @@ int sequencer_continue(struct repository *r, struct replay_opts *opts) if (refs_ref_exists(get_main_ref_store(r), "CHERRY_PICK_HEAD") || refs_ref_exists(get_main_ref_store(r), "REVERT_HEAD")) { - res = continue_single_pick(r); + res = continue_single_pick(r, opts); if (res) goto release_todo_list; } diff --git a/sequencer.h b/sequencer.h index f8b2e4ab8527..d57d8ea23d7a 100644 --- a/sequencer.h +++ b/sequencer.h @@ -31,8 +31,10 @@ enum commit_msg_cleanup_mode { struct replay_opts { enum replay_action action; - /* Boolean options */ + /* Tri-state options: unspecified, false, or true */ int edit; + + /* Boolean options */ int record_origin; int no_commit; int signoff; @@ -71,7 +73,7 @@ struct replay_opts { /* Only used by REPLAY_NONE */ struct rev_info *revs; }; -#define REPLAY_OPTS_INIT { .action = -1, .current_fixups = STRBUF_INIT } +#define REPLAY_OPTS_INIT { .edit = -1, .action = -1, .current_fixups = STRBUF_INIT } /* * Note that ordering matters in this enum. Not only must it match the mapping diff --git a/t/t3510-cherry-pick-sequence.sh b/t/t3510-cherry-pick-sequence.sh index b76cb6de91d0..49010aa9469d 100755 --- a/t/t3510-cherry-pick-sequence.sh +++ b/t/t3510-cherry-pick-sequence.sh @@ -65,7 +65,7 @@ test_expect_success 'cherry-pick persists opts correctly' ' # gets interrupted, use a high-enough number that is larger # than the number of parents of any commit we have created mainline=4 && - test_expect_code 128 git cherry-pick -s -m $mainline --strategy=recursive -X patience -X ours initial..anotherpick && + test_expect_code 128 git cherry-pick -s -m $mainline --strategy=recursive -X patience -X ours --edit initial..anotherpick && test_path_is_dir .git/sequencer && test_path_is_file .git/sequencer/head && test_path_is_file .git/sequencer/todo && @@ -84,6 +84,36 @@ test_expect_success 'cherry-pick persists opts correctly' ' ours EOF git config --file=.git/sequencer/opts --get-all options.strategy-option >actual && + test_cmp expect actual && + echo "true" >expect && + git config --file=.git/sequencer/opts --get-all options.edit >actual && + test_cmp expect actual +' + +test_expect_success 'revert persists opts correctly' ' + pristine_detach initial && + # to make sure that the session to revert a sequence + # gets interrupted, revert commits that are not in the history + # of HEAD. + test_expect_code 1 git revert -s --strategy=recursive -X patience -X ours --no-edit picked yetanotherpick && + test_path_is_dir .git/sequencer && + test_path_is_file .git/sequencer/head && + test_path_is_file .git/sequencer/todo && + test_path_is_file .git/sequencer/opts && + echo "true" >expect && + git config --file=.git/sequencer/opts --get-all options.signoff >actual && + test_cmp expect actual && + echo "recursive" >expect && + git config --file=.git/sequencer/opts --get-all options.strategy >actual && + test_cmp expect actual && + cat >expect <<-\EOF && + patience + ours + EOF + git config --file=.git/sequencer/opts --get-all options.strategy-option >actual && + test_cmp expect actual && + echo "false" >expect && + git config --file=.git/sequencer/opts --get-all options.edit >actual && test_cmp expect actual '