Message ID | 20200124150500.15260-1-alban.gruin@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] rebase -i: stop checking out the tip of the branch to rebase | expand |
Alban Gruin <alban.gruin@gmail.com> writes: > On big repositories, it's a performance penalty: with `rebase -i', the > user may have to wait before editing the todo list while git is > extracting the branch silently, and "quiet" rebases will be slower than > `am'. > > Since we already have the oid of the tip of the branch in > `opts->orig_head', it's useless to switch to this commit. > ... > Before this patch: > > $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 > > real 0m8,940s > user 0m6,830s > sys 0m2,121s > > After this patch: > > $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 > > real 0m1,834s > user 0m0,916s > sys 0m0,206s > > Reported-by: SZEDER Gábor <szeder.dev@gmail.com> > Signed-off-by: Alban Gruin <alban.gruin@gmail.com> > --- Good. > diff --git a/sequencer.c b/sequencer.c > index b9dbf1adb0..4dc245d7ec 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts, > return ret; > } > > -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, > - const char *commit) > -{ > - const char *action; > - > - if (commit && *commit) { > - action = reflog_message(opts, "start", "checkout %s", commit); > - if (run_git_checkout(r, opts, commit, action)) > - return error(_("could not checkout %s"), commit); > - } > - > - return 0; > -} > - > static int checkout_onto(struct repository *r, struct replay_opts *opts, > const char *onto_name, const struct object_id *onto, > const char *orig_head) > diff --git a/sequencer.h b/sequencer.h > index 9f9ae291e3..74f1e2673e 100644 > --- a/sequencer.h > +++ b/sequencer.h > @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r, > const struct commit *current_head, > const struct object_id *new_head); > > -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, > - const char *commit); > - > #define SUMMARY_INITIAL_COMMIT (1 << 0) > #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1) > void print_commit_summary(struct repository *repo, Nice to see this helper to go. Thanks.
Hi Alban, On Fri, 24 Jan 2020, Alban Gruin wrote: > One of the first things done when using a sequencer-based > rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo > list. This requires knowledge of the commit range to rebase. To get > the oid of the last commit of the range, the tip of the branch to rebase > is checked out with prepare_branch_to_be_rebased(), then the oid of the > head is read. After this, the tip of the branch is not even modified. > The `am' backend, on the other hand, does not check out the branch. > > On big repositories, it's a performance penalty: with `rebase -i', the > user may have to wait before editing the todo list while git is > extracting the branch silently, and "quiet" rebases will be slower than > `am'. > > Since we already have the oid of the tip of the branch in > `opts->orig_head', it's useless to switch to this commit. > > This removes the call to prepare_branch_to_be_rebased() in > do_interactive_rebase(), and adds a `orig_head' parameter to > get_revision_ranges(). prepare_branch_to_be_rebased() is removed as it > is no longer used. At this point, I am a bit puzzled as a reader: why can we just drop this? My immediate reaction was: isn't this required to switch to a new branch when `switch_to` is non-`NULL`? So I went digging a little. The `prepare_branch_to_be_rebased()` call was introduced in 53bbcfbde7c (rebase -i: implement the main part of interactive rebase as a builtin, 2018-09-27). And looking at the `git-rebase--interactive` part of that patch, it becomes relatively obvious that we inherited this behavior from the shell scripting days. 2c58483a598 (rebase -i: rewrite setup_reflog_action() in C, 2018-08-10) converted the `setup_reflog_action` function (which oddly enough not only set up the reflog action, but also switched to a new branch if so configured). That function was introduced in d48f97aa854 (rebase: reindent function git_rebase__interactive, 2018-03-23), but that was _still_ not the commit that introduced that "let's check out the upstream" behavior. It goes _all_ the way back to 1b1dce4bae7 (Teach rebase an interactive mode, 2007-06-25). Except that back then, it was only done when a branch name was provided (`git rebase -i <upstream> <branch-to-switch-to>`). So it behaved correctly. The problem was introduced in 71786f54c41 (rebase: factor out reference parsing, 2011-02-06), as it substituted the `if test ! -z "$1"` with `if test ! -z "$switch_to"`, relying on the command-line parsing of `git-rebase.sh`. But wait! Wait, wait, wait! `switch_to` is still only set in that incantation where we provide a branch name _in addition to_ an upstream commit. Ah, I think I slowly see where this is going. The problem is actually 2ec33cdd19b (rebase--interactive: don't require what's rebased to be a branch, 2010-03-14) which failed to realize that essentially the entire `git checkout` was necessary to accommodate the subsequent call a mere 8 lines further down from that `checkout`: git symbolic-ref HEAD > "$DOTEST"/head-name 2> /dev/null || echo "detached HEAD" > "$DOTEST"/head-name So that 2ec33cdd19b commit could have saved itself a lot of trouble by realizing what the role of that `git checkout` is, and should have pulled that `head-name` logic into that conditional instead of _actually_ switching to a new branch. Now, let's see what the C code does to determine "head-name". Indeed, it is already handled in the option parsing, in that monster of a function called `cmd_rebase()`. And yes, I think that `head_name` should also be mentioned in this commit message, as something like git rebase -i <base> <branch-to-switch-to> should eventually indeed switch to the specified branch, and this here patch does _not_ break that promise. This is my only concern with this patch, though, therefore: Acked-by: Johannes Schindelin <johannes.schindelin@gmx.de> Thanks, Dscho > This introduces a visible change: as we do not switch on the tip of the > branch to rebase, no reflog entry is created at the beginning of the > rebase for it. > > Unscientific performance measurements, performed on linux.git, are as > follow: > > Before this patch: > > $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 > > real 0m8,940s > user 0m6,830s > sys 0m2,121s > > After this patch: > > $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 > > real 0m1,834s > user 0m0,916s > sys 0m0,206s > > Reported-by: SZEDER Gábor <szeder.dev@gmail.com> > Signed-off-by: Alban Gruin <alban.gruin@gmail.com> > --- > > Added a line in the first paragraph to make it clear that the `am' > backend is not affected. > > builtin/rebase.c | 18 +++++------------- > sequencer.c | 14 -------------- > sequencer.h | 3 --- > 3 files changed, 5 insertions(+), 30 deletions(-) > > diff --git a/builtin/rebase.c b/builtin/rebase.c > index 8081741f8a..6154ad8fa5 100644 > --- a/builtin/rebase.c > +++ b/builtin/rebase.c > @@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags) > } > > static int get_revision_ranges(struct commit *upstream, struct commit *onto, > - const char **head_hash, > + struct object_id *orig_head, const char **head_hash, > char **revisions, char **shortrevisions) > { > struct commit *base_rev = upstream ? upstream : onto; > const char *shorthead; > - struct object_id orig_head; > - > - if (get_oid("HEAD", &orig_head)) > - return error(_("no HEAD?")); > > - *head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ); > + *head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ); > *revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid), > *head_hash); > > - shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV); > + shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV); > > if (upstream) { > const char *shortrev; > @@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags) > struct replay_opts replay = get_replay_opts(opts); > struct string_list commands = STRING_LIST_INIT_DUP; > > - if (prepare_branch_to_be_rebased(the_repository, &replay, > - opts->switch_to)) > - return -1; > - > - if (get_revision_ranges(opts->upstream, opts->onto, &head_hash, > - &revisions, &shortrevisions)) > + if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head, > + &head_hash, &revisions, &shortrevisions)) > return -1; > > if (init_basic_state(&replay, > diff --git a/sequencer.c b/sequencer.c > index b9dbf1adb0..4dc245d7ec 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts, > return ret; > } > > -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, > - const char *commit) > -{ > - const char *action; > - > - if (commit && *commit) { > - action = reflog_message(opts, "start", "checkout %s", commit); > - if (run_git_checkout(r, opts, commit, action)) > - return error(_("could not checkout %s"), commit); > - } > - > - return 0; > -} > - > static int checkout_onto(struct repository *r, struct replay_opts *opts, > const char *onto_name, const struct object_id *onto, > const char *orig_head) > diff --git a/sequencer.h b/sequencer.h > index 9f9ae291e3..74f1e2673e 100644 > --- a/sequencer.h > +++ b/sequencer.h > @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r, > const struct commit *current_head, > const struct object_id *new_head); > > -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, > - const char *commit); > - > #define SUMMARY_INITIAL_COMMIT (1 << 0) > #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1) > void print_commit_summary(struct repository *repo, > -- > 2.24.1 > >
diff --git a/builtin/rebase.c b/builtin/rebase.c index 8081741f8a..6154ad8fa5 100644 --- a/builtin/rebase.c +++ b/builtin/rebase.c @@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags) } static int get_revision_ranges(struct commit *upstream, struct commit *onto, - const char **head_hash, + struct object_id *orig_head, const char **head_hash, char **revisions, char **shortrevisions) { struct commit *base_rev = upstream ? upstream : onto; const char *shorthead; - struct object_id orig_head; - - if (get_oid("HEAD", &orig_head)) - return error(_("no HEAD?")); - *head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ); + *head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ); *revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid), *head_hash); - shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV); + shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV); if (upstream) { const char *shortrev; @@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags) struct replay_opts replay = get_replay_opts(opts); struct string_list commands = STRING_LIST_INIT_DUP; - if (prepare_branch_to_be_rebased(the_repository, &replay, - opts->switch_to)) - return -1; - - if (get_revision_ranges(opts->upstream, opts->onto, &head_hash, - &revisions, &shortrevisions)) + if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head, + &head_hash, &revisions, &shortrevisions)) return -1; if (init_basic_state(&replay, diff --git a/sequencer.c b/sequencer.c index b9dbf1adb0..4dc245d7ec 100644 --- a/sequencer.c +++ b/sequencer.c @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts, return ret; } -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, - const char *commit) -{ - const char *action; - - if (commit && *commit) { - action = reflog_message(opts, "start", "checkout %s", commit); - if (run_git_checkout(r, opts, commit, action)) - return error(_("could not checkout %s"), commit); - } - - return 0; -} - static int checkout_onto(struct repository *r, struct replay_opts *opts, const char *onto_name, const struct object_id *onto, const char *orig_head) diff --git a/sequencer.h b/sequencer.h index 9f9ae291e3..74f1e2673e 100644 --- a/sequencer.h +++ b/sequencer.h @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r, const struct commit *current_head, const struct object_id *new_head); -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, - const char *commit); - #define SUMMARY_INITIAL_COMMIT (1 << 0) #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1) void print_commit_summary(struct repository *repo,
One of the first things done when using a sequencer-based rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo list. This requires knowledge of the commit range to rebase. To get the oid of the last commit of the range, the tip of the branch to rebase is checked out with prepare_branch_to_be_rebased(), then the oid of the head is read. After this, the tip of the branch is not even modified. The `am' backend, on the other hand, does not check out the branch. On big repositories, it's a performance penalty: with `rebase -i', the user may have to wait before editing the todo list while git is extracting the branch silently, and "quiet" rebases will be slower than `am'. Since we already have the oid of the tip of the branch in `opts->orig_head', it's useless to switch to this commit. This removes the call to prepare_branch_to_be_rebased() in do_interactive_rebase(), and adds a `orig_head' parameter to get_revision_ranges(). prepare_branch_to_be_rebased() is removed as it is no longer used. This introduces a visible change: as we do not switch on the tip of the branch to rebase, no reflog entry is created at the beginning of the rebase for it. Unscientific performance measurements, performed on linux.git, are as follow: Before this patch: $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 real 0m8,940s user 0m6,830s sys 0m2,121s After this patch: $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 real 0m1,834s user 0m0,916s sys 0m0,206s Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- Added a line in the first paragraph to make it clear that the `am' backend is not affected. builtin/rebase.c | 18 +++++------------- sequencer.c | 14 -------------- sequencer.h | 3 --- 3 files changed, 5 insertions(+), 30 deletions(-)