Message ID | 20200124144545.12984-1-alban.gruin@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] rebase -i: stop checking out the tip of the branch to rebase | expand |
Le 24/01/2020 à 15:45, Alban Gruin a écrit : > One of the first things done when using a sequencer-based > rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo > list. This requires knowledge of the commit range to rebase. To get > the oid of the last commit of the range, the tip of the branch to rebase > is checked out with prepare_branch_to_be_rebased(), then the oid of the > head is read. After this, the tip of the branch is not even modified. > > On big repositories, it's a performance penalty: with `rebase -i', the > user may have to wait before editing the todo list while git is > extracting the branch silently, and "quiet" rebases will be slower than > `am'. > > Since we already have the oid of the tip of the branch in > `opts->orig_head', it's useless to switch to this commit. > > This removes the call to prepare_branch_to_be_rebased() in > do_interactive_rebase(), and adds a `orig_head' parameter to > get_revision_ranges(). prepare_branch_to_be_rebased() is removed as it > is no longer used. > > This introduces a visible change: as we do not switch on the tip of the > branch to rebase, no reflog entry is created at the beginning of the > rebase for it. > > Unscientific performance measurements, performed on linux.git, are as > follow: > > Before this patch: > > $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 > > real 0m8,940s > user 0m6,830s > sys 0m2,121s > > After this patch: > > $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 > > real 0m1,834s > user 0m0,916s > sys 0m0,206s > > Reported-by: SZEDER Gábor <szeder.dev@gmail.com> > Signed-off-by: Alban Gruin <alban.gruin@gmail.com> > --- > Forget this patch, I forgot to clearly say that the `am' backend is not affected. > Notes: > Changes since v1: > > - The first version of the commit message talked specifically about > `rebase -i', but this problem is common to all sequencer-based > rebases. The first paragraph has been reworded to clear up the > confusion. > > - Included benchmarks in the commit message, as suggested by Elijah > Newren. > > The code did not change. > > builtin/rebase.c | 18 +++++------------- > sequencer.c | 14 -------------- > sequencer.h | 3 --- > 3 files changed, 5 insertions(+), 30 deletions(-) > > diff --git a/builtin/rebase.c b/builtin/rebase.c > index 8081741f8a..6154ad8fa5 100644 > --- a/builtin/rebase.c > +++ b/builtin/rebase.c > @@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags) > } > > static int get_revision_ranges(struct commit *upstream, struct commit *onto, > - const char **head_hash, > + struct object_id *orig_head, const char **head_hash, > char **revisions, char **shortrevisions) > { > struct commit *base_rev = upstream ? upstream : onto; > const char *shorthead; > - struct object_id orig_head; > - > - if (get_oid("HEAD", &orig_head)) > - return error(_("no HEAD?")); > > - *head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ); > + *head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ); > *revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid), > *head_hash); > > - shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV); > + shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV); > > if (upstream) { > const char *shortrev; > @@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags) > struct replay_opts replay = get_replay_opts(opts); > struct string_list commands = STRING_LIST_INIT_DUP; > > - if (prepare_branch_to_be_rebased(the_repository, &replay, > - opts->switch_to)) > - return -1; > - > - if (get_revision_ranges(opts->upstream, opts->onto, &head_hash, > - &revisions, &shortrevisions)) > + if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head, > + &head_hash, &revisions, &shortrevisions)) > return -1; > > if (init_basic_state(&replay, > diff --git a/sequencer.c b/sequencer.c > index b9dbf1adb0..4dc245d7ec 100644 > --- a/sequencer.c > +++ b/sequencer.c > @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts, > return ret; > } > > -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, > - const char *commit) > -{ > - const char *action; > - > - if (commit && *commit) { > - action = reflog_message(opts, "start", "checkout %s", commit); > - if (run_git_checkout(r, opts, commit, action)) > - return error(_("could not checkout %s"), commit); > - } > - > - return 0; > -} > - > static int checkout_onto(struct repository *r, struct replay_opts *opts, > const char *onto_name, const struct object_id *onto, > const char *orig_head) > diff --git a/sequencer.h b/sequencer.h > index 9f9ae291e3..74f1e2673e 100644 > --- a/sequencer.h > +++ b/sequencer.h > @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r, > const struct commit *current_head, > const struct object_id *new_head); > > -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, > - const char *commit); > - > #define SUMMARY_INITIAL_COMMIT (1 << 0) > #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1) > void print_commit_summary(struct repository *repo, >
On 2020-01-24 15:45, Alban Gruin wrote: > Notes: > Changes since v1: > > - The first version of the commit message talked specifically about > `rebase -i', but this problem is common to all sequencer-based > rebases. The first paragraph has been reworded to clear up the > confusion. Would it make sense to update the subject line as well?
Alban Gruin <alban.gruin@gmail.com> writes: > Forget this patch, I forgot to clearly say that the `am' backend is not > affected. The phrase "is not affected" makes it sound like "Do not worry, I made sure it is not broken by this patch", but I do not think that is the more important part ;-). The shared codepath for all types of rebase before dispatching already knew what commit at the tip of the branch being rebased is, but the sequencer-based backend was doing unnecessary work to figure it out again by checking it out. And this patch is about fixing that, isn't it? So I do not think singling out 'am' is a good use of readers' time. The first paragraph can be further tweaked why the extra checkout is unneeded. Before dispatching the control to one of the individual rebase backends, the shared codepath in "rebase" figures out what branch is being rebased, because it is necessary to compute the range of commits to replay to run any backend. The rebase backend based on the sequencer machinery (used for '-i', '-r' and '-m') however computed this commit range by actually checking out the branch and reading HEAD, which was unnecessary, as the working tree is then immediately gets reset to that of the commit on which rebased history is built (aka "onto" commit). or something along the line, perhaps? With this patch applied, the wasteful prepare_branch_to_be_rebased() has no caller, and the patch removes it from sequencer.[ch] as well, which is very good.
diff --git a/builtin/rebase.c b/builtin/rebase.c index 8081741f8a..6154ad8fa5 100644 --- a/builtin/rebase.c +++ b/builtin/rebase.c @@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags) } static int get_revision_ranges(struct commit *upstream, struct commit *onto, - const char **head_hash, + struct object_id *orig_head, const char **head_hash, char **revisions, char **shortrevisions) { struct commit *base_rev = upstream ? upstream : onto; const char *shorthead; - struct object_id orig_head; - - if (get_oid("HEAD", &orig_head)) - return error(_("no HEAD?")); - *head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ); + *head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ); *revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid), *head_hash); - shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV); + shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV); if (upstream) { const char *shortrev; @@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags) struct replay_opts replay = get_replay_opts(opts); struct string_list commands = STRING_LIST_INIT_DUP; - if (prepare_branch_to_be_rebased(the_repository, &replay, - opts->switch_to)) - return -1; - - if (get_revision_ranges(opts->upstream, opts->onto, &head_hash, - &revisions, &shortrevisions)) + if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head, + &head_hash, &revisions, &shortrevisions)) return -1; if (init_basic_state(&replay, diff --git a/sequencer.c b/sequencer.c index b9dbf1adb0..4dc245d7ec 100644 --- a/sequencer.c +++ b/sequencer.c @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts, return ret; } -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, - const char *commit) -{ - const char *action; - - if (commit && *commit) { - action = reflog_message(opts, "start", "checkout %s", commit); - if (run_git_checkout(r, opts, commit, action)) - return error(_("could not checkout %s"), commit); - } - - return 0; -} - static int checkout_onto(struct repository *r, struct replay_opts *opts, const char *onto_name, const struct object_id *onto, const char *orig_head) diff --git a/sequencer.h b/sequencer.h index 9f9ae291e3..74f1e2673e 100644 --- a/sequencer.h +++ b/sequencer.h @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r, const struct commit *current_head, const struct object_id *new_head); -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts, - const char *commit); - #define SUMMARY_INITIAL_COMMIT (1 << 0) #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1) void print_commit_summary(struct repository *repo,
One of the first things done when using a sequencer-based rebase (ie. `rebase -i', `rebase -r', or `rebase -m') is to make a todo list. This requires knowledge of the commit range to rebase. To get the oid of the last commit of the range, the tip of the branch to rebase is checked out with prepare_branch_to_be_rebased(), then the oid of the head is read. After this, the tip of the branch is not even modified. On big repositories, it's a performance penalty: with `rebase -i', the user may have to wait before editing the todo list while git is extracting the branch silently, and "quiet" rebases will be slower than `am'. Since we already have the oid of the tip of the branch in `opts->orig_head', it's useless to switch to this commit. This removes the call to prepare_branch_to_be_rebased() in do_interactive_rebase(), and adds a `orig_head' parameter to get_revision_ranges(). prepare_branch_to_be_rebased() is removed as it is no longer used. This introduces a visible change: as we do not switch on the tip of the branch to rebase, no reflog entry is created at the beginning of the rebase for it. Unscientific performance measurements, performed on linux.git, are as follow: Before this patch: $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 real 0m8,940s user 0m6,830s sys 0m2,121s After this patch: $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50 real 0m1,834s user 0m0,916s sys 0m0,206s Reported-by: SZEDER Gábor <szeder.dev@gmail.com> Signed-off-by: Alban Gruin <alban.gruin@gmail.com> --- Notes: Changes since v1: - The first version of the commit message talked specifically about `rebase -i', but this problem is common to all sequencer-based rebases. The first paragraph has been reworded to clear up the confusion. - Included benchmarks in the commit message, as suggested by Elijah Newren. The code did not change. builtin/rebase.c | 18 +++++------------- sequencer.c | 14 -------------- sequencer.h | 3 --- 3 files changed, 5 insertions(+), 30 deletions(-)