[v1] rebase -i: stop checking out the tip of the branch to rebase
diff mbox series

Message ID 20200121191857.23047-1-alban.gruin@gmail.com
State New
Headers show
Series
  • [v1] rebase -i: stop checking out the tip of the branch to rebase
Related show

Commit Message

Alban Gruin Jan. 21, 2020, 7:18 p.m. UTC
One of the first things done by the interactive rebase is to make a todo
list.  This requires knowledge of the commit range to rebase.  To get
the oid of the last commit of the range, the tip of the branch to rebase
is checked out with prepare_branch_to_be_rebased(), then the oid of the
HEAD is read.  On big repositories, it's a performance penalty: the user
may have to wait before editing the todo list while git is extracting the
branch silently (because git-checkout is silenced here).  After this,
the head of the branch is not even modified.

Since we already have the oid of the tip of the branch in
`opts->orig_head', it's useless to switch to this commit.

This removes the call to prepare_branch_to_be_rebased() in
do_interactive_rebase(), and adds a `orig_head' parameter to
get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
is no longer used.

This introduces a visible change: as we do not switch on the tip of the
branch to rebase, no reflog entry is created at the beginning of the
rebase for it.

Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
---

Notes:
    Improvements brought by this patch:
    
    Before:
    
    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
    
    real    0m8,940s
    user    0m6,830s
    sys     0m2,121s
    
    After:
    
    $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
    
    real    0m1,834s
    user    0m0,916s
    sys     0m0,206s
    
    Both tests have been performed on a 5400 RPM SATA III hard drive.

 builtin/rebase.c | 18 +++++-------------
 sequencer.c      | 14 --------------
 sequencer.h      |  3 ---
 3 files changed, 5 insertions(+), 30 deletions(-)

Comments

Elijah Newren Jan. 21, 2020, 8:07 p.m. UTC | #1
Hi Alban,

// Adding Phillip and Johannes since they know the sequencer internals
very well.

On Tue, Jan 21, 2020 at 11:21 AM Alban Gruin <alban.gruin@gmail.com> wrote:
>
> One of the first things done by the interactive rebase is to make a todo
> list.  This requires knowledge of the commit range to rebase.  To get
> the oid of the last commit of the range, the tip of the branch to rebase
> is checked out with prepare_branch_to_be_rebased(), then the oid of the
> HEAD is read.  On big repositories, it's a performance penalty: the user
> may have to wait before editing the todo list while git is extracting the
> branch silently (because git-checkout is silenced here).  After this,
> the head of the branch is not even modified.
>
> Since we already have the oid of the tip of the branch in
> `opts->orig_head', it's useless to switch to this commit.
>
> This removes the call to prepare_branch_to_be_rebased() in
> do_interactive_rebase(), and adds a `orig_head' parameter to
> get_revision_ranges().  prepare_branch_to_be_rebased() is removed as it
> is no longer used.
>
> This introduces a visible change: as we do not switch on the tip of the
> branch to rebase, no reflog entry is created at the beginning of the
> rebase for it.

Oh, sweet, thanks for digging in.  I had also dug in just after the
report, but not quite far enough as I still had failing tests and I
was feeling a bit stretched thin on other projects so I punted hoping
that SZEDER would post something.  Looks like the orig_head thing was
probably what I was missing.

I was a little surprised that there wasn't any regression test that
needed to be modified, as it reminded me of a previous conversation
about excessive work in the interactive backend[1], but after looking
it up that was apparently about too many calls to commit rather than
too many calls to checkout.

[1] https://lore.kernel.org/git/nycvar.QRO.7.76.6.1811121614190.39@tvgsbejvaqbjf.bet/

> Reported-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Alban Gruin <alban.gruin@gmail.com>
> ---
>
> Notes:
>     Improvements brought by this patch:
>
>     Before:
>
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>
>     real    0m8,940s
>     user    0m6,830s
>     sys     0m2,121s
>
>     After:
>
>     $ time git rebase -m --onto v4.18 463fa44eec2fef50~ 463fa44eec2fef50
>
>     real    0m1,834s
>     user    0m0,916s
>     sys     0m0,206s

Nice...do we want to mention this in the commit message proper too?

>
>     Both tests have been performed on a 5400 RPM SATA III hard drive.
>
>  builtin/rebase.c | 18 +++++-------------
>  sequencer.c      | 14 --------------
>  sequencer.h      |  3 ---
>  3 files changed, 5 insertions(+), 30 deletions(-)
>
> diff --git a/builtin/rebase.c b/builtin/rebase.c
> index 8081741f8a..6154ad8fa5 100644
> --- a/builtin/rebase.c
> +++ b/builtin/rebase.c
> @@ -246,21 +246,17 @@ static int edit_todo_file(unsigned flags)
>  }
>
>  static int get_revision_ranges(struct commit *upstream, struct commit *onto,
> -                              const char **head_hash,
> +                              struct object_id *orig_head, const char **head_hash,
>                                char **revisions, char **shortrevisions)
>  {
>         struct commit *base_rev = upstream ? upstream : onto;
>         const char *shorthead;
> -       struct object_id orig_head;
> -
> -       if (get_oid("HEAD", &orig_head))
> -               return error(_("no HEAD?"));
>
> -       *head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ);
> +       *head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
>         *revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
>                                                    *head_hash);
>
> -       shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV);
> +       shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
>
>         if (upstream) {
>                 const char *shortrev;
> @@ -314,12 +310,8 @@ static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
>         struct replay_opts replay = get_replay_opts(opts);
>         struct string_list commands = STRING_LIST_INIT_DUP;
>
> -       if (prepare_branch_to_be_rebased(the_repository, &replay,
> -                                        opts->switch_to))
> -               return -1;
> -
> -       if (get_revision_ranges(opts->upstream, opts->onto, &head_hash,
> -                               &revisions, &shortrevisions))
> +       if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
> +                               &head_hash, &revisions, &shortrevisions))
>                 return -1;
>
>         if (init_basic_state(&replay,
> diff --git a/sequencer.c b/sequencer.c
> index b9dbf1adb0..4dc245d7ec 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -3715,20 +3715,6 @@ static int run_git_checkout(struct repository *r, struct replay_opts *opts,
>         return ret;
>  }
>
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -                                const char *commit)
> -{
> -       const char *action;
> -
> -       if (commit && *commit) {
> -               action = reflog_message(opts, "start", "checkout %s", commit);
> -               if (run_git_checkout(r, opts, commit, action))
> -                       return error(_("could not checkout %s"), commit);
> -       }
> -
> -       return 0;
> -}
> -
>  static int checkout_onto(struct repository *r, struct replay_opts *opts,
>                          const char *onto_name, const struct object_id *onto,
>                          const char *orig_head)
> diff --git a/sequencer.h b/sequencer.h
> index 9f9ae291e3..74f1e2673e 100644
> --- a/sequencer.h
> +++ b/sequencer.h
> @@ -190,9 +190,6 @@ void commit_post_rewrite(struct repository *r,
>                          const struct commit *current_head,
>                          const struct object_id *new_head);
>
> -int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
> -                                const char *commit);
> -
>  #define SUMMARY_INITIAL_COMMIT   (1 << 0)
>  #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
>  void print_commit_summary(struct repository *repo,
> --
> 2.24.1

The code looks reasonable to me, but I'm still not completely familiar
with all the rebase and sequencer code so I'm hoping Phillip or
Johannes can give a thumbs up.

Thanks for digging into this and figuring out the bits that I missed
when I tried.


Elijah
Junio C Hamano Jan. 22, 2020, 8:24 p.m. UTC | #2
Alban Gruin <alban.gruin@gmail.com> writes:

> This introduces a visible change: as we do not switch on the tip of the
> branch to rebase, no reflog entry is created at the beginning of the
> rebase for it.

Fortunately, this does not mean "git log @{-1}.." during a rebase
starts to behave differently.  If it were the case, that would have
been a bad regression, but that is not the case.

If you omit one reflog entry, you are creating TWO changes.  The
detaching of the HEAD to the "onto" commit would record, just like
any other reflog entry, would record from which commit we detached
HEAD from.  That reflog entry will also be different, as you'd be
switching from a different commit.  

I am not sure what the implication of this difference will be in
practice, though, but it must be smaller than the effect of the
missing reflog entry discussed earlier.
Junio C Hamano Jan. 22, 2020, 8:47 p.m. UTC | #3
Alban Gruin <alban.gruin@gmail.com> writes:

> One of the first things done by the interactive rebase is to make a todo
> list.  This requires knowledge of the commit range to rebase.  To get
> the oid of the last commit of the range, the tip of the branch to rebase
> is checked out with prepare_branch_to_be_rebased(), then the oid of the
> HEAD is read.  On big repositories, it's a performance penalty: the user
> may have to wait before editing the todo list while git is extracting the
> branch silently (because git-checkout is silenced here).  After this,
> the head of the branch is not even modified.

Hmph.  One curious thing in the above is why this is specific to
"rebase -i". The need to know the commit range to rebase is shared
across any rebase backend, and it would be the most natural to parse
the optional second argument (i.e. the branch or the commit to
rebase) before builtin/rebase.c dispatches to a specific rebase
backend, wouldn't it?  So, the question is why a normal "rebase"
does not need the same fix?

If the answer is "rebase in general was fine without extra checkout,
but 'rebase -i' was doing an unnecessary checkout" (or any other
answer) that is something that would help future readers to record
in the commit log message.

Thanks.
Alban Gruin Jan. 24, 2020, 2:45 p.m. UTC | #4
Hi Junio,

Le 22/01/2020 à 21:47, Junio C Hamano a écrit :
> Alban Gruin <alban.gruin@gmail.com> writes:
> 
>> One of the first things done by the interactive rebase is to make a todo
>> list.  This requires knowledge of the commit range to rebase.  To get
>> the oid of the last commit of the range, the tip of the branch to rebase
>> is checked out with prepare_branch_to_be_rebased(), then the oid of the
>> HEAD is read.  On big repositories, it's a performance penalty: the user
>> may have to wait before editing the todo list while git is extracting the
>> branch silently (because git-checkout is silenced here).  After this,
>> the head of the branch is not even modified.
> 
> Hmph.  One curious thing in the above is why this is specific to
> "rebase -i". The need to know the commit range to rebase is shared
> across any rebase backend, and it would be the most natural to parse
> the optional second argument (i.e. the branch or the commit to
> rebase) before builtin/rebase.c dispatches to a specific rebase
> backend, wouldn't it?  So, the question is why a normal "rebase"
> does not need the same fix?
> 

That's a problem shared by all rebases using the sequencer, so -m and -r
are also affected by this.  `am' is not.

> If the answer is "rebase in general was fine without extra checkout,
> but 'rebase -i' was doing an unnecessary checkout" (or any other
> answer) that is something that would help future readers to record
> in the commit log message.
> 

So yes, the answer is that the am backend does not perform this
checkout, unlike all others rebases.

I will resend this patch very soon.

> Thanks.
> 
> 

Cheers,
Alban

Patch
diff mbox series

diff --git a/builtin/rebase.c b/builtin/rebase.c
index 8081741f8a..6154ad8fa5 100644
--- a/builtin/rebase.c
+++ b/builtin/rebase.c
@@ -246,21 +246,17 @@  static int edit_todo_file(unsigned flags)
 }
 
 static int get_revision_ranges(struct commit *upstream, struct commit *onto,
-			       const char **head_hash,
+			       struct object_id *orig_head, const char **head_hash,
 			       char **revisions, char **shortrevisions)
 {
 	struct commit *base_rev = upstream ? upstream : onto;
 	const char *shorthead;
-	struct object_id orig_head;
-
-	if (get_oid("HEAD", &orig_head))
-		return error(_("no HEAD?"));
 
-	*head_hash = find_unique_abbrev(&orig_head, GIT_MAX_HEXSZ);
+	*head_hash = find_unique_abbrev(orig_head, GIT_MAX_HEXSZ);
 	*revisions = xstrfmt("%s...%s", oid_to_hex(&base_rev->object.oid),
 						   *head_hash);
 
-	shorthead = find_unique_abbrev(&orig_head, DEFAULT_ABBREV);
+	shorthead = find_unique_abbrev(orig_head, DEFAULT_ABBREV);
 
 	if (upstream) {
 		const char *shortrev;
@@ -314,12 +310,8 @@  static int do_interactive_rebase(struct rebase_options *opts, unsigned flags)
 	struct replay_opts replay = get_replay_opts(opts);
 	struct string_list commands = STRING_LIST_INIT_DUP;
 
-	if (prepare_branch_to_be_rebased(the_repository, &replay,
-					 opts->switch_to))
-		return -1;
-
-	if (get_revision_ranges(opts->upstream, opts->onto, &head_hash,
-				&revisions, &shortrevisions))
+	if (get_revision_ranges(opts->upstream, opts->onto, &opts->orig_head,
+				&head_hash, &revisions, &shortrevisions))
 		return -1;
 
 	if (init_basic_state(&replay,
diff --git a/sequencer.c b/sequencer.c
index b9dbf1adb0..4dc245d7ec 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -3715,20 +3715,6 @@  static int run_git_checkout(struct repository *r, struct replay_opts *opts,
 	return ret;
 }
 
-int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
-				 const char *commit)
-{
-	const char *action;
-
-	if (commit && *commit) {
-		action = reflog_message(opts, "start", "checkout %s", commit);
-		if (run_git_checkout(r, opts, commit, action))
-			return error(_("could not checkout %s"), commit);
-	}
-
-	return 0;
-}
-
 static int checkout_onto(struct repository *r, struct replay_opts *opts,
 			 const char *onto_name, const struct object_id *onto,
 			 const char *orig_head)
diff --git a/sequencer.h b/sequencer.h
index 9f9ae291e3..74f1e2673e 100644
--- a/sequencer.h
+++ b/sequencer.h
@@ -190,9 +190,6 @@  void commit_post_rewrite(struct repository *r,
 			 const struct commit *current_head,
 			 const struct object_id *new_head);
 
-int prepare_branch_to_be_rebased(struct repository *r, struct replay_opts *opts,
-				 const char *commit);
-
 #define SUMMARY_INITIAL_COMMIT   (1 << 0)
 #define SUMMARY_SHOW_AUTHOR_DATE (1 << 1)
 void print_commit_summary(struct repository *repo,