[1/1] checkout: add simple check for 'git checkout -b'

Message ID dcf5c60c69d8275a557ffe3d3ae30911d2140162.1567098090.git.gitgitgadget@gmail.com
State New

Commit Message

kdnakt via GitGitGadget Aug. 29, 2019, 5:01 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The 'git switch' command was created to separate half of the
behavior of 'git checkout'. It specifically has the mode to
do nothing with the index and working directory if the user
only specifies to create a new branch and change HEAD to that
branch. This is also the behavior most users expect from
'git checkout -b', but for historical reasons it also performs
an index update by scanning the working directory. This can be
slow for even moderately-sized repos.

A performance fix for 'git checkout -b' was introduced by
fa655d8411 (checkout: optimize "git checkout -b <new_branch>"
2018-08-16). That change includes details about the config
setting checkout.optimizeNewBranch when the sparse-checkout
feature is required. The way this change detected if this
behavior change is safe was through the skip_merge_working_tree()
method. This method was complex and needed to be updated
as new options were introduced.

This behavior was essentially reverted by 65f099b ("switch:
no worktree status unless real branch switch happens"
2019-03-29). Instead, two members of the checkout_opts struct
were used to distinguish between 'git checkout' and 'git switch':

    * switch_branch_doing_nothing_is_ok
    * only_merge_on_switching_branches

These settings have opposite values depending on if we start
in cmd_checkout or cmd_switch.
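
As a rough illustration of the "opposite values" point, the sketch below models the two flags outside of git. The struct and both helper functions are hypothetical stand-ins, not git code; the values are inferred from the patch in this series, which sets 0 and 1 to get 'git switch' behavior.

```c
/* Hypothetical stand-in for the two relevant members of checkout_opts. */
struct mode_flags {
	int switch_branch_doing_nothing_is_ok;
	int only_merge_on_switching_branches;
};

/* Values assumed when starting in cmd_checkout (inferred, not copied). */
static struct mode_flags checkout_defaults(void)
{
	struct mode_flags f = { 1, 0 };
	return f;
}

/* Values assumed when starting in cmd_switch: the exact opposite. */
static struct mode_flags switch_defaults(void)
{
	struct mode_flags f = { 0, 1 };
	return f;
}
```

Under these assumptions, making 'git checkout -b <branch>' behave like 'git switch -c <branch>' amounts to flipping both members to the cmd_switch values before option parsing.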

The message for 65f099b includes "Users of big repos are
encouraged to move to switch." Making this change while
'git switch' is still experimental is too aggressive.

Create a happy medium between these two options by making
'git checkout -b <branch>' behave just like 'git switch',
but only if we read exactly those arguments. This must
be done in cmd_checkout to avoid the arguments being
consumed by the option parsing logic.

This differs from the previous change by fa655d8 in that
the config option checkout.optimizeNewBranch remains
deleted. This means that 'git checkout -b' will ignore
the index merge even if we have a sparse-checkout file.
While this is a behavior change for 'git checkout -b',
it matches the behavior of 'git switch -c'.

Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 builtin/checkout.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Elijah Newren Aug. 29, 2019, 5:25 p.m. UTC | #1
On Thu, Aug 29, 2019 at 10:04 AM Derrick Stolee via GitGitGadget
<gitgitgadget@gmail.com> wrote:
>
> From: Derrick Stolee <dstolee@microsoft.com>
>
> The 'git switch' command was created to separate half of the
> behavior of 'git checkout'. It specifically has the mode to
> do nothing with the index and working directory if the user
> only specifies to create a new branch and change HEAD to that
> branch. This is also the behavior most users expect from
> 'git checkout -b', but for historical reasons it also performs
> an index update by scanning the working directory. This can be
> slow for even moderately-sized repos.
>
> A performance fix for 'git checkout -b' was introduced by
> fa655d8411 (checkout: optimize "git checkout -b <new_branch>"
> 2018-08-16). That change includes details about the config
> setting checkout.optimizeNewBranch when the sparse-checkout
> feature is required. The way this change detected if this
> behavior change is safe was through the skip_merge_working_tree()
> method. This method was complex and needed to be updated
> as new options were introduced.
>
> This behavior was essentially reverted by 65f099b ("switch:
> no worktree status unless real branch switch happens"
> 2019-03-29). Instead, two members of the checkout_opts struct
> were used to distinguish between 'git checkout' and 'git switch':
>
>     * switch_branch_doing_nothing_is_ok
>     * only_merge_on_switching_branches
>
> These settings have opposite values depending on if we start
> in cmd_checkout or cmd_switch.
>
> The message for 65f099b includes "Users of big repos are
> encouraged to move to switch." Making this change while
> 'git switch' is still experimental is too aggressive.
>
> Create a happy medium between these two options by making
> 'git checkout -b <branch>' behave just like 'git switch',
> but only if we read exactly those arguments. This must
> be done in cmd_checkout to avoid the arguments being
> consumed by the option parsing logic.
>
> This differs from the previous change by fa655d8 in that
> the config option checkout.optimizeNewBranch remains
> deleted. This means that 'git checkout -b' will ignore
> the index merge even if we have a sparse-checkout file.
> While this is a behavior change for 'git checkout -b',
> it matches the behavior of 'git switch -c'.
>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  builtin/checkout.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
>
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 6123f732a2..116200cf90 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -1713,6 +1713,15 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
>         opts.overlay_mode = -1;
>         opts.checkout_index = -2;    /* default on */
>         opts.checkout_worktree = -2; /* default on */
> +
> +       if (argc == 3 && !strcmp(argv[1], "-b")) {
> +               /*
> +                * User ran 'git checkout -b <branch>' and expects
> +                * the same behavior as 'git switch -c <branch>'.
> +                */
> +               opts.switch_branch_doing_nothing_is_ok = 0;
> +               opts.only_merge_on_switching_branches = 1;
> +       }
>
>         options = parse_options_dup(checkout_options);
>         options = add_common_options(&opts, options);
> --
> gitgitgadget

Nice!  Thanks for doing this; a small and localized performance hack
is much nicer than a big and non-localized one.  I also appreciate the
detailed history in the commit message.

Just for fun, I tested on linux (with a relatively fast SSD) using a
simple git-bomb repo with 10M index entries but a sparse checkout of
just one file.  'git switch -c' takes approximately 0.004s before or
after this patch.  'git checkout -b' before this patch:

$ time git checkout -b newbranch1
Switched to a new branch 'newbranch1'

real    0m13.533s
user    0m9.824s
sys    0m2.828s


After this patch:

$ time git checkout -b newbranch2
Switched to a new branch 'newbranch2'

real    0m0.003s
user    0m0.000s
sys    0m0.000s


Anyway, looks good to me.
Phillip Wood Aug. 29, 2019, 6:54 p.m. UTC | #2
Hi Stolee

On 29/08/2019 18:01, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The 'git switch' command was created to separate half of the
> behavior of 'git checkout'. It specifically has the mode to
> do nothing with the index and working directory if the user
> only specifies to create a new branch and change HEAD to that
> branch. This is also the behavior most users expect from
> 'git checkout -b', but for historical reasons it also performs
> an index update by scanning the working directory. This can be
> slow for even moderately-sized repos.
> 
> A performance fix for 'git checkout -b' was introduced by
> fa655d8411 (checkout: optimize "git checkout -b <new_branch>"
> 2018-08-16). That change includes details about the config
> setting checkout.optimizeNewBranch when the sparse-checkout
> feature is required. The way this change detected if this
> behavior change is safe was through the skip_merge_working_tree()
> method. This method was complex and needed to be updated
> as new options were introduced.
> 
> This behavior was essentially reverted by 65f099b ("switch:
> no worktree status unless real branch switch happens"
> 2019-03-29). Instead, two members of the checkout_opts struct
> were used to distinguish between 'git checkout' and 'git switch':
> 
>      * switch_branch_doing_nothing_is_ok
>      * only_merge_on_switching_branches
> 
> These settings have opposite values depending on if we start
> in cmd_checkout or cmd_switch.
> 
> The message for 65f099b includes "Users of big repos are
> encouraged to move to switch." Making this change while
> 'git switch' is still experimental is too aggressive.
> 
> Create a happy medium between these two options by making
> 'git checkout -b <branch>' behave just like 'git switch',
> but only if we read exactly those arguments. This must
> be done in cmd_checkout to avoid the arguments being
> consumed by the option parsing logic.
> 
> This differs from the previous change by fa655d8 in that
> the config option checkout.optimizeNewBranch remains
> deleted. This means that 'git checkout -b' will ignore
> the index merge even if we have a sparse-checkout file.
> While this is a behavior change for 'git checkout -b',
> it matches the behavior of 'git switch -c'.
> 
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>   builtin/checkout.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
> 
> diff --git a/builtin/checkout.c b/builtin/checkout.c
> index 6123f732a2..116200cf90 100644
> --- a/builtin/checkout.c
> +++ b/builtin/checkout.c
> @@ -1713,6 +1713,15 @@ int cmd_checkout(int argc, const char **argv, const char *prefix)
>   	opts.overlay_mode = -1;
>   	opts.checkout_index = -2;    /* default on */
>   	opts.checkout_worktree = -2; /* default on */
> +	
> +	if (argc == 3 && !strcmp(argv[1], "-b")) {
> +		/*
> +		 * User ran 'git checkout -b <branch>' and expects

What if the user ran 'git checkout -b<branch>'? Then argc == 2.

Best Wishes

Phillip

> +		 * the same behavior as 'git switch -c <branch>'.
> +		 */
> +		opts.switch_branch_doing_nothing_is_ok = 0;
> +		opts.only_merge_on_switching_branches = 1;
> +	}
>   
>   	options = parse_options_dup(checkout_options);
>   	options = add_common_options(&opts, options);
>
Derrick Stolee Aug. 29, 2019, 8:07 p.m. UTC | #3
On 8/29/2019 2:54 PM, Phillip Wood wrote:
> Hi Stolee
> 
> On 29/08/2019 18:01, Derrick Stolee via GitGitGadget wrote:
>> +   
>> +    if (argc == 3 && !strcmp(argv[1], "-b")) {
>> +        /*
>> +         * User ran 'git checkout -b <branch>' and expects
> 
> What if the user ran 'git checkout -b<branch>'? Then argc == 2.

Good catch. I'm tempted to say "don't do that" to keep this
simple. They won't have incorrect results, just slower than
the "with space" option.

However, if there is enough interest in correcting the "-b<branch>"
case, then I can make another attempt at this.

-Stolee
Pratyush Yadav Aug. 29, 2019, 8:30 p.m. UTC | #4
On 29/08/19 04:07PM, Derrick Stolee wrote:
> On 8/29/2019 2:54 PM, Phillip Wood wrote:
> > Hi Stolee
> > 
> > On 29/08/2019 18:01, Derrick Stolee via GitGitGadget wrote:
> >> +   
> >> +    if (argc == 3 && !strcmp(argv[1], "-b")) {
> >> +        /*
> >> +         * User ran 'git checkout -b <branch>' and expects
> > 
> > What if the user ran 'git checkout -b<branch>'? Then argc == 2.
> 
> Good catch. I'm tempted to say "don't do that" to keep this
> simple. They won't have incorrect results, just slower than
> the "with space" option.
> 
> However, if there is enough interest in correcting the "-b<branch>"
> case, then I can make another attempt at this.
 
You can probably do this with:

  !strncmp(argv[1], "-b", 2)

The difference is so little, might as well do it IMO.
Pratyush Yadav Aug. 29, 2019, 9:40 p.m. UTC | #5
On 30/08/19 02:00AM, Pratyush Yadav wrote:
> On 29/08/19 04:07PM, Derrick Stolee wrote:
> > On 8/29/2019 2:54 PM, Phillip Wood wrote:
> > > Hi Stolee
> > > 
> > > On 29/08/2019 18:01, Derrick Stolee via GitGitGadget wrote:
> > >> +   
> > >> +    if (argc == 3 && !strcmp(argv[1], "-b")) {
> > >> +        /*
> > >> +         * User ran 'git checkout -b <branch>' and expects
> > > 
> > > What if the user ran 'git checkout -b<branch>'? Then argc == 2.
> > 
> > Good catch. I'm tempted to say "don't do that" to keep this
> > simple. They won't have incorrect results, just slower than
> > the "with space" option.
> > 
> > However, if there is enough interest in correcting the "-b<branch>"
> > case, then I can make another attempt at this.
>  
> You can probably do this with:
> 
>   !strncmp(argv[1], "-b", 2)
> 
> The difference is so little, might as well do it IMO.
 
Actually, that is not correct. I took a quick look before writing this 
and missed the fact that argc == 3 is the bigger problem.
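
To make the argc point concrete, here is a minimal standalone sketch (not git code). The function names are made up for illustration, and it assumes, as in the thread, that argv[0] is "checkout" so the stuck form arrives with argc == 2.

```c
#include <string.h>

/* The check from the patch: exactly "checkout -b <branch>". */
static int is_plain_new_branch(int argc, const char **argv)
{
	return argc == 3 && !strcmp(argv[1], "-b");
}

/* Same check with only strcmp swapped for the suggested strncmp. */
static int is_new_branch_strncmp(int argc, const char **argv)
{
	return argc == 3 && !strncmp(argv[1], "-b", 2);
}
```

With { "checkout", "-btopic" } both functions return 0, because the argc == 3 test fails before the string comparison is ever reached.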

Thinking a little more about this, you can mix other options with 
checkout -b, like --track. You can also specify <start_point>.

Now I don't know enough about this optimization you are doing to know 
whether we need to optimize when these options are given, but at least 
for --track I don't see any reason not to.

So maybe you are better off using something like getopt() (warning: 
getopt modifies the input string so you probably want to duplicate it) 
if you want to support all cases. Though for this simple case you can 
probably get away by just directly scanning the argv list for "-b" 
(using strncmp instead of strcmp to account for "-b<branch-name>")
Elijah Newren Aug. 30, 2019, 12:19 a.m. UTC | #6
On Thu, Aug 29, 2019 at 2:42 PM Pratyush Yadav <me@yadavpratyush.com> wrote:
>
> On 30/08/19 02:00AM, Pratyush Yadav wrote:
> > On 29/08/19 04:07PM, Derrick Stolee wrote:
> > > On 8/29/2019 2:54 PM, Phillip Wood wrote:
> > > > Hi Stolee
> > > >
> > > > On 29/08/2019 18:01, Derrick Stolee via GitGitGadget wrote:
> > > >> +
> > > >> +    if (argc == 3 && !strcmp(argv[1], "-b")) {
> > > >> +        /*
> > > >> +         * User ran 'git checkout -b <branch>' and expects
> > > >
> > > > What if the user ran 'git checkout -b<branch>'? Then argc == 2.
> > >
> > > Good catch. I'm tempted to say "don't do that" to keep this
> > > simple. They won't have incorrect results, just slower than
> > > the "with space" option.
> > >
> > > However, if there is enough interest in correcting the "-b<branch>"
> > > case, then I can make another attempt at this.
> >
> > You can probably do this with:
> >
> >   !strncmp(argv[1], "-b", 2)
> >
> > The difference is so little, might as well do it IMO.
>
> Actually, that is not correct. I took a quick look before writing this
> and missed the fact that argc == 3 is the bigger problem.
>
> Thinking a little more about this, you can mix other options with
> checkout -b, like --track. You can also specify <start_point>.
>
> Now I don't know enough about this optimization you are doing to know
> whether we need to optimize when these options are given, but at least
> for --track I don't see any reason not to.
>
> So maybe you are better off using something like getopt() (warning:
> getopt modifies the input string so you probably want to duplicate it)
> if you want to support all cases. Though for this simple case you can
> probably get away by just directly scanning the argv list for "-b"
> (using strncmp instead of strcmp to account for "-b<branch-name>")

NO.  This would be unsafe to use if <start_point> is specified.  I
think either -f or -m together with -b make no sense unless
<start_point> is specified, but if they do make sense separately, I'm
guessing this hack should not be used with those flags.  And
additional flags may appear in the future that should not be used
together with this hack.
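
A small standalone sketch of the concern (not git code; both function names are hypothetical): a scan of argv for anything starting with "-b" also matches invocations that carry a <start_point> or other flags, while the exact-match check from the patch rejects them.

```c
#include <string.h>

/* Hypothetical broad check: any argument that starts with "-b". */
static int scan_for_dash_b(int argc, const char **argv)
{
	int i;

	for (i = 1; i < argc; i++)
		if (!strncmp(argv[i], "-b", 2))
			return 1;
	return 0;
}

/* The exact-match check from the patch. */
static int exact_dash_b(int argc, const char **argv)
{
	return argc == 3 && !strcmp(argv[1], "-b");
}
```

For { "checkout", "-b", "topic", "origin/master" } the scan returns 1 and would skip the index merge, whereas the exact check returns 0 and leaves that invocation on the existing code path.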

Personally, although I understand the desire to support any possible
cases in general, *this is a performance hack*.  As such, it should be
as simple and localized as possible.  I don't think supporting
old-style stuck flags (-b$BRANCH) is worth complicating this.  I'm
even leery of adding support for --track (do any users of huge repos
use -b with --track?  Does anyone at all use --track anymore?  I'm not
sure I've ever seen any user use that flag in the last 10 years other
than myself.)  Besides, in the *worst* possible case, the command the
user specifies works just fine...it just takes a little longer.  My
opinion is that Stolee's patch is perfect as-is and should not be
generalized at all.

Just my $0.02,
Elijah
Taylor Blau Aug. 30, 2019, 12:43 a.m. UTC | #7
Hi Elijah,

On Thu, Aug 29, 2019 at 05:19:44PM -0700, Elijah Newren wrote:
> On Thu, Aug 29, 2019 at 2:42 PM Pratyush Yadav <me@yadavpratyush.com> wrote:
> >
> > On 30/08/19 02:00AM, Pratyush Yadav wrote:
> > > On 29/08/19 04:07PM, Derrick Stolee wrote:
> > > > On 8/29/2019 2:54 PM, Phillip Wood wrote:
> > > > > Hi Stolee
> > > > >
> > > > > On 29/08/2019 18:01, Derrick Stolee via GitGitGadget wrote:
> > > > >> +
> > > > >> +    if (argc == 3 && !strcmp(argv[1], "-b")) {
> > > > >> +        /*
> > > > >> +         * User ran 'git checkout -b <branch>' and expects
> > > > >
> > > > > What if the user ran 'git checkout -b<branch>'? Then argc == 2.
> > > >
> > > > Good catch. I'm tempted to say "don't do that" to keep this
> > > > simple. They won't have incorrect results, just slower than
> > > > the "with space" option.
> > > >
> > > > However, if there is enough interest in correcting the "-b<branch>"
> > > > case, then I can make another attempt at this.
> > >
> > > You can probably do this with:
> > >
> > >   !strncmp(argv[1], "-b", 2)
> > >
> > > The difference is so little, might as well do it IMO.
> >
> > Actually, that is not correct. I took a quick look before writing this
> > and missed the fact that argc == 3 is the bigger problem.
> >
> > Thinking a little more about this, you can mix other options with
> > checkout -b, like --track. You can also specify <start_point>.
> >
> > Now I don't know enough about this optimization you are doing to know
> > whether we need to optimize when these options are given, but at least
> > for --track I don't see any reason not to.
> >
> > So maybe you are better off using something like getopt() (warning:
> > getopt modifies the input string so you probably want to duplicate it)
> > if you want to support all cases. Though for this simple case you can
> > probably get away by just directly scanning the argv list for "-b"
> > (using strncmp instead of strcmp to account for "-b<branch-name>")
>
> NO.  This would be unsafe to use if <start_point> is specified.  I
> think either -f or -m together with -b make no sense unless
> <start_point> is specified, but if they do make sense separately, I'm
> guessing this hack should not be used with those flags.  And
> additional flags may appear in the future that should not be used
> together with this hack.
>
> Personally, although I understand the desire to support any possible
> cases in general, *this is a performance hack*.  As such, it should be
> as simple and localized as possible.  I don't think supporting
> old-style stuck flags (-b$BRANCH) is worth complicating this.  I'm
> even leery of adding support for --track (do any users of huge repos
> use -b with --track?  Does anyone at all use --track anymore?  I'm not
> sure I've ever seen any user use that flag in the last 10 years other
> than myself.)  Besides, in the *worst* possible case, the command the
> user specifies works just fine...it just takes a little longer.  My
> opinion is that Stolee's patch is perfect as-is and should not be
> generalized at all.

I wholeheartedly agree with this, and pledge my $.02 towards it as well.
Now with a combined total of $.04, I think that this patch is ready for
queueing as-is.

> Just my $0.02,
> Elijah

Thanks,
Taylor
Derrick Stolee Aug. 30, 2019, 4:56 p.m. UTC | #8
On 8/29/2019 8:43 PM, Taylor Blau wrote:
> Hi Elijah,
> 
> On Thu, Aug 29, 2019 at 05:19:44PM -0700, Elijah Newren wrote:
>> Personally, although I understand the desire to support any possible
>> cases in general, *this is a performance hack*.  As such, it should be
>> as simple and localized as possible.  I don't think supporting
>> old-style stuck flags (-b$BRANCH) is worth complicating this.  I'm
>> even leery of adding support for --track (do any users of huge repos
>> use -b with --track?  Does anyone at all use --track anymore?  I'm not
>> sure I've ever seen any user use that flag in the last 10 years other
>> than myself.)  Besides, in the *worst* possible case, the command the
>> user specifies works just fine...it just takes a little longer.  My
>> opinion is that Stolee's patch is perfect as-is and should not be
>> generalized at all.
> 
> I wholeheartedly agree with this, and pledge my $.02 towards it as well.
> Now with a combined total of $.04, I think that this patch is ready for
> queueing as-is.

Thanks, both!
Junio C Hamano Aug. 30, 2019, 5:18 p.m. UTC | #9
Taylor Blau <me@ttaylorr.com> writes:

> I wholeheartedly agree with this, and pledge my $.02 towards it as well.
> Now with a combined total of $.04, I think that this patch is ready for
> queueing as-is.

;-)

Patch

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 6123f732a2..116200cf90 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -1713,6 +1713,15 @@  int cmd_checkout(int argc, const char **argv, const char *prefix)
 	opts.overlay_mode = -1;
 	opts.checkout_index = -2;    /* default on */
 	opts.checkout_worktree = -2; /* default on */
+	
+	if (argc == 3 && !strcmp(argv[1], "-b")) {
+		/*
+		 * User ran 'git checkout -b <branch>' and expects
+		 * the same behavior as 'git switch -c <branch>'.
+		 */
+		opts.switch_branch_doing_nothing_is_ok = 0;
+		opts.only_merge_on_switching_branches = 1;
+	}
 
 	options = parse_options_dup(checkout_options);
 	options = add_common_options(&opts, options);