revert & cherry-pick: run git gc --auto

Message ID 20181010193557.19052-1-avarab@gmail.com (mailing list archive)
State New, archived

Commit Message

Ævar Arnfjörð Bjarmason Oct. 10, 2018, 7:35 p.m. UTC
Expand on the work started in 095c741edd ("commit: run git gc --auto
just before the post-commit hook", 2018-02-28) to run "gc --auto" in
more commands where new objects can be created.

The notably missing commands are now "rebase" and "stash". Both are
being rewritten in C, so any use of "gc --auto" there can wait for
that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

After reading the "Users are encouraged to run this task..." paragraph
in the git-gc manpage I was wondering whether, with gc --auto now run
all over the place (including recently in git-commit with a patch of
mine), we shouldn't change that advice.

I'm meaning to send some doc changes to git-gc.txt, but in the
meantime let's address this low-hanging fruit of running gc --auto
when we revert or cherry-pick commits, which can, like git-commit,
create a significant number of loose objects.

 builtin/revert.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Phillip Wood Oct. 11, 2018, 9:23 a.m. UTC | #1
Hi Ævar

On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
> Expand on the work started in 095c741edd ("commit: run git gc --auto
> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
> more commands where new objects can be created.
> 
> The notably missing commands are now "rebase" and "stash". Both are
> being rewritten in C, so any use of "gc --auto" there can wait for
> that.

If cherry-pick, revert or 'rebase -i' edit the commit message then they
fork 'git commit' so gc --auto will be run there anyway. I wonder if it
would be better to call 'gc --auto' from sequencer.c at the end of a
string of successful picks, that would cover cherry-pick, 'rebase -iu'
and revert. With 'rebase -i' it might be nice to avoid calling 'gc
--auto' until the very end, rather than every time we stop for an edit
but that is probably more trouble than it is worth.

Best Wishes

Phillip

Ævar Arnfjörð Bjarmason Oct. 11, 2018, 10:08 a.m. UTC | #2
On Thu, Oct 11 2018, Phillip Wood wrote:

> Hi Ævar
>
> On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
>> Expand on the work started in 095c741edd ("commit: run git gc --auto
>> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
>> more commands where new objects can be created.
>>
>> The notably missing commands are now "rebase" and "stash". Both are
>> being rewritten in C, so any use of "gc --auto" there can wait for
>> that.
>
> If cherry-pick, revert or 'rebase -i' edit the commit message then they
> fork 'git commit' so gc --auto will be run there anyway.

Yeah, it seems I totally screwed up the testing for this patch. First, it
doesn't even compile because I'm not including run-command.h. I *did*
fix that, but while wrangling a few things didn't commit that *sigh*.

And yeah, there are some invocations where we now run gc --auto twice,
i.e. on revert (unless you pass --no-edit), and on cherry-pick --edit
(but not on plain cherry-pick).

So yeah, this really needs to be re-thought.

> I wonder if it would be better to call 'gc --auto' from sequencer.c at
> the end of a string of successful picks, that would cover cherry-pick,
> 'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
> calling 'gc --auto' until the very end, rather than every time we stop
> for an edit but that is probably more trouble than it is worth.

That seems a lot better indeed, i.e. running it from the sequencer. I do
wonder if there should be some smarts about running it in the middle of
a sequence, e.g. think of a case where we're rebasing 10k commits, which
is a gc need similar to what happens in the middle of "git svn
clone". So maybe something where we gc --auto in the sequencer for every
Nth commit, and at the end.
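
The "every Nth commit, and at the end" idea could be sketched roughly
as below. This is not code from the patch or from sequencer.c: struct
gc_ticker, gc_ticker_pick_done(), gc_ticker_finish() and count_run()
are hypothetical names, and the callback only stands in for spawning
"git gc --auto".

```c
#include <stddef.h>

/* Hypothetical hook type; in git this would spawn "git gc --auto". */
typedef void (*maintenance_fn)(void *data);

struct gc_ticker {
	unsigned interval;   /* run every `interval` picks; 0 = only at the end */
	unsigned since_last; /* picks completed since the last run */
	maintenance_fn run;  /* what to call when maintenance is due */
	void *data;          /* opaque argument handed to `run` */
};

/* Call after each successful pick. Returns 1 if maintenance ran. */
static int gc_ticker_pick_done(struct gc_ticker *t)
{
	if (!t->interval || ++t->since_last < t->interval)
		return 0;
	t->since_last = 0;
	t->run(t->data);
	return 1;
}

/* Call once when the whole sequence has finished successfully. */
static void gc_ticker_finish(struct gc_ticker *t)
{
	t->since_last = 0;
	t->run(t->data);
}

/* Stand-in callback for experiments: counts how often it was invoked. */
static void count_run(void *data)
{
	++*(unsigned *)data;
}
```

With an interval of 1000, a 2500-pick sequence triggers two
mid-sequence runs plus the final one. The final run is unconditional
here on the assumption that "gc --auto" itself decides whether there
is anything to do.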

SZEDER Gábor Oct. 11, 2018, 10:25 a.m. UTC | #3
On Thu, Oct 11, 2018 at 12:08:47PM +0200, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Oct 11 2018, Phillip Wood wrote:
> 
> > Hi Ævar
> >
> > On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
> >> Expand on the work started in 095c741edd ("commit: run git gc --auto
> >> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
> >> more commands where new objects can be created.
> >>
> >> The notably missing commands are now "rebase" and "stash". Both are
> >> being rewritten in C, so any use of "gc --auto" there can wait for
> >> that.
> >
> > If cherry-pick, revert or 'rebase -i' edit the commit message then they
> > fork 'git commit' so gc --auto will be run there anyway.
> 
> Yeah it seems I totally screwed up the testing for this patch, first it
> doesn't even compile because I'm not including run-command.h, I *did*
> fix that, but while wrangling a few things didn't commit that *sigh*.
> 
> And yeah, there's some invocations where we now run gc --auto twice,
> i.e. if you do revert, but not revert --no-edit, and not on cherry-pick,
> but on cherry-pick --edit.
> 
> So yeah, this really needs to be re-thought.
> 
> > I wonder if it would be better to call 'gc --auto' from sequencer.c at
> > the end of a string of successful picks, that would cover cherry-pick,
> > 'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
> > calling 'gc --auto' until the very end, rather than every time we stop
> > for an edit but that is probably more trouble than it is worth.
> 
> That seems a lot better indeed. I.e. running it from the sequencer. I do
> wonder if there should be some smarts about running it in the middle of
> a sequence, i.e. think of a case where we're rebasing 10k commits, which
> is a gc need similar to what happens in the middle of "git svn
> clone". So maybe something where we gc --auto in the sequencer for every
> Nth commit, and at the end.

How would that affect setups with 'gc.autoDetach = false', or, more
importantly, platforms where 'git gc --auto' always runs in the
foreground?
Ævar Arnfjörð Bjarmason Oct. 11, 2018, 10:34 a.m. UTC | #4
On Thu, Oct 11 2018, SZEDER Gábor wrote:

> On Thu, Oct 11, 2018 at 12:08:47PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Oct 11 2018, Phillip Wood wrote:
>>
>> > Hi Ævar
>> >
>> > On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
>> >> Expand on the work started in 095c741edd ("commit: run git gc --auto
>> >> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
>> >> more commands where new objects can be created.
>> >>
>> >> The notably missing commands are now "rebase" and "stash". Both are
>> >> being rewritten in C, so any use of "gc --auto" there can wait for
>> >> that.
>> >
>> > If cherry-pick, revert or 'rebase -i' edit the commit message then they
>> > fork 'git commit' so gc --auto will be run there anyway.
>>
>> Yeah it seems I totally screwed up the testing for this patch, first it
>> doesn't even compile because I'm not including run-command.h, I *did*
>> fix that, but while wrangling a few things didn't commit that *sigh*.
>>
>> And yeah, there's some invocations where we now run gc --auto twice,
>> i.e. if you do revert, but not revert --no-edit, and not on cherry-pick,
>> but on cherry-pick --edit.
>>
>> So yeah, this really needs to be re-thought.
>>
>> > I wonder if it would be better to call 'gc --auto' from sequencer.c at
>> > the end of a string of successful picks, that would cover cherry-pick,
>> > 'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
>> > calling 'gc --auto' until the very end, rather than every time we stop
>> > for an edit but that is probably more trouble than it is worth.
>>
>> That seems a lot better indeed. I.e. running it from the sequencer. I do
>> wonder if there should be some smarts about running it in the middle of
>> a sequence, i.e. think of a case where we're rebasing 10k commits, which
>> is a gc need similar to what happens in the middle of "git svn
>> clone". So maybe something where we gc --auto in the sequencer for every
>> Nth commit, and at the end.
>
> How would that affect setups with 'gc.autoDetach = false', or, more
> importantly, platforms, where 'git gc --auto' always runs in the
> foreground?

I see we define NO_POSIX_GOODIES on Windows/MinGW, so those don't
daemonize "gc", but then I'm confused by this which seems to imply the
opposite: https://github.com/Microsoft/vscode/issues/29901

As far as the general UI question goes, I think if you set
gc.autoDetach=false you're already OK with having "git fetch" and various
commands that produce commits block, so I don't see a big difference in
doing this in the middle of a rebase.

But it seems (aside from the question of how this is done on Windows)
that we daemonize by default everywhere now, so I think it's OK to be
less conservative about where we run gc.

We also run a GC every 1000th commit in "git svn clone/rebase" already.
Phillip Wood Oct. 11, 2018, 11:14 a.m. UTC | #5
Hi Ævar
On 11/10/2018 11:08, Ævar Arnfjörð Bjarmason wrote:
> 
> On Thu, Oct 11 2018, Phillip Wood wrote:
> 
>> Hi Ævar
>>
>> On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
>>> Expand on the work started in 095c741edd ("commit: run git gc --auto
>>> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
>>> more commands where new objects can be created.
>>>
>>> The notably missing commands are now "rebase" and "stash". Both are
>>> being rewritten in C, so any use of "gc --auto" there can wait for
>>> that.
>>
>> If cherry-pick, revert or 'rebase -i' edit the commit message then they
>> fork 'git commit' so gc --auto will be run there anyway.
> 
> Yeah it seems I totally screwed up the testing for this patch, first it
> doesn't even compile because I'm not including run-command.h, I *did*
> fix that, but while wrangling a few things didn't commit that *sigh*.
> 
> And yeah, there's some invocations where we now run gc --auto twice,
> i.e. if you do revert, but not revert --no-edit, and not on cherry-pick,
> but on cherry-pick --edit.
> 
> So yeah, this really needs to be re-thought.
> 
>> I wonder if it would be better to call 'gc --auto' from sequencer.c at
>> the end of a string of successful picks, that would cover cherry-pick,
>> 'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
>> calling 'gc --auto' until the very end, rather than every time we stop
>> for an edit but that is probably more trouble than it is worth.
> 
> That seems a lot better indeed. I.e. running it from the sequencer. I do
> wonder if there should be some smarts about running it in the middle of
> a sequence, i.e. think of a case where we're rebasing 10k commits, which
> is a gc need similar to what happens in the middle of "git svn
> clone". So maybe something where we gc --auto in the sequencer for every
> Nth commit, and at the end.

That sounds like a good idea. It would be nice if need_to_gc() was in
libgit, then we could avoid the cost of forking unless we actually need
to gc. Looking at builtin/gc.c there seem to be quite a few global
variables so transforming it to library code may not be that
straightforward.
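
As a rough illustration of checking in-process before forking, here is
a minimal sketch of a cheap "are there too many loose objects?" test.
It is loosely modelled on my understanding that builtin/gc.c samples a
single fan-out directory rather than counting every object, and it
assumes SHA-1 object names (38 hex digits after the 2-digit directory).
All function names here are hypothetical, not git's API.

```c
#define _POSIX_C_SOURCE 200809L /* for opendir()/readdir() */
#include <dirent.h>
#include <string.h>

/* Hypothetical helper: does `name` look like a loose object file name? */
static int looks_like_loose_object(const char *name)
{
	return strlen(name) == 38 &&
	       strspn(name, "0123456789abcdef") == 38;
}

/* Scale a repository-wide limit down to one of the 256 fan-out dirs. */
static unsigned per_dir_threshold(unsigned limit)
{
	return (limit + 255) / 256;
}

/*
 * Returns 1 if `sample_dir` alone holds more loose objects than its
 * share of `limit`, 0 otherwise (including when the directory cannot
 * be read, or when `limit` is 0, which is treated as "disabled").
 */
static int too_many_loose_in(const char *sample_dir, unsigned limit)
{
	DIR *dir;
	struct dirent *ent;
	unsigned seen = 0, max = per_dir_threshold(limit);
	int needed = 0;

	if (!limit)
		return 0;
	dir = opendir(sample_dir);
	if (!dir)
		return 0;
	while ((ent = readdir(dir)) != NULL) {
		if (!looks_like_loose_object(ent->d_name))
			continue;
		if (++seen > max) {
			needed = 1; /* stop early, we know enough */
			break;
		}
	}
	closedir(dir);
	return needed;
}
```

A caller in the sequencer would only spawn "git gc --auto" when a
check like this returns 1, so the common case costs one directory
scan instead of a fork.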

Best Wishes

Phillip

SZEDER Gábor Oct. 11, 2018, 11:24 a.m. UTC | #6
On Thu, Oct 11, 2018 at 12:34:35PM +0200, Ævar Arnfjörð Bjarmason wrote:
> I see we define NO_POSIX_GOODIES on Windows/MinGW, so those don't
> daemonize "gc", but then I'm confused by this which seems to imply the
> opposite: https://github.com/Microsoft/vscode/issues/29901

I don't think it implies that.

The last comment starts with "Code calls git fetch periodically".  I
presume that it does so in the background (to avoid blocking the UI
while 'git fetch' runs), therefore 'git gc --auto' already starts in
the background.  Furthermore, notice that 'git prune' in that
screenshot has two 'git.exe' ancestors: I think its parent is 'git gc
--auto' and its grandparent is 'git fetch'.  Now, if that 'git gc
--auto' were to go to the background as a result of our daemonize(),
then the grandparent 'git fetch' would have very likely exited
already.
Patch

diff --git a/builtin/revert.c b/builtin/revert.c
index 9a66720cfc..1b20902910 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -209,6 +209,7 @@  int cmd_revert(int argc, const char **argv, const char *prefix)
 {
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int res;
+	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
 
 	if (isatty(0))
 		opts.edit = 1;
@@ -217,6 +218,7 @@  int cmd_revert(int argc, const char **argv, const char *prefix)
 	res = run_sequencer(argc, argv, &opts);
 	if (res < 0)
 		die(_("revert failed"));
+	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
 	return res;
 }
 
@@ -224,11 +226,13 @@  int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
 {
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int res;
+	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
 
 	opts.action = REPLAY_PICK;
 	sequencer_init_config(&opts);
 	res = run_sequencer(argc, argv, &opts);
 	if (res < 0)
 		die(_("cherry-pick failed"));
+	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
 	return res;
 }