Message ID | 20181010193557.19052-1-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | revert & cherry-pick: run git gc --auto |
Hi Ævar

On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
> Expand on the work started in 095c741edd ("commit: run git gc --auto
> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
> more commands where new objects can be created.
>
> The notably missing commands are now "rebase" and "stash". Both are
> being rewritten in C, so any use of "gc --auto" there can wait for
> that.

If cherry-pick, revert or 'rebase -i' edit the commit message then they
fork 'git commit' so gc --auto will be run there anyway.

I wonder if it would be better to call 'gc --auto' from sequencer.c at
the end of a string of successful picks, that would cover cherry-pick,
'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
calling 'gc --auto' until the very end, rather than every time we stop
for an edit but that is probably more trouble than it is worth.

Best Wishes

Phillip

>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>
> After reading the "Users are encouraged to run this task..." paragraph
> in the git-gc manpage I was wondering if due to gc --auto all over the
> place now (including recently in git-commit with a patch of mine) if
> we shouldn't change that advice.
>
> I'm meaning to send some doc changes to git-gc.txt, but in the
> meantime let's address this low-hanging fruit of running gc --auto
> when we revert or cherry-pick commits, which can like git-commit
> create a significant amount of loose objects.
>
>  builtin/revert.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/builtin/revert.c b/builtin/revert.c
> index 9a66720cfc..1b20902910 100644
> --- a/builtin/revert.c
> +++ b/builtin/revert.c
> @@ -209,6 +209,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
>  {
>  	struct replay_opts opts = REPLAY_OPTS_INIT;
>  	int res;
> +	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
>
>  	if (isatty(0))
>  		opts.edit = 1;
> @@ -217,6 +218,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
>  	res = run_sequencer(argc, argv, &opts);
>  	if (res < 0)
>  		die(_("revert failed"));
> +	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
>  	return res;
>  }
>
> @@ -224,11 +226,13 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
>  {
>  	struct replay_opts opts = REPLAY_OPTS_INIT;
>  	int res;
> +	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
>
>  	opts.action = REPLAY_PICK;
>  	sequencer_init_config(&opts);
>  	res = run_sequencer(argc, argv, &opts);
>  	if (res < 0)
>  		die(_("cherry-pick failed"));
> +	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
>  	return res;
>  }
>
On Thu, Oct 11 2018, Phillip Wood wrote:

> Hi Ævar
>
> On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
>> Expand on the work started in 095c741edd ("commit: run git gc --auto
>> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
>> more commands where new objects can be created.
>>
>> The notably missing commands are now "rebase" and "stash". Both are
>> being rewritten in C, so any use of "gc --auto" there can wait for
>> that.
>
> If cherry-pick, revert or 'rebase -i' edit the commit message then they
> fork 'git commit' so gc --auto will be run there anyway.

Yeah it seems I totally screwed up the testing for this patch, first it
doesn't even compile because I'm not including run-command.h, I *did*
fix that, but while wrangling a few things didn't commit that *sigh*.

And yeah, there's some invocations where we now run gc --auto twice,
i.e. if you do revert, but not revert --no-edit, and not on cherry-pick,
but on cherry-pick --edit.

So yeah, this really needs to be re-thought.

> I wonder if it would be better to call 'gc --auto' from sequencer.c at
> the end of a string of successful picks, that would cover cherry-pick,
> 'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
> calling 'gc --auto' until the very end, rather than every time we stop
> for an edit but that is probably more trouble than it is worth.

That seems a lot better indeed. I.e. running it from the sequencer. I do
wonder if there should be some smarts about running it in the middle of
a sequence, i.e. think of a case where we're rebasing 10k commits, which
is a gc need similar to what happens in the middle of "git svn
clone". So maybe something where we gc --auto in the sequencer for every
Nth commit, and at the end.

>
>>
>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>> ---
>>
>> After reading the "Users are encouraged to run this task..." paragraph
>> in the git-gc manpage I was wondering if due to gc --auto all over the
>> place now (including recently in git-commit with a patch of mine) if
>> we shouldn't change that advice.
>>
>> I'm meaning to send some doc changes to git-gc.txt, but in the
>> meantime let's address this low-hanging fruit of running gc --auto
>> when we revert or cherry-pick commits, which can like git-commit
>> create a significant amount of loose objects.
>>
>>  builtin/revert.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/builtin/revert.c b/builtin/revert.c
>> index 9a66720cfc..1b20902910 100644
>> --- a/builtin/revert.c
>> +++ b/builtin/revert.c
>> @@ -209,6 +209,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
>>  {
>>  	struct replay_opts opts = REPLAY_OPTS_INIT;
>>  	int res;
>> +	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
>>
>>  	if (isatty(0))
>>  		opts.edit = 1;
>> @@ -217,6 +218,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
>>  	res = run_sequencer(argc, argv, &opts);
>>  	if (res < 0)
>>  		die(_("revert failed"));
>> +	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
>>  	return res;
>>  }
>>
>> @@ -224,11 +226,13 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
>>  {
>>  	struct replay_opts opts = REPLAY_OPTS_INIT;
>>  	int res;
>> +	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
>>
>>  	opts.action = REPLAY_PICK;
>>  	sequencer_init_config(&opts);
>>  	res = run_sequencer(argc, argv, &opts);
>>  	if (res < 0)
>>  		die(_("cherry-pick failed"));
>> +	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
>>  	return res;
>>  }
>>
On Thu, Oct 11, 2018 at 12:08:47PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Oct 11 2018, Phillip Wood wrote:
>
> > Hi Ævar
> >
> > On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
> >> Expand on the work started in 095c741edd ("commit: run git gc --auto
> >> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
> >> more commands where new objects can be created.
> >>
> >> The notably missing commands are now "rebase" and "stash". Both are
> >> being rewritten in C, so any use of "gc --auto" there can wait for
> >> that.
> >
> > If cherry-pick, revert or 'rebase -i' edit the commit message then they
> > fork 'git commit' so gc --auto will be run there anyway.
>
> Yeah it seems I totally screwed up the testing for this patch, first it
> doesn't even compile because I'm not including run-command.h, I *did*
> fix that, but while wrangling a few things didn't commit that *sigh*.
>
> And yeah, there's some invocations where we now run gc --auto twice,
> i.e. if you do revert, but not revert --no-edit, and not on cherry-pick,
> but on cherry-pick --edit.
>
> So yeah, this really needs to be re-thought.
>
> > I wonder if it would be better to call 'gc --auto' from sequencer.c at
> > the end of a string of successful picks, that would cover cherry-pick,
> > 'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
> > calling 'gc --auto' until the very end, rather than every time we stop
> > for an edit but that is probably more trouble than it is worth.
>
> That seems a lot better indeed. I.e. running it from the sequencer. I do
> wonder if there should be some smarts about running it in the middle of
> a sequence, i.e. think of a case where we're rebasing 10k commits, which
> is a gc need similar to what happens in the middle of "git svn
> clone". So maybe something where we gc --auto in the sequencer for every
> Nth commit, and at the end.

How would that affect setups with 'gc.autoDetach = false', or, more
importantly, platforms, where 'git gc --auto' always runs in the
foreground?
On Thu, Oct 11 2018, SZEDER Gábor wrote:

> On Thu, Oct 11, 2018 at 12:08:47PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> On Thu, Oct 11 2018, Phillip Wood wrote:
>>
>> > Hi Ævar
>> >
>> > On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
>> >> Expand on the work started in 095c741edd ("commit: run git gc --auto
>> >> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
>> >> more commands where new objects can be created.
>> >>
>> >> The notably missing commands are now "rebase" and "stash". Both are
>> >> being rewritten in C, so any use of "gc --auto" there can wait for
>> >> that.
>> >
>> > If cherry-pick, revert or 'rebase -i' edit the commit message then they
>> > fork 'git commit' so gc --auto will be run there anyway.
>>
>> Yeah it seems I totally screwed up the testing for this patch, first it
>> doesn't even compile because I'm not including run-command.h, I *did*
>> fix that, but while wrangling a few things didn't commit that *sigh*.
>>
>> And yeah, there's some invocations where we now run gc --auto twice,
>> i.e. if you do revert, but not revert --no-edit, and not on cherry-pick,
>> but on cherry-pick --edit.
>>
>> So yeah, this really needs to be re-thought.
>>
>> > I wonder if it would be better to call 'gc --auto' from sequencer.c at
>> > the end of a string of successful picks, that would cover cherry-pick,
>> > 'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
>> > calling 'gc --auto' until the very end, rather than every time we stop
>> > for an edit but that is probably more trouble than it is worth.
>>
>> That seems a lot better indeed. I.e. running it from the sequencer. I do
>> wonder if there should be some smarts about running it in the middle of
>> a sequence, i.e. think of a case where we're rebasing 10k commits, which
>> is a gc need similar to what happens in the middle of "git svn
>> clone". So maybe something where we gc --auto in the sequencer for every
>> Nth commit, and at the end.
>
> How would that affect setups with 'gc.autoDetach = false', or, more
> importantly, platforms, where 'git gc --auto' always runs in the
> foreground?

I see we define NO_POSIX_GOODIES on Windows/MinGW, so those don't
daemonize "gc", but then I'm confused by this which seems to imply the
opposite: https://github.com/Microsoft/vscode/issues/29901

As far as the general UI question goes, I think if you define
gc.autoDetach=false you're already OK with having "git fetch" and
various commands that produce commits block, so I don't see a big
difference in doing this in the middle of a rebase.

But it seems (aside from the question of how this is done on Windows)
that we daemonize by default everywhere now, so I think it's OK to be
less conservative about where we run gc.

We also run a GC every 1000th commit in "git svn clone/rebase" already.
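The "every Nth commit, and at the end" idea from this subthread can be sketched in a few lines of C. This is only an illustration under stated assumptions, not sequencer.c code: `replay_picks()` and `run_gc_auto()` are hypothetical names, the latter is a counter standing in for spawning "git gc --auto", and the interval of 1000 is borrowed from the "git svn" figure mentioned above rather than from any existing setting.

```c
#include <assert.h>

/* Hypothetical stand-in for spawning "git gc --auto"; it only counts calls. */
static int gc_auto_calls;

static void run_gc_auto(void)
{
	gc_auto_calls++;
}

/*
 * Replay `total` picks, calling gc --auto after every `interval`-th pick
 * and exactly once more after the final pick. Returns how many times
 * gc --auto would have been invoked.
 */
static int replay_picks(int total, int interval)
{
	int i;

	assert(interval > 0);
	gc_auto_calls = 0;
	for (i = 1; i <= total; i++) {
		/* ... pick commit number i here ... */
		if (i % interval == 0 && i != total)
			run_gc_auto(); /* mid-sequence gc */
	}
	if (total > 0)
		run_gc_auto(); /* gc at the end of the sequence */
	return gc_auto_calls;
}
```

With an interval of 1000, rebasing 10k commits would trigger nine mid-sequence calls plus the final one, while a three-commit cherry-pick would trigger only the final call. Whether each of those calls blocks depends on the detach behaviour discussed above.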
Hi Ævar

On 11/10/2018 11:08, Ævar Arnfjörð Bjarmason wrote:
>
> On Thu, Oct 11 2018, Phillip Wood wrote:
>
>> Hi Ævar
>>
>> On 10/10/2018 20:35, Ævar Arnfjörð Bjarmason wrote:
>>> Expand on the work started in 095c741edd ("commit: run git gc --auto
>>> just before the post-commit hook", 2018-02-28) to run "gc --auto" in
>>> more commands where new objects can be created.
>>>
>>> The notably missing commands are now "rebase" and "stash". Both are
>>> being rewritten in C, so any use of "gc --auto" there can wait for
>>> that.
>>
>> If cherry-pick, revert or 'rebase -i' edit the commit message then they
>> fork 'git commit' so gc --auto will be run there anyway.
>
> Yeah it seems I totally screwed up the testing for this patch, first it
> doesn't even compile because I'm not including run-command.h, I *did*
> fix that, but while wrangling a few things didn't commit that *sigh*.
>
> And yeah, there's some invocations where we now run gc --auto twice,
> i.e. if you do revert, but not revert --no-edit, and not on cherry-pick,
> but on cherry-pick --edit.
>
> So yeah, this really needs to be re-thought.
>
>> I wonder if it would be better to call 'gc --auto' from sequencer.c at
>> the end of a string of successful picks, that would cover cherry-pick,
>> 'rebase -iu' and revert. With 'rebase -i' it might be nice to avoid
>> calling 'gc --auto' until the very end, rather than every time we stop
>> for an edit but that is probably more trouble than it is worth.
>
> That seems a lot better indeed. I.e. running it from the sequencer. I do
> wonder if there should be some smarts about running it in the middle of
> a sequence, i.e. think of a case where we're rebasing 10k commits, which
> is a gc need similar to what happens in the middle of "git svn
> clone". So maybe something where we gc --auto in the sequencer for every
> Nth commit, and at the end.

That sounds like a good idea. It would be nice if need_to_gc() was in
libgit, then we could avoid the cost of forking unless we actually need
to gc. Looking at builtin/gc.c there seem to be quite a few global
variables so transforming it to library code may not be that
straightforward.

Best Wishes

Phillip

>>
>>>
>>> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
>>> ---
>>>
>>> After reading the "Users are encouraged to run this task..." paragraph
>>> in the git-gc manpage I was wondering if due to gc --auto all over the
>>> place now (including recently in git-commit with a patch of mine) if
>>> we shouldn't change that advice.
>>>
>>> I'm meaning to send some doc changes to git-gc.txt, but in the
>>> meantime let's address this low-hanging fruit of running gc --auto
>>> when we revert or cherry-pick commits, which can like git-commit
>>> create a significant amount of loose objects.
>>>
>>>  builtin/revert.c | 4 ++++
>>>  1 file changed, 4 insertions(+)
>>>
>>> diff --git a/builtin/revert.c b/builtin/revert.c
>>> index 9a66720cfc..1b20902910 100644
>>> --- a/builtin/revert.c
>>> +++ b/builtin/revert.c
>>> @@ -209,6 +209,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
>>>  {
>>>  	struct replay_opts opts = REPLAY_OPTS_INIT;
>>>  	int res;
>>> +	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
>>>
>>>  	if (isatty(0))
>>>  		opts.edit = 1;
>>> @@ -217,6 +218,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
>>>  	res = run_sequencer(argc, argv, &opts);
>>>  	if (res < 0)
>>>  		die(_("revert failed"));
>>> +	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
>>>  	return res;
>>>  }
>>>
>>> @@ -224,11 +226,13 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
>>>  {
>>>  	struct replay_opts opts = REPLAY_OPTS_INIT;
>>>  	int res;
>>> +	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
>>>
>>>  	opts.action = REPLAY_PICK;
>>>  	sequencer_init_config(&opts);
>>>  	res = run_sequencer(argc, argv, &opts);
>>>  	if (res < 0)
>>>  		die(_("cherry-pick failed"));
>>> +	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
>>>  	return res;
>>>  }
>>>
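To make the "avoid forking unless we actually need to gc" point concrete, here is a rough, self-contained sketch of a cheap in-process check. It is not the code in builtin/gc.c: the function names are hypothetical, pack counting and other conditions are ignored, and it assumes the usual loose-object layout in which objects are spread evenly over 256 fan-out directories, so that counting a single directory gives an estimate of the total.

```c
#include <assert.h>
#include <dirent.h>
#include <string.h>

/* Per-directory limit: the overall loose-object limit divided by 256, rounded up. */
static int per_dir_threshold(int gc_auto)
{
	return (gc_auto + 255) / 256;
}

/* True if `name` looks like the 38-hex-digit tail of a loose object file name. */
static int is_loose_object_name(const char *name)
{
	return strspn(name, "0123456789abcdef") == 38 && name[38] == '\0';
}

/*
 * Estimate whether the repository holds more than `gc_auto` loose objects
 * by counting the entries of one fan-out directory (`sample_dir`, e.g.
 * ".git/objects/17"). Returns 1 if a gc looks warranted, 0 otherwise.
 */
static int too_many_loose(const char *sample_dir, int gc_auto)
{
	DIR *dir;
	struct dirent *ent;
	int seen = 0, needed = 0;

	assert(sample_dir != NULL);
	if (gc_auto <= 0)
		return 0; /* a non-positive limit is treated as "disabled" here */
	dir = opendir(sample_dir);
	if (!dir)
		return 0; /* no such directory: nothing to count */
	while ((ent = readdir(dir)) != NULL) {
		if (!is_loose_object_name(ent->d_name))
			continue;
		if (++seen > per_dir_threshold(gc_auto)) {
			needed = 1;
			break;
		}
	}
	closedir(dir);
	return needed;
}
```

A caller in the sequencer could then spawn "gc --auto" only when such a check returns 1. The global state in builtin/gc.c that Phillip mentions is exactly what this sketch sidesteps by taking the limit as a parameter.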
On Thu, Oct 11, 2018 at 12:34:35PM +0200, Ævar Arnfjörð Bjarmason wrote:
> I see we define NO_POSIX_GOODIES on Windows/MinGW, so those don't
> daemonize "gc", but then I'm confused by this which seems to imply the
> opposite: https://github.com/Microsoft/vscode/issues/29901

I don't think it implies that. The last comment starts with "Code
calls git fetch periodically". I presume that it does so in the
background (to prevent blocking the UI until 'git fetch' runs),
therefore 'git gc --auto' starts already in the background.

Furthermore, notice that 'git prune' on that screenshot has two
'git.exe' parents: I think its parent is 'git gc --auto' and its
grandparent is 'git fetch'. Now, if that 'git gc --auto' were to go
to the background as a result of our daemonize(), then the grandparent
'git fetch' would have very likely exited already.
diff --git a/builtin/revert.c b/builtin/revert.c
index 9a66720cfc..1b20902910 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -209,6 +209,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
 {
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int res;
+	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
 
 	if (isatty(0))
 		opts.edit = 1;
@@ -217,6 +218,7 @@ int cmd_revert(int argc, const char **argv, const char *prefix)
 	res = run_sequencer(argc, argv, &opts);
 	if (res < 0)
 		die(_("revert failed"));
+	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
 	return res;
 }
 
@@ -224,11 +226,13 @@ int cmd_cherry_pick(int argc, const char **argv, const char *prefix)
 {
 	struct replay_opts opts = REPLAY_OPTS_INIT;
 	int res;
+	const char *argv_gc_auto[] = {"gc", "--auto", NULL};
 
 	opts.action = REPLAY_PICK;
 	sequencer_init_config(&opts);
 	res = run_sequencer(argc, argv, &opts);
 	if (res < 0)
 		die(_("cherry-pick failed"));
+	run_command_v_opt(argv_gc_auto, RUN_GIT_CMD);
 	return res;
 }
Expand on the work started in 095c741edd ("commit: run git gc --auto
just before the post-commit hook", 2018-02-28) to run "gc --auto" in
more commands where new objects can be created.

The notably missing commands are now "rebase" and "stash". Both are
being rewritten in C, so any use of "gc --auto" there can wait for
that.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

After reading the "Users are encouraged to run this task..." paragraph
in the git-gc manpage I was wondering if due to gc --auto all over the
place now (including recently in git-commit with a patch of mine) if
we shouldn't change that advice.

I'm meaning to send some doc changes to git-gc.txt, but in the
meantime let's address this low-hanging fruit of running gc --auto
when we revert or cherry-pick commits, which can like git-commit
create a significant amount of loose objects.

 builtin/revert.c | 4 ++++
 1 file changed, 4 insertions(+)