Message ID | ca78d3dc7c0270b434ee4ca4ef618212c7dc1d5b.1723712608.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | builtin/maintenance: fix auto-detach with non-standard tasks | expand |
Patrick Steinhardt <ps@pks.im> writes: > +test_expect_success '--detach overrides gc.autoDetach=false' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + > + # Prepare the repository such that git-gc(1) ends up repacking. > + test_commit "$(test_oid blob17_1)" && > + test_commit "$(test_oid blob17_2)" && > + git config gc.autodetach false && > + git config gc.auto 2 && > + > + cat >expect <<-EOF && > + Auto packing the repository in background for optimum performance. > + See "git help gc" for manual housekeeping. > + EOF > + GIT_PROGRESS_DELAY=0 git gc --auto --detach 2>actual && > + test_cmp expect actual > + ) > +' If the gc/maintenance is going to background itself, it is possible that it still is running, possibly with files under repo/.git/ open and the process running in repo directory, when the test_when_finished clean-up trap goes in effect? I am wondering where this comes from: https://github.com/git/git/actions/runs/10408467351/job/28825980833#step:6:2000 where "rm -rf repo" dies with an unusual rm: can't remove 'repo/.git': Directory not empty and my theory is that after "rm -rf" _thinks_ it removed everything underneath, before it attempts to rmdir("repo/.git"), the repack process in the background has created a new pack, and "rm -rf" does not go back and try to create such a new cruft. The most robust way to work around such a "race" is to wait for the backgrounded process before cleaning up, or after seeing that the message we use as a signal that the "gc" has backgrounded itself, kill that backgrounded process before exiting the test and causing the clean-up to trigger.
Junio C Hamano <gitster@pobox.com> writes: > Patrick Steinhardt <ps@pks.im> writes: > >> +test_expect_success '--detach overrides gc.autoDetach=false' ' >> + test_when_finished "rm -rf repo" && >> + git init repo && >> + ( >> + cd repo && >> + >> + # Prepare the repository such that git-gc(1) ends up repacking. >> + test_commit "$(test_oid blob17_1)" && >> + test_commit "$(test_oid blob17_2)" && >> + git config gc.autodetach false && >> + git config gc.auto 2 && >> + >> + cat >expect <<-EOF && >> + Auto packing the repository in background for optimum performance. >> + See "git help gc" for manual housekeeping. >> + EOF >> + GIT_PROGRESS_DELAY=0 git gc --auto --detach 2>actual && >> + test_cmp expect actual >> + ) >> +' > > If the gc/maintenance is going to background itself, it is possible > that it still is running, possibly with files under repo/.git/ open > and the process running in repo directory, when the test_when_finished > clean-up trap goes in effect? > > I am wondering where this comes from: > > https://github.com/git/git/actions/runs/10408467351/job/28825980833#step:6:2000 > > where "rm -rf repo" dies with an unusual > > rm: can't remove 'repo/.git': Directory not empty > > and my theory is that after "rm -rf" _thinks_ it removed everything > underneath, before it attempts to rmdir("repo/.git"), the repack > process in the background has created a new pack, and "rm -rf" does > not go back and try to create such a new cruft. > > The most robust way to work around such a "race" is to wait for the > backgrounded process before cleaning up, or after seeing that the > message we use as a signal that the "gc" has backgrounded itself, > kill that backgrounded process before exiting the test and causing > the clean-up to trigger. There already is a clue left by those who worked on this test the last time at the end of the script. It says: # DO NOT leave a detached auto gc process running near the end of the # test script: it can run long enough in the background to racily # interfere with the cleanup in 'test_done'. immediately before "test_done". In the meantime, I am wondering something simple and silly like the attached is sufficient. The idea is that we expect the "oops we couldn't clean" code not to trigger most of the time, but if it does, we just wait (with back off) a bit and retry. t/t6500-gc.sh | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git c/t/t6500-gc.sh w/t/t6500-gc.sh index 737c99e0f8..4a991e087a 100755 --- c/t/t6500-gc.sh +++ w/t/t6500-gc.sh @@ -396,8 +396,22 @@ test_expect_success 'background auto gc respects lock for all operations' ' test_cmp expect actual ' +wait_to_clean () { + count=10 sleep=1 + until rm -rf "$1" && ! test -d "$1" + do + if test $count = 0 + then + return 1 + fi + count=$(( count - 1 )) + sleep=$(( sleep + sleep )) + sleep $sleep + done +} + test_expect_success '--detach overrides gc.autoDetach=false' ' - test_when_finished "rm -rf repo" && + test_when_finished "wait_to_clean repo" && git init repo && ( cd repo && @@ -418,7 +432,7 @@ test_expect_success '--detach overrides gc.autoDetach=false' ' ' test_expect_success '--no-detach overrides gc.autoDetach=true' ' - test_when_finished "rm -rf repo" && + test_when_finished "wait_to_clean repo" && git init repo && ( cd repo &&
On Thu, Aug 15, 2024 at 03:29:20PM -0700, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > Patrick Steinhardt <ps@pks.im> writes: > > > >> +test_expect_success '--detach overrides gc.autoDetach=false' ' > >> + test_when_finished "rm -rf repo" && > >> + git init repo && > >> + ( > >> + cd repo && > >> + > >> + # Prepare the repository such that git-gc(1) ends up repacking. > >> + test_commit "$(test_oid blob17_1)" && > >> + test_commit "$(test_oid blob17_2)" && > >> + git config gc.autodetach false && > >> + git config gc.auto 2 && > >> + > >> + cat >expect <<-EOF && > >> + Auto packing the repository in background for optimum performance. > >> + See "git help gc" for manual housekeeping. > >> + EOF > >> + GIT_PROGRESS_DELAY=0 git gc --auto --detach 2>actual && > >> + test_cmp expect actual > >> + ) > >> +' > > > > If the gc/maintenance is going to background itself, it is possible > > that it still is running, possibly with files under repo/.git/ open > > and the process running in repo directory, when the test_when_finished > > clean-up trap goes in effect? > > > > I am wondering where this comes from: > > > > https://github.com/git/git/actions/runs/10408467351/job/28825980833#step:6:2000 > > > > where "rm -rf repo" dies with an unusual > > > > rm: can't remove 'repo/.git': Directory not empty > > > > and my theory is that after "rm -rf" _thinks_ it removed everything > > underneath, before it attempts to rmdir("repo/.git"), the repack > > process in the background has created a new pack, and "rm -rf" does > > not go back and try to create such a new cruft. > > > > The most robust way to work around such a "race" is to wait for the > > backgrounded process before cleaning up, or after seeing that the > > message we use as a signal that the "gc" has backgrounded itself, > > kill that backgrounded process before exiting the test and causing > > the clean-up to trigger. > > There already is a clue left by those who worked on this test the > last time at the end of the script. It says: > > # DO NOT leave a detached auto gc process running near the end of the > # test script: it can run long enough in the background to racily > # interfere with the cleanup in 'test_done'. > > immediately before "test_done". > > In the meantime, I am wondering something simple and silly like the > attached is sufficient. The idea is that we expect the "oops we > couldn't clean" code not to trigger most of the time, but if it > does, we just wait (with back off) a bit and retry. Ah, indeed, that is a problem. We already have a better tool to fix this with `run_and_wait_for_auto_gc()`. It creates a separate file descriptor and waits for it to close via some shell trickery, which will only happen once the child process of git-gc(1) has exited. Will fix, thanks! Patrick
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt index b5561c458a..370e22faae 100644 --- a/Documentation/git-gc.txt +++ b/Documentation/git-gc.txt @@ -9,7 +9,7 @@ git-gc - Cleanup unnecessary files and optimize the local repository SYNOPSIS -------- [verse] -'git gc' [--aggressive] [--auto] [--quiet] [--prune=<date> | --no-prune] [--force] [--keep-largest-pack] +'git gc' [--aggressive] [--auto] [--[no-]detach] [--quiet] [--prune=<date> | --no-prune] [--force] [--keep-largest-pack] DESCRIPTION ----------- @@ -53,6 +53,9 @@ configuration options such as `gc.auto` and `gc.autoPackLimit`, all other housekeeping tasks (e.g. rerere, working trees, reflog...) will be performed as well. +--[no-]detach:: + Run in the background if the system supports it. This option overrides + the `gc.autoDetach` config. --[no-]cruft:: When expiring unreachable objects, pack them separately into a diff --git a/builtin/gc.c b/builtin/gc.c index f815557081..269a77960f 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -242,9 +242,13 @@ static enum schedule_priority parse_schedule(const char *value) struct maintenance_run_opts { int auto_flag; + int detach; int quiet; enum schedule_priority schedule; }; +#define MAINTENANCE_RUN_OPTS_INIT { \ + .detach = -1, \ +} static int pack_refs_condition(UNUSED struct gc_config *cfg) { @@ -664,7 +668,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix) int keep_largest_pack = -1; timestamp_t dummy; struct child_process rerere_cmd = CHILD_PROCESS_INIT; - struct maintenance_run_opts opts = {0}; + struct maintenance_run_opts opts = MAINTENANCE_RUN_OPTS_INIT; struct gc_config cfg = GC_CONFIG_INIT; const char *prune_expire_sentinel = "sentinel"; const char *prune_expire_arg = prune_expire_sentinel; @@ -681,6 +685,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix) OPT_BOOL(0, "aggressive", &aggressive, N_("be more thorough (increased runtime)")), OPT_BOOL_F(0, "auto", &opts.auto_flag, N_("enable auto-gc mode"), PARSE_OPT_NOCOMPLETE), + OPT_BOOL(0, "detach", &opts.detach, + N_("perform garbage collection in the background")), OPT_BOOL_F(0, "force", &force, N_("force running gc even if there may be another gc running"), PARSE_OPT_NOCOMPLETE), @@ -729,6 +735,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix) strvec_push(&repack, "-q"); if (opts.auto_flag) { + if (cfg.detach_auto && opts.detach < 0) + opts.detach = 1; + /* * Auto-gc should be least intrusive as possible. */ @@ -738,38 +747,12 @@ int cmd_gc(int argc, const char **argv, const char *prefix) } if (!quiet) { - if (cfg.detach_auto) + if (opts.detach > 0) fprintf(stderr, _("Auto packing the repository in background for optimum performance.\n")); else fprintf(stderr, _("Auto packing the repository for optimum performance.\n")); fprintf(stderr, _("See \"git help gc\" for manual housekeeping.\n")); } - if (cfg.detach_auto) { - ret = report_last_gc_error(); - if (ret == 1) { - /* Last gc --auto failed. Skip this one. */ - ret = 0; - goto out; - - } else if (ret) { - /* an I/O error occurred, already reported */ - goto out; - } - - if (lock_repo_for_gc(force, &pid)) { - ret = 0; - goto out; - } - - gc_before_repack(&opts, &cfg); /* dies on failure */ - delete_tempfile(&pidfile); - - /* - * failure to daemonize is ok, we'll continue - * in foreground - */ - daemonized = !daemonize(); - } } else { struct string_list keep_pack = STRING_LIST_INIT_NODUP; @@ -784,6 +767,33 @@ int cmd_gc(int argc, const char **argv, const char *prefix) string_list_clear(&keep_pack, 0); } + if (opts.detach > 0) { + ret = report_last_gc_error(); + if (ret == 1) { + /* Last gc --auto failed. Skip this one. */ + ret = 0; + goto out; + + } else if (ret) { + /* an I/O error occurred, already reported */ + goto out; + } + + if (lock_repo_for_gc(force, &pid)) { + ret = 0; + goto out; + } + + gc_before_repack(&opts, &cfg); /* dies on failure */ + delete_tempfile(&pidfile); + + /* + * failure to daemonize is ok, we'll continue + * in foreground + */ + daemonized = !daemonize(); + } + name = lock_repo_for_gc(force, &pid); if (name) { if (opts.auto_flag) { @@ -1537,7 +1547,7 @@ static int task_option_parse(const struct option *opt UNUSED, static int maintenance_run(int argc, const char **argv, const char *prefix) { int i; - struct maintenance_run_opts opts; + struct maintenance_run_opts opts = MAINTENANCE_RUN_OPTS_INIT; struct gc_config cfg = GC_CONFIG_INIT; struct option builtin_maintenance_run_options[] = { OPT_BOOL(0, "auto", &opts.auto_flag, @@ -1554,8 +1564,6 @@ static int maintenance_run(int argc, const char **argv, const char *prefix) }; int ret; - memset(&opts, 0, sizeof(opts)); - opts.quiet = !isatty(2); for (i = 0; i < TASK__COUNT; i++) diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh index 1b5909d1b7..737c99e0f8 100755 --- a/t/t6500-gc.sh +++ b/t/t6500-gc.sh @@ -396,6 +396,45 @@ test_expect_success 'background auto gc respects lock for all operations' ' test_cmp expect actual ' +test_expect_success '--detach overrides gc.autoDetach=false' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + + # Prepare the repository such that git-gc(1) ends up repacking. + test_commit "$(test_oid blob17_1)" && + test_commit "$(test_oid blob17_2)" && + git config gc.autodetach false && + git config gc.auto 2 && + + cat >expect <<-EOF && + Auto packing the repository in background for optimum performance. + See "git help gc" for manual housekeeping. + EOF + GIT_PROGRESS_DELAY=0 git gc --auto --detach 2>actual && + test_cmp expect actual + ) +' + +test_expect_success '--no-detach overrides gc.autoDetach=true' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + + # Prepare the repository such that git-gc(1) ends up repacking. + test_commit "$(test_oid blob17_1)" && + test_commit "$(test_oid blob17_2)" && + git config gc.autodetach true && + git config gc.auto 2 && + + GIT_PROGRESS_DELAY=0 git gc --auto --no-detach 2>output && + test_grep "Auto packing the repository for optimum performance." output && + test_grep "Collecting referenced commits: 2, done." output + ) +' + # DO NOT leave a detached auto gc process running near the end of the # test script: it can run long enough in the background to racily # interfere with the cleanup in 'test_done'.
When running `git gc --auto`, the command will by default detach and continue running in the background. This behaviour can be tweaked via the `gc.autoDetach` config, but not via a command line switch. We need that in a subsequent commit though, where git-maintenance(1) will want to ask its git-gc(1) child process to not detach anymore. Add a `--[no-]detach` flag that does this for us. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- Documentation/git-gc.txt | 5 ++- builtin/gc.c | 70 ++++++++++++++++++++++------------------ t/t6500-gc.sh | 39 ++++++++++++++++++++++ 3 files changed, 82 insertions(+), 32 deletions(-)