Message ID | 347d0a200201ce215f5b2c46d23de0cdd0181956.1723804990.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | a6affd3343f143e5983036150bb8e95eb336cbc7 |
Headers | show |
Series | builtin/maintenance: fix auto-detach with non-standard tasks | expand |
On Fri, Aug 16, 2024 at 12:45:15PM +0200, Patrick Steinhardt wrote: > +test_expect_success '--no-detach causes maintenance to not run in background' ' > [...] > + # We have no better way to check whether or not the task ran in > + # the background than to verify whether it output anything. The > + # next testcase checks the reverse, making this somewhat safer. > + git maintenance run --no-detach >out 2>&1 && > + test_line_count = 1 out > [...] > +test_expect_success '--detach causes maintenance to run in background' ' > + test_when_finished "rm -rf repo" && > + git init repo && > + ( > + cd repo && > + > + test_commit something && > + git config set maintenance.gc.enabled false && > + git config set maintenance.loose-objects.enabled true && > + git config set maintenance.loose-objects.auto 1 && > + git config set maintenance.incremental-repack.enabled true && > + > + git maintenance run --detach >out 2>&1 && > + test_must_be_empty out > + ) > +' This second test seems to fail racily (or maybe always? see below). In CI on Windows, I saw: 'out' is not empty, it contains: fc9fea69579f349e3b02e3264cffbef03e4b1852 That would make sense to me if the detached process still held the original stdout/stderr channel open (in which case we'd racily see the same line as in the no-detach case). But we do appear to call daemonize(), which closes both. Curiously, the code in gc.c does this: /* Failure to daemonize is ok, we'll continue in foreground. */ if (opts->detach > 0) daemonize(); and the only way for daemonize to fail is if NO_POSIX_GOODIES is set. Which I'd expect on Windows. But then I'd expect this test to _always_ fail on Windows. Does it? If so, should it be marked with !MINGW? While investigating that, I ran it with --stress locally (on Linux) and got some odd (and definitely racy) results. The test itself passes, but the "rm -rf repo" in the test_when_finished sometimes fails with: rm: cannot remove 'repo/.git/objects': Directory not empty or similar (sometimes it's another directory like 'repo/.git'). My guess is that the background process is still running and creating files in the repository, racing with rm's call to rmdir(). Even if we remove the test_when_finished, it would mean that the final cleanup after test_done might similarly fail, leaving a crufty trash directory. I think to make this robust, we'd need some way of detecting when the background process has finished. I don't think we report the pid anywhere, and the daemonize() call means it won't even be in the same process group. Maybe we could spin looking for the incremental pack it will create (and timeout after N seconds)? That feels pretty hacky, but I can't think of anything better. -Peff
On Sat, Aug 17, 2024 at 03:09:25AM -0400, Jeff King wrote: > While investigating that, I ran it with --stress locally (on Linux) and > got some odd (and definitely racy) results. The test itself passes, but > the "rm -rf repo" in the test_when_finished sometimes fails with: > > rm: cannot remove 'repo/.git/objects': Directory not empty > > or similar (sometimes it's another directory like 'repo/.git'). My guess > is that the background process is still running and creating files in > the repository, racing with rm's call to rmdir(). > > Even if we remove the test_when_finished, it would mean that the final > cleanup after test_done might similarly fail, leaving a crufty trash > directory. I think to make this robust, we'd need some way of detecting > when the background process has finished. I don't think we report the > pid anywhere, and the daemonize() call means it won't even be in the > same process group. Maybe we could spin looking for the incremental pack > it will create (and timeout after N seconds)? That feels pretty hacky, > but I can't think of anything better. Ah, I just noticed that a similar problem happened in t6500, discussed during v2 of the series (I only looked at the latest version). I think t7900 needs a similar run_and_wait_for_auto_gc() solution. I suspect the Windows "out is not empty" issue is separate, though. -Peff
On Sat, Aug 17, 2024 at 03:09:24AM -0400, Jeff King wrote: > On Fri, Aug 16, 2024 at 12:45:15PM +0200, Patrick Steinhardt wrote: > > > +test_expect_success '--no-detach causes maintenance to not run in background' ' > > [...] > > + # We have no better way to check whether or not the task ran in > > + # the background than to verify whether it output anything. The > > + # next testcase checks the reverse, making this somewhat safer. > > + git maintenance run --no-detach >out 2>&1 && > > + test_line_count = 1 out > > [...] > > +test_expect_success '--detach causes maintenance to run in background' ' > > + test_when_finished "rm -rf repo" && > > + git init repo && > > + ( > > + cd repo && > > + > > + test_commit something && > > + git config set maintenance.gc.enabled false && > > + git config set maintenance.loose-objects.enabled true && > > + git config set maintenance.loose-objects.auto 1 && > > + git config set maintenance.incremental-repack.enabled true && > > + > > + git maintenance run --detach >out 2>&1 && > > + test_must_be_empty out > > + ) > > +' > > This second test seems to fail racily (or maybe always? see below). In > CI on Windows, I saw: > > 'out' is not empty, it contains: > fc9fea69579f349e3b02e3264cffbef03e4b1852 > > That would make sense to me if the detached process still held the > original stdout/stderr channel open (in which case we'd racily see the > same line as in the no-detach case). But we do appear to call > daemonize(), which closes both. > > Curiously, the code in gc.c does this: > > /* Failure to daemonize is ok, we'll continue in foreground. */ > if (opts->detach > 0) > daemonize(); > > and the only way for daemonize to fail is if NO_POSIX_GOODIES is set. > Which I'd expect on Windows. But then I'd expect this test to _always_ > fail on Windows. Does it? If so, should it be marked with !MINGW? > > While investigating that, I ran it with --stress locally (on Linux) and > got some odd (and definitely racy) results. The test itself passes, but > the "rm -rf repo" in the test_when_finished sometimes fails with: > > rm: cannot remove 'repo/.git/objects': Directory not empty > > or similar (sometimes it's another directory like 'repo/.git'). My guess > is that the background process is still running and creating files in > the repository, racing with rm's call to rmdir(). > > Even if we remove the test_when_finished, it would mean that the final > cleanup after test_done might similarly fail, leaving a crufty trash > directory. I think to make this robust, we'd need some way of detecting > when the background process has finished. I don't think we report the > pid anywhere, and the daemonize() call means it won't even be in the > same process group. Maybe we could spin looking for the incremental pack > it will create (and timeout after N seconds)? That feels pretty hacky, > but I can't think of anything better. Oh, good catch indeed, I can reproduce this on Linux with `--stress`. I think we can use the same workaround as we do in t6500, namely open a new file descriptor, inherit it and then wait for the child process to close it. Thanks! Patrick
diff --git a/builtin/gc.c b/builtin/gc.c index 269a77960f..63106e2028 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -1426,6 +1426,10 @@ static int maintenance_run_tasks(struct maintenance_run_opts *opts, } free(lock_path); + /* Failure to daemonize is ok, we'll continue in foreground. */ + if (opts->detach > 0) + daemonize(); + for (i = 0; !found_selected && i < TASK__COUNT; i++) found_selected = tasks[i].selected_order >= 0; @@ -1552,6 +1556,8 @@ static int maintenance_run(int argc, const char **argv, const char *prefix) struct option builtin_maintenance_run_options[] = { OPT_BOOL(0, "auto", &opts.auto_flag, N_("run tasks based on the state of the repository")), + OPT_BOOL(0, "detach", &opts.detach, + N_("perform maintenance in the background")), OPT_CALLBACK(0, "schedule", &opts.schedule, N_("frequency"), N_("run tasks based on frequency"), maintenance_opt_schedule), diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh index 8595489ceb..771525aa4b 100755 --- a/t/t7900-maintenance.sh +++ b/t/t7900-maintenance.sh @@ -908,4 +908,43 @@ test_expect_success 'failed schedule prevents config change' ' done ' +test_expect_success '--no-detach causes maintenance to not run in background' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + + # Prepare the repository such that git-maintenance(1) ends up + # outputting something. + test_commit something && + git config set maintenance.gc.enabled false && + git config set maintenance.loose-objects.enabled true && + git config set maintenance.loose-objects.auto 1 && + git config set maintenance.incremental-repack.enabled true && + + # We have no better way to check whether or not the task ran in + # the background than to verify whether it output anything. The + # next testcase checks the reverse, making this somewhat safer. + git maintenance run --no-detach >out 2>&1 && + test_line_count = 1 out + ) +' + +test_expect_success '--detach causes maintenance to run in background' ' + test_when_finished "rm -rf repo" && + git init repo && + ( + cd repo && + + test_commit something && + git config set maintenance.gc.enabled false && + git config set maintenance.loose-objects.enabled true && + git config set maintenance.loose-objects.auto 1 && + git config set maintenance.incremental-repack.enabled true && + + git maintenance run --detach >out 2>&1 && + test_must_be_empty out + ) +' + test_done
Same as the preceding commit, add a `--[no-]detach` flag to the git-maintenance(1) command. This will be used in a subsequent commit to fix backgrounding of that command when configured with a non-standard set of tasks. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- builtin/gc.c | 6 ++++++ t/t7900-maintenance.sh | 39 +++++++++++++++++++++++++++++++++++++++ 2 files changed, 45 insertions(+)