diff mbox series

[v3,6/7] builtin/maintenance: add a `--detach` flag

Message ID 347d0a200201ce215f5b2c46d23de0cdd0181956.1723804990.git.ps@pks.im (mailing list archive)
State Accepted
Commit a6affd3343f143e5983036150bb8e95eb336cbc7
Headers show
Series builtin/maintenance: fix auto-detach with non-standard tasks | expand

Commit Message

Patrick Steinhardt Aug. 16, 2024, 10:45 a.m. UTC
Same as the preceding commit, add a `--[no-]detach` flag to the
git-maintenance(1) command. This will be used in a subsequent commit to
fix backgrounding of that command when configured with a non-standard
set of tasks.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/gc.c           |  6 ++++++
 t/t7900-maintenance.sh | 39 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+)

Comments

Jeff King Aug. 17, 2024, 7:09 a.m. UTC | #1
On Fri, Aug 16, 2024 at 12:45:15PM +0200, Patrick Steinhardt wrote:

> +test_expect_success '--no-detach causes maintenance to not run in background' '
> [...]
> +		# We have no better way to check whether or not the task ran in
> +		# the background than to verify whether it output anything. The
> +		# next testcase checks the reverse, making this somewhat safer.
> +		git maintenance run --no-detach >out 2>&1 &&
> +		test_line_count = 1 out
> [...]
> +test_expect_success '--detach causes maintenance to run in background' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +
> +		test_commit something &&
> +		git config set maintenance.gc.enabled false &&
> +		git config set maintenance.loose-objects.enabled true &&
> +		git config set maintenance.loose-objects.auto 1 &&
> +		git config set maintenance.incremental-repack.enabled true &&
> +
> +		git maintenance run --detach >out 2>&1 &&
> +		test_must_be_empty out
> +	)
> +'

This second test seems to fail racily (or maybe always? see below). In
CI on Windows, I saw:

  'out' is not empty, it contains:
  fc9fea69579f349e3b02e3264cffbef03e4b1852

That would make sense to me if the detached process still held the
original stdout/stderr channel open (in which case we'd racily see the
same line as in the no-detach case). But we do appear to call
daemonize(), which closes both.

Curiously, the code in gc.c does this:

          /* Failure to daemonize is ok, we'll continue in foreground. */
          if (opts->detach > 0)
                  daemonize();

and the only way for daemonize to fail is if NO_POSIX_GOODIES is set.
Which I'd expect on Windows. But then I'd expect this test to _always_
fail on Windows. Does it? If so, should it be marked with !MINGW?

While investigating that, I ran it with --stress locally (on Linux) and
got some odd (and definitely racy) results. The test itself passes, but
the "rm -rf repo" in the test_when_finished sometimes fails with:

  rm: cannot remove 'repo/.git/objects': Directory not empty

or similar (sometimes it's another directory like 'repo/.git'). My guess
is that the background process is still running and creating files in
the repository, racing with rm's call to rmdir().

Even if we remove the test_when_finished, it would mean that the final
cleanup after test_done might similarly fail, leaving a crufty trash
directory. I think to make this robust, we'd need some way of detecting
when the background process has finished. I don't think we report the
pid anywhere, and the daemonize() call means it won't even be in the
same process group. Maybe we could spin looking for the incremental pack
it will create (and timeout after N seconds)? That feels pretty hacky,
but I can't think of anything better.

-Peff
Jeff King Aug. 17, 2024, 7:14 a.m. UTC | #2
On Sat, Aug 17, 2024 at 03:09:25AM -0400, Jeff King wrote:

> While investigating that, I ran it with --stress locally (on Linux) and
> got some odd (and definitely racy) results. The test itself passes, but
> the "rm -rf repo" in the test_when_finished sometimes fails with:
> 
>   rm: cannot remove 'repo/.git/objects': Directory not empty
> 
> or similar (sometimes it's another directory like 'repo/.git'). My guess
> is that the background process is still running and creating files in
> the repository, racing with rm's call to rmdir().
> 
> Even if we remove the test_when_finished, it would mean that the final
> cleanup after test_done might similarly fail, leaving a crufty trash
> directory. I think to make this robust, we'd need some way of detecting
> when the background process has finished. I don't think we report the
> pid anywhere, and the daemonize() call means it won't even be in the
> same process group. Maybe we could spin looking for the incremental pack
> it will create (and timeout after N seconds)? That feels pretty hacky,
> but I can't think of anything better.

Ah, I just noticed that a similar problem happened in t6500, discussed
during v2 of the series (I only looked at the latest version). I think
t7900 needs a similar run_and_wait_for_auto_gc() solution.

I suspect the Windows "out is not empty" issue is separate, though.

-Peff
Patrick Steinhardt Aug. 19, 2024, 6:17 a.m. UTC | #3
On Sat, Aug 17, 2024 at 03:09:24AM -0400, Jeff King wrote:
> On Fri, Aug 16, 2024 at 12:45:15PM +0200, Patrick Steinhardt wrote:
> 
> > +test_expect_success '--no-detach causes maintenance to not run in background' '
> > [...]
> > +		# We have no better way to check whether or not the task ran in
> > +		# the background than to verify whether it output anything. The
> > +		# next testcase checks the reverse, making this somewhat safer.
> > +		git maintenance run --no-detach >out 2>&1 &&
> > +		test_line_count = 1 out
> > [...]
> > +test_expect_success '--detach causes maintenance to run in background' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +
> > +		test_commit something &&
> > +		git config set maintenance.gc.enabled false &&
> > +		git config set maintenance.loose-objects.enabled true &&
> > +		git config set maintenance.loose-objects.auto 1 &&
> > +		git config set maintenance.incremental-repack.enabled true &&
> > +
> > +		git maintenance run --detach >out 2>&1 &&
> > +		test_must_be_empty out
> > +	)
> > +'
> 
> This second test seems to fail racily (or maybe always? see below). In
> CI on Windows, I saw:
> 
>   'out' is not empty, it contains:
>   fc9fea69579f349e3b02e3264cffbef03e4b1852
> 
> That would make sense to me if the detached process still held the
> original stdout/stderr channel open (in which case we'd racily see the
> same line as in the no-detach case). But we do appear to call
> daemonize(), which closes both.
> 
> Curiously, the code in gc.c does this:
> 
>           /* Failure to daemonize is ok, we'll continue in foreground. */
>           if (opts->detach > 0)
>                   daemonize();
> 
> and the only way for daemonize to fail is if NO_POSIX_GOODIES is set.
> Which I'd expect on Windows. But then I'd expect this test to _always_
> fail on Windows. Does it? If so, should it be marked with !MINGW?
> 
> While investigating that, I ran it with --stress locally (on Linux) and
> got some odd (and definitely racy) results. The test itself passes, but
> the "rm -rf repo" in the test_when_finished sometimes fails with:
> 
>   rm: cannot remove 'repo/.git/objects': Directory not empty
> 
> or similar (sometimes it's another directory like 'repo/.git'). My guess
> is that the background process is still running and creating files in
> the repository, racing with rm's call to rmdir().
> 
> Even if we remove the test_when_finished, it would mean that the final
> cleanup after test_done might similarly fail, leaving a crufty trash
> directory. I think to make this robust, we'd need some way of detecting
> when the background process has finished. I don't think we report the
> pid anywhere, and the daemonize() call means it won't even be in the
> same process group. Maybe we could spin looking for the incremental pack
> it will create (and timeout after N seconds)? That feels pretty hacky,
> but I can't think of anything better.

Oh, good catch indeed, I can reproduce this on Linux with `--stress`.

I think we can use the same workaround as we do in t6500, namely open a
new file descriptor, inherit it and then wait for the child process to
close it.

Thanks!

Patrick
diff mbox series

Patch

diff --git a/builtin/gc.c b/builtin/gc.c
index 269a77960f..63106e2028 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -1426,6 +1426,10 @@  static int maintenance_run_tasks(struct maintenance_run_opts *opts,
 	}
 	free(lock_path);
 
+	/* Failure to daemonize is ok, we'll continue in foreground. */
+	if (opts->detach > 0)
+		daemonize();
+
 	for (i = 0; !found_selected && i < TASK__COUNT; i++)
 		found_selected = tasks[i].selected_order >= 0;
 
@@ -1552,6 +1556,8 @@  static int maintenance_run(int argc, const char **argv, const char *prefix)
 	struct option builtin_maintenance_run_options[] = {
 		OPT_BOOL(0, "auto", &opts.auto_flag,
 			 N_("run tasks based on the state of the repository")),
+		OPT_BOOL(0, "detach", &opts.detach,
+			 N_("perform maintenance in the background")),
 		OPT_CALLBACK(0, "schedule", &opts.schedule, N_("frequency"),
 			     N_("run tasks based on frequency"),
 			     maintenance_opt_schedule),
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 8595489ceb..771525aa4b 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -908,4 +908,43 @@  test_expect_success 'failed schedule prevents config change' '
 	done
 '
 
+test_expect_success '--no-detach causes maintenance to not run in background' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		# Prepare the repository such that git-maintenance(1) ends up
+		# outputting something.
+		test_commit something &&
+		git config set maintenance.gc.enabled false &&
+		git config set maintenance.loose-objects.enabled true &&
+		git config set maintenance.loose-objects.auto 1 &&
+		git config set maintenance.incremental-repack.enabled true &&
+
+		# We have no better way to check whether or not the task ran in
+		# the background than to verify whether it output anything. The
+		# next testcase checks the reverse, making this somewhat safer.
+		git maintenance run --no-detach >out 2>&1 &&
+		test_line_count = 1 out
+	)
+'
+
+test_expect_success '--detach causes maintenance to run in background' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+
+		test_commit something &&
+		git config set maintenance.gc.enabled false &&
+		git config set maintenance.loose-objects.enabled true &&
+		git config set maintenance.loose-objects.auto 1 &&
+		git config set maintenance.incremental-repack.enabled true &&
+
+		git maintenance run --detach >out 2>&1 &&
+		test_must_be_empty out
+	)
+'
+
 test_done