diff mbox series

[v2] receive-pack: fix stale packfile locks when dying

Message ID e1ee1d8026a361bc58d16bc741e2b347ada7a53e.1678431076.git.ps@pks.im (mailing list archive)
State Accepted
Commit c55c30669ced6e08b41b3c921f0da200247c9811
Headers show
Series [v2] receive-pack: fix stale packfile locks when dying | expand

Commit Message

Patrick Steinhardt March 10, 2023, 6:52 a.m. UTC
When accepting a packfile in git-receive-pack(1), we feed that packfile
into git-index-pack(1) to generate the packfile index. As the packfile
would often only contain unreachable objects until the references have
been updated, concurrently running garbage collection might be tempted
to delete the packfile right away and thus cause corruption. To fix
this, we ask git-index-pack(1) to create a `.keep` file before moving
the packfile into place, which is getting deleted again once all of the
reference updates have been processed.

Now in production systems we have observed that those `.keep` files are
sometimes not getting deleted as expected, where the result is that
repositories tend to grow packfiles that are never deleted over time.
This seems to be caused by a race when git-receive-pack(1) is killed
after we have migrated the kept packfile from the quarantine directory
into the main object database. While this race window is typically small
it can be extended for example by installing a `proc-receive` hook.

Fix this race by registering the lockfile as a tempfile so that it will
automatically be removed at exit or when receiving a signal.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/receive-pack.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

Comments

Jeff King March 10, 2023, 8:51 a.m. UTC | #1
On Fri, Mar 10, 2023 at 07:52:10AM +0100, Patrick Steinhardt wrote:

> Fix this race by registering the lockfile as a tempfile so that it will
> automatically be removed at exit or when receiving a signal.

Unsurprisingly, this version looks good to me. :)

Looks like you adjusted for index_pack_lockfile() returning NULL, which
makes sense.

>  builtin/receive-pack.c | 14 ++++++++++----
>  1 file changed, 10 insertions(+), 4 deletions(-)

There's no test here, and I think we _could_ make a reliable one with
something like:

  1. Have a proc-receive hook that signals via fifo that it's running,
     then pauses indefinitely.

  2. Start a push in the background, then wait on the fifo signal.

  3. Kill receive-pack.

But it's awfully complicated for little gain. I'm fine with not worrying
about it (and I did something similar manually to make to sure it works
as we expect).

-Peff
Patrick Steinhardt March 10, 2023, 11:49 a.m. UTC | #2
On Fri, Mar 10, 2023 at 03:51:29AM -0500, Jeff King wrote:
> On Fri, Mar 10, 2023 at 07:52:10AM +0100, Patrick Steinhardt wrote:
[snip]
> >  builtin/receive-pack.c | 14 ++++++++++----
> >  1 file changed, 10 insertions(+), 4 deletions(-)
> 
> There's no test here, and I think we _could_ make a reliable one with
> something like:
> 
>   1. Have a proc-receive hook that signals via fifo that it's running,
>      then pauses indefinitely.
> 
>   2. Start a push in the background, then wait on the fifo signal.
> 
>   3. Kill receive-pack.
> 
> But it's awfully complicated for little gain. I'm fine with not worrying
> about it (and I did something similar manually to make to sure it works
> as we expect).

I actually tried doing exactly what you're proposing here yesterday. It
caused hangs, lingering processes, you had to play games to not trigger
the &&-chain linter due to the backgrounding via &, and ultimately it
somehow just didn't work.

So at some point I came to the same conclusion as you did, that it's
just too complicated for little gain.

Thanks.

Patrick
Jeff King March 10, 2023, 1:20 p.m. UTC | #3
On Fri, Mar 10, 2023 at 12:49:04PM +0100, Patrick Steinhardt wrote:

> > There's no test here, and I think we _could_ make a reliable one with
> > something like:
> > 
> >   1. Have a proc-receive hook that signals via fifo that it's running,
> >      then pauses indefinitely.
> > 
> >   2. Start a push in the background, then wait on the fifo signal.
> > 
> >   3. Kill receive-pack.
> > 
> > But it's awfully complicated for little gain. I'm fine with not worrying
> > about it (and I did something similar manually to make to sure it works
> > as we expect).
> 
> I actually tried doing exactly what you're proposing here yesterday. It
> caused hangs, lingering processes, you had to play games to not trigger
> the &&-chain linter due to the backgrounding via &, and ultimately it
> somehow just didn't work.
> 
> So at some point I came to the same conclusion as you did, that it's
> just too complicated for little gain.

So I don't think we should apply this, because it's horrific. But
because I like a challenge, here is a test that should be race-free,
doesn't hang, and doesn't leave lingering processes (the blocking hook
waits for its parent to die).

diff --git a/t/t5400-send-pack.sh b/t/t5400-send-pack.sh
index 3f81f16e133..ca6950ef433 100755
--- a/t/t5400-send-pack.sh
+++ b/t/t5400-send-pack.sh
@@ -300,4 +300,37 @@ test_expect_success 'receive-pack de-dupes .have lines' '
 	test_cmp expect refs
 '
 
+test_expect_success PIPE 'receive-pack cleans up .keep after signal' '
+	git init --bare keep.git &&
+	write_script keep.git/hooks/proc-receive <<-\EOF &&
+	# tell the other side we are in the hook
+	echo running proc-receive >&9
+	# and then wait until our parent receive-pack exits
+	wait "$PPID"
+	EOF
+	git -C keep.git config receive.procReceiveRefs refs/ &&
+
+	mkfifo rpstatus &&
+	(
+		# this will fail because we kill receive-pack while it hangs,
+		# but we will not see its exit code until we "wait" below.
+		git push \
+			--receive-pack="echo \$\$ >&9; exec git-receive-pack" \
+			--all keep.git 9>rpstatus &
+	) &&
+
+	exec 9<rpstatus &&
+	read <&9 rp_pid &&
+	read <&9 ready_signal &&
+	kill -15 "$rp_pid" &&
+	# in theory we get the signal code from receive-pack here,
+	# but we are racing with send-pack to call wait(), so we
+	# might also get "not found" from wait
+	{ wait "$rp_pid"; OUT=$?; } &&
+	{ test_match_signal 15 $OUT || test "$OUT" = 127; } &&
+
+	find keep.git/objects/pack -name "*.keep" >actual &&
+	test_must_be_empty actual
+'
+
 test_done

It fails as expected without your patch (the .keep file is left in
"actual"), and passes with it.

-Peff
diff mbox series

Patch

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index cd5c7a28ef..b5369d7429 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2184,7 +2184,7 @@  static const char *parse_pack_header(struct pack_header *hdr)
 	}
 }
 
-static const char *pack_lockfile;
+static struct tempfile *pack_lockfile;
 
 static void push_header_arg(struct strvec *args, struct pack_header *hdr)
 {
@@ -2251,6 +2251,7 @@  static const char *unpack(int err_fd, struct shallow_info *si)
 			return "unpack-objects abnormal exit";
 	} else {
 		char hostname[HOST_NAME_MAX + 1];
+		char *lockfile;
 
 		strvec_pushl(&child.args, "index-pack", "--stdin", NULL);
 		push_header_arg(&child.args, &hdr);
@@ -2280,8 +2281,14 @@  static const char *unpack(int err_fd, struct shallow_info *si)
 		status = start_command(&child);
 		if (status)
 			return "index-pack fork failed";
-		pack_lockfile = index_pack_lockfile(child.out, NULL);
+
+		lockfile = index_pack_lockfile(child.out, NULL);
+		if (lockfile) {
+			pack_lockfile = register_tempfile(lockfile);
+			free(lockfile);
+		}
 		close(child.out);
+
 		status = finish_command(&child);
 		if (status)
 			return "index-pack abnormal exit";
@@ -2568,8 +2575,7 @@  int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		use_keepalive = KEEPALIVE_ALWAYS;
 		execute_commands(commands, unpack_status, &si,
 				 &push_options);
-		if (pack_lockfile)
-			unlink_or_warn(pack_lockfile);
+		delete_tempfile(&pack_lockfile);
 		sigchain_push(SIGPIPE, SIG_IGN);
 		if (report_status_v2)
 			report_v2(commands, unpack_status);