
receive-pack: fix stale packfile locks when dying

Message ID e16bd81bf9e251aa6959fbe10a3fbc215a4a1c12.1678367338.git.ps@pks.im (mailing list archive)
State Superseded
Series receive-pack: fix stale packfile locks when dying

Commit Message

Patrick Steinhardt March 9, 2023, 1:09 p.m. UTC
When accepting a packfile in git-receive-pack(1), we feed that packfile
into git-index-pack(1) to generate the packfile index. As the packfile
often contains only objects that stay unreachable until the references
have been updated, a concurrently running garbage collection could
delete it right away and thus corrupt the repository. To avoid this, we
ask git-index-pack(1) to create a `.keep` file before moving the
packfile into place, and that `.keep` file is deleted again once all of
the reference updates have been processed.

Now in production systems we have observed that those `.keep` files are
sometimes not getting deleted as expected, where the result is that
repositories tend to grow packfiles that are never deleted over time.
This seems to be caused by a race when git-receive-pack(1) is killed
after we have migrated the kept packfile from the quarantine directory
into the main object database. While this race window is typically small
it can be extended for example by installing a `proc-receive` hook.

Fix this race by installing an atexit(3P) handler that unlinks the keep
file.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 builtin/receive-pack.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Jeff King March 9, 2023, 3:59 p.m. UTC | #1
On Thu, Mar 09, 2023 at 02:09:23PM +0100, Patrick Steinhardt wrote:

> Now in production systems we have observed that those `.keep` files are
> sometimes not getting deleted as expected, where the result is that
> repositories tend to grow packfiles that are never deleted over time.
> This seems to be caused by a race when git-receive-pack(1) is killed
> after we have migrated the kept packfile from the quarantine directory
> into the main object database. While this race window is typically small
> it can be extended for example by installing a `proc-receive` hook.

That makes sense, and I think this is a good direction.

> Fix this race by installing an atexit(3P) handler that unlinks the keep
> file.

This will work if we call die(), but I think you'd be better off using
the tempfile subsystem:

  - this patch doesn't handle signal death, and I don't see any reason
    you wouldn't want to handle it there (in fact, from your
    description, it sounds like signal death is the culprit you suspect)

  - this will double-unlink in most cases; once when we intend to after
    calling execute_commands(), and then it will try again (and
    presumably fail) at exit. Probably not a huge deal, but kind of
    ugly. You could set it to NULL after unlinking, but...

  - as the variable is not marked as volatile, a signal that causes an
    exit could cause the handler to see an inconsistent state if you
    modify it after setting up the handler. The tempfile code gets this
    right and is pretty battle-tested.
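A minimal standalone sketch of the first point, assuming a POSIX system: a child process registers an atexit() handler that removes a scratch file, then either calls exit(128) (how Git's die() ends by default) or is killed by SIGTERM. The helper names and the /tmp path are hypothetical; this is not code from receive-pack.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Path the child's atexit() handler should remove (hypothetical). */
static const char *keep_path;

static void unlink_keep_file(void)
{
	if (keep_path)
		unlink(keep_path);
}

/*
 * Create a scratch file, then fork a child that registers an atexit()
 * handler to remove it. The child ends either via exit(128) or by being
 * killed with SIGTERM. Returns 1 if the file is gone afterwards, 0 if it
 * leaked, -1 on error.
 */
static int keep_file_removed(int kill_child)
{
	char path[] = "/tmp/keep-demo-XXXXXX";
	int fd = mkstemp(path);
	if (fd < 0)
		return -1;
	close(fd);

	fflush(NULL); /* don't let the child flush copies of our stdio buffers */
	pid_t pid = fork();
	if (pid < 0) {
		unlink(path);
		return -1;
	}
	if (pid == 0) {
		keep_path = path;
		atexit(unlink_keep_file);
		if (kill_child) {
			signal(SIGTERM, SIG_DFL);
			raise(SIGTERM); /* default action terminates; atexit handlers are skipped */
		}
		exit(128); /* normal termination runs atexit handlers */
	}

	int status;
	if (waitpid(pid, &status, 0) < 0) {
		unlink(path);
		return -1;
	}
	assert(kill_child ? WIFSIGNALED(status) : WIFEXITED(status));
	int removed = access(path, F_OK) != 0;
	unlink(path); /* clean up the leaked file, if any */
	return removed;
}
```

Under these assumptions, keep_file_removed(0) returns 1 and keep_file_removed(1) returns 0: the handler runs on exit() but is skipped when a signal's default action terminates the process, which is exactly the case the patch is meant to cover.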

I think you'd just want something like this (totally untested):

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index cd5c7a28eff..22bbce573e9 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2184,7 +2184,7 @@ static const char *parse_pack_header(struct pack_header *hdr)
 	}
 }
 
-static const char *pack_lockfile;
+static struct tempfile *pack_lockfile;
 
 static void push_header_arg(struct strvec *args, struct pack_header *hdr)
 {
@@ -2198,6 +2198,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 	const char *hdr_err;
 	int status;
 	struct child_process child = CHILD_PROCESS_INIT;
+	char *lockfile;
 	int fsck_objects = (receive_fsck_objects >= 0
 			    ? receive_fsck_objects
 			    : transfer_fsck_objects >= 0
@@ -2280,7 +2281,9 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		status = start_command(&child);
 		if (status)
 			return "index-pack fork failed";
-		pack_lockfile = index_pack_lockfile(child.out, NULL);
+		lockfile = index_pack_lockfile(child.out, NULL);
+		pack_lockfile = register_tempfile(lockfile);
+		free(lockfile);
 		close(child.out);
 		status = finish_command(&child);
 		if (status)
@@ -2568,8 +2571,7 @@ int cmd_receive_pack(int argc, const char **argv, const char *prefix)
 		use_keepalive = KEEPALIVE_ALWAYS;
 		execute_commands(commands, unpack_status, &si,
 				 &push_options);
-		if (pack_lockfile)
-			unlink_or_warn(pack_lockfile);
+		delete_tempfile(&pack_lockfile);
 		sigchain_push(SIGPIPE, SIG_IGN);
 		if (report_status_v2)
 			report_v2(commands, unpack_status);

The unconditional call to delete_tempfile() should be OK. If we don't
have a file (because we did unpack-objects instead), then it's a noop.
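To illustrate the points above (signal safety, no double unlink, an unconditional delete being a no-op), here is a heavily simplified sketch of the register/delete pattern for an already-existing file. It is not Git's tempfile.c: register_keep_file(), delete_keep_file(), the fixed-size path buffer and the chosen signal list are all hypothetical, and it tracks only a single file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical single-file stand-in for a tempfile-style API. */
static char keep_path[4096];
/* Handlers only trust keep_path while this flag is set. */
static volatile sig_atomic_t keep_active;

/* Only reads the flag and calls unlink(2), which is async-signal-safe. */
static void remove_registered_keep(void)
{
	if (keep_active)
		unlink(keep_path);
}

static void keep_signal_handler(int signo)
{
	remove_registered_keep();
	signal(signo, SIG_DFL); /* restore the default action... */
	raise(signo);           /* ...and die of the same signal */
}

static void keep_atexit_handler(void)
{
	remove_registered_keep();
}

/* Track an already-existing file. Returns 0, or -1 if the path is too long. */
static int register_keep_file(const char *path)
{
	static int handlers_installed;
	size_t len;

	assert(path != NULL);
	len = strlen(path);
	if (len >= sizeof(keep_path))
		return -1;

	/* Never let a handler see a half-copied path; the fences keep the
	 * compiler from moving the copy across the flag updates. */
	keep_active = 0;
	atomic_signal_fence(memory_order_seq_cst);
	memcpy(keep_path, path, len + 1);
	atomic_signal_fence(memory_order_seq_cst);
	keep_active = 1;

	if (!handlers_installed) {
		static const int sigs[] = { SIGHUP, SIGINT, SIGPIPE, SIGTERM };
		struct sigaction sa;

		memset(&sa, 0, sizeof(sa));
		sa.sa_handler = keep_signal_handler;
		sigemptyset(&sa.sa_mask);
		for (size_t i = 0; i < sizeof(sigs) / sizeof(sigs[0]); i++)
			sigaction(sigs[i], &sa, NULL);
		atexit(keep_atexit_handler);
		handlers_installed = 1;
	}
	return 0;
}

/*
 * Remove the file now and stop tracking it, so the exit handler will not
 * unlink it a second time. With nothing registered this is a no-op, so
 * callers may invoke it unconditionally.
 */
static int delete_keep_file(void)
{
	int ret;

	if (!keep_active)
		return 0;
	ret = unlink(keep_path);
	/* Cleared only after unlink(): a signal in between merely causes a
	 * harmless second unlink() before the process dies. */
	keep_active = 0;
	return ret;
}
```

The volatile sig_atomic_t flag is what addresses the third bullet: handlers act on keep_path only while the flag says the copy is complete. Git's actual tempfile code handles more cases (multiple files, chaining with other signal handlers), so within Git the real API remains the right tool.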

I think one could also make an argument that index_pack_lockfile()
should return a tempfile struct itself, but I didn't look too closely at
the other caller on the fetch side (but it should be conceptually the
same).

-Peff

Taylor Blau March 9, 2023, 6:26 p.m. UTC | #2
On Thu, Mar 09, 2023 at 02:09:23PM +0100, Patrick Steinhardt wrote:
> Fix this race by installing an atexit(3P) handler that unlinks the keep
> file.

This reminded me of a discussion that I thought you and I had a few
months ago on the list about whether or not it was safe to call unlink()
in an async signal handler.

TL;DR, it is, and the link back to that discussion is here:

  https://lore.kernel.org/git/YdjBkZsnYd+zYne1@nand.local/

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  builtin/receive-pack.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index cd5c7a28ef..0a6030d775 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -2186,6 +2186,12 @@ static const char *parse_pack_header(struct pack_header *hdr)
>
>  static const char *pack_lockfile;
>
> +static void unlink_pack_lockfile(void)
> +{
> +	if (pack_lockfile)
> +		unlink(pack_lockfile);
> +}
> +

...and I think that this would all work, but I agree that using the
tempfile API here (as Peff suggests below) would probably be more
ergonomic.

Thanks,
Taylor

Junio C Hamano March 9, 2023, 6:48 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index cd5c7a28ef..0a6030d775 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -2186,6 +2186,12 @@ static const char *parse_pack_header(struct pack_header *hdr)
>  
>  static const char *pack_lockfile;
>  
> +static void unlink_pack_lockfile(void)
> +{
> +	if (pack_lockfile)
> +		unlink(pack_lockfile);
> +}
> +
>  static void push_header_arg(struct strvec *args, struct pack_header *hdr)
>  {
>  	strvec_pushf(args, "--pack_header=%"PRIu32",%"PRIu32,
> @@ -2281,6 +2287,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
>  		if (status)
>  			return "index-pack fork failed";
>  		pack_lockfile = index_pack_lockfile(child.out, NULL);
> +		atexit(unlink_pack_lockfile);

Hmph, why isn't this a straight application of tempfile API?

Junio C Hamano March 9, 2023, 6:49 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

>> @@ -2281,6 +2287,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
>>  		if (status)
>>  			return "index-pack fork failed";
>>  		pack_lockfile = index_pack_lockfile(child.out, NULL);
>> +		atexit(unlink_pack_lockfile);
>
> Hmph, why isn't this a straight application of tempfile API?

Ah, sorry, that has already been raised in the thread.  I should
have checked first.

Sorry for the noise.

Patrick Steinhardt March 10, 2023, 6:24 a.m. UTC | #5
On Thu, Mar 09, 2023 at 10:59:12AM -0500, Jeff King wrote:
> On Thu, Mar 09, 2023 at 02:09:23PM +0100, Patrick Steinhardt wrote:
> 
> > Now in production systems we have observed that those `.keep` files are
> > sometimes not getting deleted as expected, where the result is that
> > repositories tend to grow packfiles that are never deleted over time.
> > This seems to be caused by a race when git-receive-pack(1) is killed
> > after we have migrated the kept packfile from the quarantine directory
> > into the main object database. While this race window is typically small
> > it can be extended for example by installing a `proc-receive` hook.
> 
> That makes sense, and I think this is a good direction.
> 
> > Fix this race by installing an atexit(3P) handler that unlinks the keep
> > file.
> 
> This will work if we call die(), but I think you'd be better off using
> the tempfile subsystem:
> 
>   - this patch doesn't handle signal death, and I don't see any reason
>     you wouldn't want to handle it there (in fact, from your
>     description, it sounds like signal death is the culprit you suspect)
> 
>   - this will double-unlink in most cases; once when we intend to after
>     calling execute_commands(), and then it will try again (and
>     presumably fail) at exit. Probably not a huge deal, but kind of
>     ugly. You could set it to NULL after unlinking, but...
> 
>   - as the variable is not marked as volatile, a signal that causes an
>     exit could cause the handler to see an inconsistent state if you
>     modify it after setting up the handler. The tempfile code gets this
>     right and is pretty battle-tested.

Ah, I didn't know that you can easily register an already-existing file
as tempfile. That is indeed much nicer, thanks!

> I think one could also make an argument that index_pack_lockfile()
> should return a tempfile struct itself, but I didn't look too closely at
> the other caller on the fetch side (but it should be conceptually the
> same).

I had a look at it, but git-fetch-pack(1) works quite differently in
that regard as it also supports the case where the packfile lock should
stay locked after it exits via the `--keep` switch. So the logic is more
intricate here.

Furthermore, git-fetch-pack(1) only does the locking, but never unlocks
the packfiles. That is instead handled by git-fetch(1). So converting
the interface to use tempfiles directly wouldn't work as we are crossing
process boundaries here.

And last but not least, git-fetch(1) already knows to unlock packs both
via an atexit handler and via a signal handler. So there is nothing to
be done here.

Patrick

Jeff King March 10, 2023, 8:37 a.m. UTC | #6
On Fri, Mar 10, 2023 at 07:24:36AM +0100, Patrick Steinhardt wrote:

> > I think one could also make an argument that index_pack_lockfile()
> > should return a tempfile struct itself, but I didn't look too closely at
> > the other caller on the fetch side (but it should be conceptually the
> > same).
> 
> I had a look at it, but git-fetch-pack(1) works quite differently in
> that regard as it also supports the case where the packfile lock should
> stay locked after it exits via the `--keep` switch. So the logic is more
> intricate here.
> 
> Furthermore, git-fetch-pack(1) only does the locking, but never unlocks
> the packfiles. That is instead handled by git-fetch(1). So converting
> the interface to use tempfiles directly wouldn't work as we are crossing
> process boundaries here.

I think the calls into fetch-pack.c that handle the pack lockfiles can
happen in-process from git-fetch itself. But I also think there are
probably cases where they don't (v0 git-over-http should use a separate
"fetch-pack --stateless-rpc", I believe).

So yeah, it's probably too complicated to worry about lumping in here,
especially since you noted that it handles cleanup correctly already.

Thanks for looking into it.

-Peff

Patch

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index cd5c7a28ef..0a6030d775 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -2186,6 +2186,12 @@ static const char *parse_pack_header(struct pack_header *hdr)
 
 static const char *pack_lockfile;
 
+static void unlink_pack_lockfile(void)
+{
+	if (pack_lockfile)
+		unlink(pack_lockfile);
+}
+
 static void push_header_arg(struct strvec *args, struct pack_header *hdr)
 {
 	strvec_pushf(args, "--pack_header=%"PRIu32",%"PRIu32,
@@ -2281,6 +2287,7 @@ static const char *unpack(int err_fd, struct shallow_info *si)
 		if (status)
 			return "index-pack fork failed";
 		pack_lockfile = index_pack_lockfile(child.out, NULL);
+		atexit(unlink_pack_lockfile);
 		close(child.out);
 		status = finish_command(&child);
 		if (status)