diff mbox series

fetch-pack: ignore SIGPIPE when writing to index-pack

Message ID YZgQD3lrw4+i4EMd@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 2a4aed42ecf6ee4ba8824423bd40010c3572aa0c
Headers show
Series fetch-pack: ignore SIGPIPE when writing to index-pack | expand

Commit Message

Jeff King Nov. 19, 2021, 8:58 p.m. UTC
When fetching, we send the incoming pack to index-pack (or
unpack-objects) via the sideband demuxer. If index-pack hits an error
(e.g., because an object fails fsck), then it will die immediately. This
may cause us to get SIGPIPE on the fetch, as we're still trying to write
pack contents from the sideband demuxer (which is typically a thread,
and thus takes down the whole fetch process).

You can see this in action with:

  ./t5702-protocol-v2.sh --stress --run=59

which ends with (wrapped for readability):

  test_must_fail: died by signal 13: git -c protocol.version=2 \
    -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \
    clone http://127.0.0.1:5708/smart/http_parent http_child
  not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object

This is mostly cosmetic. The actual error of interest (in this case, the
object that failed the fsck check) comes from index-pack straight to
stderr, so the user still sees it. They _might_ even see fetch-pack
complaining about index-pack failing, because the main thread is racing
with the sideband-demuxer. But they'll definitely see the signal death
in the exit code, which is what the test is complaining about.

We can make this more predictable by just ignoring SIGPIPE. The sideband
demuxer uses write_or_die(), so it will notice and stop (gracefully,
because we hook die_routine() to exit just the thread). And during this
section we're not writing anywhere else where we'd be concerned about
SIGPIPE preventing us from wasting effort writing to nowhere.

Signed-off-by: Jeff King <peff@peff.net>
---
I wondered if the receive-pack side would have a similar problem, but
there I think it's accepting the input directly from the network. So the
client-side push may see a premature hangup. But there the SIGPIPE goes
to pack-objects (which is writing straight to the network), and the
parent send-pack/push process detects this; see the comment near the
"141" check at the end of send-pack.c:pack_objects().

I cc'd Jonathan because it's his test, but really I think that is just
luck. AFAICT this would be a problem for any fetch where
transfer.fsckObjects detects a problem.

 fetch-pack.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Junio C Hamano Nov. 19, 2021, 10:21 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> When fetching, we send the incoming pack to index-pack (or
> unpack-objects) via the sideband demuxer. If index-pack hits an error
> (e.g., because an object fails fsck), then it will die immediately. This
> may cause us to get SIGPIPE on the fetch, as we're still trying to write
> pack contents from the sideband demuxer (which is typically a thread,
> and thus takes down the whole fetch process).

So, ... we'd die anyway and won't update the refs and anything that
leaves permanent damage to the repository either way, but we choose
a better way to die by not taking SIGPIPE, but to get an error from
one of the write()s or the final close(), which will lead us to more
"controlled" death using the normal error path?

> This is mostly cosmetic. The actual error of interest (in this case, the
> object that failed the fsck check) comes from index-pack straight to
> stderr, so the user still sees it. They _might_ even see fetch-pack
> complaining about index-pack failing, because the main thread is racing
> with the sideband-demuxer. But they'll definitely see the signal death
> in the exit code, which is what the test is complaining about.

OK.

> We can make this more predictable by just ignoring SIGPIPE. The sideband
> demuxer uses write_or_die(), so it will notice and stop (gracefully,
> because we hook die_routine() to exit just the thread). And during this
> section we're not writing anywhere else where we'd be concerned about
> SIGPIPE preventing us from wasting effort writing to nowhere.

OK.

> +#include "sigchain.h"
>  
>  static int transfer_unpack_limit = -1;
>  static int fetch_unpack_limit = -1;
> @@ -956,6 +957,8 @@ static int get_pack(struct fetch_pack_args *args,
>  			strvec_push(index_pack_args, cmd.args.v[i]);
>  	}
>  
> +	sigchain_push(SIGPIPE, SIG_IGN);
> +
>  	cmd.in = demux.out;
>  	cmd.git_cmd = 1;
>  	if (start_command(&cmd))
> @@ -986,6 +989,8 @@ static int get_pack(struct fetch_pack_args *args,
>  	if (use_sideband && finish_async(&demux))
>  		die(_("error in sideband demultiplexer"));
>  
> +	sigchain_pop(SIGPIPE);
> +
>  	/*
>  	 * Now that index-pack has succeeded, write the promisor file using the
>  	 * obtained .keep filename if necessary

Thanks.
Jeff King Nov. 19, 2021, 10:32 p.m. UTC | #2
On Fri, Nov 19, 2021 at 02:21:58PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > When fetching, we send the incoming pack to index-pack (or
> > unpack-objects) via the sideband demuxer. If index-pack hits an error
> > (e.g., because an object fails fsck), then it will die immediately. This
> > may cause us to get SIGPIPE on the fetch, as we're still trying to write
> > pack contents from the sideband demuxer (which is typically a thread,
> > and thus takes down the whole fetch process).
> 
> So, ... we'd die anyway and won't update the refs and anything that
> leaves permanent damage to the repository either way, but we choose
> a better way to die by not taking SIGPIPE, but to get an error from
> one of the write()s or the final close(), which will lead us to more
> "controlled" death using the normal error path?

Yes, exactly. We'll be exiting either way; it's just a matter of racily
changing the exit code and possibly the error message from the parent
fetch process.

So I do doubt this really hurts users much in practice, but making the
tests robust seems worth it to me (because I found it after tracking
down a flaky test failure). And because it _is_ a race in the code, I
fixed it there rather than papering over the SIGPIPE exit in the test
script.

-Peff
diff mbox series

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index a9604f35a3..8fe3a49c1c 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -25,6 +25,7 @@ 
 #include "shallow.h"
 #include "commit-reach.h"
 #include "commit-graph.h"
+#include "sigchain.h"
 
 static int transfer_unpack_limit = -1;
 static int fetch_unpack_limit = -1;
@@ -956,6 +957,8 @@  static int get_pack(struct fetch_pack_args *args,
 			strvec_push(index_pack_args, cmd.args.v[i]);
 	}
 
+	sigchain_push(SIGPIPE, SIG_IGN);
+
 	cmd.in = demux.out;
 	cmd.git_cmd = 1;
 	if (start_command(&cmd))
@@ -986,6 +989,8 @@  static int get_pack(struct fetch_pack_args *args,
 	if (use_sideband && finish_async(&demux))
 		die(_("error in sideband demultiplexer"));
 
+	sigchain_pop(SIGPIPE);
+
 	/*
 	 * Now that index-pack has succeeded, write the promisor file using the
 	 * obtained .keep filename if necessary