Message ID | YZgQD3lrw4+i4EMd@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 2a4aed42ecf6ee4ba8824423bd40010c3572aa0c |
Headers | show |
Series | fetch-pack: ignore SIGPIPE when writing to index-pack | expand |
Jeff King <peff@peff.net> writes: > When fetching, we send the incoming pack to index-pack (or > unpack-objects) via the sideband demuxer. If index-pack hits an error > (e.g., because an object fails fsck), then it will die immediately. This > may cause us to get SIGPIPE on the fetch, as we're still trying to write > pack contents from the sideband demuxer (which is typically a thread, > and thus takes down the whole fetch process). So, ... we'd die anyway and won't update the refs and anything that leaves permanent damage to the repository either way, but we choose a better way to die by not taking SIGPIPE, but to get an error from one of the write()s or the final close(), which will lead us to more "controlled" death using the normal error path? > This is mostly cosmetic. The actual error of interest (in this case, the > object that failed the fsck check) comes from index-pack straight to > stderr, so the user still sees it. They _might_ even see fetch-pack > complaining about index-pack failing, because the main thread is racing > with the sideband-demuxer. But they'll definitely see the signal death > in the exit code, which is what the test is complaining about. OK. > We can make this more predictable by just ignoring SIGPIPE. The sideband > demuxer uses write_or_die(), so it will notice and stop (gracefully, > because we hook die_routine() to exit just the thread). And during this > section we're not writing anywhere else where we'd be concerned about > SIGPIPE preventing us from wasting effort writing to nowhere. OK. > +#include "sigchain.h" > > static int transfer_unpack_limit = -1; > static int fetch_unpack_limit = -1; > @@ -956,6 +957,8 @@ static int get_pack(struct fetch_pack_args *args, > strvec_push(index_pack_args, cmd.args.v[i]); > } > > + sigchain_push(SIGPIPE, SIG_IGN); > + > cmd.in = demux.out; > cmd.git_cmd = 1; > if (start_command(&cmd)) > @@ -986,6 +989,8 @@ static int get_pack(struct fetch_pack_args *args, > if (use_sideband && finish_async(&demux)) > die(_("error in sideband demultiplexer")); > > + sigchain_pop(SIGPIPE); > + > /* > * Now that index-pack has succeeded, write the promisor file using the > * obtained .keep filename if necessary Thanks.
On Fri, Nov 19, 2021 at 02:21:58PM -0800, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > When fetching, we send the incoming pack to index-pack (or > > unpack-objects) via the sideband demuxer. If index-pack hits an error > > (e.g., because an object fails fsck), then it will die immediately. This > > may cause us to get SIGPIPE on the fetch, as we're still trying to write > > pack contents from the sideband demuxer (which is typically a thread, > > and thus takes down the whole fetch process). > > So, ... we'd die anyway and won't update the refs and anything that > leaves permanent damage to the repository either way, but we choose > a better way to die by not taking SIGPIPE, but to get an error from > one of the write()s or the final close(), which will lead us to more > "controlled" death using the normal error path? Yes, exactly. We'll be exiting either way; it's just a matter of racily changing the exit code and possibly the error message from the parent fetch process. So I do doubt this really hurts users much in practice, but making the tests robust seems worth it to me (because I found it after tracking down a flaky test failure). And because it _is_ a race in the code, I fixed it there rather than papering over the SIGPIPE exit in the test script. -Peff
diff --git a/fetch-pack.c b/fetch-pack.c index a9604f35a3..8fe3a49c1c 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -25,6 +25,7 @@ #include "shallow.h" #include "commit-reach.h" #include "commit-graph.h" +#include "sigchain.h" static int transfer_unpack_limit = -1; static int fetch_unpack_limit = -1; @@ -956,6 +957,8 @@ static int get_pack(struct fetch_pack_args *args, strvec_push(index_pack_args, cmd.args.v[i]); } + sigchain_push(SIGPIPE, SIG_IGN); + cmd.in = demux.out; cmd.git_cmd = 1; if (start_command(&cmd)) @@ -986,6 +989,8 @@ static int get_pack(struct fetch_pack_args *args, if (use_sideband && finish_async(&demux)) die(_("error in sideband demultiplexer")); + sigchain_pop(SIGPIPE); + /* * Now that index-pack has succeeded, write the promisor file using the * obtained .keep filename if necessary
When fetching, we send the incoming pack to index-pack (or unpack-objects) via the sideband demuxer. If index-pack hits an error (e.g., because an object fails fsck), then it will die immediately. This may cause us to get SIGPIPE on the fetch, as we're still trying to write pack contents from the sideband demuxer (which is typically a thread, and thus takes down the whole fetch process). You can see this in action with: ./t5702-protocol-v2.sh --stress --run=59 which ends with (wrapped for readability): test_must_fail: died by signal 13: git -c protocol.version=2 \ -c transfer.fsckobjects=1 -c fetch.uriprotocols=http,https \ clone http://127.0.0.1:5708/smart/http_parent http_child not ok 59 - packfile-uri with transfer.fsckobjects fails on bad object This is mostly cosmetic. The actual error of interest (in this case, the object that failed the fsck check) comes from index-pack straight to stderr, so the user still sees it. They _might_ even see fetch-pack complaining about index-pack failing, because the main thread is racing with the sideband-demuxer. But they'll definitely see the signal death in the exit code, which is what the test is complaining about. We can make this more predictable by just ignoring SIGPIPE. The sideband demuxer uses write_or_die(), so it will notice and stop (gracefully, because we hook die_routine() to exit just the thread). And during this section we're not writing anywhere else where we'd be concerned about SIGPIPE preventing us from wasting effort writing to nowhere. Signed-off-by: Jeff King <peff@peff.net> --- I wondered if the receive-pack side would have a similar problem, but there I think it's accepting the input directly from the network. So the client-side push may see a premature hangup. But there the SIGPIPE goes to pack-objects (which is writing straight to the network), and the parent send-pack/push process detects this; see the comment near the "141" check at the end of send-pack.c:pack_objects(). I cc'd Jonathan because it's his test, but really I think that is just luck. AFAICT this would be a problem for any fetch where transfer.fsckObjects detects a problem. fetch-pack.c | 5 +++++ 1 file changed, 5 insertions(+)