Message ID | 20190303165843.GB23755@sigill.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [1/2] fetch: avoid calling write_or_die() | expand |
Jeff King <peff@peff.net> writes: > ... But by dying immediately, we never actually > read the ERR packet and report its content to the user. This is a (racy) > problem on all platforms. Yeah, I do not think of a good solution for it (nor I am not convinced that it is worth fixing, to be honest. The cable may get cut before we have a chance to see the ERR packet, or the other side may have died before producing one. The fix obviously looks good. Thanks.
On Mon, Mar 04, 2019 at 10:11:46AM +0900, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > ... But by dying immediately, we never actually > > read the ERR packet and report its content to the user. This is a (racy) > > problem on all platforms. > > Yeah, I do not think of a good solution for it (nor I am not > convinced that it is worth fixing, to be honest. The cable may get > cut before we have a chance to see the ERR packet, or the other side > may have died before producing one. We definitely can never cover this 100%. But I wonder if we could put a little more effort into "best effort". Specifically, I was thinking that on seeing the write error, we might do something like: void NORETURN last_ditch_proto_read(const char *msg) { char *line; /* * we had a write error; see if the server sent us anything * useful to report */ if (packet_read_line_gently(fd, NULL, &line) && skip_prefix(line, "ERR ", &line)) { die("remote error: %s", line); } /* otherwise, just report the write error */ die("%s", msg); } The tricky thing is that the writer does not always know the correct fd to read more packets from (since the write errors may be deep in generic code). I suspect we could rig up some kind of hacky global "if this descriptor variable is non-negative, then do a last ditch read from it". I do wonder if the race is better or worse when doing local fetches in the test suite. On a real network with actual transit times, I suspect we'd do better, because our writes would still be fast (we dump bytes to the OS buffer) and we'd spend a higher percentage of our time waiting to read back from the other side (which is what we want, because then we see the ERR they wrote to us). That's just a guess, though. > The fix obviously looks good. Thanks. Yeah, I don't think any of the above discussion needs to block the fix here. Here's a re-roll of the first patch, though, marked for translation as requested by Duy. -- >8 -- Subject: [PATCH] fetch: avoid calling write_or_die() The write_or_die() function has one quirk that a caller might not expect: when it sees EPIPE from the write() call, it translates that into a death by SIGPIPE. This doesn't change the overall behavior (the program exits either way), but it does potentially confuse test scripts looking for a non-signal exit code. Let's switch away from using write_or_die() in a few code paths, which will give us more consistent exit codes. It also gives us the opportunity to write more descriptive error messages, since we have context that write_or_die() does not. Note that this won't do much by itself, since we'd typically be killed by SIGPIPE before write_or_die() even gets a chance to do its thing. That will be addressed in the next patch. Signed-off-by: Jeff King <peff@peff.net> --- fetch-pack.c | 9 ++++++--- pkt-line.c | 6 ++++-- 2 files changed, 10 insertions(+), 5 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index 812be15d7e..e69993b2eb 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -191,8 +191,10 @@ static void send_request(struct fetch_pack_args *args, if (args->stateless_rpc) { send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX); packet_flush(fd); - } else - write_or_die(fd, buf->buf, buf->len); + } else { + if (write_in_full(fd, buf->buf, buf->len) < 0) + die_errno(_("unable to write to remote")); + } } static void insert_one_alternate_object(struct fetch_negotiator *negotiator, @@ -1163,7 +1165,8 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out, /* Send request */ packet_buf_flush(&req_buf); - write_or_die(fd_out, req_buf.buf, req_buf.len); + if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0) + die_errno(_("unable to write request to remote")); strbuf_release(&req_buf); return ret; diff --git a/pkt-line.c b/pkt-line.c index d4b71d3e82..6bd496a9bb 100644 --- a/pkt-line.c +++ b/pkt-line.c @@ -88,13 +88,15 @@ static void packet_trace(const char *buf, unsigned int len, int write) void packet_flush(int fd) { packet_trace("0000", 4, 1); - write_or_die(fd, "0000", 4); + if (write_in_full(fd, "0000", 4) < 0) + die_errno(_("unable to write flush packet")); } void packet_delim(int fd) { packet_trace("0001", 4, 1); - write_or_die(fd, "0001", 4); + if (write_in_full(fd, "0001", 4) < 0) + die_errno(_("unable to write delim packet")); } int packet_flush_gently(int fd)
diff --git a/builtin/fetch.c b/builtin/fetch.c index b620fd54b4..4ba63d5ac6 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1556,7 +1556,9 @@ static int fetch_one(struct remote *remote, int argc, const char **argv, int pru sigchain_push_common(unlock_pack_on_signal); atexit(unlock_pack); + sigchain_push(SIGPIPE, SIG_IGN); exit_code = do_fetch(gtransport, &rs); + sigchain_pop(SIGPIPE); refspec_clear(&rs); transport_disconnect(gtransport); gtransport = NULL;
The default SIGPIPE behavior can be useful for a command that generates a lot of output: if the receiver of our output goes away, we'll be notified asynchronously to stop generating it (typically by killing the program). But for a command like fetch, which is primarily concerned with receiving data and writing it to disk, an unexpected SIGPIPE can be awkward. We're already checking the return value of all of our write() calls, and dying due to the signal takes away our chance to gracefully handle the error. On Linux, we wouldn't generally see SIGPIPE at all during fetch. If the other side of the network connection hangs up, we'll see ECONNRESET. But on OS X, we get a SIGPIPE, and the process is killed. This causes t5570 to racily fail, as we sometimes die by signal (instead of the expected die() call) when the server side hangs up. Let's ignore SIGPIPE during the network portion of the fetch, which will cause our write() to return EPIPE, giving us consistent behavior across platforms. This fixes the test flakiness, but note that it stops short of fixing the larger problem. The server side hit a fatal error, sent us an "ERR" packet, and then hung up. We notice the failure because we're trying to write to a closed socket. But by dying immediately, we never actually read the ERR packet and report its content to the user. This is a (racy) problem on all platforms. So this patch lays the groundwork from which that problem might be fixed consistently, but it doesn't actually fix it. Note the placement of the SIGPIPE handling. The absolute minimal change would be to ignore SIGPIPE only when we're writing. But twiddling the signal handler for each write call is inefficient and maintenance burden. On the opposite end of the spectrum, we could simply declare that fetch does not need SIGPIPE handling, since it doesn't generate a lot of output, and we could just ignore it at the start of cmd_fetch(). This patch takes a middle ground. It ignores SIGPIPE during the network operation (which is admittedly most of the program, since the actual network operations are all done under the hood by the transport code). So it's still pretty coarse. Signed-off-by: Jeff King <peff@peff.net> --- I had a hard time trying to write the last two paragraphs. I think the placement here really is not significantly different than just ignoring SIGPIPE for the entirety of cmd_fetch(). I kind of like your suggestion elsewhere in the thread to just teach the git wrapper an IGNORE_SIGPIPE flag. builtin/fetch.c | 2 ++ 1 file changed, 2 insertions(+)