diff mbox series

[2/2] fetch: ignore SIGPIPE during network operation

Message ID 20190303165843.GB23755@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series [1/2] fetch: avoid calling write_or_die() | expand

Commit Message

Jeff King March 3, 2019, 4:58 p.m. UTC
The default SIGPIPE behavior can be useful for a command that generates
a lot of output: if the receiver of our output goes away, we'll be
notified asynchronously to stop generating it (typically by killing the
program).

But for a command like fetch, which is primarily concerned with
receiving data and writing it to disk, an unexpected SIGPIPE can be
awkward. We're already checking the return value of all of our write()
calls, and dying due to the signal takes away our chance to gracefully
handle the error.

On Linux, we wouldn't generally see SIGPIPE at all during fetch. If the
other side of the network connection hangs up, we'll see ECONNRESET. But
on OS X, we get a SIGPIPE, and the process is killed. This causes t5570
to racily fail, as we sometimes die by signal (instead of the expected
die() call) when the server side hangs up.

Let's ignore SIGPIPE during the network portion of the fetch, which will
cause our write() to return EPIPE, giving us consistent behavior across
platforms.

This fixes the test flakiness, but note that it stops short of fixing
the larger problem. The server side hit a fatal error, sent us an "ERR"
packet, and then hung up. We notice the failure because we're trying to
write to a closed socket. But by dying immediately, we never actually
read the ERR packet and report its content to the user. This is a (racy)
problem on all platforms. So this patch lays the groundwork from which
that problem might be fixed consistently, but it doesn't actually fix
it.

Note the placement of the SIGPIPE handling. The absolute minimal change
would be to ignore SIGPIPE only when we're writing. But twiddling the
signal handler for each write call is inefficient and maintenance
burden. On the opposite end of the spectrum, we could simply declare
that fetch does not need SIGPIPE handling, since it doesn't generate a
lot of output, and we could just ignore it at the start of cmd_fetch().

This patch takes a middle ground. It ignores SIGPIPE during the network
operation (which is admittedly most of the program, since the actual
network operations are all done under the hood by the transport code).
So it's still pretty coarse.

Signed-off-by: Jeff King <peff@peff.net>
---
I had a hard time trying to write the last two paragraphs. I think the
placement here really is not significantly different than just ignoring
SIGPIPE for the entirety of cmd_fetch(). I kind of like your suggestion
elsewhere in the thread to just teach the git wrapper an IGNORE_SIGPIPE
flag.

 builtin/fetch.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Junio C Hamano March 4, 2019, 1:11 a.m. UTC | #1
Jeff King <peff@peff.net> writes:

> ... But by dying immediately, we never actually
> read the ERR packet and report its content to the user. This is a (racy)
> problem on all platforms.

Yeah, I do not think of a good solution for it (nor I am not
convinced that it is worth fixing, to be honest.  The cable may get
cut before we have a chance to see the ERR packet, or the other side
may have died before producing one.

The fix obviously looks good.  Thanks.
Jeff King March 5, 2019, 4:11 a.m. UTC | #2
On Mon, Mar 04, 2019 at 10:11:46AM +0900, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > ... But by dying immediately, we never actually
> > read the ERR packet and report its content to the user. This is a (racy)
> > problem on all platforms.
> 
> Yeah, I do not think of a good solution for it (nor I am not
> convinced that it is worth fixing, to be honest.  The cable may get
> cut before we have a chance to see the ERR packet, or the other side
> may have died before producing one.

We definitely can never cover this 100%. But I wonder if we could put a
little more effort into "best effort". Specifically, I was thinking that
on seeing the write error, we might do something like:

  void NORETURN last_ditch_proto_read(const char *msg)
  {
	char *line;

	/*
	 * we had a write error; see if the server sent us anything
	 * useful to report
	 */
	if (packet_read_line_gently(fd, NULL, &line) &&
	    skip_prefix(line, "ERR ", &line)) {
	        die("remote error: %s", line);
	}

	/* otherwise, just report the write error */
	die("%s", msg);
  }

The tricky thing is that the writer does not always know the correct fd
to read more packets from (since the write errors may be deep in generic
code). I suspect we could rig up some kind of hacky global "if this
descriptor variable is non-negative, then do a last ditch read from it".

I do wonder if the race is better or worse when doing local fetches in
the test suite. On a real network with actual transit times, I suspect
we'd do better, because our writes would still be fast (we dump bytes to
the OS buffer) and we'd spend a higher percentage of our time waiting to
read back from the other side (which is what we want, because then we
see the ERR they wrote to us). That's just a guess, though.

> The fix obviously looks good.  Thanks.

Yeah, I don't think any of the above discussion needs to block the fix
here.

Here's a re-roll of the first patch, though, marked for translation as
requested by Duy.

-- >8 --
Subject: [PATCH] fetch: avoid calling write_or_die()

The write_or_die() function has one quirk that a caller might not
expect: when it sees EPIPE from the write() call, it translates that
into a death by SIGPIPE. This doesn't change the overall behavior (the
program exits either way), but it does potentially confuse test scripts
looking for a non-signal exit code.

Let's switch away from using write_or_die() in a few code paths, which
will give us more consistent exit codes. It also gives us the
opportunity to write more descriptive error messages, since we have
context that write_or_die() does not.

Note that this won't do much by itself, since we'd typically be killed
by SIGPIPE before write_or_die() even gets a chance to do its thing.
That will be addressed in the next patch.

Signed-off-by: Jeff King <peff@peff.net>
---
 fetch-pack.c | 9 ++++++---
 pkt-line.c   | 6 ++++--
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/fetch-pack.c b/fetch-pack.c
index 812be15d7e..e69993b2eb 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -191,8 +191,10 @@ static void send_request(struct fetch_pack_args *args,
 	if (args->stateless_rpc) {
 		send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
 		packet_flush(fd);
-	} else
-		write_or_die(fd, buf->buf, buf->len);
+	} else {
+		if (write_in_full(fd, buf->buf, buf->len) < 0)
+			die_errno(_("unable to write to remote"));
+	}
 }
 
 static void insert_one_alternate_object(struct fetch_negotiator *negotiator,
@@ -1163,7 +1165,8 @@ static int send_fetch_request(struct fetch_negotiator *negotiator, int fd_out,
 
 	/* Send request */
 	packet_buf_flush(&req_buf);
-	write_or_die(fd_out, req_buf.buf, req_buf.len);
+	if (write_in_full(fd_out, req_buf.buf, req_buf.len) < 0)
+		die_errno(_("unable to write request to remote"));
 
 	strbuf_release(&req_buf);
 	return ret;
diff --git a/pkt-line.c b/pkt-line.c
index d4b71d3e82..6bd496a9bb 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -88,13 +88,15 @@ static void packet_trace(const char *buf, unsigned int len, int write)
 void packet_flush(int fd)
 {
 	packet_trace("0000", 4, 1);
-	write_or_die(fd, "0000", 4);
+	if (write_in_full(fd, "0000", 4) < 0)
+		die_errno(_("unable to write flush packet"));
 }
 
 void packet_delim(int fd)
 {
 	packet_trace("0001", 4, 1);
-	write_or_die(fd, "0001", 4);
+	if (write_in_full(fd, "0001", 4) < 0)
+		die_errno(_("unable to write delim packet"));
 }
 
 int packet_flush_gently(int fd)
diff mbox series

Patch

diff --git a/builtin/fetch.c b/builtin/fetch.c
index b620fd54b4..4ba63d5ac6 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1556,7 +1556,9 @@  static int fetch_one(struct remote *remote, int argc, const char **argv, int pru
 
 	sigchain_push_common(unlock_pack_on_signal);
 	atexit(unlock_pack);
+	sigchain_push(SIGPIPE, SIG_IGN);
 	exit_code = do_fetch(gtransport, &rs);
+	sigchain_pop(SIGPIPE);
 	refspec_clear(&rs);
 	transport_disconnect(gtransport);
 	gtransport = NULL;