diff mbox series

[RFC] fetch-pack: try harder to read an ERR packet

Message ID 20200422163357.27056-1-chriscool@tuxfamily.org (mailing list archive)
State New, archived
Headers show
Series [RFC] fetch-pack: try harder to read an ERR packet | expand

Commit Message

Christian Couder April 22, 2020, 4:33 p.m. UTC
When the server has hung up after sending an ERR packet to the
client, the client might still be writing, for example a "done"
line. Therefore the client might get a write error before reading
the ERR packet.

When fetching this could result in the client displaying a
"Broken pipe" error, instead of the more useful error sent by
the server in the ERR packet.

Instead of immediately die()ing after write_in_full() returned an
error, fetch should first try to read all incoming packets in the hope
that the remote did send an ERR packet before it died, and then die
with the error in that packet, or fall back to the current generic
error message if there is no ERR packet (e.g. remote segfaulted or
something similarly horrible).

Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
---
This just formats the following patch from SZEDER Gábor:

https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/

I haven't tried to implement some suggestions discussed later
in the above thread like:

  - renaming send_request()
  - covering more code pathes
  - avoid blocking indefinitely by looking for an ERR packet
    only if the write() resulted in ECONNRESET or EPIPE
  - first printing an error message with error_errno() before
    going on to look for an ERR packet
  - implementing a timeout

as it seems to me that there was no consensus about those.

It follows up from discussions in this thread:

https://lore.kernel.org/git/cover.1584477196.git.me@ttaylorr.com/

 fetch-pack.c | 25 +++++++++++++++++++------
 1 file changed, 19 insertions(+), 6 deletions(-)

Comments

Taylor Blau April 22, 2020, 11:27 p.m. UTC | #1
On Wed, Apr 22, 2020 at 06:33:57PM +0200, Christian Couder wrote:

It looks like this may be missing a:

  From: SZEDER Gábor <szeder.dev@gmail.com>

header.

> When the server has hung up after sending an ERR packet to the
> client, the client might still be writing, for example a "done"
> line. Therefore the client might get a write error before reading
> the ERR packet.
>
> When fetching this could result in the client displaying a
> "Broken pipe" error, instead of the more useful error sent by
> the server in the ERR packet.

All makes sense.

> Instead of immediately die()ing after write_in_full() returned an
> error, fetch should first try to read all incoming packets in the hope
> that the remote did send an ERR packet before it died, and then die
> with the error in that packet, or fall back to the current generic
> error message if there is no ERR packet (e.g. remote segfaulted or
> something similarly horrible).
>
> Signed-off-by: SZEDER Gábor <szeder.dev@gmail.com>
> Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
> ---
> This just formats the following patch from SZEDER Gábor:
>
> https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/
>
> I haven't tried to implement some suggestions discussed later
> in the above thread like:
>
>   - renaming send_request()

Agreed that this is probably something we should do. Maybe
'send_request_retry' or something? That name is kind of terrible.

>   - covering more code pathes

I see where Peff raised this point originally, but I think that this is
a good step forward as-is. No need to hold this up looking for complete
coverage, since this is obviously improving the situation.

>   - avoid blocking indefinitely by looking for an ERR packet
>     only if the write() resulted in ECONNRESET or EPIPE

I think that I may have addressed this point below.

>   - first printing an error message with error_errno() before
>     going on to look for an ERR packet

I'm not sure what I think about this one. I'd certainly welcome all
opinions on this matter.

>   - implementing a timeout

A timeout may be a good thing to do. See what you think about my
suggestion below, first, though.

>
> as it seems to me that there was no consensus about those.
>
> It follows up from discussions in this thread:
>
> https://lore.kernel.org/git/cover.1584477196.git.me@ttaylorr.com/
>
>  fetch-pack.c | 25 +++++++++++++++++++------
>  1 file changed, 19 insertions(+), 6 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 1734a573b0..63e8925e2b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -185,14 +185,27 @@ static enum ack_type get_ack(struct packet_reader *reader,
>  }
>
>  static void send_request(struct fetch_pack_args *args,
> -			 int fd, struct strbuf *buf)
> +			 int fd, struct strbuf *buf,
> +			 struct packet_reader *reader)
>  {
>  	if (args->stateless_rpc) {
>  		send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
>  		packet_flush(fd);
>  	} else {
> -		if (write_in_full(fd, buf->buf, buf->len) < 0)
> +		if (write_in_full(fd, buf->buf, buf->len) < 0) {

This makes sense; if 'write_in_full' fails, we want to go into the
error-handling case below.

But, I wonder if we could do better than re-using 'write_in_full' here.
We definitely do want to write 'buf->len' bytes overall, but what
happens when a 'write()' fails? I think it's at _that_ point we want to
try and read a packet or two off from the remote.

What if instead this looked something like:

  const char *p = buf;
  ssize_t total = 0;

  while (count > 0) {
    ssize_t written = xwrite(fd, p, count);
    if (written < 0)
      return -1;
    /* note the change on this line */
    if (!written && packet_reader_read(reader) == PACKET_READ_EOF) {
      errno = ENOSPC;
      return -1;
    }
    count -= written;
    p += written;
    total += written;
  }

  return total;

That is basically the definition of 'write_in_full', but when we didn't
get a chance to write anything, then we try to read one packet.

This way, we only read exactly as many packets as we need to when we hit
this case. I'm not sure that it matters in practice, though.

> +			int save_errno = errno;
> +			/*
> +			 * Read everything the remote has sent to us.
> +			 * If there is an ERR packet, then the loop die()s
> +			 * with the received error message.
> +			 * If we reach EOF without seeing an ERR, then die()
> +			 * with a generic error message, most likely "Broken
> +			 * pipe".
> +			 */
> +			while (packet_reader_read(reader) != PACKET_READ_EOF);
> +			errno = save_errno;
>  			die_errno(_("unable to write to remote"));
> +		}
>  	}
>  }
>
> @@ -349,7 +362,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>  		const char *arg;
>  		struct object_id oid;
>
> -		send_request(args, fd[1], &req_buf);
> +		send_request(args, fd[1], &req_buf, &reader);
>  		while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
>  			if (skip_prefix(reader.line, "shallow ", &arg)) {
>  				if (get_oid_hex(arg, &oid))
> @@ -372,7 +385,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>  			die(_("expected shallow/unshallow, got %s"), reader.line);
>  		}
>  	} else if (!args->stateless_rpc)
> -		send_request(args, fd[1], &req_buf);
> +		send_request(args, fd[1], &req_buf, &reader);
>
>  	if (!args->stateless_rpc) {
>  		/* If we aren't using the stateless-rpc interface
> @@ -395,7 +408,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>  			int ack;
>
>  			packet_buf_flush(&req_buf);
> -			send_request(args, fd[1], &req_buf);
> +			send_request(args, fd[1], &req_buf, &reader);
>  			strbuf_setlen(&req_buf, state_len);
>  			flushes++;
>  			flush_at = next_flush(args->stateless_rpc, count);
> @@ -470,7 +483,7 @@ static int find_common(struct fetch_negotiator *negotiator,
>  	trace2_region_leave("fetch-pack", "negotiation_v0_v1", the_repository);
>  	if (!got_ready || !no_done) {
>  		packet_buf_write(&req_buf, "done\n");
> -		send_request(args, fd[1], &req_buf);
> +		send_request(args, fd[1], &req_buf, &reader);
>  	}
>  	print_verbose(args, _("done"));
>  	if (retval != 0) {
> --
> 2.17.1

Thanks,
Taylor
Christian Couder April 28, 2020, 7:39 a.m. UTC | #2
On Thu, Apr 23, 2020 at 1:27 AM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Wed, Apr 22, 2020 at 06:33:57PM +0200, Christian Couder wrote:
>
> It looks like this may be missing a:
>
>   From: SZEDER Gábor <szeder.dev@gmail.com>
>
> header.

Yeah, I wanted to have him as the author but forgot. The next version
will have it.

[...]

> > ---
> > This just formats the following patch from SZEDER Gábor:
> >
> > https://lore.kernel.org/git/20190830121005.GI8571@szeder.dev/
> >
> > I haven't tried to implement some suggestions discussed later
> > in the above thread like:
> >
> >   - renaming send_request()
>
> Agreed that this is probably something we should do. Maybe
> 'send_request_retry' or something? That name is kind of terrible.

Not sure 'send_request_retry' is better as we are not really retrying
to send the request. My take would be something like
'send_request_read_err'. For now I have left it as is though.

> >   - covering more code pathes
>
> I see where Peff raised this point originally, but I think that this is
> a good step forward as-is. No need to hold this up looking for complete
> coverage, since this is obviously improving the situation.

Ok.

> >   - avoid blocking indefinitely by looking for an ERR packet
> >     only if the write() resulted in ECONNRESET or EPIPE
>
> I think that I may have addressed this point below.
>
> >   - first printing an error message with error_errno() before
> >     going on to look for an ERR packet
>
> I'm not sure what I think about this one. I'd certainly welcome all
> opinions on this matter.
>
> >   - implementing a timeout
>
> A timeout may be a good thing to do. See what you think about my
> suggestion below, first, though.

Ok, thanks for your suggestion.

> > diff --git a/fetch-pack.c b/fetch-pack.c
> > index 1734a573b0..63e8925e2b 100644
> > --- a/fetch-pack.c
> > +++ b/fetch-pack.c
> > @@ -185,14 +185,27 @@ static enum ack_type get_ack(struct packet_reader *reader,
> >  }
> >
> >  static void send_request(struct fetch_pack_args *args,
> > -                      int fd, struct strbuf *buf)
> > +                      int fd, struct strbuf *buf,
> > +                      struct packet_reader *reader)
> >  {
> >       if (args->stateless_rpc) {
> >               send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
> >               packet_flush(fd);
> >       } else {
> > -             if (write_in_full(fd, buf->buf, buf->len) < 0)
> > +             if (write_in_full(fd, buf->buf, buf->len) < 0) {
>
> This makes sense; if 'write_in_full' fails, we want to go into the
> error-handling case below.
>
> But, I wonder if we could do better than re-using 'write_in_full' here.
> We definitely do want to write 'buf->len' bytes overall, but what
> happens when a 'write()' fails? I think it's at _that_ point we want to
> try and read a packet or two off from the remote.

Yeah, good idea.

> What if instead this looked something like:
>
>   const char *p = buf;
>   ssize_t total = 0;
>
>   while (count > 0) {
>     ssize_t written = xwrite(fd, p, count);
>     if (written < 0)
>       return -1;
>     /* note the change on this line */
>     if (!written && packet_reader_read(reader) == PACKET_READ_EOF) {
>       errno = ENOSPC;
>       return -1;
>     }
>     count -= written;
>     p += written;
>     total += written;
>   }
>
>   return total;
>
> That is basically the definition of 'write_in_full', but when we didn't
> get a chance to write anything, then we try to read one packet.

Yeah, your code above looks correct. I have added a new function doing
the above in the new version I will send soon.

> This way, we only read exactly as many packets as we need to when we hit
> this case. I'm not sure that it matters in practice, though.

I am not sure I understand what you think doesn't matter in practice.

Thanks,
Christian.
Jeff King April 28, 2020, 8:33 a.m. UTC | #3
On Wed, Apr 22, 2020 at 05:27:44PM -0600, Taylor Blau wrote:

> > -		if (write_in_full(fd, buf->buf, buf->len) < 0)
> > +		if (write_in_full(fd, buf->buf, buf->len) < 0) {
> 
> This makes sense; if 'write_in_full' fails, we want to go into the
> error-handling case below.
> 
> But, I wonder if we could do better than re-using 'write_in_full' here.
> We definitely do want to write 'buf->len' bytes overall, but what
> happens when a 'write()' fails? I think it's at _that_ point we want to
> try and read a packet or two off from the remote.
> 
> What if instead this looked something like:
> 
>   const char *p = buf;
>   ssize_t total = 0;
> 
>   while (count > 0) {
>     ssize_t written = xwrite(fd, p, count);
>     if (written < 0)
>       return -1;
>     /* note the change on this line */
>     if (!written && packet_reader_read(reader) == PACKET_READ_EOF) {
>       errno = ENOSPC;
>       return -1;
>     }
>     count -= written;
>     p += written;
>     total += written;
>   }
> 
>   return total;
> 
> That is basically the definition of 'write_in_full', but when we didn't
> get a chance to write anything, then we try to read one packet.

I'm not sure I understand what this is trying to do. If write_in_full()
returns an error, then we know that write() just failed, and it would be
appropriate to check errno at that point and decide whether to read a
packet.

The code you've written above doesn't make sense to me. If we see an
error, we'd return _before_ doing any packet_reader_read() stuff. We'd
trigger it only when write() returns 0. But it should only do that if we
fed it 0 bytes, which we know we'd never pass (because we wouldn't run
the loop if count==0).

-Peff
diff mbox series

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index 1734a573b0..63e8925e2b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -185,14 +185,27 @@  static enum ack_type get_ack(struct packet_reader *reader,
 }
 
 static void send_request(struct fetch_pack_args *args,
-			 int fd, struct strbuf *buf)
+			 int fd, struct strbuf *buf,
+			 struct packet_reader *reader)
 {
 	if (args->stateless_rpc) {
 		send_sideband(fd, -1, buf->buf, buf->len, LARGE_PACKET_MAX);
 		packet_flush(fd);
 	} else {
-		if (write_in_full(fd, buf->buf, buf->len) < 0)
+		if (write_in_full(fd, buf->buf, buf->len) < 0) {
+			int save_errno = errno;
+			/*
+			 * Read everything the remote has sent to us.
+			 * If there is an ERR packet, then the loop die()s
+			 * with the received error message.
+			 * If we reach EOF without seeing an ERR, then die()
+			 * with a generic error message, most likely "Broken
+			 * pipe".
+			 */
+			while (packet_reader_read(reader) != PACKET_READ_EOF);
+			errno = save_errno;
 			die_errno(_("unable to write to remote"));
+		}
 	}
 }
 
@@ -349,7 +362,7 @@  static int find_common(struct fetch_negotiator *negotiator,
 		const char *arg;
 		struct object_id oid;
 
-		send_request(args, fd[1], &req_buf);
+		send_request(args, fd[1], &req_buf, &reader);
 		while (packet_reader_read(&reader) == PACKET_READ_NORMAL) {
 			if (skip_prefix(reader.line, "shallow ", &arg)) {
 				if (get_oid_hex(arg, &oid))
@@ -372,7 +385,7 @@  static int find_common(struct fetch_negotiator *negotiator,
 			die(_("expected shallow/unshallow, got %s"), reader.line);
 		}
 	} else if (!args->stateless_rpc)
-		send_request(args, fd[1], &req_buf);
+		send_request(args, fd[1], &req_buf, &reader);
 
 	if (!args->stateless_rpc) {
 		/* If we aren't using the stateless-rpc interface
@@ -395,7 +408,7 @@  static int find_common(struct fetch_negotiator *negotiator,
 			int ack;
 
 			packet_buf_flush(&req_buf);
-			send_request(args, fd[1], &req_buf);
+			send_request(args, fd[1], &req_buf, &reader);
 			strbuf_setlen(&req_buf, state_len);
 			flushes++;
 			flush_at = next_flush(args->stateless_rpc, count);
@@ -470,7 +483,7 @@  static int find_common(struct fetch_negotiator *negotiator,
 	trace2_region_leave("fetch-pack", "negotiation_v0_v1", the_repository);
 	if (!got_ready || !no_done) {
 		packet_buf_write(&req_buf, "done\n");
-		send_request(args, fd[1], &req_buf);
+		send_request(args, fd[1], &req_buf, &reader);
 	}
 	print_verbose(args, _("done"));
 	if (retval != 0) {