Message ID | df7d3659ae5f11d163f1e992f3b9403be709ddb7.1544572142.git.steadmon@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | Unify pkt-line error handling and refactor smart-http |
On Tue, Dec 11, 2018 at 04:25:15PM -0800, Josh Steadmon wrote:

> From: Masaya Suzuki <masayasuzuki@google.com>
>
> In the Git pack protocol definition, an error packet may appear only in
> a certain context. However, servers can face a runtime error (e.g. I/O
> error) at an arbitrary timing. This patch changes the protocol to allow
> an error packet to be sent instead of any packet.
>
> Following this protocol spec change, the error packet handling code is
> moved to pkt-line.c.

This is a change in the spec with an accompanying change in the code,
which raises the question: what do other implementations do with this
change (both older Git, and implementations like JGit, libgit2, etc)?

I think the answer for older Git is "hang up unceremoniously", which is
probably OK given the semantics of the change. And I'd suspect most
other implementations would do the same. I just wonder if anybody tested
it with other implementations.

> +An error packet is a special pkt-line that contains an error string.
> +
> +----
> +  error-line     =  PKT-LINE("ERR" SP explanation-text)
> +----
> +
> +Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
> +be sent. Once this packet is sent by a client or a server, the data transfer
> +process defined in this protocol is terminated.

The packfile data is typically packetized, too, and contains arbitrary
data (that could have "ERR" in it). It looks like we don't specifically
say PKT-LINE() in that part of the protocol spec, though, so I think
this is OK.

Likewise, in the implementation:

> diff --git a/pkt-line.c b/pkt-line.c
> index 04d10bbd03..ce9e42d10e 100644
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
>  		return PACKET_READ_EOF;
>  	}
>
> +	if (starts_with(buffer, "ERR ")) {
> +		die(_("remote error: %s"), buffer + 4);
> +	}
> +
>  	if ((options & PACKET_READ_CHOMP_NEWLINE) &&
>  	    len && buffer[len-1] == '\n')
>  		len--;

This ERR handling has been moved to a very low level. What happens if
we're passing arbitrary data via the packet_read() code? Could we
erroneously trigger an error if a packfile happens to have the bytes
"ERR " at a packet boundary?

For packfiles via upload-pack, I _think_ we're OK, because we only
packetize it when a sideband is in use. In which case this would never
match, because we'd have "\1" in the first byte slot.

But are there other cases we need to worry about? Just
brainstorming, I can think of:

  1. We also pass packetized packfiles between git-remote-https and
     the stateless-rpc mode of fetch-pack/send-pack. And I don't think
     we use sidebands there.

  2. The packet code is used for long-lived clean/smudge filters these
     days, which also pass arbitrary data.

So I think it's probably not a good idea to unconditionally have callers
of packet_read_with_status() handle this. We'd need a flag like
PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.

-Peff
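For illustration only, the opt-in check suggested above could look roughly like the following. This is a minimal sketch and not code from the series: the helper name `err_packet_text` is hypothetical, and the option bit is simply named after the flag floated in the message (its value here is arbitrary).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical option bit, named after the flag suggested in the thread. */
#define PACKET_READ_RESPECT_ERR (1u << 0)

/*
 * Hypothetical helper: return a pointer to the explanation text if the
 * payload should be treated as an ERR packet, or NULL if it should be
 * handed to the caller as ordinary data. The returned pointer is only
 * NUL-terminated if the caller's buffer is.
 */
static const char *err_packet_text(const char *buf, size_t len,
				   unsigned options)
{
	/* Callers that carry arbitrary data never set the flag. */
	if (!(options & PACKET_READ_RESPECT_ERR))
		return NULL;

	/* Same prefix test as the patch, but only for opted-in callers. */
	if (len > 4 && memcmp(buf, "ERR ", 4) == 0)
		return buf + 4;

	return NULL;
}
```

Under this sketch, a clean/smudge filter payload of "ERR FOO\n" read without the flag is passed through untouched, while a protocol-level reader that sets the flag would see the explanation text and could die() with it.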
On Wed, Dec 12, 2018 at 3:02 AM Jeff King <peff@peff.net> wrote:
>
> On Tue, Dec 11, 2018 at 04:25:15PM -0800, Josh Steadmon wrote:
>
> > From: Masaya Suzuki <masayasuzuki@google.com>
> >
> > In the Git pack protocol definition, an error packet may appear only in
> > a certain context. However, servers can face a runtime error (e.g. I/O
> > error) at an arbitrary timing. This patch changes the protocol to allow
> > an error packet to be sent instead of any packet.
> >
> > Following this protocol spec change, the error packet handling code is
> > moved to pkt-line.c.
>
> This is a change in the spec with an accompanying change in the code,
> which raises the question: what do other implementations do with this
> change (both older Git, and implementations like JGit, libgit2, etc)?

JGit is similar to Git. It parses "ERR " in limited places. When it sees an ERR
packet in an unexpected place, it'll fail somewhere in the parsing code.

https://github.com/eclipse/jgit/blob/30c6c7542190c149e2aee792f992a312a5fc5793/org.eclipse.jgit/src/org/eclipse/jgit/transport/PacketLineIn.java#L145-L147
https://github.com/eclipse/jgit/blob/f40b39345cd9b54473ee871bff401fe3d394ffe3/org.eclipse.jgit/src/org/eclipse/jgit/transport/BasePackConnection.java#L208

I'm not familiar with libgit2 code, but it seems it handles this at a
lower level. An error type packet is parsed out at a low level, and
the error handling is done by the callers of the packet parser.

https://github.com/libgit2/libgit2/blob/bea65980c7a42e34edfafbdc40b199ba7b2a564e/src/transports/smart_pkt.c#L482-L483

I cannot find any ERR packet handling in go-git. It seems to me that if
an ERR packet appears it treats it as a parsing error.

https://github.com/src-d/go-git/blob/master/plumbing/protocol/packp/common.go#L60-L62

>
> I think the answer for older Git is "hang up unceremoniously", which is
> probably OK given the semantics of the change. And I'd suspect most
> other implementations would do the same. I just wonder if anybody tested
> it with other implementations.

I'm thinking aloud here. There would be two aspects of the protocol
compatibility: (1) new clients speak to old servers (2) old clients
speak to a new server that speaks the updated protocol.

For (1), I assume that in the Git pack protocol, a packet starting
with "ERR " does not appear naturally except for a very special case
that the server doesn't support sideband, but is using the updated
protocol. As you mentioned, at first it looks like this can mistakenly
parse the pack file of git-receive-pack as an ERR packet, assuming
that git-receive-pack's pack file is packetized. Actually
git-receive-pack's pack file is not packetized in the Git pack
protocol (https://github.com/git/git/blob/master/builtin/receive-pack.c#L1695).
I recently wrote a Git protocol parser
(https://github.com/google/gitprotocolio), and I confirmed that this
is the case at least for the HTTP transport. git-upload-pack's pack
file is indeed packetized, but packetized with sideband. Except for
the case where sideband is not used, the packfiles wouldn't be
considered as an ERR packet accidentally.

For (2), if the old clients see an unexpected ERR packet, they cannot
parse it. They would handle this unparsable data as if the server is
not speaking Git protocol correctly. Even if the old clients just
ignore the packet, due to the nature of the ERR packet, the server
won't send further data. The client won't be able to proceed. Overall,
the clients anyway face an error, and the only difference would be
whether the clients can show an error nicely or not. The new clients
will show the errors nicely to users. Old clients will not.

>
> > +An error packet is a special pkt-line that contains an error string.
> > +
> > +----
> > +  error-line     =  PKT-LINE("ERR" SP explanation-text)
> > +----
> > +
> > +Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
> > +be sent. Once this packet is sent by a client or a server, the data transfer
> > +process defined in this protocol is terminated.
>
> The packfile data is typically packetized, too, and contains arbitrary
> data (that could have "ERR" in it). It looks like we don't specifically
> say PKT-LINE() in that part of the protocol spec, though, so I think
> this is OK.

As I described above, as far as I can see, the packfile in
git-receive-pack is not packetized. The packfile in git-upload-pack is
packetized but typically with sideband. At least at the Git pack
protocol level, this should be OK.

>
> Likewise, in the implementation:
>
> > diff --git a/pkt-line.c b/pkt-line.c
> > index 04d10bbd03..ce9e42d10e 100644
> > --- a/pkt-line.c
> > +++ b/pkt-line.c
> > @@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
> >  		return PACKET_READ_EOF;
> >  	}
> >
> > +	if (starts_with(buffer, "ERR ")) {
> > +		die(_("remote error: %s"), buffer + 4);
> > +	}
> > +
> >  	if ((options & PACKET_READ_CHOMP_NEWLINE) &&
> >  	    len && buffer[len-1] == '\n')
> >  		len--;
>
> This ERR handling has been moved to a very low level. What happens if
> we're passing arbitrary data via the packet_read() code? Could we
> erroneously trigger an error if a packfile happens to have the bytes
> "ERR " at a packet boundary?
>
> For packfiles via upload-pack, I _think_ we're OK, because we only
> packetize it when a sideband is in use. In which case this would never
> match, because we'd have "\1" in the first byte slot.
>
> But are there other cases we need to worry about? Just
> brainstorming, I can think of:
>
>   1. We also pass packetized packfiles between git-remote-https and
>      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
>      we use sidebands there.
>
>   2. The packet code is used for long-lived clean/smudge filters these
>      days, which also pass arbitrary data.
>
> So I think it's probably not a good idea to unconditionally have callers
> of packet_read_with_status() handle this. We'd need a flag like
> PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.

This is outside of the Git pack protocol so having a separate parsing
mode makes sense to me.

>
> -Peff
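The sideband argument in this message can be made concrete with a small sketch. It assumes nothing beyond what the thread states: a sideband-framed pack payload starts with the band byte "\1", so it cannot begin with "ERR ". The enum and function names below are hypothetical and exist only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical classification of a single pkt-line payload. */
enum payload_kind {
	PAYLOAD_ERR,           /* begins with "ERR " */
	PAYLOAD_SIDEBAND_DATA, /* begins with band byte 1 (pack data) */
	PAYLOAD_OTHER
};

/*
 * Classify a payload by its leading bytes. A sideband payload always
 * carries its band number in the first byte, so even if the pack data
 * itself begins with "ERR ", the prefix test cannot match.
 */
static enum payload_kind classify_payload(const char *buf, size_t len)
{
	if (len >= 4 && memcmp(buf, "ERR ", 4) == 0)
		return PAYLOAD_ERR;
	if (len >= 1 && buf[0] == '\1')
		return PAYLOAD_SIDEBAND_DATA;
	return PAYLOAD_OTHER;
}
```

The risky case is the one called out in the thread: packetized data without a band byte, where raw bytes that happen to start with "ERR " would be classified as an error.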
On Wed, Dec 12, 2018 at 05:17:01PM -0800, Masaya Suzuki wrote:

> > This is a change in the spec with an accompanying change in the code,
> > which raises the question: what do other implementations do with this
> > change (both older Git, and implementations like JGit, libgit2, etc)?
>
> JGit is similar to Git. It parses "ERR " in limited places. When it sees an ERR
> packet in an unexpected place, it'll fail somewhere in the parsing code.
>
> https://github.com/eclipse/jgit/blob/30c6c7542190c149e2aee792f992a312a5fc5793/org.eclipse.jgit/src/org/eclipse/jgit/transport/PacketLineIn.java#L145-L147
> https://github.com/eclipse/jgit/blob/f40b39345cd9b54473ee871bff401fe3d394ffe3/org.eclipse.jgit/src/org/eclipse/jgit/transport/BasePackConnection.java#L208
>
> I'm not familiar with libgit2 code, but it seems it handles this at a
> lower level. An error type packet is parsed out at a low level, and
> the error handling is done by the callers of the packet parser.
>
> https://github.com/libgit2/libgit2/blob/bea65980c7a42e34edfafbdc40b199ba7b2a564e/src/transports/smart_pkt.c#L482-L483
>
> I cannot find any ERR packet handling in go-git. It seems to me that if
> an ERR packet appears it treats it as a parsing error.
>
> https://github.com/src-d/go-git/blob/master/plumbing/protocol/packp/common.go#L60-L62

Thanks for digging into these. It does make sense that other
implementations would give a parsing error. Hopefully they also produce
a sensible error message (ideally printing the bogus pktline), but even
if they don't we're probably no worse off than the status quo. With the
current scheme, the server can't give any message, and just has to hang
up anyway.

> > I think the answer for older Git is "hang up unceremoniously", which is
> > probably OK given the semantics of the change. And I'd suspect most
> > other implementations would do the same. I just wonder if anybody tested
> > it with other implementations.
>
> I'm thinking aloud here. There would be two aspects of the protocol
> compatibility: (1) new clients speak to old servers (2) old clients
> speak to a new server that speaks the updated protocol.
>
> For (1), I assume that in the Git pack protocol, a packet starting
> with "ERR " does not appear naturally except for a very special case
> that the server doesn't support sideband, but is using the updated
> protocol. As you mentioned, at first it looks like this can mistakenly
> parse the pack file of git-receive-pack as an ERR packet, assuming
> that git-receive-pack's pack file is packetized. Actually
> git-receive-pack's pack file is not packetized in the Git pack
> protocol (https://github.com/git/git/blob/master/builtin/receive-pack.c#L1695).
> I recently wrote a Git protocol parser
> (https://github.com/google/gitprotocolio), and I confirmed that this
> is the case at least for the HTTP transport. git-upload-pack's pack
> file is indeed packetized, but packetized with sideband. Except for
> the case where sideband is not used, the packfiles wouldn't be
> considered as an ERR packet accidentally.

Right, that matches my understanding.

> For (2), if the old clients see an unexpected ERR packet, they cannot
> parse it. They would handle this unparsable data as if the server is
> not speaking Git protocol correctly. Even if the old clients just
> ignore the packet, due to the nature of the ERR packet, the server
> won't send further data. The client won't be able to proceed. Overall,
> the clients anyway face an error, and the only difference would be
> whether the clients can show an error nicely or not. The new clients
> will show the errors nicely to users. Old clients will not.

Yeah, this was the case I was more concerned about, and I think it is
probably OK (by this rationale, and what I wrote above).

> > So I think it's probably not a good idea to unconditionally have callers
> > of packet_read_with_status() handle this. We'd need a flag like
> > PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.
>
> This is outside of the Git pack protocol so having a separate parsing
> mode makes sense to me.

Yeah.

Here's a sample script which works with current Git (the index contains
the uppercased content "ERR FOO"), but fails after this patch (Git
thinks the filter reported an error and dies; it's not great that we die
in the packet-reading code at all for this case, but your patch is
hardly the first call to die() in that function).

-- >8 --
git init -q repo
cd repo

echo '*.magic filter=magic' >.git/info/attributes
git config filter.magic.process "$PWD/filter"

# toy filter to uppercase content
cat >filter <<-\EOF
#!/usr/bin/perl

sub read_pkt {
	my @r;
	while (1) {
		read(STDIN, my $len, 4);
		last if $len eq "0000";
		read(STDIN, my $buf, hex($len)-4);
		push @r, $buf;
	}
	return @r;
}

sub write_pkt {
	local $| = 1;
	while (@_) {
		my $buf = shift;
		printf "%04x", length($buf) + 4;
		print $buf;
	}
	print "0000";
}

read_pkt(); # handshake
write_pkt(qw(git-filter-server version=2));
read_pkt(); # capabilities
write_pkt(qw(capability=clean));
read_pkt(); # clean command
@content = read_pkt();
write_pkt(qw(status=success));
write_pkt(map { uc } @content);
write_pkt(); # final status
EOF
chmod +x filter

echo 'err foo' >foo.magic
git add foo.magic
git cat-file blob :foo.magic
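To see exactly which bytes the toy filter sends back, here is a rough C counterpart of the length computation in its write_pkt routine. It is a sketch only: `format_pkt_line` is a hypothetical helper for this example, not a function from Git's pkt-line.c. The size limit of 65520 bytes per pkt-line (4 length bytes plus at most 65516 payload bytes) follows Git's protocol documentation.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: frame `payload` as one pkt-line in `out`.
 * The 4-hex-digit length prefix counts itself plus the payload.
 * Returns the total number of bytes written (excluding the trailing
 * NUL added for convenience), or -1 if the packet would be too large
 * or `out` is too small.
 */
static int format_pkt_line(char *out, size_t outsz,
			   const char *payload, size_t len)
{
	size_t total = len + 4;

	if (total > 65520 || outsz < total + 1)
		return -1;

	snprintf(out, 5, "%04zx", total); /* writes 4 hex digits + NUL */
	memcpy(out + 4, payload, len);    /* overwrites that NUL */
	out[total] = '\0';
	return (int)total;
}
```

For the uppercased blob content "ERR FOO\n" (8 bytes), this yields the 12-byte packet "000cERR FOO\n", whose payload starts with "ERR " and therefore trips the unconditional check added by the patch.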
On 2018.12.12 17:17, Masaya Suzuki wrote:
> On Wed, Dec 12, 2018 at 3:02 AM Jeff King <peff@peff.net> wrote:
> > This ERR handling has been moved to a very low level. What happens if
> > we're passing arbitrary data via the packet_read() code? Could we
> > erroneously trigger an error if a packfile happens to have the bytes
> > "ERR " at a packet boundary?
> >
> > For packfiles via upload-pack, I _think_ we're OK, because we only
> > packetize it when a sideband is in use. In which case this would never
> > match, because we'd have "\1" in the first byte slot.
> >
> > But are there other cases we need to worry about? Just
> > brainstorming, I can think of:
> >
> >   1. We also pass packetized packfiles between git-remote-https and
> >      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
> >      we use sidebands there.
> >
> >   2. The packet code is used for long-lived clean/smudge filters these
> >      days, which also pass arbitrary data.
> >
> > So I think it's probably not a good idea to unconditionally have callers
> > of packet_read_with_status() handle this. We'd need a flag like
> > PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.
>
> This is outside of the Git pack protocol so having a separate parsing
> mode makes sense to me.

This sounds like it could be a significant refactoring. Should we go
back to V2 of this series, and then work on the new parsing mode
separately?
On Thu, Dec 13, 2018 at 02:18:26PM -0800, Josh Steadmon wrote:

> On 2018.12.12 17:17, Masaya Suzuki wrote:
> > On Wed, Dec 12, 2018 at 3:02 AM Jeff King <peff@peff.net> wrote:
> > > This ERR handling has been moved to a very low level. What happens if
> > > we're passing arbitrary data via the packet_read() code? Could we
> > > erroneously trigger an error if a packfile happens to have the bytes
> > > "ERR " at a packet boundary?
> > >
> > > For packfiles via upload-pack, I _think_ we're OK, because we only
> > > packetize it when a sideband is in use. In which case this would never
> > > match, because we'd have "\1" in the first byte slot.
> > >
> > > But are there other cases we need to worry about? Just
> > > brainstorming, I can think of:
> > >
> > >   1. We also pass packetized packfiles between git-remote-https and
> > >      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
> > >      we use sidebands there.
> > >
> > >   2. The packet code is used for long-lived clean/smudge filters these
> > >      days, which also pass arbitrary data.
> > >
> > > So I think it's probably not a good idea to unconditionally have callers
> > > of packet_read_with_status() handle this. We'd need a flag like
> > > PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.
> >
> > This is outside of the Git pack protocol so having a separate parsing
> > mode makes sense to me.
>
> This sounds like it could be a significant refactoring. Should we go
> back to V2 of this series, and then work on the new parsing mode
> separately?

Which one is v2? :)

Just the remote-curl cleanups from me, and then your "die on server-side
errors" patch?

-Peff
On 2018.12.17 16:33, Jeff King wrote:
> On Thu, Dec 13, 2018 at 02:18:26PM -0800, Josh Steadmon wrote:
>
> > On 2018.12.12 17:17, Masaya Suzuki wrote:
> > > On Wed, Dec 12, 2018 at 3:02 AM Jeff King <peff@peff.net> wrote:
> > > > This ERR handling has been moved to a very low level. What happens if
> > > > we're passing arbitrary data via the packet_read() code? Could we
> > > > erroneously trigger an error if a packfile happens to have the bytes
> > > > "ERR " at a packet boundary?
> > > >
> > > > For packfiles via upload-pack, I _think_ we're OK, because we only
> > > > packetize it when a sideband is in use. In which case this would never
> > > > match, because we'd have "\1" in the first byte slot.
> > > >
> > > > But are there other cases we need to worry about? Just
> > > > brainstorming, I can think of:
> > > >
> > > >   1. We also pass packetized packfiles between git-remote-https and
> > > >      the stateless-rpc mode of fetch-pack/send-pack. And I don't think
> > > >      we use sidebands there.
> > > >
> > > >   2. The packet code is used for long-lived clean/smudge filters these
> > > >      days, which also pass arbitrary data.
> > > >
> > > > So I think it's probably not a good idea to unconditionally have callers
> > > > of packet_read_with_status() handle this. We'd need a flag like
> > > > PACKET_READ_RESPECT_ERR, and to trigger it from the appropriate callers.
> > >
> > > This is outside of the Git pack protocol so having a separate parsing
> > > mode makes sense to me.
> >
> > This sounds like it could be a significant refactoring. Should we go
> > back to V2 of this series, and then work on the new parsing mode
> > separately?
>
> Which one is v2? :)
>
> Just the remote-curl cleanups from me, and then your "die on server-side
> errors" patch?

Yes, that one :)
On Wed, Dec 19, 2018 at 03:30:05PM -0800, Josh Steadmon wrote:

> > > > This is outside of the Git pack protocol so having a separate parsing
> > > > mode makes sense to me.
> > >
> > > This sounds like it could be a significant refactoring. Should we go
> > > back to V2 of this series, and then work on the new parsing mode
> > > separately?
> >
> > Which one is v2? :)
> >
> > Just the remote-curl cleanups from me, and then your "die on server-side
> > errors" patch?
>
> Yes, that one :)

Then yes, that sounds reasonable to me.

-Peff
diff --git a/Documentation/technical/pack-protocol.txt b/Documentation/technical/pack-protocol.txt
index 6ac774d5f6..7a2375a55d 100644
--- a/Documentation/technical/pack-protocol.txt
+++ b/Documentation/technical/pack-protocol.txt
@@ -22,6 +22,16 @@ protocol-common.txt. When the grammar indicate `PKT-LINE(...)`, unless
 otherwise noted the usual pkt-line LF rules apply: the sender SHOULD
 include a LF, but the receiver MUST NOT complain if it is not present.
 
+An error packet is a special pkt-line that contains an error string.
+
+----
+  error-line     =  PKT-LINE("ERR" SP explanation-text)
+----
+
+Throughout the protocol, where `PKT-LINE(...)` is expected, an error packet MAY
+be sent. Once this packet is sent by a client or a server, the data transfer
+process defined in this protocol is terminated.
+
 Transports
 ----------
 There are three transports over which the packfile protocol is
@@ -89,13 +99,6 @@ process on the server side over the Git protocol is this:
      "0039git-upload-pack /schacon/gitbook.git\0host=example.com\0" |
      nc -v example.com 9418
 
-If the server refuses the request for some reasons, it could abort
-gracefully with an error message.
-
-----
-  error-line     =  PKT-LINE("ERR" SP explanation-text)
-----
-
 SSH Transport
 -------------
 
@@ -398,12 +401,11 @@ from the client).
 Then the server will start sending its packfile data.
 
 ----
-  server-response = *ack_multi ack / nak / error-line
+  server-response = *ack_multi ack / nak
   ack_multi       = PKT-LINE("ACK" SP obj-id ack_status)
   ack_status      = "continue" / "common" / "ready"
   ack             = PKT-LINE("ACK" SP obj-id)
   nak             = PKT-LINE("NAK")
-  error-line     =  PKT-LINE("ERR" SP explanation-text)
 ----
 
 A simple clone may look like this (with no 'have' lines):
diff --git a/builtin/archive.c b/builtin/archive.c
index d2455237ce..5d179bbd16 100644
--- a/builtin/archive.c
+++ b/builtin/archive.c
@@ -59,8 +59,6 @@ static int run_remote_archiver(int argc, const char **argv,
 	if (strcmp(buf, "ACK")) {
 		if (starts_with(buf, "NACK "))
 			die(_("git archive: NACK %s"), buf + 5);
-		if (starts_with(buf, "ERR "))
-			die(_("remote error: %s"), buf + 4);
 		die(_("git archive: protocol error"));
 	}
 
diff --git a/connect.c b/connect.c
index 24281b6082..4813f005ab 100644
--- a/connect.c
+++ b/connect.c
@@ -296,7 +296,6 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 	struct ref **orig_list = list;
 	int len = 0;
 	enum get_remote_heads_state state = EXPECTING_FIRST_REF;
-	const char *arg;
 
 	*list = NULL;
 
@@ -306,8 +305,6 @@ struct ref **get_remote_heads(struct packet_reader *reader,
 			die_initial_contact(1);
 		case PACKET_READ_NORMAL:
 			len = reader->pktlen;
-			if (len > 4 && skip_prefix(reader->line, "ERR ", &arg))
-				die(_("remote error: %s"), arg);
 			break;
 		case PACKET_READ_FLUSH:
 			state = EXPECTING_DONE;
diff --git a/fetch-pack.c b/fetch-pack.c
index 9691046e64..e66cd7b71b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -178,8 +178,6 @@ static enum ack_type get_ack(int fd, struct object_id *result_oid)
 			return ACK;
 		}
 	}
-	if (skip_prefix(line, "ERR ", &arg))
-		die(_("remote error: %s"), arg);
 	die(_("git fetch-pack: expected ACK/NAK, got '%s'"), line);
 }
 
diff --git a/pkt-line.c b/pkt-line.c
index 04d10bbd03..ce9e42d10e 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -346,6 +346,10 @@ enum packet_read_status packet_read_with_status(int fd, char **src_buffer,
 		return PACKET_READ_EOF;
 	}
 
+	if (starts_with(buffer, "ERR ")) {
+		die(_("remote error: %s"), buffer + 4);
+	}
+
 	if ((options & PACKET_READ_CHOMP_NEWLINE) &&
 	    len && buffer[len-1] == '\n')
 		len--;
diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh
index 3f58f05cbb..d2a9d0c127 100755
--- a/t/t5703-upload-pack-ref-in-want.sh
+++ b/t/t5703-upload-pack-ref-in-want.sh
@@ -208,7 +208,7 @@ test_expect_success 'server is initially ahead - no ref in want' '
 	cp -r "$LOCAL_PRISTINE" local &&
 	inconsistency master 1234567890123456789012345678901234567890 &&
 	test_must_fail git -C local fetch 2>err &&
-	grep "ERR upload-pack: not our ref" err
+	grep "fatal: remote error: upload-pack: not our ref" err
 '
 
 test_expect_success 'server is initially ahead - ref in want' '
@@ -254,7 +254,7 @@ test_expect_success 'server loses a ref - ref in want' '
 	echo "s/master/raster/" >"$HTTPD_ROOT_PATH/one-time-sed" &&
 
 	test_must_fail git -C local fetch 2>err &&
-	grep "ERR unknown ref refs/heads/raster" err
+	grep "fatal: remote error: unknown ref refs/heads/raster" err
 '
 
 stop_httpd