[v4] http-backend: allow empty CONTENT_LENGTH
diff mbox series

Message ID 20180909041016.23980-1-max@max630.net
State New
Headers show
Series
  • [v4] http-backend: allow empty CONTENT_LENGTH
Related show

Commit Message

Max Kirillov Sept. 9, 2018, 4:10 a.m. UTC
Before 817f7dc223, CONTENT_LENGTH variable was never considered,
http-backend was just reading request body from standard input until EOF
when it, or a command started by it, needed it.

Then it was discovered that some HTTP do not close standard input, instead
expecting CGI scripts to obey CONTENT_LENGTH. In 817f7dc223, behavior
was changed to consider the CONTENT_LENGTH variable when it is set. Case
of unset CONTENT_LENGTH was kept to mean "read until EOF" which is not
compliant to the RFC3875 (which treats it as empty body), but
practically is used when client uses chunked encoding to submit big
request.

Case of empty CONTENT_LENGTH has slept through this conditions.
Apparently, it is used for GET requests, and RFC3875 does specify that
it also means empty body. Current implementation, however, fails to
parse it and aborts the request.

Fix the case of empty CONTENT_LENGTH to be treated as zero-length body
is expected, as specified by RFC3875. It does not actually matter what
does it mean because body is never read anyway, it just should not cause
parse error. Add a test for the case.

Reported-By: Jelmer Vernooń≥ <jelmer@jelmer.uk>
Authored-by: Jeff King <peff@peff.net>
Signed-off-by: Max Kirillov <max@max630.net>
---
The fix suggested by Jeff. I supposed there should be "signed-off"
The tests pass as well
 http-backend.c                         | 24 ++++++++++++++++++++++--
 t/t5562-http-backend-content-length.sh | 11 +++++++++++
 2 files changed, 33 insertions(+), 2 deletions(-)

Comments

Jonathan Nieder Sept. 10, 2018, 5:25 a.m. UTC | #1
Max Kirillov wrote:

> Reported-By: Jelmer Vernooń≥ <jelmer@jelmer.uk>
> Authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Max Kirillov <max@max630.net>

Nit: for this kind of case of forwarding someone else's patch, we put
a From field at the beginning of the body of the message.  "git
format-patch" can produce a message with that format if you commit
with 'git commit --author="Someone Else <person@example.com>"' and run
format-patch with --from="My Name <me@example.com>".  More details are
in the DISCUSSION section of git-format-patch(1).

As with v3, since v2 is already in "next" this should go incremental.

[...]
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -353,8 +353,28 @@ static ssize_t get_content_length(void)
>  	ssize_t val = -1;
>  	const char *str = getenv("CONTENT_LENGTH");
>  
> -	if (str && !git_parse_ssize_t(str, &val))
> -		die("failed to parse CONTENT_LENGTH: %s", str);
> +	if (!str) {
> +		/*
> +		 * RFC3875 says this must mean "no body", but in practice we
> +		 * receive chunked encodings with no CONTENT_LENGTH. Tell the
> +		 * caller to read until EOF.
> +		 */
> +		val = -1;
> +	} else if (!*str) {
> +		/*
> +		 * An empty length should be treated as "no body" according to
> +		 * RFC3875, and this seems to hold in practice.
> +		 */
> +		val = 0;

Are there example callers that this version fixes?  Where can I read
more, or what can I run to experience it?

For example, v2.19.0-rc0~45^2~2 (http-backend: respect CONTENT_LENGTH
as specified by rfc3875, 2018-06-10) mentions IIS/Windows; does IIS
make use of this distinction?

Thanks,
Jonathan
Jeff King Sept. 10, 2018, 1:17 p.m. UTC | #2
On Sun, Sep 09, 2018 at 10:25:58PM -0700, Jonathan Nieder wrote:

> > --- a/http-backend.c
> > +++ b/http-backend.c
> > @@ -353,8 +353,28 @@ static ssize_t get_content_length(void)
> >  	ssize_t val = -1;
> >  	const char *str = getenv("CONTENT_LENGTH");
> >  
> > -	if (str && !git_parse_ssize_t(str, &val))
> > -		die("failed to parse CONTENT_LENGTH: %s", str);
> > +	if (!str) {
> > +		/*
> > +		 * RFC3875 says this must mean "no body", but in practice we
> > +		 * receive chunked encodings with no CONTENT_LENGTH. Tell the
> > +		 * caller to read until EOF.
> > +		 */
> > +		val = -1;
> > +	} else if (!*str) {
> > +		/*
> > +		 * An empty length should be treated as "no body" according to
> > +		 * RFC3875, and this seems to hold in practice.
> > +		 */
> > +		val = 0;
> 
> Are there example callers that this version fixes?  Where can I read
> more, or what can I run to experience it?
> 
> For example, v2.19.0-rc0~45^2~2 (http-backend: respect CONTENT_LENGTH
> as specified by rfc3875, 2018-06-10) mentions IIS/Windows; does IIS
> make use of this distinction?

So this code is what I recommended based on my reading of the RFC, and
based on my understanding of the Debian bug. But I admit I'm confused.

I thought the complaint was that this:

  CONTENT_LENGTH= git http-backend

was reading a body, when it shouldn't be. And so setting it to 0 here
made sense.

But that couldn't have been what older versions were doing, since they
never looked at CONTENT_LENGTH at all, and instead always read to EOF.
So presumably the original problem wasn't that we tried to read a body,
but that the empty string caused git_parse_ssize_t to report failure,
and we called die(). Which probably should be explained by 574c513e8d
(http-backend: allow empty CONTENT_LENGTH, 2018-09-07), but it's too
late for that.

So after that patch, we really do have the original behavior, and that's
enough for v2.19.

But the remaining question then is: what should clients expect on an
empty variable? We know what the RFC says, and we know what dulwich
expected, but I'm not sure we have real world cases beyond that. So it
might actually make sense to punt until we see one, though I don't mind
doing what the rfc says in the meantime. And then the explanation in the
commit message would be "do what the rfc says", and any test probably
ought to be feeding a non-empty empty and confirming that we don't read
it.

-Peff
Junio C Hamano Sept. 10, 2018, 4:37 p.m. UTC | #3
Jeff King <peff@peff.net> writes:

> But that couldn't have been what older versions were doing, since they
> never looked at CONTENT_LENGTH at all, and instead always read to EOF.
> So presumably the original problem wasn't that we tried to read a body,
> but that the empty string caused git_parse_ssize_t to report failure,
> and we called die(). Which probably should be explained by 574c513e8d
> (http-backend: allow empty CONTENT_LENGTH, 2018-09-07), but it's too
> late for that.
>
> So after that patch, we really do have the original behavior, and that's
> enough for v2.19.

To recap to make sure I am following it correctly:

 - pay attention to content-length when it is clearly given with a
   byte count, which is an improvement over v2.18

 - mimick what we have been doing until now when content-length is
   missing or set to an empty string, so we are regression free and
   bug-to-bug compatible relative to v2.18 in these two cases.

> But the remaining question then is: what should clients expect on an
> empty variable? We know what the RFC says, and we know what dulwich
> expected, but I'm not sure we have real world cases beyond that. So it
> might actually make sense to punt until we see one, though I don't mind
> doing what the rfc says in the meantime. And then the explanation in the
> commit message would be "do what the rfc says", and any test probably
> ought to be feeding a non-empty empty and confirming that we don't read
> it.

The RFC is pretty clear that no data is signaled by "NULL (or
unset)", meaning an empty string value and missing variable both
mean the same "no message body", but it further says that the
servers MUST set CONTENT_LENGTH if and only if there is a
message-body, which contradicts with itself (if you adhered to 'if
and only if', in no case you would set it to NULL).

Googling "cgi chunked encoding" seems to give us tons of hits to
show that people are puzzled, just like us, that the scripts would
not get to see Chunked (as the server is supposed to deChunk to
count content-length before calling the backend).  So I agree "do
what the rfc says" is a good thing to try early in the next cycle.
Jeff King Sept. 10, 2018, 6:46 p.m. UTC | #4
On Mon, Sep 10, 2018 at 09:37:20AM -0700, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > But that couldn't have been what older versions were doing, since they
> > never looked at CONTENT_LENGTH at all, and instead always read to EOF.
> > So presumably the original problem wasn't that we tried to read a body,
> > but that the empty string caused git_parse_ssize_t to report failure,
> > and we called die(). Which probably should be explained by 574c513e8d
> > (http-backend: allow empty CONTENT_LENGTH, 2018-09-07), but it's too
> > late for that.
> >
> > So after that patch, we really do have the original behavior, and that's
> > enough for v2.19.
> 
> To recap to make sure I am following it correctly:
> 
>  - pay attention to content-length when it is clearly given with a
>    byte count, which is an improvement over v2.18
> 
>  - mimick what we have been doing until now when content-length is
>    missing or set to an empty string, so we are regression free and
>    bug-to-bug compatible relative to v2.18 in these two cases.

That (and what you wrote below) matches my current understanding, too.
Though I did already admit to being confused. ;)

-Peff

Patch
diff mbox series

diff --git a/http-backend.c b/http-backend.c
index e88d29f62b..949821b46f 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -353,8 +353,28 @@  static ssize_t get_content_length(void)
 	ssize_t val = -1;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && !git_parse_ssize_t(str, &val))
-		die("failed to parse CONTENT_LENGTH: %s", str);
+	if (!str) {
+		/*
+		 * RFC3875 says this must mean "no body", but in practice we
+		 * receive chunked encodings with no CONTENT_LENGTH. Tell the
+		 * caller to read until EOF.
+		 */
+		val = -1;
+	} else if (!*str) {
+		/*
+		 * An empty length should be treated as "no body" according to
+		 * RFC3875, and this seems to hold in practice.
+		 */
+		val = 0;
+	} else {
+		/*
+		 * We have a non-empty CONTENT_LENGTH; trust what's in it as long
+		 * as it can be parsed.
+		 */
+		if (!git_parse_ssize_t(str, &val))
+			die("failed to parse CONTENT_LENGTH: '%s'", str);
+	}
+
 	return val;
 }
 
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 057dcb85d6..b28c3c4765 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -152,4 +152,15 @@  test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 	grep "fatal:.*CONTENT_LENGTH" err
 '
 
+test_expect_success 'empty CONTENT_LENGTH' '
+	env \
+		QUERY_STRING="/repo.git/info/refs?service=git-receive-pack" \
+		PATH_TRANSLATED="$PWD"/.git/info/refs \
+		GIT_HTTP_EXPORT_ALL=TRUE \
+		REQUEST_METHOD=GET \
+		CONTENT_LENGTH="" \
+		git http-backend <empty_body >act.out 2>act.err &&
+	verify_http_result "200 OK"
+'
+
 test_done