http-backend: allow empty CONTENT_LENGTH
diff mbox series

Message ID 20180906193516.28909-1-max@max630.net
State Superseded
Headers show
Series
  • http-backend: allow empty CONTENT_LENGTH
Related show

Commit Message

Max Kirillov Sept. 6, 2018, 7:35 p.m. UTC
According to RFC3875, empty environment variable is equivalent to unset,
and for CONTENT_LENGTH it should mean zero body to read.

However, as discussed in [1], unset CONTENT_LENGTH is also used for
chunked encoding to indicate reading until EOF, so keep this behavior also
for empty CONTENT_LENGTH.

Add a test for the case.

[1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/

Signed-off-by: Max Kirillov <max@max630.net>
---
Hi.

This should fix it. I'm not sure should it treat it as 0 or "-1"
At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
So keep the existing behavior as much as possible
 http-backend.c                         |  2 +-
 t/t5562-http-backend-content-length.sh | 11 +++++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)

Comments

Junio C Hamano Sept. 6, 2018, 9:54 p.m. UTC | #1
Max Kirillov <max@max630.net> writes:

> According to RFC3875, empty environment variable is equivalent to unset,
> and for CONTENT_LENGTH it should mean zero body to read.
>
> However, as discussed in [1], unset CONTENT_LENGTH is also used for
> chunked encoding to indicate reading until EOF, so keep this behavior also
> for empty CONTENT_LENGTH.

Makes sense.

>
> Add a test for the case.
>
> [1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/
>
> Signed-off-by: Max Kirillov <max@max630.net>
> ---
> Hi.
>
> This should fix it. I'm not sure should it treat it as 0 or "-1"
> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
> So keep the existing behavior as much as possible

I am not sure what you mean by the above, between 0 and -1.  The
code signals the caller of get_content_length() that req_len is -1
which is used as a sign to read through to the EOF, so it appears to
me that the code treats missing content-length (i.e. str == NULL
case) as "-1".


>  http-backend.c                         |  2 +-
>  t/t5562-http-backend-content-length.sh | 11 +++++++++++
>  2 files changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/http-backend.c b/http-backend.c
> index e88d29f62b..a1230d7ead 100644
> --- a/http-backend.c
> +++ b/http-backend.c
> @@ -353,7 +353,7 @@ static ssize_t get_content_length(void)
>  	ssize_t val = -1;
>  	const char *str = getenv("CONTENT_LENGTH");
>  
> -	if (str && !git_parse_ssize_t(str, &val))
> +	if (str && *str && !git_parse_ssize_t(str, &val))
>  		die("failed to parse CONTENT_LENGTH: %s", str);
>  	return val;
>  }
> diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
> index 057dcb85d6..ca34c2f054 100755
> --- a/t/t5562-http-backend-content-length.sh
> +++ b/t/t5562-http-backend-content-length.sh
> @@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
>  	grep "fatal:.*CONTENT_LENGTH" err
>  '
>  
> +test_expect_success 'empty CONTENT_LENGTH' '
> +	env \
> +		QUERY_STRING=/repo.git/HEAD \
> +		PATH_TRANSLATED="$PWD"/.git/HEAD \
> +		GIT_HTTP_EXPORT_ALL=TRUE \
> +		REQUEST_METHOD=GET \
> +		CONTENT_LENGTH="" \
> +		git http-backend <empty_body >act.out 2>act.err &&
> +	verify_http_result "200 OK"
> +'
> +
>  test_done
Jonathan Nieder Sept. 6, 2018, 10:45 p.m. UTC | #2
Hi,

Max Kirillov wrote:

> According to RFC3875, empty environment variable is equivalent to unset,
> and for CONTENT_LENGTH it should mean zero body to read.
>
> However, as discussed in [1], unset CONTENT_LENGTH is also used for
> chunked encoding to indicate reading until EOF, so keep this behavior also
> for empty CONTENT_LENGTH.
>
> Add a test for the case.
>
> [1] https://public-inbox.org/git/20160329201349.GB9527@sigill.intra.peff.net/
>
> Signed-off-by: Max Kirillov <max@max630.net>

Reported-by: Jelmer Vernooń≥ <jelmer@jelmer.uk>

Thanks for fixing it.

Can you include a summary of [1] instead of relying on the mailing
list archive?  Perhaps just omiting "as discussed in [1]" would do the
trick.  Alternatively, if there's a point from that discussion that's
relevant to the change, please include it here.  That way, people
finding this change later can save some time by avoiding having to dig
through that mailing list thread.

For example, it's probably worth mentioning that this was discovered
using dulwich's test suite.

[...]
> This should fix it. I'm not sure should it treat it as 0 or "-1"
> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
> So keep the existing behavior as much as possible

That sounds worth figuring out so we can understand and possibly
document it better.  What are the ramifications of this choice ---
what would work / not work with each choice?

[...]
> --- a/t/t5562-http-backend-content-length.sh
> +++ b/t/t5562-http-backend-content-length.sh
> @@ -152,4 +152,15 @@ test_expect_success 'CONTENT_LENGTH overflow ssite_t' '

Yay, thanks for this as well.

Sincerely,
Jonathan
Max Kirillov Sept. 7, 2018, 3:27 a.m. UTC | #3
On Thu, Sep 06, 2018 at 02:54:18PM -0700, Junio C Hamano wrote:
> Max Kirillov <max@max630.net> writes:
>> This should fix it. I'm not sure should it treat it as 0 or "-1"
>> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
>> So keep the existing behavior as much as possible
> 
> I am not sure what you mean by the above, between 0 and -1.  The
> code signals the caller of get_content_length() that req_len is -1
> which is used as a sign to read through to the EOF, so it appears to
> me that the code treats missing content-length (i.e. str == NULL
> case) as "-1".

I made a mistake in this, it should be "if I try to treat missing
CONTENT_LENGTH as 0". This, as far as I understand, what the
RFC specifies.

That is, after the following change, the test "large fetch-pack
requests can be split across POSTs" from t5551 starts faliing:

-- >8 --
@@ -353,8 +353,12 @@ static ssize_t get_content_length(void)
        ssize_t val = -1;
        const char *str = getenv("CONTENT_LENGTH");
 
-       if (str && *str && !git_parse_ssize_t(str, &val))
-               die("failed to parse CONTENT_LENGTH: %s", str);
+       if (str && *str) {
+               if (!git_parse_ssize_t(str, &val))
+                       die("failed to parse CONTENT_LENGTH: %s", str);
+       } else
+               val = 0;
+
        return val;
 }
Jeff King Sept. 7, 2018, 3:38 a.m. UTC | #4
On Fri, Sep 07, 2018 at 06:27:40AM +0300, Max Kirillov wrote:

> On Thu, Sep 06, 2018 at 02:54:18PM -0700, Junio C Hamano wrote:
> > Max Kirillov <max@max630.net> writes:
> >> This should fix it. I'm not sure should it treat it as 0 or "-1"
> >> At least the tests mentioned by Jeff fails if I try to treat missing CONTENT_LENGTH as "-1"
> >> So keep the existing behavior as much as possible
> > 
> > I am not sure what you mean by the above, between 0 and -1.  The
> > code signals the caller of get_content_length() that req_len is -1
> > which is used as a sign to read through to the EOF, so it appears to
> > me that the code treats missing content-length (i.e. str == NULL
> > case) as "-1".
> 
> I made a mistake in this, it should be "if I try to treat missing
> CONTENT_LENGTH as 0". This, as far as I understand, what the
> RFC specifies.
> 
> That is, after the following change, the test "large fetch-pack
> requests can be split across POSTs" from t5551 starts faliing:
> 
> -- >8 --
> @@ -353,8 +353,12 @@ static ssize_t get_content_length(void)
>         ssize_t val = -1;
>         const char *str = getenv("CONTENT_LENGTH");
>  
> -       if (str && *str && !git_parse_ssize_t(str, &val))
> -               die("failed to parse CONTENT_LENGTH: %s", str);
> +       if (str && *str) {
> +               if (!git_parse_ssize_t(str, &val))
> +                       die("failed to parse CONTENT_LENGTH: %s", str);
> +       } else
> +               val = 0;
> +

Right, I'm pretty sure it is a problem if you treat a missing
CONTENT_LENGTH as "present, but zero". Because chunked encodings from
apache really do want us to read until EOF.

My understanding from Jelmer's report is that a present-but-empty
variable should be counted as "0" to mean "do not read any body bytes".
That matches my reading of RFC 3875, which says:

  If no data is attached, then NULL (or unset).

(and earlier they explicitly define NULL as the empty string). That
said, we do not do what they say for the "unset" case. And cannot
without breaking chunked encoding from apache. So I don't know how much
we want to follow that rfc to the letter, but at least it makes sense to
me to revert this case back to what Git used to do, and what the rfc
says.

In other words, I think the logic we want is:

  if (!str) {
	/*
	 * RFC3875 says this must mean "no body", but in practice we
	 * receive chunked encodings with no CONTENT_LENGTH. Tell the
	 * caller to read until EOF.
	 */
	val = -1;
  } else if (!*str) {
	/*
	 * An empty length should be treated as "no body" according to
	 * RFC3875, and this seems to hold in practice.
	 */
	val = 0;
  } else {
	/*
	 * We have a CONTENT_LENGTH; trust what's in it as long as it
	 * can be parsed.
	 */
	if (!git_parse_ssize_t(str, &val))
	        die(...);
  }

-Peff
Max Kirillov Sept. 7, 2018, 4:20 a.m. UTC | #5
On Thu, Sep 06, 2018 at 11:38:31PM -0400, Jeff King wrote:
> My understanding from Jelmer's report is that a present-but-empty
> variable should be counted as "0" to mean "do not read any body bytes".
> That matches my reading of RFC 3875, which says:
> 
>   If no data is attached, then NULL (or unset).
> 
> (and earlier they explicitly define NULL as the empty string). That
> said, we do not do what they say for the "unset" case. And cannot
> without breaking chunked encoding from apache. So I don't know how much
> we want to follow that rfc to the letter, but at least it makes sense to
> me to revert this case back to what Git used to do, and what the rfc
> says.

I could find this discussion about it:
https://lists.gt.net/apache/users/373042

Basically, it says the CGI RFC was written before chunked
encoding appeared, so implementations should choose between
caching all boody before calling script, or breaking the
spec some way. So apache does it so.

(I wonder how IIS would handle it)

> In other words, I think the logic we want is:
> 
>   if (!str) {
> 	/*
> 	 * RFC3875 says this must mean "no body", but in practice we
> 	 * receive chunked encodings with no CONTENT_LENGTH. Tell the
> 	 * caller to read until EOF.
> 	 */
> 	val = -1;
>   } else if (!*str) {
> 	/*
> 	 * An empty length should be treated as "no body" according to
> 	 * RFC3875, and this seems to hold in practice.
> 	 */
> 	val = 0;
>   } else {
> 	/*
> 	 * We have a CONTENT_LENGTH; trust what's in it as long as it
> 	 * can be parsed.
> 	 */
> 	if (!git_parse_ssize_t(str, &val))
> 	        die(...);
>   }

I feel reluctant to treat empty and unset differently, but
probably this is the only thing which could be done.

I'll resumbmit some time later.
Max Kirillov Sept. 7, 2018, 4:59 a.m. UTC | #6
Actually, another reason for the latest issue was that CONTENT_LENGTH
is parsed for GET requests at all. It should be parsed only for POST
requests, or, rather, only for upoad-pack and receive-pack requests.
Junio C Hamano Sept. 7, 2018, 9:49 a.m. UTC | #7
Max Kirillov <max@max630.net> writes:

> Actually, another reason for the latest issue was that CONTENT_LENGTH
> is parsed for GET requests at all. It should be parsed only for POST
> requests, or, rather, only for upoad-pack and receive-pack requests.

Not really.  The layered design of the HTTP protocol means that any
request type can have non-empty body, but request types for which
no semantics of the body is defined must ignore what is in the body,
which in turn means we need to parse and pay attention to the
content length etc. to find the end of the body, if only to ignore
it.

In any case, hopefully we can fix this before the final, as this is
a regression introduced during this cycle?
Max Kirillov Sept. 8, 2018, 5:41 a.m. UTC | #8
On Fri, Sep 07, 2018 at 02:49:23AM -0700, Junio C Hamano wrote:
> Max Kirillov <max@max630.net> writes:
> 
>> Actually, another reason for the latest issue was that CONTENT_LENGTH
>> is parsed for GET requests at all. It should be parsed only for POST
>> requests, or, rather, only for upoad-pack and receive-pack requests.
> 
> Not really.  The layered design of the HTTP protocol means that any
> request type can have non-empty body, but request types for which
> no semantics of the body is defined must ignore what is in the body,
> which in turn means we need to parse and pay attention to the
> content length etc. to find the end of the body, if only to ignore
> it.

I don't think it is git's job to police web server implementations,
especially considering that there is a gap between letter of RFC and
actual behavior.  Anyway, it only runs the check for "*/info/refs" GET
request, which ends up in get_info_refs(). Other GET requests do not
check CONTENT_LENGTH. Also, the version of service which is started from
get_info_refs() do not consume input (I think, actually, the
"--stateless-rpc" argument is not needed there).

> In any case, hopefully we can fix this before the final, as this is
> a regression introduced during this cycle?

Yes, I'm working on it.
Max Kirillov Sept. 9, 2018, 4:40 a.m. UTC | #9
On Fri, Sep 07, 2018 at 02:49:23AM -0700, Junio C Hamano wrote:
> In any case, hopefully we can fix this before the final, as this is
> a regression introduced during this cycle?

I think I am going to stop at the v4. Unless there are some
corrections requested.

Patch
diff mbox series

diff --git a/http-backend.c b/http-backend.c
index e88d29f62b..a1230d7ead 100644
--- a/http-backend.c
+++ b/http-backend.c
@@ -353,7 +353,7 @@  static ssize_t get_content_length(void)
 	ssize_t val = -1;
 	const char *str = getenv("CONTENT_LENGTH");
 
-	if (str && !git_parse_ssize_t(str, &val))
+	if (str && *str && !git_parse_ssize_t(str, &val))
 		die("failed to parse CONTENT_LENGTH: %s", str);
 	return val;
 }
diff --git a/t/t5562-http-backend-content-length.sh b/t/t5562-http-backend-content-length.sh
index 057dcb85d6..ca34c2f054 100755
--- a/t/t5562-http-backend-content-length.sh
+++ b/t/t5562-http-backend-content-length.sh
@@ -152,4 +152,15 @@  test_expect_success 'CONTENT_LENGTH overflow ssite_t' '
 	grep "fatal:.*CONTENT_LENGTH" err
 '
 
+test_expect_success 'empty CONTENT_LENGTH' '
+	env \
+		QUERY_STRING=/repo.git/HEAD \
+		PATH_TRANSLATED="$PWD"/.git/HEAD \
+		GIT_HTTP_EXPORT_ALL=TRUE \
+		REQUEST_METHOD=GET \
+		CONTENT_LENGTH="" \
+		git http-backend <empty_body >act.out 2>act.err &&
+	verify_http_result "200 OK"
+'
+
 test_done