diff mbox series

[1/2] http: reset POSTFIELDSIZE when clearing curl handle

Message ID 20240402200517.GA875182@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 324231174247c70e66908b78924908fbfcebed19
Headers show
Series git+curl 8.7.0 workaround | expand

Commit Message

Jeff King April 2, 2024, 8:05 p.m. UTC
In get_active_slot(), we return a CURL handle that may have been used
before (reusing them is good because it lets curl reuse the same
connection across many requests). We set a few curl options back to
defaults that may have been modified by previous requests.

We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
defaults to "-1"). This usually doesn't matter because most POSTs will
set both fields together anyway. But there is one exception: when
handling a large request in remote-curl's post_rpc(), we don't set
_either_, and instead set a READFUNCTION to stream data into libcurl.

This can interact weirdly with a stale POSTFIELDSIZE setting, because
curl will assume it should read only some set number of bytes from our
READFUNCTION. However, it has worked in practice because we also
manually set a "Transfer-Encoding: chunked" header, which libcurl uses
as a clue to set the POSTFIELDSIZE to -1 itself.

So everything works, but we're better off resetting the size manually
for a few reasons:

  - there was a regression in curl 8.7.0 where the chunked header
    detection didn't kick in, causing any large HTTP requests made by
    Git to fail. This has since been fixed (but not yet released). In
    the issue, curl folks recommended setting it explicitly to -1:

      https://github.com/curl/curl/issues/13229#issuecomment-2029826058

    and it indeed works around the regression. So even though it won't
    be strictly necessary after the fix there, this will help folks who
    end up using the affected libcurl versions.

  - it's consistent with what a new curl handle would look like. Since
    get_active_slot() may or may not return a used handle, this reduces
    the possibility of heisenbugs that only appear with certain request
    patterns.

Note that the recommendation in the curl issue is to actually drop the
manual Transfer-Encoding header. Modern libcurl will add the header
itself when streaming from a READFUNCTION. However, that code wasn't
added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
support back to 7.19.5, so those older versions still need the manual
header.

Signed-off-by: Jeff King <peff@peff.net>
---
 http.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Junio C Hamano April 2, 2024, 8:27 p.m. UTC | #1
Jeff King <peff@peff.net> writes:

> In get_active_slot(), we return a CURL handle that may have been used
> before (reusing them is good because it lets curl reuse the same
> connection across many requests). We set a few curl options back to
> defaults that may have been modified by previous requests.
>
> We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
> defaults to "-1"). This usually doesn't matter because most POSTs will
> set both fields together anyway. But there is one exception: when
> handling a large request in remote-curl's post_rpc(), we don't set
> _either_, and instead set a READFUNCTION to stream data into libcurl.
>
> This can interact weirdly with a stale POSTFIELDSIZE setting, because
> curl will assume it should read only some set number of bytes from our
> READFUNCTION. However, it has worked in practice because we also
> manually set a "Transfer-Encoding: chunked" header, which libcurl uses
> as a clue to set the POSTFIELDSIZE to -1 itself.
>
> So everything works, but we're better off resetting the size manually
> for a few reasons:
>
>   - there was a regression in curl 8.7.0 where the chunked header
>     detection didn't kick in, causing any large HTTP requests made by
>     Git to fail. This has since been fixed (but not yet released). In
>     the issue, curl folks recommended setting it explicitly to -1:
>
>       https://github.com/curl/curl/issues/13229#issuecomment-2029826058
>
>     and it indeed works around the regression. So even though it won't
>     be strictly necessary after the fix there, this will help folks who
>     end up using the affected libcurl versions.
>
>   - it's consistent with what a new curl handle would look like. Since
>     get_active_slot() may or may not return a used handle, this reduces
>     the possibility of heisenbugs that only appear with certain request
>     patterns.
>
> Note that the recommendation in the curl issue is to actually drop the
> manual Transfer-Encoding header. Modern libcurl will add the header
> itself when streaming from a READFUNCTION. However, that code wasn't
> added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
> if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
> support back to 7.19.5, so those older versions still need the manual
> header.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  http.c | 1 +
>  1 file changed, 1 insertion(+)

As always, well articulated.  Thanks.  Will queue.

> diff --git a/http.c b/http.c
> index e73b136e58..3d80bd6116 100644
> --- a/http.c
> +++ b/http.c
> @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void)
>  	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> +	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L);
>  	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
>  	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
>  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
Jeff King April 3, 2024, 3:20 a.m. UTC | #2
On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote:

> So everything works, but we're better off resetting the size manually
> for a few reasons:
> 
>   - there was a regression in curl 8.7.0 where the chunked header
>     detection didn't kick in, causing any large HTTP requests made by
>     Git to fail. This has since been fixed (but not yet released). In
>     the issue, curl folks recommended setting it explicitly to -1:
> 
>       https://github.com/curl/curl/issues/13229#issuecomment-2029826058
> 
>     and it indeed works around the regression. So even though it won't
>     be strictly necessary after the fix there, this will help folks who
>     end up using the affected libcurl versions.

Hmph. This isn't quite enough to make things work with 8.7.0, because
there are two things wrong there:

  1. curl uses the leftover POSTFIELDSIZE, which this patch fixes. So
     HTTP/1.1 is fine (including t5551).

  2. In HTTP/2 mode, it sends chunked data, even though HTTP/2 does not
     use "Transfer-Encoding: chunked" (it correctly does not send the
     header, but it puts unnecessary chunk framing around the data).

     t5559 (which covers HTTP/2) fails, and so does accessing HTTP/2
     capable hosts like github.com (though score another win for t5559;
     it has introduced some hassles, but it diagnosed a real problem we
     would not have otherwise seen in the test suite).

This second problem can be fixed by eliminating the manual
transfer-encoding header, which is what's confusing curl's HTTP/2 code.
But as I said earlier...

> Note that the recommendation in the curl issue is to actually drop the
> manual Transfer-Encoding header. Modern libcurl will add the header
> itself when streaming from a READFUNCTION. However, that code wasn't
> added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
> if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
> support back to 7.19.5, so those older versions still need the manual
> header.

...we can't do so unconditionally. So we'd need something like:

diff --git a/remote-curl.c b/remote-curl.c
index 31b02b8840..215bcc6e10 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -955,7 +955,9 @@ static int post_rpc(struct rpc_state *rpc, int stateless_connect, int flush_rece
 		/* The request body is large and the size cannot be predicted.
 		 * We must use chunked encoding to send it.
 		 */
+#if LIBCURL_VERSION_NUM < 0x074200
 		headers = curl_slist_append(headers, "Transfer-Encoding: chunked");
+#endif
 		rpc->initial_buffer = 1;
 		curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, rpc_out);
 		curl_easy_setopt(slot->curl, CURLOPT_INFILE, rpc);

I was hoping not to have to carry any version-specific cruft (especially
since newer versions of curl should fix the regression). But it's not
too bad, and it actually marks the code as something that we can ditch
in the future when that version of curl becomes obsolete.

I can try to prepare a cleaner patch for it tomorrow, but comments
welcome in the meantime.

-Peff
Patrick Steinhardt April 3, 2024, 6:30 a.m. UTC | #3
On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote:
> In get_active_slot(), we return a CURL handle that may have been used
> before (reusing them is good because it lets curl reuse the same
> connection across many requests). We set a few curl options back to
> defaults that may have been modified by previous requests.
> 
> We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
> defaults to "-1"). This usually doesn't matter because most POSTs will
> set both fields together anyway. But there is one exception: when
> handling a large request in remote-curl's post_rpc(), we don't set
> _either_, and instead set a READFUNCTION to stream data into libcurl.
> 
> This can interact weirdly with a stale POSTFIELDSIZE setting, because
> curl will assume it should read only some set number of bytes from our
> READFUNCTION. However, it has worked in practice because we also
> manually set a "Transfer-Encoding: chunked" header, which libcurl uses
> as a clue to set the POSTFIELDSIZE to -1 itself.
> 
> So everything works, but we're better off resetting the size manually
> for a few reasons:
> 
>   - there was a regression in curl 8.7.0 where the chunked header
>     detection didn't kick in, causing any large HTTP requests made by
>     Git to fail. This has since been fixed (but not yet released). In
>     the issue, curl folks recommended setting it explicitly to -1:
> 
>       https://github.com/curl/curl/issues/13229#issuecomment-2029826058
> 
>     and it indeed works around the regression. So even though it won't
>     be strictly necessary after the fix there, this will help folks who
>     end up using the affected libcurl versions.
> 
>   - it's consistent with what a new curl handle would look like. Since
>     get_active_slot() may or may not return a used handle, this reduces
>     the possibility of heisenbugs that only appear with certain request
>     patterns.
> 
> Note that the recommendation in the curl issue is to actually drop the
> manual Transfer-Encoding header. Modern libcurl will add the header
> itself when streaming from a READFUNCTION. However, that code wasn't
> added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
> if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
> support back to 7.19.5, so those older versions still need the manual
> header.
> 
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  http.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/http.c b/http.c
> index e73b136e58..3d80bd6116 100644
> --- a/http.c
> +++ b/http.c
> @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void)
>  	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
>  	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> +	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L);
>  	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
>  	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
>  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);

Can't we refactor this code to instead use `curl_easy_reset()`? That
function already resets most of the data we want to reset and would also
end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So
wouldn't the following be a more sensible fix?

diff --git a/http.c b/http.c
index e73b136e58..e5f5bc23db 100644
--- a/http.c
+++ b/http.c
@@ -1442,20 +1442,14 @@ struct active_request_slot *get_active_slot(void)
 	slot->finished = NULL;
 	slot->callback_data = NULL;
 	slot->callback_func = NULL;
+	curl_easy_reset(slot->curl);
 	curl_easy_setopt(slot->curl, CURLOPT_COOKIEFILE, curl_cookie_file);
 	if (curl_save_cookies)
 		curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file);
 	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header);
 	curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions);
 	curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr);
-	curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL);
-	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
-	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
-	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
-	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
-	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
 	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
-	curl_easy_setopt(slot->curl, CURLOPT_RANGE, NULL);
 
 	/*
 	 * Default following to off unless "ALWAYS" is configured; this gives

Patrick
Patrick Steinhardt April 3, 2024, 6:34 a.m. UTC | #4
On Wed, Apr 03, 2024 at 08:30:54AM +0200, Patrick Steinhardt wrote:
> On Tue, Apr 02, 2024 at 04:05:17PM -0400, Jeff King wrote:
> > In get_active_slot(), we return a CURL handle that may have been used
> > before (reusing them is good because it lets curl reuse the same
> > connection across many requests). We set a few curl options back to
> > defaults that may have been modified by previous requests.
> > 
> > We reset POSTFIELDS to NULL, but do not reset POSTFIELDSIZE (which
> > defaults to "-1"). This usually doesn't matter because most POSTs will
> > set both fields together anyway. But there is one exception: when
> > handling a large request in remote-curl's post_rpc(), we don't set
> > _either_, and instead set a READFUNCTION to stream data into libcurl.
> > 
> > This can interact weirdly with a stale POSTFIELDSIZE setting, because
> > curl will assume it should read only some set number of bytes from our
> > READFUNCTION. However, it has worked in practice because we also
> > manually set a "Transfer-Encoding: chunked" header, which libcurl uses
> > as a clue to set the POSTFIELDSIZE to -1 itself.
> > 
> > So everything works, but we're better off resetting the size manually
> > for a few reasons:
> > 
> >   - there was a regression in curl 8.7.0 where the chunked header
> >     detection didn't kick in, causing any large HTTP requests made by
> >     Git to fail. This has since been fixed (but not yet released). In
> >     the issue, curl folks recommended setting it explicitly to -1:
> > 
> >       https://github.com/curl/curl/issues/13229#issuecomment-2029826058
> > 
> >     and it indeed works around the regression. So even though it won't
> >     be strictly necessary after the fix there, this will help folks who
> >     end up using the affected libcurl versions.
> > 
> >   - it's consistent with what a new curl handle would look like. Since
> >     get_active_slot() may or may not return a used handle, this reduces
> >     the possibility of heisenbugs that only appear with certain request
> >     patterns.
> > 
> > Note that the recommendation in the curl issue is to actually drop the
> > manual Transfer-Encoding header. Modern libcurl will add the header
> > itself when streaming from a READFUNCTION. However, that code wasn't
> > added until 802aa5ae2 (HTTP: use chunked Transfer-Encoding for HTTP_POST
> > if size unknown, 2019-07-22), which is in curl 7.66.0. We claim to
> > support back to 7.19.5, so those older versions still need the manual
> > header.
> > 
> > Signed-off-by: Jeff King <peff@peff.net>
> > ---
> >  http.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/http.c b/http.c
> > index e73b136e58..3d80bd6116 100644
> > --- a/http.c
> > +++ b/http.c
> > @@ -1452,6 +1452,7 @@ struct active_request_slot *get_active_slot(void)
> >  	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
> >  	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
> >  	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> > +	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L);
> >  	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
> >  	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
> >  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
> 
> Can't we refactor this code to instead use `curl_easy_reset()`? That
> function already resets most of the data we want to reset and would also
> end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So
> wouldn't the following be a more sensible fix?
> 
> diff --git a/http.c b/http.c
> index e73b136e58..e5f5bc23db 100644
> --- a/http.c
> +++ b/http.c
> @@ -1442,20 +1442,14 @@ struct active_request_slot *get_active_slot(void)
>  	slot->finished = NULL;
>  	slot->callback_data = NULL;
>  	slot->callback_func = NULL;
> +	curl_easy_reset(slot->curl);
>  	curl_easy_setopt(slot->curl, CURLOPT_COOKIEFILE, curl_cookie_file);
>  	if (curl_save_cookies)
>  		curl_easy_setopt(slot->curl, CURLOPT_COOKIEJAR, curl_cookie_file);
>  	curl_easy_setopt(slot->curl, CURLOPT_HTTPHEADER, pragma_header);
>  	curl_easy_setopt(slot->curl, CURLOPT_RESOLVE, host_resolutions);
>  	curl_easy_setopt(slot->curl, CURLOPT_ERRORBUFFER, curl_errorstr);
> -	curl_easy_setopt(slot->curl, CURLOPT_CUSTOMREQUEST, NULL);
> -	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
> -	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
> -	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
> -	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
> -	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
>  	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);
> -	curl_easy_setopt(slot->curl, CURLOPT_RANGE, NULL);
>  
>  	/*
>  	 * Default following to off unless "ALWAYS" is configured; this gives

Oh well, the answer is "no", or at least not as easily as this, as the
failing tests tell us. I guess it resets more data than we actually want
it to reset, but I didn't dig any deeper than that.

Patrick
Jeff King April 3, 2024, 8:18 p.m. UTC | #5
On Wed, Apr 03, 2024 at 08:34:37AM +0200, Patrick Steinhardt wrote:

> > Can't we refactor this code to instead use `curl_easy_reset()`? That
> > function already resets most of the data we want to reset and would also
> > end up setting `POSFIELDSIZE = -1` via `Curl_init_userdefined()`. So
> > wouldn't the following be a more sensible fix?
> [...]
> Oh well, the answer is "no", or at least not as easily as this, as the
> failing tests tell us. I guess it resets more data than we actually want
> it to reset, but I didn't dig any deeper than that.

Yeah. The curl setup is really in two parts:

  1. we make a handle in get_curl_handle(), which also sets up a bunch
     of options. We use that to make a single "curl_default" handle, and
     then when we want a new handle we curl_easy_duphandle() that

  2. when we want to make a request we call get_active_slot(), which
     will either return an already-used handle or duphandle() a new one.
     And then reset some options, but also do some more setup.

Your patch touches spot (2), so it's erasing all of the setup done in
(1). I don't think there's a way to say "go back to state when we called
duphandle(), but keep reusing connections, etc".

Possibly it could be solved by pushing all of the setup from (1) into
(2). Though that would probably require separating out some config
handling, etc, from the actual curl_easy_setopt() calls (we wouldn't
want to complain about invalid config for _every_ request, for example,
just once per program run).

This is all also a bit more complicated than it needs to be for
smart-http. The dumb-http code may want to have several handles going at
once to do individual object/pack downloads. Whereas for smart http, we
really are only working with one handle, and doing one request at a
time. I don't know if we'll ever drop dumb-http support, but there would
probably be a lot of cleanup possibilities if we did.

-Peff
diff mbox series

Patch

diff --git a/http.c b/http.c
index e73b136e58..3d80bd6116 100644
--- a/http.c
+++ b/http.c
@@ -1452,6 +1452,7 @@  struct active_request_slot *get_active_slot(void)
 	curl_easy_setopt(slot->curl, CURLOPT_READFUNCTION, NULL);
 	curl_easy_setopt(slot->curl, CURLOPT_WRITEFUNCTION, NULL);
 	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDS, NULL);
+	curl_easy_setopt(slot->curl, CURLOPT_POSTFIELDSIZE, -1L);
 	curl_easy_setopt(slot->curl, CURLOPT_UPLOAD, 0);
 	curl_easy_setopt(slot->curl, CURLOPT_HTTPGET, 1);
 	curl_easy_setopt(slot->curl, CURLOPT_FAILONERROR, 1);