diff mbox series

[v7,11/12] http: read HTTP WWW-Authenticate response headers

Message ID 5f5e46038cf526714f3c5b89ffef2b895b503242.1674252531.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Commit d3ffed27a11ed83135c70a22614d697e966a6967
Headers show
Series Enhance credential helper protocol to include auth headers | expand

Commit Message

Matthew John Cheetham Jan. 20, 2023, 10:08 p.m. UTC
From: Matthew John Cheetham <mjcheetham@outlook.com>

Read and store the HTTP WWW-Authenticate response headers made for
a particular request.

This will allow us to pass important authentication challenge
information to credential helpers or others that would otherwise have
been lost.

According to RFC2616 Section 4.2 [1], header field names are not
case-sensitive meaning when collecting multiple values for the same
field name, we can just use the case of the first observed instance of
each field name and no normalisation is required.

libcurl only provides us with the ability to read all headers recieved
for a particular request, including any intermediate redirect requests
or proxies. The lines returned by libcurl include HTTP status lines
delinating any intermediate requests such as "HTTP/1.1 200". We use
these lines to reset the strvec of WWW-Authenticate header values as
we encounter them in order to only capture the final response headers.

The collection of all header values matching the WWW-Authenticate
header is complicated by the fact that it is legal for header fields to
be continued over multiple lines, but libcurl only gives us one line at
a time.

In the future [2] we may be able to leverage functions to read headers
from libcurl itself, but as of today we must do this ourselves.

[1] https://datatracker.ietf.org/doc/html/rfc2616#section-4.2
[2] https://daniel.haxx.se/blog/2022/03/22/a-headers-api-for-libcurl/

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
---
 credential.c |  1 +
 credential.h | 15 +++++++++
 http.c       | 94 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 110 insertions(+)

Comments

Jeff King Jan. 26, 2023, 10:31 a.m. UTC | #1
On Fri, Jan 20, 2023 at 10:08:49PM +0000, Matthew John Cheetham via GitGitGadget wrote:

> From: Matthew John Cheetham <mjcheetham@outlook.com>
> 
> Read and store the HTTP WWW-Authenticate response headers made for
> a particular request.
> 
> This will allow us to pass important authentication challenge
> information to credential helpers or others that would otherwise have
> been lost.

Makes sense, and the code looks pretty reasonable overall.

A few observations:

> @@ -115,6 +116,19 @@ struct credential {
>  	 */
>  	struct string_list helpers;
>  
> +	/**
> +	 * A `strvec` of WWW-Authenticate header values. Each string
> +	 * is the value of a WWW-Authenticate header in an HTTP response,
> +	 * in the order they were received in the response.
> +	 */
> +	struct strvec wwwauth_headers;
> +
> +	/**
> +	 * Internal use only. Used to keep track of split header fields
> +	 * in order to fold multiple lines into one value.
> +	 */
> +	unsigned header_is_last_match:1;
> +

Stuffing this into a "struct credential" feels a little weird, just
because it's specific to http parsing (especially this internal flag).
And the credential code is seeing full header lines, not broken down at
all.

I guess I would have expected some level of abstraction here between the
credential subsystem and the http subsystem, where the latter is parsing
and then sticking opaque data into the credential to ferry to the
helpers.

But it probably isn't that big a deal either way. Even though there are
non-http credentials, it's not too unreasonable for the credential
system to be aware of http specifically.

> +static size_t fwrite_wwwauth(char *ptr, size_t eltsize, size_t nmemb, void *p)
> +{
> +	size_t size = st_mult(eltsize, nmemb);

Here's that st_mult() again. Same comment as the previous patch. :)

> +	/*
> +	 * Header lines may not come NULL-terminated from libcurl so we must
> +	 * limit all scans to the maximum length of the header line, or leverage
> +	 * strbufs for all operations.
> +	 *
> +	 * In addition, it is possible that header values can be split over
> +	 * multiple lines as per RFC 2616 (even though this has since been
> +	 * deprecated in RFC 7230). A continuation header field value is
> +	 * identified as starting with a space or horizontal tab.
> +	 *
> +	 * The formal definition of a header field as given in RFC 2616 is:
> +	 *
> +	 *   message-header = field-name ":" [ field-value ]
> +	 *   field-name     = token
> +	 *   field-value    = *( field-content | LWS )
> +	 *   field-content  = <the OCTETs making up the field-value
> +	 *                    and consisting of either *TEXT or combinations
> +	 *                    of token, separators, and quoted-string>
> +	 */
> +
> +	strbuf_add(&buf, ptr, size);

OK, so we just copy the buffer. I don't think it would be too hard to
handle the buffer as-is, but this does make things a bit easier.  Given
that we're going to immediately throw away the copy for anything except
www-authenticate, we could perhaps wait until we've matched it.  That
does mean trimming the CRLF ourselves and using skip_prefix_mem() to
match the start (you'd want skip_iprefix_mem(), of course, but it
doesn't yet exist; I'll leave that as an exercise).

Maybe not worth it to save a few allocations, as an http request is
already pretty heavyweight. Mostly I flagged it because this is going to
run for every header of every request, even though most requests won't
trigger it at all.

> +	/* Strip the CRLF that should be present at the end of each field */
> +	strbuf_trim_trailing_newline(&buf);
> +
> +	/* Start of a new WWW-Authenticate header */
> +	if (skip_iprefix(buf.buf, "www-authenticate:", &val)) {
> +		while (isspace(*val))
> +			val++;
> +
> +		strvec_push(values, val);
> +		http_auth.header_is_last_match = 1;
> +		goto exit;
> +	}

OK, this looks correct from my knowledge of the RFCs. I saw something
about isspace() matching newlines, etc, in an earlier thread, but I
think we'd never see a newline here, as we're expecting curl to be
splitting on our behalf.

> +	/*
> +	 * This line could be a continuation of the previously matched header
> +	 * field. If this is the case then we should append this value to the
> +	 * end of the previously consumed value.
> +	 * Continuation lines start with at least one whitespace, maybe more,
> +	 * so we should collapse these down to a single SP (valid per the spec).
> +	 */
> +	if (http_auth.header_is_last_match && isspace(*buf.buf)) {
> +		/* Trim leading whitespace from this continuation hdr line. */
> +		strbuf_ltrim(&buf);

OK, makes sense. This will memmove(), which is needlessly inefficient
(we could just advance a pointer), but probably not a big deal in
practice. Using the strbuf functions is a nice simplification.

> +		/*
> +		 * At this point we should always have at least one existing
> +		 * value, even if it is empty. Do not bother appending the new
> +		 * value if this continuation header is itself empty.
> +		 */
> +		if (!values->nr) {
> +			BUG("should have at least one existing header value");
> +		} else if (buf.len) {
> +			char *prev = xstrdup(values->v[values->nr - 1]);
> +
> +			/* Join two non-empty values with a single space. */
> +			const char *const sp = *prev ? " " : "";
> +
> +			strvec_pop(values);
> +			strvec_pushf(values, "%s%s%s", prev, sp, buf.buf);
> +			free(prev);
> +		}

Likewise here we end up with an extra allocation of "prev", just because
we can't pop/push in the right order. But that's probably OK in
practice, as this is triggering only for the header we care about.

The concatenation itself makes the whole thing quadratic, but unless we
are worried about a malicious server DoS-ing us with a billion
www-authenticate continuations, I think we can disregard that.

-Peff
Matthew John Cheetham Feb. 6, 2023, 7:25 p.m. UTC | #2
On 2023-01-26 02:31, Jeff King wrote:

> On Fri, Jan 20, 2023 at 10:08:49PM +0000, Matthew John Cheetham via GitGitGadget wrote:
> 
>> From: Matthew John Cheetham <mjcheetham@outlook.com>
>>
>> Read and store the HTTP WWW-Authenticate response headers made for
>> a particular request.
>>
>> This will allow us to pass important authentication challenge
>> information to credential helpers or others that would otherwise have
>> been lost.
> 
> Makes sense, and the code looks pretty reasonable overall.
> 
> A few observations:
> 
>> @@ -115,6 +116,19 @@ struct credential {
>>  	 */
>>  	struct string_list helpers;
>>  
>> +	/**
>> +	 * A `strvec` of WWW-Authenticate header values. Each string
>> +	 * is the value of a WWW-Authenticate header in an HTTP response,
>> +	 * in the order they were received in the response.
>> +	 */
>> +	struct strvec wwwauth_headers;
>> +
>> +	/**
>> +	 * Internal use only. Used to keep track of split header fields
>> +	 * in order to fold multiple lines into one value.
>> +	 */
>> +	unsigned header_is_last_match:1;
>> +
> 
> Stuffing this into a "struct credential" feels a little weird, just
> because it's specific to http parsing (especially this internal flag).
> And the credential code is seeing full header lines, not broken down at
> all.
> 
> I guess I would have expected some level of abstraction here between the
> credential subsystem and the http subsystem, where the latter is parsing
> and then sticking opaque data into the credential to ferry to the
> helpers.
> 
> But it probably isn't that big a deal either way. Even though there are
> non-http credentials, it's not too unreasonable for the credential
> system to be aware of http specifically.

I had considered possibly introducing an opaque property-bag style of
'protocol-specific properties' that, for example, http.c would add the
WWW-Authenticate headers to as something like `http.wwwauth[]`.
Other protocols (like smtp:// or cert://) could add their own properties
if they needed or wanted to also.

Thoughts?

>> +static size_t fwrite_wwwauth(char *ptr, size_t eltsize, size_t nmemb, void *p)
>> +{
>> +	size_t size = st_mult(eltsize, nmemb);
> 
> Here's that st_mult() again. Same comment as the previous patch. :)

Yeah I'm gonna drop this. Your arguments make sense; it's not going to be a
problem in reality :-)

>> +	/*
>> +	 * Header lines may not come NULL-terminated from libcurl so we must
>> +	 * limit all scans to the maximum length of the header line, or leverage
>> +	 * strbufs for all operations.
>> +	 *
>> +	 * In addition, it is possible that header values can be split over
>> +	 * multiple lines as per RFC 2616 (even though this has since been
>> +	 * deprecated in RFC 7230). A continuation header field value is
>> +	 * identified as starting with a space or horizontal tab.
>> +	 *
>> +	 * The formal definition of a header field as given in RFC 2616 is:
>> +	 *
>> +	 *   message-header = field-name ":" [ field-value ]
>> +	 *   field-name     = token
>> +	 *   field-value    = *( field-content | LWS )
>> +	 *   field-content  = <the OCTETs making up the field-value
>> +	 *                    and consisting of either *TEXT or combinations
>> +	 *                    of token, separators, and quoted-string>
>> +	 */
>> +
>> +	strbuf_add(&buf, ptr, size);
> 
> OK, so we just copy the buffer. I don't think it would be too hard to
> handle the buffer as-is, but this does make things a bit easier.  Given
> that we're going to immediately throw away the copy for anything except
> www-authenticate, we could perhaps wait until we've matched it.  That
> does mean trimming the CRLF ourselves and using skip_prefix_mem() to
> match the start (you'd want skip_iprefix_mem(), of course, but it
> doesn't yet exist; I'll leave that as an exercise).

Fair point! I can replace most of these with operations over the curl ptr.

> Maybe not worth it to save a few allocations, as an http request is
> already pretty heavyweight. Mostly I flagged it because this is going to
> run for every header of every request, even though most requests won't
> trigger it at all.
> 
>> +	/* Strip the CRLF that should be present at the end of each field */
>> +	strbuf_trim_trailing_newline(&buf);
>> +
>> +	/* Start of a new WWW-Authenticate header */
>> +	if (skip_iprefix(buf.buf, "www-authenticate:", &val)) {
>> +		while (isspace(*val))
>> +			val++;
>> +
>> +		strvec_push(values, val);
>> +		http_auth.header_is_last_match = 1;
>> +		goto exit;
>> +	}
> 
> OK, this looks correct from my knowledge of the RFCs. I saw something
> about isspace() matching newlines, etc, in an earlier thread, but I
> think we'd never see a newline here, as we're expecting curl to be
> splitting on our behalf.
> 
>> +	/*
>> +	 * This line could be a continuation of the previously matched header
>> +	 * field. If this is the case then we should append this value to the
>> +	 * end of the previously consumed value.
>> +	 * Continuation lines start with at least one whitespace, maybe more,
>> +	 * so we should collapse these down to a single SP (valid per the spec).
>> +	 */
>> +	if (http_auth.header_is_last_match && isspace(*buf.buf)) {
>> +		/* Trim leading whitespace from this continuation hdr line. */
>> +		strbuf_ltrim(&buf);
> 
> OK, makes sense. This will memmove(), which is needlessly inefficient
> (we could just advance a pointer), but probably not a big deal in
> practice. Using the strbuf functions is a nice simplification.
> 
>> +		/*
>> +		 * At this point we should always have at least one existing
>> +		 * value, even if it is empty. Do not bother appending the new
>> +		 * value if this continuation header is itself empty.
>> +		 */
>> +		if (!values->nr) {
>> +			BUG("should have at least one existing header value");
>> +		} else if (buf.len) {
>> +			char *prev = xstrdup(values->v[values->nr - 1]);
>> +
>> +			/* Join two non-empty values with a single space. */
>> +			const char *const sp = *prev ? " " : "";
>> +
>> +			strvec_pop(values);
>> +			strvec_pushf(values, "%s%s%s", prev, sp, buf.buf);
>> +			free(prev);
>> +		}
> 
> Likewise here we end up with an extra allocation of "prev", just because
> we can't pop/push in the right order. But that's probably OK in
> practice, as this is triggering only for the header we care about.
> 
> The concatenation itself makes the whole thing quadratic, but unless we
> are worried about a malicious server DoS-ing us with a billion
> www-authenticate continuations, I think we can disregard that.
> 
> -Peff
Jeff King Feb. 9, 2023, 1:12 p.m. UTC | #3
On Mon, Feb 06, 2023 at 11:25:49AM -0800, Matthew John Cheetham wrote:

> > I guess I would have expected some level of abstraction here between the
> > credential subsystem and the http subsystem, where the latter is parsing
> > and then sticking opaque data into the credential to ferry to the
> > helpers.
> > 
> > But it probably isn't that big a deal either way. Even though there are
> > non-http credentials, it's not too unreasonable for the credential
> > system to be aware of http specifically.
> 
> I had considered possibly introducing an opaque property-bag style of
> 'protocol-specific properties' that, for example, http.c would add the
> WWW-Authenticate headers to as something like `http.wwwauth[]`.
> Other protocols (like smtp:// or cert://) could add their own properties
> if they needed or wanted to also.
> 
> Thoughts?

At the protocol level, I don't see much point. wwwauth sufficiently
implies "http", and any helper is free to ignore or respect keys as
appropriate to what it can handle. A flat namespace is fine.

Here I was more talking about the internal implementation. Mostly it was
just funky that this internal http state flag was stuck into the
credential struct. I think it could be removed with some minor pain, but
it's probably not too big a deal (the pain at all is only because we are
having to bring this state across multiple curl callbacks).

So let's go with it for now.

-Peff
diff mbox series

Patch

diff --git a/credential.c b/credential.c
index f6389a50684..897b4679333 100644
--- a/credential.c
+++ b/credential.c
@@ -22,6 +22,7 @@  void credential_clear(struct credential *c)
 	free(c->username);
 	free(c->password);
 	string_list_clear(&c->helpers, 0);
+	strvec_clear(&c->wwwauth_headers);
 
 	credential_init(c);
 }
diff --git a/credential.h b/credential.h
index f430e77fea4..6f2e5bc610b 100644
--- a/credential.h
+++ b/credential.h
@@ -2,6 +2,7 @@ 
 #define CREDENTIAL_H
 
 #include "string-list.h"
+#include "strvec.h"
 
 /**
  * The credentials API provides an abstracted way of gathering username and
@@ -115,6 +116,19 @@  struct credential {
 	 */
 	struct string_list helpers;
 
+	/**
+	 * A `strvec` of WWW-Authenticate header values. Each string
+	 * is the value of a WWW-Authenticate header in an HTTP response,
+	 * in the order they were received in the response.
+	 */
+	struct strvec wwwauth_headers;
+
+	/**
+	 * Internal use only. Used to keep track of split header fields
+	 * in order to fold multiple lines into one value.
+	 */
+	unsigned header_is_last_match:1;
+
 	unsigned approved:1,
 		 configured:1,
 		 quit:1,
@@ -130,6 +144,7 @@  struct credential {
 
 #define CREDENTIAL_INIT { \
 	.helpers = STRING_LIST_INIT_DUP, \
+	.wwwauth_headers = STRVEC_INIT, \
 }
 
 /* Initialize a credential structure, setting all fields to empty. */
diff --git a/http.c b/http.c
index a2a80318bb2..595c93bc7a3 100644
--- a/http.c
+++ b/http.c
@@ -183,6 +183,98 @@  size_t fwrite_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_)
 	return nmemb;
 }
 
+static size_t fwrite_wwwauth(char *ptr, size_t eltsize, size_t nmemb, void *p)
+{
+	size_t size = st_mult(eltsize, nmemb);
+	struct strvec *values = &http_auth.wwwauth_headers;
+	struct strbuf buf = STRBUF_INIT;
+	const char *val;
+
+	/*
+	 * Header lines may not come NULL-terminated from libcurl so we must
+	 * limit all scans to the maximum length of the header line, or leverage
+	 * strbufs for all operations.
+	 *
+	 * In addition, it is possible that header values can be split over
+	 * multiple lines as per RFC 2616 (even though this has since been
+	 * deprecated in RFC 7230). A continuation header field value is
+	 * identified as starting with a space or horizontal tab.
+	 *
+	 * The formal definition of a header field as given in RFC 2616 is:
+	 *
+	 *   message-header = field-name ":" [ field-value ]
+	 *   field-name     = token
+	 *   field-value    = *( field-content | LWS )
+	 *   field-content  = <the OCTETs making up the field-value
+	 *                    and consisting of either *TEXT or combinations
+	 *                    of token, separators, and quoted-string>
+	 */
+
+	strbuf_add(&buf, ptr, size);
+
+	/* Strip the CRLF that should be present at the end of each field */
+	strbuf_trim_trailing_newline(&buf);
+
+	/* Start of a new WWW-Authenticate header */
+	if (skip_iprefix(buf.buf, "www-authenticate:", &val)) {
+		while (isspace(*val))
+			val++;
+
+		strvec_push(values, val);
+		http_auth.header_is_last_match = 1;
+		goto exit;
+	}
+
+	/*
+	 * This line could be a continuation of the previously matched header
+	 * field. If this is the case then we should append this value to the
+	 * end of the previously consumed value.
+	 * Continuation lines start with at least one whitespace, maybe more,
+	 * so we should collapse these down to a single SP (valid per the spec).
+	 */
+	if (http_auth.header_is_last_match && isspace(*buf.buf)) {
+		/* Trim leading whitespace from this continuation hdr line. */
+		strbuf_ltrim(&buf);
+
+		/*
+		 * At this point we should always have at least one existing
+		 * value, even if it is empty. Do not bother appending the new
+		 * value if this continuation header is itself empty.
+		 */
+		if (!values->nr) {
+			BUG("should have at least one existing header value");
+		} else if (buf.len) {
+			char *prev = xstrdup(values->v[values->nr - 1]);
+
+			/* Join two non-empty values with a single space. */
+			const char *const sp = *prev ? " " : "";
+
+			strvec_pop(values);
+			strvec_pushf(values, "%s%s%s", prev, sp, buf.buf);
+			free(prev);
+		}
+
+		goto exit;
+	}
+
+	/* This is the start of a new header we don't care about */
+	http_auth.header_is_last_match = 0;
+
+	/*
+	 * If this is a HTTP status line and not a header field, this signals
+	 * a different HTTP response. libcurl writes all the output of all
+	 * response headers of all responses, including redirects.
+	 * We only care about the last HTTP request response's headers so clear
+	 * the existing array.
+	 */
+	if (istarts_with(buf.buf, "http/"))
+		strvec_clear(values);
+
+exit:
+	strbuf_release(&buf);
+	return size;
+}
+
 size_t fwrite_null(char *ptr, size_t eltsize, size_t nmemb, void *strbuf)
 {
 	return nmemb;
@@ -1864,6 +1956,8 @@  static int http_request(const char *url,
 					 fwrite_buffer);
 	}
 
+	curl_easy_setopt(slot->curl, CURLOPT_HEADERFUNCTION, fwrite_wwwauth);
+
 	accept_language = http_get_accept_language_header();
 
 	if (accept_language)