[v9,2/3] http: read HTTP WWW-Authenticate response headers

Message ID	f3f28e508c1792cbc8f8d3bd56099c659743ed3e.1676496846.git.gitgitgadget@gmail.com (mailing list archive)
State	Superseded
Headers	show Return-Path: <git-owner@vger.kernel.org> Message-Id: <f3f28e508c1792cbc8f8d3bd56099c659743ed3e.1676496846.git.gitgitgadget@gmail.com> In-Reply-To: <pull.1352.v9.git.1676496846.gitgitgadget@gmail.com> References: <pull.1352.v8.git.1675711789.gitgitgadget@gmail.com> <pull.1352.v9.git.1676496846.gitgitgadget@gmail.com> Date: Wed, 15 Feb 2023 21:34:05 +0000 Subject: [PATCH v9 2/3] http: read HTTP WWW-Authenticate response headers Fcc: Sent Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MIME-Version: 1.0 To: git@vger.kernel.org Cc: Derrick Stolee <derrickstolee@github.com>, Lessley Dennington <lessleydennington@gmail.com>, Matthew John Cheetham <mjcheetham@outlook.com>, M Hickford <mirth.hickford@gmail.com>, Jeff Hostetler <git@jeffhostetler.com>, Glen Choo <chooglen@google.com>, Victoria Dye <vdye@github.com>, =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason <avarab@gmail.com>, Jeff King <peff@peff.net>, Johannes Schindelin <Johannes.Schindelin@gmx.de>, Matthew John Cheetham <mjcheetham@outlook.com>, Matthew John Cheetham <mjcheetham@outlook.com> Precedence: bulk From: Matthew John Cheetham <mjcheetham@outlook.com>
Series	Enhance credential helper protocol to include auth headers \| expand [v9,0/3] Enhance credential helper protocol to include auth headers [v9,1/3] t5563: add tests for basic and anoymous HTTP access [v9,2/3] http: read HTTP WWW-Authenticate response headers [v9,3/3] credential: add WWW-Authenticate header to cred requests

Message ID

f3f28e508c1792cbc8f8d3bd56099c659743ed3e.1676496846.git.gitgitgadget@gmail.com (mailing list archive)

State

Superseded

Headers

Message-Id: 
 <f3f28e508c1792cbc8f8d3bd56099c659743ed3e.1676496846.git.gitgitgadget@gmail.com>
In-Reply-To: <pull.1352.v9.git.1676496846.gitgitgadget@gmail.com>
References: <pull.1352.v8.git.1675711789.gitgitgadget@gmail.com>
        <pull.1352.v9.git.1676496846.gitgitgadget@gmail.com>
Date: Wed, 15 Feb 2023 21:34:05 +0000
Subject: [PATCH v9 2/3] http: read HTTP WWW-Authenticate response headers
Fcc: Sent
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
MIME-Version: 1.0
To: git@vger.kernel.org
Cc: Derrick Stolee <derrickstolee@github.com>,
 Lessley Dennington <lessleydennington@gmail.com>,
 Matthew John Cheetham <mjcheetham@outlook.com>,
 M Hickford <mirth.hickford@gmail.com>,
 Jeff Hostetler <git@jeffhostetler.com>, Glen Choo <chooglen@google.com>,
 Victoria Dye <vdye@github.com>,
 =?utf-8?b?w4Z2YXIgQXJuZmrDtnLDsA==?= Bjarmason  <avarab@gmail.com>,
 Jeff King <peff@peff.net>, Johannes Schindelin <Johannes.Schindelin@gmx.de>,
 Matthew John Cheetham <mjcheetham@outlook.com>,
 Matthew John Cheetham <mjcheetham@outlook.com>
Precedence: bulk
From: Matthew John Cheetham <mjcheetham@outlook.com>

Series

Enhance credential helper protocol to include auth headers | expand

Commit Message

Matthew John Cheetham Feb. 15, 2023, 9:34 p.m. UTC

From: Matthew John Cheetham <mjcheetham@outlook.com>

Read and store the HTTP WWW-Authenticate response headers made for
a particular request.

This will allow us to pass important authentication challenge
information to credential helpers or others that would otherwise have
been lost.

According to RFC2616 Section 4.2 [1], header field names are not
case-sensitive meaning when collecting multiple values for the same
field name, we can just use the case of the first observed instance of
each field name and no normalisation is required.

libcurl only provides us with the ability to read all headers recieved
for a particular request, including any intermediate redirect requests
or proxies. The lines returned by libcurl include HTTP status lines
delinating any intermediate requests such as "HTTP/1.1 200". We use
these lines to reset the strvec of WWW-Authenticate header values as
we encounter them in order to only capture the final response headers.

The collection of all header values matching the WWW-Authenticate
header is complicated by the fact that it is legal for header fields to
be continued over multiple lines, but libcurl only gives us one line at
a time.

In the future [2] we may be able to leverage functions to read headers
from libcurl itself, but as of today we must do this ourselves.

[1] https://datatracker.ietf.org/doc/html/rfc2616#section-4.2
[2] https://daniel.haxx.se/blog/2022/03/22/a-headers-api-for-libcurl/

Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
---
 credential.c      |   1 +
 credential.h      |  15 ++++++
 git-compat-util.h |  23 +++++++++
 http.c            | 120 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 159 insertions(+)

Comments

Junio C Hamano Feb. 15, 2023, 11:26 p.m. UTC | #1

"Matthew John Cheetham via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> According to RFC2616 Section 4.2 [1], header field names are not
> case-sensitive meaning when collecting multiple values for the same
> field name, we can just use the case of the first observed instance of
> each field name and no normalisation is required.

If the names are not case-sensitive, you can choose to first
downcase the names you see, and use that consistently, and the
result would still be valid.  IOW, "not case-sensitive" does not at
all mean you have to use the first observed instance without
normalization.  You are allowed to choose such an implementation,
but "not case-sensitive" is not a justification to choose such an
implementation among possible implementation that would be allowed
under the rule.

> The collection of all header values matching the WWW-Authenticate
> header is complicated by the fact that it is legal for header fields to
> be continued over multiple lines, but libcurl only gives us one line at
> a time.

Saying "one physical line" at a time may make it clear what you are
pointing out as a weak point in the interface libcURL gives us (I
think you are getting at "if they handled header folding for us and
fed us one logical line at a time, it would have been nicer").

> @@ -22,6 +22,7 @@ void credential_clear(struct credential *c)
>  	free(c->username);
>  	free(c->password);
>  	string_list_clear(&c->helpers, 0);
> +	strvec_clear(&c->wwwauth_headers);
>  
>  	credential_init(c);
>  }
> diff --git a/credential.h b/credential.h
> index f430e77fea4..6f2e5bc610b 100644
> --- a/credential.h
> +++ b/credential.h
> @@ -2,6 +2,7 @@
>  #define CREDENTIAL_H
>  
>  #include "string-list.h"
> +#include "strvec.h"
>  
>  /**
>   * The credentials API provides an abstracted way of gathering username and
> @@ -115,6 +116,19 @@ struct credential {
>  	 */
>  	struct string_list helpers;
>  
> +	/**
> +	 * A `strvec` of WWW-Authenticate header values. Each string
> +	 * is the value of a WWW-Authenticate header in an HTTP response,
> +	 * in the order they were received in the response.
> +	 */
> +	struct strvec wwwauth_headers;
> +
> +	/**
> +	 * Internal use only. Used to keep track of split header fields

The technical term for what you call "split header" here seems to be
"line folding" (RFC 7230, which deprecates it).

> +	 * in order to fold multiple lines into one value.
> +	 */
> +	unsigned header_is_last_match:1;
> +
>  	unsigned approved:1,
>  		 configured:1,
>  		 quit:1,
> @@ -130,6 +144,7 @@ struct credential {
>  
>  #define CREDENTIAL_INIT { \
>  	.helpers = STRING_LIST_INIT_DUP, \
> +	.wwwauth_headers = STRVEC_INIT, \
>  }
>  
>  /* Initialize a credential structure, setting all fields to empty. */
> diff --git a/git-compat-util.h b/git-compat-util.h
> index a76d0526f79..a59230564e8 100644
> --- a/git-compat-util.h
> +++ b/git-compat-util.h
> @@ -1266,6 +1266,29 @@ static inline int skip_iprefix(const char *str, const char *prefix,
>  	return 0;
>  }
>  
> +/*
> + * Like skip_prefix_mem, but compare case-insensitively. Note that the
> + * comparison is done via tolower(), so it is strictly ASCII (no multi-byte
> + * characters or locale-specific conversions).
> + */
> +static inline int skip_iprefix_mem(const char *buf, size_t len,
> +				   const char *prefix,
> +				   const char **out, size_t *outlen)
> +{
> +	size_t prefix_len = strlen(prefix);
> +
> +	if (len < prefix_len)
> +		return 0;
> +
> +	if (!strncasecmp(buf, prefix, prefix_len)) {
> +		*out = buf + prefix_len;
> +		*outlen = len - prefix_len;
> +		return 1;
> +	}
> +
> +	return 0;
> +}

OK.

> diff --git a/http.c b/http.c
> index 8a5ba3f4776..7a56a3db5f7 100644
> --- a/http.c
> +++ b/http.c
> @@ -183,6 +183,124 @@ size_t fwrite_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_)
>  	return nmemb;
>  }
>  
> +/*
> + * A folded header continuation line starts with at least one single whitespace
> + * character. It is not a continuation line if the line is *just* a newline.
> + * The RFC for HTTP states that CRLF is the header field line ending, but some
> + * servers may use LF only; we accept both.
> + */

Nice.

> +static inline int is_hdr_continuation(const char *ptr, const size_t size)
> +{
> +	/* totally empty line or normal header */
> +	if (!size || !isspace(*ptr))
> +		return 0;

obs-fold (RFC7230) begins the next line with SP or HTAB, but
isspace() allows not just SP and HT but also CR and LF.  So
this is a bit pessimistic but rejects what is not a folded
continuation line reliably.

> +	/* empty line with LF line ending */
> +	if (size == 1 && ptr[0] == '\n')
> +		return 0;

And this is a blank line after the headers, with LF (not conforming
but is OK).

> +	/* empty line with CRLF line ending */
> +	if (size == 2 && ptr[0] == '\r' && ptr[1] == '\n')
> +		return 0;

And this is another form of a blank line after the headers, with
CRLF.

> +	return 1;
> +}

After rejecting the above two "blank", it is a folded continuation
line.  OK.

I wonder if

	static inline int ... () {
	  	return (size && (*ptr == ' ' || *ptr == '\t'));
	}

sufficient and easier to grok, though.

> +static size_t fwrite_wwwauth(char *ptr, size_t eltsize, size_t nmemb, void *p)
> +{
> +	size_t size = eltsize * nmemb;
> +	struct strvec *values = &http_auth.wwwauth_headers;
> +	struct strbuf buf = STRBUF_INIT;
> +	const char *val;
> +	size_t val_len;
> +
> +	/*
> +	 * Header lines may not come NULL-terminated from libcurl so we must
> +	 * limit all scans to the maximum length of the header line, or leverage
> +	 * strbufs for all operations.
> +	 *
> +	 * In addition, it is possible that header values can be split over
> +	 * multiple lines as per RFC 2616 (even though this has since been
> +	 * deprecated in RFC 7230). A continuation header field value is
> +	 * identified as starting with a space or horizontal tab.
> +	 *
> +	 * The formal definition of a header field as given in RFC 2616 is:
> +	 *
> +	 *   message-header = field-name ":" [ field-value ]
> +	 *   field-name     = token
> +	 *   field-value    = *( field-content | LWS )
> +	 *   field-content  = <the OCTETs making up the field-value
> +	 *                    and consisting of either *TEXT or combinations
> +	 *                    of token, separators, and quoted-string>
> +	 */
> +
> +	/* Start of a new WWW-Authenticate header */
> +	if (skip_iprefix_mem(ptr, size, "www-authenticate:", &val, &val_len)) {
> +		strbuf_add(&buf, val, val_len);
> +
> +		/*
> +		 * Strip the CRLF that should be present at the end of each
> +		 * field as well as any trailing or leading whitespace from the
> +		 * value.
> +		 */
> +		strbuf_trim(&buf);
> +
> +		strvec_push(values, buf.buf);
> +		http_auth.header_is_last_match = 1;
> +		goto exit;

OK.  We remember that we have seen the beginning of a header we are
interested in (so that we can append if it is a continuation we see
next).  Good.

> +	}
> +
> +	/*
> +	 * This line could be a continuation of the previously matched header
> +	 * field. If this is the case then we should append this value to the
> +	 * end of the previously consumed value.
> +	 */
> +	if (http_auth.header_is_last_match && is_hdr_continuation(ptr, size)) {
> +		/*
> +		 * Trim the CRLF and any leading or trailing from this line.
> +		 */
> +		strbuf_add(&buf, ptr, size);
> +		strbuf_trim(&buf);
> +
> +		/*
> +		 * At this point we should always have at least one existing
> +		 * value, even if it is empty. Do not bother appending the new
> +		 * value if this continuation header is itself empty.
> +		 */
> +		if (!values->nr) {
> +			BUG("should have at least one existing header value");

OK, we should set _is_last_match to true only after we recorded the
header that might see a continuation, so it would be a bug if we
didn't have anything there.  Good.

> +		} else if (buf.len) {
> +			char *prev = xstrdup(values->v[values->nr - 1]);
> +
> +			/* Join two non-empty values with a single space. */
> +			const char *const sp = *prev ? " " : "";
> +
> +			strvec_pop(values);
> +			strvec_pushf(values, "%s%s%s", prev, sp, buf.buf);
> +			free(prev);
> +		}
> +
> +		goto exit;

Good that we are prepared to see a logical line split over more than
two lines (i.e. by not toggling _is_last_match off prematurely here).

> +	}
> +
> +	/* This is the start of a new header we don't care about */
> +	http_auth.header_is_last_match = 0;

Or what we just saw and ignored could be a continuation line of a
header we ignored.  The comment is slightly misleading.

Other than that, looking good.

Thanks.

Matthew John Cheetham Feb. 16, 2023, 10:29 p.m. UTC | #2

On 2023-02-15 15:26, Junio C Hamano wrote:

> "Matthew John Cheetham via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
> 
>> According to RFC2616 Section 4.2 [1], header field names are not
>> case-sensitive meaning when collecting multiple values for the same
>> field name, we can just use the case of the first observed instance of
>> each field name and no normalisation is required.
> 
> If the names are not case-sensitive, you can choose to first
> downcase the names you see, and use that consistently, and the
> result would still be valid.  IOW, "not case-sensitive" does not at
> all mean you have to use the first observed instance without
> normalization.  You are allowed to choose such an implementation,
> but "not case-sensitive" is not a justification to choose such an
> implementation among possible implementation that would be allowed
> under the rule.

Re-reading this paragraph, it doens't really need to even be here. This was
an artefact of a time when I was storing all headers, including keys and
values. Since we're only interested now in the WWW-Authenticate header
_values_, there's no need to call out this out. Will drop this paragraph.

>> The collection of all header values matching the WWW-Authenticate
>> header is complicated by the fact that it is legal for header fields to
>> be continued over multiple lines, but libcurl only gives us one line at
>> a time.
> 
> Saying "one physical line" at a time may make it clear what you are
> pointing out as a weak point in the interface libcURL gives us (I
> think you are getting at "if they handled header folding for us and
> fed us one logical line at a time, it would have been nicer").

Logical header fields vs physical header lines is useful and clearer
terminology - I will update the commit message to reflect in the next
iteration. Thanks!

>> @@ -22,6 +22,7 @@ void credential_clear(struct credential *c)
>>  	free(c->username);
>>  	free(c->password);
>>  	string_list_clear(&c->helpers, 0);
>> +	strvec_clear(&c->wwwauth_headers);
>>  
>>  	credential_init(c);
>>  }
>> diff --git a/credential.h b/credential.h
>> index f430e77fea4..6f2e5bc610b 100644
>> --- a/credential.h
>> +++ b/credential.h
>> @@ -2,6 +2,7 @@
>>  #define CREDENTIAL_H
>>  
>>  #include "string-list.h"
>> +#include "strvec.h"
>>  
>>  /**
>>   * The credentials API provides an abstracted way of gathering username and
>> @@ -115,6 +116,19 @@ struct credential {
>>  	 */
>>  	struct string_list helpers;
>>  
>> +	/**
>> +	 * A `strvec` of WWW-Authenticate header values. Each string
>> +	 * is the value of a WWW-Authenticate header in an HTTP response,
>> +	 * in the order they were received in the response.
>> +	 */
>> +	struct strvec wwwauth_headers;
>> +
>> +	/**
>> +	 * Internal use only. Used to keep track of split header fields
> 
> The technical term for what you call "split header" here seems to be
> "line folding" (RFC 7230, which deprecates it).
> 
>> +	 * in order to fold multiple lines into one value.
>> +	 */
>> +	unsigned header_is_last_match:1;
>> +
>>  	unsigned approved:1,
>>  		 configured:1,
>>  		 quit:1,
>> @@ -130,6 +144,7 @@ struct credential {
>>  
>>  #define CREDENTIAL_INIT { \
>>  	.helpers = STRING_LIST_INIT_DUP, \
>> +	.wwwauth_headers = STRVEC_INIT, \
>>  }
>>  
>>  /* Initialize a credential structure, setting all fields to empty. */
>> diff --git a/git-compat-util.h b/git-compat-util.h
>> index a76d0526f79..a59230564e8 100644
>> --- a/git-compat-util.h
>> +++ b/git-compat-util.h
>> @@ -1266,6 +1266,29 @@ static inline int skip_iprefix(const char *str, const char *prefix,
>>  	return 0;
>>  }
>>  
>> +/*
>> + * Like skip_prefix_mem, but compare case-insensitively. Note that the
>> + * comparison is done via tolower(), so it is strictly ASCII (no multi-byte
>> + * characters or locale-specific conversions).
>> + */
>> +static inline int skip_iprefix_mem(const char *buf, size_t len,
>> +				   const char *prefix,
>> +				   const char **out, size_t *outlen)
>> +{
>> +	size_t prefix_len = strlen(prefix);
>> +
>> +	if (len < prefix_len)
>> +		return 0;
>> +
>> +	if (!strncasecmp(buf, prefix, prefix_len)) {
>> +		*out = buf + prefix_len;
>> +		*outlen = len - prefix_len;
>> +		return 1;
>> +	}
>> +
>> +	return 0;
>> +}
> 
> OK.
> 
>> diff --git a/http.c b/http.c
>> index 8a5ba3f4776..7a56a3db5f7 100644
>> --- a/http.c
>> +++ b/http.c
>> @@ -183,6 +183,124 @@ size_t fwrite_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_)
>>  	return nmemb;
>>  }
>>  
>> +/*
>> + * A folded header continuation line starts with at least one single whitespace
>> + * character. It is not a continuation line if the line is *just* a newline.
>> + * The RFC for HTTP states that CRLF is the header field line ending, but some
>> + * servers may use LF only; we accept both.
>> + */
> 
> Nice.
> 
>> +static inline int is_hdr_continuation(const char *ptr, const size_t size)
>> +{
>> +	/* totally empty line or normal header */
>> +	if (!size || !isspace(*ptr))
>> +		return 0;
> 
> obs-fold (RFC7230) begins the next line with SP or HTAB, but
> isspace() allows not just SP and HT but also CR and LF.  So
> this is a bit pessimistic but rejects what is not a folded
> continuation line reliably.
> 
>> +	/* empty line with LF line ending */
>> +	if (size == 1 && ptr[0] == '\n')
>> +		return 0;
> 
> And this is a blank line after the headers, with LF (not conforming
> but is OK).
> 
>> +	/* empty line with CRLF line ending */
>> +	if (size == 2 && ptr[0] == '\r' && ptr[1] == '\n')
>> +		return 0;
> 
> And this is another form of a blank line after the headers, with
> CRLF.
> 
>> +	return 1;
>> +}
> 
> After rejecting the above two "blank", it is a folded continuation
> line.  OK.
> 
> I wonder if
> 
> 	static inline int ... () {
> 	  	return (size && (*ptr == ' ' || *ptr == '\t'));
> 	}
> 
> sufficient and easier to grok, though.

You're correct. This implementation is 'more correct' and easier to grok.

>> +static size_t fwrite_wwwauth(char *ptr, size_t eltsize, size_t nmemb, void *p)
>> +{
>> +	size_t size = eltsize * nmemb;
>> +	struct strvec *values = &http_auth.wwwauth_headers;
>> +	struct strbuf buf = STRBUF_INIT;
>> +	const char *val;
>> +	size_t val_len;
>> +
>> +	/*
>> +	 * Header lines may not come NULL-terminated from libcurl so we must
>> +	 * limit all scans to the maximum length of the header line, or leverage
>> +	 * strbufs for all operations.
>> +	 *
>> +	 * In addition, it is possible that header values can be split over
>> +	 * multiple lines as per RFC 2616 (even though this has since been
>> +	 * deprecated in RFC 7230). A continuation header field value is
>> +	 * identified as starting with a space or horizontal tab.
>> +	 *
>> +	 * The formal definition of a header field as given in RFC 2616 is:
>> +	 *
>> +	 *   message-header = field-name ":" [ field-value ]
>> +	 *   field-name     = token
>> +	 *   field-value    = *( field-content | LWS )
>> +	 *   field-content  = <the OCTETs making up the field-value
>> +	 *                    and consisting of either *TEXT or combinations
>> +	 *                    of token, separators, and quoted-string>
>> +	 */
>> +
>> +	/* Start of a new WWW-Authenticate header */
>> +	if (skip_iprefix_mem(ptr, size, "www-authenticate:", &val, &val_len)) {
>> +		strbuf_add(&buf, val, val_len);
>> +
>> +		/*
>> +		 * Strip the CRLF that should be present at the end of each
>> +		 * field as well as any trailing or leading whitespace from the
>> +		 * value.
>> +		 */
>> +		strbuf_trim(&buf);
>> +
>> +		strvec_push(values, buf.buf);
>> +		http_auth.header_is_last_match = 1;
>> +		goto exit;
> 
> OK.  We remember that we have seen the beginning of a header we are
> interested in (so that we can append if it is a continuation we see
> next).  Good.
> 
>> +	}
>> +
>> +	/*
>> +	 * This line could be a continuation of the previously matched header
>> +	 * field. If this is the case then we should append this value to the
>> +	 * end of the previously consumed value.
>> +	 */
>> +	if (http_auth.header_is_last_match && is_hdr_continuation(ptr, size)) {
>> +		/*
>> +		 * Trim the CRLF and any leading or trailing from this line.
>> +		 */
>> +		strbuf_add(&buf, ptr, size);
>> +		strbuf_trim(&buf);
>> +
>> +		/*
>> +		 * At this point we should always have at least one existing
>> +		 * value, even if it is empty. Do not bother appending the new
>> +		 * value if this continuation header is itself empty.
>> +		 */
>> +		if (!values->nr) {
>> +			BUG("should have at least one existing header value");
> 
> OK, we should set _is_last_match to true only after we recorded the
> header that might see a continuation, so it would be a bug if we
> didn't have anything there.  Good.
> 
>> +		} else if (buf.len) {
>> +			char *prev = xstrdup(values->v[values->nr - 1]);
>> +
>> +			/* Join two non-empty values with a single space. */
>> +			const char *const sp = *prev ? " " : "";
>> +
>> +			strvec_pop(values);
>> +			strvec_pushf(values, "%s%s%s", prev, sp, buf.buf);
>> +			free(prev);
>> +		}
>> +
>> +		goto exit;
> 
> Good that we are prepared to see a logical line split over more than
> two lines (i.e. by not toggling _is_last_match off prematurely here).
> 
>> +	}
>> +
>> +	/* This is the start of a new header we don't care about */
>> +	http_auth.header_is_last_match = 0;
> 
> Or what we just saw and ignored could be a continuation line of a
> header we ignored.  The comment is slightly misleading.

I'll try and reword this to make it more accurate - we have determined
this line is not a continuation of the previous WWW-Authenticate header.

> Other than that, looking good.
> 
> Thanks.

diff --git a/credential.c b/credential.c
index f6389a50684..897b4679333 100644
--- a/credential.c
+++ b/credential.c
@@ -22,6 +22,7 @@  void credential_clear(struct credential *c)
 	free(c->username);
 	free(c->password);
 	string_list_clear(&c->helpers, 0);
+	strvec_clear(&c->wwwauth_headers);
 
 	credential_init(c);
 }
diff --git a/credential.h b/credential.h
index f430e77fea4..6f2e5bc610b 100644
--- a/credential.h
+++ b/credential.h
@@ -2,6 +2,7 @@ 
 #define CREDENTIAL_H
 
 #include "string-list.h"
+#include "strvec.h"
 
 /**
  * The credentials API provides an abstracted way of gathering username and
@@ -115,6 +116,19 @@  struct credential {
 	 */
 	struct string_list helpers;
 
+	/**
+	 * A `strvec` of WWW-Authenticate header values. Each string
+	 * is the value of a WWW-Authenticate header in an HTTP response,
+	 * in the order they were received in the response.
+	 */
+	struct strvec wwwauth_headers;
+
+	/**
+	 * Internal use only. Used to keep track of split header fields
+	 * in order to fold multiple lines into one value.
+	 */
+	unsigned header_is_last_match:1;
+
 	unsigned approved:1,
 		 configured:1,
 		 quit:1,
@@ -130,6 +144,7 @@  struct credential {
 
 #define CREDENTIAL_INIT { \
 	.helpers = STRING_LIST_INIT_DUP, \
+	.wwwauth_headers = STRVEC_INIT, \
 }
 
 /* Initialize a credential structure, setting all fields to empty. */
diff --git a/git-compat-util.h b/git-compat-util.h
index a76d0526f79..a59230564e8 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1266,6 +1266,29 @@  static inline int skip_iprefix(const char *str, const char *prefix,
 	return 0;
 }
 
+/*
+ * Like skip_prefix_mem, but compare case-insensitively. Note that the
+ * comparison is done via tolower(), so it is strictly ASCII (no multi-byte
+ * characters or locale-specific conversions).
+ */
+static inline int skip_iprefix_mem(const char *buf, size_t len,
+				   const char *prefix,
+				   const char **out, size_t *outlen)
+{
+	size_t prefix_len = strlen(prefix);
+
+	if (len < prefix_len)
+		return 0;
+
+	if (!strncasecmp(buf, prefix, prefix_len)) {
+		*out = buf + prefix_len;
+		*outlen = len - prefix_len;
+		return 1;
+	}
+
+	return 0;
+}
+
 static inline int strtoul_ui(char const *s, int base, unsigned int *result)
 {
 	unsigned long ul;
diff --git a/http.c b/http.c
index 8a5ba3f4776..7a56a3db5f7 100644
--- a/http.c
+++ b/http.c
@@ -183,6 +183,124 @@  size_t fwrite_buffer(char *ptr, size_t eltsize, size_t nmemb, void *buffer_)
 	return nmemb;
 }
 
+/*
+ * A folded header continuation line starts with at least one single whitespace
+ * character. It is not a continuation line if the line is *just* a newline.
+ * The RFC for HTTP states that CRLF is the header field line ending, but some
+ * servers may use LF only; we accept both.
+ */
+static inline int is_hdr_continuation(const char *ptr, const size_t size)
+{
+	/* totally empty line or normal header */
+	if (!size || !isspace(*ptr))
+		return 0;
+
+	/* empty line with LF line ending */
+	if (size == 1 && ptr[0] == '\n')
+		return 0;
+
+	/* empty line with CRLF line ending */
+	if (size == 2 && ptr[0] == '\r' && ptr[1] == '\n')
+		return 0;
+
+	return 1;
+}
+
+static size_t fwrite_wwwauth(char *ptr, size_t eltsize, size_t nmemb, void *p)
+{
+	size_t size = eltsize * nmemb;
+	struct strvec *values = &http_auth.wwwauth_headers;
+	struct strbuf buf = STRBUF_INIT;
+	const char *val;
+	size_t val_len;
+
+	/*
+	 * Header lines may not come NULL-terminated from libcurl so we must
+	 * limit all scans to the maximum length of the header line, or leverage
+	 * strbufs for all operations.
+	 *
+	 * In addition, it is possible that header values can be split over
+	 * multiple lines as per RFC 2616 (even though this has since been
+	 * deprecated in RFC 7230). A continuation header field value is
+	 * identified as starting with a space or horizontal tab.
+	 *
+	 * The formal definition of a header field as given in RFC 2616 is:
+	 *
+	 *   message-header = field-name ":" [ field-value ]
+	 *   field-name     = token
+	 *   field-value    = *( field-content | LWS )
+	 *   field-content  = <the OCTETs making up the field-value
+	 *                    and consisting of either *TEXT or combinations
+	 *                    of token, separators, and quoted-string>
+	 */
+
+	/* Start of a new WWW-Authenticate header */
+	if (skip_iprefix_mem(ptr, size, "www-authenticate:", &val, &val_len)) {
+		strbuf_add(&buf, val, val_len);
+
+		/*
+		 * Strip the CRLF that should be present at the end of each
+		 * field as well as any trailing or leading whitespace from the
+		 * value.
+		 */
+		strbuf_trim(&buf);
+
+		strvec_push(values, buf.buf);
+		http_auth.header_is_last_match = 1;
+		goto exit;
+	}
+
+	/*
+	 * This line could be a continuation of the previously matched header
+	 * field. If this is the case then we should append this value to the
+	 * end of the previously consumed value.
+	 */
+	if (http_auth.header_is_last_match && is_hdr_continuation(ptr, size)) {
+		/*
+		 * Trim the CRLF and any leading or trailing from this line.
+		 */
+		strbuf_add(&buf, ptr, size);
+		strbuf_trim(&buf);
+
+		/*
+		 * At this point we should always have at least one existing
+		 * value, even if it is empty. Do not bother appending the new
+		 * value if this continuation header is itself empty.
+		 */
+		if (!values->nr) {
+			BUG("should have at least one existing header value");
+		} else if (buf.len) {
+			char *prev = xstrdup(values->v[values->nr - 1]);
+
+			/* Join two non-empty values with a single space. */
+			const char *const sp = *prev ? " " : "";
+
+			strvec_pop(values);
+			strvec_pushf(values, "%s%s%s", prev, sp, buf.buf);
+			free(prev);
+		}
+
+		goto exit;
+	}
+
+	/* This is the start of a new header we don't care about */
+	http_auth.header_is_last_match = 0;
+
+	/*
+	 * If this is a HTTP status line and not a header field, this signals
+	 * a different HTTP response. libcurl writes all the output of all
+	 * response headers of all responses, including redirects.
+	 * We only care about the last HTTP request response's headers so clear
+	 * the existing array.
+	 */
+	if (!strncasecmp(ptr, "http/", 5))
+		strvec_clear(values);
+
+exit:
+	strbuf_release(&buf);
+	return size;
+}
+
 size_t fwrite_null(char *ptr, size_t eltsize, size_t nmemb, void *strbuf)
 {
 	return nmemb;
@@ -1864,6 +1982,8 @@  static int http_request(const char *url,
 					 fwrite_buffer);
 	}
 
+	curl_easy_setopt(slot->curl, CURLOPT_HEADERFUNCTION, fwrite_wwwauth);
+
 	accept_language = http_get_accept_language_header();
 
 	if (accept_language)

[v9,2/3] http: read HTTP WWW-Authenticate response headers

Commit Message

Comments

Patch