diff mbox series

[2/3] credential: teach `credential_from_url()` a non-strict mode

Message ID 1081841b16de31693473e72ff817bed5f0064dda.1587588665.git.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series credential: handle partial URLs in config settings again | expand

Commit Message

Linus Arver via GitGitGadget April 22, 2020, 8:51 p.m. UTC
From: Johannes Schindelin <johannes.schindelin@gmx.de>

Prior to the fixes for CVE-2020-11008, we were _very_ lenient in what we
required from a URL in order to parse it into a `struct credential`.
That led to serious vulnerabilities.

There was one call site, though, that really needed that leniency: when
parsing config settings a la `credential.dev.azure.com.useHTTPPath`.

In preparation for fixing that regression, let's add a parameter called
`strict` to the `credential_from_url()` function and convert the
existing callers to enforce that strict mode.

Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
---
 credential.c | 14 ++++++++------
 credential.h |  6 +++++-
 fsck.c       |  2 +-
 3 files changed, 14 insertions(+), 8 deletions(-)

Comments

Junio C Hamano April 22, 2020, 10:29 p.m. UTC | #1
"Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
writes:

> -	if (!proto_end || proto_end == url) {
> +	if (strict && (!proto_end || proto_end == url)) {
>  		if (!quiet)
>  			warning(_("url has no scheme: %s"), url);
>  		return -1;
>  	}
> -	cp = proto_end + 3;
> +	cp = proto_end ? proto_end + 3 : url;
>  	at = strchr(cp, '@');
>  	colon = strchr(cp, ':');
>  	slash = strchrnul(cp, '/');
> @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url,
>  		host = at + 1;
>  	}
>  
> -	c->protocol = xmemdupz(url, proto_end - url);
> -	c->host = url_decode_mem(host, slash - host);
> +	if (proto_end && proto_end - url > 0)
> +		c->protocol = xmemdupz(url, proto_end - url);

Missing "proto://" under non-strict mode would leave c->protocol
NULL (not "") here, as described in [0/3].

Here, slash would be pointing at one of "/?#" at the end of the host
and url would be pointing at...?  E.g. for "http:///path", URL
points at 'h' at the beginning, proto_end points at ':', cp points
at the last '/' before "path" and slash is the same as cp.  host
points at cp as there is no '@' at sign.

> +	if (slash - url > 0)
> +		c->host = url_decode_mem(host, slash - host);

This wants to make c->host NULL when host is missing, as described
in [0/3].

Shouldn't the condition based on "slash - host", though?

Other than that, it looks sensible.
Johannes Schindelin April 22, 2020, 10:50 p.m. UTC | #2
Hi Junio,

On Wed, 22 Apr 2020, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <gitgitgadget@gmail.com>
> writes:
>
> > -	if (!proto_end || proto_end == url) {
> > +	if (strict && (!proto_end || proto_end == url)) {
> >  		if (!quiet)
> >  			warning(_("url has no scheme: %s"), url);
> >  		return -1;
> >  	}
> > -	cp = proto_end + 3;
> > +	cp = proto_end ? proto_end + 3 : url;
> >  	at = strchr(cp, '@');
> >  	colon = strchr(cp, ':');
> >  	slash = strchrnul(cp, '/');
> > @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url,
> >  		host = at + 1;
> >  	}
> >
> > -	c->protocol = xmemdupz(url, proto_end - url);
> > -	c->host = url_decode_mem(host, slash - host);
> > +	if (proto_end && proto_end - url > 0)
> > +		c->protocol = xmemdupz(url, proto_end - url);
>
> Missing "proto://" under non-strict mode would leave c->protocol
> NULL (not "") here, as described in [0/3].

Right. I debated whether to set it to `NULL` explicitly, or whether to
leave it alone. At the end, I settled with the version in v2.17.4.

> Here, slash would be pointing at one of "/?#" at the end of the host
> and url would be pointing at...?  E.g. for "http:///path", URL
> points at 'h' at the beginning, proto_end points at ':', cp points
> at the last '/' before "path" and slash is the same as cp.  host
> points at cp as there is no '@' at sign.

It is not obvious from the diff: `url` is never changed. It would still
point at the first `h`, as you said.

> > +	if (slash - url > 0)
> > +		c->host = url_decode_mem(host, slash - host);
>
> This wants to make c->host NULL when host is missing, as described
> in [0/3].

I did not describe well enough in 0/3, my apologies.

Again, I tried to go with the version from v2.17.4 here, i.e. to set
`c->host` if we get a value, but not set it to `NULL` otherwise (instead
leave as-is).

> Shouldn't the condition based on "slash - host", though?

That would have been my preferred solution, too. But there is some
subtlety at work: for a `url` like `/this/is/my/path`, we want to leave
`c->host` as-is, but for `cert:///...` we want to set it to "" (t0300.23
and t7416.12 test for this explicitly).

I guess a better condition would be `if (proto_end || slash - host > 0)`.
What do you think?

> Other than that, it looks sensible.

Thanks,
Dscho
Junio C Hamano April 22, 2020, 11:02 p.m. UTC | #3
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> It is not obvious from the diff: `url` is never changed. It would still
> point at the first `h`, as you said.
>
>> > +	if (slash - url > 0)
>> > +		c->host = url_decode_mem(host, slash - host);
> ...
> I guess a better condition would be `if (proto_end || slash - host > 0)`.

Yeah, that would probably be more intuitive to read.  What triggered
my reaction was the mismatch between "slash - url" in the condition
and "slash - host" that specifies the length of the memory region.
Jonathan Nieder April 22, 2020, 11:38 p.m. UTC | #4
Johannes Schindelin wrote:

> There was one call site, though, that really needed that leniency: when
> parsing config settings a la `credential.dev.azure.com.useHTTPPath`.

Thanks for tackling this.

Can the commit message say a little more about the semantics and when
someone would use this?

Is it a shortcut for

	[credential "http://dev.azure.com"]
		useHttpPath = true

	[credential "https://dev.azure.com"]
		useHttpPath = true

?

> In preparation for fixing that regression, let's add a parameter called
> `strict` to the `credential_from_url()` function and convert the
> existing callers to enforce that strict mode.

I suspect this would be easier to read squashed with patch 3.  That
would also mean that the functionality and test coverage come at the
same time.

[...]
> diff --git a/credential.c b/credential.c
> index 64a841eddca..c73260ac40f 100644
> --- a/credential.c
> +++ b/credential.c
> @@ -344,7 +344,7 @@ static int check_url_component(const char *url, int quiet,
>  }
>  
>  int credential_from_url_gently(struct credential *c, const char *url,
> -			       int quiet)
> +			       int strict, int quiet)

The collection of flags makes me wonder whether it's time to use a
single "flags" parameter with flags that are |ed together.  That way,
call sites are easier to read without requiring cross-reference
assistance to see which option each boolean parameter represents.

Alternatively, could the non-strict form be a separate public function
that uses the same static helper that takes two boolean args?  That is,
something like

	int credential_from_url_gently(struct credential *c, const char *url,
				       int quiet)
	{
		return parse_credential_url(c, url, 1, quiet);
	}

	int credential_from_url_nonstrict(struct credential *c, const char *url,
					  int quiet)
	{
		return parse_credential_url(c, url, 0, quiet);
	}

[...]
> @@ -357,12 +357,12 @@ int credential_from_url_gently(struct credential *c, const char *url,
>  	 *   (3) proto://<user>:<pass>@<host>/...
>  	 */
>  	proto_end = strstr(url, "://");
> -	if (!proto_end || proto_end == url) {
> +	if (strict && (!proto_end || proto_end == url)) {
>  		if (!quiet)
>  			warning(_("url has no scheme: %s"), url);
>  		return -1;
>  	}

When !strict, this means we are not requiring a protocol.  No other
difference appears to be intended.

[...]
> @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url,
>  		host = at + 1;
>  	}
>  
> -	c->protocol = xmemdupz(url, proto_end - url);
> -	c->host = url_decode_mem(host, slash - host);
> +	if (proto_end && proto_end - url > 0)
> +		c->protocol = xmemdupz(url, proto_end - url);

What should happen when the protocol isn't present?  Does this mean
callers will need to be audited to make sure they handle NULL?

> +	if (slash - url > 0)
> +		c->host = url_decode_mem(host, slash - host);

What should happen the URL starts with a slash?

Thanks,
Jonathan
Carlo Marcelo Arenas Belón April 23, 2020, 12:02 a.m. UTC | #5
On Wed, Apr 22, 2020 at 4:41 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
> Johannes Schindelin wrote:
> > @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url,
> >               host = at + 1;
> >       }
> >
> > -     c->protocol = xmemdupz(url, proto_end - url);
> > -     c->host = url_decode_mem(host, slash - host);
> > +     if (proto_end && proto_end - url > 0)
> > +             c->protocol = xmemdupz(url, proto_end - url);
>
> What should happen when the protocol isn't present?  Does this mean
> callers will need to be audited to make sure they handle NULL?

the previous code was ensuring protocol was always at least "" (albeit it
might had been easier to understand with a comment)

removing the proto_end - url check would have the same effect, but then
we will need to also add a explicit xmemdupz("") for the nonstrict part
or audit (and with certainty add) checks to prevent a NULL protocol to
introduce regressions

Carlo
Johannes Schindelin April 23, 2020, 1:28 p.m. UTC | #6
Hi Carlo,

On Wed, 22 Apr 2020, Carlo Arenas wrote:

> On Wed, Apr 22, 2020 at 4:41 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
> > Johannes Schindelin wrote:
> > > @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url,
> > >               host = at + 1;
> > >       }
> > >
> > > -     c->protocol = xmemdupz(url, proto_end - url);
> > > -     c->host = url_decode_mem(host, slash - host);
> > > +     if (proto_end && proto_end - url > 0)
> > > +             c->protocol = xmemdupz(url, proto_end - url);
> >
> > What should happen when the protocol isn't present?  Does this mean
> > callers will need to be audited to make sure they handle NULL?
>
> the previous code was ensuring protocol was always at least "" (albeit it
> might had been easier to understand with a comment)

If you mean v2.17.5 by "the previous code", then yes.

However, what I wanted to reinstate was the behavior of v2.17.4, at least
the behavior that was obviously intended by this code:

	int credential_match(const struct credential *want,
			     const struct credential *have)
	{
	#define CHECK(x) (!want->x || (have->x && !strcmp(want->x, have->x)))
		return CHECK(protocol) &&
		       CHECK(host) &&
		       CHECK(path) &&
		       CHECK(username);
	#undef CHECK
	}

What speaks loudly to me is that _any_ of these can be `NULL` in `want`.
Any of them. Even protocol.

Remember, the call that I modified here to be more lenient populates that
`want`, not that `have`. The `have` still uses the strict mode, and that
is very much by design.

I also saw reports where users try to set `useHTTPPath` for hosts, not for
URLs, and it kind of makes sense. So I wanted to make that work, too.

Except that I uncovered the bug during my investigation that v2.17.4
handled `credential.<hostname>.useHTTPPath` as if no `<hostname>` was
provided at all!

So I wanted to fix that bug as well.

What I do _not_ want is to enforce `protocol` to be set (falling back to
an empty string if not specified). See below as to why.

> removing the proto_end - url check would have the same effect, but then
> we will need to also add a explicit xmemdupz("") for the nonstrict part
> or audit (and with certainty add) checks to prevent a NULL protocol to
> introduce regressions

I fear that my patches did not make it clear that the lenient mode is
_only_ used in the config parsing, in which case we very much do not want
to have the unspecified parts be interpreted as empty strings.

For example, if I specify

	[credential "http://"]
		helper = prevent

to catch _all_ http:// URLs, I definitely do not want that to be used only
for URLs with an empty host name!

Likewise, if I specify

	[credential "dev.azure.com"]
		useHTTPPath = true

I positively want to use the path part of the URL to specify what
credential to use _when accessing http://dev.azure.com/... _or_
https://dev.azure.com/... URLs, I do not expect that setting to be used
only for URLs with an empty protocol (which are now forbidden, anyway,
IIUC)!

So no, that `xmemdupz("")` would introduce very much undesirable behavior,
I believe.

Ciao,
Dscho
Carlo Marcelo Arenas Belón April 23, 2020, 9:22 p.m. UTC | #7
On Thu, Apr 23, 2020 at 03:28:13PM +0200, Johannes Schindelin wrote:
> On Wed, 22 Apr 2020, Carlo Arenas wrote:
> > On Wed, Apr 22, 2020 at 4:41 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
> > > Johannes Schindelin wrote:
> > > > @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url,
> > > >               host = at + 1;
> > > >       }
> > > >
> > > > -     c->protocol = xmemdupz(url, proto_end - url);
> > > > -     c->host = url_decode_mem(host, slash - host);
> > > > +     if (proto_end && proto_end - url > 0)
> > > > +             c->protocol = xmemdupz(url, proto_end - url);
> > >
> > > What should happen when the protocol isn't present?  Does this mean
> > > callers will need to be audited to make sure they handle NULL?
> >
> > the previous code was ensuring protocol was always at least "" (albeit it
> > might had been easier to understand with a comment)
> 
> I fear that my patches did not make it clear that the lenient mode is
> _only_ used in the config parsing, in which case we very much do not want
> to have the unspecified parts be interpreted as empty strings.

I think the concern raised was that since we are using the same function
in both cases there might be unintended consequences on changing the
semantics for the other case.

the argument made by Jonathan to use something else for configuration
(as is done in master) will help in that direction, and might be needed
anyway as your code it otherwise broken for current maint and master, but
if not possible (maybe later?) something like this could probably help :

diff --git a/credential.c b/credential.c
index 88612e583c..f972fcc895 100644
--- a/credential.c
+++ b/credential.c
@@ -389,8 +389,9 @@ int credential_from_url_gently(struct credential *c, const char *url,
 
 	if (proto_end && proto_end - url > 0)
 		c->protocol = xmemdupz(url, proto_end - url);
-	if (slash - url > 0)
+	if (strict || slash > url)
 		c->host = url_decode_mem(host, slash - host);
+
 	/* Trim leading and trailing slashes from path */
 	while (*slash == '/')
 		slash++;

changing the condition there as you suggested to Junio would be a plus IMHO
as well as getting some test that would excercise the new warning that was
introduced in credential.c:57

Carlo
Johannes Schindelin April 23, 2020, 10:03 p.m. UTC | #8
Hi Carlo,

On Thu, 23 Apr 2020, Carlo Marcelo Arenas Belón wrote:

> On Thu, Apr 23, 2020 at 03:28:13PM +0200, Johannes Schindelin wrote:
> > On Wed, 22 Apr 2020, Carlo Arenas wrote:
> > > On Wed, Apr 22, 2020 at 4:41 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
> > > > Johannes Schindelin wrote:
> > > > > @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url,
> > > > >               host = at + 1;
> > > > >       }
> > > > >
> > > > > -     c->protocol = xmemdupz(url, proto_end - url);
> > > > > -     c->host = url_decode_mem(host, slash - host);
> > > > > +     if (proto_end && proto_end - url > 0)
> > > > > +             c->protocol = xmemdupz(url, proto_end - url);
> > > >
> > > > What should happen when the protocol isn't present?  Does this mean
> > > > callers will need to be audited to make sure they handle NULL?
> > >
> > > the previous code was ensuring protocol was always at least "" (albeit it
> > > might had been easier to understand with a comment)
> >
> > I fear that my patches did not make it clear that the lenient mode is
> > _only_ used in the config parsing, in which case we very much do not want
> > to have the unspecified parts be interpreted as empty strings.
>
> I think the concern raised was that since we are using the same function
> in both cases there might be unintended consequences on changing the
> semantics for the other case.

Indeed, I share that concern. That's why I wanted to be extra careful
there to make sure that introducing this lenient mode does _not_ change
the non-lenient mode in the least, i.e. it is the reason why I kept 2/3
separate from 3/3.

> the argument made by Jonathan to use something else for configuration
> (as is done in master) will help in that direction, and might be needed
> anyway as your code it otherwise broken for current maint and master,

I am in general not a fan of the idea to have two separate parsers for
essentially the same thing. In this instance, the difference between the
lenient mode and the non-lenient mode is so obvious that I'd rather reuse
the same code (think: Don't Repeat Yourself).

> but if not possible (maybe later?) something like this could probably
> help :
>
> diff --git a/credential.c b/credential.c
> index 88612e583c..f972fcc895 100644
> --- a/credential.c
> +++ b/credential.c
> @@ -389,8 +389,9 @@ int credential_from_url_gently(struct credential *c, const char *url,
>
>  	if (proto_end && proto_end - url > 0)
>  		c->protocol = xmemdupz(url, proto_end - url);
> -	if (slash - url > 0)
> +	if (strict || slash > url)
>  		c->host = url_decode_mem(host, slash - host);
> +
>  	/* Trim leading and trailing slashes from path */
>  	while (*slash == '/')
>  		slash++;
>
> changing the condition there as you suggested to Junio would be a plus IMHO
> as well as getting some test that would excercise the new warning that was
> introduced in credential.c:57

Yes (modulo doing "greater than" comparison on pointers which is IIRC not
permitted in C in general).

Ciao,
Dscho
Junio C Hamano April 23, 2020, 10:11 p.m. UTC | #9
Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

> Yes (modulo doing "greater than" comparison on pointers which is IIRC not
> permitted in C in general).

Of course, people write a loop like this

	char *cp, *ep = strchr(string, '\n');

	for (cp = string; cp < ep; cp++)
		...

all the time, and forbidding pointer comparison would make the
language impossible to use ;-)

I think you are confused with a different rule---what is not kosher
is to compare two pointers that do not point into elements of the
same array.  Whether the comparison is done in (ptr1 < ptr2) way, or
(ptr2 - ptr1 < 0) way, does not change the equation.
Junio C Hamano April 23, 2020, 10:16 p.m. UTC | #10
Junio C Hamano <gitster@pobox.com> writes:

> Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
>
>> Yes (modulo doing "greater than" comparison on pointers which is IIRC not
>> permitted in C in general).
>
> Of course, people write a loop like this
>
> 	char *cp, *ep = strchr(string, '\n');
>
> 	for (cp = string; cp < ep; cp++)
> 		...
>
> all the time, and forbidding pointer comparison would make the
> language impossible to use ;-)
>
> I think you are confused with a different rule---what is not kosher
> is to compare two pointers that do not point into elements of the
> same array.  Whether the comparison is done in (ptr1 < ptr2) way, or
> (ptr2 - ptr1 < 0) way, does not change the equation.

Having said that, between

1.	if (strict || slash - url > 0)
2.	if (strict || slash > url)
 		c->host = url_decode_mem(host, slash - host);

I think the former is moderately easier to read.  It still has the
same "Huh?" factor that a comparison between slash and URL guards
the size of the region being decoded, which is slash - host, and
makes the reader wonder how these two variables, URL and host,
relate to each other at this point in the code, though, either way
the comparison is spelled.

Thanks.
Johannes Schindelin April 23, 2020, 10:50 p.m. UTC | #11
Hi Jonathan,

On Wed, 22 Apr 2020, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
>
> > There was one call site, though, that really needed that leniency: when
> > parsing config settings a la `credential.dev.azure.com.useHTTPPath`.
>
> Thanks for tackling this.
>
> Can the commit message say a little more about the semantics and when
> someone would use this?
>
> Is it a shortcut for
>
> 	[credential "http://dev.azure.com"]
> 		useHttpPath = true
>
> 	[credential "https://dev.azure.com"]
> 		useHttpPath = true
>
> ?

I don't really want to be too verbose about this in _this_ commit message.
But I do see your point. This is my current version of the commit message:

    credential: optionally allow partial URLs in credential_from_url_gently()

    Prior to the fixes for CVE-2020-11008, we were _very_ lenient in what we
    required from a URL in order to parse it into a `struct credential`.
    That led to serious vulnerabilities.

    There was one call site, though, that really needed that leniency: when
    parsing config settings a la `credential.dev.azure.com.useHTTPPath`.
    Settings like this might be desired when users want to use, say, a given
    user name on a given host, regardless of the protocol to be used.

    In preparation for fixing that bug, let's add a parameter called
    `allow_partial_url` to the `credential_from_url_gently()` function and
    convert the existing callers to set that parameter to 0, i.e. do not
    change the existing behavior, just add the option to be more lenient.

    Please note that this patch does more than just reinstating a way to
    imitate the behavior before those CVE-2020-11008 fixes: Before that, we
    would simply ignore URLs without a protocol. In other words,
    misleadingly, the following setting would be applied to _all_ URLs:

            [credential "example.com"]
                    username = that-me

    The obvious intention is to match the host name only. With this patch,
    we allow precisely that: when parsing the URL with non-zero
    `allow_partial_url`, we do not simply return success if there was no
    protocol, but we simply leave the protocol unset and continue parsing
    the URL.

As you see, I added substantially more information there.

> > In preparation for fixing that regression, let's add a parameter
> > called `strict` to the `credential_from_url()` function and convert
> > the existing callers to enforce that strict mode.
>
> I suspect this would be easier to read squashed with patch 3.  That
> would also mean that the functionality and test coverage come at the
> same time.

On the contrary, I _really_ want them to be separate, so that it is easier
to review. I know that _I_ had an easier time looking over the patch that
introduces the non-strict mode, to make sure that it does not change a
thing in strict mode (and yes, v1 made that harder than necessary by _not_
using `strict ||` in the condition that guards the `c->host` assignment).

In short: it is my intention to keep the two patches separate, with the
main goal to make sure that I don't introduce any of those regressions
Peff is worried about.

> [...]
> > diff --git a/credential.c b/credential.c
> > index 64a841eddca..c73260ac40f 100644
> > --- a/credential.c
> > +++ b/credential.c
> > @@ -344,7 +344,7 @@ static int check_url_component(const char *url, int quiet,
> >  }
> >
> >  int credential_from_url_gently(struct credential *c, const char *url,
> > -			       int quiet)
> > +			       int strict, int quiet)
>
> The collection of flags makes me wonder whether it's time to use a
> single "flags" parameter with flags that are |ed together.  That way,
> call sites are easier to read without requiring cross-reference
> assistance to see which option each boolean parameter represents.

I resisted the temptation because I don't want this to be a big patch
series, but a fast one. The reports keep coming in, and the current plan
seems to be for GitHub Desktop to release another version on Monday, for
which I need to release a MinGit backport probably tomorrow, otherwise it
won't work timewise.

Meaning: I don't really want to do "nice to have" work like this flag.

> Alternatively, could the non-strict form be a separate public function
> that uses the same static helper that takes two boolean args?  That is,
> something like
>
> 	int credential_from_url_gently(struct credential *c, const char *url,
> 				       int quiet)
> 	{
> 		return parse_credential_url(c, url, 1, quiet);
> 	}
>
> 	int credential_from_url_nonstrict(struct credential *c, const char *url,
> 					  int quiet)
> 	{
> 		return parse_credential_url(c, url, 0, quiet);
> 	}

Looks good, but there are only three callers (and I don't expect more).
And the only caller interested in the lenient mode (read: the only
now-complicated call) lives in the very same file as the called function.
So it strikes me as quite a bit of an overkill to introduce two new
functions.

> [...]
> > @@ -357,12 +357,12 @@ int credential_from_url_gently(struct credential *c, const char *url,
> >  	 *   (3) proto://<user>:<pass>@<host>/...
> >  	 */
> >  	proto_end = strstr(url, "://");
> > -	if (!proto_end || proto_end == url) {
> > +	if (strict && (!proto_end || proto_end == url)) {
> >  		if (!quiet)
> >  			warning(_("url has no scheme: %s"), url);
> >  		return -1;
> >  	}
>
> When !strict, this means we are not requiring a protocol.  No other
> difference appears to be intended.

Almost. My intention was to handle missing host names, too: `/repo.git`
should also match `https://example.com/repo.git`.

The user name is optional already, anyway.

BTW I realized (while working on these patches) that it would probably be
a good idea to warn about passwords in these `credential.<URL>.<key>`
settings: the password will be ignored as far as `credential_match()` is
concerned, and they should probably not live in the config.

But for aforementioned reasons, I decided against including a patch that
makes that happen.

> [...]
> > @@ -382,8 +382,10 @@ int credential_from_url_gently(struct credential *c, const char *url,
> >  		host = at + 1;
> >  	}
> >
> > -	c->protocol = xmemdupz(url, proto_end - url);
> > -	c->host = url_decode_mem(host, slash - host);
> > +	if (proto_end && proto_end - url > 0)
> > +		c->protocol = xmemdupz(url, proto_end - url);
>
> What should happen when the protocol isn't present?  Does this mean
> callers will need to be audited to make sure they handle NULL?

Again, the sole caller of this lenient mode is the config parser which
then wants to match the (partial) URL against the actual URL, using this
function:

	int credential_match(const struct credential *want,
			     const struct credential *have)
	{
	#define CHECK(x) (!want->x || (have->x && !strcmp(want->x, have->x)))
		return CHECK(protocol) &&
		       CHECK(host) &&
		       CHECK(path) &&
		       CHECK(username);
	#undef CHECK
	}

The lenient parser is supposed to populate the `want` of this function. In
other words, the code _expects_ `protocol` (or for that matter, _any_
attribute of `want`) to be optional.

> > +	if (slash - url > 0)
> > +		c->host = url_decode_mem(host, slash - host);
>
> What should happen the URL starts with a slash?

Oh... I thought it was obvious that a partial URL starting with a slash
would refer to the path part only... Does that not make sense?

Ciao,
Dscho
Johannes Schindelin April 23, 2020, 11:22 p.m. UTC | #12
Hi Junio,

On Thu, 23 Apr 2020, Junio C Hamano wrote:

> Junio C Hamano <gitster@pobox.com> writes:
>
> > Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:
> >
> >> Yes (modulo doing "greater than" comparison on pointers which is IIRC not
> >> permitted in C in general).
> >
> > Of course, people write a loop like this
> >
> > 	char *cp, *ep = strchr(string, '\n');
> >
> > 	for (cp = string; cp < ep; cp++)
> > 		...
> >
> > all the time, and forbidding pointer comparison would make the
> > language impossible to use ;-)
> >
> > I think you are confused with a different rule---what is not kosher
> > is to compare two pointers that do not point into elements of the
> > same array.  Whether the comparison is done in (ptr1 < ptr2) way, or
> > (ptr2 - ptr1 < 0) way, does not change the equation.

Yep, that's my confusion all right.

> Having said that, between
>
> 1.	if (strict || slash - url > 0)
> 2.	if (strict || slash > url)
>  		c->host = url_decode_mem(host, slash - host);
>
> I think the former is moderately easier to read.  It still has the
> same "Huh?" factor that a comparison between slash and URL guards
> the size of the region being decoded, which is slash - host, and
> makes the reader wonder how these two variables, URL and host,
> relate to each other at this point in the code, though, either way
> the comparison is spelled.

I fully agree! That's why I use `strict || slash - host > 0` in my next
iteration (actually, I decided to rename `strict`, but that's beside the
point).

Ciao,
Dscho
diff mbox series

Patch

diff --git a/credential.c b/credential.c
index 64a841eddca..c73260ac40f 100644
--- a/credential.c
+++ b/credential.c
@@ -344,7 +344,7 @@  static int check_url_component(const char *url, int quiet,
 }
 
 int credential_from_url_gently(struct credential *c, const char *url,
-			       int quiet)
+			       int strict, int quiet)
 {
 	const char *at, *colon, *cp, *slash, *host, *proto_end;
 
@@ -357,12 +357,12 @@  int credential_from_url_gently(struct credential *c, const char *url,
 	 *   (3) proto://<user>:<pass>@<host>/...
 	 */
 	proto_end = strstr(url, "://");
-	if (!proto_end || proto_end == url) {
+	if (strict && (!proto_end || proto_end == url)) {
 		if (!quiet)
 			warning(_("url has no scheme: %s"), url);
 		return -1;
 	}
-	cp = proto_end + 3;
+	cp = proto_end ? proto_end + 3 : url;
 	at = strchr(cp, '@');
 	colon = strchr(cp, ':');
 	slash = strchrnul(cp, '/');
@@ -382,8 +382,10 @@  int credential_from_url_gently(struct credential *c, const char *url,
 		host = at + 1;
 	}
 
-	c->protocol = xmemdupz(url, proto_end - url);
-	c->host = url_decode_mem(host, slash - host);
+	if (proto_end && proto_end - url > 0)
+		c->protocol = xmemdupz(url, proto_end - url);
+	if (slash - url > 0)
+		c->host = url_decode_mem(host, slash - host);
 	/* Trim leading and trailing slashes from path */
 	while (*slash == '/')
 		slash++;
@@ -407,6 +409,6 @@  int credential_from_url_gently(struct credential *c, const char *url,
 
 void credential_from_url(struct credential *c, const char *url)
 {
-	if (credential_from_url_gently(c, url, 0) < 0)
+	if (credential_from_url_gently(c, url, 1, 0) < 0)
 		die(_("credential url cannot be parsed: %s"), url);
 }
diff --git a/credential.h b/credential.h
index 5a86502d95c..823ec2caf35 100644
--- a/credential.h
+++ b/credential.h
@@ -41,9 +41,13 @@  void credential_write(const struct credential *, FILE *);
  * an error but leave the broken state in the credential object for further
  * examination.  The non-gentle form will issue a warning to stderr and return
  * an empty credential.
+ *
+ * In strict mode, an empty protocol or an empty host name are not allowed.
+ * The credential_from_url() function enforces strict mode.
  */
 void credential_from_url(struct credential *, const char *url);
-int credential_from_url_gently(struct credential *, const char *url, int quiet);
+int credential_from_url_gently(struct credential *, const char *url,
+			       int strict, int quiet);
 
 int credential_match(const struct credential *have,
 		     const struct credential *want);
diff --git a/fsck.c b/fsck.c
index 31b5be05f54..aa66dc1e742 100644
--- a/fsck.c
+++ b/fsck.c
@@ -1076,7 +1076,7 @@  static int check_submodule_url(const char *url)
 	else if (url_to_curl_url(url, &curl_url)) {
 		struct credential c = CREDENTIAL_INIT;
 		int ret = 0;
-		if (credential_from_url_gently(&c, curl_url, 1) ||
+		if (credential_from_url_gently(&c, curl_url, 1, 1) ||
 		    !*c.host)
 			ret = -1;
 		credential_clear(&c);