diff mbox series

[v2] urlmatch: create fetch.credentialsInUrl config

Message ID pull.1237.v2.git.1653658034086.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series [v2] urlmatch: create fetch.credentialsInUrl config | expand

Commit Message

Derrick Stolee May 27, 2022, 1:27 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

Users sometimes provide a "username:password" combination in their
plaintext URLs. Since Git stores these URLs in plaintext in the
.git/config file, this is a very insecure way of storing these
credentials. Credential managers are a more secure way of storing this
information.

System administrators might want to prevent this kind of use by users on
their machines.

Create a new "fetch.credentialsInUrl" config option and teach Git to
warn or die when seeing a URL with this kind of information. The warning
anonymizes the sensitive information of the URL to be clear about the
issue.

This change currently defaults the behavior to "allow" which does
nothing with these URLs. We can consider changing this behavior to
"warn" by default if we wish. At that time, we may want to add some
advice about setting fetch.credentialsInUrl=ignore for users who still
want to follow this pattern (and not receive the warning).

As an attempt to ensure the parsing logic did not catch any
unintentional cases, I modified this change locally to to use the "die"
option by default. Running the test suite succeeds except for the
explicit username:password URLs used in t5550-http-fetch-dumb.sh and
t5541-http-push-smart.sh. This means that all other tested URLs did not
trigger this logic.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
    urlmatch: create fetch.credentialsInUrl config
    
    This is a modified version of the patch I submitted a while ago [1].
    
    Based on the feedback, changing the behavior to fail by default was not
    a good approach. Further, the idea to stop storing the credentials in
    config and redirect them to a credential manager was already considered
    by Peff [2] but not merged.
    
    This patch does what should be the simplest thing we can do: create a
    config option that will cause the user to get a warning or a failure,
    depending on its value. The default is to ignore the setting, identical
    to the current behavior. We can talk about changing this default to
    "warn" in the future, but it would be safest to release with ignore as
    the default until we are sure that we are not going to start warning on
    false positives.
    
    This patch would be sufficient for the interested internal parties that
    want to prevent users from storing credentials this way. System
    administrators can modify system-level Git config into "die" mode to
    prevent this behavior.
    
    [1]
    https://lore.kernel.org/git/pull.945.git.1619807844627.gitgitgadget@gmail.com
    Reject passwords in URLs (April 2021).
    
    [2]
    https://lore.kernel.org/git/20190519050724.GA26179@sigill.intra.peff.net/
    Re: Git ransom campaign incident report - May 2019
    
    
    Updates in v2
    =============
    
     * Documentation is slightly expanded to include the fact that Git
       stores the given URL as plaintext in its config.
     * The new method has a new documentation comment that details the
       necessary preconditions.
     * "ignore" is now "allow"
     * Additional checks on colon_ptr are added.
     * Use strbuf_splice() instead of custom string-walking logic.
     * Use "" instead of asterisks.
     * Config value checks are no longer case sensitive.
    
    Thanks, -Stolee

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-1237%2Fderrickstolee%2Fcreds-in-url-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-1237/derrickstolee/creds-in-url-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/1237

Range-diff vs v1:

 1:  cc2befb3803 ! 1:  364f5c37c70 urlmatch: create fetch.credentialsInUrl config
     @@ Commit message
          anonymizes the sensitive information of the URL to be clear about the
          issue.
      
     -    This change currently defaults the behavior to "ignore" which does
     +    This change currently defaults the behavior to "allow" which does
          nothing with these URLs. We can consider changing this behavior to
          "warn" by default if we wish. At that time, we may want to add some
          advice about setting fetch.credentialsInUrl=ignore for users who still
     @@ Commit message
          As an attempt to ensure the parsing logic did not catch any
          unintentional cases, I modified this change locally to to use the "die"
          option by default. Running the test suite succeeds except for the
     -    explicit username:password URLs used in t5550-http-fetch-dumb.s and
     +    explicit username:password URLs used in t5550-http-fetch-dumb.sh and
          t5541-http-push-smart.sh. This means that all other tested URLs did not
          trigger this logic.
      
     @@ Documentation/config/fetch.txt: fetch.writeCommitGraph::
      +fetch.credentialsInUrl::
      +	A URL can contain plaintext credentials in the form
      +	`protocol://<user>:<password>@domain/path`. Using such URLs is not
     -+	recommended as it exposes the password in multiple ways. The
     ++	recommended as it exposes the password in multiple ways, including
     ++	Git storing the URL as plaintext in the repository config. The
      +	`fetch.credentialsInUrl` option provides instruction for how Git
      +	should react to seeing such a URL, with these values:
      ++
     -+* `ignore` (default): Git will proceed with its activity without warning.
     ++* `allow` (default): Git will proceed with its activity without warning.
      +* `warn`: Git will write a warning message to `stderr` when parsing a URL
      +  with a plaintext credential.
      +* `die`: Git will write a failure message to `stderr` when parsing a URL
     @@ t/t5601-clone.sh: test_expect_success 'clone respects GIT_WORK_TREE' '
       
      +test_expect_success 'clone warns or fails when using username:password' '
      +	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username:password@localhost attempt1 2>err &&
     -+	grep "warning: URL '\''https://username:\*\*\*\*\*\*\*\*@localhost/'\'' uses plaintext credentials" err &&
     ++	grep "warning: URL '\''https://username:<redacted>@localhost/'\'' uses plaintext credentials" err &&
      +	test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err &&
     -+	grep "fatal: URL '\''https://username:\*\*\*\*\*\*\*\*@localhost/'\'' uses plaintext credentials" err
     ++	grep "fatal: URL '\''https://username:<redacted>@localhost/'\'' uses plaintext credentials" err
     ++'
     ++
     ++test_expect_success 'clone does not detect username:password when it is https://username@domain:port/' '
     ++	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username@localhost:8080 attempt3 2>err &&
     ++	! grep "uses plaintext credentials" err
      +'
      +
       test_expect_success 'clone from hooks' '
     @@ urlmatch.c: static int match_host(const struct url_info *url_info,
       	return (!url_len && !pat_len);
       }
       
     ++/*
     ++ * Call this method when we have detected credentials within the 'url' in
     ++ * the form
     ++ *
     ++ *     scheme://username:password@domain[:port][/path]
     ++ *
     ++ * The 'scheme_len' value should be equal to the string length of the
     ++ * "scheme://" portion of the URL.
     ++ *
     ++ * The fetch.credentialsInUrl config indicates what to do on such a URL,
     ++ * either ignoring, warning, or die()ing. The latter two modes write a
     ++ * redacted URL to stderr.
     ++ */
      +static void detected_credentials_in_url(const char *url, size_t scheme_len)
      +{
     -+	char *value = NULL;
     ++	const char *value;
      +	const char *at_ptr;
      +	const char *colon_ptr;
     -+	struct strbuf anonymized = STRBUF_INIT;
     ++	struct strbuf redacted = STRBUF_INIT;
      +
     -+	/* "ignore" is the default behavior. */
     -+	if (git_config_get_string("fetch.credentialsinurl", &value) ||
     -+	    !strcasecmp("ignore", value))
     -+		goto cleanup;
     ++	/* "allow" is the default behavior. */
     ++	if (git_config_get_string_tmp("fetch.credentialsinurl", &value) ||
     ++	    !strcmp("allow", value))
     ++		return;
      +
      +	at_ptr = strchr(url, '@');
      +	colon_ptr = strchr(url + scheme_len + 3, ':');
      +
     ++	/*
     ++	 * Let's do some defensive programming to ensure the given
     ++	 * URL is of the proper format.
     ++	 */
      +	if (!colon_ptr)
      +		BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX,
      +		    url, (uintmax_t) scheme_len);
     ++	if (colon_ptr > at_ptr)
     ++		BUG("input url '%s' does not include credentials",
     ++		    url);
      +
     -+	/* Include everything including the colon. */
     ++	/* Include the colon when creating the redacted URL. */
      +	colon_ptr++;
     -+	strbuf_add(&anonymized, url, colon_ptr - url);
     -+
     -+	while (colon_ptr < at_ptr) {
     -+		strbuf_addch(&anonymized, '*');
     -+		colon_ptr++;
     -+	}
     -+
     -+	strbuf_addstr(&anonymized, at_ptr);
     ++	strbuf_addstr(&redacted, url);
     ++	strbuf_splice(&redacted, colon_ptr - url, at_ptr - colon_ptr,
     ++		      "<redacted>", 10);
      +
     -+	if (!strcasecmp("warn", value))
     -+		warning(_("URL '%s' uses plaintext credentials"), anonymized.buf);
     -+	if (!strcasecmp("die", value))
     -+		die(_("URL '%s' uses plaintext credentials"), anonymized.buf);
     ++	if (!strcmp("warn", value))
     ++		warning(_("URL '%s' uses plaintext credentials"), redacted.buf);
     ++	if (!strcmp("die", value))
     ++		die(_("URL '%s' uses plaintext credentials"), redacted.buf);
      +
     -+cleanup:
     -+	free(value);
     -+	strbuf_release(&anonymized);
     ++	strbuf_release(&redacted);
      +}
      +
       static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)


 Documentation/config/fetch.txt | 14 +++++++++
 t/t5601-clone.sh               | 12 ++++++++
 urlmatch.c                     | 56 ++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)


base-commit: f9b95943b68b6b8ca5a6072f50a08411c6449b55

Comments

Ævar Arnfjörð Bjarmason May 27, 2022, 2:22 p.m. UTC | #1
On Fri, May 27 2022, Derrick Stolee via GitGitGadget wrote:

> From: Derrick Stolee <derrickstolee@github.com>

Just real quick, I hadn't taken notice of this before (the rest looks
good at a glance):

> +	/*
> +	 * Let's do some defensive programming to ensure the given
> +	 * URL is of the proper format.
> +	 */
> +	if (!colon_ptr)
> +		BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX,
> +		    url, (uintmax_t) scheme_len);
> +	if (colon_ptr > at_ptr)
> +		BUG("input url '%s' does not include credentials",
> +		    url);

So the function is renamed to detected_credentials_in_url(), so as a nit
I'd expect some verb like "strip", "redact" or whatever inthe name or
whatever, to make it clear what we're doing.

But since the only caller here below...

> +
> +	/* Include the colon when creating the redacted URL. */
> +	colon_ptr++;
> +	strbuf_addstr(&redacted, url);
> +	strbuf_splice(&redacted, colon_ptr - url, at_ptr - colon_ptr,
> +		      "<redacted>", 10);
> +
> +	if (!strcmp("warn", value))
> +		warning(_("URL '%s' uses plaintext credentials"), redacted.buf);
> +	if (!strcmp("die", value))
> +		die(_("URL '%s' uses plaintext credentials"), redacted.buf);
> +
> +	strbuf_release(&redacted);
> +}
> +
>  static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
>  {
>  	/*
> @@ -144,6 +198,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
>  	 */
>  
>  	size_t url_len = strlen(url);
> +	const char *orig_url = url;
>  	struct strbuf norm;
>  	size_t spanned;
>  	size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0;
> @@ -191,6 +246,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al
>  			}
>  			colon_ptr = strchr(norm.buf + scheme_len + 3, ':');
>  			if (colon_ptr) {
> +				detected_credentials_in_url(orig_url, scheme_len);

Has already done the work of finding the colon_ptr (and at_ptr) why
re-do that paranoia since we have a static function, we could just pass
the two pointers we found already to strbuf_splice().

This also seems really close to something we could just add to strbuf.c
as e.g a strbuf_splice_to(). I.e. just:
	
	int strbuf_splice_to(const struct strbuf *in, struct strbuf *sb,
			     size_t pos, size_t len,
			     const void *data, size_t data_len);
	
Which would be used as:
	
	struct strbuf sb = STRBUF_INIT;
	if (!strbuf_splice_to(url, &redacted, /* same as strbuf_splice(...) */))
		warn("oh noes a password in %s", sb.buf);
	else
		warn("have no password in %s, no replacement done", url->buf);

Which re earlier talk of sharing an implementation with the other
<redacted> code looks like it could be dropped into the relevant part of
pkt-line.c.

But maybe that's all going overboard :)
Derrick Stolee May 27, 2022, 2:43 p.m. UTC | #2
On 5/27/2022 10:22 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, May 27 2022, Derrick Stolee via GitGitGadget wrote:
> 
>> From: Derrick Stolee <derrickstolee@github.com>
> 
> Just real quick, I hadn't taken notice of this before (the rest looks
> good at a glance):
> 
>> +	/*
>> +	 * Let's do some defensive programming to ensure the given
>> +	 * URL is of the proper format.
>> +	 */
>> +	if (!colon_ptr)
>> +		BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX,
>> +		    url, (uintmax_t) scheme_len);
>> +	if (colon_ptr > at_ptr)
>> +		BUG("input url '%s' does not include credentials",
>> +		    url);
> 
> So the function is renamed to detected_credentials_in_url(), so as a nit
> I'd expect some verb like "strip", "redact" or whatever inthe name or
> whatever, to make it clear what we're doing.

The name means "Do what needs to be done when creds are detected
in a URL".

If we decide to change the response to creds in a URL, then we
would change the implementation without changing the name.

>>  			colon_ptr = strchr(norm.buf + scheme_len + 3, ':');
>>  			if (colon_ptr) {
>> +				detected_credentials_in_url(orig_url, scheme_len);
> 
> Has already done the work of finding the colon_ptr (and at_ptr) why
> re-do that paranoia since we have a static function,

Unfortunately, at this point 'url' is not equal to 'orig_url' and
'colon_ptr' isn't even within the 'orig_url' string. It's been
copied and mutated. It was my first instinct to share as much as
possible, which is why 'schema' is reused.

Thanks,
-Stolee
Junio C Hamano May 27, 2022, 6:09 p.m. UTC | #3
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
> index cd65d236b43..7fd3ea89f5d 100644
> --- a/Documentation/config/fetch.txt
> +++ b/Documentation/config/fetch.txt
> @@ -96,3 +96,17 @@ fetch.writeCommitGraph::
>  	merge and the write may take longer. Having an updated commit-graph
>  	file helps performance of many Git commands, including `git merge-base`,
>  	`git push -f`, and `git log --graph`. Defaults to false.
> +
> +fetch.credentialsInUrl::
> +	A URL can contain plaintext credentials in the form
> +	`protocol://<user>:<password>@domain/path`. Using such URLs is not

In the above, 'protocol', 'domain' and 'path' are all placeholders,
just like <user> and <password> are, and it made me wonder if we
should be consistent.  Enclosing all placeholders in <angle-braket>
pairs would make the resulting text too loud, so it may make sense
to drop these highlights around user and password.

That makes it consistent with what git-credential-store.txt,
git-svn.txt, gitfaq.txt and urls.txt do

    git-credential-store.txt:https://user:pass@example.com
    git-svn.txt:	the URL, e.g. `svn+ssh://foo@svn.bar.com/project`
    gitfaq.txt:$ echo url=https://author@git.example.org | git credential reject
    gitfaq.txt:	https://author@git.example.org/org1/project1.git and
    gitfaq.txt:	https://committer@git.example.org/org2/project2.git.  This way, when you
    gitfaq.txt:	origin https://author@git.example.org/org1/project1.git` (see


> +	recommended as it exposes the password in multiple ways, including
> +	Git storing the URL as plaintext in the repository config. The
> +	`fetch.credentialsInUrl` option provides instruction for how Git
> +	should react to seeing such a URL, with these values:
> ++
> +* `allow` (default): Git will proceed with its activity without warning.
> +* `warn`: Git will write a warning message to `stderr` when parsing a URL
> +  with a plaintext credential.
> +* `die`: Git will write a failure message to `stderr` when parsing a URL
> +  with a plaintext credential.

Good.

> diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
> index 4a61f2c901e..387da74d175 100755
> --- a/t/t5601-clone.sh
> +++ b/t/t5601-clone.sh
> @@ -71,6 +71,18 @@ test_expect_success 'clone respects GIT_WORK_TREE' '
>  
>  '
>  
> +test_expect_success 'clone warns or fails when using username:password' '
> +	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username:password@localhost attempt1 2>err &&
> +	grep "warning: URL '\''https://username:<redacted>@localhost/'\'' uses plaintext credentials" err &&
> +	test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err &&
> +	grep "fatal: URL '\''https://username:<redacted>@localhost/'\'' uses plaintext credentials" err
> +'
> +
> +test_expect_success 'clone does not detect username:password when it is https://username@domain:port/' '
> +	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username@localhost:8080 attempt3 2>err &&
> +	! grep "uses plaintext credentials" err
> +'

Could we have one negative test here, too?  I.e. with an explicit -c
fetch.credentialsInUrl=allow given, no credential-in-URL errors and
warnings should be issued.

> diff --git a/urlmatch.c b/urlmatch.c
> index b615adc923a..16beda37a3a 100644
> --- a/urlmatch.c
> +++ b/urlmatch.c
> @@ -1,5 +1,6 @@
>  #include "cache.h"
>  #include "urlmatch.h"
> +#include "config.h"
>  
>  #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
>  #define URL_DIGIT "0123456789"
> @@ -106,6 +107,59 @@ static int match_host(const struct url_info *url_info,
>  	return (!url_len && !pat_len);
>  }
>  
> +/*
> + * Call this method when we have detected credentials within the 'url' in
> + * the form
> + *
> + *     scheme://username:password@domain[:port][/path]
> + *
> + * The 'scheme_len' value should be equal to the string length of the
> + * "scheme://" portion of the URL.

"scheme" is probably more technically correct here, even though in
the documentation that faces the end-users, "protocol" would be more
widely understood, so I think this inconsistency in a single patch
across paths is a good thing ;-)

> + * The fetch.credentialsInUrl config indicates what to do on such a URL,
> + * either ignoring, warning, or die()ing. The latter two modes write a
> + * redacted URL to stderr.
> + */
> +static void detected_credentials_in_url(const char *url, size_t scheme_len)
> +{
> +	const char *value;
> +	const char *at_ptr;
> +	const char *colon_ptr;
> +	struct strbuf redacted = STRBUF_INIT;
> +
> +	/* "allow" is the default behavior. */
> +	if (git_config_get_string_tmp("fetch.credentialsinurl", &value) ||
> +	    !strcmp("allow", value))
> +		return;

OK.

> +	at_ptr = strchr(url, '@');
> +	colon_ptr = strchr(url + scheme_len + 3, ':');

I am debating myself if we should explain "+ 3" that counts "://" in
a comment, but it probably is trivial to see.

> +	/*
> +	 * Let's do some defensive programming to ensure the given
> +	 * URL is of the proper format.
> +	 */
> +	if (!colon_ptr)
> +		BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX,
> +		    url, (uintmax_t) scheme_len);
> +	if (colon_ptr > at_ptr)
> +		BUG("input url '%s' does not include credentials",
> +		    url);

OK.  The caller shouldn't have called us in these two cases.

> +	/* Include the colon when creating the redacted URL. */
> +	colon_ptr++;
> +	strbuf_addstr(&redacted, url);
> +	strbuf_splice(&redacted, colon_ptr - url, at_ptr - colon_ptr,
> +		      "<redacted>", 10);
> +
> +	if (!strcmp("warn", value))
> +		warning(_("URL '%s' uses plaintext credentials"), redacted.buf);
> +	if (!strcmp("die", value))
> +		die(_("URL '%s' uses plaintext credentials"), redacted.buf);
> +
> +	strbuf_release(&redacted);

OK.  Let's look at the caller.

>  			if (colon_ptr) {
> +				detected_credentials_in_url(orig_url, scheme_len);
>  				passwd_off = (colon_ptr + 1) - norm.buf;
>  				passwd_len = norm.len - passwd_off;
>  				user_len = (passwd_off - 1) - (scheme_len + 3);

The caller apparently knows where the password and username are at
this point.  It is unclear which string these offsets are pointing
into at this point, but at the end of the function, we have the
offset and length of user and passwd that point into result, all in
our local variables, even when out_info is NULL.

	result = strbuf_detach(&norm, &result_len);
	if (out_info) {
		out_info->url = result;
		out_info->err = NULL;
		out_info->url_len = result_len;
		out_info->scheme_len = scheme_len;
		out_info->user_off = user_off;
		out_info->user_len = user_len;
		out_info->passwd_off = passwd_off;
		out_info->passwd_len = passwd_len;
		out_info->host_off = host_off;
		out_info->host_len = host_len;
		out_info->port_off = port_off;
		out_info->port_len = port_len;
		out_info->path_off = path_off;
		out_info->path_len = path_len;
	}
	return result;

It does make me wonder if we want another parser in the new
function.  Wouldn't it be easier to manage if we inserted a
call to 

	if (passwd_off)
		apply_fetch_credentials_in_url(result,
        			user_off, user_len, passwd_off, passwd_len);

just after the strbuf_detach() we see near the end of the function, where
apply_fetch_credentials_in_url() only reads the configuration and
calls warning or die as appropriate?
Junio C Hamano May 27, 2022, 6:40 p.m. UTC | #4
Junio C Hamano <gitster@pobox.com> writes:

> It does make me wonder if we want another parser in the new
> function.  Wouldn't it be easier to manage if we inserted a
> call to 
>
> 	if (passwd_off)
> 		apply_fetch_credentials_in_url(result,
>         			user_off, user_len, passwd_off, passwd_len);
>
> just after the strbuf_detach() we see near the end of the function, where
> apply_fetch_credentials_in_url() only reads the configuration and
> calls warning or die as appropriate?

The new function does not even have to take user_off and user_len,
and work only with the while string with <ofs,len> pair for the
password, I think, as that is the only thing it redacts in the
output.  Sorry about the noise.
Junio C Hamano May 30, 2022, 12:16 a.m. UTC | #5
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> Create a new "fetch.credentialsInUrl" config option and teach Git to
> warn or die when seeing a URL with this kind of information. The warning
> anonymizes the sensitive information of the URL to be clear about the
> issue.
>
> This change currently defaults the behavior to "allow" which does
> nothing with these URLs. We can consider changing this behavior to
> "warn" by default if we wish. At that time, we may want to add some
> advice about setting fetch.credentialsInUrl=ignore for users who still
> want to follow this pattern (and not receive the warning).

Can we make this die in a bit more controlled way?

e.g. https://github.com/git/git/runs/6646450422 seems to show that
depending on the timing, the call to die() on the "git clone" side
may cause us stop reading early enough to kill the other side with
SIGPIPE.  The nicely prepared warning message seems to be lost.
Derrick Stolee May 31, 2022, 1:32 p.m. UTC | #6
On 5/29/2022 8:16 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> Create a new "fetch.credentialsInUrl" config option and teach Git to
>> warn or die when seeing a URL with this kind of information. The warning
>> anonymizes the sensitive information of the URL to be clear about the
>> issue.
>>
>> This change currently defaults the behavior to "allow" which does
>> nothing with these URLs. We can consider changing this behavior to
>> "warn" by default if we wish. At that time, we may want to add some
>> advice about setting fetch.credentialsInUrl=ignore for users who still
>> want to follow this pattern (and not receive the warning).
> 
> Can we make this die in a bit more controlled way?
> 
> e.g. https://github.com/git/git/runs/6646450422 seems to show that
> depending on the timing, the call to die() on the "git clone" side
> may cause us stop reading early enough to kill the other side with
> SIGPIPE.  The nicely prepared warning message seems to be lost.

Thanks for pointing this out. It took a while for me to reproduce
this with --stress, but I can get it to happen on my machine.

Investigating now.

Thanks,
-Stolee
diff mbox series

Patch

diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt
index cd65d236b43..7fd3ea89f5d 100644
--- a/Documentation/config/fetch.txt
+++ b/Documentation/config/fetch.txt
@@ -96,3 +96,17 @@  fetch.writeCommitGraph::
 	merge and the write may take longer. Having an updated commit-graph
 	file helps performance of many Git commands, including `git merge-base`,
 	`git push -f`, and `git log --graph`. Defaults to false.
+
+fetch.credentialsInUrl::
+	A URL can contain plaintext credentials in the form
+	`protocol://<user>:<password>@domain/path`. Using such URLs is not
+	recommended as it exposes the password in multiple ways, including
+	Git storing the URL as plaintext in the repository config. The
+	`fetch.credentialsInUrl` option provides instruction for how Git
+	should react to seeing such a URL, with these values:
++
+* `allow` (default): Git will proceed with its activity without warning.
+* `warn`: Git will write a warning message to `stderr` when parsing a URL
+  with a plaintext credential.
+* `die`: Git will write a failure message to `stderr` when parsing a URL
+  with a plaintext credential.
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 4a61f2c901e..387da74d175 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -71,6 +71,18 @@  test_expect_success 'clone respects GIT_WORK_TREE' '
 
 '
 
+test_expect_success 'clone warns or fails when using username:password' '
+	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username:password@localhost attempt1 2>err &&
+	grep "warning: URL '\''https://username:<redacted>@localhost/'\'' uses plaintext credentials" err &&
+	test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err &&
+	grep "fatal: URL '\''https://username:<redacted>@localhost/'\'' uses plaintext credentials" err
+'
+
+test_expect_success 'clone does not detect username:password when it is https://username@domain:port/' '
+	test_must_fail git -c fetch.credentialsInUrl=warn clone https://username@localhost:8080 attempt3 2>err &&
+	! grep "uses plaintext credentials" err
+'
+
 test_expect_success 'clone from hooks' '
 
 	test_create_repo r0 &&
diff --git a/urlmatch.c b/urlmatch.c
index b615adc923a..16beda37a3a 100644
--- a/urlmatch.c
+++ b/urlmatch.c
@@ -1,5 +1,6 @@ 
 #include "cache.h"
 #include "urlmatch.h"
+#include "config.h"
 
 #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
 #define URL_DIGIT "0123456789"
@@ -106,6 +107,59 @@  static int match_host(const struct url_info *url_info,
 	return (!url_len && !pat_len);
 }
 
+/*
+ * Call this method when we have detected credentials within the 'url' in
+ * the form
+ *
+ *     scheme://username:password@domain[:port][/path]
+ *
+ * The 'scheme_len' value should be equal to the string length of the
+ * "scheme://" portion of the URL.
+ *
+ * The fetch.credentialsInUrl config indicates what to do on such a URL,
+ * either ignoring, warning, or die()ing. The latter two modes write a
+ * redacted URL to stderr.
+ */
+static void detected_credentials_in_url(const char *url, size_t scheme_len)
+{
+	const char *value;
+	const char *at_ptr;
+	const char *colon_ptr;
+	struct strbuf redacted = STRBUF_INIT;
+
+	/* "allow" is the default behavior. */
+	if (git_config_get_string_tmp("fetch.credentialsinurl", &value) ||
+	    !strcmp("allow", value))
+		return;
+
+	at_ptr = strchr(url, '@');
+	colon_ptr = strchr(url + scheme_len + 3, ':');
+
+	/*
+	 * Let's do some defensive programming to ensure the given
+	 * URL is of the proper format.
+	 */
+	if (!colon_ptr)
+		BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX,
+		    url, (uintmax_t) scheme_len);
+	if (colon_ptr > at_ptr)
+		BUG("input url '%s' does not include credentials",
+		    url);
+
+	/* Include the colon when creating the redacted URL. */
+	colon_ptr++;
+	strbuf_addstr(&redacted, url);
+	strbuf_splice(&redacted, colon_ptr - url, at_ptr - colon_ptr,
+		      "<redacted>", 10);
+
+	if (!strcmp("warn", value))
+		warning(_("URL '%s' uses plaintext credentials"), redacted.buf);
+	if (!strcmp("die", value))
+		die(_("URL '%s' uses plaintext credentials"), redacted.buf);
+
+	strbuf_release(&redacted);
+}
+
 static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs)
 {
 	/*
@@ -144,6 +198,7 @@  static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 	 */
 
 	size_t url_len = strlen(url);
+	const char *orig_url = url;
 	struct strbuf norm;
 	size_t spanned;
 	size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0;
@@ -191,6 +246,7 @@  static char *url_normalize_1(const char *url, struct url_info *out_info, char al
 			}
 			colon_ptr = strchr(norm.buf + scheme_len + 3, ':');
 			if (colon_ptr) {
+				detected_credentials_in_url(orig_url, scheme_len);
 				passwd_off = (colon_ptr + 1) - norm.buf;
 				passwd_len = norm.len - passwd_off;
 				user_len = (passwd_off - 1) - (scheme_len + 3);