Message ID | pull.1237.git.1653329044940.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | urlmatch: create fetch.credentialsInUrl config | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > Create a new "fetch.credentialsInUrl" config option and teach Git to > warn or die when seeing a URL with this kind of information. The warning > anonymizes the sensitive information of the URL to be clear about the > issue. The issue sounds vaguely familiar---I must have heard something similar on this list not in too distant past. > This change currently defaults the behavior to "ignore" which does > nothing with these URLs. We can consider changing this behavior to > "warn" by default if we wish. At that time, we may want to add some > advice about setting fetch.credentialsInUrl=ignore for users who still > want to follow this pattern (and not receive the warning). It sounds more like "pass" than "ignore", the latter of which can be read as "strip" instead of "pass it as-is". The name "warn", and its stronger form "die", both sound right. > ... Running the test suite succeeds except for the > explicit username:password URLs used in t5550-http-fetch-dumb.s and > t5541-http-push-smart.sh. This means that all other tested URLs did not > trigger this logic. We are not testing the form we are not encouraging, in other words ;-). > urlmatch: create fetch.credentialsInUrl config > > This is a modified version of the patch I submitted a while ago [1]. > > Based on the feedback, changing the behavior to fail by default was not > a good approach. Further, the idea to stop storing the credentials in > config and redirect them to a credential manager was already considered > by Peff [2] but not merged. I just peeked [2] and I am not sure why we didn't X-<. The solution there covers "git clone" that records the origin URL but this one would cover URL regardless of where the URL came from---as long as an insecure URL is used, we warn or die, and it is even against the URL that came from the command line. In a sense, I think these are more or less orthogonal. [2]'s "clone can strip the user:pass from the URL it writes to the config, while passing user:pass to the credential API", especially if it is extended to "git remote add", would stop two common avenues that such an insecure URL can go to the configuration file. The approach taken by this patch would complement it to a degree, as long as the user cares. I am not sure if there is a legitimate case where the user does not care, though. For a script, it may be handy if a URL can contain an ever-changing user:pass pair, where the pass is generated by something like s/key, for example, and for such a command line that knowingly have user:pass pair, having to set the configuration to "ignore" may be cumbersome. > +fetch.credentialsInUrl:: > + A URL can contain plaintext credentials in the form > + `protocol://<user>:<password>@domain/path`. Using such URLs is not > + recommended as it exposes the password in multiple ways. The > + `fetch.credentialsInUrl` option provides instruction for how Git > + should react to seeing such a URL, with these values: > ++ > +* `ignore` (default): Git will proceed with its activity without warning. > +* `warn`: Git will write a warning message to `stderr` when parsing a URL > + with a plaintext credential. > +* `die`: Git will write a failure message to `stderr` when parsing a URL > + with a plaintext credential. Sounds sensible (modulo I would suggest "ignore" -> "pass"). > + grep "warning: URL '\''https://username:\*\*\*\*\*\*\*\*@localhost/'\'' uses plaintext credentials" err && Makes sure that the password part is redacted, which is good. > + test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err && > + grep "fatal: URL '\''https://username:\*\*\*\*\*\*\*\*@localhost/'\'' uses plaintext credentials" err Ditto. > diff --git a/urlmatch.c b/urlmatch.c > index b615adc923a..6b91fb648a7 100644 > --- a/urlmatch.c > +++ b/urlmatch.c > @@ -1,5 +1,6 @@ > #include "cache.h" > #include "urlmatch.h" > +#include "config.h" Yuck. Having to do config lookups at this deep a level in the callchain does not look too attractive to me. I am wondering if we can make it the responsibility of the callers to figure out and pass down the settings of the new configuration variable. Offhand I do not think of an easy and clean way to do so (well, "easy" is easy---add one to the list of globals in environment.c; "clean" is the harder part). > #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" > #define URL_DIGIT "0123456789" > @@ -106,6 +107,46 @@ static int match_host(const struct url_info *url_info, > return (!url_len && !pat_len); > } > > +static void detected_credentials_in_url(const char *url, size_t scheme_len) > +{ > + char *value = NULL; > + const char *at_ptr; > + const char *colon_ptr; > + struct strbuf anonymized = STRBUF_INIT; > + > + /* "ignore" is the default behavior. */ > + if (git_config_get_string("fetch.credentialsinurl", &value) || > + !strcasecmp("ignore", value)) > + goto cleanup; > + > + at_ptr = strchr(url, '@'); > + colon_ptr = strchr(url + scheme_len + 3, ':'); We expect that at_ptr would come after colon_ptr (i.e. in "scheme://<u>:<p>@host", no @ exists in <u> or <p> part) and the while() loop below assumes that for redacting. Are we better off if we assert it here, or has the calling parser already rejected such cases? > + if (!colon_ptr) > + BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX, > + url, (uintmax_t) scheme_len); > + > + /* Include everything including the colon. */ > + colon_ptr++; > + strbuf_add(&anonymized, url, colon_ptr - url); > + > + while (colon_ptr < at_ptr) { > + strbuf_addch(&anonymized, '*'); > + colon_ptr++; > + } > + > + strbuf_addstr(&anonymized, at_ptr); > + > + if (!strcasecmp("warn", value)) > + warning(_("URL '%s' uses plaintext credentials"), anonymized.buf); > + if (!strcasecmp("die", value)) > + die(_("URL '%s' uses plaintext credentials"), anonymized.buf); > + > +cleanup: > + free(value); > + strbuf_release(&anonymized); > +} > + So far, looking good. > @@ -144,6 +185,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al > */ > > size_t url_len = strlen(url); > + const char *orig_url = url; > struct strbuf norm; > size_t spanned; > size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0; > @@ -191,6 +233,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al > } > colon_ptr = strchr(norm.buf + scheme_len + 3, ':'); > if (colon_ptr) { > + detected_credentials_in_url(orig_url, scheme_len); > passwd_off = (colon_ptr + 1) - norm.buf; > passwd_len = norm.len - passwd_off; > user_len = (passwd_off - 1) - (scheme_len + 3); > > base-commit: f9b95943b68b6b8ca5a6072f50a08411c6449b55 Thanks.
On 5/23/2022 3:06 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> Create a new "fetch.credentialsInUrl" config option and teach Git to >> warn or die when seeing a URL with this kind of information. The warning >> anonymizes the sensitive information of the URL to be clear about the >> issue. > > The issue sounds vaguely familiar---I must have heard something > similar on this list not in too distant past. It certainly felt like not too distant, but [1] was over a year ago! >> This change currently defaults the behavior to "ignore" which does >> nothing with these URLs. We can consider changing this behavior to >> "warn" by default if we wish. At that time, we may want to add some >> advice about setting fetch.credentialsInUrl=ignore for users who still >> want to follow this pattern (and not receive the warning). > > It sounds more like "pass" than "ignore", the latter of which can be > read as "strip" instead of "pass it as-is". Perhaps "allow" would be more clear than all of these options? > The name "warn", and its stronger form "die", both sound right. > >> ... Running the test suite succeeds except for the >> explicit username:password URLs used in t5550-http-fetch-dumb.s and >> t5541-http-push-smart.sh. This means that all other tested URLs did not >> trigger this logic. > > We are not testing the form we are not encouraging, in other words ;-). Right, but in addition we are hopefully testing most of the forms we _do_ encourage, and without triggering these warnings. >> urlmatch: create fetch.credentialsInUrl config >> >> This is a modified version of the patch I submitted a while ago [1]. >> >> Based on the feedback, changing the behavior to fail by default was not >> a good approach. Further, the idea to stop storing the credentials in >> config and redirect them to a credential manager was already considered >> by Peff [2] but not merged. > > I just peeked [2] and I am not sure why we didn't X-<. The solution > there covers "git clone" that records the origin URL but this one > would cover URL regardless of where the URL came from---as long as > an insecure URL is used, we warn or die, and it is even against the > URL that came from the command line. The only reason I can guess is that credential helpers were not as commonplace then. Perhaps now is the right time to revisit it with the knowledge that more users have credential helpers for HTTPS URLs (or use SSH with registered public keys). > In a sense, I think these are more or less orthogonal. [2]'s "clone > can strip the user:pass from the URL it writes to the config, while > passing user:pass to the credential API", especially if it is > extended to "git remote add", would stop two common avenues that > such an insecure URL can go to the configuration file. The approach > taken by this patch would complement it to a degree, as long as the > user cares. I agree that these are mostly orthogonal. I think that the parsing logic in urlmatch.c would be involved in the > I am not sure if there is a legitimate case where the user does not > care, though. For a script, it may be handy if a URL can contain an > ever-changing user:pass pair, where the pass is generated by > something like s/key, for example, and for such a command line that > knowingly have user:pass pair, having to set the configuration to > "ignore" may be cumbersome. Would it make sense to check isatty(2) if we make "warn" the default? We could avoid breaking scripts and third-party tools that way. >> diff --git a/urlmatch.c b/urlmatch.c >> index b615adc923a..6b91fb648a7 100644 >> --- a/urlmatch.c >> +++ b/urlmatch.c >> @@ -1,5 +1,6 @@ >> #include "cache.h" >> #include "urlmatch.h" >> +#include "config.h" > > Yuck. Having to do config lookups at this deep a level in the > callchain does not look too attractive to me. > > I am wondering if we can make it the responsibility of the callers > to figure out and pass down the settings of the new configuration > variable. > > Offhand I do not think of an easy and clean way to do so (well, > "easy" is easy---add one to the list of globals in environment.c; > "clean" is the harder part). Even with something like a global in environment.c, what would initialize it? Would we make it be part of the default Git config so it is initialized at the start of every builtin? Or, would we initialize the config the first time we parse a URL. With that in mind, it might be good to have a static enum that stores the parsed config value and uses that immediately instead of calling git_config_get_string() repeatedly. Are there places where we might inspect a huge number of URLs? >> #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" >> #define URL_DIGIT "0123456789" >> @@ -106,6 +107,46 @@ static int match_host(const struct url_info *url_info, >> return (!url_len && !pat_len); >> } >> >> +static void detected_credentials_in_url(const char *url, size_t scheme_len) >> +{ >> + char *value = NULL; >> + const char *at_ptr; >> + const char *colon_ptr; >> + struct strbuf anonymized = STRBUF_INIT; >> + >> + /* "ignore" is the default behavior. */ >> + if (git_config_get_string("fetch.credentialsinurl", &value) || >> + !strcasecmp("ignore", value)) >> + goto cleanup; >> + >> + at_ptr = strchr(url, '@'); >> + colon_ptr = strchr(url + scheme_len + 3, ':'); > > We expect that at_ptr would come after colon_ptr (i.e. in > "scheme://<u>:<p>@host", no @ exists in <u> or <p> part) and the > while() loop below assumes that for redacting. Are we better off if > we assert it here, or has the calling parser already rejected such > cases? This computation of at_ptr matches the one in url_normalize_1(), so it at least agrees about where the "username[:password]" section could be. That does mean that the password cannot contain an "@" symbol (unless it is special-cased somehow?). Thanks, -Stolee
Junio C Hamano <gitster@pobox.com> writes: > It sounds more like "pass" than "ignore", the latter of which can be > read as "strip" instead of "pass it as-is". Or "allow" (which I prefer over "pass"). > The name "warn", and its stronger form "die", both sound right.
Derrick Stolee <derrickstolee@github.com> writes: > This computation of at_ptr matches the one in url_normalize_1(), > so it at least agrees about where the "username[:password]" section > could be. OK. > That does mean that the password cannot contain an "@" > symbol (unless it is special-cased somehow?). I wasn't worried about what is valid but more about what attackers can fool users to throw at "git clone" and make our code misbehave (which can include garbage that would not parse correctly). I think the while() loop will just become a no-op, anonymized buffer is left empty and colon_ptr does not get updated at all. Then strbuf_addstr() after the loop will put everything from '@' to the strbuf to be showed, and none of these should lead to any overrun or exploit. Thanks.
On Mon, May 23 2022, Derrick Stolee via GitGitGadget wrote: > +fetch.credentialsInUrl:: > + A URL can contain plaintext credentials in the form > + `protocol://<user>:<password>@domain/path`. Using such URLs is not > + recommended as it exposes the password in multiple ways. The > + `fetch.credentialsInUrl` option provides instruction for how Git > + should react to seeing such a URL, with these values: Re the previous discussion about this (in the v1 patch / https://lore.kernel.org/git/pull.945.git.1619807844627.gitgitgadget@gmail.com/): In what ways? That's rhetorical, the point being: Let's adjust this documentation to discuss exactly why this is thought to be bad, what we're mitigating for the user etc., are there situations where running git like this is perfectly fine & not thought to be an issue? E.g. no password manager and you trust your FS permission? Let's cover those cases too. > ++ > +* `ignore` (default): Git will proceed with its activity without warning. > +* `warn`: Git will write a warning message to `stderr` when parsing a URL > + with a plaintext credential. > +* `die`: Git will write a failure message to `stderr` when parsing a URL > + with a plaintext credential. You're implementing this with strcasecmp, so we also support DIE, DiE etc. Let's not do that and use strcmp() instead. > diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh > index 4a61f2c901e..34be520b783 100755 > --- a/t/t5601-clone.sh > +++ b/t/t5601-clone.sh > @@ -71,6 +71,13 @@ test_expect_success 'clone respects GIT_WORK_TREE' ' > > ' > > +test_expect_success 'clone warns or fails when using username:password' ' > + test_must_fail git -c fetch.credentialsInUrl=warn clone https://username:password@localhost attempt1 2>err && > + grep "warning: URL '\''https://username:\*\*\*\*\*\*\*\*@localhost/'\'' uses plaintext credentials" err && > + test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err && > + grep "fatal: URL '\''https://username:\*\*\*\*\*\*\*\*@localhost/'\'' uses plaintext credentials" err > +' > + > test_expect_success 'clone from hooks' ' > > test_create_repo r0 && > diff --git a/urlmatch.c b/urlmatch.c > index b615adc923a..6b91fb648a7 100644 > --- a/urlmatch.c > +++ b/urlmatch.c > @@ -1,5 +1,6 @@ > #include "cache.h" > #include "urlmatch.h" > +#include "config.h" > > #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" > #define URL_DIGIT "0123456789" > @@ -106,6 +107,46 @@ static int match_host(const struct url_info *url_info, > return (!url_len && !pat_len); > } > > +static void detected_credentials_in_url(const char *url, size_t scheme_len) > +{ Just generally: This is only > + char *value = NULL; This init to NULL should be removedd, as we.... > + const char *at_ptr; > + const char *colon_ptr; > + struct strbuf anonymized = STRBUF_INIT; nit: Just call this "sb"? The's at least one line below over 79 characters that's within the bounds with a shorter variable name, and in this case it's obvious what we're doing here... > + > + /* "ignore" is the default behavior. */ > + if (git_config_get_string("fetch.credentialsinurl", &value) || ...initialize it here, and if we didn't the compiler would have a chance to spot that if we were getting it wrong. > + !strcasecmp("ignore", value)) > + goto cleanup; > + > + at_ptr = strchr(url, '@'); > + colon_ptr = strchr(url + scheme_len + 3, ':'); > + > + if (!colon_ptr) > + BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX, > + url, (uintmax_t) scheme_len); > + > + /* Include everything including the colon. */ > + colon_ptr++; > + strbuf_add(&anonymized, url, colon_ptr - url); > + > + while (colon_ptr < at_ptr) { > + strbuf_addch(&anonymized, '*'); > + colon_ptr++; > + } Could we share code with 88e9b1e3fcb (fetch-pack: redact packfile urls in traces, 2021-11-10), or for consistency note this as <redacted> instead of stripping it out, as we do for that related URL-part redaction? > + strbuf_addstr(&anonymized, at_ptr); Maybe not worth it, but I wondered if we couldn't just use curl for this, turns out it has an API for it: https://curl.se/libcurl/c/libcurl-url.html But it's too new for us to rely on unconditionally, but we could add that to git-curl-compat.h and ifdef it, then we'll eventually drop this custom code for ryling on the well-tested library. I think doing that would be worth it, to show future authors that curl can do this, so maybe we can start relying on that eventually... > + if (!strcasecmp("warn", value)) > + warning(_("URL '%s' uses plaintext credentials"), anonymized.buf); > + if (!strcasecmp("die", value)) > + die(_("URL '%s' uses plaintext credentials"), anonymized.buf); > + > +cleanup: > + free(value); I think you can also just use git_config_get_string_tmp() here and avoid the alloc/free. That's safe as long as you're not calling other config API in-between, which you're not.
Hi Stolee, On Mon, 23 May 2022, Derrick Stolee via GitGitGadget wrote: > diff --git a/urlmatch.c b/urlmatch.c > index b615adc923a..6b91fb648a7 100644 > --- a/urlmatch.c > +++ b/urlmatch.c > +static void detected_credentials_in_url(const char *url, size_t scheme_len) > +{ > + char *value = NULL; > + const char *at_ptr; > + const char *colon_ptr; > + struct strbuf anonymized = STRBUF_INIT; > + > + /* "ignore" is the default behavior. */ > + if (git_config_get_string("fetch.credentialsinurl", &value) || > + !strcasecmp("ignore", value)) > + goto cleanup; > + > + at_ptr = strchr(url, '@'); > + colon_ptr = strchr(url + scheme_len + 3, ':'); How certain are we that `url + scheme_len + 3` is still inside the NUL-separated `url`? > + > + if (!colon_ptr) > + BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX, > + url, (uintmax_t) scheme_len); Wouldn't this mean that `https://github.com/git/git` with a `scheme_len` of 5 would hit that `BUG()` code path? Thanks, Dscho > + > + /* Include everything including the colon. */ > + colon_ptr++; > + strbuf_add(&anonymized, url, colon_ptr - url); > + > + while (colon_ptr < at_ptr) { > + strbuf_addch(&anonymized, '*'); > + colon_ptr++; > + } > + > + strbuf_addstr(&anonymized, at_ptr); > + > + if (!strcasecmp("warn", value)) > + warning(_("URL '%s' uses plaintext credentials"), anonymized.buf); > + if (!strcasecmp("die", value)) > + die(_("URL '%s' uses plaintext credentials"), anonymized.buf); > + > +cleanup: > + free(value); > + strbuf_release(&anonymized); > +} > + > static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs) > { > /*
Hi Stolee, On Mon, 23 May 2022, Derrick Stolee wrote: > On 5/23/2022 3:06 PM, Junio C Hamano wrote: > > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > >> +static void detected_credentials_in_url(const char *url, size_t scheme_len) > >> +{ > >> + char *value = NULL; > >> + const char *at_ptr; > >> + const char *colon_ptr; > >> + struct strbuf anonymized = STRBUF_INIT; > >> + > >> + /* "ignore" is the default behavior. */ > >> + if (git_config_get_string("fetch.credentialsinurl", &value) || > >> + !strcasecmp("ignore", value)) > >> + goto cleanup; > >> + > >> + at_ptr = strchr(url, '@'); > >> + colon_ptr = strchr(url + scheme_len + 3, ':'); > > > > We expect that at_ptr would come after colon_ptr (i.e. in > > "scheme://<u>:<p>@host", no @ exists in <u> or <p> part) and the > > while() loop below assumes that for redacting. Careful here. https://me@there.com:9999/ _is_ a valid URL, too. > > Are we better off if we assert it here, or has the calling parser > > already rejected such cases? > > This computation of at_ptr matches the one in url_normalize_1(), > so it at least agrees about where the "username[:password]" section > could be. That does mean that the password cannot contain an "@" > symbol (unless it is special-cased somehow?). The password cannot contain a literal `@`, and neither can the user name. They have to be URL-encoded, via `%40`. Ciao, Dscho
Hi Junio, On Mon, 23 May 2022, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > > > urlmatch: create fetch.credentialsInUrl config > > > > This is a modified version of the patch I submitted a while ago [1]. > > > > Based on the feedback, changing the behavior to fail by default was not > > a good approach. Further, the idea to stop storing the credentials in > > config and redirect them to a credential manager was already considered > > by Peff [2] but not merged. > > I just peeked [2] and I am not sure why we didn't X-<. The solution > there covers "git clone" that records the origin URL but this one > would cover URL regardless of where the URL came from---as long as > an insecure URL is used, we warn or die, and it is even against the > URL that came from the command line. > > In a sense, I think these are more or less orthogonal. [2]'s "clone > can strip the user:pass from the URL it writes to the config, while > passing user:pass to the credential API", especially if it is > extended to "git remote add", would stop two common avenues that > such an insecure URL can go to the configuration file. The approach > taken by this patch would complement it to a degree, as long as the > user cares. > > I am not sure if there is a legitimate case where the user does not > care, though. For a script, it may be handy if a URL can contain an > ever-changing user:pass pair, where the pass is generated by > something like s/key, for example, and for such a command line that > knowingly have user:pass pair, having to set the configuration to > "ignore" may be cumbersome. To provide one data point: a few of Git for Windows' automated builds use the `https://user@pass:host/` form to clone and push, using a Personal Access Token as password (that is of course marked as a secret, read: it will get redacted out of the logs). So yes, there are scripts that rely on Git's current behavior to work. If Git changes behavior, I will have to adjust those build definitions. In this instance, I believe that the benefit of safeguarding Git's users outweighs the burden of having to adjust such scripts/definitions. Ciao, Dscho
On 5/24/2022 4:18 AM, Ævar Arnfjörð Bjarmason wrote: > > On Mon, May 23 2022, Derrick Stolee via GitGitGadget wrote: > >> +fetch.credentialsInUrl:: >> + A URL can contain plaintext credentials in the form >> + `protocol://<user>:<password>@domain/path`. Using such URLs is not >> + recommended as it exposes the password in multiple ways. The >> + `fetch.credentialsInUrl` option provides instruction for how Git >> + should react to seeing such a URL, with these values: > > Re the previous discussion about this (in the v1 patch / > https://lore.kernel.org/git/pull.945.git.1619807844627.gitgitgadget@gmail.com/): > In what ways? > > That's rhetorical, the point being: Let's adjust this documentation to > discuss exactly why this is thought to be bad, what we're mitigating for > the user etc., are there situations where running git like this is > perfectly fine & not thought to be an issue? E.g. no password manager > and you trust your FS permission? Let's cover those cases too. This documentation is not the proper place to tell the user "do this and you can trust your plaintext creds in the filesystem" because that is asking for problems. I'd rather leave a vague warning and let users go against the recommended behavior only after they have done sufficient work to be confident in taking on that risk. >> ++ >> +* `ignore` (default): Git will proceed with its activity without warning. >> +* `warn`: Git will write a warning message to `stderr` when parsing a URL >> + with a plaintext credential. >> +* `die`: Git will write a failure message to `stderr` when parsing a URL >> + with a plaintext credential. > > You're implementing this with strcasecmp, so we also support DIE, DiE > etc. Let's not do that and use strcmp() instead. Sure. >> +static void detected_credentials_in_url(const char *url, size_t scheme_len) >> +{ > > Just generally: This is only Did you intend to say more here? >> + char *value = NULL; > > This init to NULL should be removedd, as we.... > >> + const char *at_ptr; >> + const char *colon_ptr; >> + struct strbuf anonymized = STRBUF_INIT; > > nit: Just call this "sb"? The's at least one line below over 79 > characters that's within the bounds with a shorter variable name, and in > this case it's obvious what we're doing here... I will not change this name to be less descriptive. >> + >> + /* "ignore" is the default behavior. */ >> + if (git_config_get_string("fetch.credentialsinurl", &value) || > > ...initialize it here, and if we didn't the compiler would have a chance > to spot that if we were getting it wrong. We do not necessarily initialize it here. The compiler doesn't notice it and the free(value) below segfaults. >> + !strcasecmp("ignore", value)) >> + goto cleanup; >> + >> + at_ptr = strchr(url, '@'); >> + colon_ptr = strchr(url + scheme_len + 3, ':'); >> + >> + if (!colon_ptr) >> + BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX, >> + url, (uintmax_t) scheme_len); >> + >> + /* Include everything including the colon. */ >> + colon_ptr++; >> + strbuf_add(&anonymized, url, colon_ptr - url); >> + >> + while (colon_ptr < at_ptr) { >> + strbuf_addch(&anonymized, '*'); >> + colon_ptr++; >> + } > > Could we share code with 88e9b1e3fcb (fetch-pack: redact packfile urls > in traces, 2021-11-10), or for consistency note this as <redacted> > instead of stripping it out, as we do for that related URL-part > redaction? I'm happy to replace the asterisks with <redacted>. Otherwise, I don't see anything we can do to share across these methods. Specifically, the test in that commit seems to indicate that the redacted portion is only the packfile name (the $HTTPD_URL is not filtered). >> + strbuf_addstr(&anonymized, at_ptr); > > Maybe not worth it, but I wondered if we couldn't just use curl for > this, turns out it has an API for it: > https://curl.se/libcurl/c/libcurl-url.html > > But it's too new for us to rely on unconditionally, but we could add > that to git-curl-compat.h and ifdef it, then we'll eventually drop this > custom code for ryling on the well-tested library. > > I think doing that would be worth it, to show future authors that curl > can do this, so maybe we can start relying on that eventually... Since we can't rely on it, I'll leave that to another (you, perhaps?) to do that ifdef work. I don't think it's worth it right now. >> + if (!strcasecmp("warn", value)) >> + warning(_("URL '%s' uses plaintext credentials"), anonymized.buf); >> + if (!strcasecmp("die", value)) >> + die(_("URL '%s' uses plaintext credentials"), anonymized.buf); >> + >> +cleanup: >> + free(value); > > I think you can also just use git_config_get_string_tmp() here and avoid > the alloc/free. That's safe as long as you're not calling other config > API in-between, which you're not. OK. And that also avoids the need for initialization you mentioned. Thanks, -Stolee
On 5/24/2022 7:46 AM, Johannes Schindelin wrote: > Hi Stolee, > > On Mon, 23 May 2022, Derrick Stolee wrote: > >> On 5/23/2022 3:06 PM, Junio C Hamano wrote: >>> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: >>> >>>> +static void detected_credentials_in_url(const char *url, size_t scheme_len) >>>> +{ >>>> + char *value = NULL; >>>> + const char *at_ptr; >>>> + const char *colon_ptr; >>>> + struct strbuf anonymized = STRBUF_INIT; >>>> + >>>> + /* "ignore" is the default behavior. */ >>>> + if (git_config_get_string("fetch.credentialsinurl", &value) || >>>> + !strcasecmp("ignore", value)) >>>> + goto cleanup; >>>> + >>>> + at_ptr = strchr(url, '@'); >>>> + colon_ptr = strchr(url + scheme_len + 3, ':'); >>> >>> We expect that at_ptr would come after colon_ptr (i.e. in >>> "scheme://<u>:<p>@host", no @ exists in <u> or <p> part) and the >>> while() loop below assumes that for redacting. > > Careful here. https://me@there.com:9999/ _is_ a valid URL, too. Thanks for checking. The method should not be called unless the password region was already detected. I'll add a BUG() statement and a comment to prevent future callers from providing incorrect URLs. I can also add a test to show that this warning is not output when the only colon is for the port number. >>> Are we better off if we assert it here, or has the calling parser >>> already rejected such cases? >> >> This computation of at_ptr matches the one in url_normalize_1(), >> so it at least agrees about where the "username[:password]" section >> could be. That does mean that the password cannot contain an "@" >> symbol (unless it is special-cased somehow?). > > The password cannot contain a literal `@`, and neither can the user name. > They have to be URL-encoded, via `%40`. Thanks! -Stolee
On 5/24/2022 7:42 AM, Johannes Schindelin wrote: > Hi Stolee, > > On Mon, 23 May 2022, Derrick Stolee via GitGitGadget wrote: >> + at_ptr = strchr(url, '@'); >> + colon_ptr = strchr(url + scheme_len + 3, ':'); > > How certain are we that `url + scheme_len + 3` is still inside the > NUL-separated `url`? I'll update the method comment to make this clear. >> + >> + if (!colon_ptr) >> + BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX, >> + url, (uintmax_t) scheme_len); > > Wouldn't this mean that `https://github.com/git/git` with a `scheme_len` > of 5 would hit that `BUG()` code path? Yes. The method is about what to do once we've detected a URL with a "username:password@" combination after the scheme. I mentioned in a different reply, but I'll make this clear with a comment. Thanks, -Stolee
On Tue, May 24 2022, Derrick Stolee wrote: > On 5/24/2022 4:18 AM, Ævar Arnfjörð Bjarmason wrote: >> >> On Mon, May 23 2022, Derrick Stolee via GitGitGadget wrote: >> >>> +fetch.credentialsInUrl:: >>> + A URL can contain plaintext credentials in the form >>> + `protocol://<user>:<password>@domain/path`. Using such URLs is not >>> + recommended as it exposes the password in multiple ways. The >>> + `fetch.credentialsInUrl` option provides instruction for how Git >>> + should react to seeing such a URL, with these values: >> >> Re the previous discussion about this (in the v1 patch / >> https://lore.kernel.org/git/pull.945.git.1619807844627.gitgitgadget@gmail.com/): >> In what ways? >> >> That's rhetorical, the point being: Let's adjust this documentation to >> discuss exactly why this is thought to be bad, what we're mitigating for >> the user etc., are there situations where running git like this is >> perfectly fine & not thought to be an issue? E.g. no password manager >> and you trust your FS permission? Let's cover those cases too. > > This documentation is not the proper place to tell the user "do this > and you can trust your plaintext creds in the filesystem" because that > is asking for problems. I'd rather leave a vague warning and let users > go against the recommended behavior only after they have done sufficient > work to be confident in taking on that risk. I don't mean that we need to cover the full divergent views on different approaches to local password management, but not leave the user hanging with the rather scary "exposes the password in multiple ways". I.e. if I read that for any software whose implementation I wasn't very familiar with I'd be very afraid, and in git's case for no reason. Does in mean that git has some scary git-specific feature that would expose it. perhaps there's a local log that's unsecured where attempted URLs are logged, or perhaps we send the raw requested URL to the server so it can suggest alternatives for us. We do neither, but even a generally knowledgeable user won't know that about git in particular. Whereas what I think you actually mean and are targeting here is better explained by: Git is careful to avoid exposing passwords in URLs on its own, e.g. they won't be logged in trace2 logs. This setting is intended for those who'd like to discourage (warn) or enforce (die) the use of the password helper infrastructure over hardcoded passwords. All of which I *think* is correct, but maybe I've missed something you know about, as that "in multiple ways" is doing a lot of work. I also wonder if this wouldn't be even more useful if we took some lessons from ssh's book. I.e. per "git config -l --show-origin" we know the original of all config. We could be even more useful (and more aggressive about warning about) cases where we have passwords in config files that we detect don't have restrictive permissions, as OpenSSH does with your private key. Ditto perhaps when the origin is "command line", as we do nothing to hide that from the process list on shared systems (and that would be racy whatever we did). >>> ++ >>> +* `ignore` (default): Git will proceed with its activity without warning. >>> +* `warn`: Git will write a warning message to `stderr` when parsing a URL >>> + with a plaintext credential. >>> +* `die`: Git will write a failure message to `stderr` when parsing a URL >>> + with a plaintext credential. >> >> You're implementing this with strcasecmp, so we also support DIE, DiE >> etc. Let's not do that and use strcmp() instead. > > Sure. > >>> +static void detected_credentials_in_url(const char *url, size_t scheme_len) >>> +{ >> >> Just generally: This is only > > Did you intend to say more here? Probably, but if I did I forgot about it, by now. Sorry. >>> + char *value = NULL; >> >> This init to NULL should be removedd, as we.... >> >>> + const char *at_ptr; >>> + const char *colon_ptr; >>> + struct strbuf anonymized = STRBUF_INIT; >> >> nit: Just call this "sb"? The's at least one line below over 79 >> characters that's within the bounds with a shorter variable name, and in >> this case it's obvious what we're doing here... > > I will not change this name to be less descriptive. Sure, just a suggestion. The other way is to just re-wrap that one line... :) In the end I don't care, "just a nit", but just as one datapoint from reading this code: I find this varibale name in particular to be the polar opposite of descriptive, we're explicitly not anonymizing the URL in this function, since we're not stripping the username part. Wouldn't descriptive be something more like uri_redacted_password or uri_no_password in this case? >>> + >>> + /* "ignore" is the default behavior. */ >>> + if (git_config_get_string("fetch.credentialsinurl", &value) || >> >> ...initialize it here, and if we didn't the compiler would have a chance >> to spot that if we were getting it wrong. > > We do not necessarily initialize it here. The compiler doesn't notice > it and the free(value) below segfaults. Yes, sorry I meant in combination with the *_tmp() variant below... >>> + !strcasecmp("ignore", value)) >>> + goto cleanup; >>> + >>> + at_ptr = strchr(url, '@'); >>> + colon_ptr = strchr(url + scheme_len + 3, ':'); >>> + >>> + if (!colon_ptr) >>> + BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX, >>> + url, (uintmax_t) scheme_len); >>> + >>> + /* Include everything including the colon. */ >>> + colon_ptr++; >>> + strbuf_add(&anonymized, url, colon_ptr - url); >>> + >>> + while (colon_ptr < at_ptr) { >>> + strbuf_addch(&anonymized, '*'); >>> + colon_ptr++; >>> + } >> >> Could we share code with 88e9b1e3fcb (fetch-pack: redact packfile urls >> in traces, 2021-11-10), or for consistency note this as <redacted> >> instead of stripping it out, as we do for that related URL-part >> redaction? > > I'm happy to replace the asterisks with <redacted>. Otherwise, I don't > see anything we can do to share across these methods. Yes, I just meant adding a "<redacted>". I briefly looked at whether it made sense to share the implementation, but I think probably not. I didn't think of this at the time but your implementation also leaks the length of the password, which <redacted> will solve in any case. Just for the implementation: It's slightly more wasteful, but in this case we don't care about performance, so perhaps a strbuf_splice() variant is easier here? I.e. add the full URL, find : and @, then strbuf_splice() it. It gets rid of much of the pointer juggling here & adding things incrementally. > Specifically, > the test in that commit seems to indicate that the redacted portion is > only the packfile name (the $HTTPD_URL is not filtered). By HTTPD_URL it means "the path", i.e. it's the equivalent of stripping CURLUPART_{PATH,QUERY,FRAGMENT}. So a hypothetical shared implementation would just be a matter of searching for the '/' once we're past the (optional) '@', but better to leave it for now. >>> + strbuf_addstr(&anonymized, at_ptr); >> >> Maybe not worth it, but I wondered if we couldn't just use curl for >> this, turns out it has an API for it: >> https://curl.se/libcurl/c/libcurl-url.html >> >> But it's too new for us to rely on unconditionally, but we could add >> that to git-curl-compat.h and ifdef it, then we'll eventually drop this >> custom code for ryling on the well-tested library. >> >> I think doing that would be worth it, to show future authors that curl >> can do this, so maybe we can start relying on that eventually... > > Since we can't rely on it, I'll leave that to another (you, perhaps?) > to do that ifdef work. I don't think it's worth it right now. Yeah, probably not. >>> + if (!strcasecmp("warn", value)) >>> + warning(_("URL '%s' uses plaintext credentials"), anonymized.buf); >>> + if (!strcasecmp("die", value)) >>> + die(_("URL '%s' uses plaintext credentials"), anonymized.buf); >>> + >>> +cleanup: >>> + free(value); >> >> I think you can also just use git_config_get_string_tmp() here and avoid >> the alloc/free. That's safe as long as you're not calling other config >> API in-between, which you're not. > > OK. And that also avoids the need for initialization you mentioned. *nod*
On 5/24/2022 5:01 PM, Ævar Arnfjörð Bjarmason wrote: > > On Tue, May 24 2022, Derrick Stolee wrote: > >> On 5/24/2022 4:18 AM, Ævar Arnfjörð Bjarmason wrote: >>> >>> On Mon, May 23 2022, Derrick Stolee via GitGitGadget wrote: >>> >>>> +fetch.credentialsInUrl:: >>>> + A URL can contain plaintext credentials in the form >>>> + `protocol://<user>:<password>@domain/path`. Using such URLs is not >>>> + recommended as it exposes the password in multiple ways. The >>>> + `fetch.credentialsInUrl` option provides instruction for how Git >>>> + should react to seeing such a URL, with these values: >>> >>> Re the previous discussion about this (in the v1 patch / >>> https://lore.kernel.org/git/pull.945.git.1619807844627.gitgitgadget@gmail.com/): >>> In what ways? >>> >>> That's rhetorical, the point being: Let's adjust this documentation to >>> discuss exactly why this is thought to be bad, what we're mitigating for >>> the user etc., are there situations where running git like this is >>> perfectly fine & not thought to be an issue? E.g. no password manager >>> and you trust your FS permission? Let's cover those cases too. >> >> This documentation is not the proper place to tell the user "do this >> and you can trust your plaintext creds in the filesystem" because that >> is asking for problems. I'd rather leave a vague warning and let users >> go against the recommended behavior only after they have done sufficient >> work to be confident in taking on that risk. > > I don't mean that we need to cover the full divergent views on different > approaches to local password management, but not leave the user hanging > with the rather scary "exposes the password in multiple ways". > > I.e. if I read that for any software whose implementation I wasn't very > familiar with I'd be very afraid, and in git's case for no reason. > > Does in mean that git has some scary git-specific feature that would > expose it. perhaps there's a local log that's unsecured where attempted > URLs are logged, or perhaps we send the raw requested URL to the server > so it can suggest alternatives for us. We do neither, but even a > generally knowledgeable user won't know that about git in particular. > > Whereas what I think you actually mean and are targeting here is better > explained by: > > Git is careful to avoid exposing passwords in URLs on its own, > e.g. they won't be logged in trace2 logs. This setting is intended > for those who'd like to discourage (warn) or enforce (die) the use > of the password helper infrastructure over hardcoded passwords. > > All of which I *think* is correct, but maybe I've missed something you > know about, as that "in multiple ways" is doing a lot of work. > > I also wonder if this wouldn't be even more useful if we took some > lessons from ssh's book. I.e. per "git config -l --show-origin" we know > the original of all config. We could be even more useful (and more > aggressive about warning about) cases where we have passwords in config > files that we detect don't have restrictive permissions, as OpenSSH does > with your private key. > > Ditto perhaps when the origin is "command line", as we do nothing to > hide that from the process list on shared systems (and that would be > racy whatever we did). I tried to be careful about how "it" (being "Using such URLs") can expose the password includes things that are not under Git's responsibility (such as command-line histories and other system-level logs) but I can add a bit about how Git stores the plaintext password in the repository's config. >>>> + char *value = NULL; >>> >>> This init to NULL should be removedd, as we.... >>> >>>> + const char *at_ptr; >>>> + const char *colon_ptr; >>>> + struct strbuf anonymized = STRBUF_INIT; >>> >>> nit: Just call this "sb"? The's at least one line below over 79 >>> characters that's within the bounds with a shorter variable name, and in >>> this case it's obvious what we're doing here... >> >> I will not change this name to be less descriptive. > > Sure, just a suggestion. The other way is to just re-wrap that one > line... :) > > In the end I don't care, "just a nit", but just as one datapoint from > reading this code: I find this varibale name in particular to be the > polar opposite of descriptive, we're explicitly not anonymizing the URL > in this function, since we're not stripping the username part. > > Wouldn't descriptive be something more like uri_redacted_password or > uri_no_password in this case? How about "redacted"? > Just for the implementation: It's slightly more wasteful, but in this > case we don't care about performance, so perhaps a strbuf_splice() > variant is easier here? I.e. add the full URL, find : and @, then > strbuf_splice() it. It gets rid of much of the pointer juggling here & > adding things incrementally. TIL. strbuf_splice() will work perfectly. Thanks. -Stolee
diff --git a/Documentation/config/fetch.txt b/Documentation/config/fetch.txt index cd65d236b43..6aa2a0bec19 100644 --- a/Documentation/config/fetch.txt +++ b/Documentation/config/fetch.txt @@ -96,3 +96,16 @@ fetch.writeCommitGraph:: merge and the write may take longer. Having an updated commit-graph file helps performance of many Git commands, including `git merge-base`, `git push -f`, and `git log --graph`. Defaults to false. + +fetch.credentialsInUrl:: + A URL can contain plaintext credentials in the form + `protocol://<user>:<password>@domain/path`. Using such URLs is not + recommended as it exposes the password in multiple ways. The + `fetch.credentialsInUrl` option provides instruction for how Git + should react to seeing such a URL, with these values: ++ +* `ignore` (default): Git will proceed with its activity without warning. +* `warn`: Git will write a warning message to `stderr` when parsing a URL + with a plaintext credential. +* `die`: Git will write a failure message to `stderr` when parsing a URL + with a plaintext credential. diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 4a61f2c901e..34be520b783 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -71,6 +71,13 @@ test_expect_success 'clone respects GIT_WORK_TREE' ' ' +test_expect_success 'clone warns or fails when using username:password' ' + test_must_fail git -c fetch.credentialsInUrl=warn clone https://username:password@localhost attempt1 2>err && + grep "warning: URL '\''https://username:\*\*\*\*\*\*\*\*@localhost/'\'' uses plaintext credentials" err && + test_must_fail git -c fetch.credentialsInUrl=die clone https://username:password@localhost attempt2 2>err && + grep "fatal: URL '\''https://username:\*\*\*\*\*\*\*\*@localhost/'\'' uses plaintext credentials" err +' + test_expect_success 'clone from hooks' ' test_create_repo r0 && diff --git a/urlmatch.c b/urlmatch.c index b615adc923a..6b91fb648a7 100644 --- a/urlmatch.c +++ b/urlmatch.c @@ -1,5 +1,6 @@ #include "cache.h" #include "urlmatch.h" +#include "config.h" #define URL_ALPHA "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" #define URL_DIGIT "0123456789" @@ -106,6 +107,46 @@ static int match_host(const struct url_info *url_info, return (!url_len && !pat_len); } +static void detected_credentials_in_url(const char *url, size_t scheme_len) +{ + char *value = NULL; + const char *at_ptr; + const char *colon_ptr; + struct strbuf anonymized = STRBUF_INIT; + + /* "ignore" is the default behavior. */ + if (git_config_get_string("fetch.credentialsinurl", &value) || + !strcasecmp("ignore", value)) + goto cleanup; + + at_ptr = strchr(url, '@'); + colon_ptr = strchr(url + scheme_len + 3, ':'); + + if (!colon_ptr) + BUG("failed to find colon in url '%s' with scheme_len %"PRIuMAX, + url, (uintmax_t) scheme_len); + + /* Include everything including the colon. */ + colon_ptr++; + strbuf_add(&anonymized, url, colon_ptr - url); + + while (colon_ptr < at_ptr) { + strbuf_addch(&anonymized, '*'); + colon_ptr++; + } + + strbuf_addstr(&anonymized, at_ptr); + + if (!strcasecmp("warn", value)) + warning(_("URL '%s' uses plaintext credentials"), anonymized.buf); + if (!strcasecmp("die", value)) + die(_("URL '%s' uses plaintext credentials"), anonymized.buf); + +cleanup: + free(value); + strbuf_release(&anonymized); +} + static char *url_normalize_1(const char *url, struct url_info *out_info, char allow_globs) { /* @@ -144,6 +185,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al */ size_t url_len = strlen(url); + const char *orig_url = url; struct strbuf norm; size_t spanned; size_t scheme_len, user_off=0, user_len=0, passwd_off=0, passwd_len=0; @@ -191,6 +233,7 @@ static char *url_normalize_1(const char *url, struct url_info *out_info, char al } colon_ptr = strchr(norm.buf + scheme_len + 3, ':'); if (colon_ptr) { + detected_credentials_in_url(orig_url, scheme_len); passwd_off = (colon_ptr + 1) - norm.buf; passwd_len = norm.len - passwd_off; user_len = (passwd_off - 1) - (scheme_len + 3);