Message ID | RFC-patch-1.1-0266485bc6c-20221031T204149Z-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] fetch: stop emitting duplicate transfer.credentialsInUrl=warn warnings | expand |
On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote: > Improve on 6dcbdc0d661 (remote: create fetch.credentialsInUrl config, > 2022-06-06) by fixing the cases where we emit duplicate warnings, > which were: > > * In the same process, as we'd get the same "struct remote *" > again. We could probably save ourselves more work in those scenarios, > but adding a flag to the "struct remote" indicating that we've validated > the URLs will fix those issues. > > * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl") > from "git fetch". For those cases let's pass down the equivalent of a > "-c transfer.credentialsInUrl=allow", since we know that we've already > inspected our remotes in the parent process. > > See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking, > 2022-03-28) for prior use of git_config_push_parameter() for this > purpose, i.e. to push config parameters before invoking a > sub-process. Both make sense. I was skimming the diff before I read the patch message here and was scratching my head at the calls to git_config_push_parameter(). But for crossing process boundaries, that makes sense. Thanks, Taylor
On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote: > * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl") > from "git fetch". For those cases let's pass down the equivalent of a > "-c transfer.credentialsInUrl=allow", since we know that we've already > inspected our remotes in the parent process. > > See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking, > 2022-03-28) for prior use of git_config_push_parameter() for this > purpose, i.e. to push config parameters before invoking a > sub-process. So I guess I don't have any _specific_ thing that goes wrong with this approach, but it really feels sketchy to me. We are lying to sub-processes about the config the user asked for. And again, I see how it works, but I wonder if this kind of approach would ever backfire on us. For example, if transfer.credentialsInUrl=warn ever ended up having any side effects besides the warning, and the sub-processes ended up skipping those effects. I know that's a hypothetical, and probably not even a likely one, but it just gets my spider sense tingling. > > Since it is not actually important how many times Git prints the > > warning/error message, as long as it prints it at least once, let's just > > make the test a bit more lenient and test for the latter instead of the > > former, which works around these CI issues. > > That being said your change is obviously smaller, so if we'd prefer > that in first as a band-aid I'm fine with that. > > But I'd really like to object to the "it is not actually important how > many...", it's crappy UX to spew duplicate output at the user, and we > should avoid it. > > So it does matter, and we get it wrong not just in this but also some > other cases. Yeah, I think it is crappy UX, too. It's just that I think the tests should not _asserting_ the bad behavior. At most, they should tolerate the bad behavior as a band-aid. So I think Dscho's patch is doing the right thing (and I do agree that we should fix the immediate CI pain by adjusting the tests, and letting the user-visible fix proceed independently). -Peff
On Mon, Oct 31 2022, Jeff King wrote: > On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote: > [...] >> That being said your change is obviously smaller, so if we'd prefer >> that in first as a band-aid I'm fine with that. >> >> But I'd really like to object to the "it is not actually important how >> many...", it's crappy UX to spew duplicate output at the user, and we >> should avoid it. >> >> So it does matter, and we get it wrong not just in this but also some >> other cases. > > Yeah, I think it is crappy UX, too. It's just that I think the tests > should not _asserting_ the bad behavior. At most, they should tolerate > the bad behavior as a band-aid. So I think Dscho's patch is doing the > right thing (and I do agree that we should fix the immediate CI pain by > adjusting the tests, and letting the user-visible fix proceed > independently). The tests aren't just asserting the bad behavior, they're also ensuring that it doesn't get worse. 1 warning is ideal, 2-3 is bad, but tolerable, but if we start emitting 500 of these it would be nice to know. I've found a couple of regressions like that in other areas, i.e. where our tests didn't spot some output change because we were selectively grepping things, but where it would have been nice to spot it at the time.
On Mon, Oct 31, 2022 at 10:32:36PM -0400, Jeff King wrote: > On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote: > > > * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl") > > from "git fetch". For those cases let's pass down the equivalent of a > > "-c transfer.credentialsInUrl=allow", since we know that we've already > > inspected our remotes in the parent process. > > > > See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking, > > 2022-03-28) for prior use of git_config_push_parameter() for this > > purpose, i.e. to push config parameters before invoking a > > sub-process. > > So I guess I don't have any _specific_ thing that goes wrong with this > approach, but it really feels sketchy to me. We are lying to > sub-processes about the config the user asked for. And again, I see how > it works, but I wonder if this kind of approach would ever backfire on > us. For example, if transfer.credentialsInUrl=warn ever ended up having > any side effects besides the warning, and the sub-processes ended up > skipping those effects. > > I know that's a hypothetical, and probably not even a likely one, but it > just gets my spider sense tingling. So inherently this is going to be ugly because it's crossing process boundaries. But the more minimal fix I was thinking of would be something like this: diff --git a/remote.c b/remote.c index 60869beebe..af5f95c719 100644 --- a/remote.c +++ b/remote.c @@ -615,6 +615,14 @@ const char *remote_ref_for_branch(struct branch *branch, int for_push) return NULL; } +static int test_and_set_env(const char *var) +{ + int ret = git_env_bool(var, 0); + if (!ret) + setenv(var, "1", 0); + return ret; +} + static void validate_remote_url(struct remote *remote) { int i; @@ -634,6 +642,9 @@ static void validate_remote_url(struct remote *remote) else die(_("unrecognized value transfer.credentialsInUrl: '%s'"), value); + if (test_and_set_env("GIT_CHECKED_CREDENTIALS_IN_URL")) + return; + for (i = 0; i < remote->url_nr; i++) { struct url_info url_info = { 0 }; You can also put it lower in the function, when we actually warn, which saves adding to the environment when there is nothing to warn about (though this way, we avoid doing the redundant work). Compared to munging the config, this seems shorter and simpler, and there's no chance of us ever getting confused between the fake "suppress" value and something the user actually asked for. -Peff
On Tue, Nov 01 2022, Jeff King wrote: > On Mon, Oct 31, 2022 at 10:32:36PM -0400, Jeff King wrote: > >> On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote: >> >> > * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl") >> > from "git fetch". For those cases let's pass down the equivalent of a >> > "-c transfer.credentialsInUrl=allow", since we know that we've already >> > inspected our remotes in the parent process. >> > >> > See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking, >> > 2022-03-28) for prior use of git_config_push_parameter() for this >> > purpose, i.e. to push config parameters before invoking a >> > sub-process. >> >> So I guess I don't have any _specific_ thing that goes wrong with this >> approach, but it really feels sketchy to me. We are lying to >> sub-processes about the config the user asked for. And again, I see how >> it works, but I wonder if this kind of approach would ever backfire on >> us. For example, if transfer.credentialsInUrl=warn ever ended up having >> any side effects besides the warning, and the sub-processes ended up >> skipping those effects. >> >> I know that's a hypothetical, and probably not even a likely one, but it >> just gets my spider sense tingling. > > So inherently this is going to be ugly because it's crossing process > boundaries. But the more minimal fix I was thinking of would be > something like this: > > diff --git a/remote.c b/remote.c > index 60869beebe..af5f95c719 100644 > --- a/remote.c > +++ b/remote.c > @@ -615,6 +615,14 @@ const char *remote_ref_for_branch(struct branch *branch, int for_push) > return NULL; > } > > +static int test_and_set_env(const char *var) > +{ > + int ret = git_env_bool(var, 0); > + if (!ret) > + setenv(var, "1", 0); > + return ret; > +} > + > static void validate_remote_url(struct remote *remote) > { > int i; > @@ -634,6 +642,9 @@ static void validate_remote_url(struct remote *remote) > else > die(_("unrecognized value transfer.credentialsInUrl: '%s'"), value); > > + if (test_and_set_env("GIT_CHECKED_CREDENTIALS_IN_URL")) > + return; > + > for (i = 0; i < remote->url_nr; i++) { > struct url_info url_info = { 0 }; > > > You can also put it lower in the function, when we actually warn, which > saves adding to the environment when there is nothing to warn about > (though this way, we avoid doing the redundant work). > > Compared to munging the config, this seems shorter and simpler, and > there's no chance of us ever getting confused between the fake > "suppress" value and something the user actually asked for. Sure, we can do it with an environment variable, in the end that's all git_config_push_parameter() is doing too. It's just setting things in "GIT_CONFIG_PARAMETERS". And yes, we can set this in the low-level function instead of with git_config_push_parameter() in builtin/*.c as I did. I was aiming for something demonstrably narrow, at the cost of some verbosity. But I don't get how other things being equal you think sticking this in "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS" helps. We already pass config to ourselves like that (and via "-c") in other places. Can you think of a case where these would be different? The only ones I can think of are e.g. because we know about "GIT_CONFIG_PARAMETERS", and not this new custom variable, e.g. in "prepare_other_repo_env()", but those seem like exactly the reason to use the existing variable. I can think of potential pitfalls here, e.g. how does it interact with submodules? That's one reason I submitted it as an RFC, the tests need to be better (with or without this change). E.g. "git ls-remote" is also not covered by the upthread patch. But that's all separate from what the environment variable is named, or if it lives in the config space.
On Tue, Nov 01, 2022 at 04:01:18AM +0100, Ævar Arnfjörð Bjarmason wrote: > > Yeah, I think it is crappy UX, too. It's just that I think the tests > > should not _asserting_ the bad behavior. At most, they should tolerate > > the bad behavior as a band-aid. So I think Dscho's patch is doing the > > right thing (and I do agree that we should fix the immediate CI pain by > > adjusting the tests, and letting the user-visible fix proceed > > independently). > > The tests aren't just asserting the bad behavior, they're also ensuring > that it doesn't get worse. 1 warning is ideal, 2-3 is bad, but > tolerable, but if we start emitting 500 of these it would be nice to > know. I admit that this kind of argument does not sway me. Is it likely that we would suddenly start spewing 500 such warnings? If we did, are there no other tests that would catch it? And even if *that* were the case, would nobody happen to notice it in the meantime either during development or when we queue an affected topic onto 'next' for wider testing? I guess the answer is that it's possible that we'd miss such a regression in all of those above places, but to me it seems extremely unlikely that we'd let such a regression through without noticing. Thanks, Taylor
On Tue, Nov 01, 2022 at 02:07:39PM +0100, Ævar Arnfjörð Bjarmason wrote: > > You can also put it lower in the function, when we actually warn, which > > saves adding to the environment when there is nothing to warn about > > (though this way, we avoid doing the redundant work). > > > > Compared to munging the config, this seems shorter and simpler, and > > there's no chance of us ever getting confused between the fake > > "suppress" value and something the user actually asked for. > > Sure, we can do it with an environment variable, in the end that's all > git_config_push_parameter() is doing too. It's just setting things in > "GIT_CONFIG_PARAMETERS". > > And yes, we can set this in the low-level function instead of with > git_config_push_parameter() in builtin/*.c as I did. I was aiming for > something demonstrably narrow, at the cost of some verbosity. > > But I don't get how other things being equal you think sticking this in > "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS" > helps. I vaguely prefer calling this GIT_CHECKED_CREDENTIALS_IN_URL instead of stuffing it in the configuration. All other things *aren't* equal here, since we're not lying to sub-processes about configuration values set by the user. Instead, we're saying: "regardless of what value the user assigned transfer.credentialsInUrl, we can avoid looking at it because we have already checked for the presence of credentials in the URL". Thanks, Taylor
On Tue, Nov 01 2022, Taylor Blau wrote: > On Tue, Nov 01, 2022 at 02:07:39PM +0100, Ævar Arnfjörð Bjarmason wrote: >> > You can also put it lower in the function, when we actually warn, which >> > saves adding to the environment when there is nothing to warn about >> > (though this way, we avoid doing the redundant work). >> > >> > Compared to munging the config, this seems shorter and simpler, and >> > there's no chance of us ever getting confused between the fake >> > "suppress" value and something the user actually asked for. >> >> Sure, we can do it with an environment variable, in the end that's all >> git_config_push_parameter() is doing too. It's just setting things in >> "GIT_CONFIG_PARAMETERS". >> >> And yes, we can set this in the low-level function instead of with >> git_config_push_parameter() in builtin/*.c as I did. I was aiming for >> something demonstrably narrow, at the cost of some verbosity. >> >> But I don't get how other things being equal you think sticking this in >> "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS" >> helps. > > I vaguely prefer calling this GIT_CHECKED_CREDENTIALS_IN_URL instead of > stuffing it in the configuration.[...] To be clear, I'm asking if there's cases where we think one method or the other produces different results, which I understood Jeff hinting at. > Instead, we're saying: "regardless of what value the user assigned > transfer.credentialsInUrl, we can avoid looking at it because we have > already checked for the presence of credentials in the URL".[...] All > other things *aren't* equal here, since we're not lying to > sub-processes about configuration values set by the user. The user is asking us to warn about storing certain things in config, we know we already warned, so we're looking to flip that value to the "quiet" setting. If you consider that overriding I don't get how doing so via a different environment variable changes anything. It would be doing the same thing, just making it less obvious.
On Tue, Nov 01 2022, Taylor Blau wrote: > On Tue, Nov 01, 2022 at 04:01:18AM +0100, Ævar Arnfjörð Bjarmason wrote: >> > Yeah, I think it is crappy UX, too. It's just that I think the tests >> > should not _asserting_ the bad behavior. At most, they should tolerate >> > the bad behavior as a band-aid. So I think Dscho's patch is doing the >> > right thing (and I do agree that we should fix the immediate CI pain by >> > adjusting the tests, and letting the user-visible fix proceed >> > independently). >> >> The tests aren't just asserting the bad behavior, they're also ensuring >> that it doesn't get worse. 1 warning is ideal, 2-3 is bad, but >> tolerable, but if we start emitting 500 of these it would be nice to >> know. > > I admit that this kind of argument does not sway me. > > Is it likely that we would suddenly start spewing 500 such warnings? If > we did, are there no other tests that would catch it? And even if *that* > were the case, would nobody happen to notice it in the meantime either > during development or when we queue an affected topic onto 'next' for > wider testing? > > I guess the answer is that it's possible that we'd miss such a > regression in all of those above places, but to me it seems extremely > unlikely that we'd let such a regression through without noticing. Literally 500? Probably not, that was hyperbole to make a point, but several, low tens? Yeah, I know of at least a couple in-tree off the top of my head. The point, which I assumed was clear is that we literally wouldn't notice if it were 500, and that sort of thing is a common pattern in our tests. I.e. in most cases we'd ideally test_cmp known output (at least to the extent of assuring ourselves that we're getting it right). Instead we often just grep it, or don't test it at all. Sometimes for a good reason (e.g. the output containing absolute paths), but more often than not for no good reason.
On Tue, Nov 01, 2022 at 11:17:42PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> The tests aren't just asserting the bad behavior, they're also ensuring > >> that it doesn't get worse. 1 warning is ideal, 2-3 is bad, but > >> tolerable, but if we start emitting 500 of these it would be nice to > >> know. > > > > I admit that this kind of argument does not sway me. > > > > Is it likely that we would suddenly start spewing 500 such warnings? If > > we did, are there no other tests that would catch it? And even if *that* > > were the case, would nobody happen to notice it in the meantime either > > during development or when we queue an affected topic onto 'next' for > > wider testing? > > > > I guess the answer is that it's possible that we'd miss such a > > regression in all of those above places, but to me it seems extremely > > unlikely that we'd let such a regression through without noticing. > > Literally 500? Probably not, that was hyperbole to make a point, but > several, low tens? Yeah, I know of at least a couple in-tree off the top > of my head. Still, I find it suspect to assume that a developer working in this area wouldn't notice even a minor regression (say, adding an additional warning). Thanks, Taylor
On Tue, Nov 01, 2022 at 10:57:46PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> Sure, we can do it with an environment variable, in the end that's all > >> git_config_push_parameter() is doing too. It's just setting things in > >> "GIT_CONFIG_PARAMETERS". > >> > >> And yes, we can set this in the low-level function instead of with > >> git_config_push_parameter() in builtin/*.c as I did. I was aiming for > >> something demonstrably narrow, at the cost of some verbosity. > >> > >> But I don't get how other things being equal you think sticking this in > >> "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS" > >> helps. > > > > I vaguely prefer calling this GIT_CHECKED_CREDENTIALS_IN_URL instead of > > stuffing it in the configuration.[...] > > To be clear, I'm asking if there's cases where we think one method or > the other produces different results, which I understood Jeff hinting > at. What I was hinting before was not that I knew of a particular bug in your patch, but that I think the technique of munging GIT_CONFIG_PARAMETERS is fragile in error-prone in the general case, because the sub-programs can't differentiate between the config the user asked for, and what was set by the suppression mechanism. For this variable, there's no need to differentiate between "the user asked us to be quiet" and "the calling program asked us to be quiet", but I could imagine cases where there are subtle distinctions. Imagine if there was a setting for "warn and rewrite the URL". We'd need to change that to "don't warn, but just rewrite the URL", which otherwise is a mode that doesn't need to exist. Keeping it in a separate variable keeps the concerns orthogonal. The code still gets to see what the user actually wants (via the config), but has extra information from the calling process about how noisy/quiet to be. But you mentioned submodules in your other mail. And you're right that the patch I showed doesn't handle that (it would need to add the new variable to local_repo_env). It seems like yours should work because CONFIG_DATA_ENVIRONMENT as part of local_repo_env. But I don't think it actually does; in prepare_other_repo_env(), we retain the variables for config in the environment, so that: git -c foo.bar=whatever fetch will override variables in both the superproject and in submodules. I didn't try it, but I suspect with your patch that a superproject with "warn" and a submodule with "die" (both in their on-disk config files) would misbehave. The superproject process will warn and say "yes, I've checked everything" by munging the in-environment config to "allow". Then the submodule process will see that config, and will override the on-disk setting (in the usual last-one-wins config way). I.e., the problem is that it cannot tell the difference between "the user asked to override this" and the suppression mechanism. -Peff
On Wed, Nov 02 2022, Jeff King wrote: > On Tue, Nov 01, 2022 at 10:57:46PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> >> Sure, we can do it with an environment variable, in the end that's all >> >> git_config_push_parameter() is doing too. It's just setting things in >> >> "GIT_CONFIG_PARAMETERS". >> >> >> >> And yes, we can set this in the low-level function instead of with >> >> git_config_push_parameter() in builtin/*.c as I did. I was aiming for >> >> something demonstrably narrow, at the cost of some verbosity. >> >> >> >> But I don't get how other things being equal you think sticking this in >> >> "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS" >> >> helps. >> > >> > I vaguely prefer calling this GIT_CHECKED_CREDENTIALS_IN_URL instead of >> > stuffing it in the configuration.[...] >> >> To be clear, I'm asking if there's cases where we think one method or >> the other produces different results, which I understood Jeff hinting >> at. > > What I was hinting before was not that I knew of a particular bug in > your patch, but that I think the technique of munging > GIT_CONFIG_PARAMETERS is fragile in error-prone in the general case, > because the sub-programs can't differentiate between the config the user > asked for, and what was set by the suppression mechanism. > > For this variable, there's no need to differentiate between "the user > asked us to be quiet" and "the calling program asked us to be quiet", > but I could imagine cases where there are subtle distinctions. Imagine > if there was a setting for "warn and rewrite the URL". We'd need to > change that to "don't warn, but just rewrite the URL", which otherwise > is a mode that doesn't need to exist. > > Keeping it in a separate variable keeps the concerns orthogonal. The > code still gets to see what the user actually wants (via the config), > but has extra information from the calling process about how noisy/quiet > to be. ... (replied below) ... > But you mentioned submodules in your other mail. And you're right that > the patch I showed doesn't handle that (it would need to add the new > variable to local_repo_env). It seems like yours should work because > CONFIG_DATA_ENVIRONMENT as part of local_repo_env. But I don't think it > actually does; in prepare_other_repo_env(), we retain the variables for > config in the environment, so that: > > git -c foo.bar=whatever fetch > > will override variables in both the superproject and in submodules. Replying to your main point below, just an aside on this: To be clear I'm not saying it would handle it sensibly now, but just that if we're using env vars to communicate to sub-processes then using CONFIG_DATA_ENVIRONMENT seems better to me. Because even if we're getting it wrong now, then surely that's something we're probably getting wrong in more than one place. So e.g. if we set an env var "for ourselves", i.e. "pull->fetch" then we could detect that condition in run_command(), and if we then spot that we're carrying an env variable we set earlier up the chain reset it before we spawn a submodule. Whereas if it's all custom env variables here & there we'll need to add all that special-casing in. > I didn't try it, but I suspect with your patch that a superproject with > "warn" and a submodule with "die" (both in their on-disk config files) > would misbehave. The superproject process will warn and say "yes, I've > checked everything" by munging the in-environment config to "allow". > Then the submodule process will see that config, and will override the > on-disk setting (in the usual last-one-wins config way). I.e., the > problem is that it cannot tell the difference between "the user asked to > override this" and the suppression mechanism. Yes, definitely, and now I see what you're saying. I.e. imagine a chain like this (not actual process chains, but let's go with the example); user config = warn run: pull our config = allow # OK: doesn't "warn" now run: fetch # Not warning, but .... run: pre-fetch hook # BAD: ...oh oh, now a hook is fetching some # entirely unrelated repo run: git pull # OK: Doesn't warn either run: remote-curl (now not warning, otherwise would) # BAD: our "warned already" has infected a # submodule, and we downgrade "die" to "allow" user config = die run: git fetch <submodule> ... But, and maybe I'm still not getting it, but the "use a different env var" isn't actually the important part, i.e. these would behave the same after the initial "warn": -c transfer.credentialsInUrlWarnedAlready=true And: GIT_CHECKED_AND_WARNED_ALREADY=1 But not what I was suggesting: # conflated with a later "die" -c transfer.credentialsInUrl=allow But that only goes for e.g. a "pull" where we "warn" followed by submodule whose config is "die". But all suggested variants of this (mine and yours) are going to get e.g. the case where the submodule also wants "warn". I.e. it's not enough that we're saying "warned already". That gets us past conflating an existing "warn" with a "die", but now we can't tell a submodule "warn" from a superproject "warn". For that we'd basically need to do: -c transfer.$(<make path to .git, or other "unique repo id>).credentialsInUrl=allow Or another similar mechanism, of course the most sane one to be to not be playing this game at all, but to just ferry this state e.g. with "--do-not-warn-about-this-repo" to our own children, which we'd not add to the command-lines when we know we're spawning a hook, or doing the submodule "pull/fetch". Does that all seem right?
On Fri, Nov 04, 2022 at 10:01:00AM +0100, Ævar Arnfjörð Bjarmason wrote: > > But you mentioned submodules in your other mail. And you're right that > > the patch I showed doesn't handle that (it would need to add the new > > variable to local_repo_env). It seems like yours should work because > > CONFIG_DATA_ENVIRONMENT as part of local_repo_env. But I don't think it > > actually does; in prepare_other_repo_env(), we retain the variables for > > config in the environment, so that: > > > > git -c foo.bar=whatever fetch > > > > will override variables in both the superproject and in submodules. > > Replying to your main point below, just an aside on this: > > To be clear I'm not saying it would handle it sensibly now, but just > that if we're using env vars to communicate to sub-processes then using > CONFIG_DATA_ENVIRONMENT seems better to me. > > Because even if we're getting it wrong now, then surely that's something > we're probably getting wrong in more than one place. > > So e.g. if we set an env var "for ourselves", i.e. "pull->fetch" then we > could detect that condition in run_command(), and if we then spot that > we're carrying an env variable we set earlier up the chain reset it > before we spawn a submodule. > > Whereas if it's all custom env variables here & there we'll need to add > all that special-casing in. The idea is that local_repo_env carries that information, and everybody uses it. So yes, it would have to know about the custom env variables, but then any place which enters another repo learns about it "for free". But using CONFIG_DATA_ENVIRONMENT for this doesn't work, because we don't actually clear it when moving between repositories. And in the example I showed, I used a config variable that was specific to this particular problem. But if there were a lot of these, it really could be: GIT_CHECKED_AND_WARNED='remotes some-other-subsytem etc' so that all of them were shoved into one variable, and parsed. But we can do the simplest dumb thing for now, because none of this is a public interface we need to support across versions. We could move from one to the other later. > But, and maybe I'm still not getting it, but the "use a different env > var" isn't actually the important part, i.e. these would behave the > same after the initial "warn": > > -c transfer.credentialsInUrlWarnedAlready=true > > And: > > GIT_CHECKED_AND_WARNED_ALREADY=1 > > But not what I was suggesting: > > # conflated with a later "die" > -c transfer.credentialsInUrl=allow Right. The important thing is not that it's in a different env variable in particular, but that it _isn't_ using the same config variable that the user would set. But once you are not using that same config variable, the question is: do you want it put in with the config data or not? And I think we have seen that putting it with the config data is not going to work. We don't clear those environment variables when moving between repositories. Now obviously if the env-clearing code knew which config variables were to be cleared and which not, it could do so. But why introduce that complexity, when you can just stick all the stuff that should be cleared into a new variable (and again, that can be _one_ variable for all of this stuff). > But all suggested variants of this (mine and yours) are going to get > e.g. the case where the submodule also wants "warn". I.e. it's not > enough that we're saying "warned already". > > That gets us past conflating an existing "warn" with a "die", but now we > can't tell a submodule "warn" from a superproject "warn". > > For that we'd basically need to do: > > -c transfer.$(<make path to .git, or other "unique repo id>).credentialsInUrl=allow > > Or another similar mechanism, of course the most sane one to be to not > be playing this game at all, but to just ferry this state e.g. with > "--do-not-warn-about-this-repo" to our own children, which we'd not add > to the command-lines when we know we're spawning a hook, or doing the > submodule "pull/fetch". > > Does that all seem right? That seems much more complicated. Again, why bother stuffing this extra context information into a config variable, just to work around the fact that config isn't cleared between repositories, when there is already a simple mechanism for clearing state when switching repositories? All that said, the implementation of this whole warning seems really weird to me. If we want to warn about using such a URL, then we should do it at the point of use. From the original commit message, the only reason it was put where it is was to avoid sigpipe when the remote helper dies. The solution there seems to be "have the helper die in a less abrupt way". And TBH, I'm skeptical of the whole feature. I don't see the point in even warning somebody about: git fetch https://username:password@example.com Maybe it's bad practice (though with password stand-ins like restricted-access tokens, it isn't always), but at the point the user has invoked git, there's nothing else bad that happens by doing what they asked for. What _is_ potentially bad is using that URL for cloning, because we then write the credential in plaintext to the config file. And a better solution there is probably to issue a warning, clone anyway, and then omit the password from what we write into config. A follow-on fetch would fail, but that is not any worse than the "die" mode of this current config option. There were even patches we discussed a few years ago: https://lore.kernel.org/git/20190517222031.GA17966@sigill.intra.peff.net/ I think the only really contentious part was the third one, which tried to do so automagic with credential helpers. If we skipped that, then the mode given by patches 1+2 seems strictly better than anything this transfer.credentialsInUrl option provides. -Peff
diff --git a/builtin/clone.c b/builtin/clone.c index 547d6464b3c..da219b74e43 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -903,12 +903,15 @@ int cmd_clone(int argc, const char **argv, const char *prefix) int err = 0, complete_refs_before_fetch = 1; int submodule_progress; int filter_submodules = 0; - + enum credentials_in_url cred_in_url_cfg = get_credentials_in_url(); struct transport_ls_refs_options transport_ls_refs_options = TRANSPORT_LS_REFS_OPTIONS_INIT; packet_trace_identity("clone"); + if (cred_in_url_cfg == CRED_IN_URL_WARN) + git_config_push_parameter("transfer.credentialsInUrl=allow"); + git_config(git_clone_config, NULL); argc = parse_options(argc, argv, prefix, builtin_clone_options, diff --git a/builtin/fetch.c b/builtin/fetch.c index b06e454cbdd..34a10e1927f 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -2110,9 +2110,13 @@ int cmd_fetch(int argc, const char **argv, const char *prefix) struct remote *remote = NULL; int result = 0; int prune_tags_ok = 1; + enum credentials_in_url cred_in_url_cfg = get_credentials_in_url(); packet_trace_identity("fetch"); + if (cred_in_url_cfg == CRED_IN_URL_WARN) + git_config_push_parameter("transfer.credentialsInUrl=allow"); + /* Record the command line for the reflog */ strbuf_addstr(&default_rla, "fetch"); for (i = 1; i < argc; i++) { diff --git a/builtin/push.c b/builtin/push.c index f0329c62a2d..a4890b1586d 100644 --- a/builtin/push.c +++ b/builtin/push.c @@ -576,7 +576,7 @@ int cmd_push(int argc, const char **argv, const char *prefix) struct string_list *push_options; const struct string_list_item *item; struct remote *remote; - + enum credentials_in_url cred_in_url_cfg = get_credentials_in_url(); struct option options[] = { OPT__VERBOSITY(&verbosity), OPT_STRING( 0 , "repo", &repo, N_("repository"), N_("repository")), @@ -619,6 +619,10 @@ int cmd_push(int argc, const char **argv, const char *prefix) }; packet_trace_identity("push"); + + if (cred_in_url_cfg == CRED_IN_URL_WARN) + git_config_push_parameter("transfer.credentialsInUrl=allow"); + git_config(git_push_config, &flags); argc = parse_options(argc, argv, prefix, options, push_usage, 0); push_options = (push_options_cmdline.nr diff --git a/remote.c b/remote.c index 60869beebe7..a35da349629 100644 --- a/remote.c +++ b/remote.c @@ -615,24 +615,42 @@ const char *remote_ref_for_branch(struct branch *branch, int for_push) return NULL; } -static void validate_remote_url(struct remote *remote) +static enum credentials_in_url cred_in_url = CRED_IN_URL_UNKNOWN; +enum credentials_in_url get_credentials_in_url(void) { - int i; + enum credentials_in_url new = CRED_IN_URL_ALLOW; const char *value; - struct strbuf redacted = STRBUF_INIT; - int warn_not_die; + + if (cred_in_url != CRED_IN_URL_UNKNOWN) + return cred_in_url; if (git_config_get_string_tmp("transfer.credentialsinurl", &value)) - return; + goto done; if (!strcmp("warn", value)) - warn_not_die = 1; + new = CRED_IN_URL_WARN; else if (!strcmp("die", value)) - warn_not_die = 0; + new = CRED_IN_URL_DIE; else if (!strcmp("allow", value)) - return; + goto done; else - die(_("unrecognized value transfer.credentialsInUrl: '%s'"), value); + die(_("unrecognized value transfer.credentialsInURL: '%s'"), value); + +done: + cred_in_url = new; + return cred_in_url; +} + +static void validate_remote_url(struct remote *remote) +{ + int i; + struct strbuf redacted = STRBUF_INIT; + enum credentials_in_url cfg = get_credentials_in_url(); + + if (remote->validated_urls) + goto done; + if (cfg == CRED_IN_URL_ALLOW) + goto done; for (i = 0; i < remote->url_nr; i++) { struct url_info url_info = { 0 }; @@ -647,16 +665,25 @@ static void validate_remote_url(struct remote *remote) strbuf_addstr(&redacted, url_info.url + url_info.passwd_off + url_info.passwd_len); - if (warn_not_die) + switch (cfg) { + case CRED_IN_URL_WARN: warning(_("URL '%s' uses plaintext credentials"), redacted.buf); - else + break; + case CRED_IN_URL_DIE: die(_("URL '%s' uses plaintext credentials"), redacted.buf); - -loop_cleanup: + break; + case CRED_IN_URL_ALLOW: + case CRED_IN_URL_UNKNOWN: + BUG("unreachable"); + break; + } + loop_cleanup: free(url_info.url); } strbuf_release(&redacted); +done: + remote->validated_urls = 1; } static struct remote * diff --git a/remote.h b/remote.h index 1c4621b414b..5a2da13b4cb 100644 --- a/remote.h +++ b/remote.h @@ -98,6 +98,8 @@ struct remote { int prune; int prune_tags; + int validated_urls; + /** * The configured helper programs to run on the remote side, for * Git-native protocols. @@ -445,4 +447,16 @@ void apply_push_cas(struct push_cas_option *, struct remote *, struct ref *); char *relative_url(const char *remote_url, const char *url, const char *up_path); +enum credentials_in_url { + CRED_IN_URL_UNKNOWN, + CRED_IN_URL_ALLOW, + CRED_IN_URL_WARN, + CRED_IN_URL_DIE, +}; + +/** + * Get the transfer.credentialsInUrl config setting as an "enum + * credentials_in_url". + */ +enum credentials_in_url get_credentials_in_url(void); #endif diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh index 79dc470c014..d01f5cd349f 100755 --- a/t/t5516-fetch-push.sh +++ b/t/t5516-fetch-push.sh @@ -1860,7 +1860,7 @@ test_expect_success LIBCURL 'fetch warns or fails when using username:password' test_must_fail git -c transfer.credentialsInUrl=warn fetch https://username:password@localhost 2>err && grep "warning: $message" err >warnings && - test_line_count = 3 warnings && + test_line_count = 1 warnings && test_must_fail git -c transfer.credentialsInUrl=die fetch https://username:password@localhost 2>err && grep "fatal: $message" err >warnings && diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh index 45f0803ed4d..37adfd9f83b 100755 --- a/t/t5601-clone.sh +++ b/t/t5601-clone.sh @@ -78,7 +78,7 @@ test_expect_success LIBCURL 'clone warns or fails when using username:password' test_must_fail git -c transfer.credentialsInUrl=warn clone https://username:password@localhost attempt2 2>err && grep "warning: $message" err >warnings && - test_line_count = 2 warnings && + test_line_count = 1 warnings && test_must_fail git -c transfer.credentialsInUrl=die clone https://username:password@localhost attempt3 2>err && grep "fatal: $message" err >warnings &&