diff mbox series

[RFC] fetch: stop emitting duplicate transfer.credentialsInUrl=warn warnings

Message ID RFC-patch-1.1-0266485bc6c-20221031T204149Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] fetch: stop emitting duplicate transfer.credentialsInUrl=warn warnings | expand

Commit Message

Ævar Arnfjörð Bjarmason Oct. 31, 2022, 8:47 p.m. UTC
Improve on 6dcbdc0d661 (remote: create fetch.credentialsInUrl config,
2022-06-06) by fixing the cases where we emit duplicate warnings,
which were:

 * In the same process, as we'd get the same "struct remote *"
   again. We could probably save ourselves more work in those scenarios,
   but adding a flag to the "struct remote" indicating that we've validated
   the URLs will fix those issues.

 * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl")
   from "git fetch". For those cases let's pass down the equivalent of a
   "-c transfer.credentialsInUrl=allow", since we know that we've already
   inspected our remotes in the parent process.

   See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking,
   2022-03-28) for prior use of git_config_push_parameter() for this
   purpose, i.e. to push config parameters before invoking a
   sub-process.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---

[Grabbing a quote from
https://lore.kernel.org/git/98fa5270cb720f2f038c4bb9571c7d306ff5d759.1667245639.git.gitgitgadget@gmail.com/
for a reply]

> From: Johannes Schindelin <johannes.schindelin@gmx.de>
>
> It is unclear as to _why_, but under certain circumstances the warning
> about credentials being passed as part of the URL seems to be swallowed
> by the `git remote-https` helper in the Windows jobs of Git's CI builds.

I think this should fix the root cause of the issue you're seeing. I
have a larger local branch to fix various issues with this
credentialsInUrl facility that I hadn't gotten around to submitting:

	https://github.com/git/git/compare/master...avar:avar/fix-cred-in-url-warnings-2

This is a cherry-pick of 7/* of that (the first were doc changes,
missing test coverage etc).

> Since it is not actually important how many times Git prints the
> warning/error message, as long as it prints it at least once, let's just
> make the test a bit more lenient and test for the latter instead of the
> former, which works around these CI issues.

That being said your change is obviously smaller, so if we'd prefer
that in first as a band-aid I'm fine with that.

But I'd really like to object to the "it is not actually important how
many...", it's crappy UX to spew duplicate output at the user, and we
should avoid it.

So it does matter, and we get it wrong not just in this but also some
other cases.

 builtin/clone.c       |  5 +++-
 builtin/fetch.c       |  4 ++++
 builtin/push.c        |  6 ++++-
 remote.c              | 53 ++++++++++++++++++++++++++++++++-----------
 remote.h              | 14 ++++++++++++
 t/t5516-fetch-push.sh |  2 +-
 t/t5601-clone.sh      |  2 +-
 7 files changed, 69 insertions(+), 17 deletions(-)

Comments

Taylor Blau Nov. 1, 2022, 1:06 a.m. UTC | #1
On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote:
> Improve on 6dcbdc0d661 (remote: create fetch.credentialsInUrl config,
> 2022-06-06) by fixing the cases where we emit duplicate warnings,
> which were:
>
>  * In the same process, as we'd get the same "struct remote *"
>    again. We could probably save ourselves more work in those scenarios,
>    but adding a flag to the "struct remote" indicating that we've validated
>    the URLs will fix those issues.
>
>  * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl")
>    from "git fetch". For those cases let's pass down the equivalent of a
>    "-c transfer.credentialsInUrl=allow", since we know that we've already
>    inspected our remotes in the parent process.
>
>    See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking,
>    2022-03-28) for prior use of git_config_push_parameter() for this
>    purpose, i.e. to push config parameters before invoking a
>    sub-process.

Both make sense. I was skimming the diff before I read the patch message
here and was scratching my head at the calls to
git_config_push_parameter().

But for crossing process boundaries, that makes sense.

Thanks,
Taylor
Jeff King Nov. 1, 2022, 2:32 a.m. UTC | #2
On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote:

>  * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl")
>    from "git fetch". For those cases let's pass down the equivalent of a
>    "-c transfer.credentialsInUrl=allow", since we know that we've already
>    inspected our remotes in the parent process.
> 
>    See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking,
>    2022-03-28) for prior use of git_config_push_parameter() for this
>    purpose, i.e. to push config parameters before invoking a
>    sub-process.

So I guess I don't have any _specific_ thing that goes wrong with this
approach, but it really feels sketchy to me. We are lying to
sub-processes about the config the user asked for. And again, I see how
it works, but I wonder if this kind of approach would ever backfire on
us. For example, if transfer.credentialsInUrl=warn ever ended up having
any side effects besides the warning, and the sub-processes ended up
skipping those effects.

I know that's a hypothetical, and probably not even a likely one, but it
just gets my spider sense tingling.

> > Since it is not actually important how many times Git prints the
> > warning/error message, as long as it prints it at least once, let's just
> > make the test a bit more lenient and test for the latter instead of the
> > former, which works around these CI issues.
> 
> That being said your change is obviously smaller, so if we'd prefer
> that in first as a band-aid I'm fine with that.
> 
> But I'd really like to object to the "it is not actually important how
> many...", it's crappy UX to spew duplicate output at the user, and we
> should avoid it.
> 
> So it does matter, and we get it wrong not just in this but also some
> other cases.

Yeah, I think it is crappy UX, too. It's just that I think the tests
should not _asserting_ the bad behavior. At most, they should tolerate
the bad behavior as a band-aid. So I think Dscho's patch is doing the
right thing (and I do agree that we should fix the immediate CI pain by
adjusting the tests, and letting the user-visible fix proceed
independently).

-Peff
Ævar Arnfjörð Bjarmason Nov. 1, 2022, 3:01 a.m. UTC | #3
On Mon, Oct 31 2022, Jeff King wrote:

> On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote:
> [...]
>> That being said your change is obviously smaller, so if we'd prefer
>> that in first as a band-aid I'm fine with that.
>> 
>> But I'd really like to object to the "it is not actually important how
>> many...", it's crappy UX to spew duplicate output at the user, and we
>> should avoid it.
>> 
>> So it does matter, and we get it wrong not just in this but also some
>> other cases.
>
> Yeah, I think it is crappy UX, too. It's just that I think the tests
> should not _asserting_ the bad behavior. At most, they should tolerate
> the bad behavior as a band-aid. So I think Dscho's patch is doing the
> right thing (and I do agree that we should fix the immediate CI pain by
> adjusting the tests, and letting the user-visible fix proceed
> independently).

The tests aren't just asserting the bad behavior, they're also ensuring
that it doesn't get worse. 1 warning is ideal, 2-3 is bad, but
tolerable, but if we start emitting 500 of these it would be nice to
know.

I've found a couple of regressions like that in other areas, i.e. where
our tests didn't spot some output change because we were selectively
grepping things, but where it would have been nice to spot it at the
time.
Jeff King Nov. 1, 2022, 9:35 a.m. UTC | #4
On Mon, Oct 31, 2022 at 10:32:36PM -0400, Jeff King wrote:

> On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> >  * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl")
> >    from "git fetch". For those cases let's pass down the equivalent of a
> >    "-c transfer.credentialsInUrl=allow", since we know that we've already
> >    inspected our remotes in the parent process.
> > 
> >    See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking,
> >    2022-03-28) for prior use of git_config_push_parameter() for this
> >    purpose, i.e. to push config parameters before invoking a
> >    sub-process.
> 
> So I guess I don't have any _specific_ thing that goes wrong with this
> approach, but it really feels sketchy to me. We are lying to
> sub-processes about the config the user asked for. And again, I see how
> it works, but I wonder if this kind of approach would ever backfire on
> us. For example, if transfer.credentialsInUrl=warn ever ended up having
> any side effects besides the warning, and the sub-processes ended up
> skipping those effects.
> 
> I know that's a hypothetical, and probably not even a likely one, but it
> just gets my spider sense tingling.

So inherently this is going to be ugly because it's crossing process
boundaries. But the more minimal fix I was thinking of would be
something like this:

diff --git a/remote.c b/remote.c
index 60869beebe..af5f95c719 100644
--- a/remote.c
+++ b/remote.c
@@ -615,6 +615,14 @@ const char *remote_ref_for_branch(struct branch *branch, int for_push)
 	return NULL;
 }
 
+static int test_and_set_env(const char *var)
+{
+	int ret = git_env_bool(var, 0);
+	if (!ret)
+		setenv(var, "1", 0);
+	return ret;
+}
+
 static void validate_remote_url(struct remote *remote)
 {
 	int i;
@@ -634,6 +642,9 @@ static void validate_remote_url(struct remote *remote)
 	else
 		die(_("unrecognized value transfer.credentialsInUrl: '%s'"), value);
 
+	if (test_and_set_env("GIT_CHECKED_CREDENTIALS_IN_URL"))
+		return;
+
 	for (i = 0; i < remote->url_nr; i++) {
 		struct url_info url_info = { 0 };
 

You can also put it lower in the function, when we actually warn, which
saves adding to the environment when there is nothing to warn about
(though this way, we avoid doing the redundant work).

Compared to munging the config, this seems shorter and simpler, and
there's no chance of us ever getting confused between the fake
"suppress" value and something the user actually asked for.

-Peff
Ævar Arnfjörð Bjarmason Nov. 1, 2022, 1:07 p.m. UTC | #5
On Tue, Nov 01 2022, Jeff King wrote:

> On Mon, Oct 31, 2022 at 10:32:36PM -0400, Jeff King wrote:
>
>> On Mon, Oct 31, 2022 at 09:47:08PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> 
>> >  * When we invoke e.g. "git-remote-https" (aka. "git-remote-curl")
>> >    from "git fetch". For those cases let's pass down the equivalent of a
>> >    "-c transfer.credentialsInUrl=allow", since we know that we've already
>> >    inspected our remotes in the parent process.
>> > 
>> >    See 7390f05a3c6 (fetch: after refetch, encourage auto gc repacking,
>> >    2022-03-28) for prior use of git_config_push_parameter() for this
>> >    purpose, i.e. to push config parameters before invoking a
>> >    sub-process.
>> 
>> So I guess I don't have any _specific_ thing that goes wrong with this
>> approach, but it really feels sketchy to me. We are lying to
>> sub-processes about the config the user asked for. And again, I see how
>> it works, but I wonder if this kind of approach would ever backfire on
>> us. For example, if transfer.credentialsInUrl=warn ever ended up having
>> any side effects besides the warning, and the sub-processes ended up
>> skipping those effects.
>> 
>> I know that's a hypothetical, and probably not even a likely one, but it
>> just gets my spider sense tingling.
>
> So inherently this is going to be ugly because it's crossing process
> boundaries. But the more minimal fix I was thinking of would be
> something like this:
>
> diff --git a/remote.c b/remote.c
> index 60869beebe..af5f95c719 100644
> --- a/remote.c
> +++ b/remote.c
> @@ -615,6 +615,14 @@ const char *remote_ref_for_branch(struct branch *branch, int for_push)
>  	return NULL;
>  }
>  
> +static int test_and_set_env(const char *var)
> +{
> +	int ret = git_env_bool(var, 0);
> +	if (!ret)
> +		setenv(var, "1", 0);
> +	return ret;
> +}
> +
>  static void validate_remote_url(struct remote *remote)
>  {
>  	int i;
> @@ -634,6 +642,9 @@ static void validate_remote_url(struct remote *remote)
>  	else
>  		die(_("unrecognized value transfer.credentialsInUrl: '%s'"), value);
>  
> +	if (test_and_set_env("GIT_CHECKED_CREDENTIALS_IN_URL"))
> +		return;
> +
>  	for (i = 0; i < remote->url_nr; i++) {
>  		struct url_info url_info = { 0 };
>  
>
> You can also put it lower in the function, when we actually warn, which
> saves adding to the environment when there is nothing to warn about
> (though this way, we avoid doing the redundant work).
>
> Compared to munging the config, this seems shorter and simpler, and
> there's no chance of us ever getting confused between the fake
> "suppress" value and something the user actually asked for.

Sure, we can do it with an environment variable, in the end that's all
git_config_push_parameter() is doing too. It's just setting things in
"GIT_CONFIG_PARAMETERS".

And yes, we can set this in the low-level function instead of with
git_config_push_parameter() in builtin/*.c as I did. I was aiming for
something demonstrably narrow, at the cost of some verbosity.

But I don't get how other things being equal you think sticking this in
"GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS"
helps.

We already pass config to ourselves like that (and via "-c") in other
places. Can you think of a case where these would be different?

The only ones I can think of are e.g. because we know about
"GIT_CONFIG_PARAMETERS", and not this new custom variable, e.g. in
"prepare_other_repo_env()", but those seem like exactly the reason to
use the existing variable.

I can think of potential pitfalls here, e.g. how does it interact with
submodules? That's one reason I submitted it as an RFC, the tests need
to be better (with or without this change). E.g. "git ls-remote" is also
not covered by the upthread patch.

But that's all separate from what the environment variable is named, or
if it lives in the config space.
Taylor Blau Nov. 1, 2022, 8:54 p.m. UTC | #6
On Tue, Nov 01, 2022 at 04:01:18AM +0100, Ævar Arnfjörð Bjarmason wrote:
> > Yeah, I think it is crappy UX, too. It's just that I think the tests
> > should not _asserting_ the bad behavior. At most, they should tolerate
> > the bad behavior as a band-aid. So I think Dscho's patch is doing the
> > right thing (and I do agree that we should fix the immediate CI pain by
> > adjusting the tests, and letting the user-visible fix proceed
> > independently).
>
> The tests aren't just asserting the bad behavior, they're also ensuring
> that it doesn't get worse. 1 warning is ideal, 2-3 is bad, but
> tolerable, but if we start emitting 500 of these it would be nice to
> know.

I admit that this kind of argument does not sway me.

Is it likely that we would suddenly start spewing 500 such warnings? If
we did, are there no other tests that would catch it? And even if *that*
were the case, would nobody happen to notice it in the meantime either
during development or when we queue an affected topic onto 'next' for
wider testing?

I guess the answer is that it's possible that we'd miss such a
regression in all of those above places, but to me it seems extremely
unlikely that we'd let such a regression through without noticing.

Thanks,
Taylor
Taylor Blau Nov. 1, 2022, 9 p.m. UTC | #7
On Tue, Nov 01, 2022 at 02:07:39PM +0100, Ævar Arnfjörð Bjarmason wrote:
> > You can also put it lower in the function, when we actually warn, which
> > saves adding to the environment when there is nothing to warn about
> > (though this way, we avoid doing the redundant work).
> >
> > Compared to munging the config, this seems shorter and simpler, and
> > there's no chance of us ever getting confused between the fake
> > "suppress" value and something the user actually asked for.
>
> Sure, we can do it with an environment variable, in the end that's all
> git_config_push_parameter() is doing too. It's just setting things in
> "GIT_CONFIG_PARAMETERS".
>
> And yes, we can set this in the low-level function instead of with
> git_config_push_parameter() in builtin/*.c as I did. I was aiming for
> something demonstrably narrow, at the cost of some verbosity.
>
> But I don't get how other things being equal you think sticking this in
> "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS"
> helps.

I vaguely prefer calling this GIT_CHECKED_CREDENTIALS_IN_URL instead of
stuffing it in the configuration. All other things *aren't* equal here,
since we're not lying to sub-processes about configuration values set
by the user.

Instead, we're saying: "regardless of what value the user assigned
transfer.credentialsInUrl, we can avoid looking at it because we have
already checked for the presence of credentials in the URL".

Thanks,
Taylor
Ævar Arnfjörð Bjarmason Nov. 1, 2022, 9:57 p.m. UTC | #8
On Tue, Nov 01 2022, Taylor Blau wrote:

> On Tue, Nov 01, 2022 at 02:07:39PM +0100, Ævar Arnfjörð Bjarmason wrote:
>> > You can also put it lower in the function, when we actually warn, which
>> > saves adding to the environment when there is nothing to warn about
>> > (though this way, we avoid doing the redundant work).
>> >
>> > Compared to munging the config, this seems shorter and simpler, and
>> > there's no chance of us ever getting confused between the fake
>> > "suppress" value and something the user actually asked for.
>>
>> Sure, we can do it with an environment variable, in the end that's all
>> git_config_push_parameter() is doing too. It's just setting things in
>> "GIT_CONFIG_PARAMETERS".
>>
>> And yes, we can set this in the low-level function instead of with
>> git_config_push_parameter() in builtin/*.c as I did. I was aiming for
>> something demonstrably narrow, at the cost of some verbosity.
>>
>> But I don't get how other things being equal you think sticking this in
>> "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS"
>> helps.
>
> I vaguely prefer calling this GIT_CHECKED_CREDENTIALS_IN_URL instead of
> stuffing it in the configuration.[...]

To be clear, I'm asking if there's cases where we think one method or
the other produces different results, which I understood Jeff hinting
at.

> Instead, we're saying: "regardless of what value the user assigned
> transfer.credentialsInUrl, we can avoid looking at it because we have
> already checked for the presence of credentials in the URL".[...] All
> other things *aren't* equal here, since we're not lying to
> sub-processes about configuration values set by the user.

The user is asking us to warn about storing certain things in config, we
know we already warned, so we're looking to flip that value to the
"quiet" setting.

If you consider that overriding I don't get how doing so via a different
environment variable changes anything. It would be doing the same thing,
just making it less obvious.
Ævar Arnfjörð Bjarmason Nov. 1, 2022, 10:17 p.m. UTC | #9
On Tue, Nov 01 2022, Taylor Blau wrote:

> On Tue, Nov 01, 2022 at 04:01:18AM +0100, Ævar Arnfjörð Bjarmason wrote:
>> > Yeah, I think it is crappy UX, too. It's just that I think the tests
>> > should not _asserting_ the bad behavior. At most, they should tolerate
>> > the bad behavior as a band-aid. So I think Dscho's patch is doing the
>> > right thing (and I do agree that we should fix the immediate CI pain by
>> > adjusting the tests, and letting the user-visible fix proceed
>> > independently).
>>
>> The tests aren't just asserting the bad behavior, they're also ensuring
>> that it doesn't get worse. 1 warning is ideal, 2-3 is bad, but
>> tolerable, but if we start emitting 500 of these it would be nice to
>> know.
>
> I admit that this kind of argument does not sway me.
>
> Is it likely that we would suddenly start spewing 500 such warnings? If
> we did, are there no other tests that would catch it? And even if *that*
> were the case, would nobody happen to notice it in the meantime either
> during development or when we queue an affected topic onto 'next' for
> wider testing?
>
> I guess the answer is that it's possible that we'd miss such a
> regression in all of those above places, but to me it seems extremely
> unlikely that we'd let such a regression through without noticing.

Literally 500? Probably not, that was hyperbole to make a point, but
several, low tens? Yeah, I know of at least a couple in-tree off the top
of my head.

The point, which I assumed was clear is that we literally wouldn't
notice if it were 500, and that sort of thing is a common pattern in our
tests. I.e. in most cases we'd ideally test_cmp known output (at least
to the extent of assuring ourselves that we're getting it right).

Instead we often just grep it, or don't test it at all. Sometimes for a
good reason (e.g. the output containing absolute paths), but more often
than not for no good reason.
Taylor Blau Nov. 2, 2022, 12:53 a.m. UTC | #10
On Tue, Nov 01, 2022 at 11:17:42PM +0100, Ævar Arnfjörð Bjarmason wrote:
> >> The tests aren't just asserting the bad behavior, they're also ensuring
> >> that it doesn't get worse. 1 warning is ideal, 2-3 is bad, but
> >> tolerable, but if we start emitting 500 of these it would be nice to
> >> know.
> >
> > I admit that this kind of argument does not sway me.
> >
> > Is it likely that we would suddenly start spewing 500 such warnings? If
> > we did, are there no other tests that would catch it? And even if *that*
> > were the case, would nobody happen to notice it in the meantime either
> > during development or when we queue an affected topic onto 'next' for
> > wider testing?
> >
> > I guess the answer is that it's possible that we'd miss such a
> > regression in all of those above places, but to me it seems extremely
> > unlikely that we'd let such a regression through without noticing.
>
> Literally 500? Probably not, that was hyperbole to make a point, but
> several, low tens? Yeah, I know of at least a couple in-tree off the top
> of my head.

Still, I find it suspect to assume that a developer working in this area
wouldn't notice even a minor regression (say, adding an additional
warning).

Thanks,
Taylor
Jeff King Nov. 2, 2022, 8:19 a.m. UTC | #11
On Tue, Nov 01, 2022 at 10:57:46PM +0100, Ævar Arnfjörð Bjarmason wrote:

> >> Sure, we can do it with an environment variable, in the end that's all
> >> git_config_push_parameter() is doing too. It's just setting things in
> >> "GIT_CONFIG_PARAMETERS".
> >>
> >> And yes, we can set this in the low-level function instead of with
> >> git_config_push_parameter() in builtin/*.c as I did. I was aiming for
> >> something demonstrably narrow, at the cost of some verbosity.
> >>
> >> But I don't get how other things being equal you think sticking this in
> >> "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS"
> >> helps.
> >
> > I vaguely prefer calling this GIT_CHECKED_CREDENTIALS_IN_URL instead of
> > stuffing it in the configuration.[...]
> 
> To be clear, I'm asking if there's cases where we think one method or
> the other produces different results, which I understood Jeff hinting
> at.

What I was hinting before was not that I knew of a particular bug in
your patch, but that I think the technique of munging
GIT_CONFIG_PARAMETERS is fragile in error-prone in the general case,
because the sub-programs can't differentiate between the config the user
asked for, and what was set by the suppression mechanism.

For this variable, there's no need to differentiate between "the user
asked us to be quiet" and "the calling program asked us to be quiet",
but I could imagine cases where there are subtle distinctions. Imagine
if there was a setting for "warn and rewrite the URL". We'd need to
change that to "don't warn, but just rewrite the URL", which otherwise
is a mode that doesn't need to exist.

Keeping it in a separate variable keeps the concerns orthogonal. The
code still gets to see what the user actually wants (via the config),
but has extra information from the calling process about how noisy/quiet
to be.

But you mentioned submodules in your other mail. And you're right that
the patch I showed doesn't handle that (it would need to add the new
variable to local_repo_env). It seems like yours should work because
CONFIG_DATA_ENVIRONMENT as part of local_repo_env. But I don't think it
actually does; in prepare_other_repo_env(), we retain the variables for
config in the environment, so that:

  git -c foo.bar=whatever fetch

will override variables in both the superproject and in submodules.

I didn't try it, but I suspect with your patch that a superproject with
"warn" and a submodule with "die" (both in their on-disk config files)
would misbehave. The superproject process will warn and say "yes, I've
checked everything" by munging the in-environment config to "allow".
Then the submodule process will see that config, and will override the
on-disk setting (in the usual last-one-wins config way). I.e., the
problem is that it cannot tell the difference between "the user asked to
override this" and the suppression mechanism.

-Peff
Ævar Arnfjörð Bjarmason Nov. 4, 2022, 9:01 a.m. UTC | #12
On Wed, Nov 02 2022, Jeff King wrote:

> On Tue, Nov 01, 2022 at 10:57:46PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> >> Sure, we can do it with an environment variable, in the end that's all
>> >> git_config_push_parameter() is doing too. It's just setting things in
>> >> "GIT_CONFIG_PARAMETERS".
>> >>
>> >> And yes, we can set this in the low-level function instead of with
>> >> git_config_push_parameter() in builtin/*.c as I did. I was aiming for
>> >> something demonstrably narrow, at the cost of some verbosity.
>> >>
>> >> But I don't get how other things being equal you think sticking this in
>> >> "GIT_CHECKED_CREDENTIALS_IN_URL" instead of "GIT_CONFIG_PARAMETERS"
>> >> helps.
>> >
>> > I vaguely prefer calling this GIT_CHECKED_CREDENTIALS_IN_URL instead of
>> > stuffing it in the configuration.[...]
>> 
>> To be clear, I'm asking if there's cases where we think one method or
>> the other produces different results, which I understood Jeff hinting
>> at.
>
> What I was hinting before was not that I knew of a particular bug in
> your patch, but that I think the technique of munging
> GIT_CONFIG_PARAMETERS is fragile in error-prone in the general case,
> because the sub-programs can't differentiate between the config the user
> asked for, and what was set by the suppression mechanism.
>
> For this variable, there's no need to differentiate between "the user
> asked us to be quiet" and "the calling program asked us to be quiet",
> but I could imagine cases where there are subtle distinctions. Imagine
> if there was a setting for "warn and rewrite the URL". We'd need to
> change that to "don't warn, but just rewrite the URL", which otherwise
> is a mode that doesn't need to exist.
>
> Keeping it in a separate variable keeps the concerns orthogonal. The
> code still gets to see what the user actually wants (via the config),
> but has extra information from the calling process about how noisy/quiet
> to be.

... (replied below) ...

> But you mentioned submodules in your other mail. And you're right that
> the patch I showed doesn't handle that (it would need to add the new
> variable to local_repo_env). It seems like yours should work because
> CONFIG_DATA_ENVIRONMENT as part of local_repo_env. But I don't think it
> actually does; in prepare_other_repo_env(), we retain the variables for
> config in the environment, so that:
>
>   git -c foo.bar=whatever fetch
>
> will override variables in both the superproject and in submodules.

Replying to your main point below, just an aside on this:

To be clear I'm not saying it would handle it sensibly now, but just
that if we're using env vars to communicate to sub-processes then using
CONFIG_DATA_ENVIRONMENT seems better to me.

Because even if we're getting it wrong now, then surely that's something
we're probably getting wrong in more than one place.

So e.g. if we set an env var "for ourselves", i.e. "pull->fetch" then we
could detect that condition in run_command(), and if we then spot that
we're carrying an env variable we set earlier up the chain reset it
before we spawn a submodule.

Whereas if it's all custom env variables here & there we'll need to add
all that special-casing in.

> I didn't try it, but I suspect with your patch that a superproject with
> "warn" and a submodule with "die" (both in their on-disk config files)
> would misbehave. The superproject process will warn and say "yes, I've
> checked everything" by munging the in-environment config to "allow".
> Then the submodule process will see that config, and will override the
> on-disk setting (in the usual last-one-wins config way). I.e., the
> problem is that it cannot tell the difference between "the user asked to
> override this" and the suppression mechanism.

Yes, definitely, and now I see what you're saying. I.e. imagine a chain
like this (not actual process chains, but let's go with the example);

	user config = warn
	run: pull
	our config = allow
		# OK: doesn't "warn" now
		run: fetch
			# Not warning, but ....
			run: pre-fetch hook
				# BAD: ...oh oh, now a hook is fetching some
                                # entirely unrelated repo
				run: git pull
			# OK: Doesn't warn either
			run: remote-curl (now not warning, otherwise would)
                        # BAD: our "warned already" has infected a
                        # submodule, and we downgrade "die" to "allow"
			user config = die
			run: git fetch <submodule>
				...

But, and maybe I'm still not getting it, but the "use a different env
var" isn't actually the important part, i.e. these would behave the
same after the initial "warn":

	-c transfer.credentialsInUrlWarnedAlready=true

And:

	GIT_CHECKED_AND_WARNED_ALREADY=1

But not what I was suggesting:

	# conflated with a later "die"
	-c transfer.credentialsInUrl=allow

But that only goes for e.g. a "pull" where we "warn" followed by
submodule whose config is "die".

But all suggested variants of this (mine and yours) are going to get
e.g. the case where the submodule also wants "warn". I.e. it's not
enough that we're saying "warned already".

That gets us past conflating an existing "warn" with a "die", but now we
can't tell a submodule "warn" from a superproject "warn".

For that we'd basically need to do:

	-c transfer.$(<make path to .git, or other "unique repo id>).credentialsInUrl=allow

Or another similar mechanism, of course the most sane one to be to not
be playing this game at all, but to just ferry this state e.g. with
"--do-not-warn-about-this-repo" to our own children, which we'd not add
to the command-lines when we know we're spawning a hook, or doing the
submodule "pull/fetch".

Does that all seem right?
Jeff King Nov. 4, 2022, 1:16 p.m. UTC | #13
On Fri, Nov 04, 2022 at 10:01:00AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > But you mentioned submodules in your other mail. And you're right that
> > the patch I showed doesn't handle that (it would need to add the new
> > variable to local_repo_env). It seems like yours should work because
> > CONFIG_DATA_ENVIRONMENT as part of local_repo_env. But I don't think it
> > actually does; in prepare_other_repo_env(), we retain the variables for
> > config in the environment, so that:
> >
> >   git -c foo.bar=whatever fetch
> >
> > will override variables in both the superproject and in submodules.
> 
> Replying to your main point below, just an aside on this:
> 
> To be clear I'm not saying it would handle it sensibly now, but just
> that if we're using env vars to communicate to sub-processes then using
> CONFIG_DATA_ENVIRONMENT seems better to me.
> 
> Because even if we're getting it wrong now, then surely that's something
> we're probably getting wrong in more than one place.
> 
> So e.g. if we set an env var "for ourselves", i.e. "pull->fetch" then we
> could detect that condition in run_command(), and if we then spot that
> we're carrying an env variable we set earlier up the chain reset it
> before we spawn a submodule.
> 
> Whereas if it's all custom env variables here & there we'll need to add
> all that special-casing in.

The idea is that local_repo_env carries that information, and everybody
uses it. So yes, it would have to know about the custom env variables,
but then any place which enters another repo learns about it "for free".
But using CONFIG_DATA_ENVIRONMENT for this doesn't work, because we
don't actually clear it when moving between repositories.

And in the example I showed, I used a config variable that was specific
to this particular problem. But if there were a lot of these, it really
could be:

  GIT_CHECKED_AND_WARNED='remotes some-other-subsytem etc'

so that all of them were shoved into one variable, and parsed. But we
can do the simplest dumb thing for now, because none of this is a public
interface we need to support across versions. We could move from one to
the other later.

> But, and maybe I'm still not getting it, but the "use a different env
> var" isn't actually the important part, i.e. these would behave the
> same after the initial "warn":
> 
> 	-c transfer.credentialsInUrlWarnedAlready=true
> 
> And:
> 
> 	GIT_CHECKED_AND_WARNED_ALREADY=1
> 
> But not what I was suggesting:
> 
> 	# conflated with a later "die"
> 	-c transfer.credentialsInUrl=allow

Right. The important thing is not that it's in a different env variable
in particular, but that it _isn't_ using the same config variable that
the user would set.

But once you are not using that same config variable, the question is:
do you want it put in with the config data or not? And I think we have
seen that putting it with the config data is not going to work. We don't
clear those environment variables when moving between repositories.

Now obviously if the env-clearing code knew which config variables were
to be cleared and which not, it could do so. But why introduce that
complexity, when you can just stick all the stuff that should be cleared
into a new variable (and again, that can be _one_ variable for all of
this stuff).

> But all suggested variants of this (mine and yours) are going to get
> e.g. the case where the submodule also wants "warn". I.e. it's not
> enough that we're saying "warned already".
> 
> That gets us past conflating an existing "warn" with a "die", but now we
> can't tell a submodule "warn" from a superproject "warn".
> 
> For that we'd basically need to do:
> 
> 	-c transfer.$(<make path to .git, or other "unique repo id>).credentialsInUrl=allow
> 
> Or another similar mechanism, of course the most sane one to be to not
> be playing this game at all, but to just ferry this state e.g. with
> "--do-not-warn-about-this-repo" to our own children, which we'd not add
> to the command-lines when we know we're spawning a hook, or doing the
> submodule "pull/fetch".
> 
> Does that all seem right?

That seems much more complicated. Again, why bother stuffing this extra
context information into a config variable, just to work around the fact
that config isn't cleared between repositories, when there is already a
simple mechanism for clearing state when switching repositories?

All that said, the implementation of this whole warning seems really
weird to me. If we want to warn about using such a URL, then we should
do it at the point of use. From the original commit message, the only
reason it was put where it is was to avoid sigpipe when the remote
helper dies. The solution there seems to be "have the helper die in a
less abrupt way".

And TBH, I'm skeptical of the whole feature. I don't see the point in
even warning somebody about:

  git fetch https://username:password@example.com

Maybe it's bad practice (though with password stand-ins like
restricted-access tokens, it isn't always), but at the point the user
has invoked git, there's nothing else bad that happens by doing what
they asked for.

What _is_ potentially bad is using that URL for cloning, because we then
write the credential in plaintext to the config file. And a better
solution there is probably to issue a warning, clone anyway, and then
omit the password from what we write into config. A follow-on fetch
would fail, but that is not any worse than the "die" mode of this
current config option.

There were even patches we discussed a few years ago:

  https://lore.kernel.org/git/20190517222031.GA17966@sigill.intra.peff.net/

I think the only really contentious part was the third one, which tried
to do so automagic with credential helpers. If we skipped that, then the
mode given by patches 1+2 seems strictly better than anything this
transfer.credentialsInUrl option provides.

-Peff
diff mbox series

Patch

diff --git a/builtin/clone.c b/builtin/clone.c
index 547d6464b3c..da219b74e43 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -903,12 +903,15 @@  int cmd_clone(int argc, const char **argv, const char *prefix)
 	int err = 0, complete_refs_before_fetch = 1;
 	int submodule_progress;
 	int filter_submodules = 0;
-
+	enum credentials_in_url cred_in_url_cfg = get_credentials_in_url();
 	struct transport_ls_refs_options transport_ls_refs_options =
 		TRANSPORT_LS_REFS_OPTIONS_INIT;
 
 	packet_trace_identity("clone");
 
+	if (cred_in_url_cfg == CRED_IN_URL_WARN)
+		git_config_push_parameter("transfer.credentialsInUrl=allow");
+
 	git_config(git_clone_config, NULL);
 
 	argc = parse_options(argc, argv, prefix, builtin_clone_options,
diff --git a/builtin/fetch.c b/builtin/fetch.c
index b06e454cbdd..34a10e1927f 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -2110,9 +2110,13 @@  int cmd_fetch(int argc, const char **argv, const char *prefix)
 	struct remote *remote = NULL;
 	int result = 0;
 	int prune_tags_ok = 1;
+	enum credentials_in_url cred_in_url_cfg = get_credentials_in_url();
 
 	packet_trace_identity("fetch");
 
+	if (cred_in_url_cfg == CRED_IN_URL_WARN)
+		git_config_push_parameter("transfer.credentialsInUrl=allow");
+
 	/* Record the command line for the reflog */
 	strbuf_addstr(&default_rla, "fetch");
 	for (i = 1; i < argc; i++) {
diff --git a/builtin/push.c b/builtin/push.c
index f0329c62a2d..a4890b1586d 100644
--- a/builtin/push.c
+++ b/builtin/push.c
@@ -576,7 +576,7 @@  int cmd_push(int argc, const char **argv, const char *prefix)
 	struct string_list *push_options;
 	const struct string_list_item *item;
 	struct remote *remote;
-
+	enum credentials_in_url cred_in_url_cfg = get_credentials_in_url();
 	struct option options[] = {
 		OPT__VERBOSITY(&verbosity),
 		OPT_STRING( 0 , "repo", &repo, N_("repository"), N_("repository")),
@@ -619,6 +619,10 @@  int cmd_push(int argc, const char **argv, const char *prefix)
 	};
 
 	packet_trace_identity("push");
+
+	if (cred_in_url_cfg == CRED_IN_URL_WARN)
+		git_config_push_parameter("transfer.credentialsInUrl=allow");
+
 	git_config(git_push_config, &flags);
 	argc = parse_options(argc, argv, prefix, options, push_usage, 0);
 	push_options = (push_options_cmdline.nr
diff --git a/remote.c b/remote.c
index 60869beebe7..a35da349629 100644
--- a/remote.c
+++ b/remote.c
@@ -615,24 +615,42 @@  const char *remote_ref_for_branch(struct branch *branch, int for_push)
 	return NULL;
 }
 
-static void validate_remote_url(struct remote *remote)
+static enum credentials_in_url cred_in_url = CRED_IN_URL_UNKNOWN;
+enum credentials_in_url get_credentials_in_url(void)
 {
-	int i;
+	enum credentials_in_url new = CRED_IN_URL_ALLOW;
 	const char *value;
-	struct strbuf redacted = STRBUF_INIT;
-	int warn_not_die;
+
+	if (cred_in_url != CRED_IN_URL_UNKNOWN)
+		return cred_in_url;
 
 	if (git_config_get_string_tmp("transfer.credentialsinurl", &value))
-		return;
+		goto done;
 
 	if (!strcmp("warn", value))
-		warn_not_die = 1;
+		new = CRED_IN_URL_WARN;
 	else if (!strcmp("die", value))
-		warn_not_die = 0;
+		new = CRED_IN_URL_DIE;
 	else if (!strcmp("allow", value))
-		return;
+		goto done;
 	else
-		die(_("unrecognized value transfer.credentialsInUrl: '%s'"), value);
+		die(_("unrecognized value transfer.credentialsInURL: '%s'"), value);
+
+done:
+	cred_in_url = new;
+	return cred_in_url;
+}
+
+static void validate_remote_url(struct remote *remote)
+{
+	int i;
+	struct strbuf redacted = STRBUF_INIT;
+	enum credentials_in_url cfg = get_credentials_in_url();
+
+	if (remote->validated_urls)
+		goto done;
+	if (cfg == CRED_IN_URL_ALLOW)
+		goto done;
 
 	for (i = 0; i < remote->url_nr; i++) {
 		struct url_info url_info = { 0 };
@@ -647,16 +665,25 @@  static void validate_remote_url(struct remote *remote)
 		strbuf_addstr(&redacted,
 			      url_info.url + url_info.passwd_off + url_info.passwd_len);
 
-		if (warn_not_die)
+		switch (cfg) {
+		case CRED_IN_URL_WARN:
 			warning(_("URL '%s' uses plaintext credentials"), redacted.buf);
-		else
+			break;
+		case CRED_IN_URL_DIE:
 			die(_("URL '%s' uses plaintext credentials"), redacted.buf);
-
-loop_cleanup:
+			break;
+		case CRED_IN_URL_ALLOW:
+		case CRED_IN_URL_UNKNOWN:
+			BUG("unreachable");
+			break;
+		}
+	loop_cleanup:
 		free(url_info.url);
 	}
 
 	strbuf_release(&redacted);
+done:
+	remote->validated_urls = 1;
 }
 
 static struct remote *
diff --git a/remote.h b/remote.h
index 1c4621b414b..5a2da13b4cb 100644
--- a/remote.h
+++ b/remote.h
@@ -98,6 +98,8 @@  struct remote {
 	int prune;
 	int prune_tags;
 
+	int validated_urls;
+
 	/**
 	 * The configured helper programs to run on the remote side, for
 	 * Git-native protocols.
@@ -445,4 +447,16 @@  void apply_push_cas(struct push_cas_option *, struct remote *, struct ref *);
 char *relative_url(const char *remote_url, const char *url,
 		   const char *up_path);
 
+enum credentials_in_url {
+	CRED_IN_URL_UNKNOWN,
+	CRED_IN_URL_ALLOW,
+	CRED_IN_URL_WARN,
+	CRED_IN_URL_DIE,
+};
+
+/**
+ * Get the transfer.credentialsInUrl config setting as an "enum
+ * credentials_in_url".
+ */
+enum credentials_in_url get_credentials_in_url(void);
 #endif
diff --git a/t/t5516-fetch-push.sh b/t/t5516-fetch-push.sh
index 79dc470c014..d01f5cd349f 100755
--- a/t/t5516-fetch-push.sh
+++ b/t/t5516-fetch-push.sh
@@ -1860,7 +1860,7 @@  test_expect_success LIBCURL 'fetch warns or fails when using username:password'
 
 	test_must_fail git -c transfer.credentialsInUrl=warn fetch https://username:password@localhost 2>err &&
 	grep "warning: $message" err >warnings &&
-	test_line_count = 3 warnings &&
+	test_line_count = 1 warnings &&
 
 	test_must_fail git -c transfer.credentialsInUrl=die fetch https://username:password@localhost 2>err &&
 	grep "fatal: $message" err >warnings &&
diff --git a/t/t5601-clone.sh b/t/t5601-clone.sh
index 45f0803ed4d..37adfd9f83b 100755
--- a/t/t5601-clone.sh
+++ b/t/t5601-clone.sh
@@ -78,7 +78,7 @@  test_expect_success LIBCURL 'clone warns or fails when using username:password'
 
 	test_must_fail git -c transfer.credentialsInUrl=warn clone https://username:password@localhost attempt2 2>err &&
 	grep "warning: $message" err >warnings &&
-	test_line_count = 2 warnings &&
+	test_line_count = 1 warnings &&
 
 	test_must_fail git -c transfer.credentialsInUrl=die clone https://username:password@localhost attempt3 2>err &&
 	grep "fatal: $message" err >warnings &&