[RFC,0/2] Conditional config includes based on remote URL

Message ID	cover.1634077795.git.jonathantanmy@google.com (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> Date: Tue, 12 Oct 2021 15:57:21 -0700 Message-Id: <cover.1634077795.git.jonathantanmy@google.com> Mime-Version: 1.0 Subject: [RFC PATCH 0/2] Conditional config includes based on remote URL From: Jonathan Tan <jonathantanmy@google.com> To: git@vger.kernel.org Cc: Jonathan Tan <jonathantanmy@google.com> Content-Type: text/plain; charset="UTF-8" Precedence: bulk
Series	Conditional config includes based on remote URL \| expand [RFC,0/2] Conditional config includes based on remote URL [RFC,1/2] config: make git_config_include() static [RFC,2/2] config: include file if remote URL matches a glob

Jonathan Tan Oct. 12, 2021, 10:57 p.m. UTC

Previously [1], I sent a patch set for remote-suggested configs that are
transmitted when fetching, but there were some security concerns. Here
is another way that remote repo administators can provide recommended
configs - through conditionally included files based on the configured
remote. Git itself neither transmits nor prompts for these files, which
hopefully reduces people's concerns.

More information is in the commit message of patch 2.

[1] https://lore.kernel.org/git/cover.1623881977.git.jonathantanmy@google.com/

Jonathan Tan (2):
  config: make git_config_include() static
  config: include file if remote URL matches a glob

 config.c          | 80 ++++++++++++++++++++++++++++++++++++++++++-----
 config.h          | 37 +++-------------------
 t/t1300-config.sh | 27 ++++++++++++++++
 3 files changed, 103 insertions(+), 41 deletions(-)

brian m. carlson Oct. 13, 2021, 12:46 a.m. UTC | #1

On 2021-10-12 at 22:57:21, Jonathan Tan wrote:
> Previously [1], I sent a patch set for remote-suggested configs that are
> transmitted when fetching, but there were some security concerns. Here
> is another way that remote repo administators can provide recommended
> configs - through conditionally included files based on the configured
> remote. Git itself neither transmits nor prompts for these files, which
> hopefully reduces people's concerns.
> 
> More information is in the commit message of patch 2.

I won't go into the details of the patches, since I'm a little low on
time at the moment, but I think from what I've seen of the cover letter
and the commit messages, this approach is much better from a security
perspective and, provided we can get the kinks mentioned downthread
ironed out, I'd be happy to see this merged.

Jonathan Tan Oct. 13, 2021, 6:17 p.m. UTC | #2

> I won't go into the details of the patches, since I'm a little low on
> time at the moment, but I think from what I've seen of the cover letter
> and the commit messages, this approach is much better from a security
> perspective and, provided we can get the kinks mentioned downthread
> ironed out, I'd be happy to see this merged.

Thanks - I really appreciate this note. Thanks also for all your
thoughts up to now about the security perspective.

Jonathan Tan Oct. 18, 2021, 8:48 p.m. UTC | #3

After some in-office discussion, here are the alternatives as I see it:

 (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
     name) command that is executed after all config files are read. (If
     there are multiple, they are executed in order of appearance.)
     Files included by this mechanism cannot directly or indirectly
     contain another "includeAfterIf". This is the same as what was
     introduced in this patch set, except for the name of the directive.

 (2) Leave the name as "includeIf", and when it is encountered with a
     remote-URL condition: continue parsing the config files, skipping
     all "includeIf hasRemoteUrl", only looking for remote.*.url. After
     that, resume the reading of config files at the first "includeIf
     hasRemoteUrl", using the prior remote.*.url information gathered to
     determine which files to include when "includeIf hasRemoteUrl" is
     encountered. Files included by this mechanism cannot contain any
     "remote.*.url" variables.

In all cases, the include is executed if at least one remote URL
matches.

There are other ideas including:

 (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
     wants to match it. (But this doesn't fit our use case, in which a
     repo config has the URL but a system or user config has the
     include.)

 (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
     for remote.*.url. (I think this out-of-order config search is more
     complicated than (2), though.)

For (2), I think that prohibiting "remote.*.url" from any "includeIf
hasRemoteUrl" files sidesteps questions like "what happens when an
included file overrides the URL that made us include this file in the
first place" or "what happens if an included file includes a
remote.*.url that validates or invalidates a prior or subsequent file",
because now that cannot happen at all. My main concern with this
prohibition was that if we were to introduce another similar condition
(say, one based on remote names), what would happen? But I think this is
solvable - make the prohibitions based only on all the conditions that
the actually used, so if the user only uses conditions on remote URLs,
then the user can still set refspecs (for example), even after the
remote-name-condition feature is introduced in Git.

For (1), it is simpler in concept (and also in implementation, I think).
The user just needs to know that certain includes are on-the-spot and
certain includes (the ones with "after" in the name) are deferred - in
particular, if a config variable isn't the value they expect, they'll
need to check that it wasn't introduced in an includeAfterIf file. (And
the user also needs to figure out that if they want to override such a
variable, they'll need to make their own includeAfterIf with an
always-true condition.)

From the prior replies, I think that people will be more interested in
(2) as it preserves the "last config wins" rule, and I'm inclined to go
for (2) too. I'll see if others have any other opinions, and if not I'll
see how the implementation of (2) will look like.

Emily Shaffer Oct. 22, 2021, 3:12 a.m. UTC | #4

On Mon, Oct 18, 2021 at 01:48:03PM -0700, Jonathan Tan wrote:
> 
> After some in-office discussion, here are the alternatives as I see it:
> 
>  (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
>      name) command that is executed after all config files are read. (If
>      there are multiple, they are executed in order of appearance.)
>      Files included by this mechanism cannot directly or indirectly
>      contain another "includeAfterIf". This is the same as what was
>      introduced in this patch set, except for the name of the directive.
> 
>  (2) Leave the name as "includeIf", and when it is encountered with a
>      remote-URL condition: continue parsing the config files, skipping
>      all "includeIf hasRemoteUrl", only looking for remote.*.url. After
>      that, resume the reading of config files at the first "includeIf
>      hasRemoteUrl", using the prior remote.*.url information gathered to
>      determine which files to include when "includeIf hasRemoteUrl" is
>      encountered. Files included by this mechanism cannot contain any
>      "remote.*.url" variables.
> 
> In all cases, the include is executed if at least one remote URL
> matches.
> 
> There are other ideas including:
> 
>  (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
>      wants to match it. (But this doesn't fit our use case, in which a
>      repo config has the URL but a system or user config has the
>      include.)
> 
>  (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
>      for remote.*.url. (I think this out-of-order config search is more
>      complicated than (2), though.)
> 
> For (2), I think that prohibiting "remote.*.url" from any "includeIf
> hasRemoteUrl" files sidesteps questions like "what happens when an
> included file overrides the URL that made us include this file in the
> first place" or "what happens if an included file includes a
> remote.*.url that validates or invalidates a prior or subsequent file",
> because now that cannot happen at all. My main concern with this
> prohibition was that if we were to introduce another similar condition
> (say, one based on remote names), what would happen? But I think this is
> solvable - make the prohibitions based only on all the conditions that
> the actually used, so if the user only uses conditions on remote URLs,
> then the user can still set refspecs (for example), even after the
> remote-name-condition feature is introduced in Git.
> 
> For (1), it is simpler in concept (and also in implementation, I think).
> The user just needs to know that certain includes are on-the-spot and
> certain includes (the ones with "after" in the name) are deferred - in
> particular, if a config variable isn't the value they expect, they'll
> need to check that it wasn't introduced in an includeAfterIf file. (And
> the user also needs to figure out that if they want to override such a
> variable, they'll need to make their own includeAfterIf with an
> always-true condition.)
> 
> From the prior replies, I think that people will be more interested in
> (2) as it preserves the "last config wins" rule, and I'm inclined to go
> for (2) too. I'll see if others have any other opinions, and if not I'll
> see how the implementation of (2) will look like.

Another concern which came up for me in a private conversation today -

How difficult will it be for users to override this include directive if
it is set somewhere outside of their control? For example:

/etc/gitconfig:
[includeIf hasRemoteUrl.https://example.com/example.git] // or whatever
  path = /etc/some-special-config

Will it be possible for a user to "un-include" /etc/some-special-config
themselves?

I don't think this should change your patch much - if my understanding
is correct, we also don't have a way to "un-include" existing include or
includeIf directives made outside of the user's control. But I wonder if
it'd be useful to think about some way to do that. Maybe we can teach
the config parse how to include a config file in reverse? Maybe we need
a "neverInclude" directive? Food for thought, anyway.

Sorry, but I won't have time to take a look at the rest of this series
til next week.

 - Emily

Ævar Arnfjörð Bjarmason Oct. 25, 2021, 1:03 p.m. UTC | #5

On Tue, Oct 12 2021, Jonathan Tan wrote:

I tried sending the below (sans some last minute spellchecking now)
around October 19th, but for some reason it didn't make it
on-list. Trying again now, apologies for [near-]duplicates, if any (I
elaborated a bit at the end just now).

> Previously [1], I sent a patch set for remote-suggested configs that are
> transmitted when fetching, but there were some security concerns. Here
> is another way that remote repo administators can provide recommended
> configs - through conditionally included files based on the configured
> remote. Git itself neither transmits nor prompts for these files, which
> hopefully reduces people's concerns.

I had some concerns about the specifics of the implementation/what
seemed to be tailoring it a bit too closely to one use-case[1][2], not
inherently with the idea (although I think e.g. for brian that more
closely reflects his thoughts).

Anyway, just saying that aside from this RFC I don't think we were at
the point of really fleshing out what this would look like, and there
being some hard "no", so I think that idea could still be pursued.

On this proposal: this also applies globally to all history, but I don't
have the same concern with that as the 1=1 mapping of remote-suggested
hooks, our path includes work that way, after all.

I think it would be nice if you could think about if/how this and the
"onbranch" include would work together though to serve the general case
better.

Also if you have a repo with N remotes each where "origin" tracks URLs
at git.example.com, and you add a "dev" tracking dev.example.com, will
the config apply if you're say on a branch tracking the "live" server,
if you've said "include this for repos matching dev.example.com?

Arguably that's what you want, but perhaps something that those more
used to the centralized workflows wouldn't consider as being unintuitive
for users who might want to add this config only for their main "origin"
remote. We don't really have a way of marking that special-ness though,
except maybe checkout.defaultRemote.

I'm also still somewhat mystified at how this would better serve your
userbase than the path-based included, i.e. the selling point of the
remote-suggested configuration was that it would Just Work.

But for this the users would either need to setup the config themselves
for your remote, but that would be easier than pro-actively cloning in
"work" or whatever? I guess, just wondering if I'm missing something.

Or if it's a partly-automated system where some automation is dropping
in a /etc/gitconfig.d/google-remote-config-include I wonder if this
whole thing wouldn't be better for users with such special-needs if we
just supported an "early config hook".

i.e. similar to how we read trace2 config from /etc/gitconfig early, we
could start picking up a hook that just so happens to conform to the
config schema Emily's config-based hooks use.

So the /etc/gitconfig would have say:

    hook.ourConfigThingy.command=/usr/bin/googly-git-config
    hook.ourConfigThingy.event=include-config

That hook would just produce a config snippet to be included on STDOUT.

Since it's an arbitrary external command it would nicely get around any
chicken and egg problems in git itself, it could run "git remote -v",
inspect the equivalent of an "onbranch" etc. etc, then just dynamically
produce config-to-be-included.

Please don't take this as some objection to your current proposal, just
a thought on something that might entirely bypass odd edge cases and
arbitrary limitations associated with doing this all in the "main"
process on-the-fly.

The special-ness with that one would need to be that we'd say it
wouldn't have the normal "last set wins" semantics, or maybe we could do
that and just note that we saw it, and execute the "include" when we
detect the end of the full config parsing (I'm not familiar enough with
those bits to say where that is).

Both of those seem easier than dealing with any chicken & egg problems
in parsing the config stream itself, since such a hook could just invoke
"git remote -v" and the like itself, after e.g. setting some environment
variable of its own to guard against its own recursion (or we'd do it
for it for such hooks...).

1. https://lore.kernel.org/git/87k0mn2dd3.fsf@evledraar.gmail.com/
2. https://lore.kernel.org/git/87o8awvglr.fsf@evledraar.gmail.com/

Jonathan Tan Oct. 25, 2021, 6:53 p.m. UTC | #6

> I had some concerns about the specifics of the implementation/what
> seemed to be tailoring it a bit too closely to one use-case[1][2], not
> inherently with the idea (although I think e.g. for brian that more
> closely reflects his thoughts).
> 
> Anyway, just saying that aside from this RFC I don't think we were at
> the point of really fleshing out what this would look like, and there
> being some hard "no", so I think that idea could still be pursued.

Which idea specifically do you think could still be pursued?

> On this proposal: this also applies globally to all history, but I don't
> have the same concern with that as the 1=1 mapping of remote-suggested
> hooks, our path includes work that way, after all.
> 
> I think it would be nice if you could think about if/how this and the
> "onbranch" include would work together though to serve the general case
> better.
> 
> Also if you have a repo with N remotes each where "origin" tracks URLs
> at git.example.com, and you add a "dev" tracking dev.example.com, will
> the config apply if you're say on a branch tracking the "live" server,
> if you've said "include this for repos matching dev.example.com?

Right now, the feature is only dependent on remote URLs configured
through remote.?.url. It wouldn't work with "onbranch" because there's
no way to combine conditions (and I have no plans to do that). I think
that if you have something that you want depending on which branch
you're on, you can just use the existing "onbranch" feature.

> Arguably that's what you want, but perhaps something that those more
> used to the centralized workflows wouldn't consider as being unintuitive
> for users who might want to add this config only for their main "origin"
> remote. We don't really have a way of marking that special-ness though,
> except maybe checkout.defaultRemote.

What do you mean by adding a config for a specific remote?

> I'm also still somewhat mystified at how this would better serve your
> userbase than the path-based included, i.e. the selling point of the
> remote-suggested configuration was that it would Just Work.
> 
> But for this the users would either need to setup the config themselves
> for your remote, but that would be easier than pro-actively cloning in
> "work" or whatever? I guess, just wondering if I'm missing something.
> 
> Or if it's a partly-automated system where some automation is dropping
> in a /etc/gitconfig.d/google-remote-config-include 

Yes, the config is meant to be handled e.g. through a package manager
like apt. We don't want to prescribe directory structures like "work",
which is why the include is conditional upon the remote URL.

Even if the user pro-actively clones into "work", the user still needs
to set up the conditional config, so I don't see how that is a net
benefit.

> I wonder if this
> whole thing wouldn't be better for users with such special-needs if we
> just supported an "early config hook".
> 
> i.e. similar to how we read trace2 config from /etc/gitconfig early, we
> could start picking up a hook that just so happens to conform to the
> config schema Emily's config-based hooks use.
> 
> So the /etc/gitconfig would have say:
> 
>     hook.ourConfigThingy.command=/usr/bin/googly-git-config
>     hook.ourConfigThingy.event=include-config
> 
> That hook would just produce a config snippet to be included on STDOUT.
> 
> Since it's an arbitrary external command it would nicely get around any
> chicken and egg problems in git itself, it could run "git remote -v",
> inspect the equivalent of an "onbranch" etc. etc, then just dynamically
> produce config-to-be-included.

I see that later on, you suggest an environment variable to guard
against recursion.

One thing is that if there are multiple such hooks, each one won't be
able to see what the other hooks have produced.

If the feature you described already existed in Git, I think I could use
that, but if we're deciding between implementing the config hook you
describe versus something with more constraints, I think the one I
proposed is better for now. Some design points that have already been
discussed are whether setting a config during processing of an included
file would then invalidate the include and also the order of operations,
both of which would be much more difficult to control with config hooks.

> Please don't take this as some objection to your current proposal, just
> a thought on something that might entirely bypass odd edge cases and
> arbitrary limitations associated with doing this all in the "main"
> process on-the-fly.
> 
> The special-ness with that one would need to be that we'd say it
> wouldn't have the normal "last set wins" semantics, or maybe we could do
> that and just note that we saw it, and execute the "include" when we
> detect the end of the full config parsing (I'm not familiar enough with
> those bits to say where that is).

The "last set" would be those set by the hooks, so yes, a user would
need to know to make their own hook in order to override anything set by
the hooks. The end of the full config parsing is in
config_with_options().

> Both of those seem easier than dealing with any chicken & egg problems
> in parsing the config stream itself, since such a hook could just invoke
> "git remote -v" and the like itself, after e.g. setting some environment
> variable of its own to guard against its own recursion (or we'd do it
> for it for such hooks...).
> 
> 1. https://lore.kernel.org/git/87k0mn2dd3.fsf@evledraar.gmail.com/
> 2. https://lore.kernel.org/git/87o8awvglr.fsf@evledraar.gmail.com/

Ævar Arnfjörð Bjarmason Oct. 26, 2021, 10:12 a.m. UTC | #7

On Mon, Oct 25 2021, Jonathan Tan wrote:

>> I had some concerns about the specifics of the implementation/what
>> seemed to be tailoring it a bit too closely to one use-case[1][2], not
>> inherently with the idea (although I think e.g. for brian that more
>> closely reflects his thoughts).
>> 
>> Anyway, just saying that aside from this RFC I don't think we were at
>> the point of really fleshing out what this would look like, and there
>> being some hard "no", so I think that idea could still be pursued.
>
> Which idea specifically do you think could still be pursued?

I meant the whole in-repo .gitconfig. I.e. to the extent that you're
submitting this as an alternative to that because of the negative
feedback on that RFC.

>> On this proposal: this also applies globally to all history, but I don't
>> have the same concern with that as the 1=1 mapping of remote-suggested
>> hooks, our path includes work that way, after all.
>> 
>> I think it would be nice if you could think about if/how this and the
>> "onbranch" include would work together though to serve the general case
>> better.
>> 
>> Also if you have a repo with N remotes each where "origin" tracks URLs
>> at git.example.com, and you add a "dev" tracking dev.example.com, will
>> the config apply if you're say on a branch tracking the "live" server,
>> if you've said "include this for repos matching dev.example.com?
>
> Right now, the feature is only dependent on remote URLs configured
> through remote.?.url. It wouldn't work with "onbranch" because there's
> no way to combine conditions (and I have no plans to do that). I think
> that if you have something that you want depending on which branch
> you're on, you can just use the existing "onbranch" feature.

I mean with this and the below...

>> Arguably that's what you want, but perhaps something that those more
>> used to the centralized workflows wouldn't consider as being unintuitive
>> for users who might want to add this config only for their main "origin"
>> remote. We don't really have a way of marking that special-ness though,
>> except maybe checkout.defaultRemote.
>
> What do you mean by adding a config for a specific remote?

...what happens if you add a google.com remote for a repository that
"lives" on github.com. I.e. are the semantics "match any remote", or
"match the 'primary' remote (origin?" etc.

>> I'm also still somewhat mystified at how this would better serve your
>> userbase than the path-based included, i.e. the selling point of the
>> remote-suggested configuration was that it would Just Work.
>> 
>> But for this the users would either need to setup the config themselves
>> for your remote, but that would be easier than pro-actively cloning in
>> "work" or whatever? I guess, just wondering if I'm missing something.
>> 
>> Or if it's a partly-automated system where some automation is dropping
>> in a /etc/gitconfig.d/google-remote-config-include 
>
> Yes, the config is meant to be handled e.g. through a package manager
> like apt. We don't want to prescribe directory structures like "work",
> which is why the include is conditional upon the remote URL.
>
> Even if the user pro-actively clones into "work", the user still needs
> to set up the conditional config, so I don't see how that is a net
> benefit.

Ah, that explains it. I assumed both cases would be ones where the user
would need to manually enable the 'configuration' (or cloning to a given
subdir).

>> I wonder if this
>> whole thing wouldn't be better for users with such special-needs if we
>> just supported an "early config hook".
>> 
>> i.e. similar to how we read trace2 config from /etc/gitconfig early, we
>> could start picking up a hook that just so happens to conform to the
>> config schema Emily's config-based hooks use.
>> 
>> So the /etc/gitconfig would have say:
>> 
>>     hook.ourConfigThingy.command=/usr/bin/googly-git-config
>>     hook.ourConfigThingy.event=include-config
>> 
>> That hook would just produce a config snippet to be included on STDOUT.
>> 
>> Since it's an arbitrary external command it would nicely get around any
>> chicken and egg problems in git itself, it could run "git remote -v",
>> inspect the equivalent of an "onbranch" etc. etc, then just dynamically
>> produce config-to-be-included.
>
> I see that later on, you suggest an environment variable to guard
> against recursion.
>
> One thing is that if there are multiple such hooks, each one won't be
> able to see what the other hooks have produced.

Yes, although aside from this hook that's a general caveat with the
proposed config-based hooks, I think if you need a hook that does that
(whether it's this, or pre-receive etc.) our answer is "put it in your
own wrapper".

> If the feature you described already existed in Git, I think I could use
> that, but if we're deciding between implementing the config hook you
> describe versus something with more constraints, I think the one I
> proposed is better for now. Some design points that have already been
> discussed are whether setting a config during processing of an included
> file would then invalidate the include and also the order of operations,
> both of which would be much more difficult to control with config hooks.

I suggested it because maybe it would be a lot simpler, i.e. we don't
need such a feature to be aware of remote config at all, or having to
"read forward" to find it, maybe it would be more complex. I haven't
tried to implement it.

>> Please don't take this as some objection to your current proposal, just
>> a thought on something that might entirely bypass odd edge cases and
>> arbitrary limitations associated with doing this all in the "main"
>> process on-the-fly.
>> 
>> The special-ness with that one would need to be that we'd say it
>> wouldn't have the normal "last set wins" semantics, or maybe we could do
>> that and just note that we saw it, and execute the "include" when we
>> detect the end of the full config parsing (I'm not familiar enough with
>> those bits to say where that is).
>
> The "last set" would be those set by the hooks, so yes, a user would
> need to know to make their own hook in order to override anything set by
> the hooks. The end of the full config parsing is in
> config_with_options().

On the "user would need to know" that's the same if it's config? I.e. in
either case it would be in /etc/gitconfig or whatever shipped by the
*.deb package.

Anyway, I really just meant this as a suggestion, and one that might
make things simpler. If you don't think it makes sense...

Jeff King Oct. 27, 2021, 11:55 a.m. UTC | #8

On Mon, Oct 18, 2021 at 01:48:03PM -0700, Jonathan Tan wrote:

>  (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
>      name) command that is executed after all config files are read. (If
>      there are multiple, they are executed in order of appearance.)
>      Files included by this mechanism cannot directly or indirectly
>      contain another "includeAfterIf". This is the same as what was
>      introduced in this patch set, except for the name of the directive.

I think this works in terms of having self-consistent rules that make
sense. But deferring things does introduce new complications in terms of
overrides, because we rely on last-one-wins. Emily asked elsewhere about
overriding the inclusion of a file. We don't have a way to do that now,
and I think it would be tricky to add. But what about overriding a
single variable?

Right now this works:

  git config --global foo.bar true
  git config --local foo.bar false

to give you "false". But imagining there was a world of deferred config,
then:

  git config --file ~/.gitconfig-foo foo.bar true
  git config --global deferInclude.path .gitconfig-foo
  git config --local foo.bar false

gives "true". We'd read .gitconfig-foo after everything else, overriding
the repo-level config.

If the deferred includes were processed at the end of each individual
file, that would solve that. You're still left with the slight oddness
that a deferred include may override options within the same file that
come after it, but that's inherent to the "defer" concept, and the
answer is probably "don't do that". It's only when it crosses file
boundaries (which are explicitly ordered by priority) that it really
hurts.

>  (2) Leave the name as "includeIf", and when it is encountered with a
>      remote-URL condition: continue parsing the config files, skipping
>      all "includeIf hasRemoteUrl", only looking for remote.*.url. After
>      that, resume the reading of config files at the first "includeIf
>      hasRemoteUrl", using the prior remote.*.url information gathered to
>      determine which files to include when "includeIf hasRemoteUrl" is
>      encountered. Files included by this mechanism cannot contain any
>      "remote.*.url" variables.

I think doing this as "continue parsing" and "resume" is hard to do.
Because you can't look at other non-remote.*.url entries here (otherwise
you'd see them out of order). So you have to either:

  - complete the parse, stashing all the other variables away, and then
    resolve the include, and then look at all the stashed variables as
    if you were parsing them anew.

  - teach our config parser how to save and restore state, including
    both intra-file state and the progress across the set of files

I think it's much easier if you think of it as "start a new config parse
that does not respect hasRemoteURL". And the easiest way to do that is
to just let remote.c's existing git_config() start that parse (probably
by calling git_config_with_options() and telling it "don't respect
hasRemoteURL includes"). You may also need to teach the config parser to
be reentrant. We did some work on that a while ago, pushing the state
int config_source which functions as a stack, but I don't offhand know
if you can call git_config() from within a config callback.

> There are other ideas including:
> 
>  (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
>      wants to match it. (But this doesn't fit our use case, in which a
>      repo config has the URL but a system or user config has the
>      include.)

Yeah, I agree this won't work.

>  (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
>      for remote.*.url. (I think this out-of-order config search is more
>      complicated than (2), though.)

I think this is what I described above, and actually is less
complicated. ;)

-Peff

Jonathan Tan Oct. 27, 2021, 5:52 p.m. UTC | #9

> On Mon, Oct 18, 2021 at 01:48:03PM -0700, Jonathan Tan wrote:
> 
> >  (1) Introduce a "includeAfterIf" (or "deferIncludeIf", or some other
> >      name) command that is executed after all config files are read. (If
> >      there are multiple, they are executed in order of appearance.)
> >      Files included by this mechanism cannot directly or indirectly
> >      contain another "includeAfterIf". This is the same as what was
> >      introduced in this patch set, except for the name of the directive.
> 
> I think this works in terms of having self-consistent rules that make
> sense. But deferring things does introduce new complications in terms of
> overrides, because we rely on last-one-wins. Emily asked elsewhere about
> overriding the inclusion of a file. We don't have a way to do that now,
> and I think it would be tricky to add. But what about overriding a
> single variable?
> 
> Right now this works:
> 
>   git config --global foo.bar true
>   git config --local foo.bar false
> 
> to give you "false". But imagining there was a world of deferred config,
> then:
> 
>   git config --file ~/.gitconfig-foo foo.bar true
>   git config --global deferInclude.path .gitconfig-foo
>   git config --local foo.bar false
> 
> gives "true". We'd read .gitconfig-foo after everything else, overriding
> the repo-level config.
> 
> If the deferred includes were processed at the end of each individual
> file, that would solve that. You're still left with the slight oddness
> that a deferred include may override options within the same file that
> come after it, but that's inherent to the "defer" concept, and the
> answer is probably "don't do that". It's only when it crosses file
> boundaries (which are explicitly ordered by priority) that it really
> hurts.

This would indeed solve the issue of the user needing to know the trick
to override variables set by deferred includes. But this wouldn't solve
our primary use case in which a system-level config defines a
conditional include but the repo config defines the URL, I think.

> >  (2) Leave the name as "includeIf", and when it is encountered with a
> >      remote-URL condition: continue parsing the config files, skipping
> >      all "includeIf hasRemoteUrl", only looking for remote.*.url. After
> >      that, resume the reading of config files at the first "includeIf
> >      hasRemoteUrl", using the prior remote.*.url information gathered to
> >      determine which files to include when "includeIf hasRemoteUrl" is
> >      encountered. Files included by this mechanism cannot contain any
> >      "remote.*.url" variables.
> 
> I think doing this as "continue parsing" and "resume" is hard to do.
> Because you can't look at other non-remote.*.url entries here (otherwise
> you'd see them out of order). So you have to either:
> 
>   - complete the parse, stashing all the other variables away, and then
>     resolve the include, and then look at all the stashed variables as
>     if you were parsing them anew.
> 
>   - teach our config parser how to save and restore state, including
>     both intra-file state and the progress across the set of files

I am implementing something similar to your first approach (stashing
things). It's almost done so hopefully we'll have something concrete to
discuss soon.

> I think it's much easier if you think of it as "start a new config parse
> that does not respect hasRemoteURL". And the easiest way to do that is
> to just let remote.c's existing git_config() start that parse (probably
> by calling git_config_with_options() and telling it "don't respect
> hasRemoteURL includes"). You may also need to teach the config parser to
> be reentrant. We did some work on that a while ago, pushing the state
> int config_source which functions as a stack, but I don't offhand know
> if you can call git_config() from within a config callback.

Besides the reentrancy (which may be difficult, as there are some global
variables, but from a glance, some code seems to take care to save and
restore them, so it may already be reentrant or not too difficult to
make reentrant), we would have to bubble down the config (struct
git_config_source and struct config_options) into all the places that
could potentially start the parse and also have a place to store the
URLs we get. If we're already going to stash URLs, it may be easier to
stash the variables instead.

> > There are other ideas including:
> > 
> >  (3) remote.*.url must appear before a "includeIf hasRemoteUrl" that
> >      wants to match it. (But this doesn't fit our use case, in which a
> >      repo config has the URL but a system or user config has the
> >      include.)
> 
> Yeah, I agree this won't work.
> 
> >  (4) "includeIf hasRemoteUrl" triggers a search of the repo config just
> >      for remote.*.url. (I think this out-of-order config search is more
> >      complicated than (2), though.)
> 
> I think this is what I described above, and actually is less
> complicated. ;)
> 
> -Peff

Well, let me finish up (2), and let's see.

Jeff King Oct. 27, 2021, 8:32 p.m. UTC | #10

On Wed, Oct 27, 2021 at 10:52:59AM -0700, Jonathan Tan wrote:

> > If the deferred includes were processed at the end of each individual
> > file, that would solve that. You're still left with the slight oddness
> > that a deferred include may override options within the same file that
> > come after it, but that's inherent to the "defer" concept, and the
> > answer is probably "don't do that". It's only when it crosses file
> > boundaries (which are explicitly ordered by priority) that it really
> > hurts.
> 
> This would indeed solve the issue of the user needing to know the trick
> to override variables set by deferred includes. But this wouldn't solve
> our primary use case in which a system-level config defines a
> conditional include but the repo config defines the URL, I think.

Doh, of course. I forgot that was the whole point of the defer. ;)

> I am implementing something similar to your first approach (stashing
> things). It's almost done so hopefully we'll have something concrete to
> discuss soon.

Sounds good.

-Peff

Ævar Arnfjörð Bjarmason Nov. 29, 2021, 8:48 p.m. UTC | #11

On Mon, Nov 29 2021, Jonathan Tan wrote:

> Thanks everyone for your comments. Here's an update.

Just from skimming this (minor) feedback on v3 still applies:
https://lore.kernel.org/git/211123.86pmqrwtf2.gmgdl@evledraar.gmail.com/

I.e. s/hasremoteurl/hasRemoteURL/ etc. in appropriate places.

Junio C Hamano Nov. 30, 2021, 7:51 a.m. UTC | #12

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> On Mon, Nov 29 2021, Jonathan Tan wrote:
>
>> Thanks everyone for your comments. Here's an update.
>
> Just from skimming this (minor) feedback on v3 still applies:
> https://lore.kernel.org/git/211123.86pmqrwtf2.gmgdl@evledraar.gmail.com/
>
> I.e. s/hasremoteurl/hasRemoteURL/ etc. in appropriate places.

Is there any appropriate place, though?

"hasremoteurl" is a new directive to be used as the leading part of
<condition> in the name of `includeIf.<condition>.path` variable.
The <condtion> part is case sensitive, and we do not want people to
spell it, and the existing "gitdir", "gitdir/i", and "onbranch", in
mixed cases.

See config.c::include_condition_is_true() function and its use of
skip_prefix_mem() to locate these existing conditions.

It is troubling that this patch is *NOT* extend the implementation
of include_condition_is_true() function (which gives a very clean
abstraction and makes the caller very readable); it instead mucks
with the caller of include_condition_is_true() and adds a parallel
logic that include_condition_is_true() does not know about.  It may
have been an expedite way to implement this, and the result may not
seem to hurt when include_condition_is_true() is called by only one
caller, but I find the resulting code structure unnecessarily ugly.

Can't the body of if (skip_prefix_mem(..."hasremoteurl:", ...)) block
become include_by_remoteurl() function, similar to include_by_foo()
functions include_condition_is_true() already calls?

[RFC,0/2] Conditional config includes based on remote URL

Message

Comments