[v4,0/6] config: allow specifying config entries via env

Message ID	cover.1607514692.git.ps@pks.im (mailing list archive)
Headers	show Return-Path: <git-owner@kernel.org> Date: Wed, 9 Dec 2020 12:52:16 +0100 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Cc: =?iso-8859-1?q?=C6var_Arnfj=F6r=F0?= Bjarmason <avarab@gmail.com>, Junio C Hamano <gitster@pobox.com>, Jeff King <peff@peff.net>, "brian m. carlson" <sandals@crustytoothpaste.net>, Philip Oakley <philipoakley@iee.email> Subject: [PATCH v4 0/6] config: allow specifying config entries via env Message-ID: <cover.1607514692.git.ps@pks.im> References: <cover.1606214397.git.ps@pks.im> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="VwY3XAvwWaik90wF" Content-Disposition: inline In-Reply-To: <cover.1606214397.git.ps@pks.im> Precedence: bulk
Series	config: allow specifying config entries via env \| expand [v4,0/6] config: allow specifying config entries via env [v4,1/6] git: add `--super-prefix` to usage string [v4,2/6] config: add new way to pass config via `--config-env` [v4,3/6] environment: make `getenv_safe()` non-static [v4,4/6] config: extract function to parse config pairs [v4,5/6] config: refactor parsing of GIT_CONFIG_PARAMETERS [v4,6/6] config: allow specifying config entries via envvar pairs

Patrick Steinhardt Dec. 9, 2020, 11:52 a.m. UTC

Hi,

this is the fourth version of my patch series which aims to implement a
way to pass config entries via the environment while avoiding any
requirements to perform shell quoting on the user's side.

Given that the What's Cooking report notes that my third version is
about to be dropped dropped because the `--config-env` way of doing
things is preferred, I've now adopted that approach. I've taken the
patch which Peff posted originally (with one change strchr->strrchr) and
added documentation and tests to it.

This patch series still includes my old proposal as it would actually be
a better fit for our usecase at GitLab I have in mind, which is to put
all configuration which applies to all git commands into the commands
instead of using a config file for this. I have structured the series in
such a way though that those patches come last -- so if you continue to
think this approach shouldn't make it in, please feel free to drop
patches 3-6.

Patrick

Patrick Steinhardt (6):
  git: add `--super-prefix` to usage string
  config: add new way to pass config via `--config-env`
  environment: make `getenv_safe()` non-static
  config: extract function to parse config pairs
  config: refactor parsing of GIT_CONFIG_PARAMETERS
  config: allow specifying config entries via envvar pairs

 Documentation/git-config.txt |  12 +++
 Documentation/git.txt        |  11 ++-
 cache.h                      |   1 +
 config.c                     | 120 +++++++++++++++++++++-----
 config.h                     |   1 +
 environment.c                |   8 +-
 environment.h                |  12 +++
 git.c                        |   3 +
 t/t1300-config.sh            | 160 ++++++++++++++++++++++++++++++++++-
 9 files changed, 300 insertions(+), 28 deletions(-)
 create mode 100644 environment.h

Ævar Arnfjörð Bjarmason Dec. 9, 2020, 3:29 p.m. UTC | #1

On Wed, Dec 09 2020, Patrick Steinhardt wrote:

> this is the fourth version of my patch series which aims to implement a
> way to pass config entries via the environment while avoiding any
> requirements to perform shell quoting on the user's side.
>
> Given that the What's Cooking report notes that my third version is
> about to be dropped dropped because the `--config-env` way of doing
> things is preferred, I've now adopted that approach. I've taken the
> patch which Peff posted originally (with one change strchr->strrchr) and
> added documentation and tests to it.
>
> This patch series still includes my old proposal as it would actually be
> a better fit for our usecase at GitLab I have in mind, which is to put
> all configuration which applies to all git commands into the commands
> instead of using a config file for this. I have structured the series in
> such a way though that those patches come last -- so if you continue to
> think this approach shouldn't make it in, please feel free to drop
> patches 3-6.

To add even more to your headaches (sorry!) I hadn't really fully looked
at that --config-env proposal.

As noted in my per-patch reply in [1] it will still expose the key part
of the key=value, and in at least one place (url.<base>.insteadOf) the
key is where we'll pass the user/password on the command-line still with
that.

I'd much prefer either your 6/6 over --config-env for that reason & that
--config-env makes it impossible to pass a key with "=" in. For "-c" I
don't think that's much of an issue, but e.g. with
"url.<base>.insteadOf" needing to take arbitrary passwords + us
implicitly/explicitly advertising this as a "here's how you can pass the
password" feature not being able to have "=" is more painful.

I mildly prefer Jeff's suggestion of just getting GIT_CONFIG_PARAMETERS
to the point where we could document it [2][3] to both of those, but
that's mostly an asthetic concern of dealing with N values. It won't
matter for the security aspect (but I think you (but haven't tested)
that you still can't pass a "=", but your 6/6 does allow that).

I still can't quite shake the bad spidey-sense feeling that any of these
are bad in some way we haven't thought of, just from the perspective
that no other tool I can think of that accepts a password has this
mechanism for passing in a user/password or other sensitive data.

E.g. openssh explicitly has refused to add anything of the sort (a
--password parameter, but maybe they didn't consider
--password=ENV_VAR). E.g. curl has a mode where you can have a password
on the command-line, but they then make you use -netrc-file to grab it
from a file. From searching around I see concerns about shell histories
being part of the security model, maybe that's why it's not a common
pattern.

So I still wonder if some version of what I tried with /dev/fd/321 in
[4] would be best, i.e. something that combines transitory+no extra
command invocation+not adding things to shell history. We support that
pattern in general, just not in fetch.c/remote.c for no particular good
reason AFAICT.

I do that that whatever we go for this series would be much better if
the commit messages / added docs explained why we're doing particular
things, and to users why they'd use one method but not the other.

E.g. IIRC this whole series is because it's a hassle to invoke
core.askpass in some stateful program where you'd like to just provide a
transitory password. I think some brief cross-linking or explanation
somewhere of these various ways to pass sensitive values around would be
relly helpful.

1. https://lore.kernel.org/git/871rfzxctq.fsf@evledraar.gmail.com/
2. https://lore.kernel.org/git/20201117023454.GA34754@coredump.intra.peff.net/
3. https://lore.kernel.org/git/20201118015907.GD650959@coredump.intra.peff.net/
4. https://lore.kernel.org/git/87k0upflk4.fsf@evledraar.gmail.com/

Patrick Steinhardt Dec. 11, 2020, 1:35 p.m. UTC | #2

On Wed, Dec 09, 2020 at 04:29:35PM +0100, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Dec 09 2020, Patrick Steinhardt wrote:
> 
> > this is the fourth version of my patch series which aims to implement a
> > way to pass config entries via the environment while avoiding any
> > requirements to perform shell quoting on the user's side.
> >
> > Given that the What's Cooking report notes that my third version is
> > about to be dropped dropped because the `--config-env` way of doing
> > things is preferred, I've now adopted that approach. I've taken the
> > patch which Peff posted originally (with one change strchr->strrchr) and
> > added documentation and tests to it.
> >
> > This patch series still includes my old proposal as it would actually be
> > a better fit for our usecase at GitLab I have in mind, which is to put
> > all configuration which applies to all git commands into the commands
> > instead of using a config file for this. I have structured the series in
> > such a way though that those patches come last -- so if you continue to
> > think this approach shouldn't make it in, please feel free to drop
> > patches 3-6.
> 
> To add even more to your headaches (sorry!) I hadn't really fully looked
> at that --config-env proposal.
> 
> As noted in my per-patch reply in [1] it will still expose the key part
> of the key=value, and in at least one place (url.<base>.insteadOf) the
> key is where we'll pass the user/password on the command-line still with
> that.

True, that's one of the things I don't quite like about `--config-env`.

> I'd much prefer either your 6/6 over --config-env for that reason & that
> --config-env makes it impossible to pass a key with "=" in. For "-c" I
> don't think that's much of an issue, but e.g. with
> "url.<base>.insteadOf" needing to take arbitrary passwords + us
> implicitly/explicitly advertising this as a "here's how you can pass the
> password" feature not being able to have "=" is more painful.
> 
> I mildly prefer Jeff's suggestion of just getting GIT_CONFIG_PARAMETERS
> to the point where we could document it [2][3] to both of those, but
> that's mostly an asthetic concern of dealing with N values. It won't
> matter for the security aspect (but I think you (but haven't tested)
> that you still can't pass a "=", but your 6/6 does allow that).

Documenting the format would be interesting, but I'm still not quite
sure how it'd be used then. Using a separate `git shell-quote` binary
just to correctly convert the strings to what git would expect doesn't
seem ideal to me, also because it would mean a separate process for each
git invocation which wants to use GIT_CONFIG_PARAMETERS. On the other
hand, reimplementing the shellquoting functionality wherever you want to
use it doesn't sound ideal either.

[snip]

> I do that that whatever we go for this series would be much better if
> the commit messages / added docs explained why we're doing particular
> things, and to users why they'd use one method but not the other.

Makes sense. The commit messages do mention it, but docs don't. I plan
to take your explanation anyway as it's a lot better compared to what I
had, and it does explain why one would want to use `--config-env`.

> E.g. IIRC this whole series is because it's a hassle to invoke
> core.askpass in some stateful program where you'd like to just provide a
> transitory password. I think some brief cross-linking or explanation
> somewhere of these various ways to pass sensitive values around would be
> relly helpful.

It had been the original intention, yes. And it still is, but in fact
the usecase has broadened to also use it to get rid of our global git
config in Gitaly. Which is a little bit awkward to do with
`--config-env` or `-c`, as now a ps(1) would first show a dozen of
configuration values only to have the real command buried somewhere at
the back. It would have been easy to implement though with the
GIT_CONFIG_ envvars.

Granted, we could still do the same by just using GIT_CONFIG_PAREMETERS.
But I'm kind of hesitant to reimplement the shell-quoting procedures in
Gitaly, especially considering that we'd put untrusted data in there.

Patrick

Jeff King Dec. 11, 2020, 2:27 p.m. UTC | #3

On Fri, Dec 11, 2020 at 02:35:01PM +0100, Patrick Steinhardt wrote:

> > E.g. IIRC this whole series is because it's a hassle to invoke
> > core.askpass in some stateful program where you'd like to just provide a
> > transitory password. I think some brief cross-linking or explanation
> > somewhere of these various ways to pass sensitive values around would be
> > relly helpful.
> 
> It had been the original intention, yes. And it still is, but in fact
> the usecase has broadened to also use it to get rid of our global git
> config in Gitaly. Which is a little bit awkward to do with
> `--config-env` or `-c`, as now a ps(1) would first show a dozen of
> configuration values only to have the real command buried somewhere at
> the back. It would have been easy to implement though with the
> GIT_CONFIG_ envvars.

I don't know what kinds of variables you want to set exactly, but
another possible option here is some mechanism to point Git to an extra
config file. This would work if you are setting a bunch of options in
some static way, but not if you're setting them to custom values for
each command invocation (because then you'd be dealing with a temp file,
which is annoying and error-prone).

I'm thinking something like a $GIT_CONFIG_ENV_FILE that is parsed after
repo config but before $GIT_CONFIG_PARAMETERS.

Or alternatively, add an includeIf directive that lets you do something
like:

  [includeIf "env:FOO"]
  path = foo.gitconfig

which triggers if $FOO is set. But again, that's only useful if you have
certain "profiles" of config you're trying to set, and not custom
values.

-Peff

Jeff King Dec. 11, 2020, 2:42 p.m. UTC | #4

On Fri, Dec 11, 2020 at 09:27:57AM -0500, Jeff King wrote:

> I don't know what kinds of variables you want to set exactly, but
> another possible option here is some mechanism to point Git to an extra
> config file. This would work if you are setting a bunch of options in
> some static way, but not if you're setting them to custom values for
> each command invocation (because then you'd be dealing with a temp file,
> which is annoying and error-prone).
> 
> I'm thinking something like a $GIT_CONFIG_ENV_FILE that is parsed after
> repo config but before $GIT_CONFIG_PARAMETERS.

One more (probably insane) idea, that you are free to ignore unless it
sparks your interest.

$GIT_CONFIG_ENV could contain an actual config-file snippet itself.
I.e.:

  GIT_CONFIG_ENV='
	[foo]
	bar = value
	[another "section"]
	key = "more complicated value"
  '

In fact, we could have implemented $GIT_CONFIG_PARAMETERS that way from
the very beginning. I'd be hesitant to change it now, though.

It doesn't really make your quoting problem go away, in that you'd now
have to generate a valid and correct config file, which is even more
complicated than shell-quoting. :) But it is at least a well-documented
format whose generator might be used for other things, too.

-Peff

Patrick Steinhardt Dec. 11, 2020, 2:47 p.m. UTC | #5

On Fri, Dec 11, 2020 at 09:27:57AM -0500, Jeff King wrote:
> On Fri, Dec 11, 2020 at 02:35:01PM +0100, Patrick Steinhardt wrote:
> 
> > > E.g. IIRC this whole series is because it's a hassle to invoke
> > > core.askpass in some stateful program where you'd like to just provide a
> > > transitory password. I think some brief cross-linking or explanation
> > > somewhere of these various ways to pass sensitive values around would be
> > > relly helpful.
> > 
> > It had been the original intention, yes. And it still is, but in fact
> > the usecase has broadened to also use it to get rid of our global git
> > config in Gitaly. Which is a little bit awkward to do with
> > `--config-env` or `-c`, as now a ps(1) would first show a dozen of
> > configuration values only to have the real command buried somewhere at
> > the back. It would have been easy to implement though with the
> > GIT_CONFIG_ envvars.
> 
> I don't know what kinds of variables you want to set exactly, but
> another possible option here is some mechanism to point Git to an extra
> config file. This would work if you are setting a bunch of options in
> some static way, but not if you're setting them to custom values for
> each command invocation (because then you'd be dealing with a temp file,
> which is annoying and error-prone).
> 
> I'm thinking something like a $GIT_CONFIG_ENV_FILE that is parsed after
> repo config but before $GIT_CONFIG_PARAMETERS.
> 
> Or alternatively, add an includeIf directive that lets you do something
> like:
> 
>   [includeIf "env:FOO"]
>   path = foo.gitconfig
> 
> which triggers if $FOO is set. But again, that's only useful if you have
> certain "profiles" of config you're trying to set, and not custom
> values.
> 
> -Peff

The issue we have is that the config file isn't necessarily under our
control. It is in most cases, like e.g. when Gitaly gets deployed via
Omnibus. But we also allow for source-based installations, where the
user configures most things manually. And in that case, we have to ask
the user to "Please set config variables A, B and C". Naturally, this is
easy to forget, will drift apart in future releases and so on.

To fix this, the plan is to move all required configuration items into
Gitaly itself, which GIT_CONFIG_COUNT would've allowd to do quite
nicely. Something like Ævar's proposal to allow reading the config from
a file descriptor would also work, and just putting the whole
configuration into an environment variable (similar to your
GIT_CONFIG_ENV_FILE, but containing contents instead of a path). And
finally, using `-c` would also work, with the downside of making it
harder to see what's going on with all the git processes.

With regards to what we require from the config, you can have a look
e.g. at [1]. It doesn't contain much, but we expect the following ones
to be set:

    - core.autocrlf=input
    - gc.auto=0
    - repack.writeBitmaps=true
    - receive.advertisePushOptions=true
    - core.fsyncObjectFiles=true

Anyway, this is all rather specific to Gitaly and may thus not be too
interesting for other. So in the end, we'll just live with the tradeoffs
of whatever solution we end up with.

Patrick

[1]: https://docs.gitlab.com/ee/install/installation.html#configure-it

Patrick Steinhardt Dec. 11, 2020, 2:58 p.m. UTC | #6

On Fri, Dec 11, 2020 at 09:42:14AM -0500, Jeff King wrote:
> On Fri, Dec 11, 2020 at 09:27:57AM -0500, Jeff King wrote:
> 
> > I don't know what kinds of variables you want to set exactly, but
> > another possible option here is some mechanism to point Git to an extra
> > config file. This would work if you are setting a bunch of options in
> > some static way, but not if you're setting them to custom values for
> > each command invocation (because then you'd be dealing with a temp file,
> > which is annoying and error-prone).
> > 
> > I'm thinking something like a $GIT_CONFIG_ENV_FILE that is parsed after
> > repo config but before $GIT_CONFIG_PARAMETERS.
> 
> One more (probably insane) idea, that you are free to ignore unless it
> sparks your interest.
> 
> $GIT_CONFIG_ENV could contain an actual config-file snippet itself.
> I.e.:
> 
>   GIT_CONFIG_ENV='
> 	[foo]
> 	bar = value
> 	[another "section"]
> 	key = "more complicated value"
>   '
> 
> In fact, we could have implemented $GIT_CONFIG_PARAMETERS that way from
> the very beginning. I'd be hesitant to change it now, though.
> 
> It doesn't really make your quoting problem go away, in that you'd now
> have to generate a valid and correct config file, which is even more
> complicated than shell-quoting. :) But it is at least a well-documented
> format whose generator might be used for other things, too.
> 
> -Peff

Our mails crossed, but I did have the same idea. I don't even think it
that insane -- the format is well-documented, it solves some of the
issues I have and implementing it shouldn't be hard considering that all
infra for it exists already. True, it won't fix the quoting issue. But
it would neatly fix our "global config" problem without cluttering
output of ps(1).

Patrick

Ævar Arnfjörð Bjarmason Dec. 11, 2020, 3:21 p.m. UTC | #7

On Fri, Dec 11 2020, Patrick Steinhardt wrote:

> On Fri, Dec 11, 2020 at 09:27:57AM -0500, Jeff King wrote:
>> On Fri, Dec 11, 2020 at 02:35:01PM +0100, Patrick Steinhardt wrote:
>> 
>> > > E.g. IIRC this whole series is because it's a hassle to invoke
>> > > core.askpass in some stateful program where you'd like to just provide a
>> > > transitory password. I think some brief cross-linking or explanation
>> > > somewhere of these various ways to pass sensitive values around would be
>> > > relly helpful.
>> > 
>> > It had been the original intention, yes. And it still is, but in fact
>> > the usecase has broadened to also use it to get rid of our global git
>> > config in Gitaly. Which is a little bit awkward to do with
>> > `--config-env` or `-c`, as now a ps(1) would first show a dozen of
>> > configuration values only to have the real command buried somewhere at
>> > the back. It would have been easy to implement though with the
>> > GIT_CONFIG_ envvars.
>> 
>> I don't know what kinds of variables you want to set exactly, but
>> another possible option here is some mechanism to point Git to an extra
>> config file. This would work if you are setting a bunch of options in
>> some static way, but not if you're setting them to custom values for
>> each command invocation (because then you'd be dealing with a temp file,
>> which is annoying and error-prone).
>> 
>> I'm thinking something like a $GIT_CONFIG_ENV_FILE that is parsed after
>> repo config but before $GIT_CONFIG_PARAMETERS.
>> 
>> Or alternatively, add an includeIf directive that lets you do something
>> like:
>> 
>>   [includeIf "env:FOO"]
>>   path = foo.gitconfig
>> 
>> which triggers if $FOO is set. But again, that's only useful if you have
>> certain "profiles" of config you're trying to set, and not custom
>> values.
>> 
>> -Peff
>
> The issue we have is that the config file isn't necessarily under our
> control. It is in most cases, like e.g. when Gitaly gets deployed via
> Omnibus. But we also allow for source-based installations, where the
> user configures most things manually. And in that case, we have to ask
> the user to "Please set config variables A, B and C". Naturally, this is
> easy to forget, will drift apart in future releases and so on.
>
> To fix this, the plan is to move all required configuration items into
> Gitaly itself, which GIT_CONFIG_COUNT would've allowd to do quite
> nicely. Something like Ævar's proposal to allow reading the config from
> a file descriptor would also work, and just putting the whole
> configuration into an environment variable (similar to your
> GIT_CONFIG_ENV_FILE, but containing contents instead of a path). And
> finally, using `-c` would also work, with the downside of making it
> harder to see what's going on with all the git processes.

Aside from other stuff mentioned in this thread a trick I've used for a
while to make things "git-y" is:

    [alias]
    sh = !sh

Then you can just:

    git -c foo.bar=baz sh -c 'git config --get foo.bar'

Or, with a symlink from "git-aly" to "gitaly" in $PATH:

    git -c foo.bar=baz aly [...]

Although that's more a hack, and may go away depending on what happens
to dashed builtins (I don't know what Johannes was planning there).

Of course this only works for global config and "I want to run this
script doing a bunch of git stuff, and using this config", not
e.g. dynamically setting a password for one request.

> With regards to what we require from the config, you can have a look
> e.g. at [1]. It doesn't contain much, but we expect the following ones
> to be set:
>
>     - core.autocrlf=input
>     - gc.auto=0
>     - repack.writeBitmaps=true
>     - receive.advertisePushOptions=true
>     - core.fsyncObjectFiles=true
>
> Anyway, this is all rather specific to Gitaly and may thus not be too
> interesting for other. So in the end, we'll just live with the tradeoffs
> of whatever solution we end up with.
>
> Patrick
>
> [1]: https://docs.gitlab.com/ee/install/installation.html#configure-it

Jeff King Dec. 11, 2020, 4:02 p.m. UTC | #8

On Fri, Dec 11, 2020 at 03:47:38PM +0100, Patrick Steinhardt wrote:

> The issue we have is that the config file isn't necessarily under our
> control. It is in most cases, like e.g. when Gitaly gets deployed via
> Omnibus. But we also allow for source-based installations, where the
> user configures most things manually. And in that case, we have to ask
> the user to "Please set config variables A, B and C". Naturally, this is
> easy to forget, will drift apart in future releases and so on.

For GitHub, we ship a VM appliance, so we just put what we want into
/etc/gitconfig. I think in the worst case you could simplify everything
down to "put [include]path=/usr/share/gitlab/gitconfig into your system
config.  Though I'm a little surprised that you wouldn't just ship your
own version of Git that is used on the backend (so you know you have the
right version, plus any custom patches you'd want), and then point its
system config to /usr/share/gitlab/ or whatever.

But I'm probably just showing my ignorance of your setup / install
procedures, and there are a dozen reasons why that wouldn't work. ;)

> To fix this, the plan is to move all required configuration items into
> Gitaly itself, which GIT_CONFIG_COUNT would've allowd to do quite
> nicely. Something like Ævar's proposal to allow reading the config from
> a file descriptor would also work, and just putting the whole
> configuration into an environment variable (similar to your
> GIT_CONFIG_ENV_FILE, but containing contents instead of a path). And
> finally, using `-c` would also work, with the downside of making it
> harder to see what's going on with all the git processes.

We do have a couple scripts (like our git-repack wrapper) that make
heavy use of "git -c" to tweak things that don't have a command-line
option, or for which using it is awkward. It does clutter up "ps" a bit,
but it's sometimes nice to see the extra values, too (just as you'd see
command-line options).

-Peff

[v4,0/6] config: allow specifying config entries via env

Message

Comments