diff mbox series

[RFC] setup.c: make bare repo discovery optional

Message ID pull.1261.git.git.1651861810633.gitgitgadget@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] setup.c: make bare repo discovery optional | expand

Commit Message

Glen Choo May 6, 2022, 6:30 p.m. UTC
From: Glen Choo <chooglen@google.com>

Add a config variable, `safe.barerepository`, that tells Git whether or
not to recognize bare repositories when it is trying to discover the
repository. This only affects repository discovery, thus it has no
effect if discovery was not done (e.g. `--git-dir` was passed).

This is motivated by the fact that some workflows don't use bare
repositories at all, and users may prefer to opt out of bare repository
discovery altogether:

- An easy assumption for a user to make is that Git commands run
  anywhere inside a repository's working tree will use the same
  repository. However, if the working tree contains a bare repository
  below the root-level (".git" is preferred at the root-level), any
  operations inside that bare repository use the bare repository
  instead.

  In the worst case, attackers can use this confusion to trick users
  into running arbitrary code (see [1] for a deeper discussion). But
  even in benign situations (e.g. a user renames ".git/" to ".git.old/"
  and commits it for archival purposes), disabling bare repository
  discovery can be a simpler mode of operation (e.g. because the user
  doesn't actually want to use ".git.old/") [2].

- Git won't "accidentally" recognize a directory that wasn't meant to be
  a bare repository, but happens to resemble one. While such accidents
  are probably very rare in practice, this lets users reduce the chance
  to zero.

This config is designed to be used like an allow-list, but it is not yet
clear what a good format for this allow-list would be. As such, this
patch limits the config value to a tri-state of [true|false|unset]:

- [*|(unset)] recognize all bare repositories (like Git does today)
- (empty) recognize no bare repositories

and leaves the full format to be determined later.

[1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
[2]: I don't personally know anyone who does this as part of their
normal workflow, but a cursory search on GitHub suggests that there is a
not insubstantial number of people who munge ".git" in order to store
its contents.

https://github.com/search?l=&o=desc&p=1&q=ref+size%3A%3C1000+filename%3AHEAD&s=indexed&type=Code
(aka search for the text "ref", size:<1000, filename:HEAD)

Signed-off-by: Glen Choo <chooglen@google.com>
---
    RFC setup.c: make bare repo discovery optional
    
    (Forgive the non-standard RFC tag, I haven't figured out how to send as
    RFC using GGG. I also didn't realize that /preview would also respect
    CC...)
    
    = Description
    
    A relatively easy win that came out of the discussions around embedded
    bare repos [1], is to just let users opt-out of discovering bare repos.
    This patch does exactly that, by adding a 'boolean' config variable,
    safe.barerepository.
    
    safe.barerepository is presented to users as an allow-list of
    directories that Git will recognize as a bare repository during the
    repository discovery process (much like safe.directory), but this patch
    only implements (and permits) boolean behavior (i.e. on, off and unset).
    Hopefully, this gives us some room to discuss and experiment with
    possible formats.
    
    Thanks to Taylor for suggesting the allow-list idea :)
    
    I think the core concept of letting users toggle bare repo discovery is
    solid, but I'm sending this as RFC for the following reasons:
    
     * I don't love the name safe.barerepository, because it feels like Git
       is saying that bare repos are unsafe and consequently, that bare repo
       users are behaving unsafely. On the other hand, this is quite similar
       to safe.directory in a few respects, so it might make sense for the
       naming to reflect that.
    
     * The *-gcc CI jobs don't pass. I haven't discerned any kind of pattern
       yet.
    
    = How this relates to embedded bare repos
    
    This does not change the default behavior (i.e. Git will still discover
    all bare repos by default) because that would be catastrophic for bare
    repo users [2]. As such, this patch isn't intended to solve the problem
    of embedded bare repos for all users once and for all, but I think it
    does improve the our stance on the matter:
    
     * In the short-term, users who know they won't need bare repos (or
       those who are willing to set GIT_DIR for all of their bare repos) can
       opt-in to a safer, easier to reason about mode of operation.
    
     * In the longer-term, we might identify a usable-enough default that we
       can give opt-out protection that works for the vast majority of
       users.
    
    = Other questions/Concerns
    
     * Maybe it's more informative for the user if we die() (or warn()) when
       we find a bare repo instead of silently ignoring it?
    
     * I wonder if it makes sense to separate the toggle for bare repo
       discovery and the allow-list of bare repositories. Something like
       core.barediscovery or discovery.barerepository has a lot less baggage
       than safe.*, and boolean enable/disable is a lot simpler, but this
       isn't good from an extensibility perspective.
    
     * Is there any reason why safe.barerepository shouldn't use the same
       format as (its obvious inspiration) safe.directory?
    
     * Are the docs clear enough? I found those hard to put into words, so
       I'd especially appreciate wording suggestions :)
    
    = Future work
    
     * Like safe.directory, safe.barerepository is only read from system and
       global config. I anticipate that this is too restrictive; there has
       already been some discussion of adding a GIT_SAFE_DIRECTORIES
       environment variable for safe.directory [3], and it would be useful
       to have the same thing for safe.barerepository.
    
    [1]
    https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
    [2] In https://lore.kernel.org/git/xmqqh76ucdg6.fsf@gitster.g, Junio
    experimented with switching off bare repo discovery altogether and
    relying solely on GIT_DIR. The resulting fallout was deemed too big to
    be feasible. [3] https://lore.kernel.org/git/xmqqee1il09v.fsf@gitster.g/

Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1261%2Fchooglen%2Fsetup%2Fdisable-bare-repo-config-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1261/chooglen/setup/disable-bare-repo-config-v1
Pull-Request: https://github.com/git/git/pull/1261

 Documentation/config/safe.txt | 24 +++++++++++++++
 setup.c                       | 36 ++++++++++++++++++++++-
 t/t1510-repo-setup.sh         | 55 +++++++++++++++++++++++++++++++++++
 3 files changed, 114 insertions(+), 1 deletion(-)


base-commit: 0f828332d5ac36fc63b7d8202652efa152809856

Comments

Junio C Hamano May 6, 2022, 8:33 p.m. UTC | #1
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `safe.barerepository`, that tells Git whether or
> not to recognize bare repositories when it is trying to discover the
> repository. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

> +safe.barerepository::
> +	This config entry specifies directories that Git can recognize as
> +	a bare repository when looking for the repository (aka repository
> +	discovery). This has no effect if repository discovery is not
> +	performed e.g. the path to the repository is set via `--git-dir`
> +	(see linkgit:git[1]).
> ++
> +It is recommended that you set this value so that Git will only use the bare
> +repositories you intend it to. This prevents certain types of security and
> +non-security problems, such as:
> +
> +* `git clone`-ing a repository containing a maliciously bare repository
> +  inside it.

"maliciously bare"? "malicious bare" probably.

> +* Git recognizing a directory that isn't mean to be a bare repository,

"mean to be" -> "meant to be".

> +  but happens to look like one.

> diff --git a/setup.c b/setup.c
> index a7b36f3ffbf..9b5dd877273 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -1133,6 +1133,40 @@ static int ensure_valid_ownership(const char *path)
>  	return data.is_safe;
>  }
>  
> +/*
> + * This is similar to safe_directory_data, but only supports true/false.
> + */
> +struct safe_bare_repository_data {
> +	int is_safe;
> +};
> +
> +static int safe_bare_repository_cb(const char *key, const char *value, void *d)
> +{
> +	struct safe_bare_repository_data *data = d;
> +
> +	if (strcmp(key, "safe.barerepository"))
> +		return 0;
> +
> +	if (!value || !strcmp(value, "*")) {
> +		data->is_safe = 1;
> +		return 0;
> +	}
> +	if (!*value) {
> +		data->is_safe = 0;
> +		return 0;
> +	}
> +	return -1;
> +}
> +
> +static int should_detect_bare(void)
> +{
> +	struct safe_bare_repository_data data;
> +
> +	read_very_early_config(safe_bare_repository_cb, &data);
> +
> +	return data.is_safe;
> +}
> +
>  enum discovery_result {
>  	GIT_DIR_NONE = 0,
>  	GIT_DIR_EXPLICIT,
> @@ -1238,7 +1272,7 @@ static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
>  			return GIT_DIR_DISCOVERED;
>  		}
>  
> -		if (is_git_directory(dir->buf)) {
> +		if (should_detect_bare() && is_git_directory(dir->buf)) {
>  			if (!ensure_valid_ownership(dir->buf))
>  				return GIT_DIR_INVALID_OWNERSHIP;
>  			strbuf_addstr(gitdir, ".");

This is in a loop, which will go up and try the parent directory if
the body of this block is not entered, so it is calling the new
should_detect_bare() helper over and over if it returns false.

Not a very good idea.

Perhaps this would help?  I dunno.

static int should_detect_bare(void)
{
	static int should = -1; /* unknown yet */

	if (should < 0) {
		struct safe_bare_repository_data data = { 0 };
		read_very_early_config(safe_bare_repository_cb, &data);
		should = data.is_safe;
	}
	return should;
}

In any case, I very much appreciate the fact that this touches the
setup_git_directory_gently_1() codepath only minimally, as we have
other plans to update the code further soonish.

Thanks.
Taylor Blau May 9, 2022, 9:42 p.m. UTC | #2
Hi Glen,

On Fri, May 06, 2022 at 06:30:10PM +0000, Glen Choo via GitGitGadget wrote:
> From: Glen Choo <chooglen@google.com>
>
> Add a config variable, `safe.barerepository`, that tells Git whether or
> not to recognize bare repositories when it is trying to discover the
> repository. This only affects repository discovery, thus it has no
> effect if discovery was not done (e.g. `--git-dir` was passed).

Thanks for working on this! I'm excited to see some patches here, though
I'm not totally convinced of this direction. More below.

To summarize, this proposal attempts to work around the problem of
embedding bare repositories in non-bare checkouts by providing a way to
opt-out of bare repository discovery (which is to only discover things
that are listed in the safe.bareRepository configuration).

I agree that this would prevent the problem you're trying to solve, but
I have significant concerns that this patch is going too far (at the
risk of future damage to unrelated workflows) in order to accomplish
that goal.

My concern is that if we ever flipped the default (i.e. that
"safe.bareRepository" might someday be ""), that many legitimate cases
of using bare repositories would be broken. I think there are many such
legitimate use cases that _do_ rely on discovering bare repositories
(i.e., Git invocations that do not have a `--git-dir` in their
command-line). One such example would be forges, but I imagine that
there are many other uses we don't even know about, and I would like to
avoid breaking those if we ended up changing the default.

If it's possible to pursue a more targeted fix that leaves non-embedded
bare repositories alone, I'd like to try and focus these efforts on a
more narrow fix that would address just the case of embedded bare
repositories. I think that the direction I outlined in:

    https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/

could be a good place to start (see the paragraph beginning with "Here's
an alternative approach" and below for the details).

One potential problem with that approach (that this patch doesn't suffer
from) is that any discovery which finds a bare repository would have to
continue up to the root of the volume in order to figure out whether or
not that bare repository is embedded in another non-bare one. That is
probably a non-starter due to performance, but I think you could easily
work around with a top-level setting that controls whether or not you
even _care_ about embedded bare repositories.

For example, if I set safe.bareRepository='*' in my top-level
/etc/gitconfig, then we can avoid having to continue discovery for bare
repositories altogether because we know we'll allow it anyway.

To pursue a change that targets just embedded bare repositories, I think
you fundamentally have to do an exhaustive repository discovery in order
to figure out whether the (bare) repository you're dealing with is
embedded or not. So having an opt-out for users that either (a) don't
care or (b) can't accept the performance degradation that Emily
mentioned as a result of doing unbounded filesystem traversal would be
sensible.

Playing devil's advocate for a moment, though, even if we had something
like the proposal I outlined, flipping the top-level default from '*' to
some value that implies we stop working in embedded bare repositories
will break existing workflows. But that breakage would just be limited
to embedded bare repositories, and not non-embedded ones. So I think on
balance that breakage would affect fewer real-world users, while still
being just as easy to recover from.

>     safe.barerepository is presented to users as an allow-list of
>     directories that Git will recognize as a bare repository during the
>     repository discovery process (much like safe.directory), but this patch
>     only implements (and permits) boolean behavior (i.e. on, off and unset).
>     Hopefully, this gives us some room to discuss and experiment with
>     possible formats.
>
>     Thanks to Taylor for suggesting the allow-list idea :)

I did suggest an allow-list, but not this one ;-).

>     I think the core concept of letting users toggle bare repo discovery is
>     solid, but I'm sending this as RFC for the following reasons:
>
>      * I don't love the name safe.barerepository, because it feels like Git
>        is saying that bare repos are unsafe and consequently, that bare repo
>        users are behaving unsafely. On the other hand, this is quite similar
>        to safe.directory in a few respects, so it might make sense for the
>        naming to reflect that.

Yes, the concerns I outlined above are definitely echoing this
sentiment. Another way to say it is that this feels like too big of a
hammer (i.e., it is targeting _all_ bare repositories, not just embedded
ones) for too small of a nail (embedded bare repositories). As you're
probably sick of hearing me say by now, I would strongly prefer a more
targeted solution (perhaps what I outlined, or perhaps something else,
so long as it doesn't break non-embedded bare repositories if/ever we
decided to change the default value of safe.bareRepository).

>      * The *-gcc CI jobs don't pass. I haven't discerned any kind of pattern
>        yet.

Interesting. I wouldn't expect this to be the case (since the default is
to allow everything right now).

>      * In the longer-term, we might identify a usable-enough default that we
>        can give opt-out protection that works for the vast majority of
>        users.

Perhaps, and I think if this were the case then I would feel differently
about this patch. But I don't want us to paint ourselves into a corner,
either. It would be unfortunate to, say, find ourselves in a position
where the only protection against some novel embedded bare repository
attack is to change a default that would break many existing workflows
for _non_-embedded bare repositories.

>     = Other questions/Concerns
>
>      * Maybe it's more informative for the user if we die() (or warn()) when
>        we find a bare repo instead of silently ignoring it?

We should definitely provide more feedback to the user. If I set
`safe.bareRepository` to the empty string via a global config, and then
execute a Git command in a non-embedded bare repository, I get:

    $ git.compile config --get --global --default='*' safe.bareRepository

    $ git.compile rev-parse --absolute-git-dir
    fatal: not a git repository (or any of the parent directories): .git

whereas on the last release of Git, I get instead:

    $ git rev-parse --absolute-git-dir
    /home/ttaylorr/repo.git

I'm still not convinced that just reading repository extensions while
ignoring the rest of config and hooks is too confusing, so I'd be more
in favor of something like:

    $ git.compile rev-parse --absolute-git-dir
    warning: ignoring repository config and hooks
    advice: to permit bare repository discovery (which
    advice: will read config and hooks), consider running:
    advice:
    advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
    /home/ttaylorr/repo.git

(though I still feel strongly that we should pursue a more targeted
approach here).

Thanks,
Taylor
Junio C Hamano May 9, 2022, 10:54 p.m. UTC | #3
Taylor Blau <me@ttaylorr.com> writes:

> Thanks for working on this! I'm excited to see some patches here, though
> I'm not totally convinced of this direction. More below.
>
> To summarize, this proposal attempts to work around the problem of
> embedding bare repositories in non-bare checkouts by providing a way to
> opt-out of bare repository discovery (which is to only discover things
> that are listed in the safe.bareRepository configuration).
>
> I agree that this would prevent the problem you're trying to solve, but
> I have significant concerns that this patch is going too far (at the
> risk of future damage to unrelated workflows) in order to accomplish
> that goal.
>
> My concern is that if we ever flipped the default (i.e. that
> "safe.bareRepository" might someday be ""), that many legitimate cases
> of using bare repositories would be broken. I think there are many such
> legitimate use cases that _do_ rely on discovering bare repositories
> (i.e., Git invocations that do not have a `--git-dir` in their
> command-line).

I think 99% of such use is to chdir into the directory with HEAD,
refs/ and objects/ in it and let git recognise the cwd is a git
directory.  Am I mistaken, or are there tools that chdir into
objects/08/ and rely on setup_git_directory_gently_1() to find the
parent directory of that 'objects' directory to be a git directory?

I am wondering if another knob to help that particular use case
easier may be sufficient.  If you are a forge operator, you'd just
set a boolean configuration variable to say "it is sufficient to
chdir into a directory to use it a bare repository without exporting
the environment variable GIT_DIR=."

It is likely that end-user human users would not want to enable such
a variable, of course, but I wonder if a simple single knob would be
sufficient to help other use cases you are worried about?

While I wish "extensions and nothing else", i.e. we use "degraded
access", not "refuse to give access at all", were workable, I am
pessimistic that it would work well in practice.

Saying "nothing else" is easy, but we do "if X exists, use it" for
hook, and to implement "nothing else", you'd need to find such a
code and say "even if X exists, because we are in this strange
embedded bare thing, ignore this part of the logic" for every X.
We've been casually saying "potentially risky config" and then
started mixing "hooks" in the discussion, but who knows what other
things are used from the repository by third-party tools that we
need to yet add to the mix?

> I'm still not convinced that just reading repository extensions while
> ignoring the rest of config and hooks is too confusing, so I'd be more
> in favor of something like:

I do not think it would be confusing.  I am worried about it being
error prone.

>     $ git.compile rev-parse --absolute-git-dir
>     warning: ignoring repository config and hooks
>     advice: to permit bare repository discovery (which
>     advice: will read config and hooks), consider running:
>     advice:
>     advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
>     /home/ttaylorr/repo.git

Is the last line meant to be an output from "rev-parse --absolute-git-dir"?
IOW, the warning says you are ignoring, but we are still recognising
it as a repository?

By the way, do we need safe.bareRepository?  Shouldn't
safe.directory cover the same purpose?  

If a directory is on the latter, you are saying that (1) the
directory is OK to use as a repository, and (2) it is so even if the
directory is owned by somebody else, not you.

Theoretically you can argue that there can be cases where you only
want (1) and not (2), but as long as you control such a directory
(like an embedded repository in your project's checkout) yourself,
you do not have to worry about the "ok even if it is owned by
somebody else" part.
Taylor Blau May 9, 2022, 11:57 p.m. UTC | #4
On Mon, May 09, 2022 at 03:54:08PM -0700, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > Thanks for working on this! I'm excited to see some patches here, though
> > I'm not totally convinced of this direction. More below.
> >
> > To summarize, this proposal attempts to work around the problem of
> > embedding bare repositories in non-bare checkouts by providing a way to
> > opt-out of bare repository discovery (which is to only discover things
> > that are listed in the safe.bareRepository configuration).
> >
> > I agree that this would prevent the problem you're trying to solve, but
> > I have significant concerns that this patch is going too far (at the
> > risk of future damage to unrelated workflows) in order to accomplish
> > that goal.
> >
> > My concern is that if we ever flipped the default (i.e. that
> > "safe.bareRepository" might someday be ""), that many legitimate cases
> > of using bare repositories would be broken. I think there are many such
> > legitimate use cases that _do_ rely on discovering bare repositories
> > (i.e., Git invocations that do not have a `--git-dir` in their
> > command-line).
>
> I think 99% of such use is to chdir into the directory with HEAD,
> refs/ and objects/ in it and let git recognise the cwd is a git
> directory.  Am I mistaken, or are there tools that chdir into
> objects/08/ and rely on setup_git_directory_gently_1() to find the
> parent directory of that 'objects' directory to be a git directory?

If you took this change, and then at some point in the future we changed
the default value of safe.bareRepository to "", wouldn't that break that
99% of use cases you are talking about?

When I read your "I think 99% of such use is ...", it makes me think
that this change won't disrupt bare repo discovery when we only traverse
one layer above $CWD. But this change disrupts the case where we don't
need to traverse at all to do discovery (i.e., when $CWD is the root of
a bare repository).

> I am wondering if another knob to help that particular use case
> easier may be sufficient.  If you are a forge operator, you'd just
> set a boolean configuration variable to say "it is sufficient to
> chdir into a directory to use it a bare repository without exporting
> the environment variable GIT_DIR=."

Yes, GitHub would almost certainly set safe.bareRepository to "*"
regardless of what Git's own default would be.

> It is likely that end-user human users would not want to enable such
> a variable, of course, but I wonder if a simple single knob would be
> sufficient to help other use cases you are worried about?

I'm not sure I agree that end-users wouldn't want to touch this knob. If
they have embedded bare repositories that they rely on as test fixtures,
for example, wouldn't safe.bareRepository need to be tweaked?

(On a separate but somewhat-related note, I still think that this
setting should be read from the repository config, too, i.e., it seems
odd that we'd force a user to set safe.bareRepository to some deeply
nested repository (in the embedded case) via their global config.)

> While I wish "extensions and nothing else", i.e. we use "degraded
> access", not "refuse to give access at all", were workable, I am
> pessimistic that it would work well in practice.
>
> Saying "nothing else" is easy, but we do "if X exists, use it" for
> hook, and to implement "nothing else", you'd need to find such a
> code and say "even if X exists, because we are in this strange
> embedded bare thing, ignore this part of the logic" for every X.
> We've been casually saying "potentially risky config" and then
> started mixing "hooks" in the discussion, but who knows what other
> things are used from the repository by third-party tools that we
> need to yet add to the mix?
>
> > I'm still not convinced that just reading repository extensions while
> > ignoring the rest of config and hooks is too confusing, so I'd be more
> > in favor of something like:
>
> I do not think it would be confusing.  I am worried about it being
> error prone.

Yeah, on this and the above quoted hunk, I am fine if our behavior
eventually became "call die()" for when we are in an embedded bare
repository. But I do think this transition should be gradual, i.e., we
should likely emit a warning in those cases that would be broken in the
future to say "this will break, run this `git config` invocation if you
want it to remain working".

> >     $ git.compile rev-parse --absolute-git-dir
> >     warning: ignoring repository config and hooks
> >     advice: to permit bare repository discovery (which
> >     advice: will read config and hooks), consider running:
> >     advice:
> >     advice:   $ git config --global --add safe.bareRepository /home/ttaylorr/repo.git
> >     /home/ttaylorr/repo.git
>
> Is the last line meant to be an output from "rev-parse --absolute-git-dir"?
> IOW, the warning says you are ignoring, but we are still recognising
> it as a repository?

In this example, yes. But again, I'm not so deeply attached to the idea
that we *have* to run in those cases. So I would equally be OK with the
above s/warning/fatal and minus the last line, too (i.e., that we call
die(), obviously we'd have to emit the advice before calling die()).

> By the way, do we need safe.bareRepository?  Shouldn't
> safe.directory cover the same purpose?
>
> If a directory is on the latter, you are saying that (1) the
> directory is OK to use as a repository, and (2) it is so even if the
> directory is owned by somebody else, not you.
>
> Theoretically you can argue that there can be cases where you only
> want (1) and not (2), but as long as you control such a directory
> (like an embedded repository in your project's checkout) yourself,
> you do not have to worry about the "ok even if it is owned by
> somebody else" part.

I'm not sure yet, but will think more about it.

Thanks,
Taylor
Junio C Hamano May 10, 2022, 12:23 a.m. UTC | #5
Taylor Blau <me@ttaylorr.com> writes:

>> > My concern is that if we ever flipped the default (i.e. that
>> > "safe.bareRepository" might someday be ""), that many legitimate cases
>> > of using bare repositories would be broken. I think there are many such
>> > legitimate use cases that _do_ rely on discovering bare repositories
>> > (i.e., Git invocations that do not have a `--git-dir` in their
>> > command-line).
>>
>> I think 99% of such use is to chdir into the directory with HEAD,
>> refs/ and objects/ in it and let git recognise the cwd is a git
>> directory.  Am I mistaken, or are there tools that chdir into
>> objects/08/ and rely on setup_git_directory_gently_1() to find the
>> parent directory of that 'objects' directory to be a git directory?
>
> If you took this change, and then at some point in the future we changed
> the default value of safe.bareRepository to "", wouldn't that break that
> 99% of use cases you are talking about?

Our spawning (e.g. "fetch" run_command()s "upload-pack" in a local
repository, or "fetch" runs "upload-pack" over ssh connection, or
http gateway runs "upload-pack" after learning which repository the
request is fetching from) of subcommands can and should be fixed by
exporting "GIT_DIR=." when we spawn them in the target directory,
and such a fix should be more or less trivial.  It must happen
before such a switch of default happens (if it is what we plan to
do, that is).  Also, the trivial fix must be conveyed to third-party
tool authors and give them time to adjust their ware.

That's part of the usual migration process, and I am not so worried
about it.

If some third-party tool for whatever reason wants to start from a
random subdirectory in a bare repository, that is a different story.
Fixing such a third-party tool would be more involved than "more or
less trivial".

> When I read your "I think 99% of such use is ...", it makes me think
> that this change won't disrupt bare repo discovery when we only traverse
> one layer above $CWD. But this change disrupts the case where we don't
> need to traverse at all to do discovery (i.e., when $CWD is the root of
> a bare repository).

By "this change" you mean what Glen proposes?  I think it was
designed to break the use case where you go there to signal that you
want to use the directory as a repository.

>> I am wondering if another knob to help that particular use case
>> easier may be sufficient.  If you are a forge operator, you'd just
>> set a boolean configuration variable to say "it is sufficient to
>> chdir into a directory to use it a bare repository without exporting
>> the environment variable GIT_DIR=."

And such a boolean, without safe.bareRepository setting, should be
sufficient to cover that 99% of such use, because it disables that
deliberate refusal of treating CWD as a repository without
explicitly saying that is what you want with "GIT_DIR=.".  One thing
I wasn't sure about was if that 99% number is close to reality,
hence my question.

> Yes, GitHub would almost certainly set safe.bareRepository to "*"
> regardless of what Git's own default would be.

And with such a boolean, I am hoping that GitHub do not have to make
such a wildly open setting.  Only $CWD that is the top of a repository,
without allowing it to be any random subdirectory, would be allowed.

> I'm not sure I agree that end-users wouldn't want to touch this knob. If
> they have embedded bare repositories that they rely on as test fixtures,
> for example, wouldn't safe.bareRepository need to be tweaked?

But not in the "My $CWD is always fine" knob, whose only reason is
to simplify things without opening you up unnecessarily too widely
for hosting sites.
Glen Choo May 10, 2022, 10 p.m. UTC | #6
Hi Taylor,

Taylor Blau <me@ttaylorr.com> writes:

> Hi Glen,
>
> On Fri, May 06, 2022 at 06:30:10PM +0000, Glen Choo via GitGitGadget wrote:
>> From: Glen Choo <chooglen@google.com>
>>
>> Add a config variable, `safe.barerepository`, that tells Git whether or
>> not to recognize bare repositories when it is trying to discover the
>> repository. This only affects repository discovery, thus it has no
>> effect if discovery was not done (e.g. `--git-dir` was passed).
>
> To summarize, this proposal attempts to work around the problem of
> embedding bare repositories in non-bare checkouts by providing a way to
> opt-out of bare repository discovery (which is to only discover things
> that are listed in the safe.bareRepository configuration).
>
> I agree that this would prevent the problem you're trying to solve, but
> I have significant concerns that this patch is going too far (at the
> risk of future damage to unrelated workflows) in order to accomplish
> that goal.

Thanks again for the careful read. As I understand it, your concern is
that making bare repository discovery configurable and then flipping the
default to e.g. never detecting bare repositories is too disruptive to
fix the embedded bare repository problem. And to avoid disrupting
non-embedded bare repositories, you would prefer to pursue a more
targeted fix.

If the problem statement were limited to embedded bare repositories,
then I agree that this is way more than overkill, and that a targeted
solution would be preferable.

More generally however, the problem of embedded bare repositories seems
to suggest that bare repository discovery doesn't serve all users well,
and in fact, may even be a net negative for a subset of users. I'd be
interested in hearing your thoughts from that perspective, e.g.

- Should bare repository discovery should be configurable?
- What is a good default for bare repository discovery? (regardless of
  how feasible changing the default is)

This is a somewhat different direction from how the conversation started
(I hope it doesn't look like I'm shifting the goal posts), but I think
it's a good opportunity to step back and simplify something that we
wished we got right in the beginning.

And even if we don't flip the default, shipping the config value still
seems useful e.g. there's a good amount of interest in disabling bare
repository discovery at $DAYJOB (and I think we'll get a lot of
interesting results once we do).

>>     safe.barerepository is presented to users as an allow-list of
>>     directories that Git will recognize as a bare repository during the
>>     repository discovery process (much like safe.directory), but this patch
>>     only implements (and permits) boolean behavior (i.e. on, off and unset).
>>     Hopefully, this gives us some room to discuss and experiment with
>>     possible formats.
>>
>>     Thanks to Taylor for suggesting the allow-list idea :)
>
> I did suggest an allow-list, but not this one ;-).

Ah, yes. Oops. Sorry if it looked like I was putting words in your
mouth.

What I really meant was that an allow-list (untethered from any specific
purpose) seems like a useful 'UI primitive', so thanks for bringing up
the option.

>>     I think the core concept of letting users toggle bare repo discovery is
>>     solid, but I'm sending this as RFC for the following reasons:
>>
>>      * I don't love the name safe.barerepository, because it feels like Git
>>        is saying that bare repos are unsafe and consequently, that bare repo
>>        users are behaving unsafely. On the other hand, this is quite similar
>>        to safe.directory in a few respects, so it might make sense for the
>>        naming to reflect that.
>
> Yes, the concerns I outlined above are definitely echoing this
> sentiment. Another way to say it is that this feels like too big of a
> hammer (i.e., it is targeting _all_ bare repositories, not just embedded
> ones) for too small of a nail (embedded bare repositories). As you're
> probably sick of hearing me say by now, I would strongly prefer a more
> targeted solution (perhaps what I outlined, or perhaps something else,
> so long as it doesn't break non-embedded bare repositories if/ever we
> decided to change the default value of safe.bareRepository).

Ok, yeah I think safe.barerepository is a terrible way to achieve my
purported goal of 'making bare repository discovery
configurable/simpler/' - using the "safe." namespace makes it impossible
to see this as anything other than protection against dangerous, unknown
bare repositories. I'll drop the idea of safe-listing known bare
repositories for now, that seems unproductive.

'Optionally disable bare repository discovery' still sounds like it's on
the table though, but probably with a different kind of UX e.g.
"discovery.barerepository" with the options:

- always: always discover bare repos
- never: never discover bare repos
- cwd-only: only discover bare repos if they are the cwd
- dotgit-only: only discover bare repos if they are a descendant of
  .git/

>>      * The *-gcc CI jobs don't pass. I haven't discerned any kind of pattern
>>        yet.
>
> Interesting. I wouldn't expect this to be the case (since the default is
> to allow everything right now).

This might be a false alarm - I saw similar failures on an unrelated
patch. I think my "master" is just out of date :(
diff mbox series

Patch

diff --git a/Documentation/config/safe.txt b/Documentation/config/safe.txt
index 6d764fe0ccf..02032251ffd 100644
--- a/Documentation/config/safe.txt
+++ b/Documentation/config/safe.txt
@@ -1,3 +1,27 @@ 
+safe.barerepository::
+	This config entry specifies directories that Git can recognize as
+	a bare repository when looking for the repository (aka repository
+	discovery). This has no effect if repository discovery is not
+	performed e.g. the path to the repository is set via `--git-dir`
+	(see linkgit:git[1]).
++
+It is recommended that you set this value so that Git will only use the bare
+repositories you intend it to. This prevents certain types of security and
+non-security problems, such as:
+
+* `git clone`-ing a repository containing a maliciously bare repository
+  inside it.
+* Git recognizing a directory that isn't mean to be a bare repository,
+  but happens to look like one.
++
+The currently supported values are `*` (Git recognizes all bare
+repositories) and the empty value (Git never recognizes bare repositories).
+Defaults to `*`.
++
+This config setting is only respected when specified in a system or global
+config, not when it is specified in a repository config or via the command
+line option `-c safe.barerepository=<path>`.
+
 safe.directory::
 	These config entries specify Git-tracked directories that are
 	considered safe even if they are owned by someone other than the
diff --git a/setup.c b/setup.c
index a7b36f3ffbf..9b5dd877273 100644
--- a/setup.c
+++ b/setup.c
@@ -1133,6 +1133,40 @@  static int ensure_valid_ownership(const char *path)
 	return data.is_safe;
 }
 
+/*
+ * This is similar to safe_directory_data, but only supports true/false.
+ */
+struct safe_bare_repository_data {
+	int is_safe;
+};
+
+static int safe_bare_repository_cb(const char *key, const char *value, void *d)
+{
+	struct safe_bare_repository_data *data = d;
+
+	if (strcmp(key, "safe.barerepository"))
+		return 0;
+
+	if (!value || !strcmp(value, "*")) {
+		data->is_safe = 1;
+		return 0;
+	}
+	if (!*value) {
+		data->is_safe = 0;
+		return 0;
+	}
+	return -1;
+}
+
+static int should_detect_bare(void)
+{
+	struct safe_bare_repository_data data;
+
+	read_very_early_config(safe_bare_repository_cb, &data);
+
+	return data.is_safe;
+}
+
 enum discovery_result {
 	GIT_DIR_NONE = 0,
 	GIT_DIR_EXPLICIT,
@@ -1238,7 +1272,7 @@  static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
 			return GIT_DIR_DISCOVERED;
 		}
 
-		if (is_git_directory(dir->buf)) {
+		if (should_detect_bare() && is_git_directory(dir->buf)) {
 			if (!ensure_valid_ownership(dir->buf))
 				return GIT_DIR_INVALID_OWNERSHIP;
 			strbuf_addstr(gitdir, ".");
diff --git a/t/t1510-repo-setup.sh b/t/t1510-repo-setup.sh
index 591505a39c0..3ce8f776921 100755
--- a/t/t1510-repo-setup.sh
+++ b/t/t1510-repo-setup.sh
@@ -541,6 +541,61 @@  test_expect_success '#16e: bareness preserved by --bare' '
 	)
 '
 
+# Test the tri-state of [(unset)|""|"*"].
+test_expect_success '#16f: bare repo in worktree' '
+	test_when_finished "git config --global --unset safe.barerepository" &&
+	setup_repo 16f unset "" unset &&
+
+	git init --bare 16f/default/bare &&
+	git init --bare 16f/default/bare/bare &&
+	try_case 16f/default/bare unset unset \
+		. "(null)" "$here/16f/default/bare" "(null)" &&
+	try_case 16f/default/bare/bare unset unset \
+		. "(null)" "$here/16f/default/bare/bare" "(null)" &&
+
+	git config --global safe.barerepository "*" &&
+	git init --bare 16f/all/bare &&
+	git init --bare 16f/all/bare/bare &&
+	try_case 16f/all/bare unset unset \
+		. "(null)" "$here/16f/all/bare" "(null)" &&
+	try_case 16f/all/bare/bare unset unset \
+		. "(null)" "$here/16f/all/bare/bare" "(null)" &&
+
+	git config --global safe.barerepository "" &&
+	git init --bare 16f/never/bare &&
+	git init --bare 16f/never/bare/bare &&
+	try_case 16f/never/bare unset unset \
+		".git" "$here/16f" "$here/16f" "never/bare/" &&
+	try_case 16f/never/bare/bare unset unset \
+		".git" "$here/16f" "$here/16f" "never/bare/bare/"
+'
+
+test_expect_success '#16g: inside .git with safe.barerepository' '
+	test_when_finished "git config --global --unset safe.barerepository" &&
+
+	# Omit the "default" case; it is covered by 16a.
+
+	git config --global safe.barerepository "*" &&
+	setup_repo 16g/all unset "" unset &&
+	mkdir -p 16g/all/.git/wt/sub &&
+	try_case 16g/all/.git unset unset \
+		. "(null)" "$here/16g/all/.git" "(null)" &&
+	try_case 16g/all/.git/wt unset unset \
+		"$here/16g/all/.git" "(null)" "$here/16g/all/.git/wt" "(null)" &&
+	try_case 16g/all/.git/wt/sub unset unset \
+		"$here/16g/all/.git" "(null)" "$here/16g/all/.git/wt/sub" "(null)" &&
+
+	git config --global safe.barerepository "" &&
+	setup_repo 16g/never unset "" unset &&
+	mkdir -p 16g/never/.git/wt/sub &&
+	try_case 16g/never/.git unset unset \
+		".git" "$here/16g/never" "$here/16g/never" ".git/" &&
+	try_case 16g/never/.git/wt unset unset \
+		".git" "$here/16g/never" "$here/16g/never" ".git/wt/" &&
+	try_case 16g/never/.git/wt/sub unset unset \
+		".git" "$here/16g/never" "$here/16g/never" ".git/wt/sub/"
+'
+
 test_expect_success '#17: GIT_WORK_TREE without explicit GIT_DIR is accepted (bare case)' '
 	# Just like #16.
 	setup_repo 17a unset "" true &&