mbox series

[v2,0/2] setup.c: make bare repo discovery optional

Message ID pull.1261.v2.git.git.1652485058.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series setup.c: make bare repo discovery optional | expand

Message

Johannes Schindelin via GitGitGadget May 13, 2022, 11:37 p.m. UTC
Thanks all for the comments on v1, I've expanded this series somewhat to
address them, notably:

 * The config value is now named discovery.bare and is an enum (instead of
   an allow-list). This hopefully moves us away from "bare repos are unsafe
   and need to be guarded against" and towards "bare repos can be made
   optional if it serves your needs better".
 * discovery.bare now causes git to die() instead of silently ignoring the
   bare repo.
 * Add an option that allows a bare repo if it is the CWD, since this is
   presumably a reasonable default for 99% of bare repo users [1].

= Questions/Concerns

 * die()-ing is necessary if we're trying to flip the default value of
   discovery.bare. We'd expect many bare repo users to be broken, and it's
   more helpful to fail loudly than to silently ignore the bare repo.
   
   But in the long term, long after we've flipped the default and users know
   that they need to opt into bare repo discovery, would it be a better UX
   to just silently ignore the bare repo?

= Patch organization

 * Patch 1 introduces discovery.bare with allowed values [always|never].

 * Patch 2 adds discover.bare=cwd, which is useful when users don't always
   set GIT_DIR e.g. their workflow really depends on it, they are in the
   midst of migration.

= Series history

Changes since v1:

 * Rename safe.barerepository to discovery.bare and make it die()
 * Move tests into t/t0034-discovery-bare.sh
 * Avoid unnecessary config reading by using a static variable
 * Add discovery.bare=cwd
 * Fix typos

[1] I tried this 'cwd' setting on our test suite, with some pretty promising
results.

https://github.com/chooglen/git/actions/runs/2321914777

Out of the 8 failing scripts:

 * 6 are of the form "make sure we 'do the right thing' inside a
   subdirectory of a bare repo" (which typically means .git) e.g.
   t9903-bash-prompt.sh. We should be setting discovery.bare=always for
   these tests, so this is a non-issue.
 * t5323-pack-redundant.sh can be rewritten to -C into the root of the bare
   repo instead of a subdirectory.
 * t3310-notes-merge-manual-resolve.sh: not sure what the test is checking
   in particular, but I think this can be rewritten.

IOW, I don't think we have any commands that require that CWD is a
subdirectory of a bare repo, and we could use discovery.bare without much
hassle.

Glen Choo (2):
  setup.c: make bare repo discovery optional
  setup.c: learn discovery.bareRepository=cwd

 Documentation/config/discovery.txt | 26 +++++++++
 setup.c                            | 89 ++++++++++++++++++++++++++++--
 t/t0034-discovery-bare.sh          | 69 +++++++++++++++++++++++
 3 files changed, 178 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/config/discovery.txt
 create mode 100755 t/t0034-discovery-bare.sh


base-commit: e8005e4871f130c4e402ddca2032c111252f070a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1261%2Fchooglen%2Fsetup%2Fdisable-bare-repo-config-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1261/chooglen/setup/disable-bare-repo-config-v2
Pull-Request: https://github.com/git/git/pull/1261

Range-diff vs v1:

 1:  3370258c4b3 ! 1:  22b10bf9da8 [RFC] setup.c: make bare repo discovery optional
     @@ Metadata
      Author: Glen Choo <chooglen@google.com>
      
       ## Commit message ##
     -    [RFC] setup.c: make bare repo discovery optional
     +    setup.c: make bare repo discovery optional
      
     -    Add a config variable, `safe.barerepository`, that tells Git whether or
     -    not to recognize bare repositories when it is trying to discover the
     -    repository. This only affects repository discovery, thus it has no
     +    Add a config variable, `discovery.bare`, that tells Git whether or not
     +    it should work with the bare repository it has discovered i.e. Git will
     +    die() if it discovers a bare repository, but it is not allowed by
     +    `discovery.bare`. This only affects repository discovery, thus it has no
          effect if discovery was not done (e.g. `--git-dir` was passed).
      
          This is motivated by the fact that some workflows don't use bare
     @@ Commit message
            are probably very rare in practice, this lets users reduce the chance
            to zero.
      
     -    This config is designed to be used like an allow-list, but it is not yet
     -    clear what a good format for this allow-list would be. As such, this
     -    patch limits the config value to a tri-state of [true|false|unset]:
     +    This config is an enum of:
      
     -    - [*|(unset)] recognize all bare repositories (like Git does today)
     -    - (empty) recognize no bare repositories
     +    - ["always"|(unset)]: always recognize bare repositories (like Git does
     +      today)
     +    - "never": never recognize bare repositories
      
     -    and leaves the full format to be determined later.
     +    More values are expected to be added later, and the default is expected
     +    to change (i.e. to something other than "always").
      
          [1]: https://lore.kernel.org/git/kl6lsfqpygsj.fsf@chooglen-macbookpro.roam.corp.google.com
          [2]: I don't personally know anyone who does this as part of their
     @@ Commit message
      
          Signed-off-by: Glen Choo <chooglen@google.com>
      
     - ## Documentation/config/safe.txt ##
     +    WIP setup.c: make discovery.bare die on failure
     +
     +    Signed-off-by: Glen Choo <chooglen@google.com>
     +
     + ## Documentation/config/discovery.txt (new) ##
      @@
     -+safe.barerepository::
     -+	This config entry specifies directories that Git can recognize as
     -+	a bare repository when looking for the repository (aka repository
     ++discovery.bare::
     ++	Specifies what kinds of directories Git can recognize as a bare
     ++	repository when looking for the repository (aka repository
      +	discovery). This has no effect if repository discovery is not
      +	performed e.g. the path to the repository is set via `--git-dir`
      +	(see linkgit:git[1]).
      ++
     -+It is recommended that you set this value so that Git will only use the bare
     -+repositories you intend it to. This prevents certain types of security and
     -+non-security problems, such as:
     -+
     -+* `git clone`-ing a repository containing a maliciously bare repository
     -+  inside it.
     -+* Git recognizing a directory that isn't mean to be a bare repository,
     -+  but happens to look like one.
     -++
     -+The currently supported values are `*` (Git recognizes all bare
     -+repositories) and the empty value (Git never recognizes bare repositories).
     -+Defaults to `*`.
     -++
      +This config setting is only respected when specified in a system or global
      +config, not when it is specified in a repository config or via the command
     -+line option `-c safe.barerepository=<path>`.
     ++line option `-c discovery.bare=<value>`.
     +++
     ++The currently supported values are `always` (Git always recognizes bare
     ++repositories) and `never` (Git never recognizes bare repositories).
     ++This defaults to `always`, but this default is likely to change.
     +++
     ++If your workflow does not rely on bare repositories, it is recommended that
     ++you set this value to `never`. This makes repository discovery easier to
     ++reason about and prevents certain types of security and non-security
     ++problems, such as:
      +
     - safe.directory::
     - 	These config entries specify Git-tracked directories that are
     - 	considered safe even if they are owned by someone other than the
     ++* `git clone`-ing a repository containing a malicious bare repository
     ++  inside it.
     ++* Git recognizing a directory that isn't meant to be a bare repository,
     ++  but happens to look like one.
      
       ## setup.c ##
     +@@
     + static int inside_git_dir = -1;
     + static int inside_work_tree = -1;
     + static int work_tree_config_is_bogus;
     ++enum discovery_bare_config {
     ++	DISCOVERY_BARE_UNKNOWN = -1,
     ++	DISCOVERY_BARE_NEVER = 0,
     ++	DISCOVERY_BARE_ALWAYS,
     ++};
     ++static enum discovery_bare_config discovery_bare_config =
     ++	DISCOVERY_BARE_UNKNOWN;
     + 
     + static struct startup_info the_startup_info;
     + struct startup_info *startup_info = &the_startup_info;
      @@ setup.c: static int ensure_valid_ownership(const char *path)
       	return data.is_safe;
       }
       
     -+/*
     -+ * This is similar to safe_directory_data, but only supports true/false.
     -+ */
     -+struct safe_bare_repository_data {
     -+	int is_safe;
     -+};
     -+
     -+static int safe_bare_repository_cb(const char *key, const char *value, void *d)
     ++static int discovery_bare_cb(const char *key, const char *value, void *d)
      +{
     -+	struct safe_bare_repository_data *data = d;
     -+
     -+	if (strcmp(key, "safe.barerepository"))
     ++	if (strcmp(key, "discovery.bare"))
      +		return 0;
      +
     -+	if (!value || !strcmp(value, "*")) {
     -+		data->is_safe = 1;
     ++	if (!strcmp(value, "never")) {
     ++		discovery_bare_config = DISCOVERY_BARE_NEVER;
      +		return 0;
      +	}
     -+	if (!*value) {
     -+		data->is_safe = 0;
     ++	if (!strcmp(value, "always")) {
     ++		discovery_bare_config = DISCOVERY_BARE_ALWAYS;
      +		return 0;
      +	}
      +	return -1;
      +}
      +
     -+static int should_detect_bare(void)
     ++static int check_bare_repo_allowed(void)
      +{
     -+	struct safe_bare_repository_data data;
     -+
     -+	read_very_early_config(safe_bare_repository_cb, &data);
     ++	if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN) {
     ++		read_very_early_config(discovery_bare_cb, NULL);
     ++		/* We didn't find a value; use the default. */
     ++		if (discovery_bare_config == DISCOVERY_BARE_UNKNOWN)
     ++			discovery_bare_config = DISCOVERY_BARE_ALWAYS;
     ++	}
     ++	switch (discovery_bare_config) {
     ++	case DISCOVERY_BARE_NEVER:
     ++		return 0;
     ++	case DISCOVERY_BARE_ALWAYS:
     ++		return 1;
     ++	default:
     ++		BUG("invalid discovery_bare_config %d", discovery_bare_config);
     ++	}
     ++}
      +
     -+	return data.is_safe;
     ++static const char *discovery_bare_config_to_string(void)
     ++{
     ++	switch (discovery_bare_config) {
     ++	case DISCOVERY_BARE_NEVER:
     ++		return "never";
     ++	case DISCOVERY_BARE_ALWAYS:
     ++		return "always";
     ++	default:
     ++		BUG("invalid discovery_bare_config %d", discovery_bare_config);
     ++	}
      +}
      +
       enum discovery_result {
       	GIT_DIR_NONE = 0,
       	GIT_DIR_EXPLICIT,
     +@@ setup.c: enum discovery_result {
     + 	GIT_DIR_HIT_CEILING = -1,
     + 	GIT_DIR_HIT_MOUNT_POINT = -2,
     + 	GIT_DIR_INVALID_GITFILE = -3,
     +-	GIT_DIR_INVALID_OWNERSHIP = -4
     ++	GIT_DIR_INVALID_OWNERSHIP = -4,
     ++	GIT_DIR_DISALLOWED_BARE = -5
     + };
     + 
     + /*
      @@ setup.c: static enum discovery_result setup_git_directory_gently_1(struct strbuf *dir,
     - 			return GIT_DIR_DISCOVERED;
       		}
       
     --		if (is_git_directory(dir->buf)) {
     -+		if (should_detect_bare() && is_git_directory(dir->buf)) {
     + 		if (is_git_directory(dir->buf)) {
     ++			if (!check_bare_repo_allowed())
     ++				return GIT_DIR_DISALLOWED_BARE;
       			if (!ensure_valid_ownership(dir->buf))
       				return GIT_DIR_INVALID_OWNERSHIP;
       			strbuf_addstr(gitdir, ".");
     +@@ setup.c: const char *setup_git_directory_gently(int *nongit_ok)
     + 		}
     + 		*nongit_ok = 1;
     + 		break;
     ++	case GIT_DIR_DISALLOWED_BARE:
     ++		if (!nongit_ok) {
     ++			die(_("cannot use bare repository '%s' (discovery.bare is '%s')"),
     ++			    dir.buf,
     ++			    discovery_bare_config_to_string());
     ++		}
     ++		*nongit_ok = 1;
     ++		break;
     + 	case GIT_DIR_NONE:
     + 		/*
     + 		 * As a safeguard against setup_git_directory_gently_1 returning
      
     - ## t/t1510-repo-setup.sh ##
     -@@ t/t1510-repo-setup.sh: test_expect_success '#16e: bareness preserved by --bare' '
     - 	)
     - '
     - 
     -+# Test the tri-state of [(unset)|""|"*"].
     -+test_expect_success '#16f: bare repo in worktree' '
     -+	test_when_finished "git config --global --unset safe.barerepository" &&
     -+	setup_repo 16f unset "" unset &&
     + ## t/t0034-discovery-bare.sh (new) ##
     +@@
     ++#!/bin/sh
      +
     -+	git init --bare 16f/default/bare &&
     -+	git init --bare 16f/default/bare/bare &&
     -+	try_case 16f/default/bare unset unset \
     -+		. "(null)" "$here/16f/default/bare" "(null)" &&
     -+	try_case 16f/default/bare/bare unset unset \
     -+		. "(null)" "$here/16f/default/bare/bare" "(null)" &&
     ++test_description='verify discovery.bare checks'
      +
     -+	git config --global safe.barerepository "*" &&
     -+	git init --bare 16f/all/bare &&
     -+	git init --bare 16f/all/bare/bare &&
     -+	try_case 16f/all/bare unset unset \
     -+		. "(null)" "$here/16f/all/bare" "(null)" &&
     -+	try_case 16f/all/bare/bare unset unset \
     -+		. "(null)" "$here/16f/all/bare/bare" "(null)" &&
     ++. ./test-lib.sh
      +
     -+	git config --global safe.barerepository "" &&
     -+	git init --bare 16f/never/bare &&
     -+	git init --bare 16f/never/bare/bare &&
     -+	try_case 16f/never/bare unset unset \
     -+		".git" "$here/16f" "$here/16f" "never/bare/" &&
     -+	try_case 16f/never/bare/bare unset unset \
     -+		".git" "$here/16f" "$here/16f" "never/bare/bare/"
     -+'
     ++pwd="$(pwd)"
      +
     -+test_expect_success '#16g: inside .git with safe.barerepository' '
     -+	test_when_finished "git config --global --unset safe.barerepository" &&
     ++expect_allowed () {
     ++	git rev-parse --absolute-git-dir >actual &&
     ++	echo "$pwd/outer-repo/bare-repo" >expected &&
     ++	test_cmp expected actual
     ++}
      +
     -+	# Omit the "default" case; it is covered by 16a.
     ++expect_rejected () {
     ++	test_must_fail git rev-parse --absolute-git-dir 2>err &&
     ++	grep "discovery.bare" err
     ++}
      +
     -+	git config --global safe.barerepository "*" &&
     -+	setup_repo 16g/all unset "" unset &&
     -+	mkdir -p 16g/all/.git/wt/sub &&
     -+	try_case 16g/all/.git unset unset \
     -+		. "(null)" "$here/16g/all/.git" "(null)" &&
     -+	try_case 16g/all/.git/wt unset unset \
     -+		"$here/16g/all/.git" "(null)" "$here/16g/all/.git/wt" "(null)" &&
     -+	try_case 16g/all/.git/wt/sub unset unset \
     -+		"$here/16g/all/.git" "(null)" "$here/16g/all/.git/wt/sub" "(null)" &&
     ++test_expect_success 'setup bare repo in worktree' '
     ++	git init outer-repo &&
     ++	git init --bare outer-repo/bare-repo
     ++'
     ++
     ++test_expect_success 'discovery.bare unset' '
     ++	(
     ++		cd outer-repo/bare-repo &&
     ++		expect_allowed &&
     ++		cd refs/ &&
     ++		expect_allowed
     ++	)
     ++'
     ++
     ++test_expect_success 'discovery.bare=always' '
     ++	git config --global discovery.bare always &&
     ++	(
     ++		cd outer-repo/bare-repo &&
     ++		expect_allowed &&
     ++		cd refs/ &&
     ++		expect_allowed
     ++	)
     ++'
      +
     -+	git config --global safe.barerepository "" &&
     -+	setup_repo 16g/never unset "" unset &&
     -+	mkdir -p 16g/never/.git/wt/sub &&
     -+	try_case 16g/never/.git unset unset \
     -+		".git" "$here/16g/never" "$here/16g/never" ".git/" &&
     -+	try_case 16g/never/.git/wt unset unset \
     -+		".git" "$here/16g/never" "$here/16g/never" ".git/wt/" &&
     -+	try_case 16g/never/.git/wt/sub unset unset \
     -+		".git" "$here/16g/never" "$here/16g/never" ".git/wt/sub/"
     ++test_expect_success 'discovery.bare=never' '
     ++	git config --global discovery.bare never &&
     ++	(
     ++		cd outer-repo/bare-repo &&
     ++		expect_rejected &&
     ++		cd refs/ &&
     ++		expect_rejected
     ++	) &&
     ++	(
     ++		GIT_DIR=outer-repo/bare-repo &&
     ++		export GIT_DIR &&
     ++		expect_allowed
     ++	)
      +'
      +
     - test_expect_success '#17: GIT_WORK_TREE without explicit GIT_DIR is accepted (bare case)' '
     - 	# Just like #16.
     - 	setup_repo 17a unset "" true &&
     ++test_done
 -:  ----------- > 2:  62070aab7eb setup.c: learn discovery.bareRepository=cwd

Comments

Junio C Hamano May 16, 2022, 4:40 p.m. UTC | #1
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  * die()-ing is necessary if we're trying to flip the default value of
>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>    more helpful to fail loudly than to silently ignore the bare repo.
>
>    But in the long term, long after we've flipped the default and users know
>    that they need to opt into bare repo discovery, would it be a better UX
>    to just silently ignore the bare repo?

Would a middle-ground of giving a warning() message help?  Can it be
loud and annoying enough to knudge the users to adjust without
breaking the functionality?

The longer-term default should be "cwd is allowed, but we do not
bother going up from object/04 subdirectory of a bare repository",
not "bare repositories should not be usable at all without GIT_DIR".

>      +    Add a config variable, `discovery.bare`, that tells Git whether or not
>      +    it should work with the bare repository it has discovered i.e. Git will
>      +    die() if it discovers a bare repository, but it is not allowed by

Missing comma before "i.e."

>      ++discovery.bare::
>      ++	Specifies what kinds of directories Git can recognize as a bare
>      ++	repository when looking for the repository (aka repository
>       +	discovery). This has no effect if repository discovery is not
>       +	performed e.g. the path to the repository is set via `--git-dir`
>       +	(see linkgit:git[1]).
>       ++
>       +This config setting is only respected when specified in a system or global
>       +config, not when it is specified in a repository config or via the command
>      ++line option `-c discovery.bare=<value>`.

;-)

>      +++
>      ++The currently supported values are `always` (Git always recognizes bare
>      ++repositories) and `never` (Git never recognizes bare repositories).
>      ++This defaults to `always`, but this default is likely to change.
>      +++
>      ++If your workflow does not rely on bare repositories, it is recommended that
>      ++you set this value to `never`. This makes repository discovery easier to
>      ++reason about and prevents certain types of security and non-security
>      ++problems, such as:

Hopefully "git fetch" over ssh:// and file:/// would run the other
side with GIT_DIR explicitly set?  As long as this recommendation
does not break these use cases, I think we are OK, but I do not yet
find these "problems, such as..." so convincing.
Junio C Hamano May 16, 2022, 4:43 p.m. UTC | #2
"Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:

>  t/t0034-discovery-bare.sh          | 69 +++++++++++++++++++++++

This number is already in use by an in-flight topic, if I am not
mistaken.  Please make it a habit to always check your topic works
well when merged to 'next' and to 'seen'.

Thanks.
Glen Choo May 16, 2022, 6:36 p.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
>>  * die()-ing is necessary if we're trying to flip the default value of
>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>    more helpful to fail loudly than to silently ignore the bare repo.
>>
>>    But in the long term, long after we've flipped the default and users know
>>    that they need to opt into bare repo discovery, would it be a better UX
>>    to just silently ignore the bare repo?
>
> Would a middle-ground of giving a warning() message help?  Can it be
> loud and annoying enough to knudge the users to adjust without
> breaking the functionality?

Personally, when my tool changes its behavior, I would strongly prefer
it to die than to "change behavior + warn". I'd feel more comfortable
knowing that the tool did nothing as opposed to doing the wrong thing
and only being informed after the fact. Also, I sometimes ignore
warnings ;)

When we _do_ transition away from die(), ignore + warning() sounds like
a good first step.

But if any of this flies in the face of the project's conventions, let
me know as such.

>>      +    Add a config variable, `discovery.bare`, that tells Git whether or not
>>      +    it should work with the bare repository it has discovered i.e. Git will
>>      +    die() if it discovers a bare repository, but it is not allowed by
>
> Missing comma before "i.e."

Thanks.

>>      +++
>>      ++The currently supported values are `always` (Git always recognizes bare
>>      ++repositories) and `never` (Git never recognizes bare repositories).
>>      ++This defaults to `always`, but this default is likely to change.
>>      +++
>>      ++If your workflow does not rely on bare repositories, it is recommended that
>>      ++you set this value to `never`. This makes repository discovery easier to
>>      ++reason about and prevents certain types of security and non-security
>>      ++problems, such as:
>
> Hopefully "git fetch" over ssh:// and file:/// would run the other
> side with GIT_DIR explicitly set?

Ah, I'll check this and get back to you.

>                                                        I do not yet
> find these "problems, such as..." so convincing.

What would be a convincing rationale to you? I'll capture that here.

I'm assuming that you already have such an rationale in mind when you
say that the longer-term default is that "we respect bare repositories
only if they are the cwd.". I'm also assuming that this rationale is
something other than embedded bare repos, because "cwd-only" does not
protect against that.

Perhaps "never" sounds better to folks who don't ever expect bare
repositories and want to lock down the environment. Randall (cc-ed)
suggests one such use case in [1].

(To Randall: Oops, I actually meant to cc you earlier, since you were
the first to suggest a practical use case for never allowing bare repos.
It must've slipped my mind).

[1] https://lore.kernel.org/git/005d01d84ad0$782e8fc0$688baf40$@nexbridge.com.
Derrick Stolee May 16, 2022, 7:07 p.m. UTC | #4
On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> Thanks all for the comments on v1, I've expanded this series somewhat to
> address them,...

Please include a full cover letter with each version, so reviewers
can respond to the full series goals.

Your series here intends to start protecting against malicious
embedded bare repositories by allowing users to opt-in to a more
protected state. When the 'discovery.bare' option is set, then
Git may die() on a bare repository that is discovered based on
the current working directory (these protections are ignored if
the user specifies the directory directly through --git-dir or
$GIT_DIR).

The 'discovery.bare' option has these values at the end of your
series:

* 'always' (default) allows all bare repos, matching the current
  behavior of Git.

* 'never' avoids operating in bare repositories altogether.

* 'cwd' operates in a bare repository only if the current directory
  is exactly the root of the bare repository.

It is important that we keep 'always' as the default at first,
because we do not want to introduce a breaking change without
warning (at least for an issue like this that has been around
for a long time).

The 'never' option is a good one for very security-conscious
users who really want to avoid problems. I don't anticipate that
users who know about this option and set it themselves are the
type that would fall for the social engineering required to
attack using this vector, but I can imagine an IT department
installing the value in system config across a fleet of machines.

I find the 'cwd' option to not be valuable. It unblocks most
existing users, but also almost completely removes the protection
that the option was supposed to provide.

I find neither the 'never' or 'cwd' options an acceptable choice
for a future default.

I also think that this protection is too rigid: it restricts
_all_ bare repositories, not just embedded ones. There is no check
to see if the parent directory of the bare repository is inside a
non-bare repository.

This leads to what I think would be a valuable replacement for
the 'cwd' option:

* 'no-embedded' allows non-embedded bare repositories. An
  _embedded bare repository_ is a bare repository whose parent
  directory is contained in the worktree of a non-bare Git
  repository. When in this mode, embedded bare repositories are
  not allowed unless the parent non-bare Git repository has a
  'safe.embedded' config value storing the path to the current
  embedded bare repository.

That was certainly difficult to write, but here it is as
pseudo-code to hopefully remove some doubt as to how this might
work:

  if repo is bare:
    if value == "always":
       return ALLOWED
    if value == "never":
       return FORBIDDEN;

    path = get_parent_repo()

    if !path:
       return ALLOWED
    
    if config_file_has_value("{path}/.git/config", "safe.embedded", repo):
       return ALLOWED

    return FORBIDDEN

With this kind of option, we can protect users from these
social engineering attacks while providing an opt-in protection
for scenarios where embedded bare repos are currently being used
(while also not breaking anyone using non-embedded bare repos).

I think Taylor was mentioning something like this in his previous
replies, perhaps even to the previous thread on this topic.

This 'no-embedded' option is something that I could see as a
potential new default, after it has proven itself in a released
version of Git.

There are performance drawbacks to checking the parent path for
a Git repo, which is why it is only done when in "no-embedded"
mode.

I mentioned some other concerns in your PATCH 1 about how we
are now adding the third use of read_very_early_config() and that
we should probably refactor that before adding the third option,
in order to avoid additional performance costs as well as it
being difficult to audit which config options are only checked
from these "protected" config files.

Thanks,
-Stolee
Junio C Hamano May 16, 2022, 7:16 p.m. UTC | #5
Glen Choo <chooglen@google.com> writes:

> Junio C Hamano <gitster@pobox.com> writes:
>
>> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>
>>>  * die()-ing is necessary if we're trying to flip the default value of
>>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>>    more helpful to fail loudly than to silently ignore the bare repo.
>>>
>>>    But in the long term, long after we've flipped the default and users know
>>>    that they need to opt into bare repo discovery, would it be a better UX
>>>    to just silently ignore the bare repo?
>>
>> Would a middle-ground of giving a warning() message help?  Can it be
>> loud and annoying enough to knudge the users to adjust without
>> breaking the functionality?
>
> Personally, when my tool changes its behavior, I would strongly prefer
> it to die than to "change behavior + warn". I'd feel more comfortable
> knowing that the tool did nothing as opposed to doing the wrong thing
> and only being informed after the fact. Also, I sometimes ignore
> warnings ;)

Heh, personally I would try very hard not to change the behaviour
without explicitly asked by the users with configuration or command
line option.  Flipping the default has traditionally been done in
two or three phases.

 (1) We start by giving a loud and annoying warning to those who
     haven't configured and tell them the default *will* change, how
     to keep the current behaviour forever, and how to live in the
     future by adopting the future default early.

 (2) After a while, we flip the default.  Those who haven't
     configured are given a notice that the default has changed, how
     to keep the old behaviour forever, and how to explicitly choose
     the same value as the default to squelch the notice.

 (3) After yet another while, we stop giving the notice.  If we
     omitted (2), here is where we flip the default.

Strictly speaking, we can have (1) in one release and then could
directly jump to (3), but some distros may skip the releases that
has (1), and (2) is an attempt to help users of such distros.

>> Hopefully "git fetch" over ssh:// and file:/// would run the other
>> side with GIT_DIR explicitly set?
>
> Ah, I'll check this and get back to you.
>
>>                                                        I do not yet
>> find these "problems, such as..." so convincing.
>
> What would be a convincing rationale to you? I'll capture that here.

That is a wrong question.  You are the one pushing for castrating
the bare repositories.

> I'm assuming that you already have such an rationale in mind when you
> say that the longer-term default is that "we respect bare repositories
> only if they are the cwd.". I'm also assuming that this rationale is
> something other than embedded bare repos, because "cwd-only" does not
> protect against that.

No, I do not have such a "different" rationale to justify the change
proposed in this patch.  I was saying that the claim "embedded bare
repos are risky", backed by your two examples, did not sound all
that serious a problem.  Presented with a more serious brekage
scenario, it may make the description more convincing.

Thanks.
Glen Choo May 16, 2022, 8:27 p.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

> Glen Choo <chooglen@google.com> writes:
>
>> Junio C Hamano <gitster@pobox.com> writes:
>>
>>> "Glen Choo via GitGitGadget" <gitgitgadget@gmail.com> writes:
>>>
>>>>  * die()-ing is necessary if we're trying to flip the default value of
>>>>    discovery.bare. We'd expect many bare repo users to be broken, and it's
>>>>    more helpful to fail loudly than to silently ignore the bare repo.
>>>>
>>>>    But in the long term, long after we've flipped the default and users know
>>>>    that they need to opt into bare repo discovery, would it be a better UX
>>>>    to just silently ignore the bare repo?
>>>
>>> Would a middle-ground of giving a warning() message help?  Can it be
>>> loud and annoying enough to knudge the users to adjust without
>>> breaking the functionality?
>>
>> Personally, when my tool changes its behavior, I would strongly prefer
>> it to die than to "change behavior + warn". I'd feel more comfortable
>> knowing that the tool did nothing as opposed to doing the wrong thing
>> and only being informed after the fact. Also, I sometimes ignore
>> warnings ;)
>
> Heh, personally I would try very hard not to change the behaviour
> without explicitly asked by the users with configuration or command
> line option.  Flipping the default has traditionally been done in
> two or three phases.
>
>  (1) We start by giving a loud and annoying warning to those who
>      haven't configured and tell them the default *will* change, how
>      to keep the current behaviour forever, and how to live in the
>      future by adopting the future default early.
>
>  (2) After a while, we flip the default.  Those who haven't
>      configured are given a notice that the default has changed, how
>      to keep the old behaviour forever, and how to explicitly choose
>      the same value as the default to squelch the notice.
>
>  (3) After yet another while, we stop giving the notice.  If we
>      omitted (2), here is where we flip the default.
>
> Strictly speaking, we can have (1) in one release and then could
> directly jump to (3), but some distros may skip the releases that
> has (1), and (2) is an attempt to help users of such distros.

Ah, that is very helpful. Thanks. It's pretty clear that I misunderstood
what you meant by "giving a warning() message" - the warning() is there
to prepare users in advance of the change; we don't actually want the
warning() in the long term.

For something as disruptive as discovering bare repos, having all of
(1), (2) and (3) sounds appropriate.

>>> Hopefully "git fetch" over ssh:// and file:/// would run the other
>>> side with GIT_DIR explicitly set?
>>
>> Ah, I'll check this and get back to you.
>>
>>>                                                        I do not yet
>>> find these "problems, such as..." so convincing.
>>
>> What would be a convincing rationale to you? I'll capture that here.
>
> That is a wrong question.  You are the one pushing for castrating
> the bare repositories.

Let me clarify in case this wasn't received the way I intended. Earlier
in the thread, you mentioned:

  The longer-term default should be "cwd is allowed, but we do not
  bother going up from object/04 subdirectory of a bare repository",
  [...]

which I took to mean "Junio thinks that, by default, Git should stop
walking up to find a bare repo, and thinks this is better because of
rationale X.", and not, "Junio does not think that the default needs to
change, but is just suggesting a better default than Glen's".

If it is the former, then there is obviously some thought process here
that is worth sharing.

If it the latter, then I'm in favor of taking Stolee's suggestion to
drop "cwd", since nobody else finds it useful enough. (I like the
'simplification' story, but not enough to push "cwd" through, especially
since it does quite little security-wise.)

>> I'm assuming that you already have such an rationale in mind when you
>> say that the longer-term default is that "we respect bare repositories
>> only if they are the cwd.". I'm also assuming that this rationale is
>> something other than embedded bare repos, because "cwd-only" does not
>> protect against that.
>
> No, I do not have such a "different" rationale to justify the change
> proposed in this patch.  I was saying that the claim "embedded bare
> repos are risky", backed by your two examples, did not sound all
> that serious a problem.  Presented with a more serious brekage
> scenario, it may make the description more convincing.

Fair. I'll mull over this.
Junio C Hamano May 16, 2022, 10:16 p.m. UTC | #7
Glen Choo <chooglen@google.com> writes:

> which I took to mean "Junio thinks that, by default, Git should stop
> walking up to find a bare repo, and thinks this is better because of
> rationale X."

The X is "it would not break existing use case too badly, just to
address a 'security' story whose severity is not so clearly
expressed".

> If it the latter, then I'm in favor of taking Stolee's suggestion to
> drop "cwd", since nobody else finds it useful enough. (I like the
> 'simplification' story, but not enough to push "cwd" through, especially
> since it does quite little security-wise.)

As long as you'll be there to answer the angry mob that complain
loudly (and irritatingly enough, the only do so after a release is
made to flip the default), I do not care too much either way ;-).

Thanks.
Taylor Blau May 16, 2022, 10:43 p.m. UTC | #8
On Mon, May 16, 2022 at 03:07:35PM -0400, Derrick Stolee wrote:
> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
> > Thanks all for the comments on v1, I've expanded this series somewhat to
> > address them,...
>
> Please include a full cover letter with each version, so reviewers
> can respond to the full series goals.
>
> Your series here intends to start protecting against malicious
> embedded bare repositories by allowing users to opt-in to a more
> protected state. [...]

Thanks for the summary, which I think will be especially helpful to
others looking at this series for the very first time.

> The 'never' option is a good one for very security-conscious
> users who really want to avoid problems. I don't anticipate that
> users who know about this option and set it themselves are the
> type that would fall for the social engineering required to
> attack using this vector, but I can imagine an IT department
> installing the value in system config across a fleet of machines.

When I first read this, I disagreed, since presumably that same crowd
has legitimate bare repositories that they want to continue being able
to operate in without having to pass `--git-dir` or `$GIT_DIR` in.

In fact...

> I also think that this protection is too rigid: it restricts
> _all_ bare repositories, not just embedded ones. There is no check
> to see if the parent directory of the bare repository is inside a
> non-bare repository.

...this resonates quite a bit more with me. "never" isn't a good option
unless you aren't a user of bare repositories _and_ don't have any
embedded bare repositories (either at all, or any ones that you trust).

> This leads to what I think would be a valuable replacement for
> the 'cwd' option:
>
> * 'no-embedded' allows non-embedded bare repositories. An
>   _embedded bare repository_ is a bare repository whose parent
>   directory is contained in the worktree of a non-bare Git
>   repository. When in this mode, embedded bare repositories are
>   not allowed unless the parent non-bare Git repository has a
>   'safe.embedded' config value storing the path to the current
>   embedded bare repository.
>
> That was certainly difficult to write, but here it is as
> pseudo-code to hopefully remove some doubt as to how this might
> work:
>
>   if repo is bare:
>     if value == "always":
>        return ALLOWED
>     if value == "never":
>        return FORBIDDEN;

This is indeed very similar to a proposal I had made upthread (which you
note lower down in this email). One thing that's nice is that we only
have to traverse up to the parent repo when in the "no-embedded" mode.
That may be slow (since it's unbounded all the way up to the filesystem
root or a ceiling directory, whichever we encounter first), but I think
it's unavoidable if you need to distinguish between embedded and
non-embedded bare repositories.

>     path = get_parent_repo()
>
>     if !path:
>        return ALLOWED
>
>     if config_file_has_value("{path}/.git/config", "safe.embedded", repo):
>        return ALLOWED
>
>     return FORBIDDEN
>
> With this kind of option, we can protect users from these
> social engineering attacks while providing an opt-in protection
> for scenarios where embedded bare repos are currently being used
> (while also not breaking anyone using non-embedded bare repos).
>
> I think Taylor was mentioning something like this in his previous
> replies, perhaps even to the previous thread on this topic.

Yep, see: https://lore.kernel.org/git/Ylobp7sntKeWTLDX@nand.local/.a

> This 'no-embedded' option is something that I could see as a
> potential new default, after it has proven itself in a released
> version of Git.

I would be totally happy to see "no-embedded" become the default. It
might be nice to issue a warning when the top-level config is unset,
to give users a heads up about cases that may be broken, perhaps like:

    if repo is bare:
      switch (value) {
      case "always":
        return ALLOWED;
      case "never":
        return FORBIDDEN;
      case "no-embedded": # fallthrough
      case "":
        path = get_parent_repo()
        if !path
          return ALLOWED;

        if config_file_has_value("{path}/.git/config", "safe.embedded", repo)
          return ALLOWED;

        if value == "no-embedded":
          return FORBIDDEN;

        # otherwise, we're in an embedded bare repository with an unset
        # discovery.bare config.
        #
        # warn that this will break in the future...
        warning(_("%s is embedded within %s"), the_repository.path, path);
        advise(_("to allow discovery for this embedded repo, either run"));
        advise(_(""));
        advise(_("  $ git config --global discovery.bare always, or"));
        advise(_("  $ git -C '%s' config --local safe.embedded '%s'"),
               path, relpath(path, the_repository.path));

        # ...but allow the invocation for now until the default is
        # changed.
        return ALLOWED;
      default:
        die(_("unrecognized value of discovery.bare: '%s'"), value);
      }

...where relpath is similar to Go's path/filepath.Rel function.

With an appropriate deprecation period, I think we could even get away
from the "continue executing, but don't read config+hooks", which in
retrospect is more error-prone and difficult to reason about than I
initially had given it credit for.

Thanks,
Taylor
Junio C Hamano May 16, 2022, 11:19 p.m. UTC | #9
Derrick Stolee <derrickstolee@github.com> writes:

> * 'no-embedded' allows non-embedded bare repositories. An
>   _embedded bare repository_ is a bare repository whose parent
>   directory is contained in the worktree of a non-bare Git
>   repository. When in this mode, embedded bare repositories are
>   not allowed unless the parent non-bare Git repository has a
>   'safe.embedded' config value storing the path to the current
>   embedded bare repository.

Sounds sensible.  I wonder how expensive this will be in practice,
but the behaviour seems well thought out.
Glen Choo May 17, 2022, 6:56 p.m. UTC | #10
Thanks, I think this has advanced the conversation quite a bit.

Derrick Stolee <derrickstolee@github.com> writes:

> On 5/13/2022 7:37 PM, Glen Choo via GitGitGadget wrote:
>> Thanks all for the comments on v1, I've expanded this series somewhat to
>> address them,...
>
> Please include a full cover letter with each version, so reviewers
> can respond to the full series goals.
>
> Your series here intends to start protecting against malicious
> embedded bare repositories by allowing users to opt-in to a more
> protected state. When the 'discovery.bare' option is set, then
> Git may die() on a bare repository that is discovered based on
> the current working directory (these protections are ignored if
> the user specifies the directory directly through --git-dir or
> $GIT_DIR).
>
> The 'discovery.bare' option has these values at the end of your
> series:
>
> * 'always' (default) allows all bare repos, matching the current
>   behavior of Git.
>
> * 'never' avoids operating in bare repositories altogether.
>
> * 'cwd' operates in a bare repository only if the current directory
>   is exactly the root of the bare repository.

My mistake, I should have prepared this summary myself. Thanks again.

> It is important that we keep 'always' as the default at first,
> because we do not want to introduce a breaking change without
> warning (at least for an issue like this that has been around
> for a long time).

Yes.

> The 'never' option is a good one for very security-conscious
> users who really want to avoid problems. I don't anticipate that
> users who know about this option and set it themselves are the
> type that would fall for the social engineering required to
> attack using this vector, but I can imagine an IT department
> installing the value in system config across a fleet of machines.

Yes. Setting the 'never' option in a system config is the use case that
motivated this.

> I find the 'cwd' option to not be valuable. It unblocks most
> existing users, but also almost completely removes the protection
> that the option was supposed to provide.

Ok, I agree that it provides next-to-no protection. I'll drop it in this
series; it's easy enough to reimplement if users really want it anyway.

> This leads to what I think would be a valuable replacement for
> the 'cwd' option:
>
> * 'no-embedded' allows non-embedded bare repositories. An
>   _embedded bare repository_ is a bare repository whose parent
>   directory is contained in the worktree of a non-bare Git
>   repository. When in this mode, embedded bare repositories are
>   not allowed unless the parent non-bare Git repository has a
>   'safe.embedded' config value storing the path to the current
>   embedded bare repository.
>
> That was certainly difficult to write, but here it is as
> pseudo-code to hopefully remove some doubt as to how this might
> work:
>
>   if repo is bare:
>     if value == "always":
>        return ALLOWED
>     if value == "never":
>        return FORBIDDEN;
>
>     path = get_parent_repo()
>
>     if !path:
>        return ALLOWED
>     
>     if config_file_has_value("{path}/.git/config", "safe.embedded", repo):
>        return ALLOWED
>
>     return FORBIDDEN
>
> With this kind of option, we can protect users from these
> social engineering attacks while providing an opt-in protection
> for scenarios where embedded bare repos are currently being used
> (while also not breaking anyone using non-embedded bare repos).

[...]

> This 'no-embedded' option is something that I could see as a
> potential new default, after it has proven itself in a released
> version of Git.

I agree, this sounds like a good default that should work for most
users.

That said, I don't think I will implement it, and even if I do, it won't
be in this series. I have serious doubts that I'd be able to deliver it
in a reasonable amount of time (I tried preparing patches to this effect
and failed [1]), and 'never' is sufficient for $DAYJOB's current needs.

I would be very happy to see this come to fruition though. I have no
objections to anyone preparing patches for this, and I'll gladly review
those if that's helpful.

[1] The specific trouble I had was figuring out whether or not the
 'parent' repo was tracking the bare repo, since an untracked bare repo
 in the working tree isn't (in some sense) really "embedded" and it
 can't have come from a remote.

 But maybe the tracking check is unnecessary. We would break a few more
 users without it, but 'safe.embedded' is an easy enough way for a user
 to unbreak themselves.

> I mentioned some other concerns in your PATCH 1 about how we
> are now adding the third use of read_very_early_config() and that
> we should probably refactor that before adding the third option,
> in order to avoid additional performance costs as well as it
> being difficult to audit which config options are only checked
> from these "protected" config files.

Makes sense. I'll ask about specifics on that subthread.

>
> Thanks,
> -Stolee