diff mbox series

[2/2] config: fix evaluating "onbranch" with nonexistent git dir

Message ID 535d0d07506e8248e47f90c1a7581679fc297b3d.1727171197.git.ps@pks.im (mailing list archive)
State New
Headers show
Series config: fix evaluating "onbranch" with nonexistent git dir | expand

Commit Message

Patrick Steinhardt Sept. 24, 2024, 10:05 a.m. UTC
The `include_by_branch()` function is responsible for evaluating whether
or not a specific include should be pulled in based on the currently
checked out branch. Naturally, his condition can only be evaluated when
we have a properly initialized repository with a ref store in the first
place. This is why the function guards against the case when either
`data->repo` or `data->repo->gitdir` are `NULL` pointers.

But the second check is insufficient: the `gitdir` may be set even
though the repository has not been initialized. Quoting "setup.c":

  NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some
  code paths so we also need to explicitly setup the environment if the
  user has set GIT_DIR.  It may be beneficial to disallow bogus GIT_DIR
  values at some point in the future.

So when either the GIT_DIR environment variable or the `--git-dir`
global option are set by the user then `the_repository` may end up with
an initialized `gitdir` variable. And this happens even when the dir is
invalid, like for example when it doesn't exist. It follows that only
checking for whether or not `gitdir` is `NULL` is not sufficient for us
to determine whether the repository has been properly initialized.

This issue can lead to us triggering a BUG: when using a config with an
"includeIf.onbranch:" condition outside of a repository while using the
`--git-dir` option pointing to an invalid Git directory we may end up
trying to evaluate the condition even though the ref storage format has
not been set up.

This bisects to 173761e21b (setup: start tracking ref storage format,
2023-12-29), but that commit really only starts to surface the issue
that has already existed beforehand. The code to check for `gitdir` was
introduced via 85fe0e800c (config: work around bug with
includeif:onbranch and early config, 2019-07-31), which tried to fix
similar issues when we didn't yet have a repository set up. But the fix
was incomplete as it missed the described scenario.

As the quoted comment mentions, we'd ideally refactor the code to not
set up `gitdir` with an invalid value in the first place, but that may
be a bigger undertaking. Instead, refactor the code to use the ref
storage format as an indicator of whether or not the ref store has been
set up to fix the bug.

Reported-by: Ronan Pigott <ronan@rjp.ie>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 config.c                  | 15 +++++++++------
 t/t1305-config-include.sh |  2 +-
 2 files changed, 10 insertions(+), 7 deletions(-)

Comments

shejialuo Sept. 24, 2024, 2:11 p.m. UTC | #1
On Tue, Sep 24, 2024 at 12:05:46PM +0200, Patrick Steinhardt wrote:
> The `include_by_branch()` function is responsible for evaluating whether
> or not a specific include should be pulled in based on the currently
> checked out branch. Naturally, his condition can only be evaluated when
> we have a properly initialized repository with a ref store in the first
> place. This is why the function guards against the case when either
> `data->repo` or `data->repo->gitdir` are `NULL` pointers.
> 
> But the second check is insufficient: the `gitdir` may be set even
> though the repository has not been initialized. Quoting "setup.c":
> 
>   NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some
>   code paths so we also need to explicitly setup the environment if the
>   user has set GIT_DIR.  It may be beneficial to disallow bogus GIT_DIR
>   values at some point in the future.
> 
> So when either the GIT_DIR environment variable or the `--git-dir`
> global option are set by the user then `the_repository` may end up with
> an initialized `gitdir` variable. And this happens even when the dir is
> invalid, like for example when it doesn't exist. It follows that only
> checking for whether or not `gitdir` is `NULL` is not sufficient for us
> to determine whether the repository has been properly initialized.
> 

When I dive into this bug report, I feel so wired about this behavior. I
don't mind whether the code sets "gitdir" field. This is not important.
In "setup.c::setup_git_directory_gently", it will set the
"the_repository->gitdir" by checking the environment variable
"GIT_DIR_ENVIRONMENT". And by using `--git-dir` option, the code will
set this environment variable, so these two ways will set the "gitdir"
field in the global variable "the_repository".

We actually check the validation of "--gir-dir" option before we set
this value. Let me give you an example here:

  $ git --git-dir=notexist -c includeIf.onbranch:main.path=any fsck
  fatal: not a git repository: 'notexist'

I am curious here. And I notice the following difference in
"setup.c::setup_explicit_git_dir" function:

    if (!is_git_directory(gitdirenv)) {
        if (nongit_ok) {
            *nongit_ok = 1;
            ...
            return NULL;
        }
        die(_("not a git repository: '%s'"), gitdirenv);
    }

Apparently, the above example will execute the "die". For
"git-archive(1)", it will simply return NULL. This is because we allow
some commands to run outside of the git repo. And we distinguish them by
using the ".option" filed:

    { "apply", cmd_apply, RUN_SETUP_GENTLY },
    { "fsck", cmd_fsck, RUN_SETUP },

As we can see, the code will check whether the "gitdir" (although it is
not set into "the_repository" structure yet) is a valid git repository.
We already have this information.

In f7d61c4135 (config: don't depend on `the_repository` with branch
conditions, 2024-08-13). "config.c::include_by_branch" drops the global
variable "the_repository". This solves the problem reported by Ronan.
Because it happens in the set up process, the "data->repo" will be NULL.

    config_with_options(cb, data, NULL, NULL, &opts);

    int config_with_options(config_fn_t fn, void *data,
                            const struct git_config_source *config_source,
                            struct repository *repo,
                            const struct config_options *opts)
    {
        struct config_include_data inc = CONFIG_INCLUDE_INIT;

        ...

        inc.repo = repo;
    }

But the problem still exists for "git-archive(1)", in "cmd_archive"
function, we will initialize the configurations by using "repo_config".
But we are not inside the repo.

I wonder how we use the global variable "the_repository". I think the
main problem here is that we use "the_repository" structure outside of
the repo where we have already broken the semantics of the
"the_repository" variable.

> This issue can lead to us triggering a BUG: when using a config with an
> "includeIf.onbranch:" condition outside of a repository while using the
> `--git-dir` option pointing to an invalid Git directory we may end up
> trying to evaluate the condition even though the ref storage format has
> not been set up.
> 
> This bisects to 173761e21b (setup: start tracking ref storage format,
> 2023-12-29), but that commit really only starts to surface the issue
> that has already existed beforehand. The code to check for `gitdir` was
> introduced via 85fe0e800c (config: work around bug with
> includeif:onbranch and early config, 2019-07-31), which tried to fix
> similar issues when we didn't yet have a repository set up. But the fix
> was incomplete as it missed the described scenario.
> 

Yes, exactly. Because before 173761e21b, the code will always find the
files backend, it does care about whether we are in the git repo or not.

    unsigned int format = REF_STORAGE_FORMAT_FILES;
    const struct ref_storage_be *be = find_ref_storage_backend(format);

So, it won't complain.

> As the quoted comment mentions, we'd ideally refactor the code to not
> set up `gitdir` with an invalid value in the first place, but that may
> be a bigger undertaking. Instead, refactor the code to use the ref
> storage format as an indicator of whether or not the ref store has been
> set up to fix the bug.
> 

Should we? From my above comments, it does no matter whether we set the
"gitdir" in the "the_repository". Because we already check this and we
have this information. I think the main problem is that we use
"the_repository" badly.

We could run git commands inside the repo or outside the repo. If we run
git commands outside the repo, should we use the "the_repository"
variable? I guess we should not.

Because "git_config", "repo_config" and so on use the global variable
"the_repository", so we will encounter the trouble. But if we could use
something like "data->repo". We will make everything OK:

1. When running commands inside the git repo, we set "data->repo" to be
"the_repository".
2. When ruing commands outside the git repo, we set "data->repo" to be
NULL.

> Reported-by: Ronan Pigott <ronan@rjp.ie>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  config.c                  | 15 +++++++++------
>  t/t1305-config-include.sh |  2 +-
>  2 files changed, 10 insertions(+), 7 deletions(-)
> 
> diff --git a/config.c b/config.c
> index 1266eab0860..a11bb85da30 100644
> --- a/config.c
> +++ b/config.c
> @@ -306,13 +306,16 @@ static int include_by_branch(struct config_include_data *data,
>  	int flags;
>  	int ret;
>  	struct strbuf pattern = STRBUF_INIT;
> -	const char *refname = (!data->repo || !data->repo->gitdir) ?
> -		NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> -					       "HEAD", 0, NULL, &flags);
> -	const char *shortname;
> +	const char *refname, *shortname;
>  
> -	if (!refname || !(flags & REF_ISSYMREF)	||
> -			!skip_prefix(refname, "refs/heads/", &shortname))
> +	if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> +		return 0;
> +
> +	refname = refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> +					  "HEAD", 0, NULL, &flags);
> +	if (!refname ||
> +	    !(flags & REF_ISSYMREF) ||
> +	    !skip_prefix(refname, "refs/heads/", &shortname))
>  		return 0;
>  
>  	strbuf_add(&pattern, cond, cond_len);
> diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
> index ad08db72308..517d6c86937 100755
> --- a/t/t1305-config-include.sh
> +++ b/t/t1305-config-include.sh
> @@ -389,7 +389,7 @@ test_expect_success 'onbranch without repository' '
>  	test_must_fail nongit git config get foo.bar
>  '
>  
> -test_expect_failure 'onbranch without repository but explicit nonexistent Git directory' '
> +test_expect_success 'onbranch without repository but explicit nonexistent Git directory' '
>  	test_when_finished "rm -f .gitconfig config.inc" &&
>  	git config set -f .gitconfig "includeIf.onbranch:**.path" config.inc &&
>  	git config set -f config.inc foo.bar baz &&
> -- 
> 2.46.0.551.gc5ee8f2d1c.dirty
Junio C Hamano Sept. 24, 2024, 4:17 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> As the quoted comment mentions, we'd ideally refactor the code to not
> set up `gitdir` with an invalid value in the first place, but that may
> be a bigger undertaking. Instead, refactor the code to use the ref
> storage format as an indicator of whether or not the ref store has been
> set up to fix the bug.
>
> Reported-by: Ronan Pigott <ronan@rjp.ie>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> ...
> -	const char *refname = (!data->repo || !data->repo->gitdir) ?
> -		NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
> -					       "HEAD", 0, NULL, &flags);
> -	const char *shortname;
> +	const char *refname, *shortname;
>  
> -	if (!refname || !(flags & REF_ISSYMREF)	||
> -			!skip_prefix(refname, "refs/heads/", &shortname))
> +	if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
> +		return 0;

OK.  A very direct check to see what we really care about.
diff mbox series

Patch

diff --git a/config.c b/config.c
index 1266eab0860..a11bb85da30 100644
--- a/config.c
+++ b/config.c
@@ -306,13 +306,16 @@  static int include_by_branch(struct config_include_data *data,
 	int flags;
 	int ret;
 	struct strbuf pattern = STRBUF_INIT;
-	const char *refname = (!data->repo || !data->repo->gitdir) ?
-		NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
-					       "HEAD", 0, NULL, &flags);
-	const char *shortname;
+	const char *refname, *shortname;
 
-	if (!refname || !(flags & REF_ISSYMREF)	||
-			!skip_prefix(refname, "refs/heads/", &shortname))
+	if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN)
+		return 0;
+
+	refname = refs_resolve_ref_unsafe(get_main_ref_store(data->repo),
+					  "HEAD", 0, NULL, &flags);
+	if (!refname ||
+	    !(flags & REF_ISSYMREF) ||
+	    !skip_prefix(refname, "refs/heads/", &shortname))
 		return 0;
 
 	strbuf_add(&pattern, cond, cond_len);
diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
index ad08db72308..517d6c86937 100755
--- a/t/t1305-config-include.sh
+++ b/t/t1305-config-include.sh
@@ -389,7 +389,7 @@  test_expect_success 'onbranch without repository' '
 	test_must_fail nongit git config get foo.bar
 '
 
-test_expect_failure 'onbranch without repository but explicit nonexistent Git directory' '
+test_expect_success 'onbranch without repository but explicit nonexistent Git directory' '
 	test_when_finished "rm -f .gitconfig config.inc" &&
 	git config set -f .gitconfig "includeIf.onbranch:**.path" config.inc &&
 	git config set -f config.inc foo.bar baz &&