Message ID | 535d0d07506e8248e47f90c1a7581679fc297b3d.1727171197.git.ps@pks.im (mailing list archive) |
---|---|
State | Accepted |
Commit | 320c96b0cb16c50b2270ec46557268e041ff292c |
Headers | show |
Series | config: fix evaluating "onbranch" with nonexistent git dir | expand |
On Tue, Sep 24, 2024 at 12:05:46PM +0200, Patrick Steinhardt wrote: > The `include_by_branch()` function is responsible for evaluating whether > or not a specific include should be pulled in based on the currently > checked out branch. Naturally, his condition can only be evaluated when > we have a properly initialized repository with a ref store in the first > place. This is why the function guards against the case when either > `data->repo` or `data->repo->gitdir` are `NULL` pointers. > > But the second check is insufficient: the `gitdir` may be set even > though the repository has not been initialized. Quoting "setup.c": > > NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some > code paths so we also need to explicitly setup the environment if the > user has set GIT_DIR. It may be beneficial to disallow bogus GIT_DIR > values at some point in the future. > > So when either the GIT_DIR environment variable or the `--git-dir` > global option are set by the user then `the_repository` may end up with > an initialized `gitdir` variable. And this happens even when the dir is > invalid, like for example when it doesn't exist. It follows that only > checking for whether or not `gitdir` is `NULL` is not sufficient for us > to determine whether the repository has been properly initialized. > When I dive into this bug report, I feel so wired about this behavior. I don't mind whether the code sets "gitdir" field. This is not important. In "setup.c::setup_git_directory_gently", it will set the "the_repository->gitdir" by checking the environment variable "GIT_DIR_ENVIRONMENT". And by using `--git-dir` option, the code will set this environment variable, so these two ways will set the "gitdir" field in the global variable "the_repository". We actually check the validation of "--gir-dir" option before we set this value. Let me give you an example here: $ git --git-dir=notexist -c includeIf.onbranch:main.path=any fsck fatal: not a git repository: 'notexist' I am curious here. And I notice the following difference in "setup.c::setup_explicit_git_dir" function: if (!is_git_directory(gitdirenv)) { if (nongit_ok) { *nongit_ok = 1; ... return NULL; } die(_("not a git repository: '%s'"), gitdirenv); } Apparently, the above example will execute the "die". For "git-archive(1)", it will simply return NULL. This is because we allow some commands to run outside of the git repo. And we distinguish them by using the ".option" filed: { "apply", cmd_apply, RUN_SETUP_GENTLY }, { "fsck", cmd_fsck, RUN_SETUP }, As we can see, the code will check whether the "gitdir" (although it is not set into "the_repository" structure yet) is a valid git repository. We already have this information. In f7d61c4135 (config: don't depend on `the_repository` with branch conditions, 2024-08-13). "config.c::include_by_branch" drops the global variable "the_repository". This solves the problem reported by Ronan. Because it happens in the set up process, the "data->repo" will be NULL. config_with_options(cb, data, NULL, NULL, &opts); int config_with_options(config_fn_t fn, void *data, const struct git_config_source *config_source, struct repository *repo, const struct config_options *opts) { struct config_include_data inc = CONFIG_INCLUDE_INIT; ... inc.repo = repo; } But the problem still exists for "git-archive(1)", in "cmd_archive" function, we will initialize the configurations by using "repo_config". But we are not inside the repo. I wonder how we use the global variable "the_repository". I think the main problem here is that we use "the_repository" structure outside of the repo where we have already broken the semantics of the "the_repository" variable. > This issue can lead to us triggering a BUG: when using a config with an > "includeIf.onbranch:" condition outside of a repository while using the > `--git-dir` option pointing to an invalid Git directory we may end up > trying to evaluate the condition even though the ref storage format has > not been set up. > > This bisects to 173761e21b (setup: start tracking ref storage format, > 2023-12-29), but that commit really only starts to surface the issue > that has already existed beforehand. The code to check for `gitdir` was > introduced via 85fe0e800c (config: work around bug with > includeif:onbranch and early config, 2019-07-31), which tried to fix > similar issues when we didn't yet have a repository set up. But the fix > was incomplete as it missed the described scenario. > Yes, exactly. Because before 173761e21b, the code will always find the files backend, it does care about whether we are in the git repo or not. unsigned int format = REF_STORAGE_FORMAT_FILES; const struct ref_storage_be *be = find_ref_storage_backend(format); So, it won't complain. > As the quoted comment mentions, we'd ideally refactor the code to not > set up `gitdir` with an invalid value in the first place, but that may > be a bigger undertaking. Instead, refactor the code to use the ref > storage format as an indicator of whether or not the ref store has been > set up to fix the bug. > Should we? From my above comments, it does no matter whether we set the "gitdir" in the "the_repository". Because we already check this and we have this information. I think the main problem is that we use "the_repository" badly. We could run git commands inside the repo or outside the repo. If we run git commands outside the repo, should we use the "the_repository" variable? I guess we should not. Because "git_config", "repo_config" and so on use the global variable "the_repository", so we will encounter the trouble. But if we could use something like "data->repo". We will make everything OK: 1. When running commands inside the git repo, we set "data->repo" to be "the_repository". 2. When ruing commands outside the git repo, we set "data->repo" to be NULL. > Reported-by: Ronan Pigott <ronan@rjp.ie> > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > config.c | 15 +++++++++------ > t/t1305-config-include.sh | 2 +- > 2 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/config.c b/config.c > index 1266eab0860..a11bb85da30 100644 > --- a/config.c > +++ b/config.c > @@ -306,13 +306,16 @@ static int include_by_branch(struct config_include_data *data, > int flags; > int ret; > struct strbuf pattern = STRBUF_INIT; > - const char *refname = (!data->repo || !data->repo->gitdir) ? > - NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo), > - "HEAD", 0, NULL, &flags); > - const char *shortname; > + const char *refname, *shortname; > > - if (!refname || !(flags & REF_ISSYMREF) || > - !skip_prefix(refname, "refs/heads/", &shortname)) > + if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN) > + return 0; > + > + refname = refs_resolve_ref_unsafe(get_main_ref_store(data->repo), > + "HEAD", 0, NULL, &flags); > + if (!refname || > + !(flags & REF_ISSYMREF) || > + !skip_prefix(refname, "refs/heads/", &shortname)) > return 0; > > strbuf_add(&pattern, cond, cond_len); > diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh > index ad08db72308..517d6c86937 100755 > --- a/t/t1305-config-include.sh > +++ b/t/t1305-config-include.sh > @@ -389,7 +389,7 @@ test_expect_success 'onbranch without repository' ' > test_must_fail nongit git config get foo.bar > ' > > -test_expect_failure 'onbranch without repository but explicit nonexistent Git directory' ' > +test_expect_success 'onbranch without repository but explicit nonexistent Git directory' ' > test_when_finished "rm -f .gitconfig config.inc" && > git config set -f .gitconfig "includeIf.onbranch:**.path" config.inc && > git config set -f config.inc foo.bar baz && > -- > 2.46.0.551.gc5ee8f2d1c.dirty
Patrick Steinhardt <ps@pks.im> writes: > As the quoted comment mentions, we'd ideally refactor the code to not > set up `gitdir` with an invalid value in the first place, but that may > be a bigger undertaking. Instead, refactor the code to use the ref > storage format as an indicator of whether or not the ref store has been > set up to fix the bug. > > Reported-by: Ronan Pigott <ronan@rjp.ie> > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > ... > - const char *refname = (!data->repo || !data->repo->gitdir) ? > - NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo), > - "HEAD", 0, NULL, &flags); > - const char *shortname; > + const char *refname, *shortname; > > - if (!refname || !(flags & REF_ISSYMREF) || > - !skip_prefix(refname, "refs/heads/", &shortname)) > + if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN) > + return 0; OK. A very direct check to see what we really care about.
diff --git a/config.c b/config.c index 1266eab0860..a11bb85da30 100644 --- a/config.c +++ b/config.c @@ -306,13 +306,16 @@ static int include_by_branch(struct config_include_data *data, int flags; int ret; struct strbuf pattern = STRBUF_INIT; - const char *refname = (!data->repo || !data->repo->gitdir) ? - NULL : refs_resolve_ref_unsafe(get_main_ref_store(data->repo), - "HEAD", 0, NULL, &flags); - const char *shortname; + const char *refname, *shortname; - if (!refname || !(flags & REF_ISSYMREF) || - !skip_prefix(refname, "refs/heads/", &shortname)) + if (!data->repo || data->repo->ref_storage_format == REF_STORAGE_FORMAT_UNKNOWN) + return 0; + + refname = refs_resolve_ref_unsafe(get_main_ref_store(data->repo), + "HEAD", 0, NULL, &flags); + if (!refname || + !(flags & REF_ISSYMREF) || + !skip_prefix(refname, "refs/heads/", &shortname)) return 0; strbuf_add(&pattern, cond, cond_len); diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh index ad08db72308..517d6c86937 100755 --- a/t/t1305-config-include.sh +++ b/t/t1305-config-include.sh @@ -389,7 +389,7 @@ test_expect_success 'onbranch without repository' ' test_must_fail nongit git config get foo.bar ' -test_expect_failure 'onbranch without repository but explicit nonexistent Git directory' ' +test_expect_success 'onbranch without repository but explicit nonexistent Git directory' ' test_when_finished "rm -f .gitconfig config.inc" && git config set -f .gitconfig "includeIf.onbranch:**.path" config.inc && git config set -f config.inc foo.bar baz &&
The `include_by_branch()` function is responsible for evaluating whether or not a specific include should be pulled in based on the currently checked out branch. Naturally, his condition can only be evaluated when we have a properly initialized repository with a ref store in the first place. This is why the function guards against the case when either `data->repo` or `data->repo->gitdir` are `NULL` pointers. But the second check is insufficient: the `gitdir` may be set even though the repository has not been initialized. Quoting "setup.c": NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some code paths so we also need to explicitly setup the environment if the user has set GIT_DIR. It may be beneficial to disallow bogus GIT_DIR values at some point in the future. So when either the GIT_DIR environment variable or the `--git-dir` global option are set by the user then `the_repository` may end up with an initialized `gitdir` variable. And this happens even when the dir is invalid, like for example when it doesn't exist. It follows that only checking for whether or not `gitdir` is `NULL` is not sufficient for us to determine whether the repository has been properly initialized. This issue can lead to us triggering a BUG: when using a config with an "includeIf.onbranch:" condition outside of a repository while using the `--git-dir` option pointing to an invalid Git directory we may end up trying to evaluate the condition even though the ref storage format has not been set up. This bisects to 173761e21b (setup: start tracking ref storage format, 2023-12-29), but that commit really only starts to surface the issue that has already existed beforehand. The code to check for `gitdir` was introduced via 85fe0e800c (config: work around bug with includeif:onbranch and early config, 2019-07-31), which tried to fix similar issues when we didn't yet have a repository set up. But the fix was incomplete as it missed the described scenario. As the quoted comment mentions, we'd ideally refactor the code to not set up `gitdir` with an invalid value in the first place, but that may be a bigger undertaking. Instead, refactor the code to use the ref storage format as an indicator of whether or not the ref store has been set up to fix the bug. Reported-by: Ronan Pigott <ronan@rjp.ie> Signed-off-by: Patrick Steinhardt <ps@pks.im> --- config.c | 15 +++++++++------ t/t1305-config-include.sh | 2 +- 2 files changed, 10 insertions(+), 7 deletions(-)