diff mbox series

[3/3] config: stop checking whether the_repository is NULL

Message ID 20190806122758.GC13513@sigill.intra.peff.net (mailing list archive)
State New, archived
Headers show
Series the_repository initialization cleanup | expand

Commit Message

Jeff King Aug. 6, 2019, 12:27 p.m. UTC
Since the previous commit, our invariant that the_repository is never
NULL is restored, and we can stop being defensive in include_by_branch().

We can confirm the fix by showing that an onbranch config include will
not cause a segfault when run outside a git repository. I've put this in
t1309-early-config since it's related to the case added by 85fe0e800c
(config: work around bug with includeif:onbranch and early config,
2019-07-31), though technically the issue was with
read_very_early_config() and not read_early_config().

Signed-off-by: Jeff King <peff@peff.net>
---
 config.c                | 2 +-
 t/t1309-early-config.sh | 5 +++++
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

Jeff King Aug. 6, 2019, 12:49 p.m. UTC | #1
On Tue, Aug 06, 2019 at 08:27:58AM -0400, Jeff King wrote:

> diff --git a/config.c b/config.c
> index 3900e4947b..cc637363bb 100644
> --- a/config.c
> +++ b/config.c
> @@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
>  	int flags;
>  	int ret;
>  	struct strbuf pattern = STRBUF_INIT;
> -	const char *refname = !the_repository || !the_repository->gitdir ?
> +	const char *refname = !the_repository->gitdir ?
>  		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);

I stopped here, even though I argued earlier that this probably ought to
be have_git_dir() to match the rest of the config code. What I found
when I started poking at it is that there are a few odd cases still in
the startup code, and that IMHO the final form we want to end up with
actually matches what's written here (but I'm not quite ready to sink
the time into taking us there just yet). Read on for the gory details.

The most immediate issue I saw was that the_repository->gitdir doesn't
actually tell you whether or not we have a repository! I.e., that entry
may be non-NULL even though startup_info->have_repository is 0.

So these two tests fail:

diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
index d20b4d150d..7ec91c4c1c 100755
--- a/t/t1305-config-include.sh
+++ b/t/t1305-config-include.sh
@@ -348,6 +348,18 @@ test_expect_success 'conditional include, onbranch, implicit /** for /' '
 	test_cmp expect actual
 '
 
+test_expect_success 'onbranch include not fooled by fake GIT_DIR' '
+	mkdir not-a-git-dir &&
+	echo "ref: refs/heads/master" >not-a-git-dir/HEAD &&
+	git config --file=inner the.value inner &&
+	git config --file=outer the.value outer &&
+	git config --file=outer includeIf.onbranch:master.path inner &&
+	GIT_DIR=not-a-git-dir \
+		git config --file=outer --includes the.value >actual &&
+	echo outer >expect &&
+	test_cmp expect actual
+'
+
 test_expect_success 'include cycles are detected' '
 	git init --bare cycle &&
 	git -C cycle config include.path cycle &&
diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
index 3a0de0ddaa..91af2c2f89 100755
--- a/t/t1309-early-config.sh
+++ b/t/t1309-early-config.sh
@@ -99,4 +99,12 @@ test_expect_success 'onbranch config outside of git repo' '
 	nongit git help
 '
 
+test_expect_success 'early config is not fooled by bogus GIT_DIR' '
+	mkdir not-a-git-dir &&
+	git config --file=not-a-git-dir/config \
+		alias.should-not-run \
+		"!echo should-not-run" &&
+	test_must_fail env GIT_DIR="not-a-git-dir" git should-not-run
+'
+
 test_done

As you can see from that case, it's a curiosity that we still set the
internal gitdir variable from $GIT_DIR, even if we found that it's not a
valid repo.

The onbranch case is fooled by our direct check of gitdir, but likewise
have_git_dir() allows either have_repository or a non-NULL gitdir (as I
mentioned earlier, I think this is vestigial at this point).

So doing this fixes the onbranch case:

diff --git a/config.c b/config.c
index cc637363bb..134637f3ad 100644
--- a/config.c
+++ b/config.c
@@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
 	int flags;
 	int ret;
 	struct strbuf pattern = STRBUF_INIT;
-	const char *refname = !the_repository->gitdir ?
+	const char *refname = !have_git_dir() ?
 		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
 	const char *shortname;
 
diff --git a/environment.c b/environment.c
index 89af47cb85..08bdf79d6b 100644
--- a/environment.c
+++ b/environment.c
@@ -203,8 +203,7 @@ int is_bare_repository(void)
 
 int have_git_dir(void)
 {
-	return startup_info->have_repository
-		|| the_repository->gitdir;
+	return startup_info->have_repository;
 }
 
 const char *get_git_dir(void)


but curiously it _doesn't_ fix the alias one, because after seeing that
we have no configured git dir, we then fall back to discover_git_dir(),
which happily reports back the same bogus $GIT_DIR.

So I think there'd be some more cleanup required there. But the general
direction is that in the near future we probably ought to be checking
startup_info->have_repository and not the_repository->gitdir.

But I think in the long-term, we probably ought to be getting rid of
startup_info->have_repository itself. It's really just hiding a subtle
dependency on whether the_repository is valid, and we're better off
actually seeing all the spots that depend on the_repository (and
eventually converting them to take a "struct repository" as
appropriate).

Where we want to end up is (I think):

  - stop setting the_repository->gitdir when we know that there's no
    repo (and possibly clear $GIT_DIR as well to avoid confusion). That
    should make it a robust way to check whether a "struct repository"
    is valid.

    This is the step I've stalled on, because I'm a bit worried that
    there's some subtle case that depends on it working this way.

  - drop startup_info->have_repository entirely. At that point it would
    be redundant with checking the_repository->gitdir.

    If you do this without the first step, t4201 will notice and fail
    (because it expects GIT_DIR=not-a-repo to notice that there is no
    repo).

  - optionally, drop have_git_dir() in favor of just checking
    the_repository->gitdir. This loses a layer of abstraction, but I
    think it makes it much more clear where we'd depending on
    the_repository.

And so in the end, the current state of include_by_branch() is what we
want, and all the other call sites should change to match it. Whether it
should be changed in the interim (until we fix the discrepancy between
gitdir/have_repository), I don't feel strongly about. There are
user-visible cases that I believe we get wrong today, but as you can see
from the tests above, they're pretty ridiculous and unlikely in
practice.

-Peff
Johannes Schindelin Aug. 8, 2019, 7:48 p.m. UTC | #2
Hi Peff,

ACK on the three patches, and I think your outline is valid (post
v2.23.0, possibly an Outreachy project?).

Thanks,
Dscho

P.S.: Sorry for top-posting, but I felt that I did not even respond to
anything you said, concretely, so...

On Tue, 6 Aug 2019, Jeff King wrote:

> On Tue, Aug 06, 2019 at 08:27:58AM -0400, Jeff King wrote:
>
> > diff --git a/config.c b/config.c
> > index 3900e4947b..cc637363bb 100644
> > --- a/config.c
> > +++ b/config.c
> > @@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
> >  	int flags;
> >  	int ret;
> >  	struct strbuf pattern = STRBUF_INIT;
> > -	const char *refname = !the_repository || !the_repository->gitdir ?
> > +	const char *refname = !the_repository->gitdir ?
> >  		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
>
> I stopped here, even though I argued earlier that this probably ought to
> be have_git_dir() to match the rest of the config code. What I found
> when I started poking at it is that there are a few odd cases still in
> the startup code, and that IMHO the final form we want to end up with
> actually matches what's written here (but I'm not quite ready to sink
> the time into taking us there just yet). Read on for the gory details.
>
> The most immediate issue I saw was that the_repository->gitdir doesn't
> actually tell you whether or not we have a repository! I.e., that entry
> may be non-NULL even though startup_info->have_repository is 0.
>
> So these two tests fail:
>
> diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
> index d20b4d150d..7ec91c4c1c 100755
> --- a/t/t1305-config-include.sh
> +++ b/t/t1305-config-include.sh
> @@ -348,6 +348,18 @@ test_expect_success 'conditional include, onbranch, implicit /** for /' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success 'onbranch include not fooled by fake GIT_DIR' '
> +	mkdir not-a-git-dir &&
> +	echo "ref: refs/heads/master" >not-a-git-dir/HEAD &&
> +	git config --file=inner the.value inner &&
> +	git config --file=outer the.value outer &&
> +	git config --file=outer includeIf.onbranch:master.path inner &&
> +	GIT_DIR=not-a-git-dir \
> +		git config --file=outer --includes the.value >actual &&
> +	echo outer >expect &&
> +	test_cmp expect actual
> +'
> +
>  test_expect_success 'include cycles are detected' '
>  	git init --bare cycle &&
>  	git -C cycle config include.path cycle &&
> diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
> index 3a0de0ddaa..91af2c2f89 100755
> --- a/t/t1309-early-config.sh
> +++ b/t/t1309-early-config.sh
> @@ -99,4 +99,12 @@ test_expect_success 'onbranch config outside of git repo' '
>  	nongit git help
>  '
>
> +test_expect_success 'early config is not fooled by bogus GIT_DIR' '
> +	mkdir not-a-git-dir &&
> +	git config --file=not-a-git-dir/config \
> +		alias.should-not-run \
> +		"!echo should-not-run" &&
> +	test_must_fail env GIT_DIR="not-a-git-dir" git should-not-run
> +'
> +
>  test_done
>
> As you can see from that case, it's a curiosity that we still set the
> internal gitdir variable from $GIT_DIR, even if we found that it's not a
> valid repo.
>
> The onbranch case is fooled by our direct check of gitdir, but likewise
> have_git_dir() allows either have_repository or a non-NULL gitdir (as I
> mentioned earlier, I think this is vestigial at this point).
>
> So doing this fixes the onbranch case:
>
> diff --git a/config.c b/config.c
> index cc637363bb..134637f3ad 100644
> --- a/config.c
> +++ b/config.c
> @@ -275,7 +275,7 @@ static int include_by_branch(const char *cond, size_t cond_len)
>  	int flags;
>  	int ret;
>  	struct strbuf pattern = STRBUF_INIT;
> -	const char *refname = !the_repository->gitdir ?
> +	const char *refname = !have_git_dir() ?
>  		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
>  	const char *shortname;
>
> diff --git a/environment.c b/environment.c
> index 89af47cb85..08bdf79d6b 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -203,8 +203,7 @@ int is_bare_repository(void)
>
>  int have_git_dir(void)
>  {
> -	return startup_info->have_repository
> -		|| the_repository->gitdir;
> +	return startup_info->have_repository;
>  }
>
>  const char *get_git_dir(void)
>
>
> but curiously it _doesn't_ fix the alias one, because after seeing that
> we have no configured git dir, we then fall back to discover_git_dir(),
> which happily reports back the same bogus $GIT_DIR.
>
> So I think there'd be some more cleanup required there. But the general
> direction is that in the near future we probably ought to be checking
> startup_info->have_repository and not the_repository->gitdir.
>
> But I think in the long-term, we probably ought to be getting rid of
> startup_info->have_repository itself. It's really just hiding a subtle
> dependency on whether the_repository is valid, and we're better off
> actually seeing all the spots that depend on the_repository (and
> eventually converting them to take a "struct repository" as
> appropriate).
>
> Where we want to end up is (I think):
>
>   - stop setting the_repository->gitdir when we know that there's no
>     repo (and possibly clear $GIT_DIR as well to avoid confusion). That
>     should make it a robust way to check whether a "struct repository"
>     is valid.
>
>     This is the step I've stalled on, because I'm a bit worried that
>     there's some subtle case that depends on it working this way.
>
>   - drop startup_info->have_repository entirely. At that point it would
>     be redundant with checking the_repository->gitdir.
>
>     If you do this without the first step, t4201 will notice and fail
>     (because it expects GIT_DIR=not-a-repo to notice that there is no
>     repo).
>
>   - optionally, drop have_git_dir() in favor of just checking
>     the_repository->gitdir. This loses a layer of abstraction, but I
>     think it makes it much more clear where we'd depending on
>     the_repository.
>
> And so in the end, the current state of include_by_branch() is what we
> want, and all the other call sites should change to match it. Whether it
> should be changed in the interim (until we fix the discrepancy between
> gitdir/have_repository), I don't feel strongly about. There are
> user-visible cases that I believe we get wrong today, but as you can see
> from the tests above, they're pretty ridiculous and unlikely in
> practice.
>
> -Peff
>
diff mbox series

Patch

diff --git a/config.c b/config.c
index 3900e4947b..cc637363bb 100644
--- a/config.c
+++ b/config.c
@@ -275,7 +275,7 @@  static int include_by_branch(const char *cond, size_t cond_len)
 	int flags;
 	int ret;
 	struct strbuf pattern = STRBUF_INIT;
-	const char *refname = !the_repository || !the_repository->gitdir ?
+	const char *refname = !the_repository->gitdir ?
 		NULL : resolve_ref_unsafe("HEAD", 0, NULL, &flags);
 	const char *shortname;
 
diff --git a/t/t1309-early-config.sh b/t/t1309-early-config.sh
index eeb60e4143..3a0de0ddaa 100755
--- a/t/t1309-early-config.sh
+++ b/t/t1309-early-config.sh
@@ -94,4 +94,9 @@  test_expect_success 'early config and onbranch' '
 	test_with_config "[includeif \"onbranch:master\"]path=../broken"
 '
 
+test_expect_success 'onbranch config outside of git repo' '
+	test_config_global includeIf.onbranch:master.path non-existent &&
+	nongit git help
+'
+
 test_done