diff mbox series

[v3,07/10] refs: root refs can be symbolic refs

Message ID 92a71222e1067ca4ce9ecaaa555d78d0cce0d9d0.1714637671.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Clarify pseudo-ref terminology | expand

Commit Message

Patrick Steinhardt May 2, 2024, 8:17 a.m. UTC
Before this patch series, root refs except for "HEAD" and our special
refs were classified as pseudorefs. Furthermore, our terminology
clarified that pseudorefs must not be symbolic refs. This restriction
is enforced in `is_root_ref()`, which explicitly checks that a supposed
root ref resolves to an object ID without recursing.

This has been extremely confusing right from the start because (in old
terminology) a ref name may sometimes be a pseudoref and sometimes not
depending on whether it is a symbolic or regular ref. This behaviour
does not seem reasonable at all and I very much doubt that it results in
anything sane.

Furthermore, the behaviour is different to `is_headref()`, which only
checks for the ref to exist. While that is in line with our glossary,
this inconsistency only adds to the confusion.

Last but not least, the current behaviour can actually lead to a
segfault when calling `is_root_ref()` with a reference that either does
not exist or that is a symbolic ref because we never initialized `oid`.

Let's loosen the restrictions in accordance to the new definition of
root refs, which are simply plain refs that may as well be a symbolic
ref. Consequently, we can just check for the ref to exist instead of
requiring it to be a regular ref.

Add a test that verifies that this does not change user-visible
behaviour. Namely, we still don't want to show broken refs to the user
by default in git-for-each-ref(1). What this does allow though is for
internal callers to surface dangling root refs when they pass in the
`DO_FOR_EACH_INCLUDE_BROKEN` flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c                         | 50 ++++++++++++++++++++++++----------
 t/t6302-for-each-ref-filter.sh | 17 ++++++++++++
 2 files changed, 53 insertions(+), 14 deletions(-)

Comments

Jeff King May 3, 2024, 6:13 p.m. UTC | #1
On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:

> Before this patch series, root refs except for "HEAD" and our special
> refs were classified as pseudorefs. Furthermore, our terminology
> clarified that pseudorefs must not be symbolic refs. This restriction
> is enforced in `is_root_ref()`, which explicitly checks that a supposed
> root ref resolves to an object ID without recursing.
> 
> This has been extremely confusing right from the start because (in old
> terminology) a ref name may sometimes be a pseudoref and sometimes not
> depending on whether it is a symbolic or regular ref. This behaviour
> does not seem reasonable at all and I very much doubt that it results in
> anything sane.
> 
> Furthermore, the behaviour is different to `is_headref()`, which only
> checks for the ref to exist. While that is in line with our glossary,
> this inconsistency only adds to the confusion.
> 
> Last but not least, the current behaviour can actually lead to a
> segfault when calling `is_root_ref()` with a reference that either does
> not exist or that is a symbolic ref because we never initialized `oid`.
> 
> Let's loosen the restrictions in accordance to the new definition of
> root refs, which are simply plain refs that may as well be a symbolic
> ref. Consequently, we can just check for the ref to exist instead of
> requiring it to be a regular ref.

It's not clear to me that this existence check is particularly useful.
Something that fails read_raw_ref() will fail if:

  - the file does not exist at all. But then how did somebody find out
    about it at all to ask is_pseudoref()?

  - it does exist, but does not look like a ref. Is this important? If I
    do "echo foo >.git/CHERRY_PICK_HEAD", does it become not a root ref
    anymore? Or is it a root ref that is broken? I'd have thought the
    latter, and the syntax is what distinguishes it.

Making the classification purely syntactic based on the name feels
simpler to me to reason about. You'll never run into confusing cases
where repo state changes how commands may behave.

And arguably is_pseudoref_syntax() should be taking into account the
"_HEAD" restriction and special names anyway. It is a bit weird that
even if we tighten up the refname checking to use is_pseudoref_syntax(),
you'd still be able to "git update-ref FOO" but then not see it as a
root ref!

-Peff
Patrick Steinhardt May 15, 2024, 4:16 a.m. UTC | #2
On Fri, May 03, 2024 at 02:13:39PM -0400, Jeff King wrote:
> On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:
> 
> > Before this patch series, root refs except for "HEAD" and our special
> > refs were classified as pseudorefs. Furthermore, our terminology
> > clarified that pseudorefs must not be symbolic refs. This restriction
> > is enforced in `is_root_ref()`, which explicitly checks that a supposed
> > root ref resolves to an object ID without recursing.
> > 
> > This has been extremely confusing right from the start because (in old
> > terminology) a ref name may sometimes be a pseudoref and sometimes not
> > depending on whether it is a symbolic or regular ref. This behaviour
> > does not seem reasonable at all and I very much doubt that it results in
> > anything sane.
> > 
> > Furthermore, the behaviour is different to `is_headref()`, which only
> > checks for the ref to exist. While that is in line with our glossary,
> > this inconsistency only adds to the confusion.
> > 
> > Last but not least, the current behaviour can actually lead to a
> > segfault when calling `is_root_ref()` with a reference that either does
> > not exist or that is a symbolic ref because we never initialized `oid`.
> > 
> > Let's loosen the restrictions in accordance to the new definition of
> > root refs, which are simply plain refs that may as well be a symbolic
> > ref. Consequently, we can just check for the ref to exist instead of
> > requiring it to be a regular ref.
> 
> It's not clear to me that this existence check is particularly useful.
> Something that fails read_raw_ref() will fail if:
> 
>   - the file does not exist at all. But then how did somebody find out
>     about it at all to ask is_pseudoref()?
> 
>   - it does exist, but does not look like a ref. Is this important? If I
>     do "echo foo >.git/CHERRY_PICK_HEAD", does it become not a root ref
>     anymore? Or is it a root ref that is broken? I'd have thought the
>     latter, and the syntax is what distinguishes it.
> 
> Making the classification purely syntactic based on the name feels
> simpler to me to reason about. You'll never run into confusing cases
> where repo state changes how commands may behave.

I certainly agree and have been complaining about that in the past, too.
I didn't dare to change the semantics this far yet. Let's have a look at
the callers:

  - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
    It's clear that the intent is to classify based on the ref name,
    only.

  - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
    determine whether it should add a ref to the root directory. It
    feels fishy that this uses ref existence checks to do that.

  - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
    filter root refs. Again, using existence checks is pointless here as
    the iterator has just surfaced the ref, so it does exist.

  - "refs.c:is_current_worktree_ref()" uses it. Fishy as well, as the
    call to `is_per_worktree_ref()` also only checks for the refname.

So let's remove these existence checks altogether and make this a check
that purely checks semantics.

> And arguably is_pseudoref_syntax() should be taking into account the
> "_HEAD" restriction and special names anyway. It is a bit weird that
> even if we tighten up the refname checking to use is_pseudoref_syntax(),
> you'd still be able to "git update-ref FOO" but then not see it as a
> root ref!

True, as well. I'm less comfortable with doing that change in this
series though as it does impose a major restriction that did not exist
previously. We probably want some escape hatches so that it would still
be possible to modify those refs when really required, for example to
delete such broken refs.

I would thus like to defer this to a follow up patch series, if you
don't mind.

Patrick
Patrick Steinhardt May 15, 2024, 4:39 a.m. UTC | #3
On Wed, May 15, 2024 at 06:16:18AM +0200, Patrick Steinhardt wrote:
> On Fri, May 03, 2024 at 02:13:39PM -0400, Jeff King wrote:
> > On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:
[snip]
> > And arguably is_pseudoref_syntax() should be taking into account the
> > "_HEAD" restriction and special names anyway. It is a bit weird that
> > even if we tighten up the refname checking to use is_pseudoref_syntax(),
> > you'd still be able to "git update-ref FOO" but then not see it as a
> > root ref!
> 
> True, as well. I'm less comfortable with doing that change in this
> series though as it does impose a major restriction that did not exist
> previously. We probably want some escape hatches so that it would still
> be possible to modify those refs when really required, for example to
> delete such broken refs.
> 
> I would thus like to defer this to a follow up patch series, if you
> don't mind.

Arguably, we don't need `is_pseudoref_syntax()` (which is being renamed
to `is_root_ref_syntax()`) at all anymore after this series lands
because it can be neatly rolled into `is_root_ref()`. The only caller,
`is_current_worktree_ref()`, should really call `is_roof_ref()` and not
`is_root_ref_syntax()`.

But again, I'll defer this to a follow-up patch series.

Patrick
Jeff King May 15, 2024, 6:20 a.m. UTC | #4
On Wed, May 15, 2024 at 06:16:18AM +0200, Patrick Steinhardt wrote:

> > Making the classification purely syntactic based on the name feels
> > simpler to me to reason about. You'll never run into confusing cases
> > where repo state changes how commands may behave.
> 
> I certainly agree and have been complaining about that in the past, too.
> I didn't dare to change the semantics this far yet. Let's have a look at
> the callers:
> 
>   - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
>     It's clear that the intent is to classify based on the ref name,
>     only.
> 
>   - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
>     determine whether it should add a ref to the root directory. It
>     feels fishy that this uses ref existence checks to do that.
> 
>   - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
>     filter root refs. Again, using existence checks is pointless here as
>     the iterator has just surfaced the ref, so it does exist.
> 
>   - "refs.c:is_current_worktree_ref()" uses it. Fishy as well, as the
>     call to `is_per_worktree_ref()` also only checks for the refname.
> 
> So let's remove these existence checks altogether and make this a check
> that purely checks semantics.

Thanks for doing the digging on the callers. That matches my intuition /
light analysis, which is good. ;)

> > And arguably is_pseudoref_syntax() should be taking into account the
> > "_HEAD" restriction and special names anyway. It is a bit weird that
> > even if we tighten up the refname checking to use is_pseudoref_syntax(),
> > you'd still be able to "git update-ref FOO" but then not see it as a
> > root ref!
> 
> True, as well. I'm less comfortable with doing that change in this
> series though as it does impose a major restriction that did not exist
> previously. We probably want some escape hatches so that it would still
> be possible to modify those refs when really required, for example to
> delete such broken refs.
> 
> I would thus like to defer this to a follow up patch series, if you
> don't mind.

I don't mind deferring. I thought it might make the simplifications
you're doing in this series easier to reason about. But TBH I haven't
had the chance to look through your series very carefully yet (and I'm
still a bit back-logged), so I'm happy to go with your judgement on how
to split it up.

-Peff
Jeff King May 15, 2024, 6:22 a.m. UTC | #5
On Wed, May 15, 2024 at 06:39:52AM +0200, Patrick Steinhardt wrote:

> On Wed, May 15, 2024 at 06:16:18AM +0200, Patrick Steinhardt wrote:
> > On Fri, May 03, 2024 at 02:13:39PM -0400, Jeff King wrote:
> > > On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:
> [snip]
> > > And arguably is_pseudoref_syntax() should be taking into account the
> > > "_HEAD" restriction and special names anyway. It is a bit weird that
> > > even if we tighten up the refname checking to use is_pseudoref_syntax(),
> > > you'd still be able to "git update-ref FOO" but then not see it as a
> > > root ref!
> > 
> > True, as well. I'm less comfortable with doing that change in this
> > series though as it does impose a major restriction that did not exist
> > previously. We probably want some escape hatches so that it would still
> > be possible to modify those refs when really required, for example to
> > delete such broken refs.
> > 
> > I would thus like to defer this to a follow up patch series, if you
> > don't mind.
> 
> Arguably, we don't need `is_pseudoref_syntax()` (which is being renamed
> to `is_root_ref_syntax()`) at all anymore after this series lands
> because it can be neatly rolled into `is_root_ref()`. The only caller,
> `is_current_worktree_ref()`, should really call `is_roof_ref()` and not
> `is_root_ref_syntax()`.

Yeah, and I'd expect that the more-strict check_refname_format() that I
proposed elsewhere would be in the same boat. The only reason I used the
"_syntax()" variant is that it was obviously wrong to do existence
checks there. Once those are gone, then naturally it should be able to
rely on is_root_ref() itself.

-Peff
Patrick Steinhardt May 15, 2024, 6:35 a.m. UTC | #6
On Wed, May 15, 2024 at 02:22:20AM -0400, Jeff King wrote:
> On Wed, May 15, 2024 at 06:39:52AM +0200, Patrick Steinhardt wrote:
> 
> > On Wed, May 15, 2024 at 06:16:18AM +0200, Patrick Steinhardt wrote:
> > > On Fri, May 03, 2024 at 02:13:39PM -0400, Jeff King wrote:
> > > > On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:
> > [snip]
> > > > And arguably is_pseudoref_syntax() should be taking into account the
> > > > "_HEAD" restriction and special names anyway. It is a bit weird that
> > > > even if we tighten up the refname checking to use is_pseudoref_syntax(),
> > > > you'd still be able to "git update-ref FOO" but then not see it as a
> > > > root ref!
> > > 
> > > True, as well. I'm less comfortable with doing that change in this
> > > series though as it does impose a major restriction that did not exist
> > > previously. We probably want some escape hatches so that it would still
> > > be possible to modify those refs when really required, for example to
> > > delete such broken refs.
> > > 
> > > I would thus like to defer this to a follow up patch series, if you
> > > don't mind.
> > 
> > Arguably, we don't need `is_pseudoref_syntax()` (which is being renamed
> > to `is_root_ref_syntax()`) at all anymore after this series lands
> > because it can be neatly rolled into `is_root_ref()`. The only caller,
> > `is_current_worktree_ref()`, should really call `is_roof_ref()` and not
> > `is_root_ref_syntax()`.
> 
> Yeah, and I'd expect that the more-strict check_refname_format() that I
> proposed elsewhere would be in the same boat. The only reason I used the
> "_syntax()" variant is that it was obviously wrong to do existence
> checks there. Once those are gone, then naturally it should be able to
> rely on is_root_ref() itself.

This series hasn't been queued/merged yet, right? Do you plan to reroll
it? I think that the changes in there are a good complementary addition
to the clarifications in my patch series.

Patrick
Jeff King May 15, 2024, 6:49 a.m. UTC | #7
On Wed, May 15, 2024 at 08:35:52AM +0200, Patrick Steinhardt wrote:

> > Yeah, and I'd expect that the more-strict check_refname_format() that I
> > proposed elsewhere would be in the same boat. The only reason I used the
> > "_syntax()" variant is that it was obviously wrong to do existence
> > checks there. Once those are gone, then naturally it should be able to
> > rely on is_root_ref() itself.
> 
> This series hasn't been queued/merged yet, right? Do you plan to reroll
> it? I think that the changes in there are a good complementary addition
> to the clarifications in my patch series.

Correct, I don't think Junio picked it up. It needed a re-roll anyway,
so I'd plan to do it on top of your patches, assuming they are on track
to get merged (and it sounds like there are no real objections).

-Peff
Patrick Steinhardt May 15, 2024, 6:59 a.m. UTC | #8
On Wed, May 15, 2024 at 02:49:12AM -0400, Jeff King wrote:
> On Wed, May 15, 2024 at 08:35:52AM +0200, Patrick Steinhardt wrote:
> 
> > > Yeah, and I'd expect that the more-strict check_refname_format() that I
> > > proposed elsewhere would be in the same boat. The only reason I used the
> > > "_syntax()" variant is that it was obviously wrong to do existence
> > > checks there. Once those are gone, then naturally it should be able to
> > > rely on is_root_ref() itself.
> > 
> > This series hasn't been queued/merged yet, right? Do you plan to reroll
> > it? I think that the changes in there are a good complementary addition
> > to the clarifications in my patch series.
> 
> Correct, I don't think Junio picked it up. It needed a re-roll anyway,
> so I'd plan to do it on top of your patches, assuming they are on track
> to get merged (and it sounds like there are no real objections).

That feels sensible. The series needs another thorough review and an Ack
by somebody before Junio wants to merge it, but until now I'm not aware
of any objections, yeah. So it should hopefully land soonish.

Patrick
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index 5b89e83ad7..ca9844bc3e 100644
--- a/refs.c
+++ b/refs.c
@@ -869,7 +869,10 @@  int is_root_ref(struct ref_store *refs, const char *refname)
 		"NOTES_MERGE_REF",
 		"MERGE_AUTOSTASH",
 	};
-	struct object_id oid;
+	struct strbuf referent = STRBUF_INIT;
+	struct object_id oid = { 0 };
+	int failure_errno, ret = 0;
+	unsigned int flags;
 	size_t i;
 
 	if (!is_root_ref_syntax(refname))
@@ -877,30 +880,49 @@  int is_root_ref(struct ref_store *refs, const char *refname)
 	if (is_headref(refs, refname))
 		return 1;
 
+	/*
+	 * Note that we cannot use `refs_ref_exists()` here because that also
+	 * checks whether its target ref exists in case refname is a symbolic
+	 * ref.
+	 */
 	if (ends_with(refname, "_HEAD")) {
-		refs_resolve_ref_unsafe(refs, refname,
-					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
-					&oid, NULL);
-		return !is_null_oid(&oid);
+		ret = !refs_read_raw_ref(refs, refname, &oid, &referent,
+					 &flags, &failure_errno);
+		goto done;
 	}
 
-	for (i = 0; i < ARRAY_SIZE(irregular_root_refs); i++)
+	for (i = 0; i < ARRAY_SIZE(irregular_root_refs); i++) {
 		if (!strcmp(refname, irregular_root_refs[i])) {
-			refs_resolve_ref_unsafe(refs, refname,
-						RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
-						&oid, NULL);
-			return !is_null_oid(&oid);
+			ret = !refs_read_raw_ref(refs, refname, &oid, &referent,
+						 &flags, &failure_errno);
+			goto done;
 		}
+	}
 
-	return 0;
+done:
+	strbuf_release(&referent);
+	return ret;
 }
 
 int is_headref(struct ref_store *refs, const char *refname)
 {
-	if (!strcmp(refname, "HEAD"))
-		return refs_ref_exists(refs, refname);
+	struct strbuf referent = STRBUF_INIT;
+	struct object_id oid = { 0 };
+	int failure_errno, ret = 0;
+	unsigned int flags;
 
-	return 0;
+	/*
+	 * Note that we cannot use `refs_ref_exists()` here because that also
+	 * checks whether its target ref exists in case refname is a symbolic
+	 * ref.
+	 */
+	if (!strcmp(refname, "HEAD")) {
+		ret = !refs_read_raw_ref(refs, refname, &oid, &referent,
+					 &flags, &failure_errno);
+	}
+
+	strbuf_release(&referent);
+	return ret;
 }
 
 static int is_current_worktree_ref(const char *ref) {
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index 948f1bb5f4..92ed8957c8 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -62,6 +62,23 @@  test_expect_success '--include-root-refs with other patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success '--include-root-refs omits dangling symrefs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		git symbolic-ref DANGLING_HEAD refs/heads/missing &&
+		cat >expect <<-EOF &&
+		HEAD
+		$(git symbolic-ref HEAD)
+		refs/tags/initial
+		EOF
+		git for-each-ref --format="%(refname)" --include-root-refs >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'filtering with --points-at' '
 	cat >expect <<-\EOF &&
 	refs/heads/main