[v3,07/10] refs: root refs can be symbolic refs

Message ID	92a71222e1067ca4ce9ecaaa555d78d0cce0d9d0.1714637671.git.ps@pks.im (mailing list archive)
State	Superseded
Headers	show Received: from fhigh1-smtp.messagingengine.com (fhigh1-smtp.messagingengine.com [103.168.172.152]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id AA6E336B0D for <git@vger.kernel.org>; Thu, 2 May 2024 08:17:46 +0000 (UTC) Feedback-ID: i197146af:Fastmail Date: Thu, 2 May 2024 10:17:42 +0200 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Cc: Jeff King <peff@peff.net>, Karthik Nayak <karthik.188@gmail.com>, Phillip Wood <phillip.wood123@gmail.com>, Junio C Hamano <gitster@pobox.com>, Justin Tobler <jltobler@gmail.com>, Kristoffer Haugsbakk <code@khaugsbakk.name> Subject: [PATCH v3 07/10] refs: root refs can be symbolic refs Message-ID: <92a71222e1067ca4ce9ecaaa555d78d0cce0d9d0.1714637671.git.ps@pks.im> References: <cover.1714398019.git.ps@pks.im> <cover.1714637671.git.ps@pks.im> Precedence: bulk MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="gV5dmvb7bTCU0oYl" Content-Disposition: inline In-Reply-To: <cover.1714637671.git.ps@pks.im>
Series	Clarify pseudo-ref terminology \| expand [v3,00/10] Clarify pseudo-ref terminology [v3,01/10] Documentation/glossary: redefine pseudorefs as special refs [v3,02/10] Documentation/glossary: clarify limitations of pseudorefs [v3,03/10] Documentation/glossary: define root refs as refs [v3,04/10] refs: rename `is_pseudoref()` to `is_root_ref()` [v3,05/10] refs: refname `is_special_ref()` to `is_pseudo_ref()` [v3,06/10] refs: classify HEAD as a root ref [v3,07/10] refs: root refs can be symbolic refs [v3,08/10] refs: pseudorefs are no refs [v3,09/10] ref-filter: properly distinuish pseudo and root refs [v3,10/10] refs: refuse to write pseudorefs

Patrick Steinhardt May 2, 2024, 8:17 a.m. UTC

Before this patch series, root refs except for "HEAD" and our special
refs were classified as pseudorefs. Furthermore, our terminology
clarified that pseudorefs must not be symbolic refs. This restriction
is enforced in `is_root_ref()`, which explicitly checks that a supposed
root ref resolves to an object ID without recursing.

This has been extremely confusing right from the start because (in old
terminology) a ref name may sometimes be a pseudoref and sometimes not
depending on whether it is a symbolic or regular ref. This behaviour
does not seem reasonable at all and I very much doubt that it results in
anything sane.

Furthermore, the behaviour is different to `is_headref()`, which only
checks for the ref to exist. While that is in line with our glossary,
this inconsistency only adds to the confusion.

Last but not least, the current behaviour can actually lead to a
segfault when calling `is_root_ref()` with a reference that either does
not exist or that is a symbolic ref because we never initialized `oid`.

Let's loosen the restrictions in accordance to the new definition of
root refs, which are simply plain refs that may as well be a symbolic
ref. Consequently, we can just check for the ref to exist instead of
requiring it to be a regular ref.

Add a test that verifies that this does not change user-visible
behaviour. Namely, we still don't want to show broken refs to the user
by default in git-for-each-ref(1). What this does allow though is for
internal callers to surface dangling root refs when they pass in the
`DO_FOR_EACH_INCLUDE_BROKEN` flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c                         | 50 ++++++++++++++++++++++++----------
 t/t6302-for-each-ref-filter.sh | 17 ++++++++++++
 2 files changed, 53 insertions(+), 14 deletions(-)

Jeff King May 3, 2024, 6:13 p.m. UTC | #1

On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:

> Before this patch series, root refs except for "HEAD" and our special
> refs were classified as pseudorefs. Furthermore, our terminology
> clarified that pseudorefs must not be symbolic refs. This restriction
> is enforced in `is_root_ref()`, which explicitly checks that a supposed
> root ref resolves to an object ID without recursing.
> 
> This has been extremely confusing right from the start because (in old
> terminology) a ref name may sometimes be a pseudoref and sometimes not
> depending on whether it is a symbolic or regular ref. This behaviour
> does not seem reasonable at all and I very much doubt that it results in
> anything sane.
> 
> Furthermore, the behaviour is different to `is_headref()`, which only
> checks for the ref to exist. While that is in line with our glossary,
> this inconsistency only adds to the confusion.
> 
> Last but not least, the current behaviour can actually lead to a
> segfault when calling `is_root_ref()` with a reference that either does
> not exist or that is a symbolic ref because we never initialized `oid`.
> 
> Let's loosen the restrictions in accordance to the new definition of
> root refs, which are simply plain refs that may as well be a symbolic
> ref. Consequently, we can just check for the ref to exist instead of
> requiring it to be a regular ref.

It's not clear to me that this existence check is particularly useful.
Something that fails read_raw_ref() will fail if:

  - the file does not exist at all. But then how did somebody find out
    about it at all to ask is_pseudoref()?

  - it does exist, but does not look like a ref. Is this important? If I
    do "echo foo >.git/CHERRY_PICK_HEAD", does it become not a root ref
    anymore? Or is it a root ref that is broken? I'd have thought the
    latter, and the syntax is what distinguishes it.

Making the classification purely syntactic based on the name feels
simpler to me to reason about. You'll never run into confusing cases
where repo state changes how commands may behave.

And arguably is_pseudoref_syntax() should be taking into account the
"_HEAD" restriction and special names anyway. It is a bit weird that
even if we tighten up the refname checking to use is_pseudoref_syntax(),
you'd still be able to "git update-ref FOO" but then not see it as a
root ref!

-Peff

Patrick Steinhardt May 15, 2024, 4:16 a.m. UTC | #2

On Fri, May 03, 2024 at 02:13:39PM -0400, Jeff King wrote:
> On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:
> 
> > Before this patch series, root refs except for "HEAD" and our special
> > refs were classified as pseudorefs. Furthermore, our terminology
> > clarified that pseudorefs must not be symbolic refs. This restriction
> > is enforced in `is_root_ref()`, which explicitly checks that a supposed
> > root ref resolves to an object ID without recursing.
> > 
> > This has been extremely confusing right from the start because (in old
> > terminology) a ref name may sometimes be a pseudoref and sometimes not
> > depending on whether it is a symbolic or regular ref. This behaviour
> > does not seem reasonable at all and I very much doubt that it results in
> > anything sane.
> > 
> > Furthermore, the behaviour is different to `is_headref()`, which only
> > checks for the ref to exist. While that is in line with our glossary,
> > this inconsistency only adds to the confusion.
> > 
> > Last but not least, the current behaviour can actually lead to a
> > segfault when calling `is_root_ref()` with a reference that either does
> > not exist or that is a symbolic ref because we never initialized `oid`.
> > 
> > Let's loosen the restrictions in accordance to the new definition of
> > root refs, which are simply plain refs that may as well be a symbolic
> > ref. Consequently, we can just check for the ref to exist instead of
> > requiring it to be a regular ref.
> 
> It's not clear to me that this existence check is particularly useful.
> Something that fails read_raw_ref() will fail if:
> 
>   - the file does not exist at all. But then how did somebody find out
>     about it at all to ask is_pseudoref()?
> 
>   - it does exist, but does not look like a ref. Is this important? If I
>     do "echo foo >.git/CHERRY_PICK_HEAD", does it become not a root ref
>     anymore? Or is it a root ref that is broken? I'd have thought the
>     latter, and the syntax is what distinguishes it.
> 
> Making the classification purely syntactic based on the name feels
> simpler to me to reason about. You'll never run into confusing cases
> where repo state changes how commands may behave.

I certainly agree and have been complaining about that in the past, too.
I didn't dare to change the semantics this far yet. Let's have a look at
the callers:

  - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
    It's clear that the intent is to classify based on the ref name,
    only.

  - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
    determine whether it should add a ref to the root directory. It
    feels fishy that this uses ref existence checks to do that.

  - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
    filter root refs. Again, using existence checks is pointless here as
    the iterator has just surfaced the ref, so it does exist.

  - "refs.c:is_current_worktree_ref()" uses it. Fishy as well, as the
    call to `is_per_worktree_ref()` also only checks for the refname.

So let's remove these existence checks altogether and make this a check
that purely checks semantics.

> And arguably is_pseudoref_syntax() should be taking into account the
> "_HEAD" restriction and special names anyway. It is a bit weird that
> even if we tighten up the refname checking to use is_pseudoref_syntax(),
> you'd still be able to "git update-ref FOO" but then not see it as a
> root ref!

True, as well. I'm less comfortable with doing that change in this
series though as it does impose a major restriction that did not exist
previously. We probably want some escape hatches so that it would still
be possible to modify those refs when really required, for example to
delete such broken refs.

I would thus like to defer this to a follow up patch series, if you
don't mind.

Patrick

Patrick Steinhardt May 15, 2024, 4:39 a.m. UTC | #3

On Wed, May 15, 2024 at 06:16:18AM +0200, Patrick Steinhardt wrote:
> On Fri, May 03, 2024 at 02:13:39PM -0400, Jeff King wrote:
> > On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:
[snip]
> > And arguably is_pseudoref_syntax() should be taking into account the
> > "_HEAD" restriction and special names anyway. It is a bit weird that
> > even if we tighten up the refname checking to use is_pseudoref_syntax(),
> > you'd still be able to "git update-ref FOO" but then not see it as a
> > root ref!
> 
> True, as well. I'm less comfortable with doing that change in this
> series though as it does impose a major restriction that did not exist
> previously. We probably want some escape hatches so that it would still
> be possible to modify those refs when really required, for example to
> delete such broken refs.
> 
> I would thus like to defer this to a follow up patch series, if you
> don't mind.

Arguably, we don't need `is_pseudoref_syntax()` (which is being renamed
to `is_root_ref_syntax()`) at all anymore after this series lands
because it can be neatly rolled into `is_root_ref()`. The only caller,
`is_current_worktree_ref()`, should really call `is_roof_ref()` and not
`is_root_ref_syntax()`.

But again, I'll defer this to a follow-up patch series.

Patrick

Jeff King May 15, 2024, 6:20 a.m. UTC | #4

On Wed, May 15, 2024 at 06:16:18AM +0200, Patrick Steinhardt wrote:

> > Making the classification purely syntactic based on the name feels
> > simpler to me to reason about. You'll never run into confusing cases
> > where repo state changes how commands may behave.
> 
> I certainly agree and have been complaining about that in the past, too.
> I didn't dare to change the semantics this far yet. Let's have a look at
> the callers:
> 
>   - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
>     It's clear that the intent is to classify based on the ref name,
>     only.
> 
>   - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
>     determine whether it should add a ref to the root directory. It
>     feels fishy that this uses ref existence checks to do that.
> 
>   - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
>     filter root refs. Again, using existence checks is pointless here as
>     the iterator has just surfaced the ref, so it does exist.
> 
>   - "refs.c:is_current_worktree_ref()" uses it. Fishy as well, as the
>     call to `is_per_worktree_ref()` also only checks for the refname.
> 
> So let's remove these existence checks altogether and make this a check
> that purely checks semantics.

Thanks for doing the digging on the callers. That matches my intuition /
light analysis, which is good. ;)

> > And arguably is_pseudoref_syntax() should be taking into account the
> > "_HEAD" restriction and special names anyway. It is a bit weird that
> > even if we tighten up the refname checking to use is_pseudoref_syntax(),
> > you'd still be able to "git update-ref FOO" but then not see it as a
> > root ref!
> 
> True, as well. I'm less comfortable with doing that change in this
> series though as it does impose a major restriction that did not exist
> previously. We probably want some escape hatches so that it would still
> be possible to modify those refs when really required, for example to
> delete such broken refs.
> 
> I would thus like to defer this to a follow up patch series, if you
> don't mind.

I don't mind deferring. I thought it might make the simplifications
you're doing in this series easier to reason about. But TBH I haven't
had the chance to look through your series very carefully yet (and I'm
still a bit back-logged), so I'm happy to go with your judgement on how
to split it up.

-Peff

Jeff King May 15, 2024, 6:22 a.m. UTC | #5

On Wed, May 15, 2024 at 06:39:52AM +0200, Patrick Steinhardt wrote:

> On Wed, May 15, 2024 at 06:16:18AM +0200, Patrick Steinhardt wrote:
> > On Fri, May 03, 2024 at 02:13:39PM -0400, Jeff King wrote:
> > > On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:
> [snip]
> > > And arguably is_pseudoref_syntax() should be taking into account the
> > > "_HEAD" restriction and special names anyway. It is a bit weird that
> > > even if we tighten up the refname checking to use is_pseudoref_syntax(),
> > > you'd still be able to "git update-ref FOO" but then not see it as a
> > > root ref!
> > 
> > True, as well. I'm less comfortable with doing that change in this
> > series though as it does impose a major restriction that did not exist
> > previously. We probably want some escape hatches so that it would still
> > be possible to modify those refs when really required, for example to
> > delete such broken refs.
> > 
> > I would thus like to defer this to a follow up patch series, if you
> > don't mind.
> 
> Arguably, we don't need `is_pseudoref_syntax()` (which is being renamed
> to `is_root_ref_syntax()`) at all anymore after this series lands
> because it can be neatly rolled into `is_root_ref()`. The only caller,
> `is_current_worktree_ref()`, should really call `is_roof_ref()` and not
> `is_root_ref_syntax()`.

Yeah, and I'd expect that the more-strict check_refname_format() that I
proposed elsewhere would be in the same boat. The only reason I used the
"_syntax()" variant is that it was obviously wrong to do existence
checks there. Once those are gone, then naturally it should be able to
rely on is_root_ref() itself.

-Peff

Patrick Steinhardt May 15, 2024, 6:35 a.m. UTC | #6

On Wed, May 15, 2024 at 02:22:20AM -0400, Jeff King wrote:
> On Wed, May 15, 2024 at 06:39:52AM +0200, Patrick Steinhardt wrote:
> 
> > On Wed, May 15, 2024 at 06:16:18AM +0200, Patrick Steinhardt wrote:
> > > On Fri, May 03, 2024 at 02:13:39PM -0400, Jeff King wrote:
> > > > On Thu, May 02, 2024 at 10:17:42AM +0200, Patrick Steinhardt wrote:
> > [snip]
> > > > And arguably is_pseudoref_syntax() should be taking into account the
> > > > "_HEAD" restriction and special names anyway. It is a bit weird that
> > > > even if we tighten up the refname checking to use is_pseudoref_syntax(),
> > > > you'd still be able to "git update-ref FOO" but then not see it as a
> > > > root ref!
> > > 
> > > True, as well. I'm less comfortable with doing that change in this
> > > series though as it does impose a major restriction that did not exist
> > > previously. We probably want some escape hatches so that it would still
> > > be possible to modify those refs when really required, for example to
> > > delete such broken refs.
> > > 
> > > I would thus like to defer this to a follow up patch series, if you
> > > don't mind.
> > 
> > Arguably, we don't need `is_pseudoref_syntax()` (which is being renamed
> > to `is_root_ref_syntax()`) at all anymore after this series lands
> > because it can be neatly rolled into `is_root_ref()`. The only caller,
> > `is_current_worktree_ref()`, should really call `is_roof_ref()` and not
> > `is_root_ref_syntax()`.
> 
> Yeah, and I'd expect that the more-strict check_refname_format() that I
> proposed elsewhere would be in the same boat. The only reason I used the
> "_syntax()" variant is that it was obviously wrong to do existence
> checks there. Once those are gone, then naturally it should be able to
> rely on is_root_ref() itself.

This series hasn't been queued/merged yet, right? Do you plan to reroll
it? I think that the changes in there are a good complementary addition
to the clarifications in my patch series.

Patrick

Jeff King May 15, 2024, 6:49 a.m. UTC | #7

On Wed, May 15, 2024 at 08:35:52AM +0200, Patrick Steinhardt wrote:

> > Yeah, and I'd expect that the more-strict check_refname_format() that I
> > proposed elsewhere would be in the same boat. The only reason I used the
> > "_syntax()" variant is that it was obviously wrong to do existence
> > checks there. Once those are gone, then naturally it should be able to
> > rely on is_root_ref() itself.
> 
> This series hasn't been queued/merged yet, right? Do you plan to reroll
> it? I think that the changes in there are a good complementary addition
> to the clarifications in my patch series.

Correct, I don't think Junio picked it up. It needed a re-roll anyway,
so I'd plan to do it on top of your patches, assuming they are on track
to get merged (and it sounds like there are no real objections).

-Peff

Patrick Steinhardt May 15, 2024, 6:59 a.m. UTC | #8

On Wed, May 15, 2024 at 02:49:12AM -0400, Jeff King wrote:
> On Wed, May 15, 2024 at 08:35:52AM +0200, Patrick Steinhardt wrote:
> 
> > > Yeah, and I'd expect that the more-strict check_refname_format() that I
> > > proposed elsewhere would be in the same boat. The only reason I used the
> > > "_syntax()" variant is that it was obviously wrong to do existence
> > > checks there. Once those are gone, then naturally it should be able to
> > > rely on is_root_ref() itself.
> > 
> > This series hasn't been queued/merged yet, right? Do you plan to reroll
> > it? I think that the changes in there are a good complementary addition
> > to the clarifications in my patch series.
> 
> Correct, I don't think Junio picked it up. It needed a re-roll anyway,
> so I'd plan to do it on top of your patches, assuming they are on track
> to get merged (and it sounds like there are no real objections).

That feels sensible. The series needs another thorough review and an Ack
by somebody before Junio wants to merge it, but until now I'm not aware
of any objections, yeah. So it should hopefully land soonish.

Patrick

[v3,07/10] refs: root refs can be symbolic refs

Commit Message

Comments

Patch