diff mbox series

[2/3] refs: do not label special refs as pseudo refs

Message ID b5e7ddb1e30acb7e3871a189beb2c828b18f9e73.1714398019.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series Clarify pseudo-ref terminology | expand

Commit Message

Patrick Steinhardt April 29, 2024, 1:41 p.m. UTC
We have two refs which almost behave like a ref in many contexts, but
aren't really:

  - MERGE_HEAD contains the list of parents during a merge.

  - FETCH_HEAD contains the list of fetched references after
    git-fetch(1) with some annotations.

These references have been declared "special refs" in 8df4c5d205
(Documentation: add "special refs" to the glossary, 2024-01-19).

Due to their "_HEAD" suffix, those special refs also almost look like a
pseudo ref, even though they aren't. But because `is_pseudoref()` labels
anything as a pseudo ref that ends with the `_HEAD` suffix, it will also
happily label both of the above special refs as pseudo refs.

This mis-labeling creates some weirdness and inconsistent behaviour
across ref backends. As special refs are never stored via a ref backend,
they theoretically speaking cannot know about special refs. But with the
recent introduction of the `--include-root-refs` flag this isn't quite
true anymore: the "files" backend will yield all refs that look like a
pseudo ref or "HEAD" stored in the root directory. And given that both
of the above look like pseudo refs, the "files" backend will list those,
too. The "reftable" backend naturally cannot know about those, and
teaching it to parse and yield these special refs very much feels like
the wrong way to go. So, arguably, the better direction to go is to mark
the "files" behaviour as a bug and stop yielding special refs there.

Conceptually, this feels like the right thing to do, too. Special refs
really aren't refs, they are a different file format that for some part
may behave like a ref. If we were designing these special refs from
scratch, we would have likely never named it anything like a "ref" at
all.

So let's double down on the path that the mentioned commit has started,
which is to cleanly distinguish special refs and pseudo refs.

Ideally, the proper way would be to return to the original meaning that
pseudo refs really had: a ref that behaves like a ref for most of the
part, but isn't really a ref. We would essentially replace the current
"pseudoref" term with the "special ref" term. The consequence is that
all refs except for FETCH_HEAD and MERGE_HEAD would be normal refs,
regardless of whether they live in the root hierarchy or not. The way
that pseudorefs are enforced now would then change to be a naming policy
for refs, only. It's unclear though how sensible it would be to do such
a large change to terminology now, which is why this commit does the
next best thing.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 Documentation/glossary-content.txt | 36 ++++++++++++++++++------------
 refs.c                             |  2 ++
 t/t6302-for-each-ref-filter.sh     | 17 ++++++++++++++
 3 files changed, 41 insertions(+), 14 deletions(-)

Comments

Phillip Wood April 29, 2024, 3:12 p.m. UTC | #1
Hi Patrick

On 29/04/2024 14:41, Patrick Steinhardt wrote:
> diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
> index d71b199955..4275918fa0 100644
> --- a/Documentation/glossary-content.txt
> +++ b/Documentation/glossary-content.txt
> @@ -497,20 +497,28 @@ exclude;;
>   	unusual refs.
>   
>   [[def_pseudoref]]pseudoref::
> -	Pseudorefs are a class of files under `$GIT_DIR` which behave
> -	like refs for the purposes of rev-parse, but which are treated
> -	specially by git.  Pseudorefs both have names that are all-caps,
> -	and always start with a line consisting of a
> -	<<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
> -	pseudoref, because it is sometimes a symbolic ref.  They might
> -	optionally contain some additional data.  `MERGE_HEAD` and
> -	`CHERRY_PICK_HEAD` are examples.  Unlike
> -	<<def_per_worktree_ref,per-worktree refs>>, these files cannot
> -	be symbolic refs, and never have reflogs.  They also cannot be
> -	updated through the normal ref update machinery.  Instead,
> -	they are updated by directly writing to the files.  However,
> -	they can be read as if they were refs, so `git rev-parse
> -	MERGE_HEAD` will work.
> +	Pseudorefs are references that live in the root of the reference
> +	hierarchy, outside of the usual "refs/" hierarchy. Pseudorefs have an
> +	all-uppercase name and must end with a "_HEAD" suffix, for example
> +	"`BISECT_HEAD`". Other than that, pseudorefs behave the exact same as
> +	any other reference and can be both read and written via regular Git
> +	tooling.

This changes the definition to allow pseudorefs to by symbolic refs. 
When is_pseudoref() was introduced Junio and I had a brief discussion 
about this restriction and he was not in favor of allowing pseudorefs to 
be symbolic refs [1].

Are there any practical implications of the changes in this patch for 
users running commands like "git log FETCH_HEAD" (I can't think of any 
off the top of my head but it would be good to have some reassurance on 
that point in the commit message)

Best Wishes

Phillip

[1] https://lore.kernel.org/git/xmqq34u2q3zs.fsf@gitster.g/

> +<<def_special_ref>,Special refs>> are not pseudorefs.
> ++
> +Due to historic reasons, Git has several irregular pseudo refs that do not
> +follow above rules. The following list of irregular pseudo refs is exhaustive
> +and shall not be extended in the future:
> +
> + - "`AUTO_MERGE`"
> +
> + - "`BISECT_EXPECTED_REV`"
> +
> + - "`NOTES_MERGE_PARTIAL`"
> +
> + - "`NOTES_MERGE_REF`"
> +
> + - "`MERGE_AUTOSTASH`"
>   
>   [[def_pull]]pull::
>   	Pulling a <<def_branch,branch>> means to <<def_fetch,fetch>> it and
> diff --git a/refs.c b/refs.c
> index c64f66bff9..567c6fc6ff 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -905,6 +905,8 @@ int is_pseudoref(struct ref_store *refs, const char *refname)
>   
>   	if (!is_pseudoref_syntax(refname))
>   		return 0;
> +	if (is_special_ref(refname))
> +		return 0;
>   
>   	if (ends_with(refname, "_HEAD")) {
>   		refs_resolve_ref_unsafe(refs, refname,
> diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> index 948f1bb5f4..8c92fbde79 100755
> --- a/t/t6302-for-each-ref-filter.sh
> +++ b/t/t6302-for-each-ref-filter.sh
> @@ -52,6 +52,23 @@ test_expect_success '--include-root-refs pattern prints pseudorefs' '
>   	test_cmp expect actual
>   '
>   
> +test_expect_success '--include-root-refs pattern does not print special refs' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit initial &&
> +		git rev-parse HEAD >.git/MERGE_HEAD &&
> +		git for-each-ref --format="%(refname)" --include-root-refs >actual &&
> +		cat >expect <<-EOF &&
> +		HEAD
> +		$(git symbolic-ref HEAD)
> +		refs/tags/initial
> +		EOF
> +		test_cmp expect actual
> +	)
> +'
> +
>   test_expect_success '--include-root-refs with other patterns' '
>   	cat >expect <<-\EOF &&
>   	HEAD
Junio C Hamano April 29, 2024, 4:24 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> +Due to historic reasons, Git has several irregular pseudo refs that do not
> +follow above rules. The following list of irregular pseudo refs is exhaustive
> +and shall not be extended in the future:

I like this part of the patch the most ;-).
Justin Tobler April 29, 2024, 10:52 p.m. UTC | #3
On 24/04/29 03:41PM, Patrick Steinhardt wrote:
> diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
> index d71b199955..4275918fa0 100644
> --- a/Documentation/glossary-content.txt
> +++ b/Documentation/glossary-content.txt
> @@ -497,20 +497,28 @@ exclude;;
>  	unusual refs.
>  
>  [[def_pseudoref]]pseudoref::
> -	Pseudorefs are a class of files under `$GIT_DIR` which behave
> -	like refs for the purposes of rev-parse, but which are treated
> -	specially by git.  Pseudorefs both have names that are all-caps,
> -	and always start with a line consisting of a
> -	<<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
> -	pseudoref, because it is sometimes a symbolic ref.  They might

We remove the example here about HEAD not being a pseudoref. This
example seems helpful to indicate that a pseudoref cannot be a symbolic
ref. Is this no longer the case and the change intended?

> -	optionally contain some additional data.  `MERGE_HEAD` and
> -	`CHERRY_PICK_HEAD` are examples.  Unlike
> -	<<def_per_worktree_ref,per-worktree refs>>, these files cannot
> -	be symbolic refs, and never have reflogs.  They also cannot be
> -	updated through the normal ref update machinery.  Instead,
> -	they are updated by directly writing to the files.  However,
> -	they can be read as if they were refs, so `git rev-parse
> -	MERGE_HEAD` will work.
> +	Pseudorefs are references that live in the root of the reference
> +	hierarchy, outside of the usual "refs/" hierarchy. Pseudorefs have an
> +	all-uppercase name and must end with a "_HEAD" suffix, for example
> +	"`BISECT_HEAD`". Other than that, pseudorefs behave the exact same as
> +	any other reference and can be both read and written via regular Git
> +	tooling.

Pseudorefs behaving the same and using the same tooling seems to
contridict the previous documentation. I assume the previous information
was out-of-date, but it might be nice to explain this in the message.

> ++
> +<<def_special_ref>,Special refs>> are not pseudorefs.
> ++
> +Due to historic reasons, Git has several irregular pseudo refs that do not
> +follow above rules. The following list of irregular pseudo refs is exhaustive

We seem to be inconsistent between using "pseudoref" and "pseudo ref".
Not sure it we want to be consistent here. 

-Justin

> +and shall not be extended in the future:
> +
> + - "`AUTO_MERGE`"
> +
> + - "`BISECT_EXPECTED_REV`"
> +
> + - "`NOTES_MERGE_PARTIAL`"
> +
> + - "`NOTES_MERGE_REF`"
> +
> + - "`MERGE_AUTOSTASH`"
>  
>  [[def_pull]]pull::
>  	Pulling a <<def_branch,branch>> means to <<def_fetch,fetch>> it and
> diff --git a/refs.c b/refs.c
> index c64f66bff9..567c6fc6ff 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -905,6 +905,8 @@ int is_pseudoref(struct ref_store *refs, const char *refname)
>  
>  	if (!is_pseudoref_syntax(refname))
>  		return 0;
> +	if (is_special_ref(refname))
> +		return 0;
>  
>  	if (ends_with(refname, "_HEAD")) {
>  		refs_resolve_ref_unsafe(refs, refname,
> diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> index 948f1bb5f4..8c92fbde79 100755
> --- a/t/t6302-for-each-ref-filter.sh
> +++ b/t/t6302-for-each-ref-filter.sh
> @@ -52,6 +52,23 @@ test_expect_success '--include-root-refs pattern prints pseudorefs' '
>  	test_cmp expect actual
>  '
>  
> +test_expect_success '--include-root-refs pattern does not print special refs' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit initial &&
> +		git rev-parse HEAD >.git/MERGE_HEAD &&
> +		git for-each-ref --format="%(refname)" --include-root-refs >actual &&
> +		cat >expect <<-EOF &&
> +		HEAD
> +		$(git symbolic-ref HEAD)
> +		refs/tags/initial
> +		EOF
> +		test_cmp expect actual
> +	)
> +'
> +
>  test_expect_success '--include-root-refs with other patterns' '
>  	cat >expect <<-\EOF &&
>  	HEAD
> -- 
> 2.45.0-rc1
>
Patrick Steinhardt April 30, 2024, 7:29 a.m. UTC | #4
On Mon, Apr 29, 2024 at 05:52:41PM -0500, Justin Tobler wrote:
> On 24/04/29 03:41PM, Patrick Steinhardt wrote:
> > diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
> > index d71b199955..4275918fa0 100644
> > --- a/Documentation/glossary-content.txt
> > +++ b/Documentation/glossary-content.txt
> > @@ -497,20 +497,28 @@ exclude;;
> >  	unusual refs.
> >  
> >  [[def_pseudoref]]pseudoref::
> > -	Pseudorefs are a class of files under `$GIT_DIR` which behave
> > -	like refs for the purposes of rev-parse, but which are treated
> > -	specially by git.  Pseudorefs both have names that are all-caps,
> > -	and always start with a line consisting of a
> > -	<<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
> > -	pseudoref, because it is sometimes a symbolic ref.  They might
> 
> We remove the example here about HEAD not being a pseudoref. This
> example seems helpful to indicate that a pseudoref cannot be a symbolic
> ref. Is this no longer the case and the change intended?

I just don't see why we would want to have this restriction. Honestly,
the more I think about this whole topic the more I want to go into the
direction I've hinted at in the cover letter: drop "special refs" and
define pseudo refs as either FETCH_HEAD or MERGE_HEAD. Everything else
is just a normal ref, even though some of those may live in the root
directory if they conform to a set of strict rules:

  - All upppercase characters plus underscores.

  - Must end with "_HEAD", except a list of known irregular root refs.

I feel like the world would be better like this.

> > -	optionally contain some additional data.  `MERGE_HEAD` and
> > -	`CHERRY_PICK_HEAD` are examples.  Unlike
> > -	<<def_per_worktree_ref,per-worktree refs>>, these files cannot
> > -	be symbolic refs, and never have reflogs.  They also cannot be
> > -	updated through the normal ref update machinery.  Instead,
> > -	they are updated by directly writing to the files.  However,
> > -	they can be read as if they were refs, so `git rev-parse
> > -	MERGE_HEAD` will work.
> > +	Pseudorefs are references that live in the root of the reference
> > +	hierarchy, outside of the usual "refs/" hierarchy. Pseudorefs have an
> > +	all-uppercase name and must end with a "_HEAD" suffix, for example
> > +	"`BISECT_HEAD`". Other than that, pseudorefs behave the exact same as
> > +	any other reference and can be both read and written via regular Git
> > +	tooling.
> 
> Pseudorefs behaving the same and using the same tooling seems to
> contridict the previous documentation. I assume the previous information
> was out-of-date, but it might be nice to explain this in the message.

Yes, and I actually want to change this. We never enforced restrictions
for pseudorefs anyway, they can be symrefs just fine. And neither would
I see any reason why that should be the case in the first place.

> > ++
> > +<<def_special_ref>,Special refs>> are not pseudorefs.
> > ++
> > +Due to historic reasons, Git has several irregular pseudo refs that do not
> > +follow above rules. The following list of irregular pseudo refs is exhaustive
> 
> We seem to be inconsistent between using "pseudoref" and "pseudo ref".
> Not sure it we want to be consistent here. 

Makes sense.

Patrick

> -Justin
> 
> > +and shall not be extended in the future:
> > +
> > + - "`AUTO_MERGE`"
> > +
> > + - "`BISECT_EXPECTED_REV`"
> > +
> > + - "`NOTES_MERGE_PARTIAL`"
> > +
> > + - "`NOTES_MERGE_REF`"
> > +
> > + - "`MERGE_AUTOSTASH`"
> >  
> >  [[def_pull]]pull::
> >  	Pulling a <<def_branch,branch>> means to <<def_fetch,fetch>> it and
> > diff --git a/refs.c b/refs.c
> > index c64f66bff9..567c6fc6ff 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -905,6 +905,8 @@ int is_pseudoref(struct ref_store *refs, const char *refname)
> >  
> >  	if (!is_pseudoref_syntax(refname))
> >  		return 0;
> > +	if (is_special_ref(refname))
> > +		return 0;
> >  
> >  	if (ends_with(refname, "_HEAD")) {
> >  		refs_resolve_ref_unsafe(refs, refname,
> > diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
> > index 948f1bb5f4..8c92fbde79 100755
> > --- a/t/t6302-for-each-ref-filter.sh
> > +++ b/t/t6302-for-each-ref-filter.sh
> > @@ -52,6 +52,23 @@ test_expect_success '--include-root-refs pattern prints pseudorefs' '
> >  	test_cmp expect actual
> >  '
> >  
> > +test_expect_success '--include-root-refs pattern does not print special refs' '
> > +	test_when_finished "rm -rf repo" &&
> > +	git init repo &&
> > +	(
> > +		cd repo &&
> > +		test_commit initial &&
> > +		git rev-parse HEAD >.git/MERGE_HEAD &&
> > +		git for-each-ref --format="%(refname)" --include-root-refs >actual &&
> > +		cat >expect <<-EOF &&
> > +		HEAD
> > +		$(git symbolic-ref HEAD)
> > +		refs/tags/initial
> > +		EOF
> > +		test_cmp expect actual
> > +	)
> > +'
> > +
> >  test_expect_success '--include-root-refs with other patterns' '
> >  	cat >expect <<-\EOF &&
> >  	HEAD
> > -- 
> > 2.45.0-rc1
> > 
> 
>
Patrick Steinhardt April 30, 2024, 7:30 a.m. UTC | #5
On Mon, Apr 29, 2024 at 04:12:37PM +0100, Phillip Wood wrote:
> Hi Patrick
> 
> On 29/04/2024 14:41, Patrick Steinhardt wrote:
> > diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
> > index d71b199955..4275918fa0 100644
> > --- a/Documentation/glossary-content.txt
> > +++ b/Documentation/glossary-content.txt
> > @@ -497,20 +497,28 @@ exclude;;
> >   	unusual refs.
> >   [[def_pseudoref]]pseudoref::
> > -	Pseudorefs are a class of files under `$GIT_DIR` which behave
> > -	like refs for the purposes of rev-parse, but which are treated
> > -	specially by git.  Pseudorefs both have names that are all-caps,
> > -	and always start with a line consisting of a
> > -	<<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
> > -	pseudoref, because it is sometimes a symbolic ref.  They might
> > -	optionally contain some additional data.  `MERGE_HEAD` and
> > -	`CHERRY_PICK_HEAD` are examples.  Unlike
> > -	<<def_per_worktree_ref,per-worktree refs>>, these files cannot
> > -	be symbolic refs, and never have reflogs.  They also cannot be
> > -	updated through the normal ref update machinery.  Instead,
> > -	they are updated by directly writing to the files.  However,
> > -	they can be read as if they were refs, so `git rev-parse
> > -	MERGE_HEAD` will work.
> > +	Pseudorefs are references that live in the root of the reference
> > +	hierarchy, outside of the usual "refs/" hierarchy. Pseudorefs have an
> > +	all-uppercase name and must end with a "_HEAD" suffix, for example
> > +	"`BISECT_HEAD`". Other than that, pseudorefs behave the exact same as
> > +	any other reference and can be both read and written via regular Git
> > +	tooling.
> 
> This changes the definition to allow pseudorefs to by symbolic refs. When
> is_pseudoref() was introduced Junio and I had a brief discussion about this
> restriction and he was not in favor of allowing pseudorefs to be symbolic
> refs [1].

So the reason why pseudorefs exist is that some refs behave like a ref
sometimes, but not always. And in my book that really only applies to
MERGE_HEAD and FETCH_HEAD, because those contain additional metadata
that makes them not-a-ref. And for those I very much see that they
should not ever be a symref.

But everyhing else living in the root of the ref hierarchy is not
special in any way, at least not in my opinion. We have never enforced
that those cannot be symrefs, and it makes our terminology needlessly
confusing.

I think I'm going to reroll this patch series and go down the nuclear
path that I've hinted at in the cover letter:

  - Pseudo refs can only be either FETCH_HEAD or MERGE_HEAD.

  - Refs starting with "refs/" are just plain normal refs.

  - Refs living in the root of the ref hierarchy need to conform to a
    set of strict rules, as Peff is starting to enforce in a separate
    patch series. These are just normal refs, as well, even though we
    may call them "root ref" in our tooling as they live in the root of
    the ref hierarchy.

I just don't think that the current state makes sense to anybody. It's
majorly confusing -- I've spent the last 8 months working in our refs
code almost exclusively and still forget what's what. How are our users
expected to understand this?

> Are there any practical implications of the changes in this patch for users
> running commands like "git log FETCH_HEAD" (I can't think of any off the top
> of my head but it would be good to have some reassurance on that point in
> the commit message)

Not really, no. We have never been doing a good job at enforcing the
difference between pseudo refs or normal refs anyway. Pseudo refs can be
symrefs just fine, and our tooling won't complain. The only exception
where I want us to become stricter is in how we enforce the syntax rules
for root refs (which is handled by Peff in a separate patch series), and
that we start to not treat FETCH_HEAD and MERGE_HEAD as proper refs.
They should still resolve when you ask git-rev-parse(1), but when you
iterate through refs they should not be surfaced as they _aren't_ refs.

Patrick
Phillip Wood April 30, 2024, 9:59 a.m. UTC | #6
Hi Patrick

On 30/04/2024 08:30, Patrick Steinhardt wrote:
> On Mon, Apr 29, 2024 at 04:12:37PM +0100, Phillip Wood wrote:
>
>> This changes the definition to allow pseudorefs to by symbolic refs. When
>> is_pseudoref() was introduced Junio and I had a brief discussion about this
>> restriction and he was not in favor of allowing pseudorefs to be symbolic
>> refs [1].
> 
> So the reason why pseudorefs exist is that some refs behave like a ref
> sometimes, but not always. And in my book that really only applies to
> MERGE_HEAD and FETCH_HEAD, because those contain additional metadata
> that makes them not-a-ref. And for those I very much see that they
> should not ever be a symref.
> 
> But everyhing else living in the root of the ref hierarchy is not
> special in any way, at least not in my opinion. We have never enforced
> that those cannot be symrefs, and it makes our terminology needlessly
> confusing.

I agree HEAD not being a pseudoref and having special refs as well as 
pseudorefs refs is confusing. I do have some sympathy for the argument 
that pseudorefs should not be symbolic refs though as AUTO_MERGE, 
CHERRY_PICK_HEAD, ORIG_HEAD etc. are all pointers to a commit and it 
would be a bug for them to be a symbolic ref. It is unfortunate that in 
the move away from assessing those refs as files we lost the check that 
they are not symbolic refs.

> I think I'm going to reroll this patch series and go down the nuclear
> path that I've hinted at in the cover letter:
> 
>    - Pseudo refs can only be either FETCH_HEAD or MERGE_HEAD.
> 
>    - Refs starting with "refs/" are just plain normal refs.
> 
>    - Refs living in the root of the ref hierarchy need to conform to a
>      set of strict rules, as Peff is starting to enforce in a separate
>      patch series. These are just normal refs, as well, even though we
>      may call them "root ref" in our tooling as they live in the root of
>      the ref hierarchy.

That would certainly be simpler.

> I just don't think that the current state makes sense to anybody. It's
> majorly confusing -- I've spent the last 8 months working in our refs
> code almost exclusively and still forget what's what. How are our users
> expected to understand this?

The current state is confusing but arguably there is a logic to the 
various distinctions - whether those distinctions are useful in practice 
is open to debate though. I wonder how much users really care about 
these distinctions and whether it affects their use of git. I was 
unaware of the distinction between HEAD and pseudorefs until I reviewed 
Karthik's for-each-ref series a couple of months ago and I don't think 
that lack of knowledge had caused me any trouble when using git.

>> Are there any practical implications of the changes in this patch for users
>> running commands like "git log FETCH_HEAD" (I can't think of any off the top
>> of my head but it would be good to have some reassurance on that point in
>> the commit message)
> 
> Not really, no. We have never been doing a good job at enforcing the
> difference between pseudo refs or normal refs anyway. Pseudo refs can be
> symrefs just fine, and our tooling won't complain. The only exception
> where I want us to become stricter is in how we enforce the syntax rules
> for root refs (which is handled by Peff in a separate patch series), and
> that we start to not treat FETCH_HEAD and MERGE_HEAD as proper refs.
> They should still resolve when you ask git-rev-parse(1), but when you
> iterate through refs they should not be surfaced as they _aren't_ refs.

That's good

Thanks

Phillip
Jeff King April 30, 2024, 10:23 a.m. UTC | #7
On Tue, Apr 30, 2024 at 09:30:05AM +0200, Patrick Steinhardt wrote:

> So the reason why pseudorefs exist is that some refs behave like a ref
> sometimes, but not always. And in my book that really only applies to
> MERGE_HEAD and FETCH_HEAD, because those contain additional metadata
> that makes them not-a-ref. And for those I very much see that they
> should not ever be a symref.
> 
> But everyhing else living in the root of the ref hierarchy is not
> special in any way, at least not in my opinion. We have never enforced
> that those cannot be symrefs, and it makes our terminology needlessly
> confusing.
> 
> I think I'm going to reroll this patch series and go down the nuclear
> path that I've hinted at in the cover letter:
> 
>   - Pseudo refs can only be either FETCH_HEAD or MERGE_HEAD.
> 
>   - Refs starting with "refs/" are just plain normal refs.
> 
>   - Refs living in the root of the ref hierarchy need to conform to a
>     set of strict rules, as Peff is starting to enforce in a separate
>     patch series. These are just normal refs, as well, even though we
>     may call them "root ref" in our tooling as they live in the root of
>     the ref hierarchy.
> 
> I just don't think that the current state makes sense to anybody. It's
> majorly confusing -- I've spent the last 8 months working in our refs
> code almost exclusively and still forget what's what. How are our users
> expected to understand this?

Yes, I very much agree with your final paragraph. I have been working on
Git for 18 years, and am learning new things about pseudo and special
refs in this thread. ;) (Admittedly, I think that distinction is new in
the past few months).

I think the "everything is a ref, even at the root" is the simplest
thing for users. And the only rules they need to know are the syntactic
ones: names start with "refs/" or are all-caps and underscore. But I do
not see the value in them caring that HEAD can be a symref or that
MERGE_HEAD cannot (nor the value in the code making such a distinction).

My series does not enforce the "_HEAD" suffix (plus special cases) as a
syntactic rule, but we could do that easily on top. That would help
protect case-insensitive filesystems from the same shenanigans that my
series aims for (e.g., "CONFIG" on such a system will still look at the
"config" file).

It is unfortunate to me that we even need to call out FETCH_HEAD and
MERGE_HEAD. I know they are special within Git, and probably ref
backends need to be aware (because they have to be able to carry extra
data). But from a user's perspective they resolve in the normal way
(unless you are trying to look at them in their special non-ref way).
I guess the user must care that they will always be in the filesystem in
order to access them in that special way, though.

> > Are there any practical implications of the changes in this patch for users
> > running commands like "git log FETCH_HEAD" (I can't think of any off the top
> > of my head but it would be good to have some reassurance on that point in
> > the commit message)
> 
> Not really, no. We have never been doing a good job at enforcing the
> difference between pseudo refs or normal refs anyway. Pseudo refs can be
> symrefs just fine, and our tooling won't complain. The only exception
> where I want us to become stricter is in how we enforce the syntax rules
> for root refs (which is handled by Peff in a separate patch series), and
> that we start to not treat FETCH_HEAD and MERGE_HEAD as proper refs.
> They should still resolve when you ask git-rev-parse(1), but when you
> iterate through refs they should not be surfaced as they _aren't_ refs.

I actually would not even mind if they are surfaced when iterating with
--include-root-refs. But then I am a little skeptical of the purpose of
that feature in the first place. After all, the reason code shoves stuff
into .git/FOO_HEAD is precisely because we don't want other stuff
iterating over them, using them for reachability, and so on. That is why
"--all" does not include them, for example.

But I did not follow the development of the feature, so maybe I am
missing some cool use case.

-Peff
karthik nayak April 30, 2024, 12:07 p.m. UTC | #8
Jeff King <peff@peff.net> writes:

> On Tue, Apr 30, 2024 at 09:30:05AM +0200, Patrick Steinhardt wrote:
>
>> So the reason why pseudorefs exist is that some refs behave like a ref
>> sometimes, but not always. And in my book that really only applies to
>> MERGE_HEAD and FETCH_HEAD, because those contain additional metadata
>> that makes them not-a-ref. And for those I very much see that they
>> should not ever be a symref.
>>
>> But everyhing else living in the root of the ref hierarchy is not
>> special in any way, at least not in my opinion. We have never enforced
>> that those cannot be symrefs, and it makes our terminology needlessly
>> confusing.
>>
>> I think I'm going to reroll this patch series and go down the nuclear
>> path that I've hinted at in the cover letter:
>>
>>   - Pseudo refs can only be either FETCH_HEAD or MERGE_HEAD.
>>
>>   - Refs starting with "refs/" are just plain normal refs.
>>
>>   - Refs living in the root of the ref hierarchy need to conform to a
>>     set of strict rules, as Peff is starting to enforce in a separate
>>     patch series. These are just normal refs, as well, even though we
>>     may call them "root ref" in our tooling as they live in the root of
>>     the ref hierarchy.
>>
>> I just don't think that the current state makes sense to anybody. It's
>> majorly confusing -- I've spent the last 8 months working in our refs
>> code almost exclusively and still forget what's what. How are our users
>> expected to understand this?
>
> Yes, I very much agree with your final paragraph. I have been working on
> Git for 18 years, and am learning new things about pseudo and special
> refs in this thread. ;) (Admittedly, I think that distinction is new in
> the past few months).
>
> I think the "everything is a ref, even at the root" is the simplest
> thing for users. And the only rules they need to know are the syntactic
> ones: names start with "refs/" or are all-caps and underscore. But I do
> not see the value in them caring that HEAD can be a symref or that
> MERGE_HEAD cannot (nor the value in the code making such a distinction).
>
> My series does not enforce the "_HEAD" suffix (plus special cases) as a
> syntactic rule, but we could do that easily on top. That would help
> protect case-insensitive filesystems from the same shenanigans that my
> series aims for (e.g., "CONFIG" on such a system will still look at the
> "config" file).
>
> It is unfortunate to me that we even need to call out FETCH_HEAD and
> MERGE_HEAD. I know they are special within Git, and probably ref
> backends need to be aware (because they have to be able to carry extra
> data). But from a user's perspective they resolve in the normal way
> (unless you are trying to look at them in their special non-ref way).
> I guess the user must care that they will always be in the filesystem in
> order to access them in that special way, though.
>
>> > Are there any practical implications of the changes in this patch for users
>> > running commands like "git log FETCH_HEAD" (I can't think of any off the top
>> > of my head but it would be good to have some reassurance on that point in
>> > the commit message)
>>
>> Not really, no. We have never been doing a good job at enforcing the
>> difference between pseudo refs or normal refs anyway. Pseudo refs can be
>> symrefs just fine, and our tooling won't complain. The only exception
>> where I want us to become stricter is in how we enforce the syntax rules
>> for root refs (which is handled by Peff in a separate patch series), and
>> that we start to not treat FETCH_HEAD and MERGE_HEAD as proper refs.
>> They should still resolve when you ask git-rev-parse(1), but when you
>> iterate through refs they should not be surfaced as they _aren't_ refs.
>
> I actually would not even mind if they are surfaced when iterating with
> --include-root-refs. But then I am a little skeptical of the purpose of
> that feature in the first place. After all, the reason code shoves stuff
> into .git/FOO_HEAD is precisely because we don't want other stuff
> iterating over them, using them for reachability, and so on. That is why
> "--all" does not include them, for example.
>
> But I did not follow the development of the feature, so maybe I am
> missing some cool use case.
>

The use case was to allow us to look at these refs when working with
the reftable backend. Currently there is no way to do that, with the
files backend, well you could just read the files. So mostly a debugging
usecase.
Patrick Steinhardt April 30, 2024, 12:11 p.m. UTC | #9
On Tue, Apr 30, 2024 at 10:59:36AM +0100, Phillip Wood wrote:
> Hi Patrick
> 
> On 30/04/2024 08:30, Patrick Steinhardt wrote:
> > On Mon, Apr 29, 2024 at 04:12:37PM +0100, Phillip Wood wrote:
> > 
> > > This changes the definition to allow pseudorefs to by symbolic refs. When
> > > is_pseudoref() was introduced Junio and I had a brief discussion about this
> > > restriction and he was not in favor of allowing pseudorefs to be symbolic
> > > refs [1].
> > 
> > So the reason why pseudorefs exist is that some refs behave like a ref
> > sometimes, but not always. And in my book that really only applies to
> > MERGE_HEAD and FETCH_HEAD, because those contain additional metadata
> > that makes them not-a-ref. And for those I very much see that they
> > should not ever be a symref.
> > 
> > But everyhing else living in the root of the ref hierarchy is not
> > special in any way, at least not in my opinion. We have never enforced
> > that those cannot be symrefs, and it makes our terminology needlessly
> > confusing.
> 
> I agree HEAD not being a pseudoref and having special refs as well as
> pseudorefs refs is confusing. I do have some sympathy for the argument that
> pseudorefs should not be symbolic refs though as AUTO_MERGE,
> CHERRY_PICK_HEAD, ORIG_HEAD etc. are all pointers to a commit and it would
> be a bug for them to be a symbolic ref. It is unfortunate that in the move
> away from assessing those refs as files we lost the check that they are not
> symbolic refs.

While I agree that conceptually these should always be "regular" refs, I
feel like that is higher-level logic that belongs into the respective
subsystems that write those. I just don't see why the ref backend should
care about the particular usecases that those higher-level subsystems
have, and I can very much see that there might eventually be another
subsystem that actually wants a specific ref to be a symref.

No we could of course start to hard code all kinds of refs into the ref
layer. But I think that this is the wrong way to go, and treating the
ref store as just that, a generic store where you can store refs,
without attaching specific meaning to any of the refs, is the proper way
to go.

> > I think I'm going to reroll this patch series and go down the nuclear
> > path that I've hinted at in the cover letter:
> > 
> >    - Pseudo refs can only be either FETCH_HEAD or MERGE_HEAD.
> > 
> >    - Refs starting with "refs/" are just plain normal refs.
> > 
> >    - Refs living in the root of the ref hierarchy need to conform to a
> >      set of strict rules, as Peff is starting to enforce in a separate
> >      patch series. These are just normal refs, as well, even though we
> >      may call them "root ref" in our tooling as they live in the root of
> >      the ref hierarchy.
> 
> That would certainly be simpler.
> 
> > I just don't think that the current state makes sense to anybody. It's
> > majorly confusing -- I've spent the last 8 months working in our refs
> > code almost exclusively and still forget what's what. How are our users
> > expected to understand this?
> 
> The current state is confusing but arguably there is a logic to the various
> distinctions - whether those distinctions are useful in practice is open to
> debate though. I wonder how much users really care about these distinctions
> and whether it affects their use of git. I was unaware of the distinction
> between HEAD and pseudorefs until I reviewed Karthik's for-each-ref series a
> couple of months ago and I don't think that lack of knowledge had caused me
> any trouble when using git.

There is some logic, that's true enough. I just don't think that anybody
understands the logic.

Patrick

> > > Are there any practical implications of the changes in this patch for users
> > > running commands like "git log FETCH_HEAD" (I can't think of any off the top
> > > of my head but it would be good to have some reassurance on that point in
> > > the commit message)
> > 
> > Not really, no. We have never been doing a good job at enforcing the
> > difference between pseudo refs or normal refs anyway. Pseudo refs can be
> > symrefs just fine, and our tooling won't complain. The only exception
> > where I want us to become stricter is in how we enforce the syntax rules
> > for root refs (which is handled by Peff in a separate patch series), and
> > that we start to not treat FETCH_HEAD and MERGE_HEAD as proper refs.
> > They should still resolve when you ask git-rev-parse(1), but when you
> > iterate through refs they should not be surfaced as they _aren't_ refs.
> 
> That's good
> 
> Thanks
> 
> Phillip
>
Patrick Steinhardt April 30, 2024, 12:16 p.m. UTC | #10
On Tue, Apr 30, 2024 at 06:23:10AM -0400, Jeff King wrote:
> On Tue, Apr 30, 2024 at 09:30:05AM +0200, Patrick Steinhardt wrote:
[snip]
> > > Are there any practical implications of the changes in this patch for users
> > > running commands like "git log FETCH_HEAD" (I can't think of any off the top
> > > of my head but it would be good to have some reassurance on that point in
> > > the commit message)
> > 
> > Not really, no. We have never been doing a good job at enforcing the
> > difference between pseudo refs or normal refs anyway. Pseudo refs can be
> > symrefs just fine, and our tooling won't complain. The only exception
> > where I want us to become stricter is in how we enforce the syntax rules
> > for root refs (which is handled by Peff in a separate patch series), and
> > that we start to not treat FETCH_HEAD and MERGE_HEAD as proper refs.
> > They should still resolve when you ask git-rev-parse(1), but when you
> > iterate through refs they should not be surfaced as they _aren't_ refs.
> 
> I actually would not even mind if they are surfaced when iterating with
> --include-root-refs. But then I am a little skeptical of the purpose of
> that feature in the first place. After all, the reason code shoves stuff
> into .git/FOO_HEAD is precisely because we don't want other stuff
> iterating over them, using them for reachability, and so on. That is why
> "--all" does not include them, for example.
> 
> But I did not follow the development of the feature, so maybe I am
> missing some cool use case.

The thing is that once we start to surface pseudorefs (in the sense of
these _really_ aren't refs) in ref-related tooling, users will want to
treat them as a ref, as well. And that's just bound to happen with
plumbing like `git for-each-ref`, where a user may rightfully expect
that all output here can be treated like a normal ref.

In fact though, I want to double down on restrictions regarding the
pseudorefs FETCH_HEAD and MERGE_HEAD. While it's fair enough that those
can be read like a ref, writing to them is a totally different thing. It
does not make any sense to try and write such refs, and our abstractions
aren't even prepared to write them correctly. They go through the ref
backend, and thus the "reftable" backend would write them into the
reftable stack instead of into the filesystem. Now you could argue that
this should be fixed, but I don't think it is reasonable to expect the
reftable backend to start writing loose refs for those pseudorefs.

So I'd really like to stick with the current explanation that we have in
the "special ref" glossary: pseudorefs must be written via the
filesystem and can't ever go through the ref backends.

Patrick
Patrick Steinhardt April 30, 2024, 12:33 p.m. UTC | #11
On Tue, Apr 30, 2024 at 05:07:19AM -0700, Karthik Nayak wrote:
> Jeff King <peff@peff.net> writes:
> 
> > On Tue, Apr 30, 2024 at 09:30:05AM +0200, Patrick Steinhardt wrote:
[snip]
> >> > Are there any practical implications of the changes in this patch for users
> >> > running commands like "git log FETCH_HEAD" (I can't think of any off the top
> >> > of my head but it would be good to have some reassurance on that point in
> >> > the commit message)
> >>
> >> Not really, no. We have never been doing a good job at enforcing the
> >> difference between pseudo refs or normal refs anyway. Pseudo refs can be
> >> symrefs just fine, and our tooling won't complain. The only exception
> >> where I want us to become stricter is in how we enforce the syntax rules
> >> for root refs (which is handled by Peff in a separate patch series), and
> >> that we start to not treat FETCH_HEAD and MERGE_HEAD as proper refs.
> >> They should still resolve when you ask git-rev-parse(1), but when you
> >> iterate through refs they should not be surfaced as they _aren't_ refs.
> >
> > I actually would not even mind if they are surfaced when iterating with
> > --include-root-refs. But then I am a little skeptical of the purpose of
> > that feature in the first place. After all, the reason code shoves stuff
> > into .git/FOO_HEAD is precisely because we don't want other stuff
> > iterating over them, using them for reachability, and so on. That is why
> > "--all" does not include them, for example.
> >
> > But I did not follow the development of the feature, so maybe I am
> > missing some cool use case.
> >
> 
> The use case was to allow us to look at these refs when working with
> the reftable backend. Currently there is no way to do that, with the
> files backend, well you could just read the files. So mostly a debugging
> usecase.

That's true for normal root refs, only, though. The pseudorefs (current
special refs) can still be surfaced even if the for-each-ref machinery
doesn't surface them because by definition, they always live in the
filesystem.

Patrick
Jean-Noël Avila May 9, 2024, 5:29 p.m. UTC | #12
Hello,

On Monday, 29 April 2024 15:41:28 CEST Patrick Steinhardt wrote:
> ---
>  Documentation/glossary-content.txt | 36 ++++++++++++++++++------------
>  refs.c                             |  2 ++
>  t/t6302-for-each-ref-filter.sh     | 17 ++++++++++++++
>  3 files changed, 41 insertions(+), 14 deletions(-)
> 
> diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-
content.txt
> index d71b199955..4275918fa0 100644
> --- a/Documentation/glossary-content.txt
> +++ b/Documentation/glossary-content.txt
> @@ -497,20 +497,28 @@ exclude;;
...
> +
> + - "`AUTO_MERGE`"
> +
> + - "`BISECT_EXPECTED_REV`"
> +
> + - "`NOTES_MERGE_PARTIAL`"
> +
> + - "`NOTES_MERGE_REF`"
> +
> + - "`MERGE_AUTOSTASH`"

Quoting the names seems overkill here, as they are already formatted with the 
back-quotes. The rendering as bold or monospace is enough.

Regards,

Jean-Noël
Patrick Steinhardt May 10, 2024, 8:33 a.m. UTC | #13
On Thu, May 09, 2024 at 07:29:18PM +0200, Jean-Noël AVILA wrote:
> Hello,
> 
> On Monday, 29 April 2024 15:41:28 CEST Patrick Steinhardt wrote:
> > ---
> >  Documentation/glossary-content.txt | 36 ++++++++++++++++++------------
> >  refs.c                             |  2 ++
> >  t/t6302-for-each-ref-filter.sh     | 17 ++++++++++++++
> >  3 files changed, 41 insertions(+), 14 deletions(-)
> > 
> > diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-
> content.txt
> > index d71b199955..4275918fa0 100644
> > --- a/Documentation/glossary-content.txt
> > +++ b/Documentation/glossary-content.txt
> > @@ -497,20 +497,28 @@ exclude;;
> ...
> > +
> > + - "`AUTO_MERGE`"
> > +
> > + - "`BISECT_EXPECTED_REV`"
> > +
> > + - "`NOTES_MERGE_PARTIAL`"
> > +
> > + - "`NOTES_MERGE_REF`"
> > +
> > + - "`MERGE_AUTOSTASH`"
> 
> Quoting the names seems overkill here, as they are already formatted with the 
> back-quotes. The rendering as bold or monospace is enough.
> 
> Regards,
> 
> Jean-Noël

These don't exist in later versions of this patch series anymore. But
there are other cases where I did the same in the current version, so
let me drop that.

Thanks!

Patrick
diff mbox series

Patch

diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
index d71b199955..4275918fa0 100644
--- a/Documentation/glossary-content.txt
+++ b/Documentation/glossary-content.txt
@@ -497,20 +497,28 @@  exclude;;
 	unusual refs.
 
 [[def_pseudoref]]pseudoref::
-	Pseudorefs are a class of files under `$GIT_DIR` which behave
-	like refs for the purposes of rev-parse, but which are treated
-	specially by git.  Pseudorefs both have names that are all-caps,
-	and always start with a line consisting of a
-	<<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
-	pseudoref, because it is sometimes a symbolic ref.  They might
-	optionally contain some additional data.  `MERGE_HEAD` and
-	`CHERRY_PICK_HEAD` are examples.  Unlike
-	<<def_per_worktree_ref,per-worktree refs>>, these files cannot
-	be symbolic refs, and never have reflogs.  They also cannot be
-	updated through the normal ref update machinery.  Instead,
-	they are updated by directly writing to the files.  However,
-	they can be read as if they were refs, so `git rev-parse
-	MERGE_HEAD` will work.
+	Pseudorefs are references that live in the root of the reference
+	hierarchy, outside of the usual "refs/" hierarchy. Pseudorefs have an
+	all-uppercase name and must end with a "_HEAD" suffix, for example
+	"`BISECT_HEAD`". Other than that, pseudorefs behave the exact same as
+	any other reference and can be both read and written via regular Git
+	tooling.
++
+<<def_special_ref>,Special refs>> are not pseudorefs.
++
+Due to historic reasons, Git has several irregular pseudo refs that do not
+follow above rules. The following list of irregular pseudo refs is exhaustive
+and shall not be extended in the future:
+
+ - "`AUTO_MERGE`"
+
+ - "`BISECT_EXPECTED_REV`"
+
+ - "`NOTES_MERGE_PARTIAL`"
+
+ - "`NOTES_MERGE_REF`"
+
+ - "`MERGE_AUTOSTASH`"
 
 [[def_pull]]pull::
 	Pulling a <<def_branch,branch>> means to <<def_fetch,fetch>> it and
diff --git a/refs.c b/refs.c
index c64f66bff9..567c6fc6ff 100644
--- a/refs.c
+++ b/refs.c
@@ -905,6 +905,8 @@  int is_pseudoref(struct ref_store *refs, const char *refname)
 
 	if (!is_pseudoref_syntax(refname))
 		return 0;
+	if (is_special_ref(refname))
+		return 0;
 
 	if (ends_with(refname, "_HEAD")) {
 		refs_resolve_ref_unsafe(refs, refname,
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index 948f1bb5f4..8c92fbde79 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -52,6 +52,23 @@  test_expect_success '--include-root-refs pattern prints pseudorefs' '
 	test_cmp expect actual
 '
 
+test_expect_success '--include-root-refs pattern does not print special refs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		git rev-parse HEAD >.git/MERGE_HEAD &&
+		git for-each-ref --format="%(refname)" --include-root-refs >actual &&
+		cat >expect <<-EOF &&
+		HEAD
+		$(git symbolic-ref HEAD)
+		refs/tags/initial
+		EOF
+		test_cmp expect actual
+	)
+'
+
 test_expect_success '--include-root-refs with other patterns' '
 	cat >expect <<-\EOF &&
 	HEAD