
[0/5] make room for "special ref"

Message ID 20231215203245.3622299-1-gitster@pobox.com (mailing list archive)

Message

Junio C Hamano Dec. 15, 2023, 8:32 p.m. UTC
Patrick's reftable work is progressing nicely and wants to establish
"special ref" as a phrase with some defined meaning that is somewhat
different from a mere "pseudo ref".

A pseudo ref is merely a normal ref with a funny naming convention,
i.e., it lives outside the refs/ hierarchy and has a name made up of
all uppercase letters (or underscores).  But there truly are refs that
are more than that.  For example, FETCH_HEAD currently stores not
just a single object name, but can be and is used to store multiple
object names, each with annotations to record where they came from.
There indeed may be a need to introduce a new term to refer to such
"special refs".

Existing documentation, however, uses "special ref" to refer to
pseudo refs that lack any "special" property of the kind FETCH_HEAD has.

This series merely corrects such existing uses of the word, to make
room for Patrick's series to introduce (and formally define in the
glossary) "special refs".
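To make the FETCH_HEAD description above concrete, here is a minimal
Python sketch of reading such a file.  It assumes the files-backend
layout of one record per line, with tab-separated fields: the object
name, an optional "not-for-merge" marker, and a free-form note on where
the tip came from.  The names FetchedTip and parse_fetch_head, and the
sample data, are hypothetical and not part of Git.

```python
from typing import NamedTuple


class FetchedTip(NamedTuple):
    oid: str         # object name (hex)
    for_merge: bool  # False when the record carries "not-for-merge"
    origin: str      # free-form note describing where the tip came from


def parse_fetch_head(text: str) -> list[FetchedTip]:
    """Parse FETCH_HEAD-style text, assuming '<oid>\\t<marker>\\t<origin>' lines."""
    tips = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # A line with fewer than two tabs raises ValueError in this sketch.
        oid, marker, origin = line.split("\t", 2)
        tips.append(FetchedTip(oid, marker != "not-for-merge", origin))
    return tips


# Made-up sample: two fetched tips, only the first one marked for merging.
SAMPLE = (
    "1111111111111111111111111111111111111111\t\tbranch 'main' of example.com:repo\n"
    "2222222222222222222222222222222222222222\tnot-for-merge\tbranch 'topic' of example.com:repo\n"
)
tips = parse_fetch_head(SAMPLE)
```

A plain ref can hold only the first field of the first record, which is
why a file like this does not fit the "single object name" model.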

Junio C Hamano (5):
  git.txt: HEAD is not that special
  git-bisect.txt: BISECT_HEAD is not that special
  refs.h: HEAD is not that special
  docs: AUTO_MERGE is not that special
  docs: MERGE_AUTOSTASH is not that special

 Documentation/git-bisect.txt    | 2 +-
 Documentation/git-diff.txt      | 2 +-
 Documentation/git-merge.txt     | 2 +-
 Documentation/git.txt           | 7 ++++---
 Documentation/merge-options.txt | 2 +-
 Documentation/user-manual.txt   | 2 +-
 refs.h                          | 2 +-
 7 files changed, 10 insertions(+), 9 deletions(-)

Comments

Junio C Hamano Dec. 15, 2023, 9:21 p.m. UTC | #1
Junio C Hamano <gitster@pobox.com> writes:

> ...  For example, FETCH_HEAD currently stores not
> just a single object name, but can be and is used to store multiple
> object names, each with annotations to record where they came from.
> There indeed may be a need to introduce a new term to refer to such
> "special refs".

The "may be" here vaguely hints at another possibility.  If we manage
to get rid of the "special refs", we do not even have to mention
"special refs", and more importantly, we do not need extra code to
deal with them.

For FETCH_HEAD, for example, I wonder if an update along this line
is possible:

 * Teach "git fetch" to store what it writes to FETCH_HEAD to a
   different file, under a distinctly different filename (e.g.,
   $GIT_DIR/fetched-tips).  Demote FETCH_HEAD to a pseudoref, and
   store the first object name in that "fetched-tips" file to it.

 * Teach "git pull" to learn what it used to learn from FETCH_HEAD
   (i.e., list of fetched tips, each annotated with what ref at what
   repository it came from and if it is to be merged) from the new
   "fetched-tips" file.

The "special" ness of FETCH_HEAD is really an implementation detail
of how "git pull" works and how the findings of "git fetch" are
communicated to "git pull".  The general refs API should not have to
worry about it, and the refs backends should not have to worry about
storing more than just an object name (or if it is a symbolic ref,
the target refname).

An end-user command like "git log ORIG_HEAD..FETCH_HEAD" would not
be affected by changes along the above line, because the current
FETCH_HEAD, when used as a revision, will work as if it stores the
single object name that is listed first in the file.
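The split suggested in the two bullet points above could look roughly
like the following Python sketch.  It only computes what would go
where; the function name split_fetch_record is hypothetical, the
"fetched-tips" name is the example from the proposal, and the input is
assumed to be tab-separated records with the object name first.  A real
implementation would presumably write the pseudoref through the refs
API rather than hand out a string.

```python
def split_fetch_record(fetch_head_text: str) -> tuple[str, str]:
    """Return (fetched_tips_text, pseudoref_text) for FETCH_HEAD-style input.

    fetched_tips_text keeps every annotated record for "git pull" to read;
    pseudoref_text holds only the first object name, as a plain ref would.
    """
    records = [line for line in fetch_head_text.splitlines() if line.strip()]
    if not records:
        return "", ""  # nothing was fetched, so there is nothing to record
    first_oid = records[0].split("\t", 1)[0]
    return "\n".join(records) + "\n", first_oid + "\n"


# Made-up sample with two fetched tips.
RECORD = (
    "1111111111111111111111111111111111111111\t\tbranch 'main' of example.com:repo\n"
    "2222222222222222222222222222222222222222\tnot-for-merge\tbranch 'topic' of example.com:repo\n"
)
fetched_tips_text, pseudoref_text = split_fetch_record(RECORD)
```

With this layout, anything that resolves FETCH_HEAD as a revision sees
the same object as before, while the annotations live only in the
separate file.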

If somebody is reading FETCH_HEAD and acting on its contents (rather
than merely consuming it as a ref of the first object), perhaps
feeding it to "git fmt-merge-msg", they will be broken by such a
change (indeed, our own "git pull" will be broken by the change to
"git fetch", and the second bullet point above is about fixing the
exact fallout from it), but I am not sure if that is a use case worth
worrying about.

Hmm?
Ramsay Jones Dec. 15, 2023, 10:44 p.m. UTC | #2
On 15/12/2023 21:21, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
>> ...  For example, FETCH_HEAD currently stores not
>> just a single object name, but can be and is used to store multiple
>> object names, each with annotations to record where they came from.
>> There indeed may be a need to introduce a new term to refer to such
>> "special refs".
> 
> The "may be" here vaguely hints at another possibility.  If we manage
> to get rid of the "special refs", we do not even have to mention
> "special refs", and more importantly, we do not need extra code to
> deal with them.
> 
> For FETCH_HEAD, for example, I wonder if an update along this line
> is possible:
> 
>  * Teach "git fetch" to store what it writes to FETCH_HEAD to a
>    different file, under a distinctly different filename (e.g.,
>    $GIT_DIR/fetched-tips).  Demote FETCH_HEAD to a pseudoref, and
>    store the first object name in that "fetched-tips" file to it.
> 
>  * Teach "git pull" to learn what it used to learn from FETCH_HEAD
>    (i.e., list of fetched tips, each annotated with what ref at what
>    repository it came from and if it is to be merged) from the new
>    "fetched-tips" file.
> 
> The "special" ness of FETCH_HEAD is really an implementation detail
> of how "git pull" works and how the findings of "git fetch" are
> communicated to "git pull".  The general refs API should not have to
> worry about it, and the refs backends should not have to worry about
> storing more than just an object name (or if it is a symbolic ref,
> the target refname).
> 
> An end-user command like "git log ORIG_HEAD..FETCH_HEAD" would not
> be affected by changes along the above line, because the current
> FETCH_HEAD, when used as a revision, will work as if it stores the
> single object name that is listed first in the file.
> 
> If somebody is reading FETCH_HEAD and acting on its contents (rather
> than merely consuming it as a ref of the first object), perhaps
> feeding it to "git fmt-merge-msg", they will be broken by such a
> change (indeed, our own "git pull" will be broken by the change to
> "git fetch", and the second bullet point above is about fixing the
> exact fallout from it), but I am not sure if that is a use case worth
> worrying about.
> 
> Hmm?
> 

Yes, I was going to suggest exactly this, after Patrick pointed out
that there were only two 'special pseudo-refs' (I had a vague feeling
there were some more than that) FETCH_HEAD and MERGE_HEAD.

ATB,
Ramsay Jones
Junio C Hamano Dec. 16, 2023, 12:44 a.m. UTC | #3
Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

> Yes, I was going to suggest exactly this, after Patrick pointed out
> that there were only two 'special pseudo-refs' (I had a vague feeling
> there were some more than that) FETCH_HEAD and MERGE_HEAD.

Glad to see that I am not alone.  We should be able to treat
MERGE_HEAD similarly.  It is used to communicate the list of "other
parents" from "git merge" that stops in the middle (either for merge
conflict, or in response to the "--no-commit" command line option)
to "git commit" that concludes such an unfinished merge.  Many
commands merely use the presence of MERGE_HEAD as a sign that a
merge is in progress (e.g. "git status"), which would not break if
we just started to record the first parent in a pseudoref MERGE_HEAD
and wrote the other octopus parents elsewhere, but some commands do
need all these parents from MERGE_HEAD (e.g. "git blame" that
synthesizes a fake starting commit out of the working tree state).
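The two kinds of MERGE_HEAD consumers described above can be sketched
in Python as follows.  This assumes the files-backend layout, where
MERGE_HEAD is a file directly under $GIT_DIR with one object name per
line; the helper names merge_in_progress and merge_parents are
hypothetical, and the demo uses a throwaway directory rather than a
real repository.

```python
import os
import tempfile


def merge_in_progress(git_dir: str) -> bool:
    """Presence check only, in the spirit of what "git status" needs."""
    return os.path.exists(os.path.join(git_dir, "MERGE_HEAD"))


def merge_parents(git_dir: str) -> list[str]:
    """Read every recorded parent, as a command needing all of them would."""
    with open(os.path.join(git_dir, "MERGE_HEAD"), encoding="ascii") as f:
        return [line.strip() for line in f if line.strip()]


# Demo: a temporary directory stands in for $GIT_DIR during an octopus merge.
with tempfile.TemporaryDirectory() as demo_dir:
    before = merge_in_progress(demo_dir)
    with open(os.path.join(demo_dir, "MERGE_HEAD"), "w", encoding="ascii") as f:
        f.write("a" * 40 + "\n" + "b" * 40 + "\n")
    during = merge_in_progress(demo_dir)
    parents = merge_parents(demo_dir)
```

Only the second helper depends on the file holding more than one
object name; the first would keep working if MERGE_HEAD became a
single-object pseudoref.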

If we cannot get rid of all "special refs" anyway, however, I think
there is little that we can gain from doing such "make FETCH_HEAD
and MERGE_HEAD into a single-object pseudoref, and write other info
in separate files" exercise.  We can treat the current FETCH_HEAD
and MERGE_HEAD as "file that is not and is more than a ref", which
is what the current code is doing anyway, which means we would
declare that they have to remain files under $GIT_DIR/ and will
be accessed via the filesystem.  At that point, calling them
"special ref" might even be more misleading than it's worth and we
may be better off to admit that they are not even refs but a
datafile some commands can use to obtain input from, but the phrase
we use to refer to them, be it "special ref" or some random
datafile, does not make a fundamental change on anything.
Andy Koppe Dec. 16, 2023, 10:20 a.m. UTC | #4
On 15/12/2023 22:44, Ramsay Jones wrote:
> On 15/12/2023 21:21, Junio C Hamano wrote:

>> If somebody is reading FETCH_HEAD and acting on its contents (rather
>> than merely consuming it as a ref of the first object), perhaps
>> feeding it to "git fmt-merge-msg", they will be broken by such a
>> change (indeed, our own "git pull" will be broken by the change to
>> "git fetch", and the second bullet point above is about fixing the
>> exact fallout from it), but I am not sure if that is a use case worth
>> worrying about.
> 
> Yes, I was going to suggest exactly this, after Patrick pointed out
> that there were only two 'special pseudo-refs' (I had a vague feeling
> there were some more than that) FETCH_HEAD and MERGE_HEAD.

According to the pseudoref entry of gitglossary, CHERRY_PICK_HEAD also 
stores additional data (which would imply that REVERT_HEAD does too).
Looking at CHERRY_PICK_HEAD during a pick though, I only see a single 
hash, even when picking multiple commits.

Regards,
Andy
Andy Koppe Dec. 16, 2023, 10:56 a.m. UTC | #5
On 15/12/2023 20:32, Junio C Hamano wrote:
> A pseudo ref is merely a normal ref with a funny naming convention,
> i.e., it lives outside the refs/ hierarchy and has a name made up of
> all uppercase letters (or underscores).

I know what you mean, but gitglossary defines pseudorefs as separate 
from refs, albeit behaving like refs. Their name itself implies the same.

Although the 'ref' entry then goes on to say that "there are a few 
special-purpose refs that do not begin with 'refs/', the most notable 
example being HEAD."

That implies that at least some of the pseudorefs are refs after all, 
while keeping in mind that "HEAD is not a pseudoref,  because it is 
sometimes a symbolic ref" according to the 'pseudoref' entry.

I think a clearer answer on whether pseudorefs are refs is needed, or at 
least a better-defined fudge, such as "pseudorefs are refs except when ...".

Defining everything under "refs/" as refs, and the stuff outside it 
including HEAD itself as pseudorefs, would draw clearer lines. The fact 
HEAD is usually symbolic doesn't seem all that relevant from the 
perspective of a user trying to get a grasp of refs and pseudorefs.

Regards,
Andy
Patrick Steinhardt Dec. 18, 2023, 8:24 a.m. UTC | #6
On Sat, Dec 16, 2023 at 10:20:09AM +0000, Andy Koppe wrote:
> On 15/12/2023 22:44, Ramsay Jones wrote:
> > On 15/12/2023 21:21, Junio C Hamano wrote:
> 
> > > If somebody is reading FETCH_HEAD and acting on its contents (rather
> > > than merely consuming it as a ref of the first object), perhaps
> > > feeding it to "git fmt-merge-msg", they will be broken by such a
> > > change (indeed, our own "git pull" will be broken by the change to
> > > "git fetch", and the second bullet point above is about fixing the
> > > exact fallout from it), but I am not sure if that is a use case worth
> > > worrying about.
> > 
> > Yes, I was going to suggest exactly this, after Patrick pointed out
> > that there were only two 'special pseudo-refs' (I had a vague feeling
> > there were some more than that) FETCH_HEAD and MERGE_HEAD.
> 
> According to the pseudoref entry of gitglossary, CHERRY_PICK_HEAD also
> stores additional data (which would imply that REVERT_HEAD does too).
> Looking at CHERRY_PICK_HEAD during a pick though, I only see a single hash,
> even when picking multiple commits.

Both CHERRY_PICK_HEAD and REVERT_HEAD are only ever updated via the refs
API, so neither of them ever contains anything other than a normal ref.
I guess we should update the glossary accordingly.

Patrick
Patrick Steinhardt Dec. 18, 2023, 8:41 a.m. UTC | #7
On Fri, Dec 15, 2023 at 04:44:47PM -0800, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
> > Yes, I was going to suggest exactly this, after Patrick pointed out
> > that there were only two 'special pseudo-refs' (I had a vague feeling
> > there were some more than that) FETCH_HEAD and MERGE_HEAD.

I don't think there are more special refs than those two. Andy pointed
out CHERRY_PICK_HEAD and REVERT_HEAD, but both of them actually get
accessed via the ref backend exclusively and thus cannot be special in
any way. Also, the test suite of Git passes with only those two refs
marked as special refs with the reftable backend, which is another good
indicator that I didn't miss anything here because we definitely can't
store special information in the reftable backend.

It's of course still possible that our test suite has a blind spot and
that I missed some special refs. If so, I would love to hear about them.

> Glad to see that I am not alone.  We should be able to treat
> MERGE_HEAD similarly.  It is used to communicate the list of "other
> parents" from "git merge" that stops in the middle (either for merge
> conflict, or in response to the "--no-commit" command line option)
> to "git commit" that concludes such an unfinished merge.  Many
> commands merely use the presence of MERGE_HEAD as a sign that a
> merge is in progress (e.g. "git status"), which would not break if
> we just started to record the first parent in a pseudoref MERGE_HEAD
> and wrote the other octopus parents elsewhere, but some commands do
> need all these parents from MERGE_HEAD (e.g. "git blame" that
> synthesizes a fake starting commit out of the working tree state).

I would certainly love to drop the "specialness" of both FETCH_HEAD and
MERGE_HEAD, but I am a bit pessimistic about whether we really can. The
format of those refs has been around for quite a long time already, and
I do expect that there is tooling out there that parses those files.

I would claim that it's especially likely that FETCH_HEAD is getting
parsed by external tools. Historically, there has not been a way to
really figure out which refs have been updated in git-fetch(1). So any
scripts that perform a fetch and want to learn about what was updated
would very likely resort to parsing FETCH_HEAD. This has changed a bit
with the introduction of the machine-parsable interface of git-fetch(1),
but it has only been introduced rather recently with Git v2.42.
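For scripts that would otherwise parse FETCH_HEAD, the machine-parsable
output could be consumed along the lines of the Python sketch below.
It assumes the interface meant here is "git fetch --porcelain" and that
each output line has the form
"<flag> <old-object-id> <new-object-id> <local-reference>", with the
flag being a single character that may be a space; please check
git-fetch(1) for the authoritative format.  The names RefUpdate and
parse_porcelain_line and the sample values are hypothetical.

```python
from typing import NamedTuple


class RefUpdate(NamedTuple):
    flag: str     # single status character; may be a space
    old_oid: str
    new_oid: str
    ref: str      # local reference that was updated


def parse_porcelain_line(line: str) -> RefUpdate:
    """Parse one line, assuming '<flag> <old-oid> <new-oid> <local-ref>'."""
    flag = line[0]  # taken by position, because the flag itself can be a space
    old_oid, new_oid, ref = line[2:].split(" ", 2)
    return RefUpdate(flag, old_oid, new_oid, ref)


# Made-up sample output; a real script would read the command's stdout.
SAMPLE_OUTPUT = "\n".join([
    "  " + "1" * 40 + " " + "2" * 40 + " refs/remotes/origin/main",
    "* " + "0" * 40 + " " + "3" * 40 + " refs/remotes/origin/topic",
])
updates = [parse_porcelain_line(l) for l in SAMPLE_OUTPUT.splitlines() if l]
```

Tooling written this way would not care whether FETCH_HEAD stays a
multi-record file or becomes a plain pseudoref.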

> If we cannot get rid of all "special refs" anyway, however, I think
> there is little that we can gain from doing such "make FETCH_HEAD
> and MERGE_HEAD into a single-object pseudoref, and write other info
> in separate files" exercise.  We can treat the current FETCH_HEAD
> and MERGE_HEAD as "file that is not and is more than a ref", which
> is what the current code is doing anyway, which means we would
> declare that they have to remain files under $GIT_DIR/ and will
> be accessed via the filesystem.

I'd like for it to be otherwise, but I think this is the only sensible
thing to do. I think it was a mistake to introduce those special refs
like this and treat them almost like a real ref, but that's always easy
to say in hindsight.

> At that point, calling them "special ref" might even be more
> misleading than it's worth and we may be better off to admit that they
> are not even refs but a datafile some commands can use to obtain input
> from, but the phrase we use to refer to them, be it "special ref" or
> some random datafile, does not make a fundamental change on anything.

Well, the problem is that these do indeed behave like a ref for the
most part: you can ask for them via git-rev-parse(1) and we'll resolve
them just fine, even though we only ever return the first object ID. So
even though I'm not a huge fan of calling them "special ref", I think we
should at least highlight the ref-like nature in whatever we want to call
them.

Patrick
Patrick Steinhardt Dec. 18, 2023, 8:56 a.m. UTC | #8
On Fri, Dec 15, 2023 at 12:32:40PM -0800, Junio C Hamano wrote:
> Patrick's reftable work is progressing nicely and wants to establish
> "special ref" as a phrase with some defined meaning that is somewhat
> different from a mere "pseudo ref".
> 
> A pseudo ref is merely a normal ref with a funny naming convention,
> i.e., it lives outside the refs/ hierarchy and has a name made up of
> all uppercase letters (or underscores).  But there truly are refs that
> are more than that.  For example, FETCH_HEAD currently stores not
> just a single object name, but can be and is used to store multiple
> object names, each with annotations to record where they came from.
> There indeed may be a need to introduce a new term to refer to such
> "special refs".
> 
> Existing documentation, however, uses "special ref" to refer to
> pseudo refs that lack any "special" property of the kind FETCH_HEAD has.
> 
> This series merely corrects such existing uses of the word, to make
> room for Patrick's series to introduce (and formally define in the
> glossary) "special refs".

Thanks for helping out with this effort and kicking off the discussion,
I highly appreciate it!

Patrick

> Junio C Hamano (5):
>   git.txt: HEAD is not that special
>   git-bisect.txt: BISECT_HEAD is not that special
>   refs.h: HEAD is not that special
>   docs: AUTO_MERGE is not that special
>   docs: MERGE_AUTOSTASH is not that special
> 
>  Documentation/git-bisect.txt    | 2 +-
>  Documentation/git-diff.txt      | 2 +-
>  Documentation/git-merge.txt     | 2 +-
>  Documentation/git.txt           | 7 ++++---
>  Documentation/merge-options.txt | 2 +-
>  Documentation/user-manual.txt   | 2 +-
>  refs.h                          | 2 +-
>  7 files changed, 10 insertions(+), 9 deletions(-)
> 
> -- 
> 2.43.0-76-g1a87c842ec
>