glossary: describe "worktree"

Message-ID: <xmqqczjvxy3o.fsf@gitster.g>
From: Junio C Hamano <gitster@pobox.com>
To: git@vger.kernel.org
Cc: Derrick Stolee <stolee@gmail.com>, Elijah Newren <newren@gmail.com>
Subject: [PATCH] glossary: describe "worktree"
Date: Wed, 09 Feb 2022 18:19:07 -0800

Junio C Hamano Feb. 10, 2022, 2:19 a.m. UTC

We have a description of "per worktree ref", but "worktree" is not
described in the glossary.  We do have "working tree", though.

Casually put, a "working tree" is what your editor and compiler
interact with.  A "worktree" is a mechanism that allows one or more
"working tree"s to be attached to a repository and used to check out
different commits and branches independently; it includes not just
a "working tree" but also the repository metadata, like HEAD and the
index, needed to support simultaneous use of them.  Historically, we used
these terms interchangeably but we have been trying to use "working
tree" when we mean it, instead of "worktree".

Most of the existing references to "working tree" in the glossary do
refer primarily to the working tree portion, except for one that
said refs like HEAD and refs/bisect/* are per "working tree", but it
is more precise to say they are per "worktree".

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---

 * Mostly unchanged from the version in the original discussion
   https://lore.kernel.org/git/xmqqo83hatm1.fsf@gitster.g/ except
   that we now mention that pseudorefs are also per worktree.

   One thing that makes me worried somewhat is what I did not touch,
   namely, how pseudo refs are defined.  I know MERGE_HEAD is very
   special and it may be impossible to coax it into refs API for
   writing, so the text there makes sense for it, but there are
   other all-caps-and-directly-under-dot-git-directory files like
   ORIG_HEAD and CHERRY_PICK_HEAD that are written using the refs
   API, so the description would have to be updated there.

 Documentation/glossary-content.txt | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

Derrick Stolee Feb. 10, 2022, 2:40 p.m. UTC | #1

On 2/9/2022 9:19 PM, Junio C Hamano wrote:
> We have a description of "per worktree ref", but "worktree" is not
> described in the glossary.  We do have "working tree", though.

>  * Mostly unchanged from the version in the original discussion
>    https://lore.kernel.org/git/xmqqo83hatm1.fsf@gitster.g/ except
>    that we now mention that pseudorefs are also per worktree.

This version looks good to me! Thanks.

>    One thing that makes me worried somewhat is what I did not touch,
>    namely, how pseudo refs are defined.  I know MERGE_HEAD is very
>    special and it may be impossible to coax it into refs API for
>    writing, so the text there makes sense for it, but there are
>    other all-caps-and-directly-under-dot-git-directory files like
>    ORIG_HEAD and CHERRY_PICK_HEAD that are written using the refs
>    API, so the description would have to be updated there.

I agree that such changes would be nice. Shouldn't hold up
this change, though.

Thanks,
-Stolee

Elijah Newren Feb. 10, 2022, 3:50 p.m. UTC | #2

On Wed, Feb 9, 2022 at 6:19 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> We have a description of "per worktree ref", but "worktree" is not
> described in the glossary.  We do have "working tree", though.
>
> Casually put, a "working tree" is what your editor and compiler
> interact with.  A "worktree" is a mechanism that allows one or more
> "working tree"s to be attached to a repository and used to check out
> different commits and branches independently; it includes not just
> a "working tree" but also the repository metadata, like HEAD and the
> index, needed to support simultaneous use of them.  Historically, we used
> these terms interchangeably but we have been trying to use "working
> tree" when we mean it, instead of "worktree".
>
> Most of the existing references to "working tree" in the glossary do
> refer primarily to the working tree portion, except for one that
> said refs like HEAD and refs/bisect/* are per "working tree", but it
> is more precise to say they are per "worktree".
>
> Signed-off-by: Junio C Hamano <gitster@pobox.com>
> ---
>
>  * Mostly unchanged from the version in the original discussion
>    https://lore.kernel.org/git/xmqqo83hatm1.fsf@gitster.g/ except
>    that we now mention that pseudorefs are also per worktree.
>
>    One thing that makes me worried somewhat is what I did not touch,
>    namely, how pseudo refs are defined.  I know MERGE_HEAD is very
>    special and it may be impossible to coax it into refs API for
>    writing, so the text there makes sense for it, but there are
>    other all-caps-and-directly-under-dot-git-directory files like
>    ORIG_HEAD and CHERRY_PICK_HEAD that are written using the refs
>    API, so the description would have to be updated there.

I'm not quite following; why would the description need to be updated?
Sure, MERGE_HEAD is written without using the refs API, but we didn't
mention how the pseudorefs were written in the description, and all of
MERGE_HEAD, CHERRY_PICK_HEAD, ORIG_HEAD, REVERT_HEAD get written
per-worktree, so doesn't "pseudorefs like MERGE_HEAD" cover it as far
as the reader is concerned?

>  Documentation/glossary-content.txt | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/Documentation/glossary-content.txt b/Documentation/glossary-content.txt
> index c077971335..9eb8920552 100644
> --- a/Documentation/glossary-content.txt
> +++ b/Documentation/glossary-content.txt
> @@ -312,7 +312,7 @@ Pathspecs are used on the command line of "git ls-files", "git
>  ls-tree", "git add", "git grep", "git diff", "git checkout",
>  and many other commands to
>  limit the scope of operations to some subset of the tree or
> -worktree.  See the documentation of each command for whether
> +working tree.  See the documentation of each command for whether
>  paths are relative to the current directory or toplevel.  The
>  pathspec syntax is as follows:
>  +
> @@ -446,7 +446,7 @@ exclude;;
>         interface than the <<def_plumbing,plumbing>>.
>
>  [[def_per_worktree_ref]]per-worktree ref::
> -       Refs that are per-<<def_working_tree,worktree>>, rather than
> +       Refs that are per-<<def_worktree,worktree>>, rather than
>         global.  This is presently only <<def_HEAD,HEAD>> and any refs
>         that start with `refs/bisect/`, but might later include other
>         unusual refs.
> @@ -669,3 +669,12 @@ The most notable example is `HEAD`.
>         The tree of actual checked out files.  The working tree normally
>         contains the contents of the <<def_HEAD,HEAD>> commit's tree,
>         plus any local changes that you have made but not yet committed.
> +
> +[[def_worktree]]worktree::
> +       A repository can have zero (i.e. bare repository) or one or
> +       more worktrees attached to it. One "worktree" consists of a
> +       "working tree" and repository metadata, most of which are
> +       shared among other worktrees of a single repository, and
> +       some of which are maintained separately per worktree
> +       (e.g. the index, HEAD and pseudorefs like MERGE_HEAD,
> +       per-worktree refs and per-worktree configuration file).
> --
> 2.35.1-102-g2b9c120970

The text looks good to me.
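The per-worktree/shared split that the new glossary entry describes can be illustrated with a minimal C sketch. It assumes the on-disk layout of a linked worktree created by `git worktree add`, whose private metadata lives in `$GIT_COMMON_DIR/worktrees/<id>/` while objects and most refs stay in the common directory. The helpers `is_per_worktree_item()` and `metadata_path()` are hypothetical (they are not Git's API) and only cover the examples the entry names.

```c
#include <stdio.h>
#include <string.h>

/* Non-empty top-level name made only of A-Z and '_' (HEAD, MERGE_HEAD, ...). */
static int is_allcaps_toplevel(const char *name)
{
	if (!*name)
		return 0;
	for (const char *c = name; *c; c++)
		if (!((*c >= 'A' && *c <= 'Z') || *c == '_'))
			return 0;
	return 1;
}

/*
 * Hypothetical classifier limited to the examples in the glossary entry:
 * the index, HEAD and pseudorefs, the refs/bisect/ hierarchy and the
 * per-worktree configuration file are private to a worktree; everything
 * else (objects, refs/heads/, the shared config, ...) is shared.
 */
int is_per_worktree_item(const char *name)
{
	return !strcmp(name, "index") ||
	       !strcmp(name, "config.worktree") ||
	       !strncmp(name, "refs/bisect/", strlen("refs/bisect/")) ||
	       is_allcaps_toplevel(name);
}

/*
 * Where NAME lives for a linked worktree whose private directory is
 * COMMON_DIR/worktrees/ID. Returns what snprintf returns.
 */
int metadata_path(char *buf, size_t len, const char *common_dir,
		  const char *id, const char *name)
{
	if (is_per_worktree_item(name))
		return snprintf(buf, len, "%s/worktrees/%s/%s",
				common_dir, id, name);
	return snprintf(buf, len, "%s/%s", common_dir, name);
}
```

For example, `metadata_path(buf, sizeof(buf), "/repo/.git", "wt1", "HEAD")` yields `/repo/.git/worktrees/wt1/HEAD`, while `refs/heads/main` resolves to `/repo/.git/refs/heads/main`. The main worktree keeps its own HEAD and index directly in the common directory, which this sketch does not model.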

Junio C Hamano Feb. 10, 2022, 4:35 p.m. UTC | #3

Elijah Newren <newren@gmail.com> writes:

>>    One thing that makes me worried somewhat is what I did not touch,
>>    namely, how pseudo refs are defined.  I know MERGE_HEAD is very
>>    special and it may be impossible to coax it into refs API for
>>    writing, so the text there makes sense for it, but there are
>>    other all-caps-and-directly-under-dot-git-directory files like
>>    ORIG_HEAD and CHERRY_PICK_HEAD that are written using the refs
>>    API, so the description would have to be updated there.
>
> I'm not quite following; why would the description need to be updated?
>  Sure MERGE_HEAD is written without using the refs API, but we didn't
> mention how the pseudorefs were written in the description, and all of
> MERGE_HEAD, CHERRY_PICK_HEAD, ORIG_HEAD, REVERT_HEAD get written
> per-worktree so doesn't "pseudorefs like MERGE_HEAD" cover it as far
> as the reader is concerned?

Here is how pseudo refs are defined.

[[def_pseudoref]]pseudoref::
	Pseudorefs are a class of files under `$GIT_DIR` which behave
	like refs for the purposes of rev-parse, but which are treated
	specially by git.  Pseudorefs both have names that are all-caps,
	and always start with a line consisting of a
	<<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
	pseudoref, because it is sometimes a symbolic ref.  They might
	optionally contain some additional data.  `MERGE_HEAD` and
	`CHERRY_PICK_HEAD` are examples.  Unlike
	<<def_per_worktree_ref,per-worktree refs>>, these files cannot
	be symbolic refs, and never have reflogs.  They also cannot be
	updated through the normal ref update machinery.  Instead,
	they are updated by directly writing to the files.  However,
	they can be read as if they were refs, so `git rev-parse
	MERGE_HEAD` will work.
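To make the content rule in the quoted definition concrete, here is a minimal C sketch that checks only what the text states: the content starts with a SHA-1 object name (40 hex digits) followed by whitespace, and whatever comes after that is ignored. `looks_like_pseudoref_content()` is a hypothetical helper, not how Git actually validates these files.

```c
#include <ctype.h>

#define HEXSZ 40 /* SHA-1 object names; SHA-256 repositories use 64 */

/*
 * Hypothetical check mirroring the quoted glossary text: the buffer must
 * start with HEXSZ hex digits followed by whitespace. Any additional data
 * after that (which MERGE_HEAD and friends may carry) is ignored.
 * A shorter buffer fails on its terminating NUL, so we never read past it.
 */
int looks_like_pseudoref_content(const char *buf)
{
	for (int i = 0; i < HEXSZ; i++)
		if (!isxdigit((unsigned char)buf[i]))
			return 0;
	return isspace((unsigned char)buf[HEXSZ]) != 0;
}
```

A symbolic ref such as `ref: refs/heads/main` fails this check, which is the reason the definition gives for HEAD not being a pseudoref.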

Points that may need to be looked at in the world where files
backend is not the only ref backend are:

 - "are ... files under `$GIT_DIR`" may no longer be true, once some
   of them are stored in reftable, for example.

 - "followed by whitespace" may be an irrelevant detail for the
   purpose of this paragraph.

 - CHERRY_PICK_HEAD, as written in sequencer.c::do_pick_commit(),
   uses update_ref() to write a named file out, so "followed by
   whitespace" (and other cruft, like MERGE_HEAD does) certainly
   does not apply.

 - Also "cannot be updated through the normal ref update machinery"
   is no longer true.  sequencer.c::do_pick_commit() even calls
   update_ref() with REF_NO_DEREF to ensure "cannot be symbolic
   refs".

 - "never have reflogs" would make sense for the current set of
   pseudorefs (does reflog on CHERRY_PICK_HEAD, for example, have
   real use case?), but I do not know if it stays that way.  I do
   not care too deeply either way, but I want to avoid over
   specifying things.

What worries me the most is that we cannot simply say "all-caps
names that end with '_HEAD' all behave like refs except that they
will not be symrefs without reflog." MERGE_HEAD is the only known
exception if I am not mistaken, and I am OK to single it out as an
oddball.  The current description, however, gives the impression that
there are a lot more differences _among_ pseudorefs.

Elijah Newren Feb. 10, 2022, 5:03 p.m. UTC | #4

On Thu, Feb 10, 2022 at 8:35 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Elijah Newren <newren@gmail.com> writes:
>
> >>    One thing that makes me worried somewhat is what I did not touch,
> >>    namely, how pseudo refs are defined.  I know MERGE_HEAD is very
> >>    special and it may be impossible to coax it into refs API for
> >>    writing, so the text there makes sense for it, but there are
> >>    other all-caps-and-directly-under-dot-git-directory files like
> >>    ORIG_HEAD and CHERRY_PICK_HEAD that are written using the refs
> >>    API, so the description would have to be updated there.
> >
> > I'm not quite following; why would the description need to be updated?
> >  Sure MERGE_HEAD is written without using the refs API, but we didn't
> > mention how the pseudorefs were written in the description, and all of
> > MERGE_HEAD, CHERRY_PICK_HEAD, ORIG_HEAD, REVERT_HEAD get written
> > per-worktree so doesn't "pseudorefs like MERGE_HEAD" cover it as far
> > as the reader is concerned?
>
> Here is how pseudo refs are defined.
>
> [[def_pseudoref]]pseudoref::
>         Pseudorefs are a class of files under `$GIT_DIR` which behave
>         like refs for the purposes of rev-parse, but which are treated
>         specially by git.  Pseudorefs both have names that are all-caps,
>         and always start with a line consisting of a
>         <<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
>         pseudoref, because it is sometimes a symbolic ref.  They might
>         optionally contain some additional data.  `MERGE_HEAD` and
>         `CHERRY_PICK_HEAD` are examples.  Unlike
>         <<def_per_worktree_ref,per-worktree refs>>, these files cannot
>         be symbolic refs, and never have reflogs.  They also cannot be
>         updated through the normal ref update machinery.  Instead,
>         they are updated by directly writing to the files.  However,
>         they can be read as if they were refs, so `git rev-parse
>         MERGE_HEAD` will work.
>
> Points that may need to be looked at in the world where files
> backend is not the only ref backend are:

Ah, sorry, I assumed in "the description would have to be updated
there" you used "there" to refer to some part of your new patch text.
Re-reading, I can see you did specify the pseudoref section, but I
just somehow missed it.  Sorry about that.

>  - "are ... files under `$GIT_DIR`" may no longer be true, once some
>    of them are stored in reftable, for example.
>
>  - "followed by whitespace" may be an irrelevant detail for the
>    purpose of this paragraph.
>
>  - CHERRY_PICK_HEAD, as written in sequencer.c::do_pick_commit(),
>    uses update_ref() to write a named file out, so "followed by
>    whitespace" (and other cruft, like MERGE_HEAD does) certainly
>    does not apply.
>
>  - Also "cannot be updated through the normal ref update machinery"
>    is no longer true.  sequencer.c::do_pick_commit() even calls
>    update_ref() with REF_NO_DEREF to ensure "cannot be symbolic
>    refs".
>
>  - "never have reflogs" would make sense for the current set of
>    pseudorefs (does reflog on CHERRY_PICK_HEAD, for example, have
>    real use case?), but I do not know if it stays that way.  I do
>    not care too deeply either way, but I want to avoid over
>    specifying things.
>
> What worries me the most is that we cannot simply say "all-caps
> names that end with '_HEAD' all behave like refs except that they
> will not be symrefs without reflog." MERGE_HEAD is the only known
> exception if I am not mistaken, and I am OK to single it out as an
> oddball.  The current description, however, gives the impression that
> there are a lot more differences _among_ pseudorefs.

Makes sense; thanks for clarifying for me.

Han-Wen Nienhuys Feb. 10, 2022, 6:07 p.m. UTC | #5

On Thu, Feb 10, 2022 at 5:36 PM Junio C Hamano <gitster@pobox.com> wrote:
> >>    One thing that makes me worried somewhat is what I did not touch,
> >>    namely, how pseudo refs are defined.  I know MERGE_HEAD is very
> >>    special and it may be impossible to coax it into refs API for
> >>    writing, so the text there makes sense for it, but there are
> >>    other all-caps-and-directly-under-dot-git-directory files like
> >>    ORIG_HEAD and CHERRY_PICK_HEAD that are written using the refs
> >>    API, so the description would have to be updated there.
> >
> > I'm not quite following; why would the description need to be updated?
> >  Sure MERGE_HEAD is written without using the refs API, but we didn't
> > mention how the pseudorefs were written in the description, and all of
> > MERGE_HEAD, CHERRY_PICK_HEAD, ORIG_HEAD, REVERT_HEAD get written
> > per-worktree so doesn't "pseudorefs like MERGE_HEAD" cover it as far
> > as the reader is concerned?
>
> Here is how pseudo refs are defined.
>
> [[def_pseudoref]]pseudoref::
>         Pseudorefs are a class of files under `$GIT_DIR` which behave
>         like refs for the purposes of rev-parse, but which are treated
>         specially by git.  Pseudorefs both have names that are all-caps,
>         and always start with a line consisting of a
>         <<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
>         pseudoref, because it is sometimes a symbolic ref.  They might

refs.c says

        if (is_pseudoref_syntax(refname))
                return REF_TYPE_PSEUDOREF;

I.e., ref_type("HEAD") == REF_TYPE_PSEUDOREF

This may be partly my fault (commit 55dd8b910 "Make HEAD a PSEUDOREF
rather than PER_WORKTREE.").

From the source code I had only understood that pseudorefs are ALLCAPS
names and are in the toplevel namespace.
(HEAD, FETCH_HEAD and MERGE_HEAD have special-cased support in various places).
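A purely syntactic rule like the one described here (all-caps names in the top-level namespace) necessarily puts HEAD in the same bucket as FETCH_HEAD and MERGE_HEAD, as the minimal C sketch below shows. `classify_ref()` and its enum are hypothetical; the real `ref_type()` in refs.c handles more cases (per-worktree refs, for instance), and the exact characters `is_pseudoref_syntax()` accepts may differ from the A-Z and `_` assumed here.

```c
/* Hypothetical categories for this sketch only. */
enum sketch_ref_type { SKETCH_REF_NORMAL, SKETCH_REF_PSEUDOREF };

/*
 * Purely syntactic test: a non-empty, top-level name made of A-Z and '_'.
 * A '/' fails the test, so nothing under refs/ ever matches.
 */
static int is_allcaps_syntax(const char *refname)
{
	if (!*refname)
		return 0;
	for (const char *c = refname; *c; c++)
		if (!((*c >= 'A' && *c <= 'Z') || *c == '_'))
			return 0;
	return 1;
}

/* Classify by syntax alone; HEAD is not special-cased. */
enum sketch_ref_type classify_ref(const char *refname)
{
	if (is_allcaps_syntax(refname))
		return SKETCH_REF_PSEUDOREF;
	return SKETCH_REF_NORMAL;
}
```

Under this rule `classify_ref("HEAD")` and `classify_ref("FETCH_HEAD")` both return `SKETCH_REF_PSEUDOREF`, which contradicts the glossary's statement that HEAD is not a pseudoref.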

Is this glossary the official definition of what things are? If so,
the source code should refer to there. If not -except for confusion-
how bad is it if the info in the glossary is inaccurate?

> What worries me the most is that we cannot simply say "all-caps
> names that end with '_HEAD' all behave like refs except that they
> will not be symrefs without reflog." MERGE_HEAD is the only known
> exception if I am not mistaken, and I am OK to single it out as an
> oddball.  The current description, however, gives the impression that
> there are a lot more differences _among_ pseudorefs.

It might be possible to add this extra info to the reftable format as a
further subtype of the ref record.  We'd have to update the JGit
implementation, though.

Junio C Hamano Feb. 10, 2022, 6:28 p.m. UTC | #6

Han-Wen Nienhuys <hanwen@google.com> writes:

> On Thu, Feb 10, 2022 at 5:36 PM Junio C Hamano <gitster@pobox.com> wrote:
>> >>    One thing that makes me worried somewhat is what I did not touch,
>> >>    namely, how pseudo refs are defined.  I know MERGE_HEAD is very
>> >>    special and it may be impossible to coax it into refs API for
>> >>    writing, so the text there makes sense for it, but there are
>> >>    other all-caps-and-directly-under-dot-git-directory files like
>> >>    ORIG_HEAD and CHERRY_PICK_HEAD that are written using the refs
>> >>    API, so the description would have to be updated there.
>> >
>> > I'm not quite following; why would the description need to be updated?
>> >  Sure MERGE_HEAD is written without using the refs API, but we didn't
>> > mention how the pseudorefs were written in the description, and all of
>> > MERGE_HEAD, CHERRY_PICK_HEAD, ORIG_HEAD, REVERT_HEAD get written
>> > per-worktree so doesn't "pseudorefs like MERGE_HEAD" cover it as far
>> > as the reader is concerned?
>>
>> Here is how pseudo refs are defined.
>>
>> [[def_pseudoref]]pseudoref::
>>         Pseudorefs are a class of files under `$GIT_DIR` which behave
>>         like refs for the purposes of rev-parse, but which are treated
>>         specially by git.  Pseudorefs both have names that are all-caps,
>>         and always start with a line consisting of a
>>         <<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
>>         pseudoref, because it is sometimes a symbolic ref.  They might
>
> refs.c says
>
>         if (is_pseudoref_syntax(refname))
>                 return REF_TYPE_PSEUDOREF;
>
> I.e., ref_type("HEAD") == REF_TYPE_PSEUDOREF
>
> This may be partly my fault (commit 55dd8b910 "Make HEAD a PSEUDOREF
> rather than PER_WORKTREE.").
>
> From the source code I had only understood that pseudorefs are ALLCAPS
> names and are in the toplevel namespace.
> (HEAD, FETCH_HEAD and MERGE_HEAD have special-cased support in various places).
>
> Is this glossary the official definition of what things are? If so,
> the source code should refer to there. If not -except for confusion-
> how bad is it if the info in the glossary is inaccurate?

Developer and end-user confusion ensues.

>> What worries me the most is that we cannot simply say "all-caps
>> names that end with '_HEAD' all behave like refs except that they
>> will not be symrefs without reflog." MERGE_HEAD is the only known
>> exception if I am not mistaken, and I am OK to single it out as an
>> oddball.  The current description, however, gives the impression that
>> there are a lot more differences _among_ pseudorefs.
>
> It might be possible to add this extra info to the reftable format as a
> further subtype of the ref record.  We'd have to update the JGit
> implementation, though.

As you said earlier, the true oddball is FETCH_HEAD X-<.  I actually
think in all the discussion in this thread around pseudoref, I meant
that one when I mentioned MERGE_HEAD, and I suspect the glossary
entry also made the same mistake of not mentioning it.  MERGE_HEAD
is also different in that it can list more than one commit, but
FETCH_HEAD has a lot more information per commit.

The format was invented solely for the purpose of passing
information from "git fetch" to "git merge" as an implementation
detail of "git pull" to describe each commit/tag that is being
merged, so that it can in turn pass the extra info to drive "git
fmt-merge-msg" internally to prepare merge template.

"git pull" did have to use such a special format temporary file
(which is what FETCH_HEAD really is, rather than a ref that records
more than one commit), but we didn't have to treat such an oddball
temporary file as if it were a ref.  But we allowed "git rev-parse"
to read only the first object name in the file and ignore the rest
as if nothing strange happened, which probably was a mistake made
out of my sloppiness when we did "pull is fetch followed by merge"
callchain.

For backward compatibility, "git merge FETCH_HEAD" still may have to
work the way it does (i.e. if FETCH_HEAD has multiple lines, the
resulting merge would become an octopus merge, and merge message
will say not just the commit but mention where they came from).  But
I am not sure if it is essential for us to keep treating these
oddball temporary files as if they are (sort of) refs.
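The FETCH_HEAD format and its octopus behavior described above can be sketched in C. This assumes each line is laid out as `<object name> TAB <empty or "not-for-merge"> TAB <description>`, which is believed to match what `git fetch` writes but should be checked against builtin/fetch.c before relying on it. `count_merge_candidates()` and `is_octopus()` are hypothetical helpers: an entry whose second field is empty counts as a merge candidate, and more than one candidate is what makes `git merge FETCH_HEAD` an octopus merge.

```c
#include <string.h>

/*
 * Count FETCH_HEAD lines whose second tab-separated field is empty, i.e.
 * entries not marked "not-for-merge". Assumes the line layout
 * "<object name>\t<empty or not-for-merge>\t<description>".
 */
int count_merge_candidates(const char *fetch_head)
{
	int count = 0;
	const char *line = fetch_head;

	while (*line) {
		const char *eol = strchr(line, '\n');
		size_t len = eol ? (size_t)(eol - line) : strlen(line);
		const char *tab = memchr(line, '\t', len);

		/* An empty second field shows up as two consecutive tabs. */
		if (tab && tab + 1 < line + len && tab[1] == '\t')
			count++;
		if (!eol)
			break;
		line = eol + 1;
	}
	return count;
}

/* More than one merge candidate means the merge becomes an octopus. */
int is_octopus(const char *fetch_head)
{
	return count_merge_candidates(fetch_head) > 1;
}
```

The description field (for example `branch 'a' of https://example.com/r`) is the extra per-commit information that ends up in the merge message, and none of it fits in a ref.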

Han-Wen Nienhuys Feb. 10, 2022, 6:36 p.m. UTC | #7

On Thu, Feb 10, 2022 at 7:28 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>
> > On Thu, Feb 10, 2022 at 5:36 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> >>    One thing that makes me worried somewhat is what I did not touch,
> >> >>    namely, how pseudo refs are defined.  I know MERGE_HEAD is very
> >> >>    special and it may be impossible to coax it into refs API for
> >> >>    writing, so the text there makes sense for it, but there are
> >> >>    other all-caps-and-directly-under-dot-git-directory files like
> >> >>    ORIG_HEAD and CHERRY_PICK_HEAD that are written using the refs
> >> >>    API, so the description would have to be updated there.
> >> >
> >> > I'm not quite following; why would the description need to be updated?
> >> >  Sure MERGE_HEAD is written without using the refs API, but we didn't
> >> > mention how the pseudorefs were written in the description, and all of
> >> > MERGE_HEAD, CHERRY_PICK_HEAD, ORIG_HEAD, REVERT_HEAD get written
> >> > per-worktree so doesn't "pseudorefs like MERGE_HEAD" cover it as far
> >> > as the reader is concerned?
> >>
> >> Here is how pseudo refs are defined.
> >>
> >> [[def_pseudoref]]pseudoref::
> >>         Pseudorefs are a class of files under `$GIT_DIR` which behave
> >>         like refs for the purposes of rev-parse, but which are treated
> >>         specially by git.  Pseudorefs both have names that are all-caps,
> >>         and always start with a line consisting of a
> >>         <<def_SHA1,SHA-1>> followed by whitespace.  So, HEAD is not a
> >>         pseudoref, because it is sometimes a symbolic ref.  They might
> >
> > refs.c says
> >
> >         if (is_pseudoref_syntax(refname))
> >                 return REF_TYPE_PSEUDOREF;
> >
> > I.e., ref_type("HEAD") == REF_TYPE_PSEUDOREF
> >
> > This may be partly my fault (commit 55dd8b910 "Make HEAD a PSEUDOREF
> > rather than PER_WORKTREE.").
> >
> > From the source code I had only understood that pseudorefs are ALLCAPS
> > names and are in the toplevel namespace.
> > (HEAD, FETCH_HEAD and MERGE_HEAD have special-cased support in various places).
> >
> > Is this glossary the official definition of what things are? If so,
> > the source code should refer to there. If not -except for confusion-
> > how bad is it if the info in the glossary is inaccurate?
>
> Developer and end-user confusion ensues.

that's why I said: "except for confusion" :-)

I'm asking to understand if there is anything stopping us from
changing the glossary to match the current code.

> For backward compatibility, "git merge FETCH_HEAD" still may have to
> work the way it does (i.e. if FETCH_HEAD has multiple lines, the
> resulting merge would become an octopus merge, and merge message
> will say not just the commit but mention where they came from).  But
> I am not sure if it is essential for us to keep treating these
> oddball temporary files as if they are (sort of) refs.

on a tangent: I posted a patch to write MERGE_AUTOSTASH,
rebase-merge/autostash, etc. as refs.
Is that the right direction? They are read like refs, but they are
together in a directory with other bits of stateful data (similar to
what is appended to FETCH_HEAD). Perhaps I should rather change the
read path, so they're always read as files rather than refs?

Junio C Hamano Feb. 10, 2022, 7:14 p.m. UTC | #8

Han-Wen Nienhuys <hanwen@google.com> writes:

> on a tangent: I posted a patch to write MERGE_AUTOSTASH,
> rebase-merge/autostash, etc. as refs.
> Is that the right direction? They are read like refs, but they are
> together in a directory with other bits of stateful data (similar to
> what is appended to FETCH_HEAD). Perhaps I should rather change the
> read path, so they're always read as files rather than refs?

I think that would be a lot more preferable.  If a file is written
to record pieces of info, among which an object name happens to be
included, it does not have to be recorded as a ref.  Especially if
it is an ephemeral file like MERGE_AUTOSTASH and FETCH_HEAD.

Junio C Hamano Feb. 10, 2022, 7:17 p.m. UTC | #9

Han-Wen Nienhuys <hanwen@google.com> writes:

>> > Is this glossary the official definition of what things are? If so,
>> > the source code should refer to there. If not -except for confusion-
>> > how bad is it if the info in the glossary is inaccurate?
>>
>> Developer and end-user confusion ensues.
>
> that's why I said: "except for confusion" :-)
>
> I'm asking to understand if there is anything stopping us from
> changing the glossary to match the current code.

I do not think so.  It will give us a chance to rethink what we have
in the code, too.  It is possible that we may end up concluding that
it is better to leave a "pseudoref" always as a file inside $GIT_DIR
regardless of what ref backend is in use, for example.

Han-Wen Nienhuys Feb. 17, 2022, 10 a.m. UTC | #10

On Thu, Feb 10, 2022 at 8:14 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>
> > on a tangent: I posted a patch to write MERGE_AUTOSTASH,
> > rebase-merge/autostash, etc. as refs.
> > Is that the right direction? They are read like refs, but they are
> > together in a directory with other bits of stateful data (similar to
> > what is appended to FETCH_HEAD). Perhaps I should rather change the
> > read path, so they're always read as files rather than refs?
>
> I think that would be a lot more preferable.  If a file is written
> to record pieces of info, among which an object name happens to be
> included, it does not have to be recorded as a ref.  Especially if
> it is an ephemeral file like MERGE_AUTOSTASH and FETCH_HEAD.

For FETCH_HEAD, doing

  git fetch host refs/changes/23/123/1 && git checkout FETCH_HEAD

is the standard idiom for downloading a change from Gerrit. I suspect
there might be other similar idioms. This means we have to read them
through the refs machinery.

I think the most sensible approach is to pass the read/write through
refs_* functions, but special-case the storage, so it doesn't go
through reftable. We already do this for FETCH_HEAD and MERGE_HEAD in
refs_read_raw_refs.

This means we need a formal definition of which refs should be treated
as files. Maybe we could do as follows:

Pseudorefs are
  1) all uppercase toplevel names except for HEAD
  2) all refs that are not under refs/* (for example,
     rebase-{merge,apply}/autostash)

Pseudorefs are always stored as files containing a hex object_id.

Pseudorefs can be read or written through refs_* functions, but given
the storage guarantees, it's also valid to read/write them outside
refs_* functions.

It is forbidden to make cross-ref transactions that involve pseudorefs.
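The proposed definition can be written down as a predicate. The C sketch below is one reading of it, not an implementation from the thread: it applies the "except for HEAD" exclusion to both rules, since rule 2 on its own would otherwise include HEAD again. `is_proposed_pseudoref()` is a hypothetical name.

```c
#include <string.h>

/* Non-empty top-level name made only of A-Z and '_' (e.g. ORIG_HEAD). */
static int is_allcaps_toplevel(const char *name)
{
	if (!*name)
		return 0;
	for (const char *c = name; *c; c++)
		if (!((*c >= 'A' && *c <= 'Z') || *c == '_'))
			return 0;
	return 1;
}

/*
 * One reading of the proposal (hypothetical helper):
 *   1) all-caps top-level names except HEAD, and
 *   2) anything not under refs/ (e.g. rebase-merge/autostash),
 * with the HEAD exclusion applied to both rules.
 */
int is_proposed_pseudoref(const char *refname)
{
	if (!strcmp(refname, "HEAD"))
		return 0;
	if (is_allcaps_toplevel(refname))
		return 1;                             /* rule 1 */
	return strncmp(refname, "refs/", 5) != 0; /* rule 2 */
}
```

Under this reading rule 1 is subsumed by rule 2, because every all-caps top-level name is also outside refs/; keeping it explicit only documents the intent.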

Junio C Hamano Feb. 17, 2022, 7:10 p.m. UTC | #11

Han-Wen Nienhuys <hanwen@google.com> writes:

> For FETCH_HEAD, doing
>
>   git fetch host refs/changes/23/123/1 && git checkout FETCH_HEAD
>
> is the standard idiom for downloading a change from Gerrit. I suspect
> there might be other similar idioms. This means we have to read them
> through the refs machinery.

This merely means we have to read them through the object-name
machinery around get_oid().  Historically that was done by calling
repo_dwim_ref() from get_oid_basic(), which is where refs machinery
enters the picture, and because we had only files backend, it was OK
and convenient to treat .git/FETCH_HEAD and .git/refs/heads/master
in the same codepath.  But there is no reason for the arrangement to
stay that way.  .git/FOOBAR_HEAD files can be read as a file (we can
say we let files-backend handle it, but we can also extract a helper
function out of it and make it clear that it truly has no dependence
on the refs machinery) while .git/refs/* can be read from the refs
machinery that may be backed by reftable backend.
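The split suggested above, where .git/FOOBAR_HEAD-style files are read as plain files while refs/* goes through whichever backend is configured, could be routed roughly as in this C sketch. Everything here is hypothetical (`route_for()`, `pseudoref_file_path()` and the enum are invented for illustration), and the sketch assumes HEAD is a true ref owned by the backend.

```c
#include <stdio.h>
#include <string.h>

/* Where a read of a given name should go under the suggested split. */
enum read_route { ROUTE_PLAIN_FILE, ROUTE_REF_BACKEND };

/* Non-empty top-level name made only of A-Z and '_' (e.g. FETCH_HEAD). */
static int is_allcaps_toplevel(const char *name)
{
	if (!*name)
		return 0;
	for (const char *c = name; *c; c++)
		if (!((*c >= 'A' && *c <= 'Z') || *c == '_'))
			return 0;
	return 1;
}

enum read_route route_for(const char *refname)
{
	/* HEAD may be a symbolic ref and is a true ref, so the backend owns it. */
	if (strcmp(refname, "HEAD") && is_allcaps_toplevel(refname))
		return ROUTE_PLAIN_FILE;
	return ROUTE_REF_BACKEND; /* files or reftable, whichever is configured */
}

/* For plain-file names, write "$GIT_DIR/<refname>" into BUF; else return -1. */
int pseudoref_file_path(char *buf, size_t len, const char *gitdir,
			const char *refname)
{
	if (route_for(refname) != ROUTE_PLAIN_FILE)
		return -1;
	return snprintf(buf, len, "%s/%s", gitdir, refname);
}
```

The point of the design is that the plain-file branch has no dependence on the ref backend at all, so it behaves the same whether true refs live in loose files or in reftable.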

> I think the most sensible approach is to pass the read/write through
> refs_* functions, but special-case the storage, so it doesn't go
> through reftable. We already do this for FETCH_HEAD and MERGE_HEAD in
> refs_read_raw_refs.

I think we are more or less on the same page.  I do not think these
files behave like refs (they have no reflog, and they do not serve
as anchoring points for the purpose of gc/fsck) and we need a
special code path, which might be identical to the current ref-files
backend code, to handle them no matter what backend is used for true
refs.

> This means we need a formal definition of which refs should be treated
> as files. Maybe we could do as follows:
>
> Pseudorefs are
>   1) all uppercase toplevel names except for HEAD
>   2) all refs that are not under refs/* (for example,
>      rebase-{merge,apply}/autostash)
>
> Pseudorefs are always stored as files containing a hex object_id.
>
> Pseudorefs can be read or written through refs_* functions, but given
> the storage guarantees, it's also valid to read/write them outside
> refs_* functions.
>
> It is forbidden to make cross-ref transactions that involve pseudorefs.

Ævar Arnfjörð Bjarmason Feb. 18, 2022, 8:25 p.m. UTC | #12

On Thu, Feb 17 2022, Junio C Hamano wrote:

> Han-Wen Nienhuys <hanwen@google.com> writes:
>
>> For FETCH_HEAD, doing
>>
>>   git fetch host refs/changes/23/123/1 && git checkout FETCH_HEAD
>>
>> is the standard idiom for downloading a change from Gerrit. I suspect
>> there might be other similar idioms. This means we have to read them
>> through the refs machinery.
>
> This merely means we have to read them through the object-name
> machinery around get_oid().  Historically that was done by calling
> repo_dwin_ref() from get_oid_basic(), which is where refs machinery
> enters the picture, and because we had only files backend, it was OK
> and convenient to treat .git/FETCH_HEAD and .git/refs/heads/master
> in the same codepath.  But there is no reason for the arrangement to
> stay that way.  .git/FOOBAR_HEAD files can be read as a file (we can
> say we let files-backend handle it, but we can also extract a helper
> function out of it and make it clear that it truly has no dependence
> on the refs machinery) while .git/refs/* can be read from the refs
> machinery that may be backed by reftable backend.
>
>> I think the most sensible approach is to pass the read/write through
>> refs_* functions, but special-case the storage, so it doesn't go
>> through reftable. We already do this for FETCH_HEAD and MERGE_HEAD in
>> refs_read_raw_refs.
>
> I think we are more or less on the same page.  I do not think these
> files behave like refs (they have no reflog, and they do not serve
> as anchoring points for the purpose of gc/fsck) and we need a
> special code path, which might be identical to the current ref-files
> backend code, to handle them no matter what backend is used for true
> refs.

I'm not sure I get all the concerns in this thread, are we talking about
having FETCH_HEAD be not-in-reftable mainly because it's multi-value?

Maybe we will need these special refs on-disk forever, but it seems
preferable to pursue a plan where we use the preferred ref backend for
them.

That means that we can make them part of the normal ref transaction for
the backend, and could eventually support a world where a repo
e.g. talks to remote DB service for its refs (with shared storage for
repos).

For the "no reflog" is that really a critical property to maintain, or
just how the file backend happens to work now? In any case weren't we
talking about explicitly supporting the "explicit no reflog", "reflog
only for this ref" etc. that we support in the file backend in reftable?
Then we could presumably turn off the reflog for these special refs.

Similarly, for "gc" etc. supporting that doesn't seem like such a big
deal in that codepath, even if these end up being stored in some "real"
ref store. We can just ignore them when we're checking reachability.

But I may be entirely missing the point here...

Junio C Hamano Feb. 18, 2022, 8:50 p.m. UTC | #13

Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes:

> But I may be entirely missing the point here...

Not only is FETCH_HEAD a multi-line file, but each entry in it carries
extra information that does not belong to a 'ref', like where it came
from and whether its intended use is to merge into the current
branch.

The file is much closer to what "rebase -i" uses its "todo list"
for, than being a "ref-like" thing.  The only similarity with a
loose ref is that it begins with 40-hex and get_sha1_hex() will
happily return success.  The file format was designed to take
advantage of the loose-ness of get_sha1_hex(), historically, so
that the reading side can reuse the logic to read loose refs.

You can blame all of that on my laziness ;-).

After you fetch a single ref, get_oid("FETCH_HEAD") should keep
returning the object name of what you fetched, but that does not
mean the full multi-line trash needs to be stored in any ref
backend.  If you fetch multiple refs, get_oid("FETCH_HEAD") can only
return one of them anyway, so as long as we keep the only useful use
case working, we do not have to use the machinery the ref backend of
choice uses to store it.  It can remain an on-disk file, just like
the "todo" list "rebase -i" uses is an on-disk file.
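The looseness described above can be shown with a small C sketch: a reader that, like get_sha1_hex(), looks only at the first 40 hex digits and ignores whatever follows accepts a loose ref and a multi-line FETCH_HEAD alike, and for the latter it yields only the first entry, which is all get_oid("FETCH_HEAD") can return anyway. `read_leading_hex()` is a hypothetical helper; the real get_sha1_hex() decodes into binary bytes rather than copying hex text.

```c
#include <ctype.h>

/*
 * Hypothetical stand-in for the loose parsing described above: copy the
 * first 40 hex digits of BUF into HEX (NUL-terminated) and ignore any
 * trailing data, including further FETCH_HEAD lines.
 * Returns 0 on success, -1 if BUF does not start with 40 hex digits.
 */
int read_leading_hex(const char *buf, char hex[41])
{
	for (int i = 0; i < 40; i++) {
		if (!isxdigit((unsigned char)buf[i]))
			return -1;
		hex[i] = buf[i];
	}
	hex[40] = '\0';
	return 0;
}
```

Because the reader never looks past the first object name, the rest of the file can stay an on-disk implementation detail of "git pull" without affecting what `git rev-parse FETCH_HEAD` reports.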
