mbox series

[v3,0/3] Remove special casing for PSEUDOREF updates

Message ID pull.673.v3.git.1594925141.gitgitgadget@gmail.com (mailing list archive)
Headers show
Series Remove special casing for PSEUDOREF updates | expand

Message

Koji Nakamaru via GitGitGadget July 16, 2020, 6:45 p.m. UTC
This gets rid of the special casing code for pseudorefs in refs.c

This is in preparation for reftable.

v3

 * tweak git-update-ref.txt description for logAllRefUpdates

Han-Wen Nienhuys (3):
  t1400: use git rev-parse for testing PSEUDOREF existence
  Modify pseudo refs through ref backend storage
  Make HEAD a PSEUDOREF rather than PER_WORKTREE.

 Documentation/git-update-ref.txt |  13 ++--
 refs.c                           | 127 +++----------------------------
 t/t1400-update-ref.sh            |  30 ++++----
 t/t1405-main-ref-store.sh        |   5 +-
 4 files changed, 36 insertions(+), 139 deletions(-)


base-commit: b6a658bd00c9c29e07f833cabfc0ef12224e277a
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-673%2Fhanwen%2Fpseudoref-v3
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-673/hanwen/pseudoref-v3
Pull-Request: https://github.com/gitgitgadget/git/pull/673

Range-diff vs v2:

 1:  9c3dc4b2cb = 1:  28bd3534d0 t1400: use git rev-parse for testing PSEUDOREF existence
 2:  871b411517 ! 2:  79cd5dd480 Modify pseudo refs through ref backend storage
     @@ Commit message
      
          Signed-off-by: Han-Wen Nienhuys <hanwen@google.com>
      
     + ## Documentation/git-update-ref.txt ##
     +@@ Documentation/git-update-ref.txt: still see a subset of the modifications.
     + 
     + LOGGING UPDATES
     + ---------------
     +-If config parameter "core.logAllRefUpdates" is true and the ref is one under
     +-"refs/heads/", "refs/remotes/", "refs/notes/", or the symbolic ref HEAD; or
     +-the file "$GIT_DIR/logs/<ref>" exists then `git update-ref` will append
     +-a line to the log file "$GIT_DIR/logs/<ref>" (dereferencing all
     +-symbolic refs before creating the log name) describing the change
     +-in ref value.  Log lines are formatted as:
     ++If config parameter "core.logAllRefUpdates" is true and the ref is one
     ++under "refs/heads/", "refs/remotes/", "refs/notes/", or a pseudoref
     ++like HEAD or ORIG_HEAD; or the file "$GIT_DIR/logs/<ref>" exists then
     ++`git update-ref` will append a line to the log file
     ++"$GIT_DIR/logs/<ref>" (dereferencing all symbolic refs before creating
     ++the log name) describing the change in ref value.  Log lines are
     ++formatted as:
     + 
     +     oldsha1 SP newsha1 SP committer LF
     + 
     +
       ## refs.c ##
      @@ refs.c: long get_files_ref_lock_timeout_ms(void)
       	return timeout_ms;
 3:  1c2b9d5f17 = 3:  3ab9f2f04e Make HEAD a PSEUDOREF rather than PER_WORKTREE.

Comments

Junio C Hamano July 16, 2020, 10:09 p.m. UTC | #1
"Han-Wen Nienhuys via GitGitGadget" <gitgitgadget@gmail.com> writes:

> This gets rid of the special casing code for pseudorefs in refs.c
>
> This is in preparation for reftable.
>
> v3
>
>  * tweak git-update-ref.txt description for logAllRefUpdates
>
> Han-Wen Nienhuys (3):
>   t1400: use git rev-parse for testing PSEUDOREF existence
>   Modify pseudo refs through ref backend storage
>   Make HEAD a PSEUDOREF rather than PER_WORKTREE.

I reviewed some codepaths that deal with FETCH_HEAD recently.  

As the file is quite different from all the other pseudo references
in that it needs to store more than one object name and in that each
ref in it needs more than just the object name, I doubt that it
makes much sense to enhance the refs API so that its requirements
can be covered.

It would probably make sense for FETCH_HEAD to stay "a text file
directly underneath $GIT_DIR", but get_oid() needs to be able to
read it and return the first object name recorded in the file.
Han-Wen Nienhuys July 27, 2020, 8:50 a.m. UTC | #2
On Fri, Jul 17, 2020 at 12:10 AM Junio C Hamano <gitster@pobox.com> wrote:
> I reviewed some codepaths that deal with FETCH_HEAD recently.
>
> As the file is quite different from all the other pseudo references
> in that it needs to store more than one object name and in that each
> ref in it needs more than just the object name, I doubt that it
> makes much sense to enhance the refs API so that its requirements
> can be covered.



I agree. Do we ever pretend that FETCH_HEAD is a ref today?
Junio C Hamano July 27, 2020, 4:20 p.m. UTC | #3
Han-Wen Nienhuys <hanwen@google.com> writes:

> On Fri, Jul 17, 2020 at 12:10 AM Junio C Hamano <gitster@pobox.com> wrote:
>> I reviewed some codepaths that deal with FETCH_HEAD recently.
>>
>> As the file is quite different from all the other pseudo references
>> in that it needs to store more than one object name and in that each
>> ref in it needs more than just the object name, I doubt that it
>> makes much sense to enhance the refs API so that its requirements
>> can be covered.
>
> I agree. Do we ever pretend that FETCH_HEAD is a ref today?

"git rev-parse FETCH_HEAD", "git show FETCH_HEAD" etc. all should keep
working, so in that sense, it is treated as a ref.  It does not
protect the history leading to the objects listed in it from being
collected, though.

"git merge FETCH_HEAD" is an interesting case---I haven't thought it
through.

Jung fubhyq unccra nsgre "tvg chyy bevtva sbb one" nggrzcgf gb teno
gjb oenapurf naq znxr na bpgbchf zretr vagb gur oenapu pheeragyl
purpxrq bhg, naq gura "tvg erfrg --uneq && tvg zretr SRGPU_URNQ" vf
tvira?
Han-Wen Nienhuys Aug. 3, 2020, 7:07 p.m. UTC | #4
On Mon, Jul 27, 2020 at 6:20 PM Junio C Hamano <gitster@pobox.com> wrote:
> > On Fri, Jul 17, 2020 at 12:10 AM Junio C Hamano <gitster@pobox.com> wrote:
> >> I reviewed some codepaths that deal with FETCH_HEAD recently.
> >>
> >> As the file is quite different from all the other pseudo references
> >> in that it needs to store more than one object name and in that each
> >> ref in it needs more than just the object name, I doubt that it
> >> makes much sense to enhance the refs API so that its requirements
> >> can be covered.
> >
> > I agree. Do we ever pretend that FETCH_HEAD is a ref today?
>
> "git rev-parse FETCH_HEAD", "git show FETCH_HEAD" etc. all should keep
> working, so in that sense, it is treated as a ref.

I added this to the last version of the full reftable patch series
that I posted,  as patches
"Split off reading loose ref data in separate function" and "Read
FETCH_HEAD as loose ref".

Which other refs that aren't really refs should also be supported? The
JGit source code suggests that MERGE_HEAD should also be special
cased?

It's not pretty, though. The entry point is read_raw_ref(), which gets
a ref_store as argument. ref_store doesn't have generic API to get at
the repository directory, so the implementation of reading the
FETCH_HEAD file has to be pushed down to the ref backend (which does
know the directory).

> It does not
> protect the history leading to the objects listed in it from being
> collected, though.

Is there documented agreement of how pseudo refs and GC should interact?

> "git merge FETCH_HEAD" is an interesting case---I haven't thought it
> through.
>
> What should happen after "git pull origin foo bar" attempts to grab
> two branches and make an octopus merge into the branch currently
> checked out, and then "git reset --hard && git merge FETCH_HEAD" is
> given?

I don't understand this question.
Junio C Hamano Aug. 3, 2020, 8:07 p.m. UTC | #5
Han-Wen Nienhuys <hanwen@google.com> writes:

>> >> As the file is quite different from all the other pseudo references
>> >> in that it needs to store more than one object name and in that each
>> >> ref in it needs more than just the object name, I doubt that it
>> >> makes much sense to enhance the refs API so that its requirements
>> >> can be covered.
>> >
>> > I agree. Do we ever pretend that FETCH_HEAD is a ref today?
>>
>> "git rev-parse FETCH_HEAD", "git show FETCH_HEAD" etc. all should keep
>> working, so in that sense, it is treated as a ref.
>
> I added this to the last version of the full reftable patch series
> that I posted, as patches
> "Split off reading loose ref data in separate function" and "Read
> FETCH_HEAD as loose ref".
>
> Which other refs that aren't really refs should also be supported? The
> JGit source code suggests that MERGE_HEAD should also be special
> cased?

I'd think all .git/${SOMETHING}_HEAD are of transitory nature that
can be left as simple on-disk files that are read (and preferrably
written---except for FETCH_HEAD for obvious reasons) as if they are
loose refs handled by files backend.  It probably makes sense not to
even write reflog entries for them---it is not like the MERGE_HEAD
I see now in .git/ directory is an updated version of MERGE_HEAD I
had there yesterday. "git log -g MERGE_HEAD" gives no interesting
information.

>> "git merge FETCH_HEAD" is an interesting case---I haven't thought it
>> through.
>>
>> What should happen after "git pull origin foo bar" attempts to grab
>> two branches and make an octopus merge into the branch currently
>> checked out, and then "git reset --hard && git merge FETCH_HEAD" is
>> given?
>
> I don't understand this question.

The request "git pull origin foo bar" is "grab the tip of 'foo' and
'bar' branches from remote whose name is 'origin'; merge these two
commits into the commit that I have checked out".

To fulfill the request, "git pull" runs "git fetch", and the latter
leaves two lines of interest in .git/FETCH_HEAD file.  Each line
lists the name of the object at the ref, an optional "not-for-merge"
token (which in this case does not exist, as both records are for
merging), and piece of human-readable text to describe where that
object came from to help later step that computes the default
message for the resulting merge commit.

If the octopus merge does not finish correctly (e.g. due to
conflicts), with "git reset --hard", we can recover to the original
state and re-attempt the opeation with "git merge FETCH_HEAD".  Such
a merge using FETCH_HEAD will produce an octopus merge.

Which means that at least "git merge", FETCH_HEAD is not just a
regular ref where you can ask what object it points at and it gives
you a single object name back.

But to other commands like "git log master..FETCH_HEAD", it acts as
if there is only one object recorded.  That makes it an interesting
case.  Making it act as if "git log ^master origin/foo origin/bar"
were given might be a "bugfix" to make it behave more "correctly",
but I do not know how large a fallout such a change brings in.
Junio C Hamano Aug. 5, 2020, 1:45 a.m. UTC | #6
Junio C Hamano <gitster@pobox.com> writes:

>> Which other refs that aren't really refs should also be supported? The
>> JGit source code suggests that MERGE_HEAD should also be special
>> cased?
>
> I'd think all .git/${SOMETHING}_HEAD are of transitory nature that
> can be left as simple on-disk files that are read (and preferrably
> written---except for FETCH_HEAD for obvious reasons) as if they are
> loose refs handled by files backend.

Sorry for flipping and flopping.  The above goes directly against
the spirit of 09743417 (Modify pseudo refs through ref backend
storage, 2020-07-27).  I still think .git/${SOMETHING}_HEAD except
for FETCH_HEAD should be written and read via the ref subsystem, but
I was wrong to say that it should always be done via the files
backend.  There is no reason to insist on the use of files backend
here.

> It probably makes sense not to
> even write reflog entries for them---it is not like the MERGE_HEAD
> I see now in .git/ directory is an updated version of MERGE_HEAD I
> had there yesterday. "git log -g MERGE_HEAD" gives no interesting
> information.

This still is true, but that is pretty much orthogonal to which
backend is used.

> If the octopus merge does not finish correctly (e.g. due to
> conflicts), with "git reset --hard", we can recover to the original
> state and re-attempt the opeation with "git merge FETCH_HEAD".  Such
> a merge using FETCH_HEAD will produce an octopus merge.
>
> Which means that at least "git merge", FETCH_HEAD is not just a
> regular ref where you can ask what object it points at and it gives
> you a single object name back.
>
> But to other commands like "git log master..FETCH_HEAD", it acts as
> if there is only one object recorded.

All of which means FETCH_HEAD is special and we may not want to
burden the special casing of it to newer backends.
Han-Wen Nienhuys Aug. 5, 2020, 10:48 a.m. UTC | #7
On Wed, Aug 5, 2020 at 3:45 AM Junio C Hamano <gitster@pobox.com> wrote:
> > I'd think all .git/${SOMETHING}_HEAD are of transitory nature that
> > can be left as simple on-disk files that are read (and preferrably
> > written---except for FETCH_HEAD for obvious reasons) as if they are
> > loose refs handled by files backend.
>
> Sorry for flipping and flopping.  The above goes directly against
> the spirit of 09743417 (Modify pseudo refs through ref backend
> storage, 2020-07-27).  I still think .git/${SOMETHING}_HEAD except
> for FETCH_HEAD should be written and read via the ref subsystem, but
> I was wrong to say that it should always be done via the files
> backend.  There is no reason to insist on the use of files backend
> here.

Yes, that confused me. Thanks for setting that straight.

> > If the octopus merge does not finish correctly (e.g. due to
> > conflicts), with "git reset --hard", we can recover to the original
> > state and re-attempt the opeation with "git merge FETCH_HEAD".  Such
> > a merge using FETCH_HEAD will produce an octopus merge.
> >
> > Which means that at least "git merge", FETCH_HEAD is not just a
> > regular ref where you can ask what object it points at and it gives
> > you a single object name back.
> >
> > But to other commands like "git log master..FETCH_HEAD", it acts as
> > if there is only one object recorded.
>
> All of which means FETCH_HEAD is special and we may not want to
> burden the special casing of it to newer backends.

Can you confirm that FETCH_HEAD is the only thing that can store more
than just a symref / SHA1 ? Based on the name, and a comment in the
JGit source, I thought that MERGE_HEAD might contain more than one
SHA1 at a time.
Junio C Hamano Aug. 5, 2020, 3:54 p.m. UTC | #8
Han-Wen Nienhuys <hanwen@google.com> writes:

>> All of which means FETCH_HEAD is special and we may not want to
>> burden the special casing of it to newer backends.
>
> Can you confirm that FETCH_HEAD is the only thing that can store more
> than just a symref / SHA1 ? Based on the name, and a comment in the
> JGit source, I thought that MERGE_HEAD might contain more than one
> SHA1 at a time.

No I cannot.  I do not think MERGE_HEAD is something I added with as
deep a thought as I did with FETCH_HEAD.  If it mimics FETCH_HEAD,
then perhaps it does, but I somehow doubt it.

As can be seen in builtin/merge.c::collect_parents(), we do special
case FETCH_HEAD when grabbing what commit*S* to merge, but all
others are read with get_merge_parent() that makes a single call to
get_oid(), i.e. anything other than FETCH_HEAD cannot have more than
one object recorded.
Han-Wen Nienhuys Aug. 10, 2020, 2:27 p.m. UTC | #9
On Wed, Aug 5, 2020 at 5:54 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> All of which means FETCH_HEAD is special and we may not want to
> >> burden the special casing of it to newer backends.
> >
> > Can you confirm that FETCH_HEAD is the only thing that can store more
> > than just a symref / SHA1 ? Based on the name, and a comment in the
> > JGit source, I thought that MERGE_HEAD might contain more than one
> > SHA1 at a time.
>
> No I cannot.  I do not think MERGE_HEAD is something I added with as
> deep a thought as I did with FETCH_HEAD.  If it mimics FETCH_HEAD,
> then perhaps it does, but I somehow doubt it.
>
> As can be seen in builtin/merge.c::collect_parents(), we do special
> case FETCH_HEAD when grabbing what commit*S* to merge, but all
> others are read with get_merge_parent() that makes a single call to
> get_oid(), i.e. anything other than FETCH_HEAD cannot have more than
> one object recorded.

Understood. Is there anything you'd like me to do with this patch
series so it can be merged?

Dealing with FETCH_HEAD generically isn't possible unless we extend
the API of the ref backend: the generic ref_store instance doesn't
offer a way to get at the path that corresponds to FETCH_HEAD, so we
can't handle it in refs_read_raw_ref(). In the current reftable
series, FETCH_HEAD is dealt with in the backends separately.
Junio C Hamano Aug. 10, 2020, 4:04 p.m. UTC | #10
Han-Wen Nienhuys <hanwen@google.com> writes:

> Dealing with FETCH_HEAD generically isn't possible unless we extend
> the API of the ref backend: the generic ref_store instance doesn't
> offer a way to get at the path that corresponds to FETCH_HEAD, so we
> can't handle it in refs_read_raw_ref(). In the current reftable
> series, FETCH_HEAD is dealt with in the backends separately.

I am not sure what the best way would be, but I do not think any
existing code writes into it using the refs API at all, even though
it may be read only for the first object name using the refs API.

And I am not sure if we want to extend the write side API so that
the callers can express the full flexibility of that single file.

So perhaps the best way forward would be to ensure that anybody who
tries to read from FETCH_HEAD using the ref API reads the first
object name in it from $GIT_DIR/FETCH_HEAD file as we've always done
since the beginning of time, regardless of what ref backend is used,
that anybody who tries to write FETCH_HEAD using the ref API gets an
error, and letting the writers write into it directly?
Han-Wen Nienhuys Aug. 11, 2020, 10:49 a.m. UTC | #11
On Mon, Aug 10, 2020 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Han-Wen Nienhuys <hanwen@google.com> writes:
>
> > Dealing with FETCH_HEAD generically isn't possible unless we extend
> > the API of the ref backend: the generic ref_store instance doesn't
> > offer a way to get at the path that corresponds to FETCH_HEAD, so we
> > can't handle it in refs_read_raw_ref(). In the current reftable
> > series, FETCH_HEAD is dealt with in the backends separately.
>
> I am not sure what the best way would be, but I do not think any
> existing code writes into it using the refs API at all, even though
> it may be read only for the first object name using the refs API.
>
> And I am not sure if we want to extend the write side API so that
> the callers can express the full flexibility of that single file.

That's not what I am getting at. I am just interested in how to handle
FETCH_HEAD in refs_read_raw_ref.

> So perhaps the best way forward would be to ensure that anybody who
> tries to read from FETCH_HEAD using the ref API reads the first
> object name in it from $GIT_DIR/FETCH_HEAD file as we've always done
> since the beginning of time, regardless of what ref backend is used,

Right, but how do we get at the value of $GIT_DIR given a struct
ref_store? We can either push that out to the ref store backend,
because each backend knows what $GIT_DIR is, or we can make $GIT_DIR a
property of the generic ref_store.

I suppose it's cleaner to make the latter API extension.
Junio C Hamano Aug. 11, 2020, 6:38 p.m. UTC | #12
Han-Wen Nienhuys <hanwen@google.com> writes:

> On Mon, Aug 10, 2020 at 6:04 PM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> Han-Wen Nienhuys <hanwen@google.com> writes:
>>
>> > Dealing with FETCH_HEAD generically isn't possible unless we extend
>> > the API of the ref backend: the generic ref_store instance doesn't
>> > offer a way to get at the path that corresponds to FETCH_HEAD, so we
>> > can't handle it in refs_read_raw_ref(). In the current reftable
>> > series, FETCH_HEAD is dealt with in the backends separately.
>>
>> I am not sure what the best way would be, but I do not think any
>> existing code writes into it using the refs API at all, even though
>> it may be read only for the first object name using the refs API.
>>
>> And I am not sure if we want to extend the write side API so that
>> the callers can express the full flexibility of that single file.
>
> That's not what I am getting at. I am just interested in how to handle
> FETCH_HEAD in refs_read_raw_ref.
>
>> So perhaps the best way forward would be to ensure that anybody who
>> tries to read from FETCH_HEAD using the ref API reads the first
>> object name in it from $GIT_DIR/FETCH_HEAD file as we've always done
>> since the beginning of time, regardless of what ref backend is used,
>
> Right, but how do we get at the value of $GIT_DIR given a struct
> ref_store? We can either push that out to the ref store backend,
> because each backend knows what $GIT_DIR is, or we can make $GIT_DIR a
> property of the generic ref_store.
>
> I suppose it's cleaner to make the latter API extension.

Yup.  Sounds like the right direction to go.

Thanks.
Han-Wen Nienhuys Aug. 19, 2020, 1:19 p.m. UTC | #13
On Wed, Aug 5, 2020 at 5:54 PM Junio C Hamano <gitster@pobox.com> wrote:
> >> All of which means FETCH_HEAD is special and we may not want to
> >> burden the special casing of it to newer backends.
> >
> > Can you confirm that FETCH_HEAD is the only thing that can store more
> > than just a symref / SHA1 ? Based on the name, and a comment in the
> > JGit source, I thought that MERGE_HEAD might contain more than one
> > SHA1 at a time.
>
> No I cannot.  I do not think MERGE_HEAD is something I added with as
> deep a thought as I did with FETCH_HEAD.  If it mimics FETCH_HEAD,
> then perhaps it does, but I somehow doubt it.

blame.c has an append_merge_parents() which reads multiple lines from
MERGE_HEAD, and cmd_commit does so too. It's populated from
write_merge_heads(). So the special casing should be extended to
MERGE_HEAD.
Junio C Hamano Aug. 19, 2020, 3:52 p.m. UTC | #14
Han-Wen Nienhuys <hanwen@google.com> writes:

> blame.c has an append_merge_parents() which reads multiple lines from
> MERGE_HEAD, and cmd_commit does so too. It's populated from
> write_merge_heads(). So the special casing should be extended to
> MERGE_HEAD.

Ahh.  That makes sense.