diff mbox series

[v2,1/4] wt-status: read HEAD and ORIG_HEAD via the refdb

Message ID 1db3eb3945432964aabe1c559db4c3ac251e83fd.1702365291.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series refs: improve handling of special refs | expand

Commit Message

Patrick Steinhardt Dec. 12, 2023, 7:18 a.m. UTC
We read both the HEAD and ORIG_HEAD references directly from the
filesystem in order to figure out whether we're currently splitting a
commit. If both of the following are true:

  - HEAD points to the same object as "rebase-merge/amend".

  - ORIG_HEAD points to the same object as "rebase-merge/orig-head".

Then we are currently splitting commits.

The current code only works by chance because we only have a single
reference backend implementation. Refactor it to instead read both refs
via the refdb layer so that we'll also be compatible with alternate
reference backends.

Note that we pass `RESOLVE_REF_NO_RECURSE` to `read_ref_full()`. This is
because we didn't resolve symrefs before either, and in practice none of
the refs in "rebase-merge/" would be symbolic. We thus don't want to
resolve symrefs with the new code either to retain the old behaviour.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 wt-status.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

Comments

Junio C Hamano Dec. 12, 2023, 8:24 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> The current code only works by chance because we only have a single
> reference backend implementation. Refactor it to instead read both refs
> via the refdb layer so that we'll also be compatible with alternate
> reference backends.

"via the refdb" -> "via the refs API" or something here and on the
title, and possibly elsewhere in the proposed log messages and
in-code comments in patches in this series, as I've never seen a
word "refdb" used in the context of this project.

I agree it is bad manners to be intimate with the implementation
details of the how files-backend stores HEAD and ORIG_HEAD.

> Note that we pass `RESOLVE_REF_NO_RECURSE` to `read_ref_full()`. This is
> because we didn't resolve symrefs before either, and in practice none of
> the refs in "rebase-merge/" would be symbolic. We thus don't want to
> resolve symrefs with the new code either to retain the old behaviour.

Good to see a rewrite being careful like this.

> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  wt-status.c | 17 +++++++++--------
>  1 file changed, 9 insertions(+), 8 deletions(-)
>
> diff --git a/wt-status.c b/wt-status.c
> index 9f45bf6949..fe9e590b80 100644
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -1295,26 +1295,27 @@ static char *read_line_from_git_path(const char *filename)
>  static int split_commit_in_progress(struct wt_status *s)
>  {
>  	int split_in_progress = 0;
> -	char *head, *orig_head, *rebase_amend, *rebase_orig_head;
> +	struct object_id head_oid, orig_head_oid;
> +	char *rebase_amend, *rebase_orig_head;
>  
>  	if ((!s->amend && !s->nowarn && !s->workdir_dirty) ||
>  	    !s->branch || strcmp(s->branch, "HEAD"))
>  		return 0;
>  
> -	head = read_line_from_git_path("HEAD");
> -	orig_head = read_line_from_git_path("ORIG_HEAD");
> +	if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) ||
> +	    read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL))
> +		return 0;
> +

This made me wonder if we have changed behaviour when on an unborn
branch.  In such a case, the original most likely would have read
"ref: blah" in "head" and compared it with "rebase_amend", which
would be a good way to ensure they would not match.  I would not
know offhand what the updated code would do, but head_oid would be
uninitialized in such a case, so ...?

>  	rebase_amend = read_line_from_git_path("rebase-merge/amend");
>  	rebase_orig_head = read_line_from_git_path("rebase-merge/orig-head");
>  
> -	if (!head || !orig_head || !rebase_amend || !rebase_orig_head)
> +	if (!rebase_amend || !rebase_orig_head)
>  		; /* fall through, no split in progress */
>  	else if (!strcmp(rebase_amend, rebase_orig_head))
> -		split_in_progress = !!strcmp(head, rebase_amend);
> -	else if (strcmp(orig_head, rebase_orig_head))
> +		split_in_progress = !!strcmp(oid_to_hex(&head_oid), rebase_amend);
> +	else if (strcmp(oid_to_hex(&orig_head_oid), rebase_orig_head))
>  		split_in_progress = 1;
>  
> -	free(head);
> -	free(orig_head);
>  	free(rebase_amend);
>  	free(rebase_orig_head);
Ramsay Jones Dec. 12, 2023, 11:32 p.m. UTC | #2
On 12/12/2023 20:24, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
>> The current code only works by chance because we only have a single
>> reference backend implementation. Refactor it to instead read both refs
>> via the refdb layer so that we'll also be compatible with alternate
>> reference backends.
> 
> "via the refdb" -> "via the refs API" or something here and on the
> title, and possibly elsewhere in the proposed log messages and
> in-code comments in patches in this series, as I've never seen a
> word "refdb" used in the context of this project.
> 
> I agree it is bad manners to be intimate with the implementation
> details of the how files-backend stores HEAD and ORIG_HEAD.

Hmm, I have never thought of the 'pseudo-refs' as being a part of
the 'reference database' at all. ;)

We seem to have pseudo-refs, special pseudo-refs and (recently)
ex-pseudo-refs!

This patch (well series) changes the 'status' of some, *but not all*,
pseudo-refs; some graduate to full-blown refs stored as part of *a*
reference database (ie reftable).

As far as I recall, this has not been discussed on the ML. Why are
only some chosen to become 'full' refs and others not? This is not
discussed in any of the commit messages.

The '.invalid' HEAD hack featured in a recent completion patch as well.
[If this is because the JAVA implementation does it this way, I think
it needs some thought before including it in the Git project].

Anyway, I haven't had the time to study the details here, so please
ignore my uninformed ramblings!

ATB,
Ramsay Jones
Junio C Hamano Dec. 13, 2023, 12:36 a.m. UTC | #3
Ramsay Jones <ramsay@ramsayjones.plus.com> writes:

>> "via the refdb" -> "via the refs API" or something here and on the
>> title, and possibly elsewhere in the proposed log messages and
>> in-code comments in patches in this series, as I've never seen a
>> word "refdb" used in the context of this project.
>> 
>> I agree it is bad manners to be intimate with the implementation
>> details of the how files-backend stores HEAD and ORIG_HEAD.
>
> Hmm, I have never thought of the 'pseudo-refs' as being a part of
> the 'reference database' at all. ;)

Me neither, but once you start thinking about getting rid of the
need to use one-file-per-ref filesystem, being able to maintain all
refs, including the pseudo refs, in one r/w store backend, becomes a
very tempting goal.  From that point of view, I do not have problem
with the idea to move _all_ pseudorefs to reftable.

But I do have reservations on what Patrick, and the code he
inherited from Han-Wen, calls "special refs" (which is not defined
in the glossary at all), namely, refs.c:is_special_ref() and its
callers.  Neither am I very much sympathetic to the hardcoded list
of "known" pseudorefs, refs.c:pseudorefs[].  I cannot quite see why
we need anything more than

    any string that passes refs.c:is_pseudoref_syntax() is a
    pseudoref, is per worktree, and ref backends can store them like
    any other refs.  Many of them have specific meaning and uses
    (e.g. HEAD is "the current branch").

Enumerating existing pseudorefs in files backend may need to
opendir(".git") + readdir() filtered with is_pseudoref_syntax(),
and a corresponding implementation for reftable backend may be much
simpler (because there won't be "other cruft" stored there, unlike
files backend that needs to worry about files that are not refs,
like ".git/config" file.

> We seem to have pseudo-refs, special pseudo-refs and (recently)
> ex-pseudo-refs!
>
> This patch (well series) changes the 'status' of some, *but not all*,
> pseudo-refs; some graduate to full-blown refs stored as part of *a*
> reference database (ie reftable).

Yeah, that leaves bad taste in my mouth, too.
Patrick Steinhardt Dec. 13, 2023, 7:38 a.m. UTC | #4
On Tue, Dec 12, 2023 at 04:36:24PM -0800, Junio C Hamano wrote:
> Ramsay Jones <ramsay@ramsayjones.plus.com> writes:
> 
> >> "via the refdb" -> "via the refs API" or something here and on the
> >> title, and possibly elsewhere in the proposed log messages and
> >> in-code comments in patches in this series, as I've never seen a
> >> word "refdb" used in the context of this project.
> >> 
> >> I agree it is bad manners to be intimate with the implementation
> >> details of the how files-backend stores HEAD and ORIG_HEAD.
> >
> > Hmm, I have never thought of the 'pseudo-refs' as being a part of
> > the 'reference database' at all. ;)
> 
> Me neither, but once you start thinking about getting rid of the
> need to use one-file-per-ref filesystem, being able to maintain all
> refs, including the pseudo refs, in one r/w store backend, becomes a
> very tempting goal.  From that point of view, I do not have problem
> with the idea to move _all_ pseudorefs to reftable.

Yes, we're in agreement.

> But I do have reservations on what Patrick, and the code he
> inherited from Han-Wen, calls "special refs" (which is not defined
> in the glossary at all), namely, refs.c:is_special_ref() and its
> callers.

I do not want to add "special refs" to the glossary because ultimately
they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD.
Once we're there we can of course discuss whether we want to explicitly
point them out in the glossary and give them a special name.

> Neither am I very much sympathetic to the hardcoded list
> of "known" pseudorefs, refs.c:pseudorefs[].  I cannot quite see why
> we need anything more than


>     any string that passes refs.c:is_pseudoref_syntax() is a
>     pseudoref, is per worktree, and ref backends can store them like
>     any other refs.  Many of them have specific meaning and uses
>     (e.g. HEAD is "the current branch").
> 
> Enumerating existing pseudorefs in files backend may need to
> opendir(".git") + readdir() filtered with is_pseudoref_syntax(),
> and a corresponding implementation for reftable backend may be much
> simpler (because there won't be "other cruft" stored there, unlike
> files backend that needs to worry about files that are not refs,
> like ".git/config" file.
> 
> > We seem to have pseudo-refs, special pseudo-refs and (recently)
> > ex-pseudo-refs!
> >
> > This patch (well series) changes the 'status' of some, *but not all*,
> > pseudo-refs; some graduate to full-blown refs stored as part of *a*
> > reference database (ie reftable).
> 
> Yeah, that leaves bad taste in my mouth, too.

I'm taking an iterative approach to things, which means that we're at
times going to be in an in-between state. I want to avoid changing too
many things at once and overwhelming potential reviewers. But I realize
that I should've done a better job of explaining that this patch series
is not the end goal, but rather a step towards that goal.

The patch series at hand merely records the status quo and rectifies any
inconsistencies we have with accessing such "special" refs. The natural
next step here would be to reduce the list of special refs (like e.g. we
do in patch 4) so that in the end it will only contain those refs which
really are special (FETCH_HEAD, MERGE_HEAD).

Please let me know in case you strongly disagree with my iterative
approach, or whether the issue is rather that I didn't make myself
sufficiently clear.

Patrick
Junio C Hamano Dec. 13, 2023, 3:15 p.m. UTC | #5
Patrick Steinhardt <ps@pks.im> writes:

>> Me neither, but once you start thinking about getting rid of the
>> need to use one-file-per-ref filesystem, being able to maintain all
>> refs, including the pseudo refs, in one r/w store backend, becomes a
>> very tempting goal.  From that point of view, I do not have problem
>> with the idea to move _all_ pseudorefs to reftable.
>
> Yes, we're in agreement.
>
>> But I do have reservations on what Patrick, and the code he
>> inherited from Han-Wen, calls "special refs" (which is not defined
>> in the glossary at all), namely, refs.c:is_special_ref() and its
>> callers.
>
> I do not want to add "special refs" to the glossary because ultimately
> they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD.
> Once we're there we can of course discuss whether we want to explicitly
> point them out in the glossary and give them a special name.

OK, I somehow got a (wrong) impression that you are very close to
the finish line.  If the intention is to leave many others still in
the "special" category (for only the reasons of inertia), with a
vision that some selected few must remain "special" with their own
good reasons, then I am perfectly fine.

Thanks.
Patrick Steinhardt Dec. 14, 2023, 9:04 a.m. UTC | #6
On Wed, Dec 13, 2023 at 07:15:33AM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> >> Me neither, but once you start thinking about getting rid of the
> >> need to use one-file-per-ref filesystem, being able to maintain all
> >> refs, including the pseudo refs, in one r/w store backend, becomes a
> >> very tempting goal.  From that point of view, I do not have problem
> >> with the idea to move _all_ pseudorefs to reftable.
> >
> > Yes, we're in agreement.
> >
> >> But I do have reservations on what Patrick, and the code he
> >> inherited from Han-Wen, calls "special refs" (which is not defined
> >> in the glossary at all), namely, refs.c:is_special_ref() and its
> >> callers.
> >
> > I do not want to add "special refs" to the glossary because ultimately
> > they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD.
> > Once we're there we can of course discuss whether we want to explicitly
> > point them out in the glossary and give them a special name.
> 
> OK, I somehow got a (wrong) impression that you are very close to
> the finish line.

You mean with the reftable backend? I indeed am quite close, I've just
finished the last prerequisite ("extensions.refFormat" and related
tooling) today. I will send that patch series upstream for review once
my patches that fix repo initialization with git-clone(1) land in the
"next" branch. The current state at [1] passes CI, even though there
will of course still be bugs which aren't covered by the test suite.

So once all prerequisites that are currently in flight plus the pending
"extensions.refFormat" series have landed I will send the reftable
backend implementation in for review. If things continue to go smoothly
I expect that this may happen at the end of January/start of February.

Anyway. This patch series here is in fact already sufficient to make
reftables work with those special refs. The only thing that we require
in this context is that refs are either exclusively routed through the
filesystem, or exclusively routed through the ref API. If that property
holds then things work just fine.

But still, I do want to clean up the remaining special refs regardless
of that, even though it is not a mandatory prerequisite. I find that the
current state is just plain confusing with all these special cases, and
I'd really love for it to be simplified. Also, I think there is benefit
in having those refs in reftables because it does allow for proper
atomic updates.

> If the intention is to leave many others still in
> the "special" category (for only the reasons of inertia), with a
> vision that some selected few must remain "special" with their own
> good reasons, then I am perfectly fine.

Okay.

Patrick

[1]: https://gitlab.com/gitlab-org/git/-/merge_requests/58
Patrick Steinhardt Dec. 14, 2023, 1:21 p.m. UTC | #7
On Tue, Dec 12, 2023 at 12:24:24PM -0800, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
[snip]
> > diff --git a/wt-status.c b/wt-status.c
> > index 9f45bf6949..fe9e590b80 100644
> > --- a/wt-status.c
> > +++ b/wt-status.c
> > @@ -1295,26 +1295,27 @@ static char *read_line_from_git_path(const char *filename)
> >  static int split_commit_in_progress(struct wt_status *s)
> >  {
> >  	int split_in_progress = 0;
> > -	char *head, *orig_head, *rebase_amend, *rebase_orig_head;
> > +	struct object_id head_oid, orig_head_oid;
> > +	char *rebase_amend, *rebase_orig_head;
> >  
> >  	if ((!s->amend && !s->nowarn && !s->workdir_dirty) ||
> >  	    !s->branch || strcmp(s->branch, "HEAD"))
> >  		return 0;
> >  
> > -	head = read_line_from_git_path("HEAD");
> > -	orig_head = read_line_from_git_path("ORIG_HEAD");
> > +	if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) ||
> > +	    read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL))
> > +		return 0;
> > +
> 
> This made me wonder if we have changed behaviour when on an unborn
> branch.  In such a case, the original most likely would have read
> "ref: blah" in "head" and compared it with "rebase_amend", which
> would be a good way to ensure they would not match.  I would not
> know offhand what the updated code would do, but head_oid would be
> uninitialized in such a case, so ...?

The code flow goes as following:

  1. We call `read_ref_full()`, which itself calls
     `refs_resolve_ref_unsafe()`.

  2. It calls `refs_read_raw_ref()`, which succeeds and finds the
     symref.

  3. We notice that this is a symref and that `RESOLVE_REF_NO_RECURSE`
     is set. We thus clear the object ID and return the name of the
     symref target.

  4. Back in `read_ref_full()` we see that `refs_resolve_ref_unsafe()`
     returns the symref target, which we interpret as successful lookup.
     We thus return `0`.

  5. Now we look up "rebase-merge/{amend,orig-head}" and end up
     comparing whatever they contain with the cleared OID resolved from
     HEAD/ORIG_HEAD.

So the OID would not be uninitialized but the zero OID. Now:

  - "rebase-merge/amend" always contains the result of `repo_get_oid()`
    and never contains the zero OID.

  - "rebase-merge/orig-head" may contain the zero OID when there was no
    ORIG_HEAD at the time of starting a rebase or in case it did not
    resolve

So... if ORIG_HEAD was rewritten to be a symref pointing into nirvana
between starting the rebase and calling into "wt-status.c", and when
ORIG_HEAD didn't exist at the time of starting the rebase, then we might
now wrongly report that splitting was in progress.

In other cases the old code was actually doing the wrong thing. Suppose
that ORIG_HEAD was a dangling symref, then we'd have written the zero
OID into "rebase-merge/orig-head". But when calling into "wt-status.c"
now we read the still-dangling symref value and notice that the zero OID
is different than "ref: refs/heads/dangling".

I dunno. It feels like this is one of many cases where as you start to
think deeply about how things behave you realize that it's been broken
all along. On the other hand, I doubt there was even a single user who
ever experienced this issue. It often just needs to be correct enough.

I think the best way to go about this is to check for `REF_ISSSYMREF`
and exit early in that case. We only want to compare direct refs, so
this is closer to the old behaviour and should even fix edge cases like
the above.

Will update.

Patrick
Junio C Hamano Dec. 14, 2023, 4:41 p.m. UTC | #8
Patrick Steinhardt <ps@pks.im> writes:

>> OK, I somehow got a (wrong) impression that you are very close to
>> the finish line.
>
> You mean with the reftable backend?

It would be a major achievement if we just stop bypassing refs API
to read/write ORIG_HEAD and friends, even if we are still using only
the files backend.  And I meant that I got an impression that you
are very close to that, regardless of the readiness of the reftable
support.
diff mbox series

Patch

diff --git a/wt-status.c b/wt-status.c
index 9f45bf6949..fe9e590b80 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1295,26 +1295,27 @@  static char *read_line_from_git_path(const char *filename)
 static int split_commit_in_progress(struct wt_status *s)
 {
 	int split_in_progress = 0;
-	char *head, *orig_head, *rebase_amend, *rebase_orig_head;
+	struct object_id head_oid, orig_head_oid;
+	char *rebase_amend, *rebase_orig_head;
 
 	if ((!s->amend && !s->nowarn && !s->workdir_dirty) ||
 	    !s->branch || strcmp(s->branch, "HEAD"))
 		return 0;
 
-	head = read_line_from_git_path("HEAD");
-	orig_head = read_line_from_git_path("ORIG_HEAD");
+	if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) ||
+	    read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL))
+		return 0;
+
 	rebase_amend = read_line_from_git_path("rebase-merge/amend");
 	rebase_orig_head = read_line_from_git_path("rebase-merge/orig-head");
 
-	if (!head || !orig_head || !rebase_amend || !rebase_orig_head)
+	if (!rebase_amend || !rebase_orig_head)
 		; /* fall through, no split in progress */
 	else if (!strcmp(rebase_amend, rebase_orig_head))
-		split_in_progress = !!strcmp(head, rebase_amend);
-	else if (strcmp(orig_head, rebase_orig_head))
+		split_in_progress = !!strcmp(oid_to_hex(&head_oid), rebase_amend);
+	else if (strcmp(oid_to_hex(&orig_head_oid), rebase_orig_head))
 		split_in_progress = 1;
 
-	free(head);
-	free(orig_head);
 	free(rebase_amend);
 	free(rebase_orig_head);