Message ID | 1db3eb3945432964aabe1c559db4c3ac251e83fd.1702365291.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | refs: improve handling of special refs | expand |
Patrick Steinhardt <ps@pks.im> writes: > The current code only works by chance because we only have a single > reference backend implementation. Refactor it to instead read both refs > via the refdb layer so that we'll also be compatible with alternate > reference backends. "via the refdb" -> "via the refs API" or something here and on the title, and possibly elsewhere in the proposed log messages and in-code comments in patches in this series, as I've never seen a word "refdb" used in the context of this project. I agree it is bad manners to be intimate with the implementation details of the how files-backend stores HEAD and ORIG_HEAD. > Note that we pass `RESOLVE_REF_NO_RECURSE` to `read_ref_full()`. This is > because we didn't resolve symrefs before either, and in practice none of > the refs in "rebase-merge/" would be symbolic. We thus don't want to > resolve symrefs with the new code either to retain the old behaviour. Good to see a rewrite being careful like this. > Signed-off-by: Patrick Steinhardt <ps@pks.im> > --- > wt-status.c | 17 +++++++++-------- > 1 file changed, 9 insertions(+), 8 deletions(-) > > diff --git a/wt-status.c b/wt-status.c > index 9f45bf6949..fe9e590b80 100644 > --- a/wt-status.c > +++ b/wt-status.c > @@ -1295,26 +1295,27 @@ static char *read_line_from_git_path(const char *filename) > static int split_commit_in_progress(struct wt_status *s) > { > int split_in_progress = 0; > - char *head, *orig_head, *rebase_amend, *rebase_orig_head; > + struct object_id head_oid, orig_head_oid; > + char *rebase_amend, *rebase_orig_head; > > if ((!s->amend && !s->nowarn && !s->workdir_dirty) || > !s->branch || strcmp(s->branch, "HEAD")) > return 0; > > - head = read_line_from_git_path("HEAD"); > - orig_head = read_line_from_git_path("ORIG_HEAD"); > + if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) || > + read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL)) > + return 0; > + This made me wonder if we have changed behaviour when on an unborn branch. In such a case, the original most likely would have read "ref: blah" in "head" and compared it with "rebase_amend", which would be a good way to ensure they would not match. I would not know offhand what the updated code would do, but head_oid would be uninitialized in such a case, so ...? > rebase_amend = read_line_from_git_path("rebase-merge/amend"); > rebase_orig_head = read_line_from_git_path("rebase-merge/orig-head"); > > - if (!head || !orig_head || !rebase_amend || !rebase_orig_head) > + if (!rebase_amend || !rebase_orig_head) > ; /* fall through, no split in progress */ > else if (!strcmp(rebase_amend, rebase_orig_head)) > - split_in_progress = !!strcmp(head, rebase_amend); > - else if (strcmp(orig_head, rebase_orig_head)) > + split_in_progress = !!strcmp(oid_to_hex(&head_oid), rebase_amend); > + else if (strcmp(oid_to_hex(&orig_head_oid), rebase_orig_head)) > split_in_progress = 1; > > - free(head); > - free(orig_head); > free(rebase_amend); > free(rebase_orig_head);
On 12/12/2023 20:24, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > >> The current code only works by chance because we only have a single >> reference backend implementation. Refactor it to instead read both refs >> via the refdb layer so that we'll also be compatible with alternate >> reference backends. > > "via the refdb" -> "via the refs API" or something here and on the > title, and possibly elsewhere in the proposed log messages and > in-code comments in patches in this series, as I've never seen a > word "refdb" used in the context of this project. > > I agree it is bad manners to be intimate with the implementation > details of the how files-backend stores HEAD and ORIG_HEAD. Hmm, I have never thought of the 'pseudo-refs' as being a part of the 'reference database' at all. ;) We seem to have pseudo-refs, special pseudo-refs and (recently) ex-pseudo-refs! This patch (well series) changes the 'status' of some, *but not all*, pseudo-refs; some graduate to full-blown refs stored as part of *a* reference database (ie reftable). As far as I recall, this has not been discussed on the ML. Why are only some chosen to become 'full' refs and others not? This is not discussed in any of the commit messages. The '.invalid' HEAD hack featured in a recent completion patch as well. [If this is because the JAVA implementation does it this way, I think it needs some thought before including it in the Git project]. Anyway, I haven't had the time to study the details here, so please ignore my uninformed ramblings! ATB, Ramsay Jones
Ramsay Jones <ramsay@ramsayjones.plus.com> writes: >> "via the refdb" -> "via the refs API" or something here and on the >> title, and possibly elsewhere in the proposed log messages and >> in-code comments in patches in this series, as I've never seen a >> word "refdb" used in the context of this project. >> >> I agree it is bad manners to be intimate with the implementation >> details of the how files-backend stores HEAD and ORIG_HEAD. > > Hmm, I have never thought of the 'pseudo-refs' as being a part of > the 'reference database' at all. ;) Me neither, but once you start thinking about getting rid of the need to use one-file-per-ref filesystem, being able to maintain all refs, including the pseudo refs, in one r/w store backend, becomes a very tempting goal. From that point of view, I do not have problem with the idea to move _all_ pseudorefs to reftable. But I do have reservations on what Patrick, and the code he inherited from Han-Wen, calls "special refs" (which is not defined in the glossary at all), namely, refs.c:is_special_ref() and its callers. Neither am I very much sympathetic to the hardcoded list of "known" pseudorefs, refs.c:pseudorefs[]. I cannot quite see why we need anything more than any string that passes refs.c:is_pseudoref_syntax() is a pseudoref, is per worktree, and ref backends can store them like any other refs. Many of them have specific meaning and uses (e.g. HEAD is "the current branch"). Enumerating existing pseudorefs in files backend may need to opendir(".git") + readdir() filtered with is_pseudoref_syntax(), and a corresponding implementation for reftable backend may be much simpler (because there won't be "other cruft" stored there, unlike files backend that needs to worry about files that are not refs, like ".git/config" file. > We seem to have pseudo-refs, special pseudo-refs and (recently) > ex-pseudo-refs! > > This patch (well series) changes the 'status' of some, *but not all*, > pseudo-refs; some graduate to full-blown refs stored as part of *a* > reference database (ie reftable). Yeah, that leaves bad taste in my mouth, too.
On Tue, Dec 12, 2023 at 04:36:24PM -0800, Junio C Hamano wrote: > Ramsay Jones <ramsay@ramsayjones.plus.com> writes: > > >> "via the refdb" -> "via the refs API" or something here and on the > >> title, and possibly elsewhere in the proposed log messages and > >> in-code comments in patches in this series, as I've never seen a > >> word "refdb" used in the context of this project. > >> > >> I agree it is bad manners to be intimate with the implementation > >> details of the how files-backend stores HEAD and ORIG_HEAD. > > > > Hmm, I have never thought of the 'pseudo-refs' as being a part of > > the 'reference database' at all. ;) > > Me neither, but once you start thinking about getting rid of the > need to use one-file-per-ref filesystem, being able to maintain all > refs, including the pseudo refs, in one r/w store backend, becomes a > very tempting goal. From that point of view, I do not have problem > with the idea to move _all_ pseudorefs to reftable. Yes, we're in agreement. > But I do have reservations on what Patrick, and the code he > inherited from Han-Wen, calls "special refs" (which is not defined > in the glossary at all), namely, refs.c:is_special_ref() and its > callers. I do not want to add "special refs" to the glossary because ultimately they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD. Once we're there we can of course discuss whether we want to explicitly point them out in the glossary and give them a special name. > Neither am I very much sympathetic to the hardcoded list > of "known" pseudorefs, refs.c:pseudorefs[]. I cannot quite see why > we need anything more than > any string that passes refs.c:is_pseudoref_syntax() is a > pseudoref, is per worktree, and ref backends can store them like > any other refs. Many of them have specific meaning and uses > (e.g. HEAD is "the current branch"). > > Enumerating existing pseudorefs in files backend may need to > opendir(".git") + readdir() filtered with is_pseudoref_syntax(), > and a corresponding implementation for reftable backend may be much > simpler (because there won't be "other cruft" stored there, unlike > files backend that needs to worry about files that are not refs, > like ".git/config" file. > > > We seem to have pseudo-refs, special pseudo-refs and (recently) > > ex-pseudo-refs! > > > > This patch (well series) changes the 'status' of some, *but not all*, > > pseudo-refs; some graduate to full-blown refs stored as part of *a* > > reference database (ie reftable). > > Yeah, that leaves bad taste in my mouth, too. I'm taking an iterative approach to things, which means that we're at times going to be in an in-between state. I want to avoid changing too many things at once and overwhelming potential reviewers. But I realize that I should've done a better job of explaining that this patch series is not the end goal, but rather a step towards that goal. The patch series at hand merely records the status quo and rectifies any inconsistencies we have with accessing such "special" refs. The natural next step here would be to reduce the list of special refs (like e.g. we do in patch 4) so that in the end it will only contain those refs which really are special (FETCH_HEAD, MERGE_HEAD). Please let me know in case you strongly disagree with my iterative approach, or whether the issue is rather that I didn't make myself sufficiently clear. Patrick
Patrick Steinhardt <ps@pks.im> writes: >> Me neither, but once you start thinking about getting rid of the >> need to use one-file-per-ref filesystem, being able to maintain all >> refs, including the pseudo refs, in one r/w store backend, becomes a >> very tempting goal. From that point of view, I do not have problem >> with the idea to move _all_ pseudorefs to reftable. > > Yes, we're in agreement. > >> But I do have reservations on what Patrick, and the code he >> inherited from Han-Wen, calls "special refs" (which is not defined >> in the glossary at all), namely, refs.c:is_special_ref() and its >> callers. > > I do not want to add "special refs" to the glossary because ultimately > they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD. > Once we're there we can of course discuss whether we want to explicitly > point them out in the glossary and give them a special name. OK, I somehow got a (wrong) impression that you are very close to the finish line. If the intention is to leave many others still in the "special" category (for only the reasons of inertia), with a vision that some selected few must remain "special" with their own good reasons, then I am perfectly fine. Thanks.
On Wed, Dec 13, 2023 at 07:15:33AM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > >> Me neither, but once you start thinking about getting rid of the > >> need to use one-file-per-ref filesystem, being able to maintain all > >> refs, including the pseudo refs, in one r/w store backend, becomes a > >> very tempting goal. From that point of view, I do not have problem > >> with the idea to move _all_ pseudorefs to reftable. > > > > Yes, we're in agreement. > > > >> But I do have reservations on what Patrick, and the code he > >> inherited from Han-Wen, calls "special refs" (which is not defined > >> in the glossary at all), namely, refs.c:is_special_ref() and its > >> callers. > > > > I do not want to add "special refs" to the glossary because ultimately > > they should go away, with two exceptions: FETCH_HEAD and MERGE_HEAD. > > Once we're there we can of course discuss whether we want to explicitly > > point them out in the glossary and give them a special name. > > OK, I somehow got a (wrong) impression that you are very close to > the finish line. You mean with the reftable backend? I indeed am quite close, I've just finished the last prerequisite ("extensions.refFormat" and related tooling) today. I will send that patch series upstream for review once my patches that fix repo initialization with git-clone(1) land in the "next" branch. The current state at [1] passes CI, even though there will of course still be bugs which aren't covered by the test suite. So once all prerequisites that are currently in flight plus the pending "extensions.refFormat" series have landed I will send the reftable backend implementation in for review. If things continue to go smoothly I expect that this may happen at the end of January/start of February. Anyway. This patch series here is in fact already sufficient to make reftables work with those special refs. The only thing that we require in this context is that refs are either exclusively routed through the filesystem, or exclusively routed through the ref API. If that property holds then things work just fine. But still, I do want to clean up the remaining special refs regardless of that, even though it is not a mandatory prerequisite. I find that the current state is just plain confusing with all these special cases, and I'd really love for it to be simplified. Also, I think there is benefit in having those refs in reftables because it does allow for proper atomic updates. > If the intention is to leave many others still in > the "special" category (for only the reasons of inertia), with a > vision that some selected few must remain "special" with their own > good reasons, then I am perfectly fine. Okay. Patrick [1]: https://gitlab.com/gitlab-org/git/-/merge_requests/58
On Tue, Dec 12, 2023 at 12:24:24PM -0800, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: [snip] > > diff --git a/wt-status.c b/wt-status.c > > index 9f45bf6949..fe9e590b80 100644 > > --- a/wt-status.c > > +++ b/wt-status.c > > @@ -1295,26 +1295,27 @@ static char *read_line_from_git_path(const char *filename) > > static int split_commit_in_progress(struct wt_status *s) > > { > > int split_in_progress = 0; > > - char *head, *orig_head, *rebase_amend, *rebase_orig_head; > > + struct object_id head_oid, orig_head_oid; > > + char *rebase_amend, *rebase_orig_head; > > > > if ((!s->amend && !s->nowarn && !s->workdir_dirty) || > > !s->branch || strcmp(s->branch, "HEAD")) > > return 0; > > > > - head = read_line_from_git_path("HEAD"); > > - orig_head = read_line_from_git_path("ORIG_HEAD"); > > + if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) || > > + read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL)) > > + return 0; > > + > > This made me wonder if we have changed behaviour when on an unborn > branch. In such a case, the original most likely would have read > "ref: blah" in "head" and compared it with "rebase_amend", which > would be a good way to ensure they would not match. I would not > know offhand what the updated code would do, but head_oid would be > uninitialized in such a case, so ...? The code flow goes as following: 1. We call `read_ref_full()`, which itself calls `refs_resolve_ref_unsafe()`. 2. It calls `refs_read_raw_ref()`, which succeeds and finds the symref. 3. We notice that this is a symref and that `RESOLVE_REF_NO_RECURSE` is set. We thus clear the object ID and return the name of the symref target. 4. Back in `read_ref_full()` we see that `refs_resolve_ref_unsafe()` returns the symref target, which we interpret as successful lookup. We thus return `0`. 5. Now we look up "rebase-merge/{amend,orig-head}" and end up comparing whatever they contain with the cleared OID resolved from HEAD/ORIG_HEAD. So the OID would not be uninitialized but the zero OID. Now: - "rebase-merge/amend" always contains the result of `repo_get_oid()` and never contains the zero OID. - "rebase-merge/orig-head" may contain the zero OID when there was no ORIG_HEAD at the time of starting a rebase or in case it did not resolve So... if ORIG_HEAD was rewritten to be a symref pointing into nirvana between starting the rebase and calling into "wt-status.c", and when ORIG_HEAD didn't exist at the time of starting the rebase, then we might now wrongly report that splitting was in progress. In other cases the old code was actually doing the wrong thing. Suppose that ORIG_HEAD was a dangling symref, then we'd have written the zero OID into "rebase-merge/orig-head". But when calling into "wt-status.c" now we read the still-dangling symref value and notice that the zero OID is different than "ref: refs/heads/dangling". I dunno. It feels like this is one of many cases where as you start to think deeply about how things behave you realize that it's been broken all along. On the other hand, I doubt there was even a single user who ever experienced this issue. It often just needs to be correct enough. I think the best way to go about this is to check for `REF_ISSSYMREF` and exit early in that case. We only want to compare direct refs, so this is closer to the old behaviour and should even fix edge cases like the above. Will update. Patrick
Patrick Steinhardt <ps@pks.im> writes: >> OK, I somehow got a (wrong) impression that you are very close to >> the finish line. > > You mean with the reftable backend? It would be a major achievement if we just stop bypassing refs API to read/write ORIG_HEAD and friends, even if we are still using only the files backend. And I meant that I got an impression that you are very close to that, regardless of the readiness of the reftable support.
diff --git a/wt-status.c b/wt-status.c index 9f45bf6949..fe9e590b80 100644 --- a/wt-status.c +++ b/wt-status.c @@ -1295,26 +1295,27 @@ static char *read_line_from_git_path(const char *filename) static int split_commit_in_progress(struct wt_status *s) { int split_in_progress = 0; - char *head, *orig_head, *rebase_amend, *rebase_orig_head; + struct object_id head_oid, orig_head_oid; + char *rebase_amend, *rebase_orig_head; if ((!s->amend && !s->nowarn && !s->workdir_dirty) || !s->branch || strcmp(s->branch, "HEAD")) return 0; - head = read_line_from_git_path("HEAD"); - orig_head = read_line_from_git_path("ORIG_HEAD"); + if (read_ref_full("HEAD", RESOLVE_REF_NO_RECURSE, &head_oid, NULL) || + read_ref_full("ORIG_HEAD", RESOLVE_REF_NO_RECURSE, &orig_head_oid, NULL)) + return 0; + rebase_amend = read_line_from_git_path("rebase-merge/amend"); rebase_orig_head = read_line_from_git_path("rebase-merge/orig-head"); - if (!head || !orig_head || !rebase_amend || !rebase_orig_head) + if (!rebase_amend || !rebase_orig_head) ; /* fall through, no split in progress */ else if (!strcmp(rebase_amend, rebase_orig_head)) - split_in_progress = !!strcmp(head, rebase_amend); - else if (strcmp(orig_head, rebase_orig_head)) + split_in_progress = !!strcmp(oid_to_hex(&head_oid), rebase_amend); + else if (strcmp(oid_to_hex(&orig_head_oid), rebase_orig_head)) split_in_progress = 1; - free(head); - free(orig_head); free(rebase_amend); free(rebase_orig_head);
We read both the HEAD and ORIG_HEAD references directly from the filesystem in order to figure out whether we're currently splitting a commit. If both of the following are true: - HEAD points to the same object as "rebase-merge/amend". - ORIG_HEAD points to the same object as "rebase-merge/orig-head". Then we are currently splitting commits. The current code only works by chance because we only have a single reference backend implementation. Refactor it to instead read both refs via the refdb layer so that we'll also be compatible with alternate reference backends. Note that we pass `RESOLVE_REF_NO_RECURSE` to `read_ref_full()`. This is because we didn't resolve symrefs before either, and in practice none of the refs in "rebase-merge/" would be symbolic. We thus don't want to resolve symrefs with the new code either to retain the old behaviour. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- wt-status.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)