Message ID | 0e38103114a206bedbbbd7ea97cb77fa05fd3c29.1701243201.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Series | refs: improve handling of special refs |
On Wed, Nov 29, 2023 at 09:14:20AM +0100, Patrick Steinhardt wrote:
> We have some references that are more special than others. The reason
> for them being special is that they either do not follow the usual
> format of references, or that they are written to the filesystem
> directly by the respective owning subsystem and thus circumvent the
> reference backend.
>
> This works perfectly fine right now because the reffiles backend will
> know how to read those refs just fine. But with the prospect of gaining
> a new reference backend implementation we need to be a lot more careful
> here:
>
>   - We need to make sure that we are consistent about how those refs are
>     written. They must either always be written via the filesystem, or
>     they must always be written via the reference backend. Any mixture
>     will lead to inconsistent state.
>
>   - We need to make sure that such special refs are always handled
>     specially when reading them.
>
> We're already mostly good with regard to the first item, except for
> `BISECT_EXPECTED_REV` which will be addressed in a subsequent commit.
> But the current list of special refs is missing a lot of refs that
> really should be treated specially. Right now, we only treat
> `FETCH_HEAD` and `MERGE_HEAD` specially here.
>
> Introduce a new function `is_special_ref()` that contains all current
> instances of special refs to fix the reading path.
>
> Based-on-patch-by: Han-Wen Nienhuys <hanwenn@gmail.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
>  refs.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
>  1 file changed, 56 insertions(+), 2 deletions(-)
>
> diff --git a/refs.c b/refs.c
> index 7d4a057f36..2d39d3fe80 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -1822,15 +1822,69 @@ static int refs_read_special_head(struct ref_store *ref_store,
>  	return result;
>  }
>
> +static int is_special_ref(const char *refname)
> +{
> +	/*
> +	 * Special references get written and read directly via the filesystem
> +	 * by the subsystems that create them. Thus, they must not go through
> +	 * the reference backend but must instead be read directly. It is
> +	 * arguable whether this behaviour is sensible, or whether it's simply
> +	 * a leaky abstraction enabled by us only having a single reference
> +	 * backend implementation. But at least for a subset of references it
> +	 * indeed does make sense to treat them specially:
> +	 *
> +	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
> +	 *   carries additional metadata like where it came from.
> +	 *
> +	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
> +	 *   heads.
> +	 *
> +	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
> +	 *   rebases, where keeping it closely together feels sensible.
> +	 *
> +	 * There are some exceptions that you might expect to see on this list
> +	 * but which are handled exclusively via the reference backend:
> +	 *
> +	 * - CHERRY_PICK_HEAD
> +	 * - HEAD
> +	 * - ORIG_HEAD
> +	 *
> +	 * Writing or deleting references must consistently go either through
> +	 * the filesystem (special refs) or through the reference backend
> +	 * (normal ones).
> +	 */
> +	const char * const special_refs[] = {
> +		"AUTO_MERGE",
> +		"BISECT_EXPECTED_REV",
> +		"FETCH_HEAD",
> +		"MERGE_AUTOSTASH",
> +		"MERGE_HEAD",
> +	};

Is there a reason that we don't want to declare this statically? If we
did, I think we could drop one const, since the strings would instead
reside in the .rodata section.

> +	int i;

Not that it matters for this case, but it may be worth declaring i to be
an unsigned type, since it's used as an index into an array. size_t
seems like an appropriate choice there.

> +	for (i = 0; i < ARRAY_SIZE(special_refs); i++)
> +		if (!strcmp(refname, special_refs[i]))
> +			return 1;
> +
> +	/*
> +	 * git-rebase(1) stores its state in `rebase-apply/` or
> +	 * `rebase-merge/`, including various reference-like bits.
> +	 */
> +	if (starts_with(refname, "rebase-apply/") ||
> +	    starts_with(refname, "rebase-merge/"))

Do we care about case sensitivity here? Definitely not on case-sensitive
filesystems, but I'm not sure about case-insensitive ones. For instance,
on macOS, I can do:

    $ git rev-parse hEAd

and get the same value as "git rev-parse HEAD" (on my Linux workstation,
this fails as expected).

I doubt that there are many users in the wild asking to resolve
reBASe-APPLY/xyz, but I think that after this patch that would no longer
work as-is, so we may want to replace this with istarts_with() instead.

Thanks,
Taylor
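For reference, the two declaration suggestions above (static storage for the table and a `size_t` loop index) could look roughly like the following. This is only a sketch, not the version that was eventually applied: `ARRAY_SIZE()` and `starts_with()` are local stand-ins for Git's helpers so that the snippet compiles on its own, and both `const` qualifiers are kept because the question of dropping one is left open in the review.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for Git's ARRAY_SIZE() macro (assumption). */
#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Local stand-in for Git's starts_with() helper (assumption). */
static int starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

static int is_special_ref(const char *refname)
{
	/*
	 * "static" gives the table static storage duration, so it is
	 * not set up again on every call.
	 */
	static const char * const special_refs[] = {
		"AUTO_MERGE",
		"BISECT_EXPECTED_REV",
		"FETCH_HEAD",
		"MERGE_AUTOSTASH",
		"MERGE_HEAD",
	};
	size_t i; /* unsigned index type, as suggested in the review */

	for (i = 0; i < ARRAY_SIZE(special_refs); i++)
		if (!strcmp(refname, special_refs[i]))
			return 1;

	return starts_with(refname, "rebase-apply/") ||
	       starts_with(refname, "rebase-merge/");
}
```

With this sketch, `is_special_ref("FETCH_HEAD")` and `is_special_ref("rebase-merge/orig-head")` are true, while `is_special_ref("HEAD")` is false.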
On Wed, Nov 29, 2023 at 04:59:35PM -0500, Taylor Blau wrote:
> On Wed, Nov 29, 2023 at 09:14:20AM +0100, Patrick Steinhardt wrote:
> > We have some references that are more special than others. The reason
> > for them being special is that they either do not follow the usual
> > format of references, or that they are written to the filesystem
> > directly by the respective owning subsystem and thus circumvent the
> > reference backend.
> >
> > This works perfectly fine right now because the reffiles backend will
> > know how to read those refs just fine. But with the prospect of gaining
> > a new reference backend implementation we need to be a lot more careful
> > here:
> >
> >   - We need to make sure that we are consistent about how those refs are
> >     written. They must either always be written via the filesystem, or
> >     they must always be written via the reference backend. Any mixture
> >     will lead to inconsistent state.
> >
> >   - We need to make sure that such special refs are always handled
> >     specially when reading them.
> >
> > We're already mostly good with regard to the first item, except for
> > `BISECT_EXPECTED_REV` which will be addressed in a subsequent commit.
> > But the current list of special refs is missing a lot of refs that
> > really should be treated specially. Right now, we only treat
> > `FETCH_HEAD` and `MERGE_HEAD` specially here.
> >
> > Introduce a new function `is_special_ref()` that contains all current
> > instances of special refs to fix the reading path.
> >
> > Based-on-patch-by: Han-Wen Nienhuys <hanwenn@gmail.com>
> > Signed-off-by: Patrick Steinhardt <ps@pks.im>
> > ---
> >  refs.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
> >  1 file changed, 56 insertions(+), 2 deletions(-)
> >
> > diff --git a/refs.c b/refs.c
> > index 7d4a057f36..2d39d3fe80 100644
> > --- a/refs.c
> > +++ b/refs.c
> > @@ -1822,15 +1822,69 @@ static int refs_read_special_head(struct ref_store *ref_store,
> >  	return result;
> >  }
> >
> > +static int is_special_ref(const char *refname)
> > +{
> > +	/*
> > +	 * Special references get written and read directly via the filesystem
> > +	 * by the subsystems that create them. Thus, they must not go through
> > +	 * the reference backend but must instead be read directly. It is
> > +	 * arguable whether this behaviour is sensible, or whether it's simply
> > +	 * a leaky abstraction enabled by us only having a single reference
> > +	 * backend implementation. But at least for a subset of references it
> > +	 * indeed does make sense to treat them specially:
> > +	 *
> > +	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
> > +	 *   carries additional metadata like where it came from.
> > +	 *
> > +	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
> > +	 *   heads.
> > +	 *
> > +	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
> > +	 *   rebases, where keeping it closely together feels sensible.
> > +	 *
> > +	 * There are some exceptions that you might expect to see on this list
> > +	 * but which are handled exclusively via the reference backend:
> > +	 *
> > +	 * - CHERRY_PICK_HEAD
> > +	 * - HEAD
> > +	 * - ORIG_HEAD
> > +	 *
> > +	 * Writing or deleting references must consistently go either through
> > +	 * the filesystem (special refs) or through the reference backend
> > +	 * (normal ones).
> > +	 */
> > +	const char * const special_refs[] = {
> > +		"AUTO_MERGE",
> > +		"BISECT_EXPECTED_REV",
> > +		"FETCH_HEAD",
> > +		"MERGE_AUTOSTASH",
> > +		"MERGE_HEAD",
> > +	};
>
> Is there a reason that we don't want to declare this statically? If we
> did, I think we could drop one const, since the strings would instead
> reside in the .rodata section.

Not really, no.

> > +	int i;
>
> Not that it matters for this case, but it may be worth declaring i to be
> an unsigned type, since it's used as an index into an array. size_t
> seems like an appropriate choice there.

Hm. We do use `int` almost everywhere when iterating through an array
via `ARRAY_SIZE`, but ultimately I don't mind whether it's `int`,
`unsigned` or `size_t`.

> > +	for (i = 0; i < ARRAY_SIZE(special_refs); i++)
> > +		if (!strcmp(refname, special_refs[i]))
> > +			return 1;
> > +
> > +	/*
> > +	 * git-rebase(1) stores its state in `rebase-apply/` or
> > +	 * `rebase-merge/`, including various reference-like bits.
> > +	 */
> > +	if (starts_with(refname, "rebase-apply/") ||
> > +	    starts_with(refname, "rebase-merge/"))
>
> Do we care about case sensitivity here? Definitely not on case-sensitive
> filesystems, but I'm not sure about case-insensitive ones. For instance,
> on macOS, I can do:
>
>     $ git rev-parse hEAd
>
> and get the same value as "git rev-parse HEAD" (on my Linux workstation,
> this fails as expected).
>
> I doubt that there are many users in the wild asking to resolve
> reBASe-APPLY/xyz, but I think that after this patch that would no longer
> work as-is, so we may want to replace this with istarts_with() instead.

In practice I'd argue that nobody is ever going to ask for something in
`rebase-apply/` outside of Git internals or scripts, and I'd expect
these to always use proper casing. So I rather lean towards a "no, we
don't care about case sensitivity".

Patrick
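To make the case-sensitivity trade-off concrete, here is a small sketch contrasting an exact prefix match with a case-insensitive one. Both functions are local stand-ins written for this example; they are assumed to behave like Git's `starts_with()` and `istarts_with()` but are not copied from Git's sources.

```c
#include <ctype.h>
#include <string.h>

/* Exact, case-sensitive prefix match (stand-in for starts_with()). */
static int starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * ASCII case-insensitive prefix match (stand-in for istarts_with()).
 * If str is shorter than prefix, its terminating NUL fails the
 * comparison and the function returns 0.
 */
static int istarts_with(const char *str, const char *prefix)
{
	for (; *prefix; str++, prefix++)
		if (tolower((unsigned char)*str) !=
		    tolower((unsigned char)*prefix))
			return 0;
	return 1;
}
```

With the exact match, "reBASe-APPLY/xyz" is not recognized as living under "rebase-apply/"; with the case-insensitive variant it is. Which behaviour is wanted is the open question in the exchange above.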
Hi Patrick

Thanks for working on this. I've left a couple of thoughts below.

On 29/11/2023 08:14, Patrick Steinhardt wrote:
> +static int is_special_ref(const char *refname)
> +{
> +	/*
> +	 * Special references get written and read directly via the filesystem
> +	 * by the subsystems that create them. Thus, they must not go through
> +	 * the reference backend but must instead be read directly. It is
> +	 * arguable whether this behaviour is sensible, or whether it's simply
> +	 * a leaky abstraction enabled by us only having a single reference
> +	 * backend implementation. But at least for a subset of references it
> +	 * indeed does make sense to treat them specially:
> +	 *
> +	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
> +	 *   carries additional metadata like where it came from.
> +	 *
> +	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
> +	 *   heads.
> +	 *
> +	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
> +	 *   rebases, where keeping it closely together feels sensible.

I'd really like to get away from treating these files as refs. I think their
use as refs is purely historic and predates the reflog and possibly
ORIG_HEAD. These days I'm not sure there is a good reason to be running

    git rev-parse rebase-merge/orig-head

One reason for not wanting to treat them as refs is that we do not handle
multi-level refs that do not begin with "refs/" consistently.

    git update-ref foo/bar HEAD

succeeds and creates .git/foo/bar but

    git update-ref -d foo/bar

fails with

    error: refusing to update ref with bad name 'foo/bar'

To me it would make sense to refuse to create 'foo/bar' but allow an
existing ref named 'foo/bar' to be deleted but the current behavior is the
opposite of that.

I'd be quite happy to see us refuse to treat anything that fails

    if (starts_with(refname, "refs/") || refname_is_safe(refname))

as a ref but I don't know how much pain that would cause.

> +	const char * const special_refs[] = {
> +		"AUTO_MERGE",

Is there any reason to treat this specially in the long term? It points to a
tree rather than a commit but unlike MERGE_HEAD and FETCH_HEAD it is
effectively a "normal" ref.

> +		"BISECT_EXPECTED_REV",
> +		"FETCH_HEAD",
> +		"MERGE_AUTOSTASH",

Should we be treating this as a ref? I thought it was written as an
implementation detail of the autostash implementation rather than to provide
a ref for users and scripts.

Best Wishes

Phillip
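The predicate proposed above can be sketched as follows. Note the assumptions: `is_root_ref_like()` is a simplified, hypothetical stand-in for Git's `refname_is_safe()` that only accepts non-empty names made of uppercase ASCII letters and underscores, and `would_be_treated_as_ref()` is a name invented for this example.

```c
#include <ctype.h>
#include <string.h>

/* Local stand-in for Git's starts_with() helper (assumption). */
static int starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/*
 * Hypothetical, simplified stand-in for refname_is_safe(): accept
 * only non-empty names consisting of uppercase letters and '_',
 * such as FETCH_HEAD or ORIG_HEAD.
 */
static int is_root_ref_like(const char *refname)
{
	if (!*refname)
		return 0;
	for (; *refname; refname++)
		if (!isupper((unsigned char)*refname) && *refname != '_')
			return 0;
	return 1;
}

/* The proposed rule: anything failing this check is not a ref. */
static int would_be_treated_as_ref(const char *refname)
{
	return starts_with(refname, "refs/") || is_root_ref_like(refname);
}
```

Under this sketch "refs/heads/main" and "FETCH_HEAD" pass, while "foo/bar" and "rebase-merge/orig-head" are rejected, which is the outcome argued for above.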
On Thu, Nov 30, 2023 at 03:42:06PM +0000, Phillip Wood wrote:
> Hi Patrick
>
> Thanks for working on this. I've left a couple of thoughts below.
>
> On 29/11/2023 08:14, Patrick Steinhardt wrote:
> > +static int is_special_ref(const char *refname)
> > +{
> > +	/*
> > +	 * Special references get written and read directly via the filesystem
> > +	 * by the subsystems that create them. Thus, they must not go through
> > +	 * the reference backend but must instead be read directly. It is
> > +	 * arguable whether this behaviour is sensible, or whether it's simply
> > +	 * a leaky abstraction enabled by us only having a single reference
> > +	 * backend implementation. But at least for a subset of references it
> > +	 * indeed does make sense to treat them specially:
> > +	 *
> > +	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
> > +	 *   carries additional metadata like where it came from.
> > +	 *
> > +	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
> > +	 *   heads.
> > +	 *
> > +	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
> > +	 *   rebases, where keeping it closely together feels sensible.
>
> I'd really like to get away from treating these files as refs. I think their
> use as refs is purely historic and predates the reflog and possibly
> ORIG_HEAD. These days I'm not sure there is a good reason to be running
>
>     git rev-parse rebase-merge/orig-head
>
> One reason for not wanting to treat them as refs is that we do not handle
> multi-level refs that do not begin with "refs/" consistently.
>
>     git update-ref foo/bar HEAD
>
> succeeds and creates .git/foo/bar but
>
>     git update-ref -d foo/bar
>
> fails with
>
>     error: refusing to update ref with bad name 'foo/bar'
>
> To me it would make sense to refuse to create 'foo/bar' but allow an
> existing ref named 'foo/bar' to be deleted but the current behavior is the
> opposite of that.
>
> I'd be quite happy to see us refuse to treat anything that fails
>
>     if (starts_with(refname, "refs/") || refname_is_safe(refname))
>
> as a ref but I don't know how much pain that would cause.

Well, we already do use these internally as references, but I don't
disagree with you. I think the current state is extremely confusing,
which is why my first approach was to simply document what falls into
the category of these "special" references.

In my mind, this patch series here is a first step towards addressing
the problem more generally. For now it is more or less only documenting
_what_ is a special ref and why they are special, while also ensuring
that these refs are compatible with the reftable backend. But once this
lands, I'd certainly want to see us continue to iterate on this.

Most importantly, I'd love to see us address two issues:

  - Start to refuse writing these special refs via the refdb so that
    the rules I've now laid out are also enforced. This would also
    address your point about things being inconsistent.

  - Gradually reduce the list of special refs so that they are reduced
    to a bare minimum and so that most refs are simply that, a normal
    ref.

> > +	const char * const special_refs[] = {
> > +		"AUTO_MERGE",
>
> Is there any reason to treat this specially in the long term? It points to a
> tree rather than a commit but unlike MERGE_HEAD and FETCH_HEAD it is
> effectively a "normal" ref.

No, I'd love to see this and others converted to become a normal ref
eventually. The goal of this patch series was mostly to document what we
already have, and address those cases which are inconsistent with the
new rules. But I'd be happy to convert more of these special refs to
become normal refs after it lands.

> > +		"BISECT_EXPECTED_REV",
> > +		"FETCH_HEAD",
> > +		"MERGE_AUTOSTASH",
>
> Should we be treating this as a ref? I thought it was written as an
> implementation detail of the autostash implementation rather than to provide
> a ref for users and scripts.

Yes, we have to in the context of the reftable backend. There's a bunch
of tests that exercise our ability to parse this as a ref, and they
would otherwise fail with the reftable backend.

Patrick
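The first follow-up item above, refusing to write special refs via the refdb, is not part of this series. Purely as an illustration of the rule being described, a guard could look like the sketch below. Everything here is hypothetical: `enum ref_write_path` and `ref_write_allowed()` are invented for this example, and `is_special_ref()` is repeated in a self-contained form with a local `starts_with()` stand-in.

```c
#include <stddef.h>
#include <string.h>

/* Local stand-in for Git's starts_with() helper (assumption). */
static int starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Same list as in the patch under review. */
static int is_special_ref(const char *refname)
{
	static const char * const special_refs[] = {
		"AUTO_MERGE",
		"BISECT_EXPECTED_REV",
		"FETCH_HEAD",
		"MERGE_AUTOSTASH",
		"MERGE_HEAD",
	};
	size_t i;

	for (i = 0; i < sizeof(special_refs) / sizeof(special_refs[0]); i++)
		if (!strcmp(refname, special_refs[i]))
			return 1;

	return starts_with(refname, "rebase-apply/") ||
	       starts_with(refname, "rebase-merge/");
}

/* Hypothetical: the two ways a ref could be written. */
enum ref_write_path {
	REF_WRITE_REFDB,
	REF_WRITE_FILESYSTEM
};

/*
 * Hypothetical guard: special refs may only be written directly via
 * the filesystem, all other refs only via the reference backend.
 */
static int ref_write_allowed(const char *refname, enum ref_write_path path)
{
	if (is_special_ref(refname))
		return path == REF_WRITE_FILESYSTEM;
	return path == REF_WRITE_REFDB;
}
```

A caller on the refdb write path would reject the update whenever `ref_write_allowed(refname, REF_WRITE_REFDB)` returns 0, which keeps the "never mix the two write paths" rule from the commit message enforceable in one place.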
Hi Patrick

On 01/12/2023 06:43, Patrick Steinhardt wrote:
> On Thu, Nov 30, 2023 at 03:42:06PM +0000, Phillip Wood wrote:
>> Hi Patrick
>>
>> Thanks for working on this. I've left a couple of thoughts below.
>>
>> On 29/11/2023 08:14, Patrick Steinhardt wrote:
>>> +static int is_special_ref(const char *refname)
>>> +{
>>> +	/*
>>> +	 * Special references get written and read directly via the filesystem
>>> +	 * by the subsystems that create them. Thus, they must not go through
>>> +	 * the reference backend but must instead be read directly. It is
>>> +	 * arguable whether this behaviour is sensible, or whether it's simply
>>> +	 * a leaky abstraction enabled by us only having a single reference
>>> +	 * backend implementation. But at least for a subset of references it
>>> +	 * indeed does make sense to treat them specially:
>>> +	 *
>>> +	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
>>> +	 *   carries additional metadata like where it came from.
>>> +	 *
>>> +	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
>>> +	 *   heads.
>>> +	 *
>>> +	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
>>> +	 *   rebases, where keeping it closely together feels sensible.
>>
>> I'd really like to get away from treating these files as refs. I think their
>> use as refs is purely historic and predates the reflog and possibly
>> ORIG_HEAD. These days I'm not sure there is a good reason to be running
>>
>>     git rev-parse rebase-merge/orig-head
>>
>> One reason for not wanting to treat them as refs is that we do not handle
>> multi-level refs that do not begin with "refs/" consistently.
>>
>>     git update-ref foo/bar HEAD
>>
>> succeeds and creates .git/foo/bar but
>>
>>     git update-ref -d foo/bar
>>
>> fails with
>>
>>     error: refusing to update ref with bad name 'foo/bar'
>>
>> To me it would make sense to refuse to create 'foo/bar' but allow an
>> existing ref named 'foo/bar' to be deleted but the current behavior is the
>> opposite of that.
>>
>> I'd be quite happy to see us refuse to treat anything that fails
>>
>>     if (starts_with(refname, "refs/") || refname_is_safe(refname))
>>
>> as a ref but I don't know how much pain that would cause.
>
> Well, we already do use these internally as references, but I don't
> disagree with you.

I should have been clearer that I was talking about the refs starting
"rebase-*" rather than FETCH_HEAD and MERGE_HEAD. As a user I find it
convenient to be able to run "git fetch ... && git log -p FETCH_HEAD"
even if the implementation is a bit ugly. As far as I can see we do not
use "rebase-(apply|merge)/(orig-head|amend|autostash)" as a ref in our
code or tests.

> I think the current state is extremely confusing,
> which is why my first approach was to simply document what falls into
> the category of these "special" references.

That's certainly a good place to start.

> In my mind, this patch series here is a first step towards addressing
> the problem more generally. For now it is more or less only documenting
> _what_ is a special ref and why they are special, while also ensuring
> that these refs are compatible with the reftable backend. But once this
> lands, I'd certainly want to see us continue to iterate on this.
>
> Most importantly, I'd love to see us address two issues:
>
>   - Start to refuse writing these special refs via the refdb so that
>     the rules I've now laid out are also enforced. This would also
>     address your point about things being inconsistent.
>
>   - Gradually reduce the list of special refs so that they are reduced
>     to a bare minimum and so that most refs are simply that, a normal
>     ref.

That sounds like a good plan.

>>> +	const char * const special_refs[] = {
>>> +		"AUTO_MERGE",
>>
>> Is there any reason to treat this specially in the long term? It points to a
>> tree rather than a commit but unlike MERGE_HEAD and FETCH_HEAD it is
>> effectively a "normal" ref.
>
> No, I'd love to see this and others converted to become a normal ref
> eventually. The goal of this patch series was mostly to document what we
> already have, and address those cases which are inconsistent with the
> new rules. But I'd be happy to convert more of these special refs to
> become normal refs after it lands.

That's great.

>>> +		"BISECT_EXPECTED_REV",
>>> +		"FETCH_HEAD",
>>> +		"MERGE_AUTOSTASH",
>>
>> Should we be treating this as a ref? I thought it was written as an
>> implementation detail of the autostash implementation rather than to provide
>> a ref for users and scripts.
>
> Yes, we have to in the context of the reftable backend. There's a bunch
> of tests that exercise our ability to parse this as a ref, and they
> would otherwise fail with the reftable backend.

Ah, looking at the man page for "git merge" it seems we do actually
document the existence of MERGE_AUTOSTASH so it is not just an
implementation detail after all.

Best Wishes

Phillip
diff --git a/refs.c b/refs.c
index 7d4a057f36..2d39d3fe80 100644
--- a/refs.c
+++ b/refs.c
@@ -1822,15 +1822,69 @@ static int refs_read_special_head(struct ref_store *ref_store,
 	return result;
 }
 
+static int is_special_ref(const char *refname)
+{
+	/*
+	 * Special references get written and read directly via the filesystem
+	 * by the subsystems that create them. Thus, they must not go through
+	 * the reference backend but must instead be read directly. It is
+	 * arguable whether this behaviour is sensible, or whether it's simply
+	 * a leaky abstraction enabled by us only having a single reference
+	 * backend implementation. But at least for a subset of references it
+	 * indeed does make sense to treat them specially:
+	 *
+	 * - FETCH_HEAD may contain multiple object IDs, and each one of them
+	 *   carries additional metadata like where it came from.
+	 *
+	 * - MERGE_HEAD may contain multiple object IDs when merging multiple
+	 *   heads.
+	 *
+	 * - "rebase-apply/" and "rebase-merge/" contain all of the state for
+	 *   rebases, where keeping it closely together feels sensible.
+	 *
+	 * There are some exceptions that you might expect to see on this list
+	 * but which are handled exclusively via the reference backend:
+	 *
+	 * - CHERRY_PICK_HEAD
+	 * - HEAD
+	 * - ORIG_HEAD
+	 *
+	 * Writing or deleting references must consistently go either through
+	 * the filesystem (special refs) or through the reference backend
+	 * (normal ones).
+	 */
+	const char * const special_refs[] = {
+		"AUTO_MERGE",
+		"BISECT_EXPECTED_REV",
+		"FETCH_HEAD",
+		"MERGE_AUTOSTASH",
+		"MERGE_HEAD",
+	};
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(special_refs); i++)
+		if (!strcmp(refname, special_refs[i]))
+			return 1;
+
+	/*
+	 * git-rebase(1) stores its state in `rebase-apply/` or
+	 * `rebase-merge/`, including various reference-like bits.
+	 */
+	if (starts_with(refname, "rebase-apply/") ||
+	    starts_with(refname, "rebase-merge/"))
+		return 1;
+
+	return 0;
+}
+
 int refs_read_raw_ref(struct ref_store *ref_store, const char *refname,
 		      struct object_id *oid, struct strbuf *referent,
 		      unsigned int *type, int *failure_errno)
 {
 	assert(failure_errno);
-	if (!strcmp(refname, "FETCH_HEAD") || !strcmp(refname, "MERGE_HEAD")) {
+	if (is_special_ref(refname))
 		return refs_read_special_head(ref_store, refname, oid, referent,
 					      type, failure_errno);
-	}
 
 	return ref_store->be->read_raw_ref(ref_store, refname, oid, referent,
 					   type, failure_errno);
We have some references that are more special than others. The reason
for them being special is that they either do not follow the usual
format of references, or that they are written to the filesystem
directly by the respective owning subsystem and thus circumvent the
reference backend.

This works perfectly fine right now because the reffiles backend will
know how to read those refs just fine. But with the prospect of gaining
a new reference backend implementation we need to be a lot more careful
here:

  - We need to make sure that we are consistent about how those refs are
    written. They must either always be written via the filesystem, or
    they must always be written via the reference backend. Any mixture
    will lead to inconsistent state.

  - We need to make sure that such special refs are always handled
    specially when reading them.

We're already mostly good with regard to the first item, except for
`BISECT_EXPECTED_REV` which will be addressed in a subsequent commit.
But the current list of special refs is missing a lot of refs that
really should be treated specially. Right now, we only treat
`FETCH_HEAD` and `MERGE_HEAD` specially here.

Introduce a new function `is_special_ref()` that contains all current
instances of special refs to fix the reading path.

Based-on-patch-by: Han-Wen Nienhuys <hanwenn@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 refs.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 56 insertions(+), 2 deletions(-)