diff mbox series

[v4,1/5] refs: introduce `is_pseudoref()` and `is_headref()`

Message ID 20240211183923.131278-2-karthik.188@gmail.com (mailing list archive)
State Superseded
Headers show
Series for-each-ref: add '--include-root-refs' option | expand

Commit Message

Karthik Nayak Feb. 11, 2024, 6:39 p.m. UTC
Introduce two new functions `is_pseudoref()` and `is_headref()`. This
provides the necessary functionality for us to add pseudorefs and HEAD
to the loose ref cache in the files backend, allowing us to build
tooling to print these refs.

The `is_pseudoref()` function internally calls `is_pseudoref_syntax()`
but adds onto it by also checking to ensure that the pseudoref either
ends with a "_HEAD" suffix or matches a list of exceptions. After which
we also parse the contents of the pseudoref to ensure that it conforms
to the ref format.

We cannot directly add the new syntax checks to `is_pseudoref_syntax()`
because the function is also used by `is_current_worktree_ref()` and
making it stricter to match only known pseudorefs might have unintended
consequences due to files like 'BISECT_START' which isn't a pseudoref
but sometimes contains object ID.

Keeping this in mind, we leave `is_pseudoref_syntax()` as is and create
`is_pseudoref()` which is stricter. Ideally we'd want to move the new
syntax checks to `is_pseudoref_syntax()` but a prerequisite for this
would be to actually remove the exception list by converting those
pseudorefs to also contain a '_HEAD' suffix and perhaps move bisect
related files like 'BISECT_START' to a new directory similar to the
'rebase-merge' directory.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs.c | 41 +++++++++++++++++++++++++++++++++++++++++
 refs.h |  3 +++
 2 files changed, 44 insertions(+)

Comments

Patrick Steinhardt Feb. 12, 2024, 12:47 p.m. UTC | #1
On Sun, Feb 11, 2024 at 07:39:19PM +0100, Karthik Nayak wrote:
> Introduce two new functions `is_pseudoref()` and `is_headref()`. This
> provides the necessary functionality for us to add pseudorefs and HEAD
> to the loose ref cache in the files backend, allowing us to build
> tooling to print these refs.
> 
> The `is_pseudoref()` function internally calls `is_pseudoref_syntax()`
> but adds onto it by also checking to ensure that the pseudoref either
> ends with a "_HEAD" suffix or matches a list of exceptions. After which
> we also parse the contents of the pseudoref to ensure that it conforms
> to the ref format.
> 
> We cannot directly add the new syntax checks to `is_pseudoref_syntax()`
> because the function is also used by `is_current_worktree_ref()` and
> making it stricter to match only known pseudorefs might have unintended
> consequences due to files like 'BISECT_START' which isn't a pseudoref
> but sometimes contains object ID.
> 
> Keeping this in mind, we leave `is_pseudoref_syntax()` as is and create
> `is_pseudoref()` which is stricter. Ideally we'd want to move the new
> syntax checks to `is_pseudoref_syntax()` but a prerequisite for this
> would be to actually remove the exception list by converting those
> pseudorefs to also contain a '_HEAD' suffix and perhaps move bisect
> related files like 'BISECT_START' to a new directory similar to the
> 'rebase-merge' directory.
> 
> Helped-by: Jeff King <peff@peff.net>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
> ---
>  refs.c | 41 +++++++++++++++++++++++++++++++++++++++++
>  refs.h |  3 +++
>  2 files changed, 44 insertions(+)
> 
> diff --git a/refs.c b/refs.c
> index fff343c256..d8e4cf9a11 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -860,6 +860,47 @@ static int is_pseudoref_syntax(const char *refname)
>  	return 1;
>  }
>  
> +int is_pseudoref(struct ref_store *refs, const char *refname)
> +{
> +	static const char *const irregular_pseudorefs[] = {
> +		"AUTO_MERGE",
> +		"BISECT_EXPECTED_REV",
> +		"NOTES_MERGE_PARTIAL",
> +		"NOTES_MERGE_REF",
> +		"MERGE_AUTOSTASH",
> +	};
> +	struct object_id oid;
> +	size_t i;
> +
> +	if (!is_pseudoref_syntax(refname))
> +		return 0;
> +
> +	if (ends_with(refname, "_HEAD")) {
> +		refs_resolve_ref_unsafe(refs, refname,
> +   					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
> +   					&oid, NULL);
> +   		return !is_null_oid(&oid);
> +	}

I think it's quite confusing that `is_pseudoref()` not only checks
whether the refname may be a pseudoref, but also whether it actually
exists. Furthermore, why is a pseudoref only considered to exist in case
it's not a symbolic ref? That sounds overly restrictive to me.

So I think this at least needs to be renamed. But I find it really hard
to come up with a proper name here because in my opinion the function
does too much. `is_existing_pseudoref()` feels much too specific to me.
Also, the "reftable" backend wouldn't need to check whether the ref
exists, but only whether a name that it encounters is a pseudoref name
or not.

> +	for (i = 0; i < ARRAY_SIZE(irregular_pseudorefs); i++)
> +		if (!strcmp(refname, irregular_pseudorefs[i])) {
> +			refs_resolve_ref_unsafe(refs, refname,
> +   						RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
> +   						&oid, NULL);
> +			return !is_null_oid(&oid);
> +		}
> +
> +	return 0;
> +}
> +
> +int is_headref(struct ref_store *refs, const char *refname)
> +{
> +	if (!strcmp(refname, "HEAD"))
> +		return refs_ref_exists(refs, refname);
> +
> +	return 0;
> +}

The same comment applies here, as well.

I also worry a bit about the API we have. It becomes really hard to
figure out which function to call now as the API surface seems to
explode. We have:

  - is_pseudoref_syntax
  - is_pseudoref
  - is_headref
  - check_refname_format
  - refname_is_safe

I wonder whether we can maybe consolidate the interface into one or
maybe even two functions where the behaviour can be tweaked with a flag
field. Something like `refname_is_valid()` with a bunch of flags:

  - REFNAME_ACCEPT_HEAD to accept "HEAD"
  - REFNAME_ACCEPT_PSEUDOREF to accept all of the refs ending with
    "_HEAD" or being one of the irregular pseudorefs.
  - REFNAME_ACCEPT_INVALID_BUT_SAFE to accept refnames which aren't
    valid, but which would pass `refname_is_safe()`.

Another alternative could be something like `classify_refname()` that
accepts a refname and returns an enum saying what kind of ref something
is.

Given that this topic won't be included in Git v2.44 anymore, I think
that opening this can of worms would be sensible now.

Patrick

>  static int is_current_worktree_ref(const char *ref) {
>  	return is_pseudoref_syntax(ref) || is_per_worktree_ref(ref);
>  }
> diff --git a/refs.h b/refs.h
> index 303c5fac4d..f66cdd731c 100644
> --- a/refs.h
> +++ b/refs.h
> @@ -1023,4 +1023,7 @@ extern struct ref_namespace_info ref_namespace[NAMESPACE__COUNT];
>   */
>  void update_ref_namespace(enum ref_namespace namespace, char *ref);
>  
> +int is_pseudoref(struct ref_store *refs, const char *refname);
> +int is_headref(struct ref_store *refs, const char *refname);
> +
>  #endif /* REFS_H */
> -- 
> 2.43.GIT
>
Junio C Hamano Feb. 12, 2024, 5:01 p.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> I think it's quite confusing that `is_pseudoref()` not only checks
> whether the refname may be a pseudoref, but also whether it actually
> exists. Furthermore, why is a pseudoref only considered to exist in case
> it's not a symbolic ref? That sounds overly restrictive to me.

I am torn on this, but in favor of the proposed naming, primarily
because is_pseudoref_syntax() was about "does this string look like
the fullref a pseudoref would have?", and the reason why we wanted
to have this new function was we wanted to ask "does this string
name a valid pseudoref?"

 Q: Is CHERRY_PICK_HEAD a pseudoref?
 A: It would have been if it existed, but I see only
    $GIT_DIR/CHERRY_PICK_HEAD that is a symbolic link, and it cannot
    be a pseudoref.

I can certainly see a broken out set of helper functions to check

 - Does this string make a good fullref for a pseudoref?
 - Does a pseudoref with his string as its fullref exist?

independently.  The first one would answer Yes and the second one
would answer No in such a context.

Thanks.
Junio C Hamano Feb. 12, 2024, 6:05 p.m. UTC | #3
Karthik Nayak <karthik.188@gmail.com> writes:

> +	if (ends_with(refname, "_HEAD")) {
> +		refs_resolve_ref_unsafe(refs, refname,
> +   					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
> +   					&oid, NULL);
> +   		return !is_null_oid(&oid);
> +	}

FYI. I see

.git/rebase-apply/patch:31: space before tab in indent.
   					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
.git/rebase-apply/patch:32: space before tab in indent.
   					&oid, NULL);
.git/rebase-apply/patch:33: space before tab in indent.
   		return !is_null_oid(&oid);

around here.
Karthik Nayak Feb. 13, 2024, 3:48 p.m. UTC | #4
Hello,

Patrick Steinhardt <ps@pks.im> writes:
>> +
>> +int is_headref(struct ref_store *refs, const char *refname)
>> +{
>> +	if (!strcmp(refname, "HEAD"))
>> +		return refs_ref_exists(refs, refname);
>> +
>> +	return 0;
>> +}
>
> The same comment applies here, as well.
>
> I also worry a bit about the API we have. It becomes really hard to
> figure out which function to call now as the API surface seems to
> explode. We have:
>
>   - is_pseudoref_syntax
>   - is_pseudoref
>   - is_headref
>   - check_refname_format
>   - refname_is_safe
>

I also found `is_head()` in 'reflog.c'.

> I wonder whether we can maybe consolidate the interface into one or
> maybe even two functions where the behaviour can be tweaked with a flag
> field.
>

You do bring up an interesting point about the APIs present and I agree
that it would be best to consolidate them to something much simpler and
nicer.

> Something like `refname_is_valid()` with a bunch of flags:
>
>   - REFNAME_ACCEPT_HEAD to accept "HEAD"
>   - REFNAME_ACCEPT_PSEUDOREF to accept all of the refs ending with
>     "_HEAD" or being one of the irregular pseudorefs.
>   - REFNAME_ACCEPT_INVALID_BUT_SAFE to accept refnames which aren't
>     valid, but which would pass `refname_is_safe()`.
>
> Another alternative could be something like `classify_refname()` that
> accepts a refname and returns an enum saying what kind of ref something
> is.
>
> Given that this topic won't be included in Git v2.44 anymore, I think
> that opening this can of worms would be sensible now.
>

Over the two I do prefer something like the former method of the callee
using flags to state the requirements, this way the function only does
what is necessary between the listed operations.

Junio C Hamano <gitster@pobox.com> writes:
> Patrick Steinhardt <ps@pks.im> writes:
>
>> I think it's quite confusing that `is_pseudoref()` not only checks
>> whether the refname may be a pseudoref, but also whether it actually
>> exists. Furthermore, why is a pseudoref only considered to exist in case
>> it's not a symbolic ref? That sounds overly restrictive to me.
>
> I am torn on this, but in favor of the proposed naming, primarily
> because is_pseudoref_syntax() was about "does this string look like
> the fullref a pseudoref would have?", and the reason why we wanted
> to have this new function was we wanted to ask "does this string
> name a valid pseudoref?"
>
>  Q: Is CHERRY_PICK_HEAD a pseudoref?
>  A: It would have been if it existed, but I see only
>     $GIT_DIR/CHERRY_PICK_HEAD that is a symbolic link, and it cannot
>     be a pseudoref.
>
> I can certainly see a broken out set of helper functions to check
>
>  - Does this string make a good fullref for a pseudoref?
>  - Does a pseudoref with his string as its fullref exist?
>
> independently.  The first one would answer Yes and the second one
> would answer No in such a context.
>

This does work into the flags based mechanism that Patrick mentioned
too. The users of this new function can then only request information as
necessary.

Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> +	if (ends_with(refname, "_HEAD")) {
>> +		refs_resolve_ref_unsafe(refs, refname,
>> +   					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
>> +   					&oid, NULL);
>> +   		return !is_null_oid(&oid);
>> +	}
>
> FYI. I see
>
> .git/rebase-apply/patch:31: space before tab in indent.
>    					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
> .git/rebase-apply/patch:32: space before tab in indent.
>    					&oid, NULL);
> .git/rebase-apply/patch:33: space before tab in indent.
>    		return !is_null_oid(&oid);
>
> around here.

Thanks for notifying, Jeff mentioned the same and I thought I fixed it.
I'll have a look at why my editor did this and add some steps to avoid
this in the following patches.
Junio C Hamano Feb. 13, 2024, 7:42 p.m. UTC | #5
Patrick Steinhardt <ps@pks.im> writes:

> I wonder whether we can maybe consolidate the interface into one or
> maybe even two functions where the behaviour can be tweaked with a flag
> field. Something like `refname_is_valid()` with a bunch of flags:
>
>   - REFNAME_ACCEPT_HEAD to accept "HEAD"
>   - REFNAME_ACCEPT_PSEUDOREF to accept all of the refs ending with
>     "_HEAD" or being one of the irregular pseudorefs.
>   - REFNAME_ACCEPT_INVALID_BUT_SAFE to accept refnames which aren't
>     valid, but which would pass `refname_is_safe()`.

I am certain we _can_, but it will take an actual patch to see if
such a refactoring makes the callers easier to follow, which is the
real test.  FWIW, I am much less skeptical than hopeful in this
particular case.
Karthik Nayak Feb. 14, 2024, 10:28 a.m. UTC | #6
Hello,

On Tue, Feb 13, 2024 at 8:42 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Patrick Steinhardt <ps@pks.im> writes:
>
> > I wonder whether we can maybe consolidate the interface into one or
> > maybe even two functions where the behaviour can be tweaked with a flag
> > field. Something like `refname_is_valid()` with a bunch of flags:
> >
> >   - REFNAME_ACCEPT_HEAD to accept "HEAD"
> >   - REFNAME_ACCEPT_PSEUDOREF to accept all of the refs ending with
> >     "_HEAD" or being one of the irregular pseudorefs.
> >   - REFNAME_ACCEPT_INVALID_BUT_SAFE to accept refnames which aren't
> >     valid, but which would pass `refname_is_safe()`.
>
> I am certain we _can_, but it will take an actual patch to see if
> such a refactoring makes the callers easier to follow, which is the
> real test.  FWIW, I am much less skeptical than hopeful in this
> particular case.

I was trying to implement this and realized that the changes sprawl
multiple files and
and have a fair bit of complexity since `check_refname_format()`
implements its own
flags. Overall, adding it to this patch series would overshadow what
we're trying to do here.

I think it would be best to tackle this problem after this series has landed.

Junio, let me know if you want me to reroll for the whitespace issues.
Otherwise, I'll wait
for reviews here.
Junio C Hamano Feb. 14, 2024, 4:59 p.m. UTC | #7
Karthik Nayak <karthik.188@gmail.com> writes:

> Junio, let me know if you want me to reroll for the whitespace issues.

I think I applied with "am --whitespace=fix", so we should be OK,
but if you can double check the result that would be appreciated.
Karthik Nayak Feb. 14, 2024, 6:15 p.m. UTC | #8
On Wed, Feb 14, 2024 at 5:59 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Karthik Nayak <karthik.188@gmail.com> writes:
>
> > Junio, let me know if you want me to reroll for the whitespace issues.
>
> I think I applied with "am --whitespace=fix", so we should be OK,
> but if you can double check the result that would be appreciated.

I did go through the patches and didn't find any other fixes needed. Thanks
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index fff343c256..d8e4cf9a11 100644
--- a/refs.c
+++ b/refs.c
@@ -860,6 +860,47 @@  static int is_pseudoref_syntax(const char *refname)
 	return 1;
 }
 
+int is_pseudoref(struct ref_store *refs, const char *refname)
+{
+	static const char *const irregular_pseudorefs[] = {
+		"AUTO_MERGE",
+		"BISECT_EXPECTED_REV",
+		"NOTES_MERGE_PARTIAL",
+		"NOTES_MERGE_REF",
+		"MERGE_AUTOSTASH",
+	};
+	struct object_id oid;
+	size_t i;
+
+	if (!is_pseudoref_syntax(refname))
+		return 0;
+
+	if (ends_with(refname, "_HEAD")) {
+		refs_resolve_ref_unsafe(refs, refname,
+   					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
+   					&oid, NULL);
+   		return !is_null_oid(&oid);
+	}
+
+	for (i = 0; i < ARRAY_SIZE(irregular_pseudorefs); i++)
+		if (!strcmp(refname, irregular_pseudorefs[i])) {
+			refs_resolve_ref_unsafe(refs, refname,
+   						RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
+   						&oid, NULL);
+			return !is_null_oid(&oid);
+		}
+
+	return 0;
+}
+
+int is_headref(struct ref_store *refs, const char *refname)
+{
+	if (!strcmp(refname, "HEAD"))
+		return refs_ref_exists(refs, refname);
+
+	return 0;
+}
+
 static int is_current_worktree_ref(const char *ref) {
 	return is_pseudoref_syntax(ref) || is_per_worktree_ref(ref);
 }
diff --git a/refs.h b/refs.h
index 303c5fac4d..f66cdd731c 100644
--- a/refs.h
+++ b/refs.h
@@ -1023,4 +1023,7 @@  extern struct ref_namespace_info ref_namespace[NAMESPACE__COUNT];
  */
 void update_ref_namespace(enum ref_namespace namespace, char *ref);
 
+int is_pseudoref(struct ref_store *refs, const char *refname);
+int is_headref(struct ref_store *refs, const char *refname);
+
 #endif /* REFS_H */