diff mbox series

[v5,06/10] refs: do not check ref existence in `is_root_ref()`

Message ID af22581c2212088ea6a380cc1a58923abfdc4fe1.1715755591.git.ps@pks.im (mailing list archive)
State Accepted
Commit afcd067dad27fa7eddde01ebde90daa6cc541cc5
Headers show
Series Clarify pseudo-ref terminology | expand

Commit Message

Patrick Steinhardt May 15, 2024, 6:50 a.m. UTC
Before this patch series, root refs except for "HEAD" and our special
refs were classified as pseudorefs. Furthermore, our terminology
clarified that pseudorefs must not be symbolic refs. This restriction
is enforced in `is_root_ref()`, which explicitly checks that a supposed
root ref resolves to an object ID without recursing.

This has been extremely confusing right from the start because (in old
terminology) a ref name may sometimes be a pseudoref and sometimes not
depending on whether it is a symbolic or regular ref. This behaviour
does not seem reasonable at all and I very much doubt that it results in
anything sane.

Last but not least, the current behaviour can actually lead to a
segfault when calling `is_root_ref()` with a reference that either does
not exist or that is a symbolic ref because we never initialized `oid`,
but then read it via `is_null_oid()`.

We have now changed terminology to clarify that pseudorefs are really
only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live
in the root of the ref hierarchy are just plain refs. Thus, we do not
need to check whether the ref is symbolic or not. In fact, we can now
avoid looking up the ref completely as the name is sufficient for us to
figure out whether something would be a root ref or not.

This change of course changes semantics for our callers. As there are
only three of them we can assess each of them individually:

  - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
    It's clear that the intent is to classify based on the ref name,
    only.

  - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
    filter root refs. Again, using existence checks is pointless here as
    the iterator has just surfaced the ref, so we know it does exist.

  - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
    determine whether it should add a ref to the root directory of its
    iterator. This had the effect that we skipped over any files that
    are either a symbolic ref, or which are not a ref at all.

    The new behaviour is to include symbolic refs know, which aligns us
    with the adapted terminology. Furthermore, files which look like
    root refs but aren't are now mark those as "broken". As broken refs
    are not surfaced by our tooling, this should not lead to a change in
    user-visible behaviour, but may cause us to emit warnings. This
    feels like the right thing to do as we would otherwise just silently
    ignore corrupted root refs completely.

So in all cases the existence check was either superfluous, not in line
with the adapted terminology or masked potential issues. This commit
thus changes the behaviour as proposed and drops the existence check
altogether.

Add a test that verifies that this does not change user-visible
behaviour. Namely, we still don't want to show broken refs to the user
by default in git-for-each-ref(1). What this does allow though is for
internal callers to surface dangling root refs when they pass in the
`DO_FOR_EACH_INCLUDE_BROKEN` flag.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 ref-filter.c                   |  2 +-
 refs.c                         | 19 +++++--------------
 refs.h                         |  5 +++--
 refs/files-backend.c           |  4 ++--
 refs/reftable-backend.c        |  2 +-
 t/t6302-for-each-ref-filter.sh | 17 +++++++++++++++++
 6 files changed, 29 insertions(+), 20 deletions(-)

Comments

Justin Tobler May 15, 2024, 8:38 p.m. UTC | #1
On 24/05/15 08:50AM, Patrick Steinhardt wrote:
> Before this patch series, root refs except for "HEAD" and our special
> refs were classified as pseudorefs. Furthermore, our terminology
> clarified that pseudorefs must not be symbolic refs. This restriction
> is enforced in `is_root_ref()`, which explicitly checks that a supposed
> root ref resolves to an object ID without recursing.
> 
> This has been extremely confusing right from the start because (in old
> terminology) a ref name may sometimes be a pseudoref and sometimes not
> depending on whether it is a symbolic or regular ref. This behaviour
> does not seem reasonable at all and I very much doubt that it results in
> anything sane.
> 
> Last but not least, the current behaviour can actually lead to a
> segfault when calling `is_root_ref()` with a reference that either does
> not exist or that is a symbolic ref because we never initialized `oid`,
> but then read it via `is_null_oid()`.
> 
> We have now changed terminology to clarify that pseudorefs are really
> only "MERGE_HEAD" and "FETCH_HEAD", whereas all the other refs that live
> in the root of the ref hierarchy are just plain refs. Thus, we do not
> need to check whether the ref is symbolic or not. In fact, we can now
> avoid looking up the ref completely as the name is sufficient for us to
> figure out whether something would be a root ref or not.
> 
> This change of course changes semantics for our callers. As there are
> only three of them we can assess each of them individually:
> 
>   - "ref-filter.c:ref_kind_from_refname()" uses it to classify refs.
>     It's clear that the intent is to classify based on the ref name,
>     only.
> 
>   - "refs/reftable_backend.c:reftable_ref_iterator_advance()" uses it to
>     filter root refs. Again, using existence checks is pointless here as
>     the iterator has just surfaced the ref, so we know it does exist.
> 
>   - "refs/files_backend.c:add_pseudoref_and_head_entries()" uses it to
>     determine whether it should add a ref to the root directory of its
>     iterator. This had the effect that we skipped over any files that
>     are either a symbolic ref, or which are not a ref at all.
> 
>     The new behaviour is to include symbolic refs know, which aligns us

s/know/now/

>     with the adapted terminology. Furthermore, files which look like
>     root refs but aren't are now mark those as "broken". As broken refs
>     are not surfaced by our tooling, this should not lead to a change in
>     user-visible behaviour, but may cause us to emit warnings. This
>     feels like the right thing to do as we would otherwise just silently
>     ignore corrupted root refs completely.

Is there an expected source of broken root refs? Or would it just be due
to bugs?

> So in all cases the existence check was either superfluous, not in line
> with the adapted terminology or masked potential issues. This commit
> thus changes the behaviour as proposed and drops the existence check
> altogether.

Dropping the existence check makes sense to me. It also has the added
benefit of simplifying `is_root_ref()` which is nice.

> 
> Add a test that verifies that this does not change user-visible
> behaviour. Namely, we still don't want to show broken refs to the user
> by default in git-for-each-ref(1). What this does allow though is for
> internal callers to surface dangling root refs when they pass in the
> `DO_FOR_EACH_INCLUDE_BROKEN` flag.
Patrick Steinhardt May 16, 2024, 4:13 a.m. UTC | #2
On Wed, May 15, 2024 at 03:38:47PM -0500, Justin Tobler wrote:
> On 24/05/15 08:50AM, Patrick Steinhardt wrote:
[snip]
> >     The new behaviour is to include symbolic refs know, which aligns us
> 
> s/know/now/

Fixed locally. I'll refrain from sending a new version just to fix this
typo though.

> >     with the adapted terminology. Furthermore, files which look like
> >     root refs but aren't are now mark those as "broken". As broken refs
> >     are not surfaced by our tooling, this should not lead to a change in
> >     user-visible behaviour, but may cause us to emit warnings. This
> >     feels like the right thing to do as we would otherwise just silently
> >     ignore corrupted root refs completely.
> 
> Is there an expected source of broken root refs? Or would it just be due
> to bugs?

Dangling symbolic refs are the only expected source. The fact that we
did not include those here feels like a bug to me.

Patrick
diff mbox series

Patch

diff --git a/ref-filter.c b/ref-filter.c
index 361beb6619..23e81e3e04 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -2756,7 +2756,7 @@  static int ref_kind_from_refname(const char *refname)
 			return ref_kind[i].kind;
 	}
 
-	if (is_root_ref(get_main_ref_store(the_repository), refname))
+	if (is_root_ref(refname))
 		return FILTER_REFS_PSEUDOREFS;
 
 	return FILTER_REFS_OTHERS;
diff --git a/refs.c b/refs.c
index c1c406fc5f..4fec29e660 100644
--- a/refs.c
+++ b/refs.c
@@ -856,7 +856,7 @@  static int is_root_ref_syntax(const char *refname)
 	return 1;
 }
 
-int is_root_ref(struct ref_store *refs, const char *refname)
+int is_root_ref(const char *refname)
 {
 	static const char *const irregular_root_refs[] = {
 		"AUTO_MERGE",
@@ -865,26 +865,17 @@  int is_root_ref(struct ref_store *refs, const char *refname)
 		"NOTES_MERGE_REF",
 		"MERGE_AUTOSTASH",
 	};
-	struct object_id oid;
 	size_t i;
 
 	if (!is_root_ref_syntax(refname))
 		return 0;
 
-	if (ends_with(refname, "_HEAD")) {
-		refs_resolve_ref_unsafe(refs, refname,
-					RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
-					&oid, NULL);
-		return !is_null_oid(&oid);
-	}
+	if (ends_with(refname, "_HEAD"))
+		return 1;
 
 	for (i = 0; i < ARRAY_SIZE(irregular_root_refs); i++)
-		if (!strcmp(refname, irregular_root_refs[i])) {
-			refs_resolve_ref_unsafe(refs, refname,
-						RESOLVE_REF_READING | RESOLVE_REF_NO_RECURSE,
-						&oid, NULL);
-			return !is_null_oid(&oid);
-		}
+		if (!strcmp(refname, irregular_root_refs[i]))
+			return 1;
 
 	return 0;
 }
diff --git a/refs.h b/refs.h
index d0374c3275..8a574a22c7 100644
--- a/refs.h
+++ b/refs.h
@@ -1052,7 +1052,8 @@  extern struct ref_namespace_info ref_namespace[NAMESPACE__COUNT];
 void update_ref_namespace(enum ref_namespace namespace, char *ref);
 
 /*
- * Check whether the reference is an existing root reference.
+ * Check whether the provided name names a root reference. This function only
+ * performs a syntactic check.
  *
  * A root ref is a reference that lives in the root of the reference hierarchy.
  * These references must conform to special syntax:
@@ -1076,7 +1077,7 @@  void update_ref_namespace(enum ref_namespace namespace, char *ref);
  *
  *   - MERGE_AUTOSTASH
  */
-int is_root_ref(struct ref_store *refs, const char *refname);
+int is_root_ref(const char *refname);
 
 int is_headref(struct ref_store *refs, const char *refname);
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 0fcb601444..06240ce327 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -351,8 +351,8 @@  static void add_pseudoref_and_head_entries(struct ref_store *ref_store,
 		strbuf_addstr(&refname, de->d_name);
 
 		dtype = get_dtype(de, &path, 1);
-		if (dtype == DT_REG && (is_root_ref(ref_store, de->d_name) ||
-								is_headref(ref_store, de->d_name)))
+		if (dtype == DT_REG && (is_root_ref(de->d_name) ||
+					is_headref(ref_store, de->d_name)))
 			loose_fill_ref_dir_regular_file(refs, refname.buf, dir);
 
 		strbuf_setlen(&refname, dirnamelen);
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 36ab3357a7..bc927ef17b 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -354,7 +354,7 @@  static int reftable_ref_iterator_advance(struct ref_iterator *ref_iterator)
 		 */
 		if (!starts_with(iter->ref.refname, "refs/") &&
 		    !(iter->flags & DO_FOR_EACH_INCLUDE_ROOT_REFS &&
-		     (is_root_ref(&iter->refs->base, iter->ref.refname) ||
+		     (is_root_ref(iter->ref.refname) ||
 		      is_headref(&iter->refs->base, iter->ref.refname)))) {
 			continue;
 		}
diff --git a/t/t6302-for-each-ref-filter.sh b/t/t6302-for-each-ref-filter.sh
index 948f1bb5f4..92ed8957c8 100755
--- a/t/t6302-for-each-ref-filter.sh
+++ b/t/t6302-for-each-ref-filter.sh
@@ -62,6 +62,23 @@  test_expect_success '--include-root-refs with other patterns' '
 	test_cmp expect actual
 '
 
+test_expect_success '--include-root-refs omits dangling symrefs' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit initial &&
+		git symbolic-ref DANGLING_HEAD refs/heads/missing &&
+		cat >expect <<-EOF &&
+		HEAD
+		$(git symbolic-ref HEAD)
+		refs/tags/initial
+		EOF
+		git for-each-ref --format="%(refname)" --include-root-refs >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'filtering with --points-at' '
 	cat >expect <<-\EOF &&
 	refs/heads/main