[1/2] object-name: detect and report empty reflogs

Message ID	0fac6ebb098c7e8cdc87cb75f2dcffdc4b1ccfaa.1708509190.git.ps@pks.im (mailing list archive)
State	New
Headers	show Received: from fhigh3-smtp.messagingengine.com (fhigh3-smtp.messagingengine.com [103.168.172.154]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B7B5E3D576 for <git@vger.kernel.org>; Wed, 21 Feb 2024 09:56:40 +0000 (UTC) Feedback-ID: i197146af:Fastmail Date: Wed, 21 Feb 2024 10:56:36 +0100 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Cc: Yasushi SHOJI <yasushi.shoji@gmail.com> Subject: [PATCH 1/2] object-name: detect and report empty reflogs Message-ID: <0fac6ebb098c7e8cdc87cb75f2dcffdc4b1ccfaa.1708509190.git.ps@pks.im> References: <CAELBRWK-bZTV0qx6_34HAgpmYwy+5Zo2E0M+4B6yZJJ3CqweTw@mail.gmail.com> <cover.1708509190.git.ps@pks.im> Precedence: bulk MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="MpryXjFS4b6pN0Ie" Content-Disposition: inline In-Reply-To: <cover.1708509190.git.ps@pks.im>
Series	Detect empty or missing reflogs with `ref@{0}` \| expand [0/2] Detect empty or missing reflogs with `ref@{0}` [1/2] object-name: detect and report empty reflogs [2/2] builtin/show-branch: detect empty reflogs

Message ID

0fac6ebb098c7e8cdc87cb75f2dcffdc4b1ccfaa.1708509190.git.ps@pks.im (mailing list archive)

State

New

Headers

Feedback-ID: i197146af:Fastmail
Date: Wed, 21 Feb 2024 10:56:36 +0100
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Cc: Yasushi SHOJI <yasushi.shoji@gmail.com>
Subject: [PATCH 1/2] object-name: detect and report empty reflogs
Message-ID: 
 <0fac6ebb098c7e8cdc87cb75f2dcffdc4b1ccfaa.1708509190.git.ps@pks.im>
References: 
 <CAELBRWK-bZTV0qx6_34HAgpmYwy+5Zo2E0M+4B6yZJJ3CqweTw@mail.gmail.com>
 <cover.1708509190.git.ps@pks.im>
Precedence: bulk
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
	protocol="application/pgp-signature"; boundary="MpryXjFS4b6pN0Ie"
Content-Disposition: inline
In-Reply-To: <cover.1708509190.git.ps@pks.im>

Series

Detect empty or missing reflogs with `ref@{0}` | expand

Commit Message

Patrick Steinhardt Feb. 21, 2024, 9:56 a.m. UTC

The `ref@{n}` syntax allows the user to request the n'th reflog entry
for a ref. This syntax is zero-indexed, meaning that entry zero refers
to the first entry in the reflog. The behaviour here is quite confusing
though and depends on the state of the corresponding reflog:

  - If the reflog is not empty, we return the first reflog entry.

  - If the reflog is empty, we return the object ID of the ref.

  - If the reflog is missing, we return an error.

This is inconsistent and quite misleading.

This behaviour goes back to 6436a20284 (refs: allow @{n} to work with
n-sized reflog, 2021-01-07), which fixed a bug that wouldn't allow a
user to return the n'th reflog entry with an n-sized reflog. With this
commit, `read_ref_at()` started to special case reading the first entry
of the reflog via a separate `read_ref_at_ent_newest()` function. The
problem here is that we forgot to check whether the callback was invoked
at all, and thus we don't notice empty reflogs.

The commit in question added a test for `ref@{0}` when the reflog is
empty. But that test only works by chance: while `read_ref_at()` won't
initialize the object ID passed in by the pointer, all callers of this
function happen to call `repo_ref_dwim()` and thus pre-populate the
object ID. Thus, the consequence is that we indeed return the object ID
of the refname when the reflog is empty.

This behaviour is documented nowhere, and the fact that we return a
somewhat sensible result to the caller by sheer luck further stresses
the point that this behaviour is only accidental, even if there is a
test covering it.

Furthermore, this behaviour causes problems for the git-show-branch(1)
command. When executing `git show-branch --reflog` for a ref that either
has no or an empty reflog we run into a segfault. This is because the
`read_ref_at()` function doesn't report the error to us, and thus parts
of its out-parameters are not initialized.

Start to detect and report empty or missing reflogs in `read_ref_at()`
and report them to the caller. This results in a change in behaviour
when asking for `ref@{0}` with an empty or missing reflog because we now
die instead of returning the object ID of the ref itself. This adapted
behaviour should lead to less surprises as we now really only report
object IDs to the caller that actually come from the reflog, thus making
the user experience a whole lot more consistent.

This change also fixes the segfault in git-show-branch(1). Note that
this commit does not add a test yet -- this will be handled in the next
commit.

Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 object-name.c                  | 10 ++++++----
 refs.c                         |  3 ++-
 t/t1506-rev-parse-diagnosis.sh |  8 ++++++++
 t/t1508-at-combinations.sh     | 29 +++++++++++++++++++++++++----
 4 files changed, 41 insertions(+), 9 deletions(-)

Comments

Kristoffer Haugsbakk Feb. 21, 2024, 10:37 a.m. UTC | #1

On Wed, Feb 21, 2024, at 10:56, Patrick Steinhardt wrote:
> The `ref@{n}` syntax allows the user to request the n'th reflog entry
> for a ref. This syntax is zero-indexed, meaning that entry zero refers
> to the first entry in the reflog. The behaviour here is quite confusing
> though and depends on the state of the corresponding reflog:

Maybe this is obvious to other readers, but I sometimes get tripped up
when reading about such logs: what’s the “first entry”? The oldest one
or newest one? How about:

  “ The `ref@{n}` syntax allows the user to request the n'th reflog entry
    for a ref, starting from `ref@{0}` which points to the commit that
    `ref` points to (zero-indexed). The behaviour here is quite confusing
    though and depends on the state of the corresponding reflog:

Eric Sunshine Feb. 21, 2024, 4:48 p.m. UTC | #2

On Wed, Feb 21, 2024 at 5:03 AM Patrick Steinhardt <ps@pks.im> wrote:
> [...]
> Start to detect and report empty or missing reflogs in `read_ref_at()`
> and report them to the caller. This results in a change in behaviour
> when asking for `ref@{0}` with an empty or missing reflog because we now
> die instead of returning the object ID of the ref itself. This adapted
> behaviour should lead to less surprises as we now really only report
> object IDs to the caller that actually come from the reflog, thus making
> the user experience a whole lot more consistent.
> [...]
> Reported-by: Yasushi SHOJI <yasushi.shoji@gmail.com>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
> ---
> diff --git a/t/t1508-at-combinations.sh b/t/t1508-at-combinations.sh
> @@ -110,10 +110,31 @@ test_expect_success '@{1} works with only one reflog entry' '
> +test_expect_success '@{0} fails with empty reflog' '
> +       git checkout -B empty-reflog main &&
> +       git reflog expire --expire=now refs/heads/empty-reflog &&
> +       cat >expect <<-EOF &&
> +       fatal: Needed a single revision
> +       EOF

Typically, we would use <<-\EOF rather than <<-EOF to signal to the
reader that no variable interpolation is happening in the here-doc
body.

A simpler alternative in this case would, of course, be to use `echo`:

    echo "fatal: Needed a single revision" >expect &&

> +       test_must_fail git rev-parse --verify missing-reflog@{0} 2>err &&
> +       test_cmp expect err
> +'
> +
> +test_expect_success '@{0} fails with missing reflog' '
> +       git -c core.logAllRefUpdates=false checkout -B missing-reflog main &&
> +       cat >expect <<-EOF &&
> +       fatal: Needed a single revision
> +       EOF

Ditto.

> +       test_must_fail git rev-parse --verify missing-reflog@{0} 2>err &&
> +       test_cmp expect err
> +'

Probably neither comment is worth a reroll, but if you reroll for some
other reason, perhaps consider them.

Jeff King Feb. 21, 2024, 5:31 p.m. UTC | #3

On Wed, Feb 21, 2024 at 10:56:36AM +0100, Patrick Steinhardt wrote:

> This behaviour goes back to 6436a20284 (refs: allow @{n} to work with
> n-sized reflog, 2021-01-07), which fixed a bug that wouldn't allow a
> user to return the n'th reflog entry with an n-sized reflog. With this
> commit, `read_ref_at()` started to special case reading the first entry
> of the reflog via a separate `read_ref_at_ent_newest()` function. The
> problem here is that we forgot to check whether the callback was invoked
> at all, and thus we don't notice empty reflogs.

I'm on the fence no whether the current @{0} behavior is sensible and
should be preserved. I agree it mostly worked by luck, but the presence
of the test makes me think at least the intention was there.

But assuming that is a good direction, there's one thing that puzzles me
about your patch:

> @@ -1084,6 +1084,7 @@ static int read_ref_at_ent_newest(struct object_id *ooid UNUSED,
>  	struct read_ref_at_cb *cb = cb_data;
>  
>  	set_read_ref_cutoffs(cb, timestamp, tz, message);
> +	cb->found_it = 1;
>  	oidcpy(cb->oid, noid);
>  	/* We just want the first entry */
>  	return 1;

OK, so we note whether the callback was invoked, which is good...

> @@ -1123,7 +1124,7 @@ int read_ref_at(struct ref_store *refs, const char *refname,
>  
>  	if (cb.cnt == 0) {
>  		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
> -		return 0;
> +		return !cb.found_it;
>  	}

...but here we just return without an error message. Whereas later in
the function, we have logic to produce the "log for %s is empty"
message. So now we will produce a message if you ask for branch@{1} in
an empty reflog, but not for branch@{0}, and the caller is responsible
for printing an error in the latter case.

If we instead set reccnt for branch@{0} as we would for branch@{1}, then
we can fall through and share the error handling (like it was before
6436a20284, when they used the same callback):

diff --git a/refs.c b/refs.c
index c633abf284..7d5e7a9ba6 100644
--- a/refs.c
+++ b/refs.c
@@ -1084,6 +1084,8 @@ static int read_ref_at_ent_newest(struct object_id *ooid UNUSED,
 	struct read_ref_at_cb *cb = cb_data;

 	set_read_ref_cutoffs(cb, timestamp, tz, message);
+	cb->reccnt++;
+	cb->found_it = 1;
 	oidcpy(cb->oid, noid);
 	/* We just want the first entry */
 	return 1;
@@ -1121,12 +1123,10 @@ int read_ref_at(struct ref_store *refs, const char *refname,
 	cb.cutoff_cnt = cutoff_cnt;
 	cb.oid = oid;

-	if (cb.cnt == 0) {
+	if (cb.cnt == 0)
 		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
-		return 0;
-	}
-
-	refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
+	else
+		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);

 	if (!cb.reccnt) {
 		if (flags & GET_OID_QUIETLY)

And it all just works without having to touch get_oid_basic() or
cmd_show_branch() at all. Do note that one of the tests needs to be
updated to account for the slightly different format of the error
message; but again, I think that is showing off the inconsistency in
having the error message in two places.

-Peff

diff --git a/object-name.c b/object-name.c
index 3a2ef5d680..e2a6c9d2ec 100644
--- a/object-name.c
+++ b/object-name.c
@@ -994,8 +994,8 @@  static int get_oid_basic(struct repository *r, const char *str, int len,
 	if (reflog_len) {
 		int nth, i;
 		timestamp_t at_time;
-		timestamp_t co_time;
-		int co_tz, co_cnt;
+		timestamp_t co_time = 0;
+		int co_tz = 0, co_cnt = 0;
 
 		/* Is it asking for N-th entry, or approxidate? */
 		for (i = nth = 0; 0 <= nth && i < reflog_len; i++) {
@@ -1020,6 +1020,7 @@  static int get_oid_basic(struct repository *r, const char *str, int len,
 				return -1;
 			}
 		}
+
 		if (read_ref_at(get_main_ref_store(r),
 				real_ref, flags, at_time, nth, oid, NULL,
 				&co_time, &co_tz, &co_cnt)) {
@@ -1035,9 +1036,10 @@  static int get_oid_basic(struct repository *r, const char *str, int len,
 						show_date(co_time, co_tz, DATE_MODE(RFC2822)));
 				}
 			} else {
-				if (flags & GET_OID_QUIETLY) {
+				if (flags & GET_OID_QUIETLY)
 					exit(128);
-				}
+				if (!co_cnt)
+					die(_("log for '%.*s' is empty"), len, str);
 				die(_("log for '%.*s' only has %d entries"),
 				    len, str, co_cnt);
 			}
diff --git a/refs.c b/refs.c
index c633abf284..a2369e7835 100644
--- a/refs.c
+++ b/refs.c
@@ -1084,6 +1084,7 @@  static int read_ref_at_ent_newest(struct object_id *ooid UNUSED,
 	struct read_ref_at_cb *cb = cb_data;
 
 	set_read_ref_cutoffs(cb, timestamp, tz, message);
+	cb->found_it = 1;
 	oidcpy(cb->oid, noid);
 	/* We just want the first entry */
 	return 1;
@@ -1123,7 +1124,7 @@  int read_ref_at(struct ref_store *refs, const char *refname,
 
 	if (cb.cnt == 0) {
 		refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent_newest, &cb);
-		return 0;
+		return !cb.found_it;
 	}
 
 	refs_for_each_reflog_ent_reverse(refs, refname, read_ref_at_ent, &cb);
diff --git a/t/t1506-rev-parse-diagnosis.sh b/t/t1506-rev-parse-diagnosis.sh
index ef40511d89..9d147c4ade 100755
--- a/t/t1506-rev-parse-diagnosis.sh
+++ b/t/t1506-rev-parse-diagnosis.sh
@@ -140,6 +140,14 @@  test_expect_success 'incorrect file in :path and :N:path' '
 	test_grep "path .disk-only.txt. exists on disk, but not in the index" error
 '
 
+test_expect_success '@{0} reference with empty reflog' '
+	git checkout -B empty-reflog main &&
+	git reflog expire --expire=now refs/heads/empty-reflog &&
+	test_must_fail git rev-parse empty-reflog@{0} >output 2>error &&
+	test_must_be_empty output &&
+	test_grep "log for ${SQ}empty-reflog${SQ} is empty" error
+'
+
 test_expect_success 'invalid @{n} reference' '
 	test_must_fail git rev-parse main@{99999} >output 2>error &&
 	test_must_be_empty output &&
diff --git a/t/t1508-at-combinations.sh b/t/t1508-at-combinations.sh
index e841309d0e..020106a1cc 100755
--- a/t/t1508-at-combinations.sh
+++ b/t/t1508-at-combinations.sh
@@ -110,10 +110,31 @@  test_expect_success '@{1} works with only one reflog entry' '
 	test_cmp_rev newbranch~ newbranch@{1}
 '
 
-test_expect_success '@{0} works with empty reflog' '
-	git checkout -B newbranch main &&
-	git reflog expire --expire=now refs/heads/newbranch &&
-	test_cmp_rev newbranch newbranch@{0}
+test_expect_success '@{0} fails with empty reflog' '
+	git checkout -B empty-reflog main &&
+	git reflog expire --expire=now refs/heads/empty-reflog &&
+	cat >expect <<-EOF &&
+	fatal: Needed a single revision
+	EOF
+	test_must_fail git rev-parse --verify missing-reflog@{0} 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success '@{0} fails with missing reflog' '
+	git -c core.logAllRefUpdates=false checkout -B missing-reflog main &&
+	cat >expect <<-EOF &&
+	fatal: Needed a single revision
+	EOF
+	test_must_fail git rev-parse --verify missing-reflog@{0} 2>err &&
+	test_cmp expect err
+'
+
+test_expect_success '@{0} favors first reflog entry with diverged reflog' '
+	git checkout -B diverged-reflog main &&
+	test_commit A &&
+	test_commit B &&
+	git reflog delete diverged-reflog@{0} &&
+	test_cmp_rev diverged-reflog~ diverged-reflog@{0}
 '
 
 test_done

[1/2] object-name: detect and report empty reflogs

Commit Message

Comments

Patch