refs: fix creation of corrupted reflogs for symrefs

Message ID

20250122100319.2280647-1-karthik.188@gmail.com (mailing list archive)

State

New

Headers

From: Karthik Nayak <karthik.188@gmail.com>
To: peff@peff.net
Cc: git@vger.kernel.org,
	karthik.188@gmail.com,
	nika@thelayzells.com,
	gitster@pobox.com,
	ps@pks.im
Subject: [PATCH] refs: fix creation of corrupted reflogs for symrefs
Date: Wed, 22 Jan 2025 11:03:19 +0100
Message-ID: <20250122100319.2280647-1-karthik.188@gmail.com>
In-Reply-To: <20250121215235.GA2753621@coredump.intra.peff.net>
References: <20250121215235.GA2753621@coredump.intra.peff.net>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

refs: fix creation of corrupted reflogs for symrefs

Commit Message

Karthik Nayak Jan. 22, 2025, 10:03 a.m. UTC

The commit 297c09eabb (refs: allow multiple reflog entries for the same
refname, 2024-12-16) added logic for reflogs to exit early in
`lock_ref_for_update()` after obtaining the required lock. This was
added as a performance optimization as it was assumed that no further
processing was required for reflog only updates. However this was
incorrect since for a symref's reflog entry, the update needs to be
populated with the old_oid value. This is done right after the early
exit.

This caused a bug in Git 2.48 where target references of symrefs being
updated would create a corrupted reflog entry for the symref since the
old_oid is not populated. Undo the skip in logic to fix this issue and
also add a test to ensure that such an issue doesn't arise in the
future.

The early exit was added as a performance optimization for reflog-only
updates, but this accidentally broke symref reflog handling. Remove the
optimization since it wasn't essential to the original changes.

Reported-by: Nika Layzell <nika@thelayzells.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Hello,

This patch is based on top of 'maint' so that it can be easily backported.
Sorry for the inconvenience here. This was a premature optimization which
wasn't needed, and unfortunately this wasn't captured by any test.

Karthik

---
 refs/files-backend.c  |  3 ---
 t/t1400-update-ref.sh | 16 ++++++++++++++++
 2 files changed, 16 insertions(+), 3 deletions(-)
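
For readers following along, the control-flow problem described in the commit message can be sketched with a toy model. This is not the code in refs/files-backend.c: the names `toy_update`, `toy_lock_for_update`, `TOY_LOG_ONLY`, and `TOY_NULL_OID` are made up for illustration, and the real function does considerably more. The sketch only assumes what the message states, namely that the old_oid is populated after the point where the early exit returned.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for a flag; NOT Git's real REF_LOG_ONLY value. */
#define TOY_LOG_ONLY (1u << 0)

/* Hypothetical, simplified update record. */
struct toy_update {
	unsigned flags;
	char old_oid[41]; /* value that would be logged as the "old" side */
};

static const char TOY_NULL_OID[] =
	"0000000000000000000000000000000000000000";

/*
 * Model of the locking step. When early_exit is non-zero, a log-only
 * update returns before old_oid is filled in, so it keeps the all-zero
 * placeholder. When early_exit is zero, the later step always runs.
 */
static void toy_lock_for_update(struct toy_update *u,
				const char *current_oid, int early_exit)
{
	assert(strlen(current_oid) == 40);

	snprintf(u->old_oid, sizeof(u->old_oid), "%s", TOY_NULL_OID);

	if (early_exit && (u->flags & TOY_LOG_ONLY))
		return; /* models the removed "goto out" */

	/* Processing that the early exit used to skip. */
	snprintf(u->old_oid, sizeof(u->old_oid), "%s", current_oid);
}
```

Calling `toy_lock_for_update()` on a `TOY_LOG_ONLY` update with `early_exit` set leaves `old_oid` equal to `TOY_NULL_OID`, which stands in for the corrupted entry; with `early_exit` cleared the field receives the current value.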

Comments

Patrick Steinhardt Jan. 22, 2025, 12:04 p.m. UTC | #1

On Wed, Jan 22, 2025 at 11:03:19AM +0100, Karthik Nayak wrote:
> The commit 297c09eabb (refs: allow multiple reflog entries for the same
> refname, 2024-12-16) added logic for reflogs to exit early in
> `lock_ref_for_update()` after obtaining the required lock. This was
> added as a performance optimization as it was assumed that no further
> processing was required for reflog only updates. However this was

s/reflog only/reflog-only

> incorrect since for a symref's reflog entry, the update needs to be
> populated with the old_oid value. This is done right after the early
> exit.
> 
> This caused a bug in Git 2.48 where target references of symrefs being
> updated would create a corrupted reflog entry for the symref since the
> old_oid is not populated. Undo the skip in logic to fix this issue and
> also add a test to ensure that such an issue doesn't arise in the
> future.

It's a bit curious that you describe the fix here, then in the next
paragraph describe why we have skipped the logic only to reiterate the
fix.

> The early exit was added as a performance optimization for reflog-only
> updates, but this accidentally broke symref reflog handling. Remove the
> optimization since it wasn't essential to the original changes.

[snip]
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 5cfb8b7ca8..29f08dced4 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2615,9 +2615,6 @@ static int lock_ref_for_update(struct files_ref_store *refs,
>  
>  	update->backend_data = lock;
>  
> -	if (update->flags & REF_LOG_ONLY)
> -		goto out;
> -
>  	if (update->type & REF_ISSYMREF) {
>  		if (update->flags & REF_NO_DEREF) {
>  			/*

Okay, makes sense. The error is specific to the "files" backend, which
might be worth mentioning in the commit message.

One thing that made me a bit curious is that we now end up executing
`check_old_oid()` for symref reflog entries, because we have
`REF_ISSYMREF` and `REF_NO_DEREF` set. But that function should end up
skipping the check because we explicitly unset `REF_HAVE_OLD` when
queueing the update. The remainder should be skipped because we have
`REF_LOG_ONLY` set.
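
The reasoning above, that the old-value check becomes a no-op once `REF_HAVE_OLD` is unset, can be illustrated with a small stand-alone sketch. It is a toy model under that single assumption, not Git's `check_old_oid()`; `toy_ref_update`, `toy_check_old`, and the `TOY_*` flags are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* Toy flags; NOT the real REF_HAVE_OLD / REF_LOG_ONLY values. */
#define TOY_HAVE_OLD (1u << 0)
#define TOY_LOG_ONLY (1u << 1)

/* Hypothetical, simplified update record. */
struct toy_ref_update {
	unsigned flags;
	const char *expected_old; /* only meaningful with TOY_HAVE_OLD */
};

/*
 * Returns 0 when the update may proceed and -1 on a mismatch.
 * Without TOY_HAVE_OLD the caller never asked for verification,
 * so the comparison is skipped entirely.
 */
static int toy_check_old(const struct toy_ref_update *u,
			 const char *actual_old)
{
	assert(actual_old != NULL);

	if (!(u->flags & TOY_HAVE_OLD))
		return 0;

	assert(u->expected_old != NULL);
	return strcmp(u->expected_old, actual_old) ? -1 : 0;
}
```

In this model a log-only update queued without `TOY_HAVE_OLD` passes the check regardless of the value read from the referent, so running the check for such entries is harmless.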

> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
> index e2316f1dd4..59493dd73f 100755
> --- a/t/t1400-update-ref.sh
> +++ b/t/t1400-update-ref.sh
> @@ -4,6 +4,8 @@
>  #
>  
>  test_description='Test git update-ref and basic ref logging'
> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>  
>  . ./test-lib.sh
>  

We could use `git symbolic-ref HEAD` to resolve the branch name instead
of overriding the branch name here.

Patrick

Jeff King Jan. 22, 2025, 3:02 p.m. UTC | #2

On Wed, Jan 22, 2025 at 11:03:19AM +0100, Karthik Nayak wrote:

> The commit 297c09eabb (refs: allow multiple reflog entries for the same
> refname, 2024-12-16) added logic for reflogs to exit early in
> `lock_ref_for_update()` after obtaining the required lock. This was
> added as a performance optimization as it was assumed that no further
> processing was required for reflog only updates. However this was
> incorrect since for a symref's reflog entry, the update needs to be
> populated with the old_oid value. This is done right after the early
> exit.
> 
> This caused a bug in Git 2.48 where target references of symrefs being
> updated would create a corrupted reflog entry for the symref since the
> old_oid is not populated. Undo the skip in logic to fix this issue and
> also add a test to ensure that such an issue doesn't arise in the
> future.
> 
> The early exit was added as a performance optimization for reflog-only
> updates, but this accidentally broke symref reflog handling. Remove the
> optimization since it wasn't essential to the original changes.

Thanks for the explanation.

> Reported-by: Nika Layzell <nika@thelayzells.com>
> Co-authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>

I don't know if we need my s-o-b to delete a few lines of code, but just
in case:

  Signed-off-by: Jeff King <peff@peff.net>

> +test_expect_success 'update-ref should also create reflog for HEAD' '
> +	test_when_finished "rm -rf repo" &&
> +	git init repo &&
> +	(
> +		cd repo &&
> +		test_commit A &&
> +		test_commit B &&
> +		git rev-parse HEAD >>expect &&

Using ">>" here is unexpected. It's OK because we are in a new repo (so
there is no leftover "expect" file from a previous test) but probably
better to stick to ">" unless we really need to append.

Plus I don't think there is really any need for a new repo. The
important thing is just updating the branch via update-ref (it doesn't
even have to be a rewind, but of course it has to exist already, so a
rewind is the simplest thing).

> +		git update-ref --create-reflog refs/heads/main HEAD~ &&

I agree with Patrick that we are probably better off just getting the
branch name with symbolic-ref.

So all together, something like:

diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index e2316f1dd4..29045aad43 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -2068,4 +2068,13 @@ do
 
 done
 
+test_expect_success 'update-ref should also create reflog for HEAD' '
+	test_commit to-rewind &&
+	git rev-parse HEAD >expect &&
+	head=$(git symbolic-ref HEAD) &&
+	git update-ref --create-reflog "$head" HEAD~ &&
+	git rev-parse HEAD@{1} >actual &&
+	test_cmp expect actual
+'
+
 test_done

-Peff

Junio C Hamano Jan. 22, 2025, 5:56 p.m. UTC | #3

Patrick Steinhardt <ps@pks.im> writes:

>> This caused a bug in Git 2.48 where target references of symrefs being
>> updated would create a corrupted reflog entry for the symref since the
>> old_oid is not populated. Undo the skip in logic to fix this issue and
>> also add a test to ensure that such an issue doesn't arise in the
>> future.
>
> It's a bit curious that you describe the fix here, then in the next
> paragraph describe why we have skipped the logic only to reiterate the
> fix.
>
>> The early exit was added as a performance optimization for reflog-only
>> updates, but this accidentally broke symref reflog handling. Remove the
>> optimization since it wasn't essential to the original changes.

Yeah, that indeed is a "bit" curious.  I'd call it confusing, though
;-).

> Okay, makes sense. The error is specific to the "files" backend, which
> might be worth mentioning in the commit message.

Indeed.

>>  test_description='Test git update-ref and basic ref logging'
>> +GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
>> +export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
>>  
>>  . ./test-lib.sh
>>  
>
> We could use `git symbolic-ref HEAD` to resolve the branch name instead
> of overriding the branch name here.

I agree.  That sounds like a more sensible way to go.

Thanks.

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 5cfb8b7ca8..29f08dced4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2615,9 +2615,6 @@  static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
-	if (update->flags & REF_LOG_ONLY)
-		goto out;
-
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index e2316f1dd4..59493dd73f 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -4,6 +4,8 @@ 
 #
 
 test_description='Test git update-ref and basic ref logging'
+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
 
 . ./test-lib.sh
 
@@ -2068,4 +2070,18 @@  do
 
 done
 
+test_expect_success 'update-ref should also create reflog for HEAD' '
+	test_when_finished "rm -rf repo" &&
+	git init repo &&
+	(
+		cd repo &&
+		test_commit A &&
+		test_commit B &&
+		git rev-parse HEAD >>expect &&
+		git update-ref --create-reflog refs/heads/main HEAD~ &&
+		git rev-parse HEAD@{1} >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_done