diff mbox series

[v2] refs: fix creation of corrupted reflogs for symrefs

Message ID 20250123112944.3922712-1-karthik.188@gmail.com (mailing list archive)
State Accepted
Commit 3519492430ba26cadcdb215730a6c8e1bcf5b9cf
Headers show
Series [v2] refs: fix creation of corrupted reflogs for symrefs | expand

Commit Message

Karthik Nayak Jan. 23, 2025, 11:29 a.m. UTC
The commit 297c09eabb (refs: allow multiple reflog entries for the same
refname, 2024-12-16) added logic for reflogs to exit early in
`lock_ref_for_update()` after obtaining the required lock. This was
added as a performance optimization as it was assumed that no further
processing was required for reflog-only updates. However this was
incorrect since for a symref's reflog entry, the update needs to be
populated with the old_oid value. This is done right after the early
exit.

This caused a bug in Git 2.48 in the files backend where target
references of symrefs being updated would create a corrupted reflog
entry for the symref since the old_oid is not populated. Undo the skip
in logic to fix this issue and also add a test to ensure that such an
issue doesn't arise in the future.

The early exit was added as a performance optimization for reflog-only
updates, and it wasn't essential to the original changes. As such,
reverting it shouldn't cause any further issues.

Reported-by: Nika Layzell <nika@thelayzells.com>
Co-authored-by: Jeff King <peff@peff.net>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Changes since the v1:
- Modified the commit message and subject to make it a little more clearer.
- Used `git symbolic-ref HEAD` in the test instead of setting
  the default branch.

Range diff:

1:  dbb1186512 ! 1:  9326fb08a9 reflog: fix bug which didn't resolve symref reflogs
    @@ Metadata
     Author: Karthik Nayak <karthik.188@gmail.com>
     
      ## Commit message ##
    -    reflog: fix bug which didn't resolve symref reflogs
    +    refs: fix creation of corrupted reflogs for symrefs
     
         The commit 297c09eabb (refs: allow multiple reflog entries for the same
    -    refname, 2024-12-16) added logic to skip the flow for reflogs in
    -    `lock_ref_for_update()` after obtaining the required lock. This was done
    -    because it was assumed that the flow ends there for reflogs. However
    -    this was incorrect since for a symref's reflog entry, we need to
    -    populate the old_oid value which is done right after.
    +    refname, 2024-12-16) added logic for reflogs to exit early in
    +    `lock_ref_for_update()` after obtaining the required lock. This was
    +    added as a performance optimization as it was assumed that no further
    +    processing was required for reflog-only updates. However this was
    +    incorrect since for a symref's reflog entry, the update needs to be
    +    populated with the old_oid value. This is done right after the early
    +    exit.
     
    -    This caused a bug in Git 2.48 where target references of symrefs being
    -    updated would create a corrupted reflog entry for the symref since the
    -    old_oid is not populated. Undo the skip in logic to fix this issue and
    -    also add a test to ensure that such an issue doesn't arise in the
    -    future.
    +    This caused a bug in Git 2.48 in the files backend where target
    +    references of symrefs being updated would create a corrupted reflog
    +    entry for the symref since the old_oid is not populated. Undo the skip
    +    in logic to fix this issue and also add a test to ensure that such an
    +    issue doesn't arise in the future.
    +
    +    The early exit was added as a performance optimization for reflog-only
    +    updates, and it wasn't essential to the original changes. As such,
    +    reverting it shouldn't cause any further issues.
     
         Reported-by: Nika Layzell <nika@thelayzells.com>
         Co-authored-by: Jeff King <peff@peff.net>
    +    Signed-off-by: Jeff King <peff@peff.net>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
     
      ## refs/files-backend.c ##
    @@ refs/files-backend.c: static int lock_ref_for_update(struct files_ref_store *ref
      			/*
     
      ## t/t1400-update-ref.sh ##
    -@@
    - #
    - 
    - test_description='Test git update-ref and basic ref logging'
    -+GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME=main
    -+export GIT_TEST_DEFAULT_INITIAL_BRANCH_NAME
    - 
    - . ./test-lib.sh
    - 
     @@ t/t1400-update-ref.sh: do
      
      done
      
     +test_expect_success 'update-ref should also create reflog for HEAD' '
    -+	test_when_finished "rm -rf repo" &&
    -+	git init repo &&
    -+	(
    -+		cd repo &&
    -+		test_commit A &&
    -+		test_commit B &&
    -+		git rev-parse HEAD >>expect &&
    -+		git update-ref --create-reflog refs/heads/main HEAD~ &&
    -+		git rev-parse HEAD@{1} >actual &&
    -+		test_cmp expect actual
    -+	)
    ++	test_commit to-rewind &&
    ++	git rev-parse HEAD >expect &&
    ++	head=$(git symbolic-ref HEAD) &&
    ++	git update-ref --create-reflog "$head" HEAD~ &&
    ++	git rev-parse HEAD@{1} >actual &&
    ++	test_cmp expect actual
     +'
     +
      test_done

---
 refs/files-backend.c  | 3 ---
 t/t1400-update-ref.sh | 9 +++++++++
 2 files changed, 9 insertions(+), 3 deletions(-)

Comments

Junio C Hamano Jan. 23, 2025, 5:55 p.m. UTC | #1
Karthik Nayak <karthik.188@gmail.com> writes:

> Subject: Re: [PATCH v2] refs: fix creation of corrupted reflogs for symrefs

This may be just me, but every time I see the above title, it read
to me as if we are on purpose doing "creation of corrupted reflogs
for symrefs", but we are failing to do so for some reason, and this
commit is about improving the situation so that we can correctly
create corrupted reflog entries for symbolic ref updates.

And because the only sensible reason why we may on purpose do
"creation of corrupted reflogs" I can think of is perhaps we prepare
such corrupted thing to test how robust the production code is when
seeing such corrupted data, I would expect to see a change to t/
hierarchy.

But the patch touches the code, not just tests, which makes me
doubly puzzled.

It happens every time I see this title and the change.  Perhaps drop
"corrupted" from the title?

> The commit 297c09eabb (refs: allow multiple reflog entries for the same
> refname, 2024-12-16) added logic for reflogs to exit early in
> `lock_ref_for_update()` after obtaining the required lock. This was

I do not think the actor, who "exits early", is not "reflogs".
Should we have "for reflogs" in the above, or perhaps move it to the
end of the sentence (i.e. the required lock gets obtained because we
want to do some operation "for reflogs")?

> added as a performance optimization as it was assumed that no further
> processing was required for reflog-only updates. However this was
> incorrect since for a symref's reflog entry, the update needs to be
> populated with the old_oid value. This is done right after the early
> exit.

"The early exit skipped this required work"?

> This caused a bug in Git 2.48 in the files backend where target
> references of symrefs being updated would create a corrupted reflog
> entry for the symref since the old_oid is not populated. Undo the skip
> in logic to fix this issue and also add a test to ensure that such an
> issue doesn't arise in the future.

OK.

> The early exit was added as a performance optimization for reflog-only
> updates, and it wasn't essential to the original changes. As such,
> reverting it shouldn't cause any further issues.

I am not sure if this is even worth saying, as you already said that
the early return was done incorrectly assuming that the remainder of
the function can be skipped as an optimization.  What may help
readers is to state that all the rest of the code path that was
skipped by a mistaken optimization is necessary and would not do
anything unwanted.

> Reported-by: Nika Layzell <nika@thelayzells.com>
> Co-authored-by: Jeff King <peff@peff.net>
> Signed-off-by: Jeff King <peff@peff.net>
> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>

> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
> index e2316f1dd4..29045aad43 100755
> --- a/t/t1400-update-ref.sh
> +++ b/t/t1400-update-ref.sh
> @@ -2068,4 +2068,13 @@ do
>  
>  done
>  
> +test_expect_success 'update-ref should also create reflog for HEAD' '
> +	test_commit to-rewind &&
> +	git rev-parse HEAD >expect &&
> +	head=$(git symbolic-ref HEAD) &&
> +	git update-ref --create-reflog "$head" HEAD~ &&
> +	git rev-parse HEAD@{1} >actual &&
> +	test_cmp expect actual
> +'

Nice.  We could rename "head" to something more meaningful (like
"current branch") but I can live with the above version.  It is much
nicer than assuming on what branch we would be, which was what the
previous iteration did.

Thanks.
Karthik Nayak Jan. 24, 2025, 10:38 a.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> Subject: Re: [PATCH v2] refs: fix creation of corrupted reflogs for symrefs
>
> This may be just me, but every time I see the above title, it read
> to me as if we are on purpose doing "creation of corrupted reflogs
> for symrefs", but we are failing to do so for some reason, and this
> commit is about improving the situation so that we can correctly
> create corrupted reflog entries for symbolic ref updates.
>

Okay. I see what you mean.

> And because the only sensible reason why we may on purpose do
> "creation of corrupted reflogs" I can think of is perhaps we prepare
> such corrupted thing to test how robust the production code is when
> seeing such corrupted data, I would expect to see a change to t/
> hierarchy.
>
> But the patch touches the code, not just tests, which makes me
> doubly puzzled.
>
> It happens every time I see this title and the change.  Perhaps drop
> "corrupted" from the title?
>

Yeah, that would make it much clearer.

>> The commit 297c09eabb (refs: allow multiple reflog entries for the same
>> refname, 2024-12-16) added logic for reflogs to exit early in
>> `lock_ref_for_update()` after obtaining the required lock. This was
>
> I do not think the actor, who "exits early", is not "reflogs".

Took me some time to understand this, but I get what you're talking
about. My sentence adds ambiguity on what we're exactly exiting early.

> Should we have "for reflogs" in the above, or perhaps move it to the
> end of the sentence (i.e. the required lock gets obtained because we
> want to do some operation "for reflogs")?
>

Yeah that would make it much clearer.

>> added as a performance optimization as it was assumed that no further
>> processing was required for reflog-only updates. However this was
>> incorrect since for a symref's reflog entry, the update needs to be
>> populated with the old_oid value. This is done right after the early
>> exit.
>
> "The early exit skipped this required work"?
>

Yeah, that works!

>> This caused a bug in Git 2.48 in the files backend where target
>> references of symrefs being updated would create a corrupted reflog
>> entry for the symref since the old_oid is not populated. Undo the skip
>> in logic to fix this issue and also add a test to ensure that such an
>> issue doesn't arise in the future.
>
> OK.
>
>> The early exit was added as a performance optimization for reflog-only
>> updates, and it wasn't essential to the original changes. As such,
>> reverting it shouldn't cause any further issues.
>
> I am not sure if this is even worth saying, as you already said that
> the early return was done incorrectly assuming that the remainder of
> the function can be skipped as an optimization.  What may help
> readers is to state that all the rest of the code path that was
> skipped by a mistaken optimization is necessary and would not do
> anything unwanted.
>

That was what I was trying to convey.

>> Reported-by: Nika Layzell <nika@thelayzells.com>
>> Co-authored-by: Jeff King <peff@peff.net>
>> Signed-off-by: Jeff King <peff@peff.net>
>> Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
>
>> diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
>> index e2316f1dd4..29045aad43 100755
>> --- a/t/t1400-update-ref.sh
>> +++ b/t/t1400-update-ref.sh
>> @@ -2068,4 +2068,13 @@ do
>>
>>  done
>>
>> +test_expect_success 'update-ref should also create reflog for HEAD' '
>> +	test_commit to-rewind &&
>> +	git rev-parse HEAD >expect &&
>> +	head=$(git symbolic-ref HEAD) &&
>> +	git update-ref --create-reflog "$head" HEAD~ &&
>> +	git rev-parse HEAD@{1} >actual &&
>> +	test_cmp expect actual
>> +'
>
> Nice.  We could rename "head" to something more meaningful (like
> "current branch") but I can live with the above version.  It is much
> nicer than assuming on what branch we would be, which was what the
> previous iteration did.
>
> Thanks.

I agree, this is much nicer indeed.

Also I just noticed that you have already amended the commit message
and added it to `next`. Thanks for doing that. Happy to re-roll if
needed!

Karthik
diff mbox series

Patch

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 5cfb8b7ca8..29f08dced4 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2615,9 +2615,6 @@  static int lock_ref_for_update(struct files_ref_store *refs,
 
 	update->backend_data = lock;
 
-	if (update->flags & REF_LOG_ONLY)
-		goto out;
-
 	if (update->type & REF_ISSYMREF) {
 		if (update->flags & REF_NO_DEREF) {
 			/*
diff --git a/t/t1400-update-ref.sh b/t/t1400-update-ref.sh
index e2316f1dd4..29045aad43 100755
--- a/t/t1400-update-ref.sh
+++ b/t/t1400-update-ref.sh
@@ -2068,4 +2068,13 @@  do
 
 done
 
+test_expect_success 'update-ref should also create reflog for HEAD' '
+	test_commit to-rewind &&
+	git rev-parse HEAD >expect &&
+	head=$(git symbolic-ref HEAD) &&
+	git update-ref --create-reflog "$head" HEAD~ &&
+	git rev-parse HEAD@{1} >actual &&
+	test_cmp expect actual
+'
+
 test_done