diff mbox series

[v2] reftable: write correct max_update_index to header

Message ID 20250123135613.748916-1-karthik.188@gmail.com (mailing list archive)
State New
Headers show
Series [v2] reftable: write correct max_update_index to header | expand

Commit Message

Karthik Nayak Jan. 23, 2025, 1:56 p.m. UTC
In 297c09eabb (refs: allow multiple reflog entries for the same refname,
2024-12-16), the reftable backend learned to handle multiple reflog
entries within the same transaction. This was done modifying the
`update_index` for reflogs with multiple indices. During writing the
logs, the `max_update_index` of the writer was modified to ensure the
limits were raised to the modified `update_index`s.

However, since ref entries are written before the modification to the
`max_update_index`, if there are multiple blocks to be written, the
reftable backend writes the header with the old `max_update_index`. When
all logs are finally written, the footer will be written with the new
`min_update_index`. This causes a mismatch between the header and the
footer and causes the reftable file to be corrupted. The existing tests
only spawn a single block and since headers are lazily written with the
first block, the tests didn't capture this bug.

To fix the issue, the appropriate `max_update_index` limit must be set
even before the first block is written. Add a `max_index` field to the
transaction which holds the `max_index` within all its updates, then
propagate this value to the reftable backend, wherein this is used to
the set the `max_update_index` correctly.

Add a test which creates a few thousand reference updates with multiple
reflog entries, which should trigger the bug.

Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---

Hello,

While this patch was merged to next, Dscho reported that it was flaky
on macos pipeline. On further investigation I found this was easily
reproducible when the leak sanitizer was turned on and the reftable
tests were run. The fix was simply to add the missing 0 initialization.

I also found that we didn't set the appropriate values for all elements
of the `reftable_transaction_data`. Which I've done now.

The patch is based on Maint f93ff170b9 (Git 2.48.1, 2025-01-13). 

Thanks

Changes from v1:
1. Correctly intialize the value to `0` for `write_transaction_table_arg.max_index`.
2. Set the `max_index` for all elements of `reftable_transaction_data`.
3. The range diff was made against the commit in Junio's branch
   `gitster/kn/reflog-migration-fix`, which included his 'S-o-b', which I've
   removed in this patch, since that is usually applied by Junio. Not sure if
   that was the right move. 

Range diff:
1:  bc67b4ab5f ! 1:  5489826b47 reftable: write correct max_update_index to header
    @@ Commit message
     
         Reported-by: brian m. carlson <sandals@crustytoothpaste.net>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
    -    Signed-off-by: Junio C Hamano <gitster@pobox.com>
     
      ## refs.c ##
     @@ refs.c: int ref_transaction_update_reflog(struct ref_transaction *transaction,
    @@ refs/reftable-backend.c: struct write_transaction_table_arg {
      };
      
      struct reftable_transaction_data {
    +@@ refs/reftable-backend.c: static int prepare_transaction_update(struct write_transaction_table_arg **out,
    + 		arg->updates_nr = 0;
    + 		arg->updates_alloc = 0;
    + 		arg->updates_expected = 0;
    ++		arg->max_index = 0;
    + 	}
    + 
    + 	arg->updates_expected++;
     @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writer *writer, void *cb_data
      	struct reftable_log_record *logs = NULL;
      	struct ident_split committer_ident = {0};
    @@ refs/reftable-backend.c: static int write_transaction_table(struct reftable_writ
      		if (ret < 0)
      			goto done;
     @@ refs/reftable-backend.c: static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
    - 	struct reftable_transaction_data *tx_data = transaction->backend_data;
      	int ret = 0;
      
    -+	if (tx_data->args)
    -+		tx_data->args->max_index = transaction->max_index;
    -+
      	for (size_t i = 0; i < tx_data->args_nr; i++) {
    ++		tx_data->args[i].max_index = transaction->max_index;
    ++
      		ret = reftable_addition_add(tx_data->args[i].addition,
      					    write_transaction_table, &tx_data->args[i]);
    + 		if (ret < 0)
     
      ## t/t1460-refs-migrate.sh ##
     @@ t/t1460-refs-migrate.sh: do
---
 refs.c                  |  7 +++++++
 refs/refs-internal.h    |  1 +
 refs/reftable-backend.c | 20 ++++++++++----------
 t/t1460-refs-migrate.sh | 12 ++++++++++++
 4 files changed, 30 insertions(+), 10 deletions(-)

Comments

Junio C Hamano Jan. 23, 2025, 6:11 p.m. UTC | #1
Karthik Nayak <karthik.188@gmail.com> writes:

> While this patch was merged to next, Dscho reported that it was flaky
> on macos pipeline. On further investigation I found this was easily
> reproducible when the leak sanitizer was turned on and the reftable
> tests were run. The fix was simply to add the missing 0 initialization.

If it is already _in_ 'next', please turn it into a relative patch
on top of it, instead of replacing it.

That will give you an opportunity to describe the breakage in the
original version, which everybody missed until it hit 'next'.  And
you can also credit the folks who reported the breakage, and
describe the fix.

The reason we do not revert out of 'next' lightly is because the
changes we merge to 'next' are supposed to be reviewed well enough,
which means that any bug we discover later is likely to have been
caused by mistakes any of us may repeat in the future, and it is
worth documenting in our history.

It is quite a different review philosophy if you compare the rules
we use for patches that haven't hit 'next'.  These uncooked patches
may have mistrakes that reviewers can easily spot and get corrected,
and these easy ones are not worth documenting as much.

> The patch is based on Maint f93ff170b9 (Git 2.48.1, 2025-01-13). 

Thanks.
Junio C Hamano Jan. 23, 2025, 6:24 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> While this patch was merged to next, Dscho reported that it was flaky
>> on macos pipeline. On further investigation I found this was easily
>> reproducible when the leak sanitizer was turned on and the reftable
>> tests were run. The fix was simply to add the missing 0 initialization.
>
> If it is already _in_ 'next', please turn it into a relative patch
> on top of it, instead of replacing it.

For now, I have tentatively created the following and will queue on
a separate kn/reflog-migration-fix-fix topic (which would be ahead
of kn/reflog-migration-fix topic by this one commit), in the hope
that it can be replaced with a version with proper commit log
message that describes what bugs in the original "fix" are
addressed, how they are caused (e.g., how does it lead to the
breakage to forget clearing of arg->max_index in the first hunk
had?), and what their fixes are.

Thanks.

--- >8 ---
From: Karthik Nayak <karthik.188@gmail.com>
Date: Fri, 20 Dec 2024 13:58:37 +0100
Subject: [PATCH] SQUASH - needs to describe the breakage and fix in v1

---
 refs/reftable-backend.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 68db2baa8f..bb658826fe 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -920,6 +920,7 @@ static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		arg->updates_nr = 0;
 		arg->updates_alloc = 0;
 		arg->updates_expected = 0;
+		arg->max_index = 0;
 	}
 
 	arg->updates_expected++;
@@ -1502,10 +1503,9 @@ static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
 	struct reftable_transaction_data *tx_data = transaction->backend_data;
 	int ret = 0;
 
-	if (tx_data->args)
-		tx_data->args->max_index = transaction->max_index;
-
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		tx_data->args[i].max_index = transaction->max_index;
+
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
 		if (ret < 0)
Karthik Nayak Jan. 24, 2025, 4:06 a.m. UTC | #3
Junio C Hamano <gitster@pobox.com> writes:

> Karthik Nayak <karthik.188@gmail.com> writes:
>
>> While this patch was merged to next, Dscho reported that it was flaky
>> on macos pipeline. On further investigation I found this was easily
>> reproducible when the leak sanitizer was turned on and the reftable
>> tests were run. The fix was simply to add the missing 0 initialization.
>
> If it is already _in_ 'next', please turn it into a relative patch
> on top of it, instead of replacing it.
>
> That will give you an opportunity to describe the breakage in the
> original version, which everybody missed until it hit 'next'.  And
> you can also credit the folks who reported the breakage, and
> describe the fix.
>
> The reason we do not revert out of 'next' lightly is because the
> changes we merge to 'next' are supposed to be reviewed well enough,
> which means that any bug we discover later is likely to have been
> caused by mistakes any of us may repeat in the future, and it is
> worth documenting in our history.
>
> It is quite a different review philosophy if you compare the rules
> we use for patches that haven't hit 'next'.  These uncooked patches
> may have mistrakes that reviewers can easily spot and get corrected,
> and these easy ones are not worth documenting as much.
>

Thanks Junio, I understand your reasoning here and it makes sense to me.
Do you think it is worthwhile to also add something like this to our
Documentation? I couldn't find anything there. I'll add a small patch to
the bottom of this mail.

>> The patch is based on Maint f93ff170b9 (Git 2.48.1, 2025-01-13).
>
> Thanks.

-- >8 --

Subject: [PATCH] doc: add guideline to tackle bugs in `next`

When fixing a bug in a topic already merged into `next`, there are no
strict guidelines to follow. While topics in `seen` can be reverted,
topics in `next` have undergone thorough review. Documenting fixes for
such topics is valuable, as it helps to clarify the issue and
contributes to preventing similar problems in the future.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 Documentation/SubmittingPatches | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
index 958e3cc3d5..72454acf21 100644
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -115,6 +115,13 @@ latest HEAD commit of `maint` or `master` based
on the following cases:
   new API features on the cutting edge that recently appeared in
   `master` but were not available in the released version).

+* If you're fixing a bug in a topic that's already been merged into
+  `next`, it's preferable to create a patch relative to that topic.
+  This approach allows you to describe the issue in the original version
+  that went unnoticed until it reached next. Additionally, it provides
+  an opportunity to credit those who reported the issue and document
+  the details of the fix.
+
 * Otherwise (such as if you are adding new features) use `master`.
Junio C Hamano Jan. 24, 2025, 3:25 p.m. UTC | #4
Karthik Nayak <karthik.188@gmail.com> writes:

> Thanks Junio, I understand your reasoning here and it makes sense to me.
> Do you think it is worthwhile to also add something like this to our
> Documentation? I couldn't find anything there. I'll add a small patch to
> the bottom of this mail.

Perhaps.  I think referring people to "Notes from the maintainer"
may also work without duplicating information.

Thanks.
diff mbox series

Patch

diff --git a/refs.c b/refs.c
index 0f41b2fd4a..f7b6f0f897 100644
--- a/refs.c
+++ b/refs.c
@@ -1345,6 +1345,13 @@  int ref_transaction_update_reflog(struct ref_transaction *transaction,
 	update->flags &= ~REF_HAVE_OLD;
 	update->index = index;
 
+	/*
+	 * Reference backends may need to know the max index to optimize
+	 * their writes. So we store the max_index on the transaction level.
+	 */
+	if (index > transaction->max_index)
+		transaction->max_index = index;
+
 	return 0;
 }
 
diff --git a/refs/refs-internal.h b/refs/refs-internal.h
index 16550862d3..aaab711bb9 100644
--- a/refs/refs-internal.h
+++ b/refs/refs-internal.h
@@ -203,6 +203,7 @@  struct ref_transaction {
 	enum ref_transaction_state state;
 	void *backend_data;
 	unsigned int flags;
+	unsigned int max_index;
 };
 
 /*
diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 00d95a9a2f..d39a14c5a4 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -942,6 +942,7 @@  struct write_transaction_table_arg {
 	size_t updates_nr;
 	size_t updates_alloc;
 	size_t updates_expected;
+	unsigned int max_index;
 };
 
 struct reftable_transaction_data {
@@ -1019,6 +1020,7 @@  static int prepare_transaction_update(struct write_transaction_table_arg **out,
 		arg->updates_nr = 0;
 		arg->updates_alloc = 0;
 		arg->updates_expected = 0;
+		arg->max_index = 0;
 	}
 
 	arg->updates_expected++;
@@ -1428,7 +1430,6 @@  static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	struct reftable_log_record *logs = NULL;
 	struct ident_split committer_ident = {0};
 	size_t logs_nr = 0, logs_alloc = 0, i;
-	uint64_t max_update_index = ts;
 	const char *committer_info;
 	int ret = 0;
 
@@ -1438,7 +1439,12 @@  static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 
 	QSORT(arg->updates, arg->updates_nr, transaction_update_cmp);
 
-	reftable_writer_set_limits(writer, ts, ts);
+	/*
+	 * During reflog migration, we add indexes for a single reflog with
+	 * multiple entries. Each entry will contain a different update_index,
+	 * so set the limits accordingly.
+	 */
+	reftable_writer_set_limits(writer, ts, ts + arg->max_index);
 
 	for (i = 0; i < arg->updates_nr; i++) {
 		struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1540,12 +1546,6 @@  static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 				 */
 				log->update_index = ts + u->index;
 
-				/*
-				 * Note the max update_index so the limit can be set later on.
-				 */
-				if (log->update_index > max_update_index)
-					max_update_index = log->update_index;
-
 				log->refname = xstrdup(u->refname);
 				memcpy(log->value.update.new_hash,
 				       u->new_oid.hash, GIT_MAX_RAWSZ);
@@ -1609,8 +1609,6 @@  static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * and log blocks.
 	 */
 	if (logs) {
-		reftable_writer_set_limits(writer, ts, max_update_index);
-
 		ret = reftable_writer_add_logs(writer, logs, logs_nr);
 		if (ret < 0)
 			goto done;
@@ -1632,6 +1630,8 @@  static int reftable_be_transaction_finish(struct ref_store *ref_store UNUSED,
 	int ret = 0;
 
 	for (size_t i = 0; i < tx_data->args_nr; i++) {
+		tx_data->args[i].max_index = transaction->max_index;
+
 		ret = reftable_addition_add(tx_data->args[i].addition,
 					    write_transaction_table, &tx_data->args[i]);
 		if (ret < 0)
diff --git a/t/t1460-refs-migrate.sh b/t/t1460-refs-migrate.sh
index f59bc4860f..307b2998ef 100755
--- a/t/t1460-refs-migrate.sh
+++ b/t/t1460-refs-migrate.sh
@@ -227,6 +227,18 @@  do
 	done
 done
 
+test_expect_success 'multiple reftable blocks with multiple entries' '
+	test_when_finished "rm -rf repo" &&
+	git init --ref-format=files repo &&
+	test_commit -C repo first &&
+	printf "create refs/heads/ref-%d HEAD\n" $(test_seq 5000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_commit -C repo second &&
+	printf "update refs/heads/ref-%d HEAD\n" $(test_seq 3000) >stdin &&
+	git -C repo update-ref --stdin <stdin &&
+	test_migration repo reftable
+'
+
 test_expect_success 'migrating from files format deletes backend files' '
 	test_when_finished "rm -rf repo" &&
 	git init --ref-format=files repo &&