mbox series

[v3,0/3] refs: small followups to the migration corruption fix

Message ID 20250122-461-corrupted-reftable-followup-v3-0-ae5f88bf04fa@gmail.com (mailing list archive)
Headers show
Series refs: small followups to the migration corruption fix | expand

Message

Karthik Nayak Jan. 22, 2025, 5:35 a.m. UTC
This is a follow up to the bug that was reported [1] around `git refs
migrate --ref-format=reftable` where the migration would fail for
repositories with reflogs with lots of entries. This was caused due to a
mismatch in the reftable's header and footer, specifically WRT the
'max_update_index'.

While there was a fix posted. This series is a small followup to fix
some of the topics discussed there:
1. To mark `ref_transaction_update_reflog()` as static since it is only
used internally within 'refs.c'.
2. To change the type of 'max_index' from 'unsigned int' to 'uint64_t'.
This would be much safer for large repositories with millions of files
and on 32bit systems.
3. To add a safeguard to prevent 'update_index' changes post any record
addition. This is a preventive measure to ensure such bugs don't arise
in the future.

This is based on top of master 757161efcc (Sync with Git 2.48.1,
2025-01-13) with 'kn/reflog-migration-fix' merged in.

[1]: https://lore.kernel.org/r/Z4UbkcmJAU1MT-Rs@tapette.crustytoothpaste.net

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
Changes in v3:
- Commit 3: Modify the commit message to make it clearer that this is a
  preventive measure. Modify the condition to check for both `last_key`
  and `next`. Also add a unit test.
- Link to v2: https://lore.kernel.org/r/20250121-461-corrupted-reftable-followup-v2-0-37e26c7a79b4@gmail.com

Changes in v2:
- Commit 1: Keep the function comments.
- Commit 3: Modify the requirement to be for any record writes instead
of the first block write. This ensures that the limits need to be set 
before any records being added. Instead of using `BUG()`, return 
`REFTABLE_API_ERROR`. Handle all callers as needed.
- Link to v1: https://lore.kernel.org/r/20250117-461-corrupted-reftable-followup-v1-0-70ee605ae3fe@gmail.com

---
Karthik Nayak (3):
      refs: mark `ref_transaction_update_reflog()` as static
      refs: use 'uint64_t' for 'ref_update.index'
      reftable: prevent 'update_index' changes after adding records

 refs.c                          | 24 ++++++++++++------
 refs.h                          | 14 -----------
 refs/refs-internal.h            |  4 +--
 refs/reftable-backend.c         | 22 ++++++++++++-----
 reftable/reftable-error.h       |  1 +
 reftable/reftable-writer.h      | 24 ++++++++++--------
 reftable/stack.c                |  6 +++--
 reftable/writer.c               | 15 +++++++++++-
 t/unit-tests/t-reftable-stack.c | 54 ++++++++++++++++++++++++++++++++++++++---
 9 files changed, 118 insertions(+), 46 deletions(-)
---

Range-diff versus v2:

1:  76e6da564a = 1:  cafede9b10 refs: mark `ref_transaction_update_reflog()` as static
2:  f8e2e81eb4 = 2:  737360d883 refs: use 'uint64_t' for 'ref_update.index'
3:  ca92e29ecc ! 3:  01bf1d765f reftable: prevent 'update_index' changes after adding records
    @@ Commit message
         and footer values. The footer would contain the newer values while the
         header retained the original ones.
     
    -    To fix this bug, prevent callers from updating these values after any
    -    record is written. To do this, modify the function to return an error
    -    whenever the limits are modified after any record adds. Check for record
    -    adds within `reftable_writer_set_limits()` by checking the `last_key`
    -    variable, which is set whenever a new record is added.
    +    To protect against this bug, prevent callers from updating these values
    +    after any record is written. To do this, modify the function to return
    +    an error whenever the limits are modified after any record adds. Check
    +    for record adds within `reftable_writer_set_limits()` by checking the
    +    `last_key` and `next` variable. The former is updated after each record
    +    added, but is reset at certain points. The latter is set after writing
    +    the first block.
     
         Modify all callers of the function to anticipate a return type and
    -    handle it accordingly.
    +    handle it accordingly. Add a unit test to also ensure the function
    +    returns the error as expected.
     
         Helped-by: Patrick Steinhardt <ps@pks.im>
         Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
    @@ reftable/writer.c: int reftable_writer_new(struct reftable_writer **out,
      }
      
     -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
    --				uint64_t max)
     +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
    -+			       uint64_t max)
    + 				uint64_t max)
      {
     +	/*
    -+	 * The limits should be set before any records are added to the writer.
    -+	 * Check if any records were added by checking if `last_key` was set.
    ++	  * Set the min/max update index limits for the reftable writer.
    ++	  * This must be called before adding any records, since:
    ++	  * - The 'next' field gets set after writing the first block.
    ++	  * - The 'last_key' field updates with each new record (but resets
    ++	  *   after sections).
    ++	  * Returns REFTABLE_API_ERROR if called after writing has begun.
     +	 */
    -+	if (w->last_key.len)
    ++	if (w->next || w->last_key.len)
     +		return REFTABLE_API_ERROR;
     +
      	w->min_update_index = min;
    @@ t/unit-tests/t-reftable-stack.c: static void t_reflog_expire(void)
      	return 0;
      }
      
    +@@ t/unit-tests/t-reftable-stack.c: static void t_reftable_stack_reload_with_missing_table(void)
    + 	clear_dir(dir);
    + }
    + 
    ++static int write_limits_after_ref(struct reftable_writer *wr, void *arg)
    ++{
    ++	struct reftable_ref_record *ref = arg;
    ++	check(!reftable_writer_set_limits(wr, ref->update_index, ref->update_index));
    ++	check(!reftable_writer_add_ref(wr, ref));
    ++	return reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
    ++}
    ++
    ++static void t_reftable_invalid_limit_updates(void)
    ++{
    ++	struct reftable_ref_record ref = {
    ++		.refname = (char *) "HEAD",
    ++		.update_index = 1,
    ++		.value_type = REFTABLE_REF_SYMREF,
    ++		.value.symref = (char *) "master",
    ++	};
    ++	struct reftable_write_options opts = {
    ++		.default_permissions = 0660,
    ++	};
    ++	struct reftable_addition *add = NULL;
    ++	char *dir = get_tmp_dir(__LINE__);
    ++	struct reftable_stack *st = NULL;
    ++	int err;
    ++
    ++	err = reftable_new_stack(&st, dir, &opts);
    ++	check(!err);
    ++
    ++	reftable_addition_destroy(add);
    ++
    ++	err = reftable_stack_new_addition(&add, st, 0);
    ++	check(!err);
    ++
    ++	/*
    ++	 * write_limits_after_ref also updates the update indexes after adding
    ++	 * the record. This should cause an err to be returned, since the limits
    ++	 * must be set at the start.
    ++	 */
    ++	err = reftable_addition_add(add, write_limits_after_ref, &ref);
    ++	check_int(err, ==, REFTABLE_API_ERROR);
    ++
    ++	reftable_addition_destroy(add);
    ++	reftable_stack_destroy(st);
    ++	clear_dir(dir);
    ++}
    ++
    + int cmd_main(int argc UNUSED, const char *argv[] UNUSED)
    + {
    + 	TEST(t_empty_add(), "empty addition to stack");
    + 	TEST(t_read_file(), "read_lines works");
    + 	TEST(t_reflog_expire(), "expire reflog entries");
    ++	TEST(t_reftable_invalid_limit_updates(), "prevent limit updates after adding records");
    + 	TEST(t_reftable_stack_add(), "add multiple refs and logs to stack");
    + 	TEST(t_reftable_stack_add_one(), "add a single ref record to stack");
    + 	TEST(t_reftable_stack_add_performs_auto_compaction(), "addition to stack triggers auto-compaction");


---

base-commit: a5aa44e7930761cb900813d971b4105f901818fb
change-id: 20250117-461-corrupted-reftable-followup-eb0e4fd1a723

Thanks
- Karthik