[v2,3/3] reftable: prevent 'update_index' changes after adding records

Message ID 20250121-461-corrupted-reftable-followup-v2-3-37e26c7a79b4@gmail.com
State Superseded
Series refs: small followups to the migration corruption fix

Commit Message

Karthik Nayak Jan. 21, 2025, 3:34 a.m. UTC
The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.

Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones.

To fix this bug, prevent callers from updating these values after any
record is written. To do this, make the function return an error when
the limits are modified after a record has been added. Detect this
within `reftable_writer_set_limits()` by checking the `last_key`
variable, which is set whenever a new record is added.

Modify all callers of the function to check its return value and
handle errors accordingly.

Helped-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 refs/reftable-backend.c         | 20 +++++++++++++++-----
 reftable/reftable-error.h       |  1 +
 reftable/reftable-writer.h      | 24 ++++++++++++++----------
 reftable/stack.c                |  6 ++++--
 reftable/writer.c               | 13 +++++++++++--
 t/unit-tests/t-reftable-stack.c |  8 +++++---
 6 files changed, 50 insertions(+), 22 deletions(-)
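
To make the header/footer mismatch from the commit message concrete, here
is a minimal toy model. It is only a sketch: `struct toy_writer` and all
`toy_*` functions are hypothetical stand-ins invented for illustration and
are not the reftable implementation. The one assumption taken from the
commit message is that the header is emitted with the first block write
and the footer when the writer is closed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a reftable writer; not the real struct. */
struct toy_writer {
	uint64_t min_update_index, max_update_index; /* current limits */
	uint64_t header_min, header_max; /* limits copied into the header */
	uint64_t footer_min, footer_max; /* limits copied into the footer */
	int header_written;
};

/* Unguarded setter, modelled on the pre-patch `void` function. */
static void toy_set_limits(struct toy_writer *w, uint64_t min, uint64_t max)
{
	w->min_update_index = min;
	w->max_update_index = max;
}

/* The first block write also emits the header with the current limits. */
static void toy_write_block(struct toy_writer *w)
{
	if (w->header_written)
		return;
	w->header_min = w->min_update_index;
	w->header_max = w->max_update_index;
	w->header_written = 1;
}

/* Closing the writer emits the footer with whatever the limits are now. */
static void toy_close(struct toy_writer *w)
{
	w->footer_min = w->min_update_index;
	w->footer_max = w->max_update_index;
}

/*
 * Write one block and close. If `late_update` is set, change the limits
 * after the header went out. Returns 1 if header and footer disagree.
 */
static int toy_header_footer_mismatch(int late_update)
{
	struct toy_writer w = {0};

	toy_set_limits(&w, 1, 1);
	toy_write_block(&w);
	assert(w.header_written);
	if (late_update)
		toy_set_limits(&w, 5, 9);
	toy_close(&w);

	return w.header_min != w.footer_min || w.header_max != w.footer_max;
}
```

In this model, `toy_header_footer_mismatch(1)` reports a mismatch while
`toy_header_footer_mismatch(0)` does not, which is the situation the patch
guards against by rejecting the late call.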

Comments

Patrick Steinhardt Jan. 21, 2025, 6:56 a.m. UTC | #1
On Tue, Jan 21, 2025 at 04:34:12AM +0100, Karthik Nayak wrote:
> The function `reftable_writer_set_limits()` allows updating the
> 'min_update_index' and 'max_update_index' of a reftable writer. These
> values are written to both the writer's header and footer.
> 
> Since the header is written during the first block write, any subsequent
> changes to the update index would create a mismatch between the header
> and footer values. The footer would contain the newer values while the
> header retained the original ones.
> 
> To fix this bug, prevent callers from updating these values after any

Nit: it's not really fixing a bug, but protecting us against it. Not
worth a reroll though, from my point of view.

> diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
> index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644
> --- a/reftable/reftable-writer.h
> +++ b/reftable/reftable-writer.h
> @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out,
>  			int (*flush_func)(void *),
>  			void *writer_arg, const struct reftable_write_options *opts);
>  
> -/* Set the range of update indices for the records we will add. When writing a
> -   table into a stack, the min should be at least
> -   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
> -
> -   For transactional updates to a stack, typically min==max, and the
> -   update_index can be obtained by inspeciting the stack. When converting an
> -   existing ref database into a single reftable, this would be a range of
> -   update-index timestamps.
> +/*
> + * Set the range of update indices for the records we will add. When writing a
> + * table into a stack, the min should be at least
> + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
> + *
> + * For transactional updates to a stack, typically min==max, and the
> + * update_index can be obtained by inspeciting the stack. When converting an
> + * existing ref database into a single reftable, this would be a range of
> + * update-index timestamps.
> + *
> + * The function should be called before adding any records to the writer. If not
> + * it will fail with REFTABLE_API_ERROR.
>   */

Thanks for updating this. I think the reftable library is one of those
code areas where it makes sense to sneak in a formatting fix every now
and then because its coding style is quite alien to Git's own in some
places. We could also do it all in one go, but I strongly doubt that it
would be worth the churn.

> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> -				uint64_t max);
> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> +			       uint64_t max);
>  
>  /*
>    Add a reftable_ref_record. The record should have names that come after

> diff --git a/reftable/writer.c b/reftable/writer.c
> index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out,
>  	return 0;
>  }
>  
> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> -				uint64_t max)
> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
> +			       uint64_t max)
>  {
> +	/*
> +	 * The limits should be set before any records are added to the writer.
> +	 * Check if any records were added by checking if `last_key` was set.
> +	 */
> +	if (w->last_key.len)
> +		return REFTABLE_API_ERROR;

Hm. Using the last key feels somewhat dangerous to me as it does get
reset at times, e.g. when finishing writing the current section. It
_should_ work, but overall it just feels a tad too disconnected from the
thing that we actually want to check.

How about we instead use `next`? This variable records the offset of the
next block we're about to write, and `writer_flush_nonempty_block()`
uses it directly to check whether we're currently writing the first
block in order to decide whether it needs to write a header or not. If
it's 0, we know that we haven't written the first block yet. That feels
much closer aligned with what we're checking.
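
The difference between the two checks can be sketched with a toy model.
This is an illustration under stated assumptions, not the reftable code:
`struct toy_writer`, the `toy_*` helpers, and `TOY_API_ERROR` are all
hypothetical. The model only encodes what is described above, namely that
the last key is reset when a section is finished and that `next` is zero
until the first block has been written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_API_ERROR (-1) /* placeholder value, not REFTABLE_API_ERROR */

/* Hypothetical model with only the two fields under discussion. */
struct toy_writer {
	size_t last_key_len; /* stands in for w->last_key.len */
	uint64_t next;       /* offset of the next block; 0 before the first block */
};

/* Adding a record remembers its key. */
static void toy_add_record(struct toy_writer *w, size_t key_len)
{
	w->last_key_len = key_len;
}

/* Finishing a section flushes a block and resets the last key. */
static void toy_finish_section(struct toy_writer *w, uint64_t block_size)
{
	w->next += block_size;
	w->last_key_len = 0;
}

/* Guard as posted in this patch. */
static int toy_guard_last_key(const struct toy_writer *w)
{
	return w->last_key_len ? TOY_API_ERROR : 0;
}

/* Guard as suggested in the review. */
static int toy_guard_next(const struct toy_writer *w)
{
	return w->next ? TOY_API_ERROR : 0;
}

/* Walk one writer through "add a record, then finish the section". */
static void toy_demo_section_reset(struct toy_writer *w)
{
	toy_add_record(w, 10);
	assert(toy_guard_last_key(w) == TOY_API_ERROR);

	toy_finish_section(w, 4096);
	/* The key was reset, so the last_key guard no longer fires ... */
	assert(toy_guard_last_key(w) == 0);
	/* ... while the offset stays non-zero and still rejects the call. */
	assert(toy_guard_next(w) == TOY_API_ERROR);
}
```

Running `toy_demo_section_reset()` on a zero-initialized writer passes all
three assertions in this model.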

> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
> index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644
> --- a/t/unit-tests/t-reftable-stack.c
> +++ b/t/unit-tests/t-reftable-stack.c

Can we maybe add a unit test that demonstrates the error?

Patrick

Karthik Nayak Jan. 21, 2025, 11:44 a.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> On Tue, Jan 21, 2025 at 04:34:12AM +0100, Karthik Nayak wrote:
>> The function `reftable_writer_set_limits()` allows updating the
>> 'min_update_index' and 'max_update_index' of a reftable writer. These
>> values are written to both the writer's header and footer.
>>
>> Since the header is written during the first block write, any subsequent
>> changes to the update index would create a mismatch between the header
>> and footer values. The footer would contain the newer values while the
>> header retained the original ones.
>>
>> To fix this bug, prevent callers from updating these values after any
>
> Nit: it's not really fixing a bug, but protecting us against it. Not
> worth a reroll though, from my point of view.
>

That's right, I'll add that in.

>> diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
>> index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644
>> --- a/reftable/reftable-writer.h
>> +++ b/reftable/reftable-writer.h
>> @@ -124,17 +124,21 @@ int reftable_writer_new(struct reftable_writer **out,
>>  			int (*flush_func)(void *),
>>  			void *writer_arg, const struct reftable_write_options *opts);
>>
>> -/* Set the range of update indices for the records we will add. When writing a
>> -   table into a stack, the min should be at least
>> -   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
>> -
>> -   For transactional updates to a stack, typically min==max, and the
>> -   update_index can be obtained by inspeciting the stack. When converting an
>> -   existing ref database into a single reftable, this would be a range of
>> -   update-index timestamps.
>> +/*
>> + * Set the range of update indices for the records we will add. When writing a
>> + * table into a stack, the min should be at least
>> + * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
>> + *
>> + * For transactional updates to a stack, typically min==max, and the
>> + * update_index can be obtained by inspeciting the stack. When converting an
>> + * existing ref database into a single reftable, this would be a range of
>> + * update-index timestamps.
>> + *
>> + * The function should be called before adding any records to the writer. If not
>> + * it will fail with REFTABLE_API_ERROR.
>>   */
>
> Thanks for updating this. I think the reftable library is one of those
> code areas where it makes sense to sneak in a formatting fix every now
> and then because its coding style is quite alien to Git's own in some
> places. We could also do it all in one go, but I strongly doubt that it
> would be worth the churn.
>

Generally I try to sneak in small fixes like this around code being
touched. I know it takes a bit more of a toll on reviewers, but small
improvements do add up.

>> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> -				uint64_t max);
>> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> +			       uint64_t max);
>>
>>  /*
>>    Add a reftable_ref_record. The record should have names that come after
>
>> diff --git a/reftable/writer.c b/reftable/writer.c
>> index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644
>> --- a/reftable/writer.c
>> +++ b/reftable/writer.c
>> @@ -179,11 +179,20 @@ int reftable_writer_new(struct reftable_writer **out,
>>  	return 0;
>>  }
>>
>> -void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> -				uint64_t max)
>> +int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>> +			       uint64_t max)
>>  {
>> +	/*
>> +	 * The limits should be set before any records are added to the writer.
>> +	 * Check if any records were added by checking if `last_key` was set.
>> +	 */
>> +	if (w->last_key.len)
>> +		return REFTABLE_API_ERROR;
>
> Hm. Using the last key feels somewhat dangerous to me as it does get
> reset at times, e.g. when finishing writing the current section. It
> _should_ work, but overall it just feels a tad too disconnected from the
> thing that we actually want to check.
>
> How about we instead use `next`? This variable records the offset of the
> next block we're about to write, and `writer_flush_nonempty_block()`
> uses it directly to check whether we're currently writing the first
> block in order to decide whether it needs to write a header or not. If
> it's 0, we know that we haven't written the first block yet. That feels
> much closer aligned with what we're checking.
>

The last version did use `next`. I changed it because `next` is only
modified once the first block has been written, so checking it alone
would still allow the limits to be modified after the first few records
have been added.

That should be okay, since what we're concerned about is a header/footer
mismatch. But as a matter of principle, it makes sense to only allow
limit modification before _any_ records have been written.

I'm wondering whether we should check both:
`if (w->next || w->last_key.len)`. This way we capture all modifications.
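
As a rough sketch of how the combined condition would behave, the toy
model below tries to set the limits on a writer in each of the states
discussed in this thread. Everything named `toy_*` and `TOY_API_ERROR` is
hypothetical and made up for the example; only the condition itself is
taken from the proposal above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_API_ERROR (-1) /* placeholder value, not REFTABLE_API_ERROR */

/* Hypothetical model of the writer state relevant to the check. */
struct toy_writer {
	size_t last_key_len; /* stands in for w->last_key.len */
	uint64_t next;       /* offset of the next block; 0 before the first block */
	uint64_t min_update_index, max_update_index;
};

/* Setter using the combined condition proposed above. */
static int toy_set_limits(struct toy_writer *w, uint64_t min, uint64_t max)
{
	if (w->next || w->last_key_len)
		return TOY_API_ERROR;

	w->min_update_index = min;
	w->max_update_index = max;
	return 0;
}

/* Try to set the limits on a writer in the given state. */
static int toy_try(size_t last_key_len, uint64_t next)
{
	struct toy_writer w = { .last_key_len = last_key_len, .next = next };
	int ret = toy_set_limits(&w, 2, 3);

	if (ret < 0) /* a rejected call must leave the limits untouched */
		assert(w.min_update_index == 0 && w.max_update_index == 0);
	else
		assert(w.min_update_index == 2 && w.max_update_index == 3);

	return ret;
}
```

In this model, `toy_try(0, 0)` (fresh writer) succeeds, `toy_try(10, 0)`
(record added, no block written yet) is rejected via the last key, and
`toy_try(0, 4096)` (section finished, key reset) is rejected via `next`.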

>> diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
>> index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644
>> --- a/t/unit-tests/t-reftable-stack.c
>> +++ b/t/unit-tests/t-reftable-stack.c
>
> Can we maybe add a unit test that demonstrates the error?

Good suggestion, will add it!



> Patrick

Patch

diff --git a/refs/reftable-backend.c b/refs/reftable-backend.c
index 6814c87bc618229ac8a70b904be3f850371ad876..9cfb0cb26721a9425c3b4a374f7b41e192037315 100644
--- a/refs/reftable-backend.c
+++ b/refs/reftable-backend.c
@@ -1443,7 +1443,9 @@  static int write_transaction_table(struct reftable_writer *writer, void *cb_data
 	 * multiple entries. Each entry will contain a different update_index,
 	 * so set the limits accordingly.
 	 */
-	reftable_writer_set_limits(writer, ts, ts + arg->max_index);
+	ret = reftable_writer_set_limits(writer, ts, ts + arg->max_index);
+	if (ret < 0)
+		goto done;
 
 	for (i = 0; i < arg->updates_nr; i++) {
 		struct reftable_transaction_update *tx_update = &arg->updates[i];
@@ -1766,7 +1768,9 @@  static int write_copy_table(struct reftable_writer *writer, void *cb_data)
 	deletion_ts = creation_ts = reftable_stack_next_update_index(arg->be->stack);
 	if (arg->delete_old)
 		creation_ts++;
-	reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+	ret = reftable_writer_set_limits(writer, deletion_ts, creation_ts);
+	if (ret < 0)
+		goto done;
 
 	/*
 	 * Add the new reference. If this is a rename then we also delete the
@@ -2298,7 +2302,9 @@  static int write_reflog_existence_table(struct reftable_writer *writer,
 	if (ret <= 0)
 		goto done;
 
-	reftable_writer_set_limits(writer, ts, ts);
+	ret = reftable_writer_set_limits(writer, ts, ts);
+	if (ret < 0)
+		goto done;
 
 	/*
 	 * The existence entry has both old and new object ID set to the
@@ -2357,7 +2363,9 @@  static int write_reflog_delete_table(struct reftable_writer *writer, void *cb_da
 	uint64_t ts = reftable_stack_next_update_index(arg->stack);
 	int ret;
 
-	reftable_writer_set_limits(writer, ts, ts);
+	ret = reftable_writer_set_limits(writer, ts, ts);
+	if (ret < 0)
+		goto out;
 
 	ret = reftable_stack_init_log_iterator(arg->stack, &it);
 	if (ret < 0)
@@ -2434,7 +2442,9 @@  static int write_reflog_expiry_table(struct reftable_writer *writer, void *cb_da
 		if (arg->records[i].value_type == REFTABLE_LOG_UPDATE)
 			live_records++;
 
-	reftable_writer_set_limits(writer, ts, ts);
+	ret = reftable_writer_set_limits(writer, ts, ts);
+	if (ret < 0)
+		return ret;
 
 	if (!is_null_oid(&arg->update_oid)) {
 		struct reftable_ref_record ref = {0};
diff --git a/reftable/reftable-error.h b/reftable/reftable-error.h
index f4048265629fe456207b88620658193f770a84f0..a7e33d964d0cfe5546f588d26c0fcb66ab326828 100644
--- a/reftable/reftable-error.h
+++ b/reftable/reftable-error.h
@@ -30,6 +30,7 @@  enum reftable_error {
 
 	/* Misuse of the API:
 	 *  - on writing a record with NULL refname.
+	 *  - on writing a record before setting the writer limits.
 	 *  - on writing a reftable_ref_record outside the table limits
 	 *  - on writing a ref or log record before the stack's
 	 * next_update_inde*x
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 5f9afa620bb00de66c311765fb0ae8c6f56401ae..1ea014d389cc47f173279e3234a82f3fcbc807a0 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -124,17 +124,21 @@  int reftable_writer_new(struct reftable_writer **out,
 			int (*flush_func)(void *),
 			void *writer_arg, const struct reftable_write_options *opts);
 
-/* Set the range of update indices for the records we will add. When writing a
-   table into a stack, the min should be at least
-   reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
-
-   For transactional updates to a stack, typically min==max, and the
-   update_index can be obtained by inspeciting the stack. When converting an
-   existing ref database into a single reftable, this would be a range of
-   update-index timestamps.
+/*
+ * Set the range of update indices for the records we will add. When writing a
+ * table into a stack, the min should be at least
+ * reftable_stack_next_update_index(), or REFTABLE_API_ERROR is returned.
+ *
+ * For transactional updates to a stack, typically min==max, and the
+ * update_index can be obtained by inspeciting the stack. When converting an
+ * existing ref database into a single reftable, this would be a range of
+ * update-index timestamps.
+ *
+ * The function should be called before adding any records to the writer. If not
+ * it will fail with REFTABLE_API_ERROR.
  */
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
-				uint64_t max);
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+			       uint64_t max);
 
 /*
   Add a reftable_ref_record. The record should have names that come after
diff --git a/reftable/stack.c b/reftable/stack.c
index 531660a49f0948c33041831ee0d740feacb22b2f..9649dbbb04c51e106ee752f14481bbad381cb348 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -1058,8 +1058,10 @@  static int stack_write_compact(struct reftable_stack *st,
 
 	for (size_t i = first; i <= last; i++)
 		st->stats.bytes += st->readers[i]->size;
-	reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
-				   st->readers[last]->max_update_index);
+	err = reftable_writer_set_limits(wr, st->readers[first]->min_update_index,
+					 st->readers[last]->max_update_index);
+	if (err < 0)
+		goto done;
 
 	err = reftable_merged_table_new(&mt, st->readers + first, subtabs_len,
 					st->opts.hash_id);
diff --git a/reftable/writer.c b/reftable/writer.c
index 740c98038eaf883258bef4988f78977ac7e4a75a..03acbdbcce75fd51820c5fb016bd94f0f7f4914a 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -179,11 +179,20 @@  int reftable_writer_new(struct reftable_writer **out,
 	return 0;
 }
 
-void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
-				uint64_t max)
+int reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
+			       uint64_t max)
 {
+	/*
+	 * The limits should be set before any records are added to the writer.
+	 * Check if any records were added by checking if `last_key` was set.
+	 */
+	if (w->last_key.len)
+		return REFTABLE_API_ERROR;
+
 	w->min_update_index = min;
 	w->max_update_index = max;
+
+	return 0;
 }
 
 static void writer_release(struct reftable_writer *w)
diff --git a/t/unit-tests/t-reftable-stack.c b/t/unit-tests/t-reftable-stack.c
index aeec195b2b1014445d71c5db39a9795017fd8ff2..b23edf18a7d75b0c2292490ad06d4dfaaa571e79 100644
--- a/t/unit-tests/t-reftable-stack.c
+++ b/t/unit-tests/t-reftable-stack.c
@@ -103,7 +103,8 @@  static void t_read_file(void)
 static int write_test_ref(struct reftable_writer *wr, void *arg)
 {
 	struct reftable_ref_record *ref = arg;
-	reftable_writer_set_limits(wr, ref->update_index, ref->update_index);
+	check(!reftable_writer_set_limits(wr, ref->update_index,
+					  ref->update_index));
 	return reftable_writer_add_ref(wr, ref);
 }
 
@@ -143,7 +144,8 @@  static int write_test_log(struct reftable_writer *wr, void *arg)
 {
 	struct write_log_arg *wla = arg;
 
-	reftable_writer_set_limits(wr, wla->update_index, wla->update_index);
+	check(!reftable_writer_set_limits(wr, wla->update_index,
+					  wla->update_index));
 	return reftable_writer_add_log(wr, wla->log);
 }
 
@@ -961,7 +963,7 @@  static void t_reflog_expire(void)
 
 static int write_nothing(struct reftable_writer *wr, void *arg UNUSED)
 {
-	reftable_writer_set_limits(wr, 1, 1);
+	check(!reftable_writer_set_limits(wr, 1, 1));
 	return 0;
 }
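
The caller-side pattern used throughout the patch above (store the result,
bail out to a shared exit label on error) can be sketched in isolation as
follows. This is a self-contained illustration, not code from Git:
`struct toy_writer`, the `toy_*` functions, `TOY_API_ERROR`, and the
`cleaned_up` flag are all hypothetical, and the guard uses a simple record
counter rather than the writer's real state.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_API_ERROR (-1) /* placeholder value, not REFTABLE_API_ERROR */

/* Hypothetical writer that only tracks what this example needs. */
struct toy_writer {
	int records_added;
	uint64_t min, max;
};

/* Reject limit changes once a record has been added. */
static int toy_set_limits(struct toy_writer *w, uint64_t min, uint64_t max)
{
	if (w->records_added)
		return TOY_API_ERROR;
	w->min = min;
	w->max = max;
	return 0;
}

/* Accept a record only if its update index lies within the limits. */
static int toy_add_record(struct toy_writer *w, uint64_t update_index)
{
	if (update_index < w->min || update_index > w->max)
		return TOY_API_ERROR;
	w->records_added++;
	return 0;
}

/*
 * Write callback in the style of the patched callers: propagate the
 * error from the setter and release resources on a single exit path.
 */
static int toy_write_table(struct toy_writer *w, uint64_t ts, int *cleaned_up)
{
	char *scratch = malloc(16); /* some resource that must be released */
	int ret = TOY_API_ERROR;

	assert(w && cleaned_up);
	if (!scratch)
		goto done;

	ret = toy_set_limits(w, ts, ts);
	if (ret < 0)
		goto done;

	ret = toy_add_record(w, ts);

done:
	free(scratch);
	*cleaned_up = 1;
	return ret;
}
```

Calling `toy_write_table()` twice on the same writer shows the effect: the
first call succeeds, the second is rejected by the setter before any
record is added, and the cleanup path runs in both cases.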