diff mbox series

[3/3] reftable: prevent 'update_index' changes after header write

Message ID 20250117-461-corrupted-reftable-followup-v1-3-70ee605ae3fe@gmail.com (mailing list archive)
State Superseded
Headers show
Series refs: small followups to the migration corruption fix | expand

Commit Message

Karthik Nayak Jan. 17, 2025, 7:59 a.m. UTC
The function `reftable_writer_set_limits()` allows updating the
'min_update_index' and 'max_update_index' of a reftable writer. These
values are written to both the writer's header and footer.

Since the header is written during the first block write, any subsequent
changes to the update index would create a mismatch between the header
and footer values. The footer would contain the newer values while the
header retained the original ones. To fix this bug, we now prevent
callers from updating these values after the header has been written.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
---
 reftable/writer.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Patrick Steinhardt Jan. 17, 2025, 9:29 a.m. UTC | #1
On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote:
> diff --git a/reftable/writer.c b/reftable/writer.c
> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644
> --- a/reftable/writer.c
> +++ b/reftable/writer.c
> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out,
>  void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>  				uint64_t max)
>  {
> +	/*
> +	 * The limits shouldn't be modified post writing the first block, else
> +	 * it would cause a mismatch between the header and the footer.
> +	 */

Can we make this *even* stricter? I think that this is something that is
easy to do wrong, and the fact that it only triggers in some situations
of misuse may easily make tests miss this issue. So ideally, we should
assert that `set_limits()` is always called before queueing any records
to the writer. This would make us error out in all situations where the
calling order is wrong.

There are two ways I can see us doing that:

  - Detect any state written by `writer_add_record()` and error out if
    it's set when `reftable_writer_set_limits()` is called.

  - Adapt `reftable_writer_new()` so that it takes the update indices as
    input and drop `reftable_writer_set_limits()` altogether.

The latter might be preferable as you basically want to set limits in
all (most?) situations anyway.

> +	if (w->next)
> +		BUG("update index modified after writing first block");

Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It
requires a bit more plumbing because we'll also hvae to adapt all
callers to handle errors. But on the one hand we don't want to die in
library code. And on the other hand we don't want to keep on adding more
dependencies on "git-compat-util.h".

Patrick
Karthik Nayak Jan. 20, 2025, 11:47 a.m. UTC | #2
Patrick Steinhardt <ps@pks.im> writes:

> On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote:
>> diff --git a/reftable/writer.c b/reftable/writer.c
>> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644
>> --- a/reftable/writer.c
>> +++ b/reftable/writer.c
>> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out,
>>  void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>>  				uint64_t max)
>>  {
>> +	/*
>> +	 * The limits shouldn't be modified post writing the first block, else
>> +	 * it would cause a mismatch between the header and the footer.
>> +	 */
>
> Can we make this *even* stricter? I think that this is something that is
> easy to do wrong, and the fact that it only triggers in some situations
> of misuse may easily make tests miss this issue. So ideally, we should
> assert that `set_limits()` is always called before queueing any records
> to the writer. This would make us error out in all situations where the
> calling order is wrong.
>

I agree here, it makes sense to make this stricter. Like you mentioned,
currently they are independent. The only way to enforce the limits is to
ensure that they are dependent.

> There are two ways I can see us doing that:
>
>   - Detect any state written by `writer_add_record()` and error out if
>     it's set when `reftable_writer_set_limits()` is called.
>

Yeah I think this would be simple to do. I guess we can check
`w->last_key` is set, since any record write would modify that.

>   - Adapt `reftable_writer_new()` so that it takes the update indices as
>     input and drop `reftable_writer_set_limits()` altogether.
>

This one is a bit harder to do because of our flow. Generally the writer
is provided to callers via a callback function passed to
`reftable_addition_add()`. I guess I could simply pass the data:

  caller -> reftable_addition_add() -> reftable_writer_new()

Any direct users of `reftable_writer_new()` would simply pass the data
directly.

I'll play around and see if this is doable without too much refactoring
and have something in the next version.

> The latter might be preferable as you basically want to set limits in
> all (most?) situations anyway.
>
>> +	if (w->next)
>> +		BUG("update index modified after writing first block");
>
> Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It
> requires a bit more plumbing because we'll also hvae to adapt all
> callers to handle errors. But on the one hand we don't want to die in
> library code. And on the other hand we don't want to keep on adding more
> dependencies on "git-compat-util.h".
>

Fair enough, thanks for explaining.

> Patrick
Karthik Nayak Jan. 20, 2025, 12:18 p.m. UTC | #3
Karthik Nayak <karthik.188@gmail.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote:
>>> diff --git a/reftable/writer.c b/reftable/writer.c
>>> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644
>>> --- a/reftable/writer.c
>>> +++ b/reftable/writer.c
>>> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out,
>>>  void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
>>>  				uint64_t max)
>>>  {
>>> +	/*
>>> +	 * The limits shouldn't be modified post writing the first block, else
>>> +	 * it would cause a mismatch between the header and the footer.
>>> +	 */
>>
>> Can we make this *even* stricter? I think that this is something that is
>> easy to do wrong, and the fact that it only triggers in some situations
>> of misuse may easily make tests miss this issue. So ideally, we should
>> assert that `set_limits()` is always called before queueing any records
>> to the writer. This would make us error out in all situations where the
>> calling order is wrong.
>>
>
> I agree here, it makes sense to make this stricter. Like you mentioned,
> currently they are independent. The only way to enforce the limits is to
> ensure that they are dependent.
>
>> There are two ways I can see us doing that:
>>
>>   - Detect any state written by `writer_add_record()` and error out if
>>     it's set when `reftable_writer_set_limits()` is called.
>>
>
> Yeah I think this would be simple to do. I guess we can check
> `w->last_key` is set, since any record write would modify that.
>
>>   - Adapt `reftable_writer_new()` so that it takes the update indices as
>>     input and drop `reftable_writer_set_limits()` altogether.
>>
>
> This one is a bit harder to do because of our flow. Generally the writer
> is provided to callers via a callback function passed to
> `reftable_addition_add()`. I guess I could simply pass the data:
>
>   caller -> reftable_addition_add() -> reftable_writer_new()
>
> Any direct users of `reftable_writer_new()` would simply pass the data
> directly.
>
> I'll play around and see if this is doable without too much refactoring
> and have something in the next version.

It seems like `reftable_addition_add()` is internally and externally
called by a lot of functions and as such, modifying them all would be
tedious. Another idea is to modify the function pointer that
`reftable_addition_add()` receives so that the `write_table` argument
receives a function which would set the limits, but this seems like bad
design to me.

So the best/easiest way to do this is to error out if any state is set
before calling `reftable_writer_set_limits()`. I also realized that if
`reftable_writer_set_limits()` isn't called, the writer errors anyways.
So just ensuring that `reftable_writer_set_limits()` checks for any
state being set would work well.

[snip]
diff mbox series

Patch

diff --git a/reftable/writer.c b/reftable/writer.c
index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644
--- a/reftable/writer.c
+++ b/reftable/writer.c
@@ -182,6 +182,13 @@  int reftable_writer_new(struct reftable_writer **out,
 void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min,
 				uint64_t max)
 {
+	/*
+	 * The limits shouldn't be modified post writing the first block, else
+	 * it would cause a mismatch between the header and the footer.
+	 */
+	if (w->next)
+		BUG("update index modified after writing first block");
+
 	w->min_update_index = min;
 	w->max_update_index = max;
 }