Message ID | 20250117-461-corrupted-reftable-followup-v1-3-70ee605ae3fe@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | refs: small followups to the migration corruption fix | expand |
On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote: > diff --git a/reftable/writer.c b/reftable/writer.c > index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 > --- a/reftable/writer.c > +++ b/reftable/writer.c > @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, > void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, > uint64_t max) > { > + /* > + * The limits shouldn't be modified post writing the first block, else > + * it would cause a mismatch between the header and the footer. > + */ Can we make this *even* stricter? I think that this is something that is easy to do wrong, and the fact that it only triggers in some situations of misuse may easily make tests miss this issue. So ideally, we should assert that `set_limits()` is always called before queueing any records to the writer. This would make us error out in all situations where the calling order is wrong. There are two ways I can see us doing that: - Detect any state written by `writer_add_record()` and error out if it's set when `reftable_writer_set_limits()` is called. - Adapt `reftable_writer_new()` so that it takes the update indices as input and drop `reftable_writer_set_limits()` altogether. The latter might be preferable as you basically want to set limits in all (most?) situations anyway. > + if (w->next) > + BUG("update index modified after writing first block"); Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It requires a bit more plumbing because we'll also hvae to adapt all callers to handle errors. But on the one hand we don't want to die in library code. And on the other hand we don't want to keep on adding more dependencies on "git-compat-util.h". Patrick
Patrick Steinhardt <ps@pks.im> writes: > On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote: >> diff --git a/reftable/writer.c b/reftable/writer.c >> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 >> --- a/reftable/writer.c >> +++ b/reftable/writer.c >> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, >> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >> uint64_t max) >> { >> + /* >> + * The limits shouldn't be modified post writing the first block, else >> + * it would cause a mismatch between the header and the footer. >> + */ > > Can we make this *even* stricter? I think that this is something that is > easy to do wrong, and the fact that it only triggers in some situations > of misuse may easily make tests miss this issue. So ideally, we should > assert that `set_limits()` is always called before queueing any records > to the writer. This would make us error out in all situations where the > calling order is wrong. > I agree here, it makes sense to make this stricter. Like you mentioned, currently they are independent. The only way to enforce the limits is to ensure that they are dependent. > There are two ways I can see us doing that: > > - Detect any state written by `writer_add_record()` and error out if > it's set when `reftable_writer_set_limits()` is called. > Yeah I think this would be simple to do. I guess we can check `w->last_key` is set, since any record write would modify that. > - Adapt `reftable_writer_new()` so that it takes the update indices as > input and drop `reftable_writer_set_limits()` altogether. > This one is a bit harder to do because of our flow. Generally the writer is provided to callers via a callback function passed to `reftable_addition_add()`. I guess I could simply pass the data: caller -> reftable_addition_add() -> reftable_writer_new() Any direct users of `reftable_writer_new()` would simply pass the data directly. I'll play around and see if this is doable without too much refactoring and have something in the next version. > The latter might be preferable as you basically want to set limits in > all (most?) situations anyway. > >> + if (w->next) >> + BUG("update index modified after writing first block"); > > Let's not use BUG, but rather return a `REFTABLE_API_ERROR` error. It > requires a bit more plumbing because we'll also hvae to adapt all > callers to handle errors. But on the one hand we don't want to die in > library code. And on the other hand we don't want to keep on adding more > dependencies on "git-compat-util.h". > Fair enough, thanks for explaining. > Patrick
Karthik Nayak <karthik.188@gmail.com> writes: > Patrick Steinhardt <ps@pks.im> writes: > >> On Fri, Jan 17, 2025 at 08:59:14AM +0100, Karthik Nayak wrote: >>> diff --git a/reftable/writer.c b/reftable/writer.c >>> index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 >>> --- a/reftable/writer.c >>> +++ b/reftable/writer.c >>> @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, >>> void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, >>> uint64_t max) >>> { >>> + /* >>> + * The limits shouldn't be modified post writing the first block, else >>> + * it would cause a mismatch between the header and the footer. >>> + */ >> >> Can we make this *even* stricter? I think that this is something that is >> easy to do wrong, and the fact that it only triggers in some situations >> of misuse may easily make tests miss this issue. So ideally, we should >> assert that `set_limits()` is always called before queueing any records >> to the writer. This would make us error out in all situations where the >> calling order is wrong. >> > > I agree here, it makes sense to make this stricter. Like you mentioned, > currently they are independent. The only way to enforce the limits is to > ensure that they are dependent. > >> There are two ways I can see us doing that: >> >> - Detect any state written by `writer_add_record()` and error out if >> it's set when `reftable_writer_set_limits()` is called. >> > > Yeah I think this would be simple to do. I guess we can check > `w->last_key` is set, since any record write would modify that. > >> - Adapt `reftable_writer_new()` so that it takes the update indices as >> input and drop `reftable_writer_set_limits()` altogether. >> > > This one is a bit harder to do because of our flow. Generally the writer > is provided to callers via a callback function passed to > `reftable_addition_add()`. I guess I could simply pass the data: > > caller -> reftable_addition_add() -> reftable_writer_new() > > Any direct users of `reftable_writer_new()` would simply pass the data > directly. > > I'll play around and see if this is doable without too much refactoring > and have something in the next version. It seems like `reftable_addition_add()` is internally and externally called by a lot of functions and as such, modifying them all would be tedious. Another idea is to modify the function pointer that `reftable_addition_add()` receives so that the `write_table` argument receives a function which would set the limits, but this seems like bad design to me. So the best/easiest way to do this is to error out if any state is set before calling `reftable_writer_set_limits()`. I also realized that if `reftable_writer_set_limits()` isn't called, the writer errors anyways. So just ensuring that `reftable_writer_set_limits()` checks for any state being set would work well. [snip]
diff --git a/reftable/writer.c b/reftable/writer.c index 740c98038eaf883258bef4988f78977ac7e4a75a..c602b873543790e36178f797ed9f98112671f97f 100644 --- a/reftable/writer.c +++ b/reftable/writer.c @@ -182,6 +182,13 @@ int reftable_writer_new(struct reftable_writer **out, void reftable_writer_set_limits(struct reftable_writer *w, uint64_t min, uint64_t max) { + /* + * The limits shouldn't be modified post writing the first block, else + * it would cause a mismatch between the header and the footer. + */ + if (w->next) + BUG("update index modified after writing first block"); + w->min_update_index = min; w->max_update_index = max; }
The function `reftable_writer_set_limits()` allows updating the 'min_update_index' and 'max_update_index' of a reftable writer. These values are written to both the writer's header and footer. Since the header is written during the first block write, any subsequent changes to the update index would create a mismatch between the header and footer values. The footer would contain the newer values while the header retained the original ones. To fix this bug, we now prevent callers from updating these values after the header has been written. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- reftable/writer.c | 7 +++++++ 1 file changed, 7 insertions(+)