mbox series

[0/3] reftable: graceful concurrent writes

Message ID cover.1726578382.git.ps@pks.im (mailing list archive)
Headers show
Series reftable: graceful concurrent writes | expand

Message

Patrick Steinhardt Sept. 17, 2024, 1:43 p.m. UTC
Hi,

the reftable backend cannot properly handle concurrent writes due to two
reasons:

  - It will bail out immediately when it sees a locked "tables.list"
    file. This is being addressed by introducing a configurable locking
    timeout, similar to how we have it for both loose and packed refs.

  - It will bail out when it notices that its stack is out-of-date after
    having locked the "tables.list" file. This is addressed by reloading
    the stack as requested after locking it, which is fine because our
    transactional API would verify queued ref updates against their
    expected state after the lock was acquired anyway.

So with this patch series we can now spawn concurrent writers and they
are expected to succeed, which is demonstrated by the test added by the
last patch.

Thanks!

Patrick

Patrick Steinhardt (3):
  refs/reftable: introduce "reftable.lockTimeout"
  reftable/stack: allow locking of outdated stacks
  refs/reftable: reload locked stack when preparing transaction

 Documentation/config/reftable.txt |  7 ++++
 refs/reftable-backend.c           |  9 ++++-
 reftable/reftable-stack.h         | 13 +++++-
 reftable/reftable-writer.h        |  8 ++++
 reftable/stack.c                  | 38 ++++++++++++------
 t/t0610-reftable-basics.sh        | 58 ++++++++++++++++++++++++++
 t/unit-tests/t-reftable-stack.c   | 67 ++++++++++++++++++++++++++++++-
 7 files changed, 181 insertions(+), 19 deletions(-)

Comments

karthik nayak Sept. 17, 2024, 6:26 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> the reftable backend cannot properly handle concurrent writes due to two
> reasons:
>
>   - It will bail out immediately when it sees a locked "tables.list"
>     file. This is being addressed by introducing a configurable locking
>     timeout, similar to how we have it for both loose and packed refs.
>
>   - It will bail out when it notices that its stack is out-of-date after
>     having locked the "tables.list" file. This is addressed by reloading
>     the stack as requested after locking it, which is fine because our
>     transactional API would verify queued ref updates against their
>     expected state after the lock was acquired anyway.
>
> So with this patch series we can now spawn concurrent writers and they
> are expected to succeed, which is demonstrated by the test added by the
> last patch.
>

I only had a comment in the first commit. The rest two look good to me.
I do wonder if we really need a flag? But that's just a nit.

Thanks
Patrick Steinhardt Sept. 18, 2024, 4:31 a.m. UTC | #2
On Tue, Sep 17, 2024 at 11:26:58AM -0700, karthik nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Hi,
> >
> > the reftable backend cannot properly handle concurrent writes due to two
> > reasons:
> >
> >   - It will bail out immediately when it sees a locked "tables.list"
> >     file. This is being addressed by introducing a configurable locking
> >     timeout, similar to how we have it for both loose and packed refs.
> >
> >   - It will bail out when it notices that its stack is out-of-date after
> >     having locked the "tables.list" file. This is addressed by reloading
> >     the stack as requested after locking it, which is fine because our
> >     transactional API would verify queued ref updates against their
> >     expected state after the lock was acquired anyway.
> >
> > So with this patch series we can now spawn concurrent writers and they
> > are expected to succeed, which is demonstrated by the test added by the
> > last patch.
> >
> 
> I only had a comment in the first commit. The rest two look good to me.
> I do wonder if we really need a flag? But that's just a nit.

I didn't carefully vet all locations where we could pass that flag, so
right now we only pass it in a single location. Also, we need to keep in
mind that this is library code: even if we had converted all callsites
to pass the new flag I still it's a sensibe thing to wish for to disable
the automatic refresh and abort instead.

Thanks for your review!

Patrick