[0/3] reftable: graceful concurrent writes

Message ID	cover.1726578382.git.ps@pks.im (mailing list archive)
Headers	show Received: from fout5-smtp.messagingengine.com (fout5-smtp.messagingengine.com [103.168.172.148]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 6FF6115C14D for <git@vger.kernel.org>; Tue, 17 Sep 2024 13:43:22 +0000 (UTC) Feedback-ID: i197146af:Fastmail Date: Tue, 17 Sep 2024 15:43:17 +0200 From: Patrick Steinhardt <ps@pks.im> To: git@vger.kernel.org Subject: [PATCH 0/3] reftable: graceful concurrent writes Message-ID: <cover.1726578382.git.ps@pks.im> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline
Series	reftable: graceful concurrent writes \| expand [0/3] reftable: graceful concurrent writes [1/3] refs/reftable: introduce "reftable.lockTimeout" [2/3] reftable/stack: allow locking of outdated stacks [3/3] refs/reftable: reload locked stack when preparing transaction

Message ID

cover.1726578382.git.ps@pks.im (mailing list archive)

Headers

Feedback-ID: i197146af:Fastmail
Date: Tue, 17 Sep 2024 15:43:17 +0200
From: Patrick Steinhardt <ps@pks.im>
To: git@vger.kernel.org
Subject: [PATCH 0/3] reftable: graceful concurrent writes
Message-ID: <cover.1726578382.git.ps@pks.im>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

Series

reftable: graceful concurrent writes | expand

Message

Patrick Steinhardt Sept. 17, 2024, 1:43 p.m. UTC

Hi,

the reftable backend cannot properly handle concurrent writes due to two
reasons:

  - It will bail out immediately when it sees a locked "tables.list"
    file. This is being addressed by introducing a configurable locking
    timeout, similar to how we have it for both loose and packed refs.

  - It will bail out when it notices that its stack is out-of-date after
    having locked the "tables.list" file. This is addressed by reloading
    the stack as requested after locking it, which is fine because our
    transactional API would verify queued ref updates against their
    expected state after the lock was acquired anyway.

So with this patch series we can now spawn concurrent writers and they
are expected to succeed, which is demonstrated by the test added by the
last patch.

Thanks!

Patrick

Patrick Steinhardt (3):
  refs/reftable: introduce "reftable.lockTimeout"
  reftable/stack: allow locking of outdated stacks
  refs/reftable: reload locked stack when preparing transaction

 Documentation/config/reftable.txt |  7 ++++
 refs/reftable-backend.c           |  9 ++++-
 reftable/reftable-stack.h         | 13 +++++-
 reftable/reftable-writer.h        |  8 ++++
 reftable/stack.c                  | 38 ++++++++++++------
 t/t0610-reftable-basics.sh        | 58 ++++++++++++++++++++++++++
 t/unit-tests/t-reftable-stack.c   | 67 ++++++++++++++++++++++++++++++-
 7 files changed, 181 insertions(+), 19 deletions(-)

Comments

Karthik Nayak Sept. 17, 2024, 6:26 p.m. UTC | #1

Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> the reftable backend cannot properly handle concurrent writes due to two
> reasons:
>
>   - It will bail out immediately when it sees a locked "tables.list"
>     file. This is being addressed by introducing a configurable locking
>     timeout, similar to how we have it for both loose and packed refs.
>
>   - It will bail out when it notices that its stack is out-of-date after
>     having locked the "tables.list" file. This is addressed by reloading
>     the stack as requested after locking it, which is fine because our
>     transactional API would verify queued ref updates against their
>     expected state after the lock was acquired anyway.
>
> So with this patch series we can now spawn concurrent writers and they
> are expected to succeed, which is demonstrated by the test added by the
> last patch.
>

I only had a comment in the first commit. The rest two look good to me.
I do wonder if we really need a flag? But that's just a nit.

Thanks

Patrick Steinhardt Sept. 18, 2024, 4:31 a.m. UTC | #2

On Tue, Sep 17, 2024 at 11:26:58AM -0700, karthik nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > Hi,
> >
> > the reftable backend cannot properly handle concurrent writes due to two
> > reasons:
> >
> >   - It will bail out immediately when it sees a locked "tables.list"
> >     file. This is being addressed by introducing a configurable locking
> >     timeout, similar to how we have it for both loose and packed refs.
> >
> >   - It will bail out when it notices that its stack is out-of-date after
> >     having locked the "tables.list" file. This is addressed by reloading
> >     the stack as requested after locking it, which is fine because our
> >     transactional API would verify queued ref updates against their
> >     expected state after the lock was acquired anyway.
> >
> > So with this patch series we can now spawn concurrent writers and they
> > are expected to succeed, which is demonstrated by the test added by the
> > last patch.
> >
> 
> I only had a comment in the first commit. The rest two look good to me.
> I do wonder if we really need a flag? But that's just a nit.

I didn't carefully vet all locations where we could pass that flag, so
right now we only pass it in a single location. Also, we need to keep in
mind that this is library code: even if we had converted all callsites
to pass the new flag I still it's a sensibe thing to wish for to disable
the automatic refresh and abort instead.

Thanks for your review!

Patrick