Message ID: 20241125-pks-reftable-backend-reuse-iter-v3-0-1d7b658e3e9e@pks.im
Series: refs/reftable: reuse iterators when reading refs
Hi,

On Mon, Nov 25, 2024 at 8:38 AM Patrick Steinhardt <ps@pks.im> wrote:
>
> Hi,
>
> this is the second version of my patch series that refactors the

Do you mean that despite the "v3" mark, it's only the second version
because "v2" was just rebasing "v1" on top of a better base?

> reftable backend to reuse iterators when reading random references. This
> removes the overhead of having to recreate the iterator on every read
> and thus leads to better performance and less allocation churn.
>
> Changes in v3:
>
> - Adapt some comments to refer to the "backend" instead of to the
>   "stack".
> - Fix indentation of a statement while at it.
> - Explain why callsites don't want to reload the stack.
> - Optimize `prepare_transaction_update()` by not using `backend_for()`
>   twice, but instead reload the stack manually.
> - Split out the change that adds `reftable_stack_hash_id()` into a
>   separate commit.
> - Link to v2: https://lore.kernel.org/r/cover.1730792627.git.ps@pks.im

Thanks!
On Mon, Nov 25, 2024 at 10:47:57AM +0100, Christian Couder wrote:
> Hi,
>
> On Mon, Nov 25, 2024 at 8:38 AM Patrick Steinhardt <ps@pks.im> wrote:
> >
> > Hi,
> >
> > this is the second version of my patch series that refactors the
>
> Do you mean that despite the "v3" mark, it's only the second version
> because "v2" was just rebasing "v1" on top of a better base?

Nah, I simply forgot to update this introduction :)

Patrick
Hi,

this is the second version of my patch series that refactors the
reftable backend to reuse iterators when reading random references. This
removes the overhead of having to recreate the iterator on every read
and thus leads to better performance and less allocation churn.

Changes in v3:

- Adapt some comments to refer to the "backend" instead of to the
  "stack".
- Fix indentation of a statement while at it.
- Explain why callsites don't want to reload the stack.
- Optimize `prepare_transaction_update()` by not using `backend_for()`
  twice, but instead reload the stack manually.
- Split out the change that adds `reftable_stack_hash_id()` into a
  separate commit.
- Link to v2: https://lore.kernel.org/r/cover.1730792627.git.ps@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (9):
      refs/reftable: encapsulate reftable stack
      refs/reftable: handle reloading stacks in the reftable backend
      reftable/stack: add accessor for the hash ID
      refs/reftable: read references via `struct reftable_backend`
      refs/reftable: refactor reading symbolic refs to use reftable backend
      refs/reftable: refactor reflog expiry to use reftable backend
      reftable/stack: add mechanism to notify callers on reload
      reftable/merged: drain priority queue on reseek
      refs/reftable: reuse iterators when reading refs

 refs/reftable-backend.c          | 409 +++++++++++++++++++++++++--------------
 reftable/merged.c                |   2 +
 reftable/reftable-stack.h        |   3 +
 reftable/reftable-writer.h       |   9 +
 reftable/stack.c                 |   9 +
 t/unit-tests/t-reftable-merged.c |  73 +++++++
 6 files changed, 357 insertions(+), 148 deletions(-)

Range-diff versus v2:

1:  9854214fe9 ! 1:  21071ae5a5 refs/reftable: encapsulate reftable stack
    @@ refs/reftable-backend.c
     
      struct reftable_ref_store {
      	struct ref_store base;
    -@@ refs/reftable-backend.c: struct reftable_ref_store {
    - 	 * The main stack refers to the common dir and thus contains common
    + 	/*
    +-	 * The main stack refers to the common dir and thus contains common
    ++	 * The main backend refers to the common dir and thus contains common
      	 * refs as well as refs of the main repository.
      	 */
     -	struct reftable_stack *main_stack;
     +	struct reftable_backend main_backend;
      	/*
    - 	 * The worktree stack refers to the gitdir in case the refdb is opened
    +-	 * The worktree stack refers to the gitdir in case the refdb is opened
    ++	 * The worktree backend refers to the gitdir in case the refdb is opened
      	 * via a worktree. It thus contains the per-worktree refs.
      	 */
     -	struct reftable_stack *worktree_stack;
     +	struct reftable_backend worktree_backend;
      	/*
    - 	 * Map of worktree stacks by their respective worktree names. The map
    +-	 * Map of worktree stacks by their respective worktree names. The map
    ++	 * Map of worktree backends by their respective worktree names. The map
      	 * is populated lazily when we try to resolve `worktrees/$worktree` refs.
      	 */
     -	struct strmap worktree_stacks;
    @@ refs/reftable-backend.c: static struct ref_iterator *reftable_be_iterator_begin(
      	 * iterator, only.
      	 */
     -	if (!refs->worktree_stack)
     +	if (!refs->worktree_backend.stack)
      		return &main_iter->base;
     
      	/*
2:  18265dfafc ! 2:  a9588125c7 refs/reftable: handle reloading stacks in the reftable backend
    @@ Commit message
         But second this makes the logic to access stacks more self-contained by
         letting the `struct reftable_backend` manage themselves.
     
    +    Update callsites where we don't reload the stack to document why we
    +    don't. In some cases it's unclear whether it is the right thing to do in
    +    the first place, but fixing that is outside of the scope of this patch
    +    series.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
    
     ## refs/reftable-backend.c ##
    @@ refs/reftable-backend.c: static int prepare_transaction_update(struct write_tran
      	size_t i;
      	int ret;
     
    ++	/*
    ++	 * This function gets called in a loop, and we don't want to repeatedly
    ++	 * reload the stack for every single ref update. Instead, we manually
    ++	 * reload further down in the case where we haven't yet prepared the
    ++	 * specific `reftable_backend`.
    ++	 */
     +	ret = backend_for(&be, refs, update->refname, NULL, 0);
     +	if (ret)
     +		return ret;
    @@ refs/reftable-backend.c: static int prepare_transaction_update(struct write_tran
      		struct reftable_addition *addition;
     
     -		ret = reftable_stack_reload(stack);
    -+		ret = backend_for(&be, refs, update->refname, NULL, 1);
    ++		ret = reftable_stack_reload(be->stack);
      		if (ret)
      			return ret;
    @@ refs/reftable-backend.c: static int reftable_be_transaction_prepare(struct ref_s
      	}
     
     -	ret = read_ref_without_reload(refs, backend_for(refs, "HEAD", NULL)->stack, "HEAD",
    ++	/*
    ++	 * TODO: it's dubious whether we should reload the stack that "HEAD"
    ++	 * belongs to or not. In theory, it may happen that we only modify
    ++	 * stacks which are _not_ part of the "HEAD" stack. In that case we
    ++	 * wouldn't have prepared any transaction for its stack and would not
    ++	 * have reloaded it, which may mean that it is stale.
    ++	 *
    ++	 * On the other hand, reloading that stack without locking it feels
    ++	 * wrong to, as the value of "HEAD" could be modified concurrently at
    ++	 * any point in time.
    ++	 */
     +	ret = backend_for(&be, refs, "HEAD", NULL, 0);
     +	if (ret)
     +		goto done;
    @@ refs/reftable-backend.c: static int reftable_be_transaction_prepare(struct ref_s
      		const char *rewritten_ref;
     
     -		stack = backend_for(refs, u->refname, &rewritten_ref)->stack;
    ++		/*
    ++		 * There is no need to reload the respective backends here as
    ++		 * we have already reloaded them when preparing the transaction
    ++		 * update. And given that the stacks have been locked there
    ++		 * shouldn't have been any concurrent modifications of the
    ++		 * stack.
    ++		 */
     +		ret = backend_for(&be, refs, u->refname, &rewritten_ref, 0);
     +		if (ret)
     +			goto done;
    @@ refs/reftable-backend.c: static int reftable_be_for_each_reflog_ent_reverse(stru
      		return refs->err;
     
     -	ret = reftable_stack_init_log_iterator(stack, &it);
    ++	/*
    ++	 * TODO: we should adapt this callsite to reload the stack. There is no
    ++	 * obvious reason why we shouldn't.
    ++	 */
     +	ret = backend_for(&be, refs, refname, &refname, 0);
     +	if (ret)
     +		goto done;
    @@ refs/reftable-backend.c: static int reftable_be_for_each_reflog_ent(struct ref_s
      		return refs->err;
     
     -	ret = reftable_stack_init_log_iterator(stack, &it);
    ++	/*
    ++	 * TODO: we should adapt this callsite to reload the stack. There is no
    ++	 * obvious reason why we shouldn't.
    ++	 */
     +	ret = backend_for(&be, refs, refname, &refname, 0);
     +	if (ret)
     +		goto done;
    @@ refs/reftable-backend.c: static int reftable_be_reflog_expire(struct ref_store *
      	arg.records = rewritten;
      	arg.len = logs_nr;
      -	arg.stack = stack,
    -+	arg.stack = be->stack,
    - 	arg.refname = refname,
    +-	arg.refname = refname,
    ++	arg.stack = be->stack;
    ++	arg.refname = refname;
     
      	ret = reftable_addition_add(add, &write_reflog_expiry_table, &arg);
      	if (ret < 0)
-:  ---------- > 3:  76f7ff40d2 reftable/stack: add accessor for the hash ID
3:  c33093e73a ! 4:  19da5f587c refs/reftable: read references via `struct reftable_backend`
    @@ refs/reftable-backend.c: static int reftable_be_copy_ref(struct ref_store *ref_s
     
      done:
      	assert(ret != REFTABLE_API_ERROR);
    -
    - ## reftable/reftable-stack.h ##
    -@@ reftable/reftable-stack.h: struct reftable_compaction_stats {
    - struct reftable_compaction_stats *
    - reftable_stack_compaction_stats(struct reftable_stack *st);
    -
    -+/* Return the hash of the stack. */
    -+enum reftable_hash reftable_stack_hash_id(struct reftable_stack *st);
    -+
    - #endif
    -
    - ## reftable/stack.c ##
    -@@ reftable/stack.c: int reftable_stack_clean(struct reftable_stack *st)
    - 	reftable_addition_destroy(add);
    - 	return err;
    - }
    -+
    -+enum reftable_hash reftable_stack_hash_id(struct reftable_stack *st)
    -+{
    -+	return reftable_merged_table_hash_id(st->merged);
    -+}
4:  8489e32d87 = 5:  ff4f02dda7 refs/reftable: refactor reading symbolic refs to use reftable backend
5:  b1afd63785 = 6:  ed8963a520 refs/reftable: refactor reflog expiry to use reftable backend
6:  1e754ccde8 = 7:  5b91ca48e3 reftable/stack: add mechanism to notify callers on reload
7:  6755cf9ec9 = 8:  613f794fe6 reftable/merged: drain priority queue on reseek
8:  e3b29b2035 = 9:  a44911e4a4 refs/reftable: reuse iterators when reading refs

---
base-commit: 455ddbf8c6a694968c1089fb6c7ffb1d31d97e9d
change-id: 20241125-pks-reftable-backend-reuse-iter-3a2e92428789
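
For readers skimming the archive, the idea at the heart of the series can be
summarized apart from the diffs: each backend caches one iterator and re-seeks
it for every random lookup instead of allocating a fresh one, and it
invalidates that cache whenever the underlying stack is reloaded. The toy C
program below is a minimal sketch of that pattern, not code from the series:
every name in it (`toy_stack`, `toy_iter`, `toy_backend`, `backend_iter`, the
`generation` counter) is hypothetical and merely stands in for the reftable
types that the real implementation in refs/reftable-backend.c uses.

/*
 * Toy model of the iterator-reuse pattern; all names are hypothetical
 * stand-ins for the reftable types used by the actual series.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct toy_stack {
	const char **refs;      /* sorted ref names */
	size_t nr;
	unsigned generation;    /* bumped whenever the stack is reloaded */
};

struct toy_iter {
	struct toy_stack *stack;
	size_t pos;
};

struct toy_backend {
	struct toy_stack *stack;
	struct toy_iter *iter;     /* cached and reused across lookups */
	unsigned iter_generation;  /* stack generation the iterator was created at */
	unsigned allocations;      /* counts iterator allocations for the demo */
};

/* Position the iterator on the first ref >= refname (naive linear seek). */
static void toy_iter_seek(struct toy_iter *it, const char *refname)
{
	it->pos = 0;
	while (it->pos < it->stack->nr &&
	       strcmp(it->stack->refs[it->pos], refname) < 0)
		it->pos++;
}

/*
 * Hand out a seeked iterator. A new one is allocated only when none is
 * cached yet or when the stack was reloaded since the cached one was
 * created; otherwise the old iterator is simply re-seeked.
 */
static struct toy_iter *backend_iter(struct toy_backend *be, const char *refname)
{
	if (!be->iter || be->iter_generation != be->stack->generation) {
		free(be->iter);
		be->iter = malloc(sizeof(*be->iter));
		if (!be->iter)
			return NULL;
		be->iter->stack = be->stack;
		be->iter_generation = be->stack->generation;
		be->allocations++;
	}
	toy_iter_seek(be->iter, refname);
	return be->iter;
}

int main(void)
{
	const char *names[] = {
		"refs/heads/main", "refs/heads/next", "refs/tags/v1.0",
	};
	struct toy_stack stack = { names, 3, 0 };
	struct toy_backend be = { &stack, NULL, 0, 0 };

	/* A thousand random reads cost a single iterator allocation. */
	for (int i = 0; i < 1000; i++)
		backend_iter(&be, names[i % 3]);
	printf("reads: 1000, iterator allocations: %u\n", be.allocations);

	/* A reload invalidates the cache, forcing exactly one new allocation. */
	stack.generation++;
	backend_iter(&be, "refs/heads/main");
	printf("after reload: %u allocations\n", be.allocations);

	free(be.iter);
	return 0;
}

In the actual series the invalidation is driven by the notify-on-reload
mechanism added in patch 7, and re-seeking an existing merged iterator is made
safe by the drain-on-reseek change in patch 8; the toy generation counter
above only models that bookkeeping.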