mbox series

[v4,00/10] refs/reftable: reuse iterators when reading refs

Message ID 20241126-pks-reftable-backend-reuse-iter-v4-0-b17fd27df126@pks.im (mailing list archive)
Headers show
Series refs/reftable: reuse iterators when reading refs | expand

Message

Patrick Steinhardt Nov. 26, 2024, 6:42 a.m. UTC
Hi,

this patch series refactors the reftable backend to reuse reftable
iterators when reading random references. This removes the overhead of
having to recreate the iterator on every read and thus leads to better
performance and less allocation churn. It also gives us the ability to
further optimize reads by optimizing re-seeking iterators in the future.

Overall this leads to a 7% speedup when creating many refs in a
transaction, which performs many random reads. But this change also
positively impacts other usecases.

Changes in v3:

  - Adapt some comments to refer to the "backend" instead of to the
    "stack".
  - Fix indentation of a statement while at it.
  - Explain why callsites don't want to reload the stack.
  - Optimize `prepare_transaction_update()` by not using `backend_for()`
    twice, but instead reload the stack manually.
  - Split out the change that adds `reftable_stack_hash_id()` into a
    separate commit.
  - Link to v2: https://lore.kernel.org/r/cover.1730792627.git.ps@pks.im

Changes in v4:

  - Split up the introduction of `reftable_backend_read_ref()` into two
    commits: one that gets rid of the `struct reftable_ref_store`
    parameter and one that converts the function to accept a `struct
    reftable_backend`.
  - Fix a comment typo.
  - Link to v3: https://lore.kernel.org/r/20241125-pks-reftable-backend-reuse-iter-v3-0-1d7b658e3e9e@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (10):
      refs/reftable: encapsulate reftable stack
      refs/reftable: handle reloading stacks in the reftable backend
      reftable/stack: add accessor for the hash ID
      refs/reftable: figure out hash via `reftable_stack`
      refs/reftable: read references via `struct reftable_backend`
      refs/reftable: refactor reading symbolic refs to use reftable backend
      refs/reftable: refactor reflog expiry to use reftable backend
      reftable/stack: add mechanism to notify callers on reload
      reftable/merged: drain priority queue on reseek
      refs/reftable: reuse iterators when reading refs

 refs/reftable-backend.c          | 409 +++++++++++++++++++++++++--------------
 reftable/merged.c                |   2 +
 reftable/reftable-stack.h        |   3 +
 reftable/reftable-writer.h       |   9 +
 reftable/stack.c                 |   9 +
 t/unit-tests/t-reftable-merged.c |  73 +++++++
 6 files changed, 357 insertions(+), 148 deletions(-)

Range-diff versus v3:

 1:  3db0ba3eb5 =  1:  ec0b7e35c8 refs/reftable: encapsulate reftable stack
 2:  556eb8301c !  2:  437304908b refs/reftable: handle reloading stacks in the reftable backend
    @@ refs/reftable-backend.c: static int reftable_be_transaction_prepare(struct ref_s
     +	 * have reloaded it, which may mean that it is stale.
     +	 *
     +	 * On the other hand, reloading that stack without locking it feels
    -+	 * wrong to, as the value of "HEAD" could be modified concurrently at
    ++	 * wrong, too, as the value of "HEAD" could be modified concurrently at
     +	 * any point in time.
     +	 */
     +	ret = backend_for(&be, refs, "HEAD", NULL, 0);
 3:  e966b9acf2 =  3:  0bd9aea884 reftable/stack: add accessor for the hash ID
 -:  ---------- >  4:  d8b801503f refs/reftable: figure out hash via `reftable_stack`
 4:  de9f52a5b2 !  5:  a57210ac46 refs/reftable: read references via `struct reftable_backend`
    @@ Metadata
      ## Commit message ##
         refs/reftable: read references via `struct reftable_backend`
     
    -    Refactor `read_ref_without_reload()` to accept a `struct reftable_stack`
    -    as input instead of accepting a `struct reftable_stack`. This allows us
    -    to implement an additional caching layer when reading refs where we can
    -    reuse reftable iterators.
    +    Refactor `read_ref_without_reload()` to accept `struct reftable_backend`
    +    as parameter instead of `struct reftable_stack`. Rename the function to
    +    `reftable_backend_read_ref()` to clarify its scope and move it close to
    +    other functions operating on `struct reftable_backend`.
    +
    +    This change allows us to implement an additional caching layer when
    +    reading refs where we can reuse reftable iterators.
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    @@ refs/reftable-backend.c: static void fill_reftable_log_record(struct reftable_lo
      	log->value.update.tz_offset = sign * atoi(tz_begin);
      }
      
    --static int read_ref_without_reload(struct reftable_ref_store *refs,
    --				   struct reftable_stack *stack,
    +-static int read_ref_without_reload(struct reftable_stack *stack,
     -				   const char *refname,
     -				   struct object_id *oid,
     -				   struct strbuf *referent,
    @@ refs/reftable-backend.c: static void fill_reftable_log_record(struct reftable_lo
     -		strbuf_addstr(referent, ref.value.symref);
     -		*type |= REF_ISSYMREF;
     -	} else if (reftable_ref_record_val1(&ref)) {
    +-		unsigned int hash_id;
    +-
    +-		switch (reftable_stack_hash_id(stack)) {
    +-		case REFTABLE_HASH_SHA1:
    +-			hash_id = GIT_HASH_SHA1;
    +-			break;
    +-		case REFTABLE_HASH_SHA256:
    +-			hash_id = GIT_HASH_SHA256;
    +-			break;
    +-		default:
    +-			BUG("unhandled hash ID %d", reftable_stack_hash_id(stack));
    +-		}
    +-
     -		oidread(oid, reftable_ref_record_val1(&ref),
    --			refs->base.repo->hash_algo);
    +-			&hash_algos[hash_id]);
     -	} else {
     -		/* We got a tombstone, which should not happen. */
     -		BUG("unhandled reference value type %d", ref.value_type);
    @@ refs/reftable-backend.c: static int reftable_be_read_raw_ref(struct ref_store *r
      	if (ret)
      		return ret;
      
    --	ret = read_ref_without_reload(refs, be->stack, refname, oid, referent, type);
    +-	ret = read_ref_without_reload(be->stack, refname, oid, referent, type);
     +	ret = reftable_backend_read_ref(be, refname, oid, referent, type);
      	if (ret < 0)
      		return ret;
    @@ refs/reftable-backend.c: static int reftable_be_transaction_prepare(struct ref_s
      	if (ret)
      		goto done;
      
    --	ret = read_ref_without_reload(refs, be->stack, "HEAD",
    +-	ret = read_ref_without_reload(be->stack, "HEAD",
     -				      &head_oid, &head_referent, &head_type);
     +	ret = reftable_backend_read_ref(be, "HEAD", &head_oid,
     +					&head_referent, &head_type);
    @@ refs/reftable-backend.c: static int reftable_be_transaction_prepare(struct ref_s
      			string_list_insert(&affected_refnames, new_update->refname);
      		}
      
    --		ret = read_ref_without_reload(refs, be->stack, rewritten_ref,
    +-		ret = read_ref_without_reload(be->stack, rewritten_ref,
     -					      &current_oid, &referent, &u->type);
     +		ret = reftable_backend_read_ref(be, rewritten_ref,
     +						&current_oid, &referent, &u->type);
    @@ refs/reftable-backend.c: static int write_copy_table(struct reftable_writer *wri
      		memcpy(logs[logs_nr].value.update.old_hash, old_ref.value.val1, GIT_MAX_RAWSZ);
      		logs_nr++;
      
    --		ret = read_ref_without_reload(arg->refs, arg->stack, "HEAD", &head_oid,
    +-		ret = read_ref_without_reload(arg->stack, "HEAD", &head_oid,
     -					      &head_referent, &head_type);
     +		ret = reftable_backend_read_ref(arg->be, "HEAD", &head_oid,
     +						&head_referent, &head_type);
 5:  6e7db8e3fc =  6:  432f75ef01 refs/reftable: refactor reading symbolic refs to use reftable backend
 6:  687afb781a =  7:  1ca28e00b1 refs/reftable: refactor reflog expiry to use reftable backend
 7:  e1d4bdc3e8 =  8:  a95b67f1f3 reftable/stack: add mechanism to notify callers on reload
 8:  8eccf308bd =  9:  cbaf95ff15 reftable/merged: drain priority queue on reseek
 9:  942957447a = 10:  013f05e08b refs/reftable: reuse iterators when reading refs

---
base-commit: 455ddbf8c6a694968c1089fb6c7ffb1d31d97e9d
change-id: 20241125-pks-reftable-backend-reuse-iter-3a2e92428789