[v3,00/10] refs: optimize ref format migrations

Message ID: 20241125-pks-refs-optimize-migrations-v3-0-17bc85e33ad7@pks.im

Patrick Steinhardt Nov. 25, 2024, 6:27 a.m. UTC
Hi,

I have recently learned that ref format migrations can take a
significant amount of time, on the order of minutes, when migrating
millions of refs. This is probably not entirely surprising: the logic
to migrate ref backends was initially focused on getting the basic
feature working, and I hadn't yet invested any time into optimizing the
code path. But I was still mildly surprised that migrating a couple
million refs took minutes to finish.

This patch series thus optimizes how we migrate ref formats. This is
mostly done by expanding upon the "initial transaction" semantics that
we already use for git-clone(1). These semantics allow us to assume
that the ref backend is completely empty and that there are no
concurrent writers, so we are free to perform optimizations that
wouldn't otherwise be possible. On the one hand, this allows us to drop
needless collision checks. On the other hand, it allows us to write
regular refs directly into the "packed-refs" file when migrating from
the "reftable" backend to the "files" backend.
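To make this a bit more concrete, here is a minimal sketch of how a
migration might queue refs into such an "initial" transaction. The flag
name REF_TRANSACTION_FLAG_INITIAL, the extended signature of
ref_store_transaction_begin(), and the variables `new_refs`, `refname`,
`oid` and `msg` are assumptions based on the description above, not
verbatim from the patches:

    /*
     * Sketch only: the "initial" flag and the flags parameter of
     * ref_store_transaction_begin() are assumed from this cover letter.
     */
    struct strbuf err = STRBUF_INIT;
    struct ref_transaction *tx;

    tx = ref_store_transaction_begin(new_refs, REF_TRANSACTION_FLAG_INITIAL,
                                     &err);
    if (!tx)
        die("failed to begin initial transaction: %s", err.buf);

    /*
     * Queue each ref read from the old backend. With the "initial" flag
     * the backend may skip collision checks and, in the case of the
     * "files" backend, write regular refs straight into "packed-refs".
     */
    if (ref_transaction_create(tx, refname, &oid, NULL,
                               REF_SKIP_CREATE_REFLOG, msg, &err) ||
        ref_transaction_commit(tx, &err))
        die("migration failed: %s", err.buf);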

This leads to some significant speedups. Migrating 1 million refs from
"files" to "reftable":

    Benchmark 1: migrate files:reftable (refcount = 1000000, revision = origin/master)
      Time (mean ± σ):      4.580 s ±  0.062 s    [User: 1.818 s, System: 2.746 s]
      Range (min … max):    4.534 s …  4.743 s    10 runs

    Benchmark 2: migrate files:reftable (refcount = 1000000, revision = pks-refs-optimize-migrations)
      Time (mean ± σ):     767.7 ms ±   9.5 ms    [User: 629.2 ms, System: 126.1 ms]
      Range (min … max):   755.8 ms … 786.9 ms    10 runs

    Summary
      migrate files:reftable (refcount = 1000000, revision = pks-refs-optimize-migrations) ran
        5.97 ± 0.11 times faster than migrate files:reftable (refcount = 1000000, revision = origin/master)

And migrating from "reftable" to "files":

    Benchmark 1: migrate reftable:files (refcount = 1000000, revision = origin/master)
      Time (mean ± σ):     35.409 s ±  0.302 s    [User: 5.061 s, System: 29.244 s]
      Range (min … max):   35.055 s … 35.898 s    10 runs

    Benchmark 2: migrate reftable:files (refcount = 1000000, revision = pks-refs-optimize-migrations)
      Time (mean ± σ):     855.9 ms ±  61.5 ms    [User: 646.7 ms, System: 187.1 ms]
      Range (min … max):   830.0 ms … 1030.3 ms    10 runs

    Summary
      migrate reftable:files (refcount = 1000000, revision = pks-refs-optimize-migrations) ran
       41.37 ± 2.99 times faster than migrate reftable:files (refcount = 1000000, revision = origin/master)

Changes in v2:

  - Mention in the first commit message that the introduced flag will be
    used in a subsequent patch.
  - Link to v1: https://lore.kernel.org/r/20241108-pks-refs-optimize-migrations-v1-0-7fd37fa80e35@pks.im

Changes in v3:

  - Mention that we store the ref transaction flag for access by the
    backend in the first commit message.
  - Fix a missing word in another commit message.
  - Use `unsigned int` to pass `initial_transaction` flag.
  - Rename the scratch buffers and provide a comment for why they exist.
  - Link to v2: https://lore.kernel.org/r/20241120-pks-refs-optimize-migrations-v2-0-a233374b7452@pks.im

Thanks!

Patrick

---
Patrick Steinhardt (10):
      refs: allow passing flags when setting up a transaction
      refs/files: move logic to commit initial transaction
      refs: introduce "initial" transaction flag
      refs/files: support symbolic and root refs in initial transaction
      refs: use "initial" transaction semantics to migrate refs
      refs: skip collision checks in initial transactions
      refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
      reftable/writer: optimize allocations by using a scratch buffer
      reftable/block: rename `block_writer::buf` variable
      reftable/block: optimize allocations by using scratch buffer

 branch.c                  |   2 +-
 builtin/clone.c           |   4 +-
 builtin/fast-import.c     |   4 +-
 builtin/fetch.c           |   4 +-
 builtin/receive-pack.c    |   4 +-
 builtin/replace.c         |   2 +-
 builtin/tag.c             |   2 +-
 builtin/update-ref.c      |   4 +-
 refs.c                    |  70 ++++++-------
 refs.h                    |  45 +++++----
 refs/debug.c              |  13 ---
 refs/files-backend.c      | 244 +++++++++++++++++++++++++---------------------
 refs/packed-backend.c     |   8 --
 refs/refs-internal.h      |   2 +-
 refs/reftable-backend.c   |  14 +--
 reftable/block.c          |  33 +++----
 reftable/block.h          |  10 +-
 reftable/writer.c         |  23 +++--
 reftable/writer.h         |   2 +
 sequencer.c               |   6 +-
 t/helper/test-ref-store.c |   2 +-
 t/t1460-refs-migrate.sh   |   2 +-
 walker.c                  |   2 +-
 23 files changed, 249 insertions(+), 253 deletions(-)

Range-diff versus v2:

 1:  74d23ad64d !  1:  64ebf2d3c6 refs: allow passing flags when setting up a transaction
    @@ Metadata
      ## Commit message ##
         refs: allow passing flags when setting up a transaction
     
    -    Allow passing flags when setting up a transaction such that the
    -    behaviour of the transaction itself can be altered. This functionality
    -    will be used in a subsequent patch.
    -    Allow passing flags when creating a new transaction. These flags are
    +    stored in the `struct ref_transaction` and can be used by the respective
    +    backends to alter their behaviour depending on the flag's value. This
    +    functionality will be used in a subsequent patch.
     
         Adapt callers accordingly.
     
 2:  b7d50b9270 =  2:  616a5d66de refs/files: move logic to commit initial transaction
 3:  2a0fff3748 =  3:  8d4b867641 refs: introduce "initial" transaction flag
 4:  3904f337f6 =  4:  70961f6295 refs/files: support symbolic and root refs in initial transaction
 5:  131161d467 =  5:  ccae50e766 refs: use "initial" transaction semantics to migrate refs
 6:  a443916bbd !  6:  ec5de7e1a2 refs: skip collision checks in initial transactions
    @@ Commit message
         well as the creation of a ref iterator for every ref we're checking,
         which adds up quite fast when performing the check for many refs.
     
    -    Introduce a new flag that allows us to skip this check and wire it up in
    -    such that the backends pass it when running an initial transaction. This
    +    Introduce a new flag that allows us to skip this check and wire it up so
    +    that the backends pass it when running an initial transaction. This
         leads to significant speedups when migrating ref storage backends. From
         "files" to "reftable":
     
    @@ refs.c: int refs_verify_refname_available(struct ref_store *refs,
      				  const char *refname,
      				  const struct string_list *extras,
      				  const struct string_list *skip,
    -+				  int initial_transaction,
    ++				  unsigned int initial_transaction,
      				  struct strbuf *err)
      {
      	const char *slash;
    @@ refs.h: int refs_read_symbolic_ref(struct ref_store *ref_store, const char *refn
      				  const char *refname,
      				  const struct string_list *extras,
      				  const struct string_list *skip,
    -+				  int initial_transaction,
    ++				  unsigned int initial_transaction,
      				  struct strbuf *err);
      
      int refs_ref_exists(struct ref_store *refs, const char *refname);
 7:  4bd95e5661 =  7:  f60f692ea0 refs: don't normalize log messages with `REF_SKIP_CREATE_REFLOG`
 8:  de1573c1f7 !  8:  e8881d67db reftable/writer: optimize allocations by using a scratch buffer
    @@ reftable/writer.c: int reftable_writer_new(struct reftable_writer **out,
      
      	reftable_buf_init(&wp->block_writer_data.last_key);
      	reftable_buf_init(&wp->last_key);
    -+	reftable_buf_init(&wp->buf);
    ++	reftable_buf_init(&wp->scratch);
      	REFTABLE_CALLOC_ARRAY(wp->block, opts.block_size);
      	if (!wp->block) {
      		reftable_free(wp);
    @@ reftable/writer.c: static void writer_release(struct reftable_writer *w)
      		w->block_writer = NULL;
      		writer_clear_index(w);
      		reftable_buf_release(&w->last_key);
    -+		reftable_buf_release(&w->buf);
    ++		reftable_buf_release(&w->scratch);
      	}
      }
      
    @@ reftable/writer.c: static int writer_index_hash(struct reftable_writer *w, struc
      	int err;
      
     -	err = reftable_record_key(rec, &key);
    -+	err = reftable_record_key(rec, &w->buf);
    ++	err = reftable_record_key(rec, &w->scratch);
      	if (err < 0)
      		goto done;
      
     -	if (reftable_buf_cmp(&w->last_key, &key) >= 0) {
    -+	if (reftable_buf_cmp(&w->last_key, &w->buf) >= 0) {
    ++	if (reftable_buf_cmp(&w->last_key, &w->scratch) >= 0) {
      		err = REFTABLE_API_ERROR;
      		goto done;
      	}
      
      	reftable_buf_reset(&w->last_key);
     -	err = reftable_buf_add(&w->last_key, key.buf, key.len);
    -+	err = reftable_buf_add(&w->last_key, w->buf.buf, w->buf.len);
    ++	err = reftable_buf_add(&w->last_key, w->scratch.buf, w->scratch.len);
      	if (err < 0)
      		goto done;
      
    @@ reftable/writer.c: int reftable_writer_add_ref(struct reftable_writer *w,
      
      	if (!w->opts.skip_index_objects && reftable_ref_record_val1(ref)) {
     -		err = reftable_buf_add(&buf, (char *)reftable_ref_record_val1(ref),
    -+		reftable_buf_reset(&w->buf);
    -+		err = reftable_buf_add(&w->buf, (char *)reftable_ref_record_val1(ref),
    ++		reftable_buf_reset(&w->scratch);
    ++		err = reftable_buf_add(&w->scratch, (char *)reftable_ref_record_val1(ref),
      				       hash_size(w->opts.hash_id));
      		if (err < 0)
      			goto out;
      
     -		err = writer_index_hash(w, &buf);
    -+		err = writer_index_hash(w, &w->buf);
    ++		err = writer_index_hash(w, &w->scratch);
      		if (err < 0)
      			goto out;
      	}
    @@ reftable/writer.c: int reftable_writer_add_ref(struct reftable_writer *w,
      	if (!w->opts.skip_index_objects && reftable_ref_record_val2(ref)) {
     -		reftable_buf_reset(&buf);
     -		err = reftable_buf_add(&buf, reftable_ref_record_val2(ref),
    -+		reftable_buf_reset(&w->buf);
    -+		err = reftable_buf_add(&w->buf, reftable_ref_record_val2(ref),
    ++		reftable_buf_reset(&w->scratch);
    ++		err = reftable_buf_add(&w->scratch, reftable_ref_record_val2(ref),
      				       hash_size(w->opts.hash_id));
      		if (err < 0)
      			goto out;
      
     -		err = writer_index_hash(w, &buf);
    -+		err = writer_index_hash(w, &w->buf);
    ++		err = writer_index_hash(w, &w->scratch);
      		if (err < 0)
      			goto out;
      	}
    @@ reftable/writer.h: struct reftable_writer {
      	void *write_arg;
      	int pending_padding;
      	struct reftable_buf last_key;
    -+	struct reftable_buf buf;
    ++	/* Scratch buffer used to avoid allocations. */
    ++	struct reftable_buf scratch;
      
      	/* offset of next block to write. */
      	uint64_t next;
 9:  903fb084d5 =  9:  1127729311 reftable/block: rename `block_writer::buf` variable
10:  27692b8a00 ! 10:  7b8ddad673 reftable/block: optimize allocations by using scratch buffer
    @@ reftable/block.c: int block_writer_add(struct block_writer *w, struct reftable_r
      	int err;
      
     -	err = reftable_record_key(rec, &key);
    -+	err = reftable_record_key(rec, &w->buf);
    ++	err = reftable_record_key(rec, &w->scratch);
      	if (err < 0)
      		goto done;
      
     -	if (!key.len) {
    -+	if (!w->buf.len) {
    ++	if (!w->scratch.len) {
      		err = REFTABLE_API_ERROR;
      		goto done;
      	}
      
     -	n = reftable_encode_key(&is_restart, out, last, key,
    -+	n = reftable_encode_key(&is_restart, out, last, w->buf,
    ++	n = reftable_encode_key(&is_restart, out, last, w->scratch,
      				reftable_record_val_type(rec));
      	if (n < 0) {
      		err = -1;
    @@ reftable/block.c: int block_writer_add(struct block_writer *w, struct reftable_r
      
      	err = block_writer_register_restart(w, start.len - out.len, is_restart,
     -					    &key);
    -+					    &w->buf);
    ++					    &w->scratch);
      done:
     -	reftable_buf_release(&key);
      	return err;
    @@ reftable/block.c: void block_writer_release(struct block_writer *bw)
      	REFTABLE_FREE_AND_NULL(bw->zstream);
      	REFTABLE_FREE_AND_NULL(bw->restarts);
      	REFTABLE_FREE_AND_NULL(bw->compressed);
    -+	reftable_buf_release(&bw->buf);
    ++	reftable_buf_release(&bw->scratch);
      	reftable_buf_release(&bw->last_key);
      	/* the block is not owned. */
      }
    @@ reftable/block.h: struct block_writer {
      	uint32_t restart_cap;
      
      	struct reftable_buf last_key;
    -+	struct reftable_buf buf;
    ++	/* Scratch buffer used to avoid allocations. */
    ++	struct reftable_buf scratch;
      	int entries;
      };
      

---
base-commit: facbe4f633e4ad31e641f64617bc88074c659959
change-id: 20241108-pks-refs-optimize-migrations-6d0ceee4abb7

Comments

Patrick Steinhardt Nov. 25, 2024, 6:29 a.m. UTC | #1
On Mon, Nov 25, 2024 at 07:27:05AM +0100, Patrick Steinhardt wrote:
> Changes in v3:
> 
>   - Mention that we store the ref transaction flag for access by the
>     backend in the first commit message.
>   - Fix a missing word in another commit message.
>   - Use `unsigned int` to pass `initial_transaction` flag.
>   - Rename the scratch buffers and provide a comment for why they exist.
>   - Link to v2: https://lore.kernel.org/r/20241120-pks-refs-optimize-migrations-v2-0-a233374b7452@pks.im

Oh, I just saw that the merge to 'next' has crossed with v3 of this
series. Not much of an issue, these were all cosmetic changes anyway. I
can send them in a follow-up series.

Patrick