diff mbox series

[09/12] reftable/record: reuse refname when decoding

Message ID 3ba697036c1db3837f46775823a7bd55602b4bac.1707895758.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: improve ref iteration performance (pt.2) | expand

Commit Message

Patrick Steinhardt Feb. 14, 2024, 7:46 a.m. UTC
When decoding a reftable record we will first release the user-provided
record and then decode the new record into it. This is quite inefficient
as we basically need to reallocate at least the refname every time.

Refactor the function to start tracking the refname capacity. Like this,
we can stow away the refname, release, restore and then grow the refname
to the required number of bytes via `REFTABLE_ALLOC_GROW()`.

This refactoring is safe to do because all functions that assigning to
the refname will first call `release_reftable_record()`, which will zero
out the complete record after releasing memory.

This change results in a nice speedup when iterating over 1 million
refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)

    Time (mean ± σ):     124.0 ms ±   3.9 ms    [User: 121.1 ms, System: 2.7 ms]
    Range (min … max):   120.4 ms … 152.7 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     114.4 ms ±   3.7 ms    [User: 111.5 ms, System: 2.7 ms]
    Range (min … max):   111.0 ms … 152.1 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.08 ± 0.05 times faster than show-ref: single matching ref (revision = HEAD~)

Furthermore, with this change we now perform a mostly constant number of
allocations when iterating. Before this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 1,006,620 allocs, 1,006,495 frees, 25,398,363 bytes allocated

After this change:

  HEAP SUMMARY:
      in use at exit: 13,603 bytes in 125 blocks
    total heap usage: 6,623 allocs, 6,498 frees, 509,592 bytes allocated

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c          | 16 ++++++++++++----
 reftable/reftable-record.h |  1 +
 2 files changed, 13 insertions(+), 4 deletions(-)

Comments

James Liu Feb. 28, 2024, 12:06 a.m. UTC | #1
On Wed Feb 14, 2024 at 6:46 PM AEDT, Patrick Steinhardt wrote:
> This refactoring is safe to do because all functions that assigning to
> the refname will first call `release_reftable_record()`, which will zero
> out the complete record after releasing memory.

s/release_reftable_record/reftable_ref_record_release

> Furthermore, with this change we now perform a mostly constant number of
> allocations when iterating.

That's awesome!

> +	SWAP(refname, r->refname);
> +	SWAP(refname_cap, r->refname_cap);
>  	reftable_ref_record_release(r);
> +	SWAP(refname, r->refname);
> +	SWAP(refname_cap, r->refname_cap);

What do you think about reversing the order of the `SWAP` arguments in
the last two invocations? If my understanding is correct that we're
preserving the `refname` and `refname_cap` fields so we can set them back
into a freshly initialised `r`, reversing the args might make that intent
a bit clearer.

Also, since we're unconditionally `memcpy`ing the key into `r->refname`
below, can we avoid the `SWAP(refname, r->refname)` call altogether?

> -	assert(hash_size > 0);
> -
> -	r->refname = reftable_malloc(key.len + 1);
> +	REFTABLE_ALLOC_GROW(r->refname, key.len + 1, r->refname_cap);
>  	memcpy(r->refname, key.buf, key.len);
>  	r->refname[key.len] = 0;
Patrick Steinhardt March 4, 2024, 10:39 a.m. UTC | #2
On Wed, Feb 28, 2024 at 11:06:52AM +1100, James Liu wrote:
> On Wed Feb 14, 2024 at 6:46 PM AEDT, Patrick Steinhardt wrote:
> > This refactoring is safe to do because all functions that assigning to
> > the refname will first call `release_reftable_record()`, which will zero
> > out the complete record after releasing memory.
> 
> s/release_reftable_record/reftable_ref_record_release
> 
> > Furthermore, with this change we now perform a mostly constant number of
> > allocations when iterating.
> 
> That's awesome!
> 
> > +	SWAP(refname, r->refname);
> > +	SWAP(refname_cap, r->refname_cap);
> >  	reftable_ref_record_release(r);
> > +	SWAP(refname, r->refname);
> > +	SWAP(refname_cap, r->refname_cap);
> 
> What do you think about reversing the order of the `SWAP` arguments in
> the last two invocations? If my understanding is correct that we're
> preserving the `refname` and `refname_cap` fields so we can set them back
> into a freshly initialised `r`, reversing the args might make that intent
> a bit clearer.

Yeah, fair enough.

> Also, since we're unconditionally `memcpy`ing the key into `r->refname`
> below, can we avoid the `SWAP(refname, r->refname)` call altogether?

No, otherwise `reftable_ref_record_release()` would have already
released the underlying pointer of `r->refname` and the call to
`REFTABLE_ALLOC_GROW()` would always have to reallocate it.

Patrick

> > -	assert(hash_size > 0);
> > -
> > -	r->refname = reftable_malloc(key.len + 1);
> > +	REFTABLE_ALLOC_GROW(r->refname, key.len + 1, r->refname_cap);
> >  	memcpy(r->refname, key.buf, key.len);
> >  	r->refname[key.len] = 0;
>
diff mbox series

Patch

diff --git a/reftable/record.c b/reftable/record.c
index d6bb42e887..e800cfef00 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -368,16 +368,24 @@  static int reftable_ref_record_decode(void *rec, struct strbuf key,
 	struct reftable_ref_record *r = rec;
 	struct string_view start = in;
 	uint64_t update_index = 0;
-	int n = get_var_int(&update_index, &in);
+	const char *refname = NULL;
+	size_t refname_cap = 0;
+	int n;
+
+	assert(hash_size > 0);
+
+	n = get_var_int(&update_index, &in);
 	if (n < 0)
 		return n;
 	string_view_consume(&in, n);
 
+	SWAP(refname, r->refname);
+	SWAP(refname_cap, r->refname_cap);
 	reftable_ref_record_release(r);
+	SWAP(refname, r->refname);
+	SWAP(refname_cap, r->refname_cap);
 
-	assert(hash_size > 0);
-
-	r->refname = reftable_malloc(key.len + 1);
+	REFTABLE_ALLOC_GROW(r->refname, key.len + 1, r->refname_cap);
 	memcpy(r->refname, key.buf, key.len);
 	r->refname[key.len] = 0;
 
diff --git a/reftable/reftable-record.h b/reftable/reftable-record.h
index bb6e99acd3..e657001d42 100644
--- a/reftable/reftable-record.h
+++ b/reftable/reftable-record.h
@@ -22,6 +22,7 @@  license that can be found in the LICENSE file or at
 /* reftable_ref_record holds a ref database entry target_value */
 struct reftable_ref_record {
 	char *refname; /* Name of the ref, malloced. */
+	size_t refname_cap;
 	uint64_t update_index; /* Logical timestamp at which this value is
 				* written */