[3/7] reftable/record: avoid copying author info

Message ID 6f568e4ccb67a7af8279352153d052c5f9a88234.1709640322.git.ps@pks.im (mailing list archive)
State Accepted
Commit 01639ec148f3b5ecd4460a2a5720f987c0b6a247
Series reftable: memory optimizations for reflog iteration

Commit Message

Patrick Steinhardt March 5, 2024, 12:11 p.m. UTC
Each reflog entry contains information regarding the authorship of who
has made the change. This authorship information is not the same as that
of any of the commits that the reflog entry references, but instead
corresponds to the local user that has executed the command. Thus, it is
almost always the case that all reflog entries have the same author.

We can make use of this fact when decoding reftable records: instead of
freeing and then reallocating the authorship information of log records,
we can special-case when the next record during an iteration has the
exact same authorship as the preceding record. If so, then there is no
need to reallocate the respective fields.
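
As a rough illustration of the idea in the paragraph above, the following standalone sketch reuses an existing buffer when the incoming string is identical and only reallocates when it differs. The helper name `set_string_if_changed` is hypothetical and is not part of reftable; it also uses plain `realloc()` where the actual code uses `reftable_realloc()`, and it assumes `src` is NUL-terminated with `len` bytes before the terminator.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not part of reftable): make *dst hold a copy of
 * the NUL-terminated string src of length len. When *dst already holds
 * an identical string, the buffer is left untouched and nothing is
 * allocated. Returns 0 on success, -1 if the allocation fails.
 */
static int set_string_if_changed(char **dst, const char *src, size_t len)
{
	char *p;

	/* Common case: same author as the previous record. */
	if (*dst && !strcmp(*dst, src))
		return 0;

	p = realloc(*dst, len + 1);
	if (!p)
		return -1;
	memcpy(p, src, len);
	p[len] = '\0';
	*dst = p;
	return 0;
}
```

Calling this once for the name and once for the email of every decoded record would skip both allocations whenever consecutive records share the same identity, which is the case the commit message describes as the most common one.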

This change results in two fewer allocations per log record that we
iterate over in the most common case. Before:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 6,068,489 allocs, 6,068,367 frees, 361,011,822 bytes allocated

After:

    HEAP SUMMARY:
        in use at exit: 13,473 bytes in 122 blocks
      total heap usage: 4,068,487 allocs, 4,068,365 frees, 332,011,793 bytes allocated

An alternative would be to store the capacity of both name and email and
then use `REFTABLE_ALLOC_GROW()` to conditionally reallocate the array.
But reftable records are copied around quite a lot, and thus we need to
be a bit mindful of the overall record size. Furthermore, a memory
comparison should also be more efficient than having to copy over memory
even if we wouldn't have to allocate a new array every time.
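
For comparison, the rejected alternative could look roughly like the sketch below. It is only an illustration under stated assumptions: `struct grow_string` and `grow_string_set` are hypothetical names, the growth policy is made up, and plain `realloc()` stands in for `REFTABLE_ALLOC_GROW()`.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical layout (not the actual reftable record): each string
 * carries its own capacity, which makes the record larger by one
 * size_t per field.
 */
struct grow_string {
	char *buf;
	size_t cap;
};

/*
 * Store src (len bytes plus a NUL terminator) in s, growing the buffer
 * only when the current capacity is too small. Returns 0 on success,
 * -1 if the allocation fails.
 */
static int grow_string_set(struct grow_string *s, const char *src, size_t len)
{
	if (len + 1 > s->cap) {
		size_t cap = s->cap ? s->cap * 2 : 16;
		char *p;

		if (cap < len + 1)
			cap = len + 1;
		p = realloc(s->buf, cap);
		if (!p)
			return -1;
		s->buf = p;
		s->cap = cap;
	}
	/* The copy happens on every call, even for an identical string. */
	memcpy(s->buf, src, len);
	s->buf[len] = '\0';
	return 0;
}
```

The sketch makes both drawbacks visible: the extra `cap` member grows every record that gets copied around, and the `memcpy()` runs unconditionally, whereas the comparison-based approach touches the destination only when the author actually changes.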

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/record.c | 29 +++++++++++++++++++++--------
 1 file changed, 21 insertions(+), 8 deletions(-)

Comments

James Liu March 13, 2024, 1:09 a.m. UTC | #1
On Tue Mar 5, 2024 at 11:11 PM AEDT, Patrick Steinhardt wrote:
> Each reflog entry contains information regarding the authorship of who
> has made the change. This authorship information is not the same as that
> of any of the commits that the reflog entry references, but instead
> corresponds to the local user that has executed the command. Thus, it is
> almost always the case that all reflog entries have the same author.

What are your thoughts on simplifying this explanation a little bit? I
gave it a try below:

Each reflog entry contains authorship information indicating who has made
the change. The author here corresponds to the local user who has executed
the command rather than the author of the referenced commits. Thus, it is
almost always the case that all reflog entries have the same author.


Cheers,
James

Patrick Steinhardt March 21, 2024, 1:10 p.m. UTC | #2
On Wed, Mar 13, 2024 at 12:09:23PM +1100, James Liu wrote:
> On Tue Mar 5, 2024 at 11:11 PM AEDT, Patrick Steinhardt wrote:
> > Each reflog entry contains information regarding the authorship of who
> > has made the change. This authorship information is not the same as that
> > of any of the commits that the reflog entry references, but instead
> > corresponds to the local user that has executed the command. Thus, it is
> > almost always the case that all reflog entries have the same author.
> 
> What are your thoughts on simplifying this explanation a little bit? I
> gave it a try below:
> 
> Each reflog entry contains authorship information indicating who has made
> the change. The author here corresponds to the local user who has executed
> the command rather than the author of the referenced commits. Thus, it is
> almost always the case that all reflog entries have the same author.

That would've been a bit shorter indeed. But the patch series has been
merged to `next` by now, so I'll leave it at that.

Thanks!

Patrick
Patch

diff --git a/reftable/record.c b/reftable/record.c
index 8d2cd6b081..d816de6d93 100644
--- a/reftable/record.c
+++ b/reftable/record.c
@@ -896,10 +896,19 @@  static int reftable_log_record_decode(void *rec, struct strbuf key,
 		goto done;
 	string_view_consume(&in, n);
 
-	r->value.update.name =
-		reftable_realloc(r->value.update.name, dest.len + 1);
-	memcpy(r->value.update.name, dest.buf, dest.len);
-	r->value.update.name[dest.len] = 0;
+	/*
+	 * In almost all cases we can expect the reflog name to not change for
+	 * reflog entries as they are tied to the local identity, not to the
+	 * target commits. As an optimization for this common case we can thus
+	 * skip copying over the name in case it's accurate already.
+	 */
+	if (!r->value.update.name ||
+	    strcmp(r->value.update.name, dest.buf)) {
+		r->value.update.name =
+			reftable_realloc(r->value.update.name, dest.len + 1);
+		memcpy(r->value.update.name, dest.buf, dest.len);
+		r->value.update.name[dest.len] = 0;
+	}
 
 	strbuf_reset(&dest);
 	n = decode_string(&dest, in);
@@ -907,10 +916,14 @@  static int reftable_log_record_decode(void *rec, struct strbuf key,
 		goto done;
 	string_view_consume(&in, n);
 
-	r->value.update.email =
-		reftable_realloc(r->value.update.email, dest.len + 1);
-	memcpy(r->value.update.email, dest.buf, dest.len);
-	r->value.update.email[dest.len] = 0;
+	/* Same as above, but for the reflog email. */
+	if (!r->value.update.email ||
+	    strcmp(r->value.update.email, dest.buf)) {
+		r->value.update.email =
+			reftable_realloc(r->value.update.email, dest.len + 1);
+		memcpy(r->value.update.email, dest.buf, dest.len);
+		r->value.update.email[dest.len] = 0;
+	}
 
 	ts = 0;
 	n = get_var_int(&ts, &in);