diff mbox series

[3/7] reftable/merged: skip comparison for records of the same subiter

Message ID 0ca86eba710895f0e22fc15fe5221f5487031f64.1706782841.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: improve ref iteration performance | expand

Commit Message

Patrick Steinhardt Feb. 1, 2024, 10:25 a.m. UTC
When retrieving the next entry of a merged iterator we need to drop all
records of other sub-iterators that would be shadowed by the record that
we are about to return. We do this by comparing record keys, dropping
all keys that are smaller or equal to the key of the record we are about
to return.

There is an edge case here where we can skip that comparison: when the
record in the priority queue comes from the same subiterator than the
record we are about to return then we know that its key must be larger
than the key of the record we are about to return. This property is
guaranteed by the sub-iterators, and if it didn't hold then the whole
merged iterator would return records in the wrong order, too.

While this may seem like a very specific edge case it's in fact quite
likely to happen. For most repositories out there you can assume that we
will end up with one large table and several smaller ones on top of it.
Thus, it is very likely that the next entry will sort towards the top of
the priority queue.

Special case this and break out of the loop in that case. The following
benchmark uses git-show-ref(1) to print a single ref matching a pattern
out of 1 million refs:

  Benchmark 1: show-ref: single matching ref (revision = HEAD~)
    Time (mean ± σ):     162.6 ms ±   4.5 ms    [User: 159.0 ms, System: 3.5 ms]
    Range (min … max):   156.6 ms … 188.5 ms    1000 runs

  Benchmark 2: show-ref: single matching ref (revision = HEAD)
    Time (mean ± σ):     156.8 ms ±   4.7 ms    [User: 153.0 ms, System: 3.6 ms]
    Range (min … max):   151.4 ms … 188.4 ms    1000 runs

  Summary
    show-ref: single matching ref (revision = HEAD) ran
      1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/merged.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Eric Sunshine Feb. 1, 2024, 5:29 p.m. UTC | #1
On Thu, Feb 1, 2024 at 8:17 AM Patrick Steinhardt <ps@pks.im> wrote:
> When retrieving the next entry of a merged iterator we need to drop all
> records of other sub-iterators that would be shadowed by the record that
> we are about to return. We do this by comparing record keys, dropping
> all keys that are smaller or equal to the key of the record we are about
> to return.
>
> There is an edge case here where we can skip that comparison: when the
> record in the priority queue comes from the same subiterator than the

s/than/as/

> record we are about to return then we know that its key must be larger
> than the key of the record we are about to return. This property is
> guaranteed by the sub-iterators, and if it didn't hold then the whole
> merged iterator would return records in the wrong order, too.
>
> While this may seem like a very specific edge case it's in fact quite
> likely to happen. For most repositories out there you can assume that we
> will end up with one large table and several smaller ones on top of it.
> Thus, it is very likely that the next entry will sort towards the top of
> the priority queue.
>
> Special case this and break out of the loop in that case. The following
> benchmark uses git-show-ref(1) to print a single ref matching a pattern
> out of 1 million refs:
>
>   Benchmark 1: show-ref: single matching ref (revision = HEAD~)
>     Time (mean ± σ):     162.6 ms ±   4.5 ms    [User: 159.0 ms, System: 3.5 ms]
>     Range (min … max):   156.6 ms … 188.5 ms    1000 runs
>
>   Benchmark 2: show-ref: single matching ref (revision = HEAD)
>     Time (mean ± σ):     156.8 ms ±   4.7 ms    [User: 153.0 ms, System: 3.6 ms]
>     Range (min … max):   151.4 ms … 188.4 ms    1000 runs
>
>   Summary
>     show-ref: single matching ref (revision = HEAD) ran
>       1.04 ± 0.04 times faster than show-ref: single matching ref (revision = HEAD~)
>
> Signed-off-by: Patrick Steinhardt <ps@pks.im>
Patrick Steinhardt Feb. 2, 2024, 5:15 a.m. UTC | #2
On Thu, Feb 01, 2024 at 12:29:16PM -0500, Eric Sunshine wrote:
> On Thu, Feb 1, 2024 at 8:17 AM Patrick Steinhardt <ps@pks.im> wrote:
> > When retrieving the next entry of a merged iterator we need to drop all
> > records of other sub-iterators that would be shadowed by the record that
> > we are about to return. We do this by comparing record keys, dropping
> > all keys that are smaller or equal to the key of the record we are about
> > to return.
> >
> > There is an edge case here where we can skip that comparison: when the
> > record in the priority queue comes from the same subiterator than the
> 
> s/than/as/

Thanks for your corrections, as usual :) I'll refrain from sending a v2
for now, but have squashed the fixes in while I wait for more feedback
on this series.

Patrick
diff mbox series

Patch

diff --git a/reftable/merged.c b/reftable/merged.c
index fb9978d798..0f74a14a77 100644
--- a/reftable/merged.c
+++ b/reftable/merged.c
@@ -107,6 +107,14 @@  static int merged_iter_next_entry(struct merged_iter *mi,
 		struct pq_entry top = merged_iter_pqueue_top(mi->pq);
 		int cmp;
 
+		/*
+		 * When the next entry comes from the same queue as the current
+		 * entry then it must by definition be larger. This avoids a
+		 * comparison in the most common case.
+		 */
+		if (top.index == entry.index)
+			break;
+
 		cmp = reftable_record_cmp(&top.rec, &entry.rec);
 		if (cmp > 0)
 			break;