mbox series

[0/2] bundle: fix non-linear performance scaling with refs

Message ID 20250401-488-generating-bundles-with-many-references-has-non-linear-performance-v1-0-6d23b2d96557@gmail.com (mailing list archive)
Headers show
Series bundle: fix non-linear performance scaling with refs | expand

Message

Karthik Nayak April 1, 2025, 5 p.m. UTC
Hello, 

At GitLab, we noticed that bundle creation doesn't seem to scale linearly 
the number of references in a repository. The following benchmark demostrates
the issue:

Benchmark 1: bundle (refcount = 100)
  Time (mean ± σ):       4.4 ms ±   0.5 ms    [User: 1.8 ms, System: 2.4 ms]
  Range (min … max):     3.4 ms …   7.7 ms    434 runs

Benchmark 2: bundle (refcount = 1000)
  Time (mean ± σ):      16.5 ms ±   1.7 ms    [User: 9.6 ms, System: 7.2 ms]
  Range (min … max):    14.1 ms …  21.7 ms    176 runs

Benchmark 3: bundle (refcount = 10000)
  Time (mean ± σ):     220.6 ms ±   3.2 ms    [User: 171.6 ms, System: 55.7 ms]
  Range (min … max):   215.8 ms … 224.9 ms    13 runs

Benchmark 4: bundle (refcount = 100000)
  Time (mean ± σ):      9.622 s ±  0.063 s    [User: 9.143 s, System: 0.546 s]
  Range (min … max):    9.563 s …  9.738 s    10 runs

Summary
  bundle (refcount = 100) ran
    3.79 ± 0.61 times faster than bundle (refcount = 1000)
   50.63 ± 6.39 times faster than bundle (refcount = 10000)
 2207.95 ± 277.35 times faster than bundle (refcount = 100000)

Digging into this, the reason for this is because we check for duplicate refnames
added by the user. But this check uses an O(N^2) algorithm, which would not 
scale linearly with the number of refs.

The first commit in this small series adds a bunch of tests for this behavior,
while also discovering a missed edge case. The second commit introduces an 
alternative approach which uses an 'strset' to check for duplicates. The new
approach fixes the performance problems noticed while also fixing the earlier
missed edge case. Overall we see a 6x performance improvement with this series.

I found that there is a conflict with 'ps/object-wo-the-repository' in seen,
the resolution seems simple enough. Happy to support as needed.

---
 bundle.c               | 10 +++++++++-
 object.c               | 33 -------------------------------
 object.h               |  6 ------
 t/t6020-bundle-misc.sh | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 62 insertions(+), 40 deletions(-)

Karthik Nayak (2):
      t6020: test for duplicate refnames in bundle creation
      bundle: fix non-linear performance scaling with refs
---



---

base-commit: 683c54c999c301c2cd6f715c411407c413b1d84e
change-id: 20250322-488-generating-bundles-with-many-references-has-non-linear-performance-64aec8e0cf1d

Thanks
- Karthik