Message ID: YVy1sx8Xb1xMLFQT@coredump.intra.peff.net (mailing list archive)
Series: cat-file replace handling and optimization
On 10/5/2021 4:29 PM, Jeff King wrote:
> This started as an optimization to have cat-file use the pack/offset
> info it gets during --batch-all-objects to avoid extra object lookups.
> And that does happen in the final patch.
>
> But there was an interesting interaction with replace refs there, which
> led me to patch 3.
>
> The other patches are relevant prep/cleanup.
>
>   [1/5]: t1006: clean up broken objects
>   [2/5]: cat-file: mention --unordered along with --batch-all-objects
>   [3/5]: cat-file: disable refs/replace with --batch-all-objects
>   [4/5]: cat-file: split ordered/unordered batch-all-objects callbacks
>   [5/5]: cat-file: use packed_object_info() for --batch-all-objects

I took a close read through this series and it was easy to understand.

LGTM. Nice ~2x speedup in patch 5.

Thanks,
-Stolee
On Wed, Oct 06, 2021 at 04:41:26PM -0400, Derrick Stolee wrote:
> On 10/5/2021 4:29 PM, Jeff King wrote:
> > This started as an optimization to have cat-file use the pack/offset
> > info it gets during --batch-all-objects to avoid extra object lookups.
> > And that does happen in the final patch.
> >
> > But there was an interesting interaction with replace refs there, which
> > led me to patch 3.
> >
> > The other patches are relevant prep/cleanup.
> >
> >   [1/5]: t1006: clean up broken objects
> >   [2/5]: cat-file: mention --unordered along with --batch-all-objects
> >   [3/5]: cat-file: disable refs/replace with --batch-all-objects
> >   [4/5]: cat-file: split ordered/unordered batch-all-objects callbacks
> >   [5/5]: cat-file: use packed_object_info() for --batch-all-objects
>
> I took a close read through this series and it was easy to understand.
>
> LGTM. Nice ~2x speedup in patch 5.

Thanks. I was actually surprised at the speedup in the final patch. It's
a bit less impressive on a smaller repo like git.git (I think it was
like 250ms versus 180ms).

That kind of makes sense, though. For a repo with N objects, we're
cutting out N lookups at a cost of log(N) each. So we'd expect it to be
a better than linear improvement as the number of objects increases. (Of
course it's not N log(N) overall, because we're only cutting out _some_
of the work the process is doing.)

Still, I'm a fan of easy wins. :)

-Peff
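[Editor's note: the back-of-the-envelope reasoning above (N removed lookups, each costing about log N) can be checked numerically. The sketch below is illustrative only; the object counts are invented for the example and do not come from the thread.]

```python
import math

def removed_lookup_cost(n_objects):
    """Model the work patch 5 eliminates: one binary-search OID lookup
    (~log2 N comparisons) for each of the N objects being printed."""
    return n_objects * math.log2(n_objects)

# Hypothetical object counts: a git.git-sized repo vs. one 100x larger.
small, large = 350_000, 35_000_000

# Because each removed lookup also gets more expensive as N grows,
# the total removed work grows faster than the object count itself.
ratio = removed_lookup_cost(large) / removed_lookup_cost(small)
print(f"objects grew 100x, removed lookup work grew about {ratio:.0f}x")
```

This matches the observation in the thread: the savings are superlinear in repo size, so a large repo sees a bigger relative win than git.git, but the overall speedup stays below N log N because the lookups are only part of what the process spends its time on.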