mbox series

[0/5] cat-file replace handling and optimization

Message ID YVy1sx8Xb1xMLFQT@coredump.intra.peff.net (mailing list archive)
Headers show
Series cat-file replace handling and optimization | expand

Message

Jeff King Oct. 5, 2021, 8:29 p.m. UTC
This started as an optimization to have cat-file use the pack/offset
info it gets during --batch-all-objects to avoid extra object lookups.
And that does happen in the final patch.

But there was an interesting interaction with replace refs there, which
led me to patch 3.

The other patches are relevant prep/cleanup.

  [1/5]: t1006: clean up broken objects
  [2/5]: cat-file: mention --unordered along with --batch-all-objects
  [3/5]: cat-file: disable refs/replace with --batch-all-objects
  [4/5]: cat-file: split ordered/unordered batch-all-objects callbacks
  [5/5]: cat-file: use packed_object_info() for --batch-all-objects

 Documentation/git-cat-file.txt |  6 ++-
 builtin/cat-file.c             | 49 +++++++++++++++-------
 t/t1006-cat-file.sh            | 75 ++++++++++++++++++++++++++++++++++
 3 files changed, 114 insertions(+), 16 deletions(-)

-Peff

Comments

Derrick Stolee Oct. 6, 2021, 8:41 p.m. UTC | #1
On 10/5/2021 4:29 PM, Jeff King wrote:
> This started as an optimization to have cat-file use the pack/offset
> info it gets during --batch-all-objects to avoid extra object lookups.
> And that does happen in the final patch.
> 
> But there was an interesting interaction with replace refs there, which
> led me to patch 3.
> 
> The other patches are relevant prep/cleanup.
> 
>   [1/5]: t1006: clean up broken objects
>   [2/5]: cat-file: mention --unordered along with --batch-all-objects
>   [3/5]: cat-file: disable refs/replace with --batch-all-objects
>   [4/5]: cat-file: split ordered/unordered batch-all-objects callbacks
>   [5/5]: cat-file: use packed_object_info() for --batch-all-objects

I took a close read through this series and it was easy to understand.

LGTM. Nice ~2x speedup in patch 5.

Thanks,
-Stolee
Jeff King Oct. 7, 2021, 12:32 a.m. UTC | #2
On Wed, Oct 06, 2021 at 04:41:26PM -0400, Derrick Stolee wrote:

> On 10/5/2021 4:29 PM, Jeff King wrote:
> > This started as an optimization to have cat-file use the pack/offset
> > info it gets during --batch-all-objects to avoid extra object lookups.
> > And that does happen in the final patch.
> > 
> > But there was an interesting interaction with replace refs there, which
> > led me to patch 3.
> > 
> > The other patches are relevant prep/cleanup.
> > 
> >   [1/5]: t1006: clean up broken objects
> >   [2/5]: cat-file: mention --unordered along with --batch-all-objects
> >   [3/5]: cat-file: disable refs/replace with --batch-all-objects
> >   [4/5]: cat-file: split ordered/unordered batch-all-objects callbacks
> >   [5/5]: cat-file: use packed_object_info() for --batch-all-objects
> 
> I took a close read through this series and it was easy to understand.
> 
> LGTM. Nice ~2x speedup in patch 5.

Thanks. I was actually surprise at the speedup in the final patch. It's
a bit less impressive on a smaller repo like git.git (I think it was
like 250ms versus 180ms). That kind of makes sense, though. For a repo
with N objects, we're cutting out N lookups at a cost of log(N) each. So
we'd expect it to be a better than linear improvement as the number of
objects increases. (Of course it's not N(log(N)) over all because we're
only cutting out _some_ of the work the process is doing).

Still, I'm a fan of easy wins. :)

-Peff