Message ID: YVy1sx8Xb1xMLFQT@coredump.intra.peff.net (mailing list archive)
Series: cat-file replace handling and optimization
On 10/5/2021 4:29 PM, Jeff King wrote:
> This started as an optimization to have cat-file use the pack/offset
> info it gets during --batch-all-objects to avoid extra object lookups.
> And that does happen in the final patch.
>
> But there was an interesting interaction with replace refs there, which
> led me to patch 3.
>
> The other patches are relevant prep/cleanup.
>
>   [1/5]: t1006: clean up broken objects
>   [2/5]: cat-file: mention --unordered along with --batch-all-objects
>   [3/5]: cat-file: disable refs/replace with --batch-all-objects
>   [4/5]: cat-file: split ordered/unordered batch-all-objects callbacks
>   [5/5]: cat-file: use packed_object_info() for --batch-all-objects

I took a close read through this series and it was easy to understand.

LGTM. Nice ~2x speedup in patch 5.

Thanks,
-Stolee
On Wed, Oct 06, 2021 at 04:41:26PM -0400, Derrick Stolee wrote:
> On 10/5/2021 4:29 PM, Jeff King wrote:
> > This started as an optimization to have cat-file use the pack/offset
> > info it gets during --batch-all-objects to avoid extra object lookups.
> > And that does happen in the final patch.
> >
> > But there was an interesting interaction with replace refs there, which
> > led me to patch 3.
> >
> > The other patches are relevant prep/cleanup.
> >
> >   [1/5]: t1006: clean up broken objects
> >   [2/5]: cat-file: mention --unordered along with --batch-all-objects
> >   [3/5]: cat-file: disable refs/replace with --batch-all-objects
> >   [4/5]: cat-file: split ordered/unordered batch-all-objects callbacks
> >   [5/5]: cat-file: use packed_object_info() for --batch-all-objects
>
> I took a close read through this series and it was easy to understand.
>
> LGTM. Nice ~2x speedup in patch 5.

Thanks. I was actually surprised at the speedup in the final patch. It's
a bit less impressive on a smaller repo like git.git (I think it was
like 250ms versus 180ms).

That kind of makes sense, though. For a repo with N objects, we're
cutting out N lookups at a cost of log(N) each. So we'd expect it to be
a better than linear improvement as the number of objects increases. (Of
course it's not N log(N) overall, because we're only cutting out _some_
of the work the process is doing.)

Still, I'm a fan of easy wins. :)

-Peff
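[Editor's note: the back-of-the-envelope reasoning above (N removed lookups, each costing about log N) can be checked numerically. The sketch below is illustrative only; the object counts are invented for the example and do not come from the thread.]

```python
import math

def removed_lookup_cost(n_objects):
    """Model the work patch 5 eliminates: one binary-search OID lookup
    (~log2 N comparisons) for each of the N objects being printed."""
    return n_objects * math.log2(n_objects)

# Hypothetical object counts: a git.git-sized repo vs. one 100x larger.
small, large = 350_000, 35_000_000

# Because each removed lookup also gets more expensive as N grows,
# the total removed work grows faster than the object count itself.
ratio = removed_lookup_cost(large) / removed_lookup_cost(small)
print(f"objects grew 100x, removed lookup work grew about {ratio:.0f}x")
```

This matches the observation in the thread: the savings are superlinear in repo size, so a large repo sees a bigger relative win than git.git, but the overall speedup stays below N log N because the lookups are only part of what the process spends its time on.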