[v1,00/10] cat-file speedups

Message ID 20240715003519.2671385-1-e@80x24.org (mailing list archive)

Message

Eric Wong July 15, 2024, 12:35 a.m. UTC
This continues the work of Jeff King and my initial work to
speed up cat-file --batch(-contents)? users in
https://lore.kernel.org/git/20240621062915.GA2105230@coredump.intra.peff.net/T/

There are more speedups I'm working on, but this series touches
on the work Jeff and I have already published.

I've started putting some Perl5 + Inline::C benchmarks with
several knobs up at: git clone https://80x24.org/misc-git-benchmarks.git

I've found it necessary to use schedtool(1) on Linux to pin all
processes to a single CPU on multicore systems.
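
For readers who want the same pinning without schedtool(1), here is a
minimal sketch in C. It is not taken from the series or from the
benchmark repository; `pin_to_one_cpu` is a hypothetical helper name.
It relies on the Linux-specific sched_setaffinity(2), and on the fact
that the affinity mask is inherited across fork(2) and preserved across
execve(2), so a benchmark driver that pins itself before spawning its
children keeps them on the same CPU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* required for sched_setaffinity() and the CPU_* macros */
#endif
#include <sched.h>

/*
 * Hypothetical helper (illustration only): restrict the calling process
 * to the lowest-numbered CPU it is currently allowed to run on.
 * Returns that CPU's index, or -1 on error (errno set by the syscall).
 */
static int pin_to_one_cpu(void)
{
	cpu_set_t allowed, one;
	int cpu;

	/* Start from the current mask so cpusets/containers are respected. */
	if (sched_getaffinity(0, sizeof(allowed), &allowed) < 0)
		return -1;

	for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
		if (!CPU_ISSET(cpu, &allowed))
			continue;
		CPU_ZERO(&one);
		CPU_SET(cpu, &one);
		if (sched_setaffinity(0, sizeof(one), &one) < 0)
			return -1;
		return cpu;
	}
	return -1; /* empty mask; should not happen in practice */
}
```

Calling `pin_to_one_cpu()` once at the start of the driver, before any
fork, is enough for the whole process tree.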

Some patches make more sense for largish objects, some for
smaller objects.  Small objects (several KB) were my main focus,
but I figure 5/10 could help with some pathological big cases
and also open the door to expanding the use of caching down the
line.

10/10 actually ended up being more significant than I originally
anticipated for repeat lookups of the same objects (common for
web frontends getting hammered).
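
To illustrate the idea, assuming 10/10 is the writev(2) patch listed
last in the shortlog below: a --batch response has the form
"<oid> <type> <size>\n<contents>\n", and writev(2) lets the header,
the contents and the trailing newline go out in one syscall instead of
several. The following is a rough sketch only, not the code from the
patch; `write_batch_response` is a hypothetical name.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only, not git's implementation):
 * emit "<oid> <type> <size>\n<contents>\n" using writev(2).
 * Returns 0 on success, -1 on error (errno set).
 */
static int write_batch_response(int fd, const char *oid, const char *type,
				const void *buf, size_t size)
{
	char hdr[128];
	char nl = '\n';
	struct iovec iov[3];
	int iovcnt = 3, i = 0;
	int n;

	assert(buf || !size);
	n = snprintf(hdr, sizeof(hdr), "%s %s %zu\n", oid, type, size);
	if (n < 0 || (size_t)n >= sizeof(hdr)) {
		errno = EINVAL; /* header did not fit */
		return -1;
	}

	iov[0].iov_base = hdr;
	iov[0].iov_len = (size_t)n;
	iov[1].iov_base = (void *)buf; /* iov_base is non-const; never written to */
	iov[1].iov_len = size;
	iov[2].iov_base = &nl;
	iov[2].iov_len = 1;

	/* writev(2) may write less than requested; resume where it stopped. */
	while (i < iovcnt) {
		ssize_t w = writev(fd, iov + i, iovcnt - i);
		size_t done;

		if (w < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		done = (size_t)w;
		/* Skip every iovec that was written completely. */
		while (i < iovcnt && done >= iov[i].iov_len) {
			done -= iov[i].iov_len;
			i++;
		}
		/* Advance into a partially written iovec, if any. */
		if (i < iovcnt) {
			iov[i].iov_base = (char *)iov[i].iov_base + done;
			iov[i].iov_len -= done;
		}
	}
	return 0;
}
```

For small objects the saving comes from the reduced syscall count, which
fits the observation that repeat lookups of already-cached objects
benefit most.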

Jeff: I started writing commit messages for your patches (1 and
2), but you could probably write better explanations :>

Eric Wong (8):
  packfile: fix off-by-one in content_limit comparison
  packfile: inline cache_or_unpack_entry
  cat-file: use delta_base_cache entries directly
  packfile: packed_object_info avoids packed_to_object_type
  object_info: content_limit only applies to blobs
  cat-file: batch-command uses content_limit
  cat-file: batch_write: use size_t for length
  cat-file: use writev(2) if available

Jeff King (2):
  packfile: move sizep computation
  packfile: allow content-limit for cat-file

 Makefile            |   3 ++
 builtin/cat-file.c  | 124 +++++++++++++++++++++++++++++++-------------
 config.mak.uname    |   5 ++
 git-compat-util.h   |  10 ++++
 object-file.c       |  12 +++++
 object-store-ll.h   |   8 +++
 packfile.c          | 120 ++++++++++++++++++++++++++----------------
 packfile.h          |   4 ++
 t/t1006-cat-file.sh |  19 +++++--
 wrapper.c           |  18 +++++++
 wrapper.h           |   1 +
 write-or-die.c      |  66 +++++++++++++++++++++++
 write-or-die.h      |   2 +
 13 files changed, 308 insertions(+), 84 deletions(-)

Comments

Patrick Steinhardt July 24, 2024, 8:35 a.m. UTC | #1
On Mon, Jul 15, 2024 at 12:35:09AM +0000, Eric Wong wrote:
> This continues the work of Jeff King and my initial work to
> speed up cat-file --batch(-contents)? users in
> https://lore.kernel.org/git/20240621062915.GA2105230@coredump.intra.peff.net/T/
> 
> There are more speedups I'm working on, but this series touches
> on the work Jeff and I have already published.
> 
> I've started putting some Perl5 + Inline::C benchmarks with
> several knobs up at: git clone https://80x24.org/misc-git-benchmarks.git
> 
> I've found it necessary to use schedtool(1) on Linux to pin all
> processes to a single CPU on multicore systems.
> 
> Some patches make more sense for largish objects, some for
> smaller objects.  Small objects (several KB) were my main focus,
> but I figure 5/10 could help with some pathological big cases
> and also open the door to expanding the use of caching down the
> line.
> 
> 10/10 actually ended up being more significant than I originally
> anticipated for repeat lookups of the same objects (common for
> web frontends getting hammered).
> 
> Jeff: I started writing commit messages for your patches (1 and
> 2), but you could probably write better explanations :>

I definitely think that most of the commit messages could use some
deeper explanations. I had quite a hard time figuring out the idea
behind the commits because the messages only really talk about what they
are doing, but don't mention why they are doing it or why the
transformations are safe.

It might also help with attracting more folks to review this patch
series if things have better explanations :)

Patrick