[v4,00/10] compat/zlib: allow use of zlib-ng as backend

Message ID 20250128-b4-pks-compat-drop-uncompress2-v4-0-129bc36ae8f5@pks.im (mailing list archive)

Message

Patrick Steinhardt Jan. 28, 2025, 8:41 a.m. UTC
Hi,

I have recently started to play around with zlib-ng a bit, which is a
hard fork of the zlib library. It describes itself as a zlib replacement
with optimizations for "next generation" systems. As such, it contains
several implementations of central algorithms using, for example, SSE2,
AVX2 and other vectorized CPU intrinsics that supposedly speed up
inflating and deflating data.

And indeed, compiling Git against zlib-ng leads to a significant speedup
when reading objects. The following benchmark uses git-cat-file(1) with
`--batch --batch-all-objects` in the Git repository:

    Benchmark 1: zlib
      Time (mean ± σ):     52.085 s ±  0.141 s    [User: 51.500 s, System: 0.456 s]
      Range (min … max):   52.004 s … 52.335 s    5 runs

    Benchmark 2: zlib-ng
      Time (mean ± σ):     40.324 s ±  0.134 s    [User: 39.731 s, System: 0.490 s]
      Range (min … max):   40.135 s … 40.484 s    5 runs

    Summary
      zlib-ng ran
        1.29 ± 0.01 times faster than zlib

So we're looking at a 1.29x speedup, or roughly 23% less runtime,
compared to zlib. This is of course an extreme example, as it makes us
read through all objects in the repository. But regardless, it should be
possible to see some sort of speedup in most commands that end up
accessing the object database.
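
For reference, the two usual ways of reading the mean times above can be
recomputed with a few lines of C. This is only a sketch of the arithmetic;
the function names are made up for illustration and are not part of the
series.

```c
#include <stdio.h>

/* How many times faster the new run is: baseline time / new time. */
double speedup_factor(double baseline_s, double new_s)
{
	return baseline_s / new_s;
}

/* Share of the baseline wall-clock time that was saved, in percent. */
double runtime_reduction_pct(double baseline_s, double new_s)
{
	return (baseline_s - new_s) / baseline_s * 100.0;
}

/* Print both views for the mean times reported by the benchmark. */
void print_benchmark_summary(void)
{
	const double zlib_mean = 52.085;    /* seconds, Benchmark 1 */
	const double zlib_ng_mean = 40.324; /* seconds, Benchmark 2 */

	printf("speedup: %.2fx\n",
	       speedup_factor(zlib_mean, zlib_ng_mean));
	printf("runtime reduction: %.1f%%\n",
	       runtime_reduction_pct(zlib_mean, zlib_ng_mean));
}
```

Calling `print_benchmark_summary()` reports a 1.29x speedup and a 22.6%
runtime reduction, which are two views of the same measurement.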

This patch series refactors how we wire up zlib in our project by
introducing a new "compat/zlib.h" header. This header is then later
extended to patch over the differences between zlib and zlib-ng, which
is mostly just that zlib-ng has a `zng_` prefix for each of its symbols.
Like this, we can support both libraries directly, and a new Meson build
option allows users to pick whichever backend they like.
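
The prefix-mapping idea can be sketched without either library installed.
Everything below is a stand-in: `DEMO_USE_ZLIB_NG`, `demo_sum()` and
`zng_demo_sum()` are hypothetical names, where the real header would
include <zlib.h> or <zlib-ng.h> and map names such as `crc32` to
`zng_crc32`.

```c
#include <stddef.h>

#ifdef DEMO_USE_ZLIB_NG
/* Stand-in for a library that only exports `zng_`-prefixed symbols. */
unsigned long zng_demo_sum(unsigned long sum, const unsigned char *buf,
			   size_t len)
{
	while (len--)
		sum += *buf++;
	return sum;
}

/* Compat layer: route the unprefixed name to the prefixed symbol. */
# define demo_sum(sum, buf, len) zng_demo_sum(sum, buf, len)
#else
/* Stand-in for a library that exports the unprefixed symbol. */
unsigned long demo_sum(unsigned long sum, const unsigned char *buf,
		       size_t len)
{
	while (len--)
		sum += *buf++;
	return sum;
}
#endif

/* Callers are written once, against the unprefixed name only. */
unsigned long sum_of_bytes(const unsigned char *buf, size_t len)
{
	return demo_sum(0, buf, len);
}
```

Building with or without `-DDEMO_USE_ZLIB_NG` switches the backend while
`sum_of_bytes()` stays untouched, which is the property the compat header
is after.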

In theory, these changes shouldn't be necessary because zlib-ng provides
a compatibility layer that makes it directly compatible with zlib. But
most distros don't allow you to install zlib-ng with that layer, as it
would mean that zlib would need to be replaced globally. Instead, they
typically provide a version of zlib-ng that only has the `zng_`-prefixed
symbols.

Given the observed speedup I do think that this is a worthwhile change
so that users (or especially hosting providers) can easily switch to
zlib-ng without impacting the rest of their system.

Changes in v2:
  - Wire up zlib-ng in our Makefile.
  - Exercise zlib-ng via CI by adapting our "linux-musl" job to use
    Meson and installing zlib-ng.
  - Link to v1: https://lore.kernel.org/r/20250110-b4-pks-compat-drop-uncompress2-v1-0-965d0022a74d@pks.im

Changes in v3:
  - Fix a couple of commit message typos.
  - Mention why we can safely drop "CC=gcc" when converting the musl job
    to use Meson.
  - Link to v2: https://lore.kernel.org/r/20250114-b4-pks-compat-drop-uncompress2-v2-0-614a2158e34e@pks.im

Changes in v4:
  - Add a comment explaining why we can stub out `deflateSetHeader()`.
  - Add a comment explaining why we have to cast away constness with
    zlib-ng's `next_in` field.
  - Link to v3: https://lore.kernel.org/r/20250116-b4-pks-compat-drop-uncompress2-v3-0-f2af1f5c4a06@pks.im
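
To illustrate the `next_in` point, here is a minimal sketch of why the
cast is needed. The struct and function names are hypothetical stand-ins,
not Git's or zlib-ng's actual definitions; the only assumption taken from
the series is that the library side declares `next_in` as `const` while
the wrapper side does not.

```c
/* Hypothetical library stream whose input pointer is const-qualified. */
struct lib_stream {
	const unsigned char *next_in;
	unsigned int avail_in;
};

/* Hypothetical wrapper that exposes a non-const pointer to callers. */
struct wrapper_stream {
	struct lib_stream z;
	unsigned char *next_in;
	unsigned int avail_in;
};

/* Copy wrapper state into the library stream; adding const is implicit. */
void wrapper_pre_call(struct wrapper_stream *s)
{
	s->z.next_in = s->next_in;
	s->z.avail_in = s->avail_in;
}

/* Stand-in for a library call that consumes up to `n` input bytes. */
void lib_consume(struct lib_stream *z, unsigned int n)
{
	if (n > z->avail_in)
		n = z->avail_in;
	z->next_in += n;
	z->avail_in -= n;
}

/* Copy state back; dropping const requires an explicit cast. */
void wrapper_post_call(struct wrapper_stream *s)
{
	s->next_in = (unsigned char *) s->z.next_in;
	s->avail_in = s->z.avail_in;
}
```

Without the cast in `wrapper_post_call()`, a compiler would be expected to
complain about discarding the `const` qualifier. The cast is harmless here
because the pointer originally came from the wrapper's own non-const
buffer.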

I've adjusted the series to be based on top of fbe8d3079d (Git 2.48,
2025-01-10) with ps/meson-weak-sha1-build at 6a0ee54f9a (meson: provide
a summary of configured backends, 2024-12-30) and ps/build-meson-fixes
at 4e517e68b5 (ci: wire up Visual Studio build with Meson, 2025-01-14)
merged into it. This matches what Junio has in his tree -- sorry for
screwing up the previous base!

Thanks!

Patrick

---
Patrick Steinhardt (10):
      compat: drop `uncompress2()` compatibility shim
      git-compat-util: drop `z_const` define
      compat: introduce new "zlib.h" header
      git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
      compat/zlib: provide `deflateBound()` shim centrally
      compat/zlib: provide stubs for `deflateSetHeader()`
      git-zlib: cast away potential constness of `next_in` pointer
      compat/zlib: allow use of zlib-ng as backend
      ci: switch linux-musl to use Meson
      ci: make "linux-musl" job use zlib-ng

 .github/workflows/main.yml |  2 +-
 .gitlab-ci.yml             |  2 +-
 Makefile                   | 21 +++++++---
 archive-tar.c              |  4 --
 archive.c                  |  1 +
 ci/install-dependencies.sh |  4 +-
 ci/lib.sh                  |  5 +--
 ci/run-build-and-tests.sh  |  3 +-
 compat/zlib-compat.h       | 53 +++++++++++++++++++++++++
 compat/zlib-uncompress2.c  | 96 ----------------------------------------------
 config.c                   |  1 +
 csum-file.c                |  3 +-
 environment.c              |  1 +
 git-compat-util.h          | 12 ------
 git-zlib.c                 |  7 +---
 git-zlib.h                 |  2 +
 meson.build                | 24 +++++++++---
 meson_options.txt          |  4 ++
 reftable/block.c           |  1 -
 reftable/system.h          |  1 +
 20 files changed, 107 insertions(+), 140 deletions(-)

Range-diff versus v3:

 1:  00984b07b9 =  1:  fcfcf1ed81 compat: drop `uncompress2()` compatibility shim
 2:  de7bf8bf15 =  2:  b483553549 git-compat-util: drop `z_const` define
 3:  1aea050dae =  3:  f4f23ad8bc compat: introduce new "zlib.h" header
 4:  40229f1c0a =  4:  14f6055809 git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
 5:  230d23877f =  5:  acb5212ed3 compat/zlib: provide `deflateBound()` shim centrally
 6:  05e0757235 !  6:  918cf3eb0d compat/zlib: provide stubs for `deflateSetHeader()`
    @@ compat/zlib-compat.h
      # define deflateBound(c,s)  ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11)
      #endif
      
    ++/*
    ++ * zlib only gained support for setting up the gzip header in v1.2.2.1. In
    ++ * Git we only set the header to make archives reproducible across different
    ++ * operating systems, so it's fine to simply make this a no-op when using a
    ++ * zlib version that doesn't support this yet.
    ++ */
     +#if ZLIB_VERNUM < 0x1221
     +struct gz_header_s {
     +	int os;
 7:  b10e6f35d7 !  7:  4047b9226a git-zlib: cast away potential constness of `next_in` pointer
    @@ git-zlib.c: static void zlib_post_call(git_zstream *s)
      	s->total_out = s->z.total_out;
      	s->total_in = s->z.total_in;
     -	s->next_in = s->z.next_in;
    ++	/* zlib-ng marks `next_in` as `const`, so we have to cast it away. */
     +	s->next_in = (unsigned char *) s->z.next_in;
      	s->next_out = s->z.next_out;
      	s->avail_in -= bytes_consumed;
 8:  6149885889 !  8:  d8f5c87d71 compat/zlib: allow use of zlib-ng as backend
    @@ compat/zlib-compat.h
     -#endif
     +# define z_stream zng_stream
     +#define gz_header_s zng_gz_header_s
    - 
    --#if ZLIB_VERNUM < 0x1221
    ++
     +# define crc32(crc, buf, len) zng_crc32(crc, buf, len)
     +
     +# define inflate(strm, bits) zng_inflate(strm, bits)
    @@ compat/zlib-compat.h
     +# if defined(NO_DEFLATE_BOUND) || ZLIB_VERNUM < 0x1200
     +#  define deflateBound(c,s)  ((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11)
     +# endif
    -+
    + 
    + /*
    +  * zlib only gained support for setting up the gzip header in v1.2.2.1. In
    +@@
    +  * operating systems, so it's fine to simply make this a no-op when using a
    +  * zlib version that doesn't support this yet.
    +  */
    +-#if ZLIB_VERNUM < 0x1221
     +# if ZLIB_VERNUM < 0x1221
      struct gz_header_s {
      	int os;
 9:  f663af4332 =  9:  87fbc86f47 ci: switch linux-musl to use Meson
10:  376c05fe77 = 10:  f3ea4c5a81 ci: make "linux-musl" job use zlib-ng
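
As a worked example of the `deflateBound()` fallback visible in the
range-diff above, the following sketch evaluates the same arithmetic under
a hypothetical name (`demo_deflate_bound`), so that it cannot clash with a
real zlib header. Like the fallback, it depends only on the source length.

```c
/*
 * Same arithmetic as the fallback macro: the input size plus one byte per
 * 8 input bytes, one byte per 64 input bytes, and 11 bytes of fixed
 * overhead. The macro name is hypothetical.
 */
#define demo_deflate_bound(s) \
	((s) + (((s) + 7) >> 3) + (((s) + 63) >> 6) + 11)

/* Size of an output buffer for compressing `source_len` input bytes. */
unsigned long demo_buffer_size(unsigned long source_len)
{
	return demo_deflate_bound(source_len);
}
```

For 1000 input bytes this gives 1000 + 125 + 16 + 11 = 1152 bytes, and
for empty input it gives the fixed 11 bytes.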

---
base-commit: cbdbb490357c16eaaa6528c1d550c513a632d196
change-id: 20250110-b4-pks-compat-drop-uncompress2-eb5914459c32

Comments

Junio C Hamano Jan. 28, 2025, 8:50 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> Changes in v4:
>   - Add a comment explaining why we can stub out `deflateSetHeader()`.
>   - Add a comment explaining why we have to cast away constness with
>     zlib-ng's `next_in` field.
>   - Link to v3: https://lore.kernel.org/r/20250116-b4-pks-compat-drop-uncompress2-v3-0-f2af1f5c4a06@pks.im
>
> I've adjusted the series to be based on top of fbe8d3079d (Git 2.48,
> 2025-01-10) with ps/meson-weak-sha1-build at 6a0ee54f9a (meson: provide
> a summary of configured backends, 2024-12-30) and ps/build-meson-fixes
> at 4e517e68b5 (ci: wire up Visual Studio build with Meson, 2025-01-14)
> merged into it. This matches what Junio has in his tree -- sorry for
> screwing up the previous base!

Looks like this is getting close to the final.  Will replace.

Thanks.