mbox series

[v3,00/10] compat/zlib: allow use of zlib-ng as backend

Message ID 20250116-b4-pks-compat-drop-uncompress2-v3-0-f2af1f5c4a06@pks.im (mailing list archive)
Headers show
Series compat/zlib: allow use of zlib-ng as backend | expand

Message

Patrick Steinhardt Jan. 16, 2025, 9:17 a.m. UTC
Hi,

I have recently started to play around with zlib-ng a bit, which is a
hard fork of the zlib library. It describes itself as zlib replacement
with optimizations for "next generation" systems. As such, it contains
several implementations of central algorithms using for example SSE2,
AVX2 and other vectorized CPU intrinsics that supposedly speed up in-
and deflating data.

And indeed, compiling Git against zlib-ng leads to a significant speedup
when reading objects. The following benchmark uses git-cat-file(1) with
`--batch --batch-all-objects` in the Git repository:

    Benchmark 1: zlib
      Time (mean ± σ):     52.085 s ±  0.141 s    [User: 51.500 s, System: 0.456 s]
      Range (min … max):   52.004 s … 52.335 s    5 runs

    Benchmark 2: zlib-ng
      Time (mean ± σ):     40.324 s ±  0.134 s    [User: 39.731 s, System: 0.490 s]
      Range (min … max):   40.135 s … 40.484 s    5 runs

    Summary
      zlib-ng ran
        1.29 ± 0.01 times faster than zlib

So we're looking at a ~25% speedup compared to zlib. This is of course
an extreme example, as it makes us read through all objects in the
repository. But regardless, it should be possible to see some sort of
speedup in most commands that end up accessing the object database.

This patch series refactors how we wire up zlib in our project by
introducing a new "compat/zlib.h" header function. This header is then
later extended to patch over the differences between zlib and zlib-ng,
which is mostly just that zlib-ng has a `zng_` prefix for each of its
symbols. Like this, we can support both libraries directly, and a new
Meson build options allows users to pick whichever backend they like.

In theory, these changes shouldn't be necessary because zlib-ng provides
a compatibility layer that make it directly compatible with zlib. But
most distros don't allow you to install zlib-ng with that layer is it
would mean that zlib would need to be replaced globally. Instead, they
typically only provide a version of zlib-ng that only has the `zng_`
prefixed symbols.

Given the observed speedup I do think that this is a worthwhile change
so that users (or especially hosting providers) can easily switch to
zlib-ng without impacting the rest of their system.

Changes in v2:
  - Wire up zlib-ng in our Makefile.
  - Exercise zlib-ng via CI by adapting our "linux-musl" job to use
    Meson and installing zlib-ng.
  - Link to v1: https://lore.kernel.org/r/20250110-b4-pks-compat-drop-uncompress2-v1-0-965d0022a74d@pks.im

Changes in v3:
  - Fix a couple of commit message typos.
  - Mention why we can safely drop "CC=gcc" when converting the musl job
    to use Meson.
  - Link to v2: https://lore.kernel.org/r/20250114-b4-pks-compat-drop-uncompress2-v2-0-614a2158e34e@pks.im

I've adjusted the series to be based on top of fbe8d3079d (Git 2.48,
2025-01-10) with ps/meson-weak-sha1-build at 6a0ee54f9a (meson: provide
a summary of configured backends, 2024-12-30) and ps/build-meson-fixes
at 4e517e68b5 (ci: wire up Visual Studio build with Meson, 2025-01-14)
merged into it. This matches what Junio has in his tree -- sorry for
screwing up the previous base!

Thanks!

Patrick

---
Patrick Steinhardt (10):
      compat: drop `uncompress2()` compatibility shim
      git-compat-util: drop `z_const` define
      compat: introduce new "zlib.h" header
      git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
      compat/zlib: provide `deflateBound()` shim centrally
      compat/zlib: provide stubs for `deflateSetHeader()`
      git-zlib: cast away potential constness of `next_in` pointer
      compat/zlib: allow use of zlib-ng as backend
      ci: switch linux-musl to use Meson
      ci: make "linux-musl" job use zlib-ng

 .github/workflows/main.yml |  2 +-
 .gitlab-ci.yml             |  2 +-
 Makefile                   | 21 +++++++---
 archive-tar.c              |  4 --
 archive.c                  |  1 +
 ci/install-dependencies.sh |  4 +-
 ci/lib.sh                  |  5 +--
 ci/run-build-and-tests.sh  |  3 +-
 compat/zlib-compat.h       | 47 +++++++++++++++++++++++
 compat/zlib-uncompress2.c  | 96 ----------------------------------------------
 config.c                   |  1 +
 csum-file.c                |  3 +-
 environment.c              |  1 +
 git-compat-util.h          | 12 ------
 git-zlib.c                 |  6 +--
 git-zlib.h                 |  2 +
 meson.build                | 24 +++++++++---
 meson_options.txt          |  4 ++
 reftable/block.c           |  1 -
 reftable/system.h          |  1 +
 20 files changed, 100 insertions(+), 140 deletions(-)

Range-diff versus v2:

 1:  39a0bb02f4 =  1:  1d2035f387 compat: drop `uncompress2()` compatibility shim
 2:  e4783c2e8d =  2:  967bcabebc git-compat-util: drop `z_const` define
 3:  e222910808 =  3:  1b003953e0 compat: introduce new "zlib.h" header
 4:  0af5406642 =  4:  231f734fe6 git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
 5:  0092d57c16 =  5:  3cde8de583 compat/zlib: provide `deflateBound()` shim centrally
 6:  8e602b5e91 !  6:  5e20458b72 compat/zlib: provide stubs for `deflateSetHeader()`
    @@ Metadata
      ## Commit message ##
         compat/zlib: provide stubs for `deflateSetHeader()`
     
    -    The function `deflateSetHeader()` has been introduce with zlib v1.2.2.1,
    +    The function `deflateSetHeader()` has been introduced with zlib v1.2.2.1,
         so we don't use it when linking against an older version of it. Refactor
         the code to instead provide a central stub via "compat/zlib.h" so that
         we can adapt it based on whether or not we use zlib-ng in a subsequent
 7:  937688fcf5 !  7:  4de4631970 git-zlib: cast away potential constness of `next_in` pointer
    @@ Metadata
      ## Commit message ##
         git-zlib: cast away potential constness of `next_in` pointer
     
    -    The `struct git_zstream::next_in` variable points to the input data that
    -    and is used in combination with `struct z_stream::next_in`. While that
    +    The `struct git_zstream::next_in` variable points to the input data and
    +    is used in combination with `struct z_stream::next_in`. While that
         latter field is not marked as a constant in zlib, it is marked as such
         in zlib-ng. This causes a couple of compiler errors when we try to
         assign these fields to one another due to mismatching constness.
 8:  a721b846f7 =  8:  77f28f0f7d compat/zlib: allow use of zlib-ng as backend
 9:  45fde7a7dd !  9:  6fefd3ab44 ci: switch linux-musl to use Meson
    @@ Commit message
         is the `GIT_TEST_UTF8_LOCALE` variable used in tests. Wire up a build
         option for it, which we set via a new "MESONFLAGS" environment variable.
     
    +    Note that we also drop the CC variable, which is set to "gcc". We
    +    already default to GCC when CC is unset in "ci/lib.sh", so this is not
    +    needed.
    +
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
      ## .github/workflows/main.yml ##
    @@ ci/lib.sh: linux32)
     
      ## ci/run-build-and-tests.sh ##
     @@ ci/run-build-and-tests.sh: case "$jobname" in
    - 	group "Configure" meson setup build . \
    + 		--fatal-meson-warnings \
      		--warnlevel 2 --werror \
      		--wrap-mode nofallback \
     -		-Dfuzzers=true
10:  0aa66bf9c1 = 10:  15acea92a2 ci: make "linux-musl" job use zlib-ng

---
base-commit: cbdbb490357c16eaaa6528c1d550c513a632d196
change-id: 20250110-b4-pks-compat-drop-uncompress2-eb5914459c32

Comments

Karthik Nayak Jan. 17, 2025, 10:06 a.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

[snip]

>  9:  45fde7a7dd !  9:  6fefd3ab44 ci: switch linux-musl to use Meson
>     @@ Commit message
>          is the `GIT_TEST_UTF8_LOCALE` variable used in tests. Wire up a build
>          option for it, which we set via a new "MESONFLAGS" environment variable.
>
>     +    Note that we also drop the CC variable, which is set to "gcc". We
>     +    already default to GCC when CC is unset in "ci/lib.sh", so this is not
>     +    needed.
>     +
>          Signed-off-by: Patrick Steinhardt <ps@pks.im>
>
>       ## .github/workflows/main.yml ##
>     @@ ci/lib.sh: linux32)
>
>       ## ci/run-build-and-tests.sh ##
>      @@ ci/run-build-and-tests.sh: case "$jobname" in
>     - 	group "Configure" meson setup build . \
>     + 		--fatal-meson-warnings \
>       		--warnlevel 2 --werror \
>       		--wrap-mode nofallback \
>      -		-Dfuzzers=true
>

why remove the group here? The rest of the range-diff looks good.

> 10:  0aa66bf9c1 = 10:  15acea92a2 ci: make "linux-musl" job use zlib-ng
>
> ---
> base-commit: cbdbb490357c16eaaa6528c1d550c513a632d196
> change-id: 20250110-b4-pks-compat-drop-uncompress2-eb5914459c32
Patrick Steinhardt Jan. 17, 2025, 11:02 a.m. UTC | #2
On Fri, Jan 17, 2025 at 10:06:48AM +0000, Karthik Nayak wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> >  9:  45fde7a7dd !  9:  6fefd3ab44 ci: switch linux-musl to use Meson
> >     @@ Commit message
> >          is the `GIT_TEST_UTF8_LOCALE` variable used in tests. Wire up a build
> >          option for it, which we set via a new "MESONFLAGS" environment variable.
> >
> >     +    Note that we also drop the CC variable, which is set to "gcc". We
> >     +    already default to GCC when CC is unset in "ci/lib.sh", so this is not
> >     +    needed.
> >     +
> >          Signed-off-by: Patrick Steinhardt <ps@pks.im>
> >
> >       ## .github/workflows/main.yml ##
> >     @@ ci/lib.sh: linux32)
> >
> >       ## ci/run-build-and-tests.sh ##
> >      @@ ci/run-build-and-tests.sh: case "$jobname" in
> >     - 	group "Configure" meson setup build . \
> >     + 		--fatal-meson-warnings \
> >       		--warnlevel 2 --werror \
> >       		--wrap-mode nofallback \
> >      -		-Dfuzzers=true
> >
> 
> why remove the group here? The rest of the range-diff looks good.

I don't, it just fell out of the diff context and thus isn't seen
anymore :) The line still exists.

Patrick