mbox series

[v2,00/10] compat/zlib: allow use of zlib-ng as backend

Message ID 20250114-b4-pks-compat-drop-uncompress2-v2-0-614a2158e34e@pks.im (mailing list archive)
Headers show
Series compat/zlib: allow use of zlib-ng as backend | expand

Message

Patrick Steinhardt Jan. 14, 2025, 11:57 a.m. UTC
Hi,

I have recently started to play around with zlib-ng a bit, which is a
hard fork of the zlib library. It describes itself as zlib replacement
with optimizations for "next generation" systems. As such, it contains
several implementations of central algorithms using for example SSE2,
AVX2 and other vectorized CPU intrinsics that supposedly speed up in-
and deflating data.

And indeed, compiling Git against zlib-ng leads to a significant speedup
when reading objects. The following benchmark uses git-cat-file(1) with
`--batch --batch-all-objects` in the Git repository:

    Benchmark 1: zlib
      Time (mean ± σ):     52.085 s ±  0.141 s    [User: 51.500 s, System: 0.456 s]
      Range (min … max):   52.004 s … 52.335 s    5 runs

    Benchmark 2: zlib-ng
      Time (mean ± σ):     40.324 s ±  0.134 s    [User: 39.731 s, System: 0.490 s]
      Range (min … max):   40.135 s … 40.484 s    5 runs

    Summary
      zlib-ng ran
        1.29 ± 0.01 times faster than zlib

So we're looking at a ~25% speedup compared to zlib. This is of course
an extreme example, as it makes us read through all objects in the
repository. But regardless, it should be possible to see some sort of
speedup in most commands that end up accessing the object database.

This patch series refactors how we wire up zlib in our project by
introducing a new "compat/zlib.h" header function. This header is then
later extended to patch over the differences between zlib and zlib-ng,
which is mostly just that zlib-ng has a `zng_` prefix for each of its
symbols. Like this, we can support both libraries directly, and a new
Meson build options allows users to pick whichever backend they like.

In theory, these changes shouldn't be necessary because zlib-ng provides
a compatibility layer that make it directly compatible with zlib. But
most distros don't allow you to install zlib-ng with that layer is it
would mean that zlib would need to be replaced globally. Instead, they
typically only provide a version of zlib-ng that only has the `zng_`
prefixed symbols.

Given the observed speedup I do think that this is a worthwhile change
so that users (or especially hosting providers) can easily switch to
zlib-ng without impacting the rest of their system.

Changes in v2:
  - Wire up zlib-ng in our Makefile.
  - Exercise zlib-ng via CI by adapting our "linux-musl" job to use
    Meson and installing zlib-ng.
  - Link to v1: https://lore.kernel.org/r/20250110-b4-pks-compat-drop-uncompress2-v1-0-965d0022a74d@pks.im

The series is built on top of fbe8d3079d (Git 2.48, 2025-01-10) with
ps/meson-weak-sha1-build at 6a0ee54f9a (meson: provide a summary of
configured backends, 2024-12-30) merged into it.

Thanks!

Patrick

---
Patrick Steinhardt (10):
      compat: drop `uncompress2()` compatibility shim
      git-compat-util: drop `z_const` define
      compat: introduce new "zlib.h" header
      git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
      compat/zlib: provide `deflateBound()` shim centrally
      compat/zlib: provide stubs for `deflateSetHeader()`
      git-zlib: cast away potential constness of `next_in` pointer
      compat/zlib: allow use of zlib-ng as backend
      ci: switch linux-musl to use Meson
      ci: make "linux-musl" job use zlib-ng

 .github/workflows/main.yml |  2 +-
 .gitlab-ci.yml             |  2 +-
 Makefile                   | 21 +++++++---
 archive-tar.c              |  4 --
 archive.c                  |  1 +
 ci/install-dependencies.sh |  4 +-
 ci/lib.sh                  |  5 +--
 ci/run-build-and-tests.sh  |  3 +-
 compat/zlib-compat.h       | 47 +++++++++++++++++++++++
 compat/zlib-uncompress2.c  | 96 ----------------------------------------------
 config.c                   |  1 +
 csum-file.c                |  3 +-
 environment.c              |  1 +
 git-compat-util.h          | 12 ------
 git-zlib.c                 |  6 +--
 git-zlib.h                 |  2 +
 meson.build                | 24 +++++++++---
 meson_options.txt          |  4 ++
 reftable/block.c           |  1 -
 reftable/system.h          |  1 +
 20 files changed, 100 insertions(+), 140 deletions(-)

Range-diff versus v1:

 1:  5f650c2a6b =  1:  0d442c86cf compat: drop `uncompress2()` compatibility shim
 2:  a94c26ad03 =  2:  9fb474c07a git-compat-util: drop `z_const` define
 3:  4431647ede =  3:  d732ab51ca compat: introduce new "zlib.h" header
 4:  bbca17dfb5 =  4:  b261f9ebcd git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
 5:  d5315531d8 =  5:  8fa63dc02c compat/zlib: provide `deflateBound()` shim centrally
 6:  27691c395f =  6:  851c90ea6a compat/zlib: provide stubs for `deflateSetHeader()`
 7:  5c47ab5f9b =  7:  54d4a95753 git-zlib: cast away potential constness of `next_in` pointer
 8:  9826f57665 !  8:  e260c57b2e compat/zlib: allow use of zlib-ng as backend
    @@ Commit message
     
         Signed-off-by: Patrick Steinhardt <ps@pks.im>
     
    + ## Makefile ##
    +@@ Makefile: include shared.mak
    + # byte-order mark (BOM) when writing UTF-16 or UTF-32 and always writes in
    + # big-endian format.
    + #
    +-# Define NO_DEFLATE_BOUND if your zlib does not have deflateBound.
    ++# Define NO_DEFLATE_BOUND if your zlib does not have deflateBound. Define
    ++# ZLIB_NG if you want to use zlib-ng instead of zlib.
    + #
    + # Define NO_NORETURN if using buggy versions of gcc 4.6+ and profile feedback,
    + # as the compiler can crash (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49299)
    +@@ Makefile: else
    + endif
    + IMAP_SEND_LDFLAGS += $(OPENSSL_LINK) $(OPENSSL_LIBSSL) $(LIB_4_CRYPTO)
    + 
    +-ifdef ZLIB_PATH
    +-	BASIC_CFLAGS += -I$(ZLIB_PATH)/include
    +-	EXTLIBS += $(call libpath_template,$(ZLIB_PATH)/$(lib))
    ++ifdef ZLIB_NG
    ++	BASIC_CFLAGS += -DHAVE_ZLIB_NG
    ++	ifdef ZLIB_NG_PATH
    ++		BASIC_CFLAGS += -I$(ZLIB_NG_PATH)/include
    ++		EXTLIBS += $(call libpath_template,$(ZLIB_NG_PATH)/$(lib))
    ++	endif
    ++	EXTLIBS += -lz-ng
    ++else
    ++	ifdef ZLIB_PATH
    ++		BASIC_CFLAGS += -I$(ZLIB_PATH)/include
    ++		EXTLIBS += $(call libpath_template,$(ZLIB_PATH)/$(lib))
    ++	endif
    ++	EXTLIBS += -lz
    + endif
    +-EXTLIBS += -lz
    + 
    + ifndef NO_OPENSSL
    + 	OPENSSL_LIBSSL = -lssl
    +
      ## compat/zlib-compat.h ##
     @@
      #ifndef COMPAT_ZLIB_H
 -:  ---------- >  9:  7ae8f413d4 ci: switch linux-musl to use Meson
 -:  ---------- > 10:  2dd1b49e4f ci: make "linux-musl" job use zlib-ng

---
base-commit: b2da7775f8b064ef54920eb0f2e60c7f6df8f995
change-id: 20250110-b4-pks-compat-drop-uncompress2-eb5914459c32

Comments

Junio C Hamano Jan. 14, 2025, 7:34 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> Changes in v2:
>   - Wire up zlib-ng in our Makefile.
>   - Exercise zlib-ng via CI by adapting our "linux-musl" job to use
>     Meson and installing zlib-ng.
>   - Link to v1: https://lore.kernel.org/r/20250110-b4-pks-compat-drop-uncompress2-v1-0-965d0022a74d@pks.im
>
> The series is built on top of fbe8d3079d (Git 2.48, 2025-01-10) with
> ps/meson-weak-sha1-build at 6a0ee54f9a (meson: provide a summary of
> configured backends, 2024-12-30) merged into it.

I think you are now also textually depending on the fuzzer thing due
to touching meson_options.txt and ci/run-build-and-tests.sh with a
later step.

>  -:  ---------- >  9:  7ae8f413d4 ci: switch linux-musl to use Meson
>  -:  ---------- > 10:  2dd1b49e4f ci: make "linux-musl" job use zlib-ng

I will see what other things I can find.

Thanks.
Junio C Hamano Jan. 14, 2025, 9:09 p.m. UTC | #2
Junio C Hamano <gitster@pobox.com> writes:

> Patrick Steinhardt <ps@pks.im> writes:
>
>> Changes in v2:
>>   - Wire up zlib-ng in our Makefile.
>>   - Exercise zlib-ng via CI by adapting our "linux-musl" job to use
>>     Meson and installing zlib-ng.
>>   - Link to v1: https://lore.kernel.org/r/20250110-b4-pks-compat-drop-uncompress2-v1-0-965d0022a74d@pks.im
>>
>> The series is built on top of fbe8d3079d (Git 2.48, 2025-01-10) with
>> ps/meson-weak-sha1-build at 6a0ee54f9a (meson: provide a summary of
>> configured backends, 2024-12-30) merged into it.
>
> I think you are now also textually depending on the fuzzer thing due
> to touching meson_options.txt and ci/run-build-and-tests.sh with a
> later step.
>
>>  -:  ---------- >  9:  7ae8f413d4 ci: switch linux-musl to use Meson
>>  -:  ---------- > 10:  2dd1b49e4f ci: make "linux-musl" job use zlib-ng
>
> I will see what other things I can find.

Yup.  The patch series for some reason still does not seem to apply
cleanly ([09/10] ci/run-build-and-tests.sh somehow seems to be
troublesome), but it was easy to wiggle it in when the base was
prepared with these two topics merged on top of 'master':

    4610af08e7 ci: make "linux-musl" job use zlib-ng
    b2ddd0b33e ci: switch linux-musl to use Meson
    5118183ef4 compat/zlib: allow use of zlib-ng as backend
    08bf6b2062 git-zlib: cast away potential constness of `next_in` pointer
    ebf98412e3 compat/zlib: provide stubs for `deflateSetHeader()`
    29829e5714 compat/zlib: provide `deflateBound()` shim centrally
    8f19b26bbe git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
    8aab230253 compat: introduce new "zlib.h" header
    1ce001beaa git-compat-util: drop `z_const` define
    b9d4bd5467 compat: drop `uncompress2()` compatibility shim
    db620fad21 Merge branch 'ps/build-meson-fixes' into ps/zlib-ng
    64156589d9 Merge branch 'ps/meson-weak-sha1-build' into ps/zlib-ng

I think the reason is because the other topic that touches the fuzz
thing we see in the context of [09/10] is not ps/build-meson-fixes
but something else that is before "--fatal-meson-warnings" was
added.

One request.  You seem to have started using --full-index when
generating the patches.  It is extremely annoying when a patch needs
to be mucked with an editor to inspect why it does not apply and to
tweak it to make it apply.  40-hex does not help at all if the base
commit is not conveyed correctly, as the recipient will not have the
necessary blob objects _anyway_.  And 40-hex is unnecessarily long
in order to protect the recipient who uses "--3way" from using a
wrong blob in a fake ancestor tree.  Please stop.

Thanks.
Patrick Steinhardt Jan. 15, 2025, 5:45 a.m. UTC | #3
On Tue, Jan 14, 2025 at 01:09:43PM -0800, Junio C Hamano wrote:
> One request.  You seem to have started using --full-index when
> generating the patches.  It is extremely annoying when a patch needs
> to be mucked with an editor to inspect why it does not apply and to
> tweak it to make it apply.  40-hex does not help at all if the base
> commit is not conveyed correctly, as the recipient will not have the
> necessary blob objects _anyway_.  And 40-hex is unnecessarily long
> in order to protect the recipient who uses "--3way" from using a
> wrong blob in a fake ancestor tree.  Please stop.

I have in fact started using b4, as it makes most of the tedious
housekeeping around patch series go away, and it indeed uses
`--full-index` to generate patches. There isn't any way to change that,
but I'll send a patch upstream that gives us an option to do so.

My last patches haven't gotten any feedback though, so let's see how it
goes.

Patrick
Patrick Steinhardt Jan. 15, 2025, 5:46 a.m. UTC | #4
On Tue, Jan 14, 2025 at 01:09:43PM -0800, Junio C Hamano wrote:
> Junio C Hamano <gitster@pobox.com> writes:
> 
> > Patrick Steinhardt <ps@pks.im> writes:
> >
> >> Changes in v2:
> >>   - Wire up zlib-ng in our Makefile.
> >>   - Exercise zlib-ng via CI by adapting our "linux-musl" job to use
> >>     Meson and installing zlib-ng.
> >>   - Link to v1: https://lore.kernel.org/r/20250110-b4-pks-compat-drop-uncompress2-v1-0-965d0022a74d@pks.im
> >>
> >> The series is built on top of fbe8d3079d (Git 2.48, 2025-01-10) with
> >> ps/meson-weak-sha1-build at 6a0ee54f9a (meson: provide a summary of
> >> configured backends, 2024-12-30) merged into it.
> >
> > I think you are now also textually depending on the fuzzer thing due
> > to touching meson_options.txt and ci/run-build-and-tests.sh with a
> > later step.
> >
> >>  -:  ---------- >  9:  7ae8f413d4 ci: switch linux-musl to use Meson
> >>  -:  ---------- > 10:  2dd1b49e4f ci: make "linux-musl" job use zlib-ng
> >
> > I will see what other things I can find.
> 
> Yup.  The patch series for some reason still does not seem to apply
> cleanly ([09/10] ci/run-build-and-tests.sh somehow seems to be
> troublesome), but it was easy to wiggle it in when the base was
> prepared with these two topics merged on top of 'master':
> 
>     4610af08e7 ci: make "linux-musl" job use zlib-ng
>     b2ddd0b33e ci: switch linux-musl to use Meson
>     5118183ef4 compat/zlib: allow use of zlib-ng as backend
>     08bf6b2062 git-zlib: cast away potential constness of `next_in` pointer
>     ebf98412e3 compat/zlib: provide stubs for `deflateSetHeader()`
>     29829e5714 compat/zlib: provide `deflateBound()` shim centrally
>     8f19b26bbe git-compat-util: move include of "compat/zlib.h" into "git-zlib.h"
>     8aab230253 compat: introduce new "zlib.h" header
>     1ce001beaa git-compat-util: drop `z_const` define
>     b9d4bd5467 compat: drop `uncompress2()` compatibility shim
>     db620fad21 Merge branch 'ps/build-meson-fixes' into ps/zlib-ng
>     64156589d9 Merge branch 'ps/meson-weak-sha1-build' into ps/zlib-ng
> 
> I think the reason is because the other topic that touches the fuzz
> thing we see in the context of [09/10] is not ps/build-meson-fixes
> but something else that is before "--fatal-meson-warnings" was
> added.

Okay, I'll adapt the base accordingly for the next iterations. Thanks!

Patrick
Konstantin Ryabitsev Jan. 15, 2025, 3:50 p.m. UTC | #5
On Wed, Jan 15, 2025 at 06:45:31AM +0100, Patrick Steinhardt wrote:
> I have in fact started using b4, as it makes most of the tedious
> housekeeping around patch series go away, and it indeed uses
> `--full-index` to generate patches. There isn't any way to change that,
> but I'll send a patch upstream that gives us an option to do so.

This was done as part of this change:
https://git.kernel.org/pub/scm/utils/b4/b4.git/commit/?id=23a9ddba10a057bfa9c438c0b50ac36d278ae022

I'm not sure why --full-index was added there -- I don't think it's needed for
--binary? Please feel free to send a fix for that.

> My last patches haven't gotten any feedback though, so let's see how it
> goes.

I had to focus on infrastructure needs over the past few months, but I'm
starting on my b4 backlog soon.

-K
Junio C Hamano Jan. 15, 2025, 4:20 p.m. UTC | #6
Konstantin Ryabitsev <konstantin@linuxfoundation.org> writes:

> I'm not sure why --full-index was added there -- I don't think it's needed for
> --binary?

Correct.  For the purpose of that quoted commit, giving "--binary"
should be sufficient.  It would avoid "Binary files differ" that is
useless in the context of format-patch, and when a binary patch is
given, it will automatically get the full length blob object name,
without --full-index.

Thanks.
Patrick Steinhardt Jan. 15, 2025, 4:25 p.m. UTC | #7
On Wed, Jan 15, 2025 at 10:50:36AM -0500, Konstantin Ryabitsev wrote:
> On Wed, Jan 15, 2025 at 06:45:31AM +0100, Patrick Steinhardt wrote:
> > I have in fact started using b4, as it makes most of the tedious
> > housekeeping around patch series go away, and it indeed uses
> > `--full-index` to generate patches. There isn't any way to change that,
> > but I'll send a patch upstream that gives us an option to do so.
> 
> This was done as part of this change:
> https://git.kernel.org/pub/scm/utils/b4/b4.git/commit/?id=23a9ddba10a057bfa9c438c0b50ac36d278ae022
> 
> I'm not sure why --full-index was added there -- I don't think it's needed for
> --binary? Please feel free to send a fix for that.

No, it shouldn't be needed. `--binary` implies `--full-index` for that
particular binary diff anyway. I'll send a patch.

> > My last patches haven't gotten any feedback though, so let's see how it
> > goes.
> 
> I had to focus on infrastructure needs over the past few months, but I'm
> starting on my b4 backlog soon.

Fair enough, thanks!

Patrick
Karthik Nayak Jan. 16, 2025, 8:35 a.m. UTC | #8
Patrick Steinhardt <ps@pks.im> writes:

> Hi,
>
> I have recently started to play around with zlib-ng a bit, which is a
> hard fork of the zlib library. It describes itself as zlib replacement
> with optimizations for "next generation" systems. As such, it contains
> several implementations of central algorithms using for example SSE2,
> AVX2 and other vectorized CPU intrinsics that supposedly speed up in-
> and deflating data.
>
> And indeed, compiling Git against zlib-ng leads to a significant speedup
> when reading objects. The following benchmark uses git-cat-file(1) with
> `--batch --batch-all-objects` in the Git repository:
>
>     Benchmark 1: zlib
>       Time (mean ± σ):     52.085 s ±  0.141 s    [User: 51.500 s, System: 0.456 s]
>       Range (min … max):   52.004 s … 52.335 s    5 runs
>
>     Benchmark 2: zlib-ng
>       Time (mean ± σ):     40.324 s ±  0.134 s    [User: 39.731 s, System: 0.490 s]
>       Range (min … max):   40.135 s … 40.484 s    5 runs
>
>     Summary
>       zlib-ng ran
>         1.29 ± 0.01 times faster than zlib
>
> So we're looking at a ~25% speedup compared to zlib. This is of course
> an extreme example, as it makes us read through all objects in the
> repository. But regardless, it should be possible to see some sort of
> speedup in most commands that end up accessing the object database.
>
> This patch series refactors how we wire up zlib in our project by
> introducing a new "compat/zlib.h" header function. This header is then
> later extended to patch over the differences between zlib and zlib-ng,
> which is mostly just that zlib-ng has a `zng_` prefix for each of its
> symbols. Like this, we can support both libraries directly, and a new
> Meson build options allows users to pick whichever backend they like.
>
> In theory, these changes shouldn't be necessary because zlib-ng provides
> a compatibility layer that make it directly compatible with zlib. But
> most distros don't allow you to install zlib-ng with that layer is it
> would mean that zlib would need to be replaced globally. Instead, they
> typically only provide a version of zlib-ng that only has the `zng_`
> prefixed symbols.
>
> Given the observed speedup I do think that this is a worthwhile change
> so that users (or especially hosting providers) can easily switch to
> zlib-ng without impacting the rest of their system.
>
> Changes in v2:
>   - Wire up zlib-ng in our Makefile.
>   - Exercise zlib-ng via CI by adapting our "linux-musl" job to use
>     Meson and installing zlib-ng.
>   - Link to v1: https://lore.kernel.org/r/20250110-b4-pks-compat-drop-uncompress2-v1-0-965d0022a74d@pks.im
>

Apart from a few typos, Patrick has already answered two of questions
and the series already looks good to me!

Thanks

[snip]
Konstantin Ryabitsev Jan. 16, 2025, 8:51 p.m. UTC | #9
On Wed, Jan 15, 2025 at 05:25:27PM +0100, Patrick Steinhardt wrote:
> > I'm not sure why --full-index was added there -- I don't think it's needed for
> > --binary? Please feel free to send a fix for that.
> 
> No, it shouldn't be needed. `--binary` implies `--full-index` for that
> particular binary diff anyway. I'll send a patch.

I've applied it to master and stable-0.14.y, thanks! I should have version
0.14.3 out in the near future with this fix.

-K