Test breakage with zlib-ng

Message ID	CA+B51BEpSh1wT627Efpysw3evVocpiDCoQ3Xaza6jKE3B62yig@mail.gmail.com (mailing list archive)
State	New, archived
Headers	show Authentication-Results: smtp.subspace.kernel.org; dkim=pass (1024-bit key) header.d=redhat.com header.i=@redhat.com header.b="HfZl28nT" Precedence: bulk MIME-Version: 1.0 From: Ondrej Pohorelsky <opohorel@redhat.com> Date: Tue, 12 Dec 2023 15:16:26 +0100 Message-ID: <CA+B51BEpSh1wT627Efpysw3evVocpiDCoQ3Xaza6jKE3B62yig@mail.gmail.com> Subject: Test breakage with zlib-ng To: git@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Series	Test breakage with zlib-ng \| expand Test breakage with zlib-ng [2/1] test-lib-functions: add object size functions

Ondrej Pohorelsky Dec. 12, 2023, 2:16 p.m. UTC

Hi everyone,

As some might have heard, there is a proposal for Fedora 40 to
transition from zlib to zlib-ng[0]. Because of this, there has been a
rebuild of all packages to ensure every package works under zlib-ng.

Git test suit has three breakages in t6300-for-each-ref.sh.
To be precise, it is:

not ok 35 - basic atom: refs/heads/main objectsize:disk
not ok 107 - basic atom: refs/tags/testtag objectsize:disk
not ok 108 - basic atom: refs/tags/testtag *objectsize:disk


All of these tests are atomic, and they compare the result against
$disklen. I discussed it with Tulio Magno Quites Machado Filho, who
ran the tests and is owner of the proposal.
It seems like the compression of zlib-ng is shaving/adding some bytes
to the actual output, which then fails the comparison.

Here is an example:

```
expecting success of 6300.35 'basic atom: refs/heads/main objectsize:disk':
git for-each-ref --format="%($format)" "$ref" >actual &&
sanitize_pgp <actual >actual.clean &&
test_cmp expected actual.clean


++ git for-each-ref '--format=%(objectsize:disk)' refs/heads/main
++ sanitize_pgp
++ perl -ne '
/^-----END PGP/ and $in_pgp = 0;
print unless $in_pgp;
/^-----BEGIN PGP/ and $in_pgp = 1;
'
++ command /usr/bin/perl -ne '
/^-----END PGP/ and $in_pgp = 0;
print unless $in_pgp;
/^-----BEGIN PGP/ and $in_pgp = 1;
'
++ test_cmp expected actual.clean
++ test 2 -ne 2
++ eval 'diff -u' '"$@"'
+++ diff -u expected actual.clean
error: last command exited with $?=1
not ok 35 - basic atom: refs/heads/main objectsize:disk
```

The whole build log can be found here[1].

I can easily patch these tests in Fedora to be compatible with zlib-ng
only by not comparing to $disklen, but a concrete value, however I
would like to have a universal solution that works with both zlib and
zlib-ng. So if anyone has an idea on how to do it, please let me know.
Thanks


[0]https://discussion.fedoraproject.org/t/f40-change-proposal-transitioning-to-zlib-ng-as-a-compatible-replacement-for-zlib-system-wide/95807
[1]https://download.copr.fedorainfracloud.org/results/tuliom/zlib-ng-compat-mpb/fedora-rawhide-x86_64/06729801-git/builder-live.log.gz

Cheers,
Ondřej Pohořelský

René Scharfe Dec. 12, 2023, 5:04 p.m. UTC | #1

Am 12.12.23 um 15:16 schrieb Ondrej Pohorelsky:
> Hi everyone,
>
> As some might have heard, there is a proposal for Fedora 40 to
> transition from zlib to zlib-ng[0]. Because of this, there has been a
> rebuild of all packages to ensure every package works under zlib-ng.
>
> Git test suit has three breakages in t6300-for-each-ref.sh.
> To be precise, it is:
>
> not ok 35 - basic atom: refs/heads/main objectsize:disk
> not ok 107 - basic atom: refs/tags/testtag objectsize:disk
> not ok 108 - basic atom: refs/tags/testtag *objectsize:disk
>
>
> All of these tests are atomic, and they compare the result against
> $disklen.

Why do these three objects (HEAD commit of main, testtag and testtag
target) have the same size?  Half of the answer is that testtag points
to the HEAD of main.  But the other half is pure coincidence as far as I
can see.

> I can easily patch these tests in Fedora to be compatible with zlib-ng
> only by not comparing to $disklen, but a concrete value, however I
> would like to have a universal solution that works with both zlib and
> zlib-ng. So if anyone has an idea on how to do it, please let me know.

The test stores the expected values at the top, in the following lines,
for the two possible repository formats:

	test_expect_success setup '
		test_oid_cache <<-EOF &&
		disklen sha1:138
		disklen sha256:154
		EOF

So it's using hard-coded values already, which breaks when the
compression rate changes.

We could set core.compression to 0 to take compression out of the
picture.

Or we could get the sizes of the objects by checking their files,
which would not require  hard-coding anymore.  Patch below.

--- >8 ---
Subject: [PATCH] t6300: avoid hard-coding object sizes

f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
hard-coded the expected object sizes.  Coincidentally the size of commit
and tag is the same with zlib at the default compression level.

1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
encoded the sizes as a single value, which coincidentally also works
with sha256.

Different compression libraries like zlib-ng may arrive at different
values.  Get them from the file system instead of hard-coding them to
make switching the compression library (or changing the compression
level) easier.

Reported-by: Ondrej Pohorelsky <opohorel@redhat.com>
Signed-off-by: René Scharfe <l.s.r@web.de>
---
 t/t6300-for-each-ref.sh | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
index 54e2281259..843a7fe143 100755
--- a/t/t6300-for-each-ref.sh
+++ b/t/t6300-for-each-ref.sh
@@ -20,12 +20,13 @@ setdate_and_increment () {
     export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
 }

-test_expect_success setup '
-	test_oid_cache <<-EOF &&
-	disklen sha1:138
-	disklen sha256:154
-	EOF
+test_object_file_size () {
+	oid=$(git rev-parse "$1")
+	path=".git/objects/$(test_oid_to_path $oid)"
+	test_file_size "$path"
+}

+test_expect_success setup '
 	# setup .mailmap
 	cat >.mailmap <<-EOF &&
 	A Thor <athor@example.com> A U Thor <author@example.com>
@@ -94,7 +95,6 @@ test_atom () {
 }

 hexlen=$(test_oid hexsz)
-disklen=$(test_oid disklen)

 test_atom head refname refs/heads/main
 test_atom head refname: refs/heads/main
@@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
 test_atom head push:strip=-1 main
 test_atom head objecttype commit
 test_atom head objectsize $((131 + hexlen))
-test_atom head objectsize:disk $disklen
+test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
 test_atom head deltabase $ZERO_OID
 test_atom head objectname $(git rev-parse refs/heads/main)
 test_atom head objectname:short $(git rev-parse --short refs/heads/main)
@@ -203,8 +203,8 @@ test_atom tag upstream ''
 test_atom tag push ''
 test_atom tag objecttype tag
 test_atom tag objectsize $((114 + hexlen))
-test_atom tag objectsize:disk $disklen
-test_atom tag '*objectsize:disk' $disklen
+test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag)
+test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main)
 test_atom tag deltabase $ZERO_OID
 test_atom tag '*deltabase' $ZERO_OID
 test_atom tag objectname $(git rev-parse refs/tags/testtag)
--
2.43.0

Jeff King Dec. 12, 2023, 8:01 p.m. UTC | #2

On Tue, Dec 12, 2023 at 06:04:55PM +0100, René Scharfe wrote:

> Subject: [PATCH] t6300: avoid hard-coding object sizes
> 
> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
> hard-coded the expected object sizes.  Coincidentally the size of commit
> and tag is the same with zlib at the default compression level.
> 
> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
> encoded the sizes as a single value, which coincidentally also works
> with sha256.
> 
> Different compression libraries like zlib-ng may arrive at different
> values.  Get them from the file system instead of hard-coding them to
> make switching the compression library (or changing the compression
> level) easier.

Yeah, this is definitely the right solution here. I'm surprised the
hard-coded values didn't cause problems before now. ;)

The patch looks good to me, but a few small comments:

> +test_object_file_size () {
> +	oid=$(git rev-parse "$1")
> +	path=".git/objects/$(test_oid_to_path $oid)"
> +	test_file_size "$path"
> +}

Here we're assuming the objects are loose. I think that's probably OK
(and certainly the test will notice if that changes).

We're covering the formatting code paths along with the underlying
implementation that fills in object_info->disk_sizep for loose objects.
Which I think is plenty for this particular script, which is about
for-each-ref.

It would be nice to have coverage of the packed_object_info() code path,
though. Back when it was added in a4ac106178 (cat-file: add
%(objectsize:disk) format atom, 2013-07-10), I cowardly punted on this,
writing:

  This patch does not include any tests, as the exact numbers
  returned are volatile and subject to zlib and packing
  decisions. We cannot even reliably guarantee that the
  on-disk size is smaller than the object content (though in
  general this should be the case for non-trivial objects).

I don't think it's that big a deal, but I guess we could do something
like:

  prev=
  git show-index <$pack_idx |
  sort -n |
  grep -A1 $oid |
  while read ofs oid csum
  do
    test -n "$prev" && echo "$((ofs - prev))"
    prev=$ofs
  done

It feels a little redundant with what Git is doing under the hood, but
at least is exercising the code (and we're using the idx directly, so
we're confirming that the revindex is right).

Anyway, that is all way beyond the scope of your patch, but I wonder if
it's worth doing on top.

> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>  test_atom head push:strip=-1 main
>  test_atom head objecttype commit
>  test_atom head objectsize $((131 + hexlen))
> -test_atom head objectsize:disk $disklen
> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)

These test_object_file_size calls are happening outside of any
test_expect_* block, so we'd miss failing exit codes (and also the
helper is not &&-chained), and any stderr would leak to the output.
That's probably OK in practice, though (if something goes wrong then the
expected value output will be bogus and the test itself will fail).

-Peff

brian m. carlson Dec. 12, 2023, 10:18 p.m. UTC | #3

On 2023-12-12 at 17:04:55, René Scharfe wrote:
> --- >8 ---
> Subject: [PATCH] t6300: avoid hard-coding object sizes
> 
> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
> hard-coded the expected object sizes.  Coincidentally the size of commit
> and tag is the same with zlib at the default compression level.
> 
> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
> encoded the sizes as a single value, which coincidentally also works
> with sha256.
> 
> Different compression libraries like zlib-ng may arrive at different
> values.  Get them from the file system instead of hard-coding them to
> make switching the compression library (or changing the compression
> level) easier.

This definitely seems like the right thing to do.  I was a bit lazy at
the time and probably could have improved it then, but it's at least
good that we're doing it now.

Junio C Hamano Dec. 12, 2023, 10:30 p.m. UTC | #4

René Scharfe <l.s.r@web.de> writes:

> Or we could get the sizes of the objects by checking their files,
> which would not require  hard-coding anymore.  Patch below.

That was my first reaction to seeing the original report.  It is a
bit surprising that the necessary fix is so small and makes me wonder
why we initially did the hardcoded values, which feels more work to
initially write the test vector.

Looking good.  Thanks.

> --- >8 ---
> Subject: [PATCH] t6300: avoid hard-coding object sizes
>
> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
> hard-coded the expected object sizes.  Coincidentally the size of commit
> and tag is the same with zlib at the default compression level.
>
> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
> encoded the sizes as a single value, which coincidentally also works
> with sha256.
>
> Different compression libraries like zlib-ng may arrive at different
> values.  Get them from the file system instead of hard-coding them to
> make switching the compression library (or changing the compression
> level) easier.
>
> Reported-by: Ondrej Pohorelsky <opohorel@redhat.com>
> Signed-off-by: René Scharfe <l.s.r@web.de>
> ---
>  t/t6300-for-each-ref.sh | 18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh
> index 54e2281259..843a7fe143 100755
> --- a/t/t6300-for-each-ref.sh
> +++ b/t/t6300-for-each-ref.sh
> @@ -20,12 +20,13 @@ setdate_and_increment () {
>      export GIT_COMMITTER_DATE GIT_AUTHOR_DATE
>  }
>
> -test_expect_success setup '
> -	test_oid_cache <<-EOF &&
> -	disklen sha1:138
> -	disklen sha256:154
> -	EOF
> +test_object_file_size () {
> +	oid=$(git rev-parse "$1")
> +	path=".git/objects/$(test_oid_to_path $oid)"
> +	test_file_size "$path"
> +}
>
> +test_expect_success setup '
>  	# setup .mailmap
>  	cat >.mailmap <<-EOF &&
>  	A Thor <athor@example.com> A U Thor <author@example.com>
> @@ -94,7 +95,6 @@ test_atom () {
>  }
>
>  hexlen=$(test_oid hexsz)
> -disklen=$(test_oid disklen)
>
>  test_atom head refname refs/heads/main
>  test_atom head refname: refs/heads/main
> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>  test_atom head push:strip=-1 main
>  test_atom head objecttype commit
>  test_atom head objectsize $((131 + hexlen))
> -test_atom head objectsize:disk $disklen
> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
>  test_atom head deltabase $ZERO_OID
>  test_atom head objectname $(git rev-parse refs/heads/main)
>  test_atom head objectname:short $(git rev-parse --short refs/heads/main)
> @@ -203,8 +203,8 @@ test_atom tag upstream ''
>  test_atom tag push ''
>  test_atom tag objecttype tag
>  test_atom tag objectsize $((114 + hexlen))
> -test_atom tag objectsize:disk $disklen
> -test_atom tag '*objectsize:disk' $disklen
> +test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag)
> +test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main)
>  test_atom tag deltabase $ZERO_OID
>  test_atom tag '*deltabase' $ZERO_OID
>  test_atom tag objectname $(git rev-parse refs/tags/testtag)
> --
> 2.43.0

René Scharfe Dec. 12, 2023, 10:54 p.m. UTC | #5

Am 12.12.23 um 21:01 schrieb Jeff King:
> On Tue, Dec 12, 2023 at 06:04:55PM +0100, René Scharfe wrote:
>
>> Subject: [PATCH] t6300: avoid hard-coding object sizes
>>
>> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24)
>> hard-coded the expected object sizes.  Coincidentally the size of commit
>> and tag is the same with zlib at the default compression level.
>>
>> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22)
>> encoded the sizes as a single value, which coincidentally also works
>> with sha256.
>>
>> Different compression libraries like zlib-ng may arrive at different
>> values.  Get them from the file system instead of hard-coding them to
>> make switching the compression library (or changing the compression
>> level) easier.
>
> Yeah, this is definitely the right solution here. I'm surprised the
> hard-coded values didn't cause problems before now. ;)
>
> The patch looks good to me, but a few small comments:
>
>> +test_object_file_size () {
>> +	oid=$(git rev-parse "$1")
>> +	path=".git/objects/$(test_oid_to_path $oid)"
>> +	test_file_size "$path"
>> +}
>
> Here we're assuming the objects are loose. I think that's probably OK
> (and certainly the test will notice if that changes).
>
> We're covering the formatting code paths along with the underlying
> implementation that fills in object_info->disk_sizep for loose objects.
> Which I think is plenty for this particular script, which is about
> for-each-ref.
>
> It would be nice to have coverage of the packed_object_info() code path,
> though. Back when it was added in a4ac106178 (cat-file: add
> %(objectsize:disk) format atom, 2013-07-10), I cowardly punted on this,
> writing:
>
>   This patch does not include any tests, as the exact numbers
>   returned are volatile and subject to zlib and packing
>   decisions. We cannot even reliably guarantee that the
>   on-disk size is smaller than the object content (though in
>   general this should be the case for non-trivial objects).
>
> I don't think it's that big a deal, but I guess we could do something
> like:
>
>   prev=
>   git show-index <$pack_idx |
>   sort -n |
>   grep -A1 $oid |
>   while read ofs oid csum
>   do
>     test -n "$prev" && echo "$((ofs - prev))"
>     prev=$ofs
>   done
>
> It feels a little redundant with what Git is doing under the hood, but
> at least is exercising the code (and we're using the idx directly, so
> we're confirming that the revindex is right).

A generic object size function based on both methods could live in the
test lib and be used for e.g. cat-file tests as well.  Getting such a
function polished and library-worthy is probably more work than I
naively imagine, however -- due to our shunning of pipes alone.

> Anyway, that is all way beyond the scope of your patch, but I wonder if
> it's worth doing on top.
>
>> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main
>>  test_atom head push:strip=-1 main
>>  test_atom head objecttype commit
>>  test_atom head objectsize $((131 + hexlen))
>> -test_atom head objectsize:disk $disklen
>> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main)
>
> These test_object_file_size calls are happening outside of any
> test_expect_* block, so we'd miss failing exit codes (and also the
> helper is not &&-chained), and any stderr would leak to the output.
> That's probably OK in practice, though (if something goes wrong then the
> expected value output will be bogus and the test itself will fail).

Right.  We could also set variables during setup, though, to put
readers' minds at rest.

René

Test breakage with zlib-ng

Commit Message

Comments

Patch