Message ID | CA+B51BEpSh1wT627Efpysw3evVocpiDCoQ3Xaza6jKE3B62yig@mail.gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Test breakage with zlib-ng | expand |
Am 12.12.23 um 15:16 schrieb Ondrej Pohorelsky: > Hi everyone, > > As some might have heard, there is a proposal for Fedora 40 to > transition from zlib to zlib-ng[0]. Because of this, there has been a > rebuild of all packages to ensure every package works under zlib-ng. > > Git test suit has three breakages in t6300-for-each-ref.sh. > To be precise, it is: > > not ok 35 - basic atom: refs/heads/main objectsize:disk > not ok 107 - basic atom: refs/tags/testtag objectsize:disk > not ok 108 - basic atom: refs/tags/testtag *objectsize:disk > > > All of these tests are atomic, and they compare the result against > $disklen. Why do these three objects (HEAD commit of main, testtag and testtag target) have the same size? Half of the answer is that testtag points to the HEAD of main. But the other half is pure coincidence as far as I can see. > I can easily patch these tests in Fedora to be compatible with zlib-ng > only by not comparing to $disklen, but a concrete value, however I > would like to have a universal solution that works with both zlib and > zlib-ng. So if anyone has an idea on how to do it, please let me know. The test stores the expected values at the top, in the following lines, for the two possible repository formats: test_expect_success setup ' test_oid_cache <<-EOF && disklen sha1:138 disklen sha256:154 EOF So it's using hard-coded values already, which breaks when the compression rate changes. We could set core.compression to 0 to take compression out of the picture. Or we could get the sizes of the objects by checking their files, which would not require hard-coding anymore. Patch below. --- >8 --- Subject: [PATCH] t6300: avoid hard-coding object sizes f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24) hard-coded the expected object sizes. Coincidentally the size of commit and tag is the same with zlib at the default compression level. 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22) encoded the sizes as a single value, which coincidentally also works with sha256. Different compression libraries like zlib-ng may arrive at different values. Get them from the file system instead of hard-coding them to make switching the compression library (or changing the compression level) easier. Reported-by: Ondrej Pohorelsky <opohorel@redhat.com> Signed-off-by: René Scharfe <l.s.r@web.de> --- t/t6300-for-each-ref.sh | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index 54e2281259..843a7fe143 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -20,12 +20,13 @@ setdate_and_increment () { export GIT_COMMITTER_DATE GIT_AUTHOR_DATE } -test_expect_success setup ' - test_oid_cache <<-EOF && - disklen sha1:138 - disklen sha256:154 - EOF +test_object_file_size () { + oid=$(git rev-parse "$1") + path=".git/objects/$(test_oid_to_path $oid)" + test_file_size "$path" +} +test_expect_success setup ' # setup .mailmap cat >.mailmap <<-EOF && A Thor <athor@example.com> A U Thor <author@example.com> @@ -94,7 +95,6 @@ test_atom () { } hexlen=$(test_oid hexsz) -disklen=$(test_oid disklen) test_atom head refname refs/heads/main test_atom head refname: refs/heads/main @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main test_atom head push:strip=-1 main test_atom head objecttype commit test_atom head objectsize $((131 + hexlen)) -test_atom head objectsize:disk $disklen +test_atom head objectsize:disk $(test_object_file_size refs/heads/main) test_atom head deltabase $ZERO_OID test_atom head objectname $(git rev-parse refs/heads/main) test_atom head objectname:short $(git rev-parse --short refs/heads/main) @@ -203,8 +203,8 @@ test_atom tag upstream '' test_atom tag push '' test_atom tag objecttype tag test_atom tag objectsize $((114 + hexlen)) -test_atom tag objectsize:disk $disklen -test_atom tag '*objectsize:disk' $disklen +test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag) +test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main) test_atom tag deltabase $ZERO_OID test_atom tag '*deltabase' $ZERO_OID test_atom tag objectname $(git rev-parse refs/tags/testtag) -- 2.43.0
On Tue, Dec 12, 2023 at 06:04:55PM +0100, René Scharfe wrote: > Subject: [PATCH] t6300: avoid hard-coding object sizes > > f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24) > hard-coded the expected object sizes. Coincidentally the size of commit > and tag is the same with zlib at the default compression level. > > 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22) > encoded the sizes as a single value, which coincidentally also works > with sha256. > > Different compression libraries like zlib-ng may arrive at different > values. Get them from the file system instead of hard-coding them to > make switching the compression library (or changing the compression > level) easier. Yeah, this is definitely the right solution here. I'm surprised the hard-coded values didn't cause problems before now. ;) The patch looks good to me, but a few small comments: > +test_object_file_size () { > + oid=$(git rev-parse "$1") > + path=".git/objects/$(test_oid_to_path $oid)" > + test_file_size "$path" > +} Here we're assuming the objects are loose. I think that's probably OK (and certainly the test will notice if that changes). We're covering the formatting code paths along with the underlying implementation that fills in object_info->disk_sizep for loose objects. Which I think is plenty for this particular script, which is about for-each-ref. It would be nice to have coverage of the packed_object_info() code path, though. Back when it was added in a4ac106178 (cat-file: add %(objectsize:disk) format atom, 2013-07-10), I cowardly punted on this, writing: This patch does not include any tests, as the exact numbers returned are volatile and subject to zlib and packing decisions. We cannot even reliably guarantee that the on-disk size is smaller than the object content (though in general this should be the case for non-trivial objects). I don't think it's that big a deal, but I guess we could do something like: prev= git show-index <$pack_idx | sort -n | grep -A1 $oid | while read ofs oid csum do test -n "$prev" && echo "$((ofs - prev))" prev=$ofs done It feels a little redundant with what Git is doing under the hood, but at least is exercising the code (and we're using the idx directly, so we're confirming that the revindex is right). Anyway, that is all way beyond the scope of your patch, but I wonder if it's worth doing on top. > @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main > test_atom head push:strip=-1 main > test_atom head objecttype commit > test_atom head objectsize $((131 + hexlen)) > -test_atom head objectsize:disk $disklen > +test_atom head objectsize:disk $(test_object_file_size refs/heads/main) These test_object_file_size calls are happening outside of any test_expect_* block, so we'd miss failing exit codes (and also the helper is not &&-chained), and any stderr would leak to the output. That's probably OK in practice, though (if something goes wrong then the expected value output will be bogus and the test itself will fail). -Peff
On 2023-12-12 at 17:04:55, René Scharfe wrote: > --- >8 --- > Subject: [PATCH] t6300: avoid hard-coding object sizes > > f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24) > hard-coded the expected object sizes. Coincidentally the size of commit > and tag is the same with zlib at the default compression level. > > 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22) > encoded the sizes as a single value, which coincidentally also works > with sha256. > > Different compression libraries like zlib-ng may arrive at different > values. Get them from the file system instead of hard-coding them to > make switching the compression library (or changing the compression > level) easier. This definitely seems like the right thing to do. I was a bit lazy at the time and probably could have improved it then, but it's at least good that we're doing it now.
René Scharfe <l.s.r@web.de> writes: > Or we could get the sizes of the objects by checking their files, > which would not require hard-coding anymore. Patch below. That was my first reaction to seeing the original report. It is a bit surprising that the necessary fix is so small and makes me wonder why we initially did the hardcoded values, which feels more work to initially write the test vector. Looking good. Thanks. > --- >8 --- > Subject: [PATCH] t6300: avoid hard-coding object sizes > > f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24) > hard-coded the expected object sizes. Coincidentally the size of commit > and tag is the same with zlib at the default compression level. > > 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22) > encoded the sizes as a single value, which coincidentally also works > with sha256. > > Different compression libraries like zlib-ng may arrive at different > values. Get them from the file system instead of hard-coding them to > make switching the compression library (or changing the compression > level) easier. > > Reported-by: Ondrej Pohorelsky <opohorel@redhat.com> > Signed-off-by: René Scharfe <l.s.r@web.de> > --- > t/t6300-for-each-ref.sh | 18 +++++++++--------- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh > index 54e2281259..843a7fe143 100755 > --- a/t/t6300-for-each-ref.sh > +++ b/t/t6300-for-each-ref.sh > @@ -20,12 +20,13 @@ setdate_and_increment () { > export GIT_COMMITTER_DATE GIT_AUTHOR_DATE > } > > -test_expect_success setup ' > - test_oid_cache <<-EOF && > - disklen sha1:138 > - disklen sha256:154 > - EOF > +test_object_file_size () { > + oid=$(git rev-parse "$1") > + path=".git/objects/$(test_oid_to_path $oid)" > + test_file_size "$path" > +} > > +test_expect_success setup ' > # setup .mailmap > cat >.mailmap <<-EOF && > A Thor <athor@example.com> A U Thor <author@example.com> > @@ -94,7 +95,6 @@ test_atom () { > } > > hexlen=$(test_oid hexsz) > -disklen=$(test_oid disklen) > > test_atom head refname refs/heads/main > test_atom head refname: refs/heads/main > @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main > test_atom head push:strip=-1 main > test_atom head objecttype commit > test_atom head objectsize $((131 + hexlen)) > -test_atom head objectsize:disk $disklen > +test_atom head objectsize:disk $(test_object_file_size refs/heads/main) > test_atom head deltabase $ZERO_OID > test_atom head objectname $(git rev-parse refs/heads/main) > test_atom head objectname:short $(git rev-parse --short refs/heads/main) > @@ -203,8 +203,8 @@ test_atom tag upstream '' > test_atom tag push '' > test_atom tag objecttype tag > test_atom tag objectsize $((114 + hexlen)) > -test_atom tag objectsize:disk $disklen > -test_atom tag '*objectsize:disk' $disklen > +test_atom tag objectsize:disk $(test_object_file_size refs/tags/testtag) > +test_atom tag '*objectsize:disk' $(test_object_file_size refs/heads/main) > test_atom tag deltabase $ZERO_OID > test_atom tag '*deltabase' $ZERO_OID > test_atom tag objectname $(git rev-parse refs/tags/testtag) > -- > 2.43.0
Am 12.12.23 um 21:01 schrieb Jeff King: > On Tue, Dec 12, 2023 at 06:04:55PM +0100, René Scharfe wrote: > >> Subject: [PATCH] t6300: avoid hard-coding object sizes >> >> f4ee22b526 (ref-filter: add tests for objectsize:disk, 2018-12-24) >> hard-coded the expected object sizes. Coincidentally the size of commit >> and tag is the same with zlib at the default compression level. >> >> 1f5f8f3e85 (t6300: abstract away SHA-1-specific constants, 2020-02-22) >> encoded the sizes as a single value, which coincidentally also works >> with sha256. >> >> Different compression libraries like zlib-ng may arrive at different >> values. Get them from the file system instead of hard-coding them to >> make switching the compression library (or changing the compression >> level) easier. > > Yeah, this is definitely the right solution here. I'm surprised the > hard-coded values didn't cause problems before now. ;) > > The patch looks good to me, but a few small comments: > >> +test_object_file_size () { >> + oid=$(git rev-parse "$1") >> + path=".git/objects/$(test_oid_to_path $oid)" >> + test_file_size "$path" >> +} > > Here we're assuming the objects are loose. I think that's probably OK > (and certainly the test will notice if that changes). > > We're covering the formatting code paths along with the underlying > implementation that fills in object_info->disk_sizep for loose objects. > Which I think is plenty for this particular script, which is about > for-each-ref. > > It would be nice to have coverage of the packed_object_info() code path, > though. Back when it was added in a4ac106178 (cat-file: add > %(objectsize:disk) format atom, 2013-07-10), I cowardly punted on this, > writing: > > This patch does not include any tests, as the exact numbers > returned are volatile and subject to zlib and packing > decisions. We cannot even reliably guarantee that the > on-disk size is smaller than the object content (though in > general this should be the case for non-trivial objects). > > I don't think it's that big a deal, but I guess we could do something > like: > > prev= > git show-index <$pack_idx | > sort -n | > grep -A1 $oid | > while read ofs oid csum > do > test -n "$prev" && echo "$((ofs - prev))" > prev=$ofs > done > > It feels a little redundant with what Git is doing under the hood, but > at least is exercising the code (and we're using the idx directly, so > we're confirming that the revindex is right). A generic object size function based on both methods could live in the test lib and be used for e.g. cat-file tests as well. Getting such a function polished and library-worthy is probably more work than I naively imagine, however -- due to our shunning of pipes alone. > Anyway, that is all way beyond the scope of your patch, but I wonder if > it's worth doing on top. > >> @@ -129,7 +129,7 @@ test_atom head push:strip=1 remotes/myfork/main >> test_atom head push:strip=-1 main >> test_atom head objecttype commit >> test_atom head objectsize $((131 + hexlen)) >> -test_atom head objectsize:disk $disklen >> +test_atom head objectsize:disk $(test_object_file_size refs/heads/main) > > These test_object_file_size calls are happening outside of any > test_expect_* block, so we'd miss failing exit codes (and also the > helper is not &&-chained), and any stderr would leak to the output. > That's probably OK in practice, though (if something goes wrong then the > expected value output will be bogus and the test itself will fail). Right. We could also set variables during setup, though, to put readers' minds at rest. René
--- expected 2023-12-06 21:06:07.808849497 +0000 +++ actual.clean 2023-12-06 21:06:07.812849541 +0000 @@ -1 +1 @@ -138 +137