Message ID | 7a69cf84ae5b92d99e5777d4600270712424c4d7.1731518931.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | Superseded |
Series | pack-objects: more brown-paper-bag multi-pack reuse fixes |
Taylor Blau <me@ttaylorr.com> writes:

> +test_expect_failure 'duplicate objects with verbatim reuse' '
> +	git init duplicate-objects-verbatim &&
> +	(
> +		cd duplicate-objects-verbatim &&
> +
> +		git config pack.allowPackReuse multi &&
> +
> +		test_commit_bulk 64 &&
> +
> +		# take the first object from the main pack...
> +		git show-index <$(ls $packdir/pack-*.idx) >obj.raw &&
> +		sort -nk1 <obj.raw | head -n1 | cut -d" " -f2 >in &&
> +
> +		# ...and create a separate pack containing just that object
> +		p="$(git pack-objects $packdir/pack <in)" &&
> +		git show-index <$packdir/pack-$p.idx &&

Is this done so that "git show-index" fails when the .idx file fed
is malformed?  Or is it a leftover debugging aid, where a human
developer was helped by eyeballing the contents of the .idx file in
human readable form?  If the latter, do we perhaps want to "parse"
the output the same way in this test to validate our expectation?

> +		git multi-pack-index write --bitmap --preferred-pack=pack-$p.idx &&
> +
> +		test_pack_objects_reused_all 192 2
> +	)
> +'
> +
>  test_done
On Thu, Nov 14, 2024 at 10:12:15AM +0900, Junio C Hamano wrote:
> Taylor Blau <me@ttaylorr.com> writes:
>
> > +test_expect_failure 'duplicate objects with verbatim reuse' '
> > +	git init duplicate-objects-verbatim &&
> > +	(
> > +		cd duplicate-objects-verbatim &&
> > +
> > +		git config pack.allowPackReuse multi &&
> > +
> > +		test_commit_bulk 64 &&
> > +
> > +		# take the first object from the main pack...
> > +		git show-index <$(ls $packdir/pack-*.idx) >obj.raw &&
> > +		sort -nk1 <obj.raw | head -n1 | cut -d" " -f2 >in &&
> > +
> > +		# ...and create a separate pack containing just that object
> > +		p="$(git pack-objects $packdir/pack <in)" &&
> > +		git show-index <$packdir/pack-$p.idx &&
>
> Is this done so that "git show-index" fails when the .idx file fed
> is malformed?  Or is it a leftover debugging aid, where a human
> developer was helped by eyeballing the contents of the .idx file in
> human readable form?  If the latter, do we perhaps want to "parse"
> the output the same way in this test to validate our expectation?

Oops. This is stray debugging left over that I forgot to take out before
committing. That makes it the latter of the two you mentioned, but I
don't think we need to validate the output of 'git show-index' in this
instance.

We're just relying on pack-objects to generate a pack containing a
single object, which feels like basic functionality not worth explicitly
making an assertion on. There is a subtle assertion on the line below
here:

> > +		git multi-pack-index write --bitmap --preferred-pack=pack-$p.idx &&

that the pack (a) exists, and (b) has at least one object, since both
conditions must be met for a pack to become preferred in a MIDX bitmap.

But beyond that, I don't think we need to validate the output of
show-index here. I'll remove the stray debugging line and send a new
round. Sorry about that, and thanks for spotting!

Thanks,
Taylor
diff --git a/t/t5332-multi-pack-reuse.sh b/t/t5332-multi-pack-reuse.sh
index 955ea42769b..8f403d9fdaa 100755
--- a/t/t5332-multi-pack-reuse.sh
+++ b/t/t5332-multi-pack-reuse.sh
@@ -259,4 +259,27 @@ test_expect_success 'duplicate objects' '
 	)
 '
 
+test_expect_failure 'duplicate objects with verbatim reuse' '
+	git init duplicate-objects-verbatim &&
+	(
+		cd duplicate-objects-verbatim &&
+
+		git config pack.allowPackReuse multi &&
+
+		test_commit_bulk 64 &&
+
+		# take the first object from the main pack...
+		git show-index <$(ls $packdir/pack-*.idx) >obj.raw &&
+		sort -nk1 <obj.raw | head -n1 | cut -d" " -f2 >in &&
+
+		# ...and create a separate pack containing just that object
+		p="$(git pack-objects $packdir/pack <in)" &&
+		git show-index <$packdir/pack-$p.idx &&
+
+		git multi-pack-index write --bitmap --preferred-pack=pack-$p.idx &&
+
+		test_pack_objects_reused_all 192 2
+	)
+'
+
 test_done
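The step commented as "take the first object from the main pack" depends on the line format of "git show-index", which prints the pack offset in the first column and the object ID in the second. A minimal sketch of that selection, run on made-up input rather than a real index (the offsets, object IDs, and CRC values below are invented for illustration):

```shell
# Made-up stand-in for `git show-index` output: "<offset> <oid> (<crc32>)".
sample_index() {
	printf '%s\n' \
		"351 bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb (00000002)" \
		"12 aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa (00000001)" \
		"1020 cccccccccccccccccccccccccccccccccccccccc (00000003)"
}

# Sort numerically by offset, keep the lowest entry, print its object ID.
first=$(sample_index | sort -nk1 | head -n1 | cut -d" " -f2)
echo "$first"
```

The object with the lowest offset is the one stored first in the pack, which is why the test can use it as "any object other than the last one".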
In the multi-pack reuse code, there are two paths for reusing the
on-disk representation of an object, handled by:

  - builtin/pack-objects.c::write_reused_pack_one()
  - builtin/pack-objects.c::write_reused_pack_verbatim()

The former is responsible for copying the bytes for a single object out
of an existing source pack. The latter does the same but for a region of
objects aligned at eword_t boundaries.

Demonstrate a bug whereby write_reused_pack_verbatim() can be tricked
into writing out objects from some source pack, even when those objects
were selected from a different source pack in the MIDX bitmap.

When the caller wants at least one of the objects in that region,
pack-objects will write the same object twice as a result of this bug.
In the other case where the caller doesn't want any of the objects in
the region of interest, we will write out objects that weren't
requested.

Demonstrate this bug by creating two packs, where the preferred one of
those packs contains a single object which also appears in the main
(non-preferred) pack.

A separate bug[^1] prevents us from triggering the main bug when the
duplicated object is the last one in the main pack, but any earlier
object will suffice. We could fix that separate bug, but the following
commit will simplify write_reused_pack_verbatim() and only call it on
the preferred pack, so doing so would have little point.

[^1]: Because write_reused_pack_verbatim() only reuses bits in the range

    off_t pack_start_off = pack_pos_to_offset(reuse_packfile->p, 0);
    off_t pack_end_off = pack_pos_to_offset(reuse_packfile->p,
                                            pos - reuse_packfile->bitmap_pos);

    written += pos - reuse_packfile->bitmap_pos;

    /* We're recording one chunk, not one object. */
    record_reused_object(pack_start_off,
                         pack_start_off - (hashfile_total(out) - pack_start));

  , or in other words excluding the object beginning at position
  'pos - reuse_packfile->bitmap_pos' in the source pack. But since
  reuse_packfile->bitmap_pos is '1' in the non-preferred pack
  (accounting for the single-object pack which is preferred), we don't
  actually copy the bytes from the last object.

Helped-by: Jeff King <peff@peff.net>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
---
 t/t5332-multi-pack-reuse.sh | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)
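The off-by-one described in the footnote can be checked with plain arithmetic. The sketch below is only a model of the half-open range, not the pack-objects code: the variable names mirror the footnote, and the values (a 192-object main pack and a stopping position of 192) are assumptions chosen for illustration.

```shell
# Hypothetical numbers, for illustration only.
pack_objects=192   # objects in the main (non-preferred) pack
bitmap_pos=1       # bit 0 belongs to the single-object preferred pack
pos=192            # assumed bit position at which verbatim reuse stops

# Bytes are copied up to, but not including, the object at pack position
# 'pos - bitmap_pos', so the copied pack positions are [0, end).
end=$((pos - bitmap_pos))
last_copied=$((end - 1))
last_in_pack=$((pack_objects - 1))

echo "copied pack positions: 0..$last_copied"
echo "last pack position:    $last_in_pack"
```

Under these assumptions the last object in the main pack falls outside the copied range, which is consistent with the test duplicating the first object instead.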