diff mbox series

[v2,1/2] t7700: check post-condition in kept-pack test

Message ID f2f8d12929bcbd630b2de3ce770a6763989ffcff.1648146897.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series test-lib-functions: fix test_subcommand_inexact | expand

Commit Message

Derrick Stolee March 24, 2022, 6:34 p.m. UTC
From: Derrick Stolee <derrickstolee@github.com>

The '--write-midx -b packs non-kept objects' test in t7700-repack.sh
uses test_subcommand_inexact to check that 'git repack' properly adds
the '--honor-pack-keep' flag to the 'git pack-objects' subcommand.
However, the test_subcommand_inexact helper is more flexible than
initially designed, and this instance is the only one that makes use of
it: there are additional arguments between 'git pack-objects' and the
'--honor-pack-keep' flag. In order to make test_subcommand_inexact more
strict, we need to fix this instance.

This test checks that 'git repack --write-midx -a -b -d' will create a
new pack-file that does not contain the objects within the kept pack.
This behavior is possible because of the multi-pack-index bitmap that
will bitmap objects against multiple packs. Without --write-midx, the
objects in the kept pack would be duplicated so the resulting pack is
closed under reachability and bitmaps can be created against it. This is
discussed in more detail in e4d0c11c0 (repack: respect kept objects with
'--write-midx -b', 2021-12-20) which also introduced this instance of
test_subcommand_inexact.

To better verify the intended post-conditions while also removing this
instance of test_subcommand_inexact, rewrite the test to check the list
of packed objects in the kept pack and the list of the objects in the
newly-repacked pack-file _other_ than the kept pack. These lists should
be disjoint.

Be sure to include a non-kept pack-file and loose objects to be extra
careful that this is properly behaving with kept packs and not just
avoiding repacking all pack-files.

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Derrick Stolee <derrickstolee@github.com>
---
 t/t7700-repack.sh | 52 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 49 insertions(+), 3 deletions(-)

Comments

Taylor Blau March 24, 2022, 6:58 p.m. UTC | #1
On Thu, Mar 24, 2022 at 06:34:56PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <derrickstolee@github.com>
>
> The '--write-midx -b packs non-kept objects' test in t7700-repack.sh
> uses test_subcommand_inexact to check that 'git repack' properly adds
> the '--honor-pack-keep' flag to the 'git pack-objects' subcommand.
> However, the test_subcommand_inexact helper is more flexible than
> initially designed, and this instance is the only one that makes use of
> it: there are additional arguments between 'git pack-objects' and the
> '--honor-pack-keep' flag. In order to make test_subcommand_inexact more
> strict, we need to fix this instance.
>
> This test checks that 'git repack --write-midx -a -b -d' will create a
> new pack-file that does not contain the objects within the kept pack.
> This behavior is possible because of the multi-pack-index bitmap that
> will bitmap objects against multiple packs. Without --write-midx, the
> objects in the kept pack would be duplicated so the resulting pack is
> closed under reachability and bitmaps can be created against it. This is
> discussed in more detail in e4d0c11c0 (repack: respect kept objects with
> '--write-midx -b', 2021-12-20) which also introduced this instance of
> test_subcommand_inexact.
>
> To better verify the intended post-conditions while also removing this
> instance of test_subcommand_inexact, rewrite the test to check the list
> of packed objects in the kept pack and the list of the objects in the
> newly-repacked pack-file _other_ than the kept pack. These lists should
> be disjoint.
>
> Be sure to include a non-kept pack-file and loose objects to be extra
> careful that this is properly behaving with kept packs and not just
> avoiding repacking all pack-files.

Nicely put.

> Co-authored-by: Taylor Blau <me@ttaylorr.com>
> Signed-off-by: Taylor Blau <me@ttaylorr.com>

And to be clear, I am totally OK with this usage of my signed-off-by.

> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
> index 770d1432046..73452e23896 100755
> --- a/t/t7700-repack.sh
> +++ b/t/t7700-repack.sh
> @@ -369,10 +369,56 @@ test_expect_success '--write-midx with preferred bitmap tips' '
>  	)
>  '
>
> +get_sorted_objects_from_packs () {
> +	git show-index <$(cat) >raw &&

It seems a little odd to me to pass the name of a single file as input
to get_sorted_objects_from_packs over stdin. I probably would have
expected something like `git show-index <"$1" >raw && ...` instead.

We may also want to s/packs/pack, since this function only will handle
one index at a time.

> +	cut -d" " -f2 raw | sort

Having the sort in there is my fault, but after reading this more
carefully it's definitely unnecessary, since show-index will give us
the results in lexical order by object name already.

> +}
> +
>  test_expect_success '--write-midx -b packs non-kept objects' '
> -	GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
> -		git repack --write-midx -a -b &&
> -	test_subcommand_inexact git pack-objects --honor-pack-keep <trace.txt
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&
> +	(
> +		cd repo &&
> +
> +		# Create a kept pack-file
> +		test_commit base &&
> +		git repack -ad &&
> +		find $objdir/pack -name "*.idx" >before &&

I thought that here it might be easier to say:

    before="$(find $objdir/pack -name "*.idx")"

> +		>$objdir/pack/$(basename $(cat before) .idx).keep &&

...and then replace "$(cat before)" with "$before", along with the
other uses of the before file below. But it gets a little funny when
you want to discover which is the new pack, where it is more natural to
dump the output of comm into a file.

> +		# Get object list from the one non-kept pack-file
> +		comm -13 before after >new-pack &&

You could write "new_pack=$(comm -13 before after)", but debugging this
test would be difficult if the output of comm there contained more than
one line.

> +		get_sorted_objects_from_packs \
> +			<new-pack \

Though we probably want to check that we only get one line anyway here,
since get_sorted_objects_from_packs will barf if we had more than one
line in file new-pack here, too.

Thanks,
Taylor
Derrick Stolee March 25, 2022, 1:55 p.m. UTC | #2
On 3/24/2022 2:58 PM, Taylor Blau wrote:
> On Thu, Mar 24, 2022 at 06:34:56PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <derrickstolee@github.com>

>> diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
>> index 770d1432046..73452e23896 100755
>> --- a/t/t7700-repack.sh
>> +++ b/t/t7700-repack.sh
>> @@ -369,10 +369,56 @@ test_expect_success '--write-midx with preferred bitmap tips' '
>>  	)
>>  '
>>
>> +get_sorted_objects_from_packs () {
>> +	git show-index <$(cat) >raw &&
> 
> It seems a little odd to me to pass the name of a single file as input
> to get_sorted_objects_from_packs over stdin. I probably would have
> expected something like `git show-index <"$1" >raw && ...` instead.

Based on the way we are creating a file whose contents is the name
of the .idx file, we would at least use '$(cat "$1")'. I kind of like
the symmetry of the input/output redirection when using the helper, but
I can easily change this.

> We may also want to s/packs/pack, since this function only will handle
> one index at a time.

Yes.

>> +	cut -d" " -f2 raw | sort
> 
> Having the sort in there is my fault, but after reading this more
> carefully it's definitely unnecessary, since show-index will give us
> the results in lexical order by object name already.

Cool. Will drop.

>> +}
>> +
>>  test_expect_success '--write-midx -b packs non-kept objects' '
>> -	GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
>> -		git repack --write-midx -a -b &&
>> -	test_subcommand_inexact git pack-objects --honor-pack-keep <trace.txt
>> +	git init repo &&
>> +	test_when_finished "rm -fr repo" &&
>> +	(
>> +		cd repo &&
>> +
>> +		# Create a kept pack-file
>> +		test_commit base &&
>> +		git repack -ad &&
>> +		find $objdir/pack -name "*.idx" >before &&
> 
> I thought that here it might be easier to say:
> 
>     before="$(find $objdir/pack -name "*.idx")"
> 
>> +		>$objdir/pack/$(basename $(cat before) .idx).keep &&
> 
> ...and then replace "$(cat before)" with "$before", along with the
> other uses of the before file below. But it gets a little funny when
> you want to discover which is the new pack, where it is more natural to
> dump the output of comm into a file.

For this reason, I'll continue to store the .idx names in files.

>> +		# Get object list from the one non-kept pack-file
>> +		comm -13 before after >new-pack &&
> 
> You could write "new_pack=$(comm -13 before after)", but debugging this
> test would be difficult if the output of comm there contained more than
> one line.
>
>> +		get_sorted_objects_from_packs \
>> +			<new-pack \
> 
> Though we probably want to check that we only get one line anyway here,
> since get_sorted_objects_from_packs will barf if we had more than one
> line in file new-pack here, too.

Thanks. Easy to add a test_line_count before this check.

-Stolee
Junio C Hamano March 25, 2022, 5:07 p.m. UTC | #3
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:

> +get_sorted_objects_from_packs () {
> +	git show-index <$(cat) >raw &&
> +	cut -d" " -f2 raw | sort
> +}

As pointed out by Taylor, this "the standard input gives us the name
of a file to be read" looks strange.  It may work, and it may even
give the easiest interface to use by all the callers, but if we were
designing a more generic helper function suitable to be added to the
test-lib*.sh, we wouldn't design it like so---instead it would be
either "we read the contents of the .idx file from the standard
input" or "the first argument is the name of the .idx file".

>  test_expect_success '--write-midx -b packs non-kept objects' '
> +	git init repo &&
> +	test_when_finished "rm -fr repo" &&
> +	(
> +		cd repo &&
> +
> +		# Create a kept pack-file
> +		test_commit base &&
> +		git repack -ad &&
> +		find $objdir/pack -name "*.idx" >before &&
> +		>$objdir/pack/$(basename $(cat before) .idx).keep &&

We probably want to sanity check "repack -a" by insisting "before"
has found exactly one .idx file, before using it this way.

		test_line_count = 1 before &&
		before=$(cat before) &&
		>"${before%.idx}.keep"

> +		# Create a non-kept pack-file
> +		test_commit other &&
> +		git repack &&
> +
> +		# Create loose objects
> +		test_commit loose &&
> +
> +		# Repack everything
> +		git repack --write-midx -a -b -d &&
> +
> +		# There should be two pack-files now, the
> +		# old, kept pack and the new, non-kept pack.
> +		find $objdir/pack -name "*.idx" | sort >after &&
> +		test_line_count = 2 after &&

OK.  "after" gets sorted because we will pass it to "comm" later.

> +		find $objdir/pack -name "*.keep" >kept &&
> +		test_line_count = 1 kept &&

Since we've made sure "before" is a one-liner earlier, we could just
say

		test_cmp before kept &&

instead, no?

> +		# Get object list from the kept pack.
> +		get_sorted_objects_from_packs \
> +			<before \
> +			>old.objects &&
		
OK.

> +		# Get object list from the one non-kept pack-file
> +		comm -13 before after >new-pack &&

OK.  This should give only one line of output but that is merely
assumed.  We know after has 2 and before has 1, but haven't made
sure that before is a subset of after.

		test_line_count = 1 new-pack &&

> +		get_sorted_objects_from_packs \
> +			<new-pack \
> +			>new.objects &&

OK.

> +		# None of the objects in the new pack should
> +		# exist within the kept pack.
> +		comm -12 old.objects new.objects >shared.objects &&

Great.

> +		test_must_be_empty shared.objects
> +	)
>  '
>  
>  test_expect_success TTY '--quiet disables progress' '
Derrick Stolee March 25, 2022, 5:23 p.m. UTC | #4
On 3/25/2022 1:07 PM, Junio C Hamano wrote:
> "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes:
> 
>> +get_sorted_objects_from_packs () {
>> +	git show-index <$(cat) >raw &&
>> +	cut -d" " -f2 raw | sort
>> +}
> 
> As pointed out by Taylor, this "the standard input gives us the name
> of a file to be read" looks strange.  It may work, and it may even
> give the easiest interface to use by all the callers, but if we were
> designing a more generic helper function suitable to be added to the
> test-lib*.sh, we wouldn't design it like so---instead it would be
> either "we read the contents of the .idx file from the standard
> input" or "the first argument is the name of the .idx file".

Ok. Can do.
 
>>  test_expect_success '--write-midx -b packs non-kept objects' '
>> +	git init repo &&
>> +	test_when_finished "rm -fr repo" &&
>> +	(
>> +		cd repo &&
>> +
>> +		# Create a kept pack-file
>> +		test_commit base &&
>> +		git repack -ad &&
>> +		find $objdir/pack -name "*.idx" >before &&
>> +		>$objdir/pack/$(basename $(cat before) .idx).keep &&
> 
> We probably want to sanity check "repack -a" by insisting "before"
> has found exactly one .idx file, before using it this way.

> 		test_line_count = 1 before &&
> 		before=$(cat before) &&
> 		>"${before%.idx}.keep"

Good idea. This mixture of a file and variable sharing
a name is a bit muddy for me, though. Using "before_name"
as the variable would be enough of a differentiator.

>> +		find $objdir/pack -name "*.keep" >kept &&
>> +		test_line_count = 1 kept &&
> 
> Since we've made sure "before" is a one-liner earlier, we could just
> say
 
> 		test_cmp before kept &&
> 
> instead, no?

'before' contains a .idx name and 'kept' contains a .keep name,
so this direct comparison does not work. We could do that
additional check like this:

	kept_name=$(cat kept) &&
	echo ${kept_name%.keep}.idx >kept-idx &&
	test_cmp before kept-idx &&

Thanks for taking the time to clean this up, as it might become
a good example for these kinds of post-condition checks of the packs
directory in the future.

Thanks,
-Stolee
Taylor Blau March 25, 2022, 5:36 p.m. UTC | #5
On Fri, Mar 25, 2022 at 01:23:27PM -0400, Derrick Stolee wrote:
> > Since we've made sure "before" is a one-liner earlier, we could just
> > say
>
> > 		test_cmp before kept &&
> >
> > instead, no?
>
> 'before' contains a .idx name and 'kept' contains a .keep name,
> so this direct comparison does not work. We could do that
> additional check like this:
>
> 	kept_name=$(cat kept) &&
> 	echo ${kept_name%.keep}.idx >kept-idx &&
> 	test_cmp before kept-idx &&

I think keeping this kind of post-condition check pretty minimal is
favorable, since this is functionality of `git repack` (i.e., that `-a`
leaves you with one kept) that is already tested thoroughly elsewhere.

So I'd probably avoid checking the output altogether, but if we did want
to test it, I think something as quick and cheap as:

    test_line_count = 1 before

would do the trick.

Thanks,
Taylor
Junio C Hamano March 25, 2022, 6:22 p.m. UTC | #6
Taylor Blau <me@ttaylorr.com> writes:

> On Fri, Mar 25, 2022 at 01:23:27PM -0400, Derrick Stolee wrote:
>> > Since we've made sure "before" is a one-liner earlier, we could just
>> > say
>>
>> > 		test_cmp before kept &&
>> >
>> > instead, no?
>>
>> 'before' contains a .idx name and 'kept' contains a .keep name,
>> so this direct comparison does not work. We could do that
>> additional check like this:
>>
>> 	kept_name=$(cat kept) &&
>> 	echo ${kept_name%.keep}.idx >kept-idx &&
>> 	test_cmp before kept-idx &&
>
> I think keeping this kind of post-condition check pretty minimal is
> favorable, since this is functionality of `git repack` (i.e., that `-a`
> leaves you with one kept) that is already tested thoroughly elsewhere.

It is nice in principle, but for this particular script, whose
helper function's implementation relies on these "assumptions" to
work correctly (e.g. imagine what happens when 'before' file had 0
or 2 lines in it and you called the get_sorted_objects_from_packs
helper), we should be more defensive, and that is where all my
suggestions in the thread come from.
diff mbox series

Patch

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 770d1432046..73452e23896 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -369,10 +369,56 @@  test_expect_success '--write-midx with preferred bitmap tips' '
 	)
 '
 
+get_sorted_objects_from_packs () {
+	git show-index <$(cat) >raw &&
+	cut -d" " -f2 raw | sort
+}
+
 test_expect_success '--write-midx -b packs non-kept objects' '
-	GIT_TRACE2_EVENT="$(pwd)/trace.txt" \
-		git repack --write-midx -a -b &&
-	test_subcommand_inexact git pack-objects --honor-pack-keep <trace.txt
+	git init repo &&
+	test_when_finished "rm -fr repo" &&
+	(
+		cd repo &&
+
+		# Create a kept pack-file
+		test_commit base &&
+		git repack -ad &&
+		find $objdir/pack -name "*.idx" >before &&
+		>$objdir/pack/$(basename $(cat before) .idx).keep &&
+
+		# Create a non-kept pack-file
+		test_commit other &&
+		git repack &&
+
+		# Create loose objects
+		test_commit loose &&
+
+		# Repack everything
+		git repack --write-midx -a -b -d &&
+
+		# There should be two pack-files now, the
+		# old, kept pack and the new, non-kept pack.
+		find $objdir/pack -name "*.idx" | sort >after &&
+		test_line_count = 2 after &&
+		find $objdir/pack -name "*.keep" >kept &&
+		test_line_count = 1 kept &&
+
+		# Get object list from the kept pack.
+		get_sorted_objects_from_packs \
+			<before \
+			>old.objects &&
+
+		# Get object list from the one non-kept pack-file
+		comm -13 before after >new-pack &&
+		get_sorted_objects_from_packs \
+			<new-pack \
+			>new.objects &&
+
+		# None of the objects in the new pack should
+		# exist within the kept pack.
+		comm -12 old.objects new.objects >shared.objects &&
+		test_must_be_empty shared.objects
+	)
 '
 
 test_expect_success TTY '--quiet disables progress' '