diff mbox series

[v2,16/19] parallel-checkout: add tests for basic operations

Message ID 64b41d537e68a45f2bb0a0c3078f2cd314b5a57d.1600814153.git.matheus.bernardino@usp.br (mailing list archive)
State Accepted
Commit 682569c6fd08b8f3f2edef85e0424a0997a70c68
Headers show
Series Parallel Checkout (part I) | expand

Commit Message

Matheus Tavares Sept. 22, 2020, 10:49 p.m. UTC
Add tests to populate the working tree during clone and checkout using
the sequential and parallel modes, to confirm that they produce
identical results. Also test basic checkout mechanics, such as checking
for symlinks in the leading directories and the abidance to --force.

Note: some helper functions are added to a common lib file which is only
included by t2080 for now. But it will also be used by another
parallel-checkout test in a following patch.

Original-patch-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Jeff Hostetler <jeffhost@microsoft.com>
Signed-off-by: Matheus Tavares <matheus.bernardino@usp.br>
---
 t/lib-parallel-checkout.sh          |  39 ++++++
 t/t2080-parallel-checkout-basics.sh | 197 ++++++++++++++++++++++++++++
 2 files changed, 236 insertions(+)
 create mode 100644 t/lib-parallel-checkout.sh
 create mode 100755 t/t2080-parallel-checkout-basics.sh

Comments

Jonathan Nieder Oct. 20, 2020, 1:35 a.m. UTC | #1
Hi,

Matheus Tavares wrote:

> Add tests to populate the working tree during clone and checkout using
> the sequential and parallel modes, to confirm that they produce
> identical results. Also test basic checkout mechanics, such as checking
> for symlinks in the leading directories and the abidance to --force.

Thanks for implementing parallel checkout!  I'm excited about the
feature.  And thanks for including these tests.

[...]
> --- /dev/null
> +++ b/t/lib-parallel-checkout.sh
> @@ -0,0 +1,39 @@
[...]
> +# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}`
> +# and checks that the number of workers spawned is equal to $3.
> +git_pc()

nit: what does git_pc mean?  Can this spell it out more verbosely, or
could callers take on more of the burden?  (Perhaps it would make sense
to use a helper that uses test_config to set the relevant configuration,
and then the caller can use plain "git clone"?)

[...]
> +	GIT_TRACE2="$(pwd)/trace" git \
> +		-c checkout.workers=$workers \
> +		-c checkout.thresholdForParallelism=$threshold \
> +		-c advice.detachedHead=0 \
> +		$@ &&

$@ needs to be quoted, or else it will act like $* (and in particular it
won't handle parameters with embedded spaces).

> +
> +	# Check that the expected number of workers has been used. Note that it
> +	# can be different than the requested number in two cases: when the
> +	# quantity of entries to be checked out is less than the number of
> +	# workers; and when the threshold has not been reached.
> +	#
> +	local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) &&

Do we use grep's \+ operator in other tests?  I thought we preferred to
use the more portable *, but it may be that I'm out of date.

[...]
> +# Verify that both the working tree and the index were created correctly
> +verify_checkout()
> +{
> +	git -C $1 diff-index --quiet HEAD -- &&
> +	git -C $1 diff-index --quiet --cached HEAD -- &&
> +	git -C $1 status --porcelain >$1.status &&
> +	test_must_be_empty $1.status
> +}

Like git_pc, this is not easy to take in at a glance.

"$1" needs to be quoted if we are to handle paths with spaces.

[...]
> --- /dev/null
> +++ b/t/t2080-parallel-checkout-basics.sh
> @@ -0,0 +1,197 @@
> +#!/bin/sh
> +
> +test_description='parallel-checkout basics
> +
> +Ensure that parallel-checkout basically works on clone and checkout, spawning
> +the required number of workers and correctly populating both the index and
> +working tree.
> +'
> +
> +TEST_NO_CREATE_REPO=1
> +. ./test-lib.sh
> +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
> +
> +# NEEDSWORK: cloning a SHA1 repo with GIT_TEST_DEFAULT_HASH set to "sha256"
> +# currently produces a wrong result (See
> +# https://lore.kernel.org/git/20200911151717.43475-1-matheus.bernardino@usp.br/).
> +# So we skip the "parallel-checkout during clone" tests when this test flag is
> +# set to "sha256". Remove this when the bug is fixed.
> +#
> +if test "$GIT_TEST_DEFAULT_HASH" = "sha256"
> +then
> +	skip_all="t2080 currently don't work with GIT_TEST_DEFAULT_HASH=sha256"
> +	test_done
> +fi
> +
> +R_BASE=$GIT_BUILD_DIR
> +
> +test_expect_success 'sequential clone' '
> +	git_pc 1 0 0 clone --quiet -- $R_BASE r_sequential &&

This fails when I run it when building from a tarball, which is
presenting me from releasing this patch series to Debian experimental.

Can we use an artificial repo instead of git.git?  Using git.git as
test data seems like a recipe for hard-to-reproduce test failures.

Thanks and hope that helps,
Jonathan
Taylor Blau Oct. 20, 2020, 2:55 a.m. UTC | #2
On Mon, Oct 19, 2020 at 06:35:58PM -0700, Jonathan Nieder wrote:
> > +
> > +	# Check that the expected number of workers has been used. Note that it
> > +	# can be different than the requested number in two cases: when the
> > +	# quantity of entries to be checked out is less than the number of
> > +	# workers; and when the threshold has not been reached.
> > +	#
> > +	local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) &&
>
> Do we use grep's \+ operator in other tests?  I thought we preferred to
> use the more portable *, but it may be that I'm out of date.

You're not out-of-date; I looked at this myself a couple of months ago:

  https://lore.kernel.org/git/20200812140352.GC74542@syl.lan/

Thanks,
Taylor
Matheus Tavares Oct. 20, 2020, 3:18 a.m. UTC | #3
Hi, Jonathan

On Mon, Oct 19, 2020 at 10:36 PM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Hi,
>
> Matheus Tavares wrote:
>
> > Add tests to populate the working tree during clone and checkout using
> > the sequential and parallel modes, to confirm that they produce
> > identical results. Also test basic checkout mechanics, such as checking
> > for symlinks in the leading directories and the abidance to --force.
>
> Thanks for implementing parallel checkout!  I'm excited about the
> feature.  And thanks for including these tests.

Thanks for the comments and feedback :)

> [...]
> > --- /dev/null
> > +++ b/t/lib-parallel-checkout.sh
> > @@ -0,0 +1,39 @@
> [...]
> > +# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}`
> > +# and checks that the number of workers spawned is equal to $3.
> > +git_pc()
>
> nit: what does git_pc mean?

The idea was "git w/ parallel-checkout". But I realize it may have
gotten too abbreviated...

> Can this spell it out more verbosely, or
> could callers take on more of the burden?  (Perhaps it would make sense
> to use a helper that uses test_config to set the relevant configuration,
> and then the caller can use plain "git clone"?)

Hmm, it's possible, but I think we might end up with quite a lot of
repetition (to always check that checkout spawned the right number of
workers).

> [...]
> > +     GIT_TRACE2="$(pwd)/trace" git \
> > +             -c checkout.workers=$workers \
> > +             -c checkout.thresholdForParallelism=$threshold \
> > +             -c advice.detachedHead=0 \
> > +             $@ &&
>
> $@ needs to be quoted, or else it will act like $* (and in particular it
> won't handle parameters with embedded spaces).

Nice catch, thanks! I will send a patch for this tomorrow.

> > +
> > +     # Check that the expected number of workers has been used. Note that it
> > +     # can be different than the requested number in two cases: when the
> > +     # quantity of entries to be checked out is less than the number of
> > +     # workers; and when the threshold has not been reached.
> > +     #
> > +     local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) &&
>
> Do we use grep's \+ operator in other tests?  I thought we preferred to
> use the more portable *, but it may be that I'm out of date.

Oh, I didn't know about the portability issue with \+. This is already
in `next`, but I guess it's worth sending a follow-up patch to fix it,
right? (I see we have a second \+ occurrence in t7508, which could be
changed in the same patch.)

> [...]
> > +# Verify that both the working tree and the index were created correctly
> > +verify_checkout()
> > +{
> > +     git -C $1 diff-index --quiet HEAD -- &&
> > +     git -C $1 diff-index --quiet --cached HEAD -- &&
> > +     git -C $1 status --porcelain >$1.status &&
> > +     test_must_be_empty $1.status
> > +}
>
> Like git_pc, this is not easy to take in at a glance.
>
> "$1" needs to be quoted if we are to handle paths with spaces.

Thanks, again :) Currently, this function doesn't get paths with
spaces, but I agree that it's better to be cautious here.

> [...]
> > --- /dev/null
> > +++ b/t/t2080-parallel-checkout-basics.sh
> > @@ -0,0 +1,197 @@
> > +#!/bin/sh
> > +
> > +test_description='parallel-checkout basics
> > +
> > +Ensure that parallel-checkout basically works on clone and checkout, spawning
> > +the required number of workers and correctly populating both the index and
> > +working tree.
> > +'
> > +
> > +TEST_NO_CREATE_REPO=1
> > +. ./test-lib.sh
> > +. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
> > +
> > +# NEEDSWORK: cloning a SHA1 repo with GIT_TEST_DEFAULT_HASH set to "sha256"
> > +# currently produces a wrong result (See
> > +# https://lore.kernel.org/git/20200911151717.43475-1-matheus.bernardino@usp.br/).
> > +# So we skip the "parallel-checkout during clone" tests when this test flag is
> > +# set to "sha256". Remove this when the bug is fixed.
> > +#
> > +if test "$GIT_TEST_DEFAULT_HASH" = "sha256"
> > +then
> > +     skip_all="t2080 currently don't work with GIT_TEST_DEFAULT_HASH=sha256"
> > +     test_done
> > +fi
> > +
> > +R_BASE=$GIT_BUILD_DIR
> > +
> > +test_expect_success 'sequential clone' '
> > +     git_pc 1 0 0 clone --quiet -- $R_BASE r_sequential &&
>
> This fails when I run it when building from a tarball, which is
> presenting me from releasing this patch series to Debian experimental.

Sorry for the trouble :( It didn't occur to me, while writing the
test, that it could also be run from the tarball.

> Can we use an artificial repo instead of git.git?  Using git.git as
> test data seems like a recipe for hard-to-reproduce test failures.

I think we could maybe drop these tests. There are already some
similar tests below these, which use an artificial repository. The
goal of using git.git in this section was to test parallel-checkout
with a real-world repo, and hopefully catch errors that we might not
see with small artificial ones.  But you have a very valid concern, as
well. Hmm, I'm not sure what is the best solution to this case. What
do you think?
Jonathan Nieder Oct. 20, 2020, 4:16 a.m. UTC | #4
Matheus Tavares Bernardino wrote:
> On Mon, Oct 19, 2020 at 10:36 PM Jonathan Nieder <jrnieder@gmail.com> wrote:

>> Can we use an artificial repo instead of git.git?  Using git.git as
>> test data seems like a recipe for hard-to-reproduce test failures.
>
> I think we could maybe drop these tests. There are already some
> similar tests below these, which use an artificial repository. The
> goal of using git.git in this section was to test parallel-checkout
> with a real-world repo, and hopefully catch errors that we might not
> see with small artificial ones.  But you have a very valid concern, as
> well. Hmm, I'm not sure what is the best solution to this case. What
> do you think?

I see.  I suppose my preference would be to have a real-world example
in t/perf/ (see t/perf/README for how it allows an arbitrary repo to
be passed in) instead of in the regression tests.  In the regression
testsuite I'd focus more on particular behaviors I want to test (e.g.,
a file being replaced by a directory, that kind of thing).

Behaviors exercised by git.git are in some sense the *least* important
thing to test here, since developers in the Git project know to
advocate for those and exercise them day-to-day.  Where the testsuite
shines is in being able to advocate for use cases that are exercised
by other populations --- a testsuite failure can be a reminder to not
forget about the features other people need that are not part of our
own daily lives.

Thanks,
Jonathan
Matheus Tavares Oct. 20, 2020, 1:18 p.m. UTC | #5
On Mon, Oct 19, 2020 at 11:55 PM Taylor Blau <me@ttaylorr.com> wrote:
>
> On Mon, Oct 19, 2020 at 06:35:58PM -0700, Jonathan Nieder wrote:
> > > +
> > > +   # Check that the expected number of workers has been used. Note that it
> > > +   # can be different than the requested number in two cases: when the
> > > +   # quantity of entries to be checked out is less than the number of
> > > +   # workers; and when the threshold has not been reached.
> > > +   #
> > > +   local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) &&
> >
> > Do we use grep's \+ operator in other tests?  I thought we preferred to
> > use the more portable *, but it may be that I'm out of date.
>
> You're not out-of-date; I looked at this myself a couple of months ago:
>
>   https://lore.kernel.org/git/20200812140352.GC74542@syl.lan/

Thanks for the pointer, I'll replace .\+ by ..*, then.

I noticed we also have some uses of + and ? in tests, with `grep -E`
(or egrep). Are we OK with ERE or did these maybe just slip in by
accident?
Junio C Hamano Oct. 20, 2020, 7:09 p.m. UTC | #6
Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> I noticed we also have some uses of + and ? in tests, with `grep -E`
> (or egrep). Are we OK with ERE or did these maybe just slip in by
> accident?

We are OK with 'grep -E' and 'egrep' and write '+' and '?' as valid
ERE elements.  What we are not OK with is to invoke ERE elements in
an expression that is supposed to be a BRE by prefixing a backslash,
e.g. '\+'.  Perhaps it is a GNU extension?

We need to remove '\+' in t/perf/bisect_regression used with sed.
What is sad is that this trick and "sed -E" are both GNUisms, and
there is no portable way to use ERE with sed X-<.

But we could resort to Perl in truly tricky cases ;-).
Junio C Hamano Oct. 20, 2020, 7:14 p.m. UTC | #7
Matheus Tavares Bernardino <matheus.bernardino@usp.br> writes:

> Oh, I didn't know about the portability issue with \+. This is already
> in `next`, but I guess it's worth sending a follow-up patch to fix it,
> right? (I see we have a second \+ occurrence in t7508, which could be
> changed in the same patch.)

Note that soon, typically a week, after a release, the tip of the
next branch is rewound and all the topics that did not graduate to
master has a chance to get a clean start.  This may be a good use
case for that chance.
diff mbox series

Patch

diff --git a/t/lib-parallel-checkout.sh b/t/lib-parallel-checkout.sh
new file mode 100644
index 0000000000..c95ca27711
--- /dev/null
+++ b/t/lib-parallel-checkout.sh
@@ -0,0 +1,39 @@ 
+# Helpers for t208* tests
+
+# Runs `git -c checkout.workers=$1 -c checkout.thesholdForParallelism=$2 ${@:4}`
+# and checks that the number of workers spawned is equal to $3.
+git_pc()
+{
+	if test $# -lt 4
+	then
+		BUG "too few arguments to git_pc()"
+	fi
+
+	workers=$1 threshold=$2 expected_workers=$3 &&
+	shift && shift && shift &&
+
+	rm -f trace &&
+	GIT_TRACE2="$(pwd)/trace" git \
+		-c checkout.workers=$workers \
+		-c checkout.thresholdForParallelism=$threshold \
+		-c advice.detachedHead=0 \
+		$@ &&
+
+	# Check that the expected number of workers has been used. Note that it
+	# can be different than the requested number in two cases: when the
+	# quantity of entries to be checked out is less than the number of
+	# workers; and when the threshold has not been reached.
+	#
+	local workers_in_trace=$(grep "child_start\[.\+\] git checkout--helper" trace | wc -l) &&
+	test $workers_in_trace -eq $expected_workers &&
+	rm -f trace
+}
+
+# Verify that both the working tree and the index were created correctly
+verify_checkout()
+{
+	git -C $1 diff-index --quiet HEAD -- &&
+	git -C $1 diff-index --quiet --cached HEAD -- &&
+	git -C $1 status --porcelain >$1.status &&
+	test_must_be_empty $1.status
+}
diff --git a/t/t2080-parallel-checkout-basics.sh b/t/t2080-parallel-checkout-basics.sh
new file mode 100755
index 0000000000..c088a06ecc
--- /dev/null
+++ b/t/t2080-parallel-checkout-basics.sh
@@ -0,0 +1,197 @@ 
+#!/bin/sh
+
+test_description='parallel-checkout basics
+
+Ensure that parallel-checkout basically works on clone and checkout, spawning
+the required number of workers and correctly populating both the index and
+working tree.
+'
+
+TEST_NO_CREATE_REPO=1
+. ./test-lib.sh
+. "$TEST_DIRECTORY/lib-parallel-checkout.sh"
+
+# NEEDSWORK: cloning a SHA1 repo with GIT_TEST_DEFAULT_HASH set to "sha256"
+# currently produces a wrong result (See
+# https://lore.kernel.org/git/20200911151717.43475-1-matheus.bernardino@usp.br/).
+# So we skip the "parallel-checkout during clone" tests when this test flag is
+# set to "sha256". Remove this when the bug is fixed.
+#
+if test "$GIT_TEST_DEFAULT_HASH" = "sha256"
+then
+	skip_all="t2080 currently don't work with GIT_TEST_DEFAULT_HASH=sha256"
+	test_done
+fi
+
+R_BASE=$GIT_BUILD_DIR
+
+test_expect_success 'sequential clone' '
+	git_pc 1 0 0 clone --quiet -- $R_BASE r_sequential &&
+	verify_checkout r_sequential
+'
+
+test_expect_success 'parallel clone' '
+	git_pc 2 0 2 clone --quiet -- $R_BASE r_parallel &&
+	verify_checkout r_parallel
+'
+
+test_expect_success 'fallback to sequential clone (threshold)' '
+	git -C $R_BASE ls-files >files &&
+	nr_files=$(wc -l <files) &&
+	threshold=$(($nr_files + 1)) &&
+
+	git_pc 2 $threshold 0 clone --quiet -- $R_BASE r_sequential_fallback &&
+	verify_checkout r_sequential_fallback
+'
+
+# Just to be paranoid, actually compare the contents of the worktrees directly.
+test_expect_success 'compare working trees from clones' '
+	rm -rf r_sequential/.git &&
+	rm -rf r_parallel/.git &&
+	rm -rf r_sequential_fallback/.git &&
+	diff -qr r_sequential r_parallel &&
+	diff -qr r_sequential r_sequential_fallback
+'
+
+# Test parallel-checkout with different operations (creation, deletion,
+# modification) and entry types. A branch switch from B1 to B2 will contain:
+#
+# - a (file):      modified
+# - e/x (file):    deleted
+# - b (symlink):   deleted
+# - b/f (file):    created
+# - e (symlink):   created
+# - d (submodule): created
+#
+test_expect_success SYMLINKS 'setup repo for checkout with various operations' '
+	git init various &&
+	(
+		cd various &&
+		git checkout -b B1 &&
+		echo a>a &&
+		mkdir e &&
+		echo e/x >e/x &&
+		ln -s e b &&
+		git add -A &&
+		git commit -m B1 &&
+
+		git checkout -b B2 &&
+		echo modified >a &&
+		rm -rf e &&
+		rm b &&
+		mkdir b &&
+		echo b/f >b/f &&
+		ln -s b e &&
+		git init d &&
+		test_commit -C d f &&
+		git submodule add ./d &&
+		git add -A &&
+		git commit -m B2 &&
+
+		git checkout --recurse-submodules B1
+	)
+'
+
+test_expect_success SYMLINKS 'sequential checkout' '
+	cp -R various various_sequential &&
+	git_pc 1 0 0 -C various_sequential checkout --recurse-submodules B2 &&
+	verify_checkout various_sequential
+'
+
+test_expect_success SYMLINKS 'parallel checkout' '
+	cp -R various various_parallel &&
+	git_pc 2 0 2 -C various_parallel checkout --recurse-submodules B2 &&
+	verify_checkout various_parallel
+'
+
+test_expect_success SYMLINKS 'fallback to sequential checkout (threshold)' '
+	cp -R various various_sequential_fallback &&
+	git_pc 2 100 0 -C various_sequential_fallback checkout --recurse-submodules B2 &&
+	verify_checkout various_sequential_fallback
+'
+
+test_expect_success SYMLINKS 'compare working trees from checkouts' '
+	rm -rf various_sequential/.git &&
+	rm -rf various_parallel/.git &&
+	rm -rf various_sequential_fallback/.git &&
+	diff -qr various_sequential various_parallel &&
+	diff -qr various_sequential various_sequential_fallback
+'
+
+test_cmp_str()
+{
+	echo "$1" >tmp &&
+	test_cmp tmp "$2"
+}
+
+test_expect_success 'parallel checkout respects --[no]-force' '
+	git init dirty &&
+	(
+		cd dirty &&
+		mkdir D &&
+		test_commit D/F &&
+		test_commit F &&
+
+		echo changed >F.t &&
+		rm -rf D &&
+		echo changed >D &&
+
+		# We expect 0 workers because there is nothing to be updated
+		git_pc 2 0 0 checkout HEAD &&
+		test_path_is_file D &&
+		test_cmp_str changed D &&
+		test_cmp_str changed F.t &&
+
+		git_pc 2 0 2 checkout --force HEAD &&
+		test_path_is_dir D &&
+		test_cmp_str D/F D/F.t &&
+		test_cmp_str F F.t
+	)
+'
+
+test_expect_success SYMLINKS 'parallel checkout checks for symlinks in leading dirs' '
+	git init symlinks &&
+	(
+		cd symlinks &&
+		mkdir D E &&
+
+		# Create two entries in D to have enough work for 2 parallel
+		# workers
+		test_commit D/A &&
+		test_commit D/B &&
+		test_commit E/C &&
+		rm -rf D &&
+		ln -s E D &&
+
+		git_pc 2 0 2 checkout --force HEAD &&
+		! test -L D &&
+		test_cmp_str D/A D/A.t &&
+		test_cmp_str D/B D/B.t
+	)
+'
+
+test_expect_success SYMLINKS,CASE_INSENSITIVE_FS 'symlink colliding with leading dir' '
+	git init colliding-symlink &&
+	(
+		cd colliding-symlink &&
+		file_hex=$(git hash-object -w --stdin </dev/null) &&
+		file_oct=$(echo $file_hex | hex2oct) &&
+
+		sym_hex=$(echo "./D" | git hash-object -w --stdin) &&
+		sym_oct=$(echo $sym_hex | hex2oct) &&
+
+		printf "100644 D/A\0${file_oct}" >tree &&
+		printf "100644 E/B\0${file_oct}" >>tree &&
+		printf "120000 e\0${sym_oct}" >>tree &&
+
+		tree_hex=$(git hash-object -w -t tree --stdin <tree) &&
+		commit_hex=$(git commit-tree -m collisions $tree_hex) &&
+		git update-ref refs/heads/colliding-symlink $commit_hex &&
+
+		git_pc 2 0 2 checkout colliding-symlink &&
+		test_path_is_dir D &&
+		test_path_is_missing D/B
+	)
+'
+
+test_done