diff mbox series

t/perf: handle worktrees as test repos

Message ID YCwnPVFsYDa0SNmG@coredump.intra.peff.net (mailing list archive)
State Superseded
Headers show
Series t/perf: handle worktrees as test repos | expand

Commit Message

Jeff King Feb. 16, 2021, 8:12 p.m. UTC
The perf suite gets confused when test_perf_default_repo is pointed at a
worktree (which includes when it is run from within a worktree at all,
since the default is to use the current repository).

Here's an example:

  $ git worktree add ~/foo
  Preparing worktree (new branch 'foo')
  HEAD is now at 328c109303 The eighth batch
  $ cd ~/foo
  $ make
  [...build output...]
  $ cd t/perf
  $ ./p0000-perf-lib-sanity.sh -v -i
  [...]
  perf 1 - test_perf_default_repo works:
  running:
  	foo=$(git rev-parse HEAD) &&
  	test_export foo

  fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
  Use '--' to separate paths from revisions, like this:
  'git <command> [<revision>...] -- [<file>...]'

The problem is that we didn't copy all of the necessary files from the
source repository (in this case we got HEAD, but we have no refs!). We
discover the git-dir with "rev-parse --git-dir", but this points to the
worktree's partial repository in .../.git/worktrees/foo.

That partial repository has a "commondir" file which points to the main
repository, where the actual refs are stored, but we don't copy it. This
is the correct thing to do, though! If we did copy it, then our scratch
test repo would be pointing back to the original main repo, and any ref
updates we made in the tests would impact that original repo.

Instead, we need to either:

  1. Make a scratch copy of the original main repo (in addition to the
     worktree repo), and point the scratch worktree repo's commondir at
     it. This preserves the original relationship, but it's doubtful any
     script really cares (if they are testing worktree performance,
     they'd probably make their own worktrees). And it's trickier to get
     right.

  2. Collapse the main and worktree repos into a single scratch repo.
     This can be done by copying everything from both, preferring any
     files from the worktree repo.

This patch does the second one. With this applied, the example above
results in p0000 running successfully.

Reported-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Jeff King <peff@peff.net>
---
Having written that, it occurs to me that an even simpler solution is to
just always use the commondir as the source of the scratch repo. It does
not produce the same outcome, but the point is generally just to find a
suitable starting point for a repository. Grabbing the main repo instead
of one of its worktrees is probably OK for most tests.

 t/perf/perf-lib.sh | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

Comments

Jeff King Feb. 16, 2021, 8:16 p.m. UTC | #1
On Tue, Feb 16, 2021 at 03:12:45PM -0500, Jeff King wrote:

> Having written that, it occurs to me that an even simpler solution is to
> just always use the commondir as the source of the scratch repo. It does
> not produce the same outcome, but the point is generally just to find a
> suitable starting point for a repository. Grabbing the main repo instead
> of one of its worktrees is probably OK for most tests.

The patch there is delightfully simple:

diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index e385c6896f..7018256cd4 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -75,7 +75,7 @@ test_perf_create_repo_from () {
 	BUG "not 2 parameters to test-create-repo"
 	repo="$1"
 	source="$2"
-	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
+	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
 	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
 	mkdir -p "$repo/.git"
 	(

but I do wonder if somebody would find it confusing.

-Peff
Derrick Stolee Feb. 16, 2021, 8:36 p.m. UTC | #2
On 2/16/2021 3:16 PM, Jeff King wrote:
> On Tue, Feb 16, 2021 at 03:12:45PM -0500, Jeff King wrote:
> 
>> Having written that, it occurs to me that an even simpler solution is to
>> just always use the commondir as the source of the scratch repo. It does
>> not produce the same outcome, but the point is generally just to find a
>> suitable starting point for a repository. Grabbing the main repo instead
>> of one of its worktrees is probably OK for most tests.
> 
> The patch there is delightfully simple:

I do like this simplicity.
 
> diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> index e385c6896f..7018256cd4 100644
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -75,7 +75,7 @@ test_perf_create_repo_from () {
>  	BUG "not 2 parameters to test-create-repo"
>  	repo="$1"
>  	source="$2"
> -	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
> +	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
>  	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
>  	mkdir -p "$repo/.git"
>  	(
> 
> but I do wonder if somebody would find it confusing.

It would be confusing, especially if one let the "main" worktree
languish far behind another worktree. Rather, one case that applies
mostly to me and my team is when we work on git-for-windows/git or
microsoft/git in a worktree off of git/git. I think it would be
appropriate to use either, as the differences at HEAD are not so
significant to matter. But, any deviation from the HEAD of the
current worktree might be confusing when trying to reproduce some
surprising behavior.

Thanks,
-Stolee
Johannes Schindelin Feb. 16, 2021, 9:13 p.m. UTC | #3
Hi Peff,

On Tue, 16 Feb 2021, Jeff King wrote:

> The perf suite gets confused when test_perf_default_repo is pointed at a
> worktree (which includes when it is run from within a worktree at all,
> since the default is to use the current repository).
>
> Here's an example:
>
>   $ git worktree add ~/foo
>   Preparing worktree (new branch 'foo')
>   HEAD is now at 328c109303 The eighth batch
>   $ cd ~/foo
>   $ make
>   [...build output...]
>   $ cd t/perf
>   $ ./p0000-perf-lib-sanity.sh -v -i
>   [...]
>   perf 1 - test_perf_default_repo works:
>   running:
>   	foo=$(git rev-parse HEAD) &&
>   	test_export foo
>
>   fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
>   Use '--' to separate paths from revisions, like this:
>   'git <command> [<revision>...] -- [<file>...]'
>
> The problem is that we didn't copy all of the necessary files from the
> source repository (in this case we got HEAD, but we have no refs!). We
> discover the git-dir with "rev-parse --git-dir", but this points to the
> worktree's partial repository in .../.git/worktrees/foo.
>
> That partial repository has a "commondir" file which points to the main
> repository, where the actual refs are stored, but we don't copy it. This
> is the correct thing to do, though! If we did copy it, then our scratch
> test repo would be pointing back to the original main repo, and any ref
> updates we made in the tests would impact that original repo.
>
> Instead, we need to either:
>
>   1. Make a scratch copy of the original main repo (in addition to the
>      worktree repo), and point the scratch worktree repo's commondir at
>      it. This preserves the original relationship, but it's doubtful any
>      script really cares (if they are testing worktree performance,
>      they'd probably make their own worktrees). And it's trickier to get
>      right.
>
>   2. Collapse the main and worktree repos into a single scratch repo.
>      This can be done by copying everything from both, preferring any
>      files from the worktree repo.
>
> This patch does the second one. With this applied, the example above
> results in p0000 running successfully.
>
> Reported-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

I think you'll also need the equivalent of:

-- snip --
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index 22d727cef83..0949c360ec4 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -84,7 +84,7 @@ test_perf_create_repo_from () {
 			cp -R "$objects_dir" "$repo/.git/"; } &&
 		for stuff in "$source_git"/*; do
 			case "$stuff" in
-				*/objects|*/hooks|*/config|*/commondir)
+				*/objects|*/hooks|*/config|*/commondir|*/gitdir)
 					;;
 				*)
 					cp -R "$stuff" "$repo/.git/" || exit 1
-- snap --

> Having written that, it occurs to me that an even simpler solution is to
> just always use the commondir as the source of the scratch repo. It does
> not produce the same outcome, but the point is generally just to find a
> suitable starting point for a repository. Grabbing the main repo instead
> of one of its worktrees is probably OK for most tests.

Good point: we probably also need to exclude `*/worktrees/*`, but that is
a bit trickier as we would not want to exclude, say,
`refs/heads/worktrees/cleanup`.

Ciao,
Dscho

>
>  t/perf/perf-lib.sh | 31 ++++++++++++++++++++++---------
>  1 file changed, 22 insertions(+), 9 deletions(-)
>
> diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> index e385c6896f..1226be4005 100644
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -70,27 +70,40 @@ test_perf_do_repo_symlink_config_ () {
>  	test_have_prereq SYMLINKS || git config core.symlinks false
>  }
>
> +test_perf_copy_repo_contents () {
> +	for stuff in "$1"/*
> +	do
> +		case "$stuff" in
> +		*/objects|*/hooks|*/config|*/commondir)
> +			;;
> +		*)
> +			cp -R "$stuff" "$repo/.git/" || exit 1
> +			;;
> +		esac
> +	done
> +}
> +
>  test_perf_create_repo_from () {
>  	test "$#" = 2 ||
>  	BUG "not 2 parameters to test-create-repo"
>  	repo="$1"
>  	source="$2"
>  	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
>  	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
> +	common_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
>  	mkdir -p "$repo/.git"
>  	(
>  		cd "$source" &&
>  		{ cp -Rl "$objects_dir" "$repo/.git/" 2>/dev/null ||
>  			cp -R "$objects_dir" "$repo/.git/"; } &&
> -		for stuff in "$source_git"/*; do
> -			case "$stuff" in
> -				*/objects|*/hooks|*/config|*/commondir)
> -					;;
> -				*)
> -					cp -R "$stuff" "$repo/.git/" || exit 1
> -					;;
> -			esac
> -		done
> +
> +		# common_dir must come first here, since we want source_git to
> +		# take precedence and overwrite any overlapping files
> +		test_perf_copy_repo_contents "$common_dir"
> +		if test "$source_git" != "$common_dir"
> +		then
> +			test_perf_copy_repo_contents "$source_git"
> +		fi
>  	) &&
>  	(
>  		cd "$repo" &&
> --
> 2.30.1.989.g5e01c2f281
>
>
Jeff King Feb. 16, 2021, 9:38 p.m. UTC | #4
On Tue, Feb 16, 2021 at 10:13:49PM +0100, Johannes Schindelin wrote:

> I think you'll also need the equivalent of:
> 
> -- snip --
> diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> index 22d727cef83..0949c360ec4 100644
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -84,7 +84,7 @@ test_perf_create_repo_from () {
>  			cp -R "$objects_dir" "$repo/.git/"; } &&
>  		for stuff in "$source_git"/*; do
>  			case "$stuff" in
> -				*/objects|*/hooks|*/config|*/commondir)
> +				*/objects|*/hooks|*/config|*/commondir|*/gitdir)
>  					;;
>  				*)
>  					cp -R "$stuff" "$repo/.git/" || exit 1
> -- snap --

I think that's reasonable to do, but isn't it orthogonal?

My patch is fixing the case that we do not copy enough files from a
workdir.

Both before and after my patch, we'd be copying the gitdir file. I don't
think it would actually cause a problem in practice, since a "gitdir"
file in the main repo dir doesn't have any meaning. But I do think it's
prudent to avoid copying it (just as we avoid commondir) to avoid any
confusion, or commands accidentally touching the original repository.

Likewise...

> > Having written that, it occurs to me that an even simpler solution is to
> > just always use the commondir as the source of the scratch repo. It does
> > not produce the same outcome, but the point is generally just to find a
> > suitable starting point for a repository. Grabbing the main repo instead
> > of one of its worktrees is probably OK for most tests.
> 
> Good point: we probably also need to exclude `*/worktrees/*`, but that is
> a bit trickier as we would not want to exclude, say,
> `refs/heads/worktrees/cleanup`.

Yes, for the same reason, I think we should exclude the whole worktrees
directory. I don't think we have to worry about that case (and if we
did, we'd already have trouble with "refs/heads/config" or similar). The
reason is that the case statement is only looking at the glob made from
the top-level. The actual recursive expansion of "refs/", etc, is done
by "cp -R".

Anyway, what I'm suggesting is that it would be a separate patch to
avoid looking at gitdir and worktrees, in order to increase overall
safety. Do you want to do that on top, or should I?

-Peff
Junio C Hamano Feb. 16, 2021, 10:52 p.m. UTC | #5
Jeff King <peff@peff.net> writes:

> On Tue, Feb 16, 2021 at 03:12:45PM -0500, Jeff King wrote:
>
>> Having written that, it occurs to me that an even simpler solution is to
>> just always use the commondir as the source of the scratch repo. It does
>> not produce the same outcome, but the point is generally just to find a
>> suitable starting point for a repository. Grabbing the main repo instead
>> of one of its worktrees is probably OK for most tests.
>
> The patch there is delightfully simple:
>
> diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> index e385c6896f..7018256cd4 100644
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -75,7 +75,7 @@ test_perf_create_repo_from () {
>  	BUG "not 2 parameters to test-create-repo"
>  	repo="$1"
>  	source="$2"
> -	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
> +	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
>  	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
>  	mkdir -p "$repo/.git"
>  	(
>
> but I do wonder if somebody would find it confusing.

That does look quite a lot simpler.

What are the possible downsides?  Per-worktree references may not be
pointing at the same objects?
Jeff King Feb. 16, 2021, 10:56 p.m. UTC | #6
On Tue, Feb 16, 2021 at 02:52:57PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
> 
> > On Tue, Feb 16, 2021 at 03:12:45PM -0500, Jeff King wrote:
> >
> >> Having written that, it occurs to me that an even simpler solution is to
> >> just always use the commondir as the source of the scratch repo. It does
> >> not produce the same outcome, but the point is generally just to find a
> >> suitable starting point for a repository. Grabbing the main repo instead
> >> of one of its worktrees is probably OK for most tests.
> >
> > The patch there is delightfully simple:
> >
> > diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> > index e385c6896f..7018256cd4 100644
> > --- a/t/perf/perf-lib.sh
> > +++ b/t/perf/perf-lib.sh
> > @@ -75,7 +75,7 @@ test_perf_create_repo_from () {
> >  	BUG "not 2 parameters to test-create-repo"
> >  	repo="$1"
> >  	source="$2"
> > -	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
> > +	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
> >  	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
> >  	mkdir -p "$repo/.git"
> >  	(
> >
> > but I do wonder if somebody would find it confusing.
> 
> That does look quite a lot simpler.
> 
> What are the possible downsides?  Per-worktree references may not be
> pointing at the same objects?

The main one IMHO is that HEAD would not be pointing where the user
might expect it to be.

-Peff
Jeff King Feb. 26, 2021, 7:09 a.m. UTC | #7
I don't think v1 of this patch got picked up at all, so here it is
again. There was a question of whether we could do the much simpler
solution discussed in:

  https://lore.kernel.org/git/22378ce3-6845-1cd9-996a-8bdc3a8b65d7@gmail.com/

But I think it would be confusing. So patch 1 is unchanged here from v1.

Johannes suggested we could add some extra protections to avoid
accidentally modifying the original repo. Patch 2 does that.

  [1/2]: t/perf: handle worktrees as test repos
  [2/2]: t/perf: avoid copying worktree files from test repo

 t/perf/perf-lib.sh | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

-Peff
Derrick Stolee Feb. 26, 2021, 3:43 p.m. UTC | #8
On 2/26/2021 2:09 AM, Jeff King wrote:
> I don't think v1 of this patch got picked up at all, so here it is
> again. There was a question of whether we could do the much simpler
> solution discussed in:
> 
>   https://lore.kernel.org/git/22378ce3-6845-1cd9-996a-8bdc3a8b65d7@gmail.com/
> 
> But I think it would be confusing. So patch 1 is unchanged here from v1.
> 
> Johannes suggested we could add some extra protections to avoid
> accidentally modifying the original repo. Patch 2 does that.

Thanks. LGTM.

-Stolee
Johannes Schindelin March 1, 2021, 10:03 p.m. UTC | #9
Hi,

On Fri, 26 Feb 2021, Derrick Stolee wrote:

> On 2/26/2021 2:09 AM, Jeff King wrote:
> > I don't think v1 of this patch got picked up at all, so here it is
> > again. There was a question of whether we could do the much simpler
> > solution discussed in:
> >
> >   https://lore.kernel.org/git/22378ce3-6845-1cd9-996a-8bdc3a8b65d7@gmail.com/
> >
> > But I think it would be confusing. So patch 1 is unchanged here from v1.
> >
> > Johannes suggested we could add some extra protections to avoid
> > accidentally modifying the original repo. Patch 2 does that.
>
> Thanks. LGTM.

TMT (To me, too)

Thanks,
Dscho
diff mbox series

Patch

diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index e385c6896f..1226be4005 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -70,27 +70,40 @@  test_perf_do_repo_symlink_config_ () {
 	test_have_prereq SYMLINKS || git config core.symlinks false
 }
 
+test_perf_copy_repo_contents () {
+	for stuff in "$1"/*
+	do
+		case "$stuff" in
+		*/objects|*/hooks|*/config|*/commondir)
+			;;
+		*)
+			cp -R "$stuff" "$repo/.git/" || exit 1
+			;;
+		esac
+	done
+}
+
 test_perf_create_repo_from () {
 	test "$#" = 2 ||
 	BUG "not 2 parameters to test-create-repo"
 	repo="$1"
 	source="$2"
 	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
 	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
+	common_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
 	mkdir -p "$repo/.git"
 	(
 		cd "$source" &&
 		{ cp -Rl "$objects_dir" "$repo/.git/" 2>/dev/null ||
 			cp -R "$objects_dir" "$repo/.git/"; } &&
-		for stuff in "$source_git"/*; do
-			case "$stuff" in
-				*/objects|*/hooks|*/config|*/commondir)
-					;;
-				*)
-					cp -R "$stuff" "$repo/.git/" || exit 1
-					;;
-			esac
-		done
+
+		# common_dir must come first here, since we want source_git to
+		# take precedence and overwrite any overlapping files
+		test_perf_copy_repo_contents "$common_dir"
+		if test "$source_git" != "$common_dir"
+		then
+			test_perf_copy_repo_contents "$source_git"
+		fi
 	) &&
 	(
 		cd "$repo" &&