Message ID | YCwnPVFsYDa0SNmG@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Superseded |
Series | t/perf: handle worktrees as test repos |
On Tue, Feb 16, 2021 at 03:12:45PM -0500, Jeff King wrote:

> Having written that, it occurs to me that an even simpler solution is to
> just always use the commondir as the source of the scratch repo. It does
> not produce the same outcome, but the point is generally just to find a
> suitable starting point for a repository. Grabbing the main repo instead
> of one of its worktrees is probably OK for most tests.

The patch there is delightfully simple:

```diff
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index e385c6896f..7018256cd4 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -75,7 +75,7 @@ test_perf_create_repo_from () {
 	BUG "not 2 parameters to test-create-repo"
 	repo="$1"
 	source="$2"
-	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
+	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
 	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
 	mkdir -p "$repo/.git"
 	(
```

but I do wonder if somebody would find it confusing.

-Peff
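The one-line alternative above only changes behavior when the source is a linked worktree, because that is the only case where `rev-parse --git-dir` and `rev-parse --git-common-dir` disagree. A minimal sketch, assuming a git with `git worktree` support and using throwaway repositories under a temporary directory (the names `main` and `foo` and the helper `abs_dir` are illustrative, not part of perf-lib.sh):

```shell
#!/bin/sh
# Compare --git-dir and --git-common-dir in a main checkout and in a
# linked worktree. Repository names and the abs_dir helper are illustrative.
set -e
unset CDPATH

tmp=$(mktemp -d)
git init -q "$tmp/main"
git -C "$tmp/main" -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m initial
git -C "$tmp/main" worktree add -b foo "$tmp/foo" >/dev/null 2>&1

# rev-parse may print a path relative to the directory it ran in, so
# resolve it from that directory to a canonical absolute path.
abs_dir () {
	(cd "$1" && cd "$2" && pwd -P)
}

main_git_dir=$(abs_dir "$tmp/main" "$(git -C "$tmp/main" rev-parse --git-dir)")
main_common_dir=$(abs_dir "$tmp/main" "$(git -C "$tmp/main" rev-parse --git-common-dir)")
wt_git_dir=$(abs_dir "$tmp/foo" "$(git -C "$tmp/foo" rev-parse --git-dir)")
wt_common_dir=$(abs_dir "$tmp/foo" "$(git -C "$tmp/foo" rev-parse --git-common-dir)")

echo "main:     git-dir=$main_git_dir common-dir=$main_common_dir"
echo "worktree: git-dir=$wt_git_dir common-dir=$wt_common_dir"
```

In the main checkout both values are the same `.git` directory; from the worktree, `--git-dir` names `.git/worktrees/foo` while `--git-common-dir` names the main repository, which is what the simple patch would copy from.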
On 2/16/2021 3:16 PM, Jeff King wrote:
> On Tue, Feb 16, 2021 at 03:12:45PM -0500, Jeff King wrote:
>
>> Having written that, it occurs to me that an even simpler solution is to
>> just always use the commondir as the source of the scratch repo. It does
>> not produce the same outcome, but the point is generally just to find a
>> suitable starting point for a repository. Grabbing the main repo instead
>> of one of its worktrees is probably OK for most tests.
>
> The patch there is delightfully simple:

I do like this simplicity.

> ```diff
> diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> index e385c6896f..7018256cd4 100644
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -75,7 +75,7 @@ test_perf_create_repo_from () {
>  	BUG "not 2 parameters to test-create-repo"
>  	repo="$1"
>  	source="$2"
> -	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
> +	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
>  	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
>  	mkdir -p "$repo/.git"
>  	(
> ```
>
> but I do wonder if somebody would find it confusing.

It would be confusing, especially if one let the "main" worktree
languish far behind another worktree.

Rather, one case that applies mostly to me and my team is when we work
on git-for-windows/git or microsoft/git in a worktree off of git/git. I
think it would be appropriate to use either, as the differences at HEAD
are not significant enough to matter. But any deviation from the HEAD of
the current worktree might be confusing when trying to reproduce some
surprising behavior.

Thanks,
-Stolee
Hi Peff,

On Tue, 16 Feb 2021, Jeff King wrote:

> The perf suite gets confused when test_perf_default_repo is pointed at a
> worktree (which includes when it is run from within a worktree at all,
> since the default is to use the current repository).
>
> Here's an example:
>
> ```
> $ git worktree add ~/foo
> Preparing worktree (new branch 'foo')
> HEAD is now at 328c109303 The eighth batch
> $ cd ~/foo
> $ make
> [...build output...]
> $ cd t/perf
> $ ./p0000-perf-lib-sanity.sh -v -i
> [...]
> perf 1 - test_perf_default_repo works:
> running:
> 	foo=$(git rev-parse HEAD) &&
> 	test_export foo
>
> fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
> Use '--' to separate paths from revisions, like this:
> 'git <command> [<revision>...] -- [<file>...]'
> ```
>
> The problem is that we didn't copy all of the necessary files from the
> source repository (in this case we got HEAD, but we have no refs!). We
> discover the git-dir with "rev-parse --git-dir", but this points to the
> worktree's partial repository in .../.git/worktrees/foo.
>
> That partial repository has a "commondir" file which points to the main
> repository, where the actual refs are stored, but we don't copy it. This
> is the correct thing to do, though! If we did copy it, then our scratch
> test repo would be pointing back to the original main repo, and any ref
> updates we made in the tests would impact that original repo.
>
> Instead, we need to either:
>
>   1. Make a scratch copy of the original main repo (in addition to the
>      worktree repo), and point the scratch worktree repo's commondir at
>      it. This preserves the original relationship, but it's doubtful any
>      script really cares (if they are testing worktree performance,
>      they'd probably make their own worktrees). And it's trickier to get
>      right.
>
>   2. Collapse the main and worktree repos into a single scratch repo.
>      This can be done by copying everything from both, preferring any
>      files from the worktree repo.
>
> This patch does the second one. With this applied, the example above
> results in p0000 running successfully.
>
> Reported-by: Derrick Stolee <dstolee@microsoft.com>
> Signed-off-by: Jeff King <peff@peff.net>
> ---

I think you'll also need the equivalent of:

-- snip --
```diff
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index 22d727cef83..0949c360ec4 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -84,7 +84,7 @@ test_perf_create_repo_from () {
 			cp -R "$objects_dir" "$repo/.git/"; } &&
 		for stuff in "$source_git"/*; do
 			case "$stuff" in
-				*/objects|*/hooks|*/config|*/commondir)
+				*/objects|*/hooks|*/config|*/commondir|*/gitdir)
 					;;
 				*)
 					cp -R "$stuff" "$repo/.git/" || exit 1
```
-- snap --

> Having written that, it occurs to me that an even simpler solution is to
> just always use the commondir as the source of the scratch repo. It does
> not produce the same outcome, but the point is generally just to find a
> suitable starting point for a repository. Grabbing the main repo instead
> of one of its worktrees is probably OK for most tests.

Good point: we probably also need to exclude `*/worktrees/*`, but that is
a bit trickier as we would not want to exclude, say,
`refs/heads/worktrees/cleanup`.
Ciao,
Dscho
On Tue, Feb 16, 2021 at 10:13:49PM +0100, Johannes Schindelin wrote:

> I think you'll also need the equivalent of:
>
> -- snip --
> ```diff
> diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> index 22d727cef83..0949c360ec4 100644
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -84,7 +84,7 @@ test_perf_create_repo_from () {
>  			cp -R "$objects_dir" "$repo/.git/"; } &&
>  		for stuff in "$source_git"/*; do
>  			case "$stuff" in
> -				*/objects|*/hooks|*/config|*/commondir)
> +				*/objects|*/hooks|*/config|*/commondir|*/gitdir)
>  					;;
>  				*)
>  					cp -R "$stuff" "$repo/.git/" || exit 1
> ```
> -- snap --

I think that's reasonable to do, but isn't it orthogonal? My patch is
fixing the case that we do not copy enough files from a workdir. Both
before and after my patch, we'd be copying the gitdir file.

I don't think it would actually cause a problem in practice, since a
"gitdir" file in the main repo dir doesn't have any meaning. But I do
think it's prudent to avoid copying it (just as we avoid commondir) to
avoid any confusion, or commands accidentally touching the original
repository.

Likewise...

> > Having written that, it occurs to me that an even simpler solution is to
> > just always use the commondir as the source of the scratch repo. It does
> > not produce the same outcome, but the point is generally just to find a
> > suitable starting point for a repository. Grabbing the main repo instead
> > of one of its worktrees is probably OK for most tests.
>
> Good point: we probably also need to exclude `*/worktrees/*`, but that is
> a bit trickier as we would not want to exclude, say,
> `refs/heads/worktrees/cleanup`.

Yes, for the same reason, I think we should exclude the whole worktrees
directory.

I don't think we have to worry about that case (and if we did, we'd
already have trouble with "refs/heads/config" or similar). The reason is
that the case statement is only looking at the glob made from the
top-level. The actual recursive expansion of "refs/", etc., is done by
"cp -R".
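The claim that the `case` only ever sees top-level names can be checked without git at all. This sketch builds a hypothetical stand-in for a `.git` directory and runs a loop of the same shape as the one in `test_perf_create_repo_from`, with `*/gitdir` and `*/worktrees` added to the exclusions as discussed in this thread; it illustrates the idea and is not the follow-up patch itself.

```shell
#!/bin/sh
# Hypothetical .git-like layout: a top-level "worktrees" directory that
# should be skipped, and a branch named "worktrees/cleanup" that should not.
set -e

src=$(mktemp -d)
dst=$(mktemp -d)
mkdir -p "$src/refs/heads/worktrees" "$src/worktrees/foo"
echo "ref: refs/heads/main" >"$src/HEAD"
echo "[core]" >"$src/config"
echo 0000 >"$src/refs/heads/worktrees/cleanup"
echo ../.. >"$src/worktrees/foo/commondir"

# The glob yields only direct children of $src (HEAD, config, refs,
# worktrees), so the patterns are never compared against deeper paths.
for stuff in "$src"/*
do
	case "$stuff" in
	*/objects|*/hooks|*/config|*/commondir|*/gitdir|*/worktrees)
		;;
	*)
		# cp -R does the recursion, carrying refs/heads/worktrees along
		cp -R "$stuff" "$dst/" || exit 1
		;;
	esac
done

ls -R "$dst"
```

The `refs/heads/worktrees/cleanup` branch survives because it is copied as part of `refs`, while the top-level `worktrees` directory and `config` are skipped.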
Anyway, what I'm suggesting is that it would be a separate patch to
avoid looking at gitdir and worktrees, in order to increase overall
safety. Do you want to do that on top, or should I?

-Peff
Jeff King <peff@peff.net> writes:

> On Tue, Feb 16, 2021 at 03:12:45PM -0500, Jeff King wrote:
>
>> Having written that, it occurs to me that an even simpler solution is to
>> just always use the commondir as the source of the scratch repo. It does
>> not produce the same outcome, but the point is generally just to find a
>> suitable starting point for a repository. Grabbing the main repo instead
>> of one of its worktrees is probably OK for most tests.
>
> The patch there is delightfully simple:
>
> ```diff
> diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> index e385c6896f..7018256cd4 100644
> --- a/t/perf/perf-lib.sh
> +++ b/t/perf/perf-lib.sh
> @@ -75,7 +75,7 @@ test_perf_create_repo_from () {
>  	BUG "not 2 parameters to test-create-repo"
>  	repo="$1"
>  	source="$2"
> -	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
> +	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
>  	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
>  	mkdir -p "$repo/.git"
>  	(
> ```
>
> but I do wonder if somebody would find it confusing.

That does look quite a lot simpler.

What are the possible downsides? Per-worktree references may not be
pointing at the same objects?
On Tue, Feb 16, 2021 at 02:52:57PM -0800, Junio C Hamano wrote:

> Jeff King <peff@peff.net> writes:
>
> > On Tue, Feb 16, 2021 at 03:12:45PM -0500, Jeff King wrote:
> >
> >> Having written that, it occurs to me that an even simpler solution is to
> >> just always use the commondir as the source of the scratch repo. It does
> >> not produce the same outcome, but the point is generally just to find a
> >> suitable starting point for a repository. Grabbing the main repo instead
> >> of one of its worktrees is probably OK for most tests.
> >
> > The patch there is delightfully simple:
> >
> > ```diff
> > diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
> > index e385c6896f..7018256cd4 100644
> > --- a/t/perf/perf-lib.sh
> > +++ b/t/perf/perf-lib.sh
> > @@ -75,7 +75,7 @@ test_perf_create_repo_from () {
> >  	BUG "not 2 parameters to test-create-repo"
> >  	repo="$1"
> >  	source="$2"
> > -	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
> > +	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
> >  	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
> >  	mkdir -p "$repo/.git"
> >  	(
> > ```
> >
> > but I do wonder if somebody would find it confusing.
>
> That does look quite a lot simpler.
>
> What are the possible downsides? Per-worktree references may not be
> pointing at the same objects?

The main one IMHO is that HEAD would not be pointing where the user
might expect it to be.

-Peff
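The HEAD concern is easy to reproduce: once work happens in a worktree, the main repository's HEAD no longer matches it, so a scratch repo seeded from `--git-common-dir` would start from a different commit. A minimal sketch, assuming a git with `git worktree` support; the repository and branch names (`main`, `topic`) are illustrative.

```shell
#!/bin/sh
# Show that the main repository's HEAD and a linked worktree's HEAD can
# differ. Names and paths are illustrative only.
set -e
unset CDPATH

tmp=$(mktemp -d)
git init -q "$tmp/main"
git -C "$tmp/main" -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m base
git -C "$tmp/main" worktree add -b topic "$tmp/topic" >/dev/null 2>&1
git -C "$tmp/topic" -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m "work done in the worktree"

# Resolve the common dir to an absolute path; rev-parse may print it
# relative to the directory it ran in.
common_dir=$(cd "$tmp/topic" && cd "$(git rev-parse --git-common-dir)" && pwd -P)

# HEAD as seen from the worktree vs. HEAD stored in the common dir
worktree_head=$(git -C "$tmp/topic" rev-parse HEAD)
main_head=$(git --git-dir="$common_dir" rev-parse HEAD)

echo "worktree HEAD: $worktree_head"
echo "main HEAD:     $main_head"
```

Here the worktree is one commit ahead; in the scenario Stolee describes, the main checkout could lag much further behind.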
I don't think v1 of this patch got picked up at all, so here it is
again. There was a question of whether we could do the much simpler
solution discussed in:

  https://lore.kernel.org/git/22378ce3-6845-1cd9-996a-8bdc3a8b65d7@gmail.com/

But I think it would be confusing. So patch 1 is unchanged here from v1.

Johannes suggested we could add some extra protections to avoid
accidentally modifying the original repo. Patch 2 does that.

  [1/2]: t/perf: handle worktrees as test repos
  [2/2]: t/perf: avoid copying worktree files from test repo

 t/perf/perf-lib.sh | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)

-Peff
On 2/26/2021 2:09 AM, Jeff King wrote:
> I don't think v1 of this patch got picked up at all, so here it is
> again. There was a question of whether we could do the much simpler
> solution discussed in:
>
>   https://lore.kernel.org/git/22378ce3-6845-1cd9-996a-8bdc3a8b65d7@gmail.com/
>
> But I think it would be confusing. So patch 1 is unchanged here from v1.
>
> Johannes suggested we could add some extra protections to avoid
> accidentally modifying the original repo. Patch 2 does that.

Thanks. LGTM.

-Stolee
Hi,

On Fri, 26 Feb 2021, Derrick Stolee wrote:

> On 2/26/2021 2:09 AM, Jeff King wrote:
> > I don't think v1 of this patch got picked up at all, so here it is
> > again. There was a question of whether we could do the much simpler
> > solution discussed in:
> >
> >   https://lore.kernel.org/git/22378ce3-6845-1cd9-996a-8bdc3a8b65d7@gmail.com/
> >
> > But I think it would be confusing. So patch 1 is unchanged here from v1.
> >
> > Johannes suggested we could add some extra protections to avoid
> > accidentally modifying the original repo. Patch 2 does that.
>
> Thanks. LGTM.

TMT (To me, too)

Thanks,
Dscho
```diff
diff --git a/t/perf/perf-lib.sh b/t/perf/perf-lib.sh
index e385c6896f..1226be4005 100644
--- a/t/perf/perf-lib.sh
+++ b/t/perf/perf-lib.sh
@@ -70,27 +70,40 @@ test_perf_do_repo_symlink_config_ () {
 	test_have_prereq SYMLINKS || git config core.symlinks false
 }
 
+test_perf_copy_repo_contents () {
+	for stuff in "$1"/*
+	do
+		case "$stuff" in
+		*/objects|*/hooks|*/config|*/commondir)
+			;;
+		*)
+			cp -R "$stuff" "$repo/.git/" || exit 1
+			;;
+		esac
+	done
+}
+
 test_perf_create_repo_from () {
 	test "$#" = 2 ||
 	BUG "not 2 parameters to test-create-repo"
 	repo="$1"
 	source="$2"
 	source_git="$("$MODERN_GIT" -C "$source" rev-parse --git-dir)"
 	objects_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-path objects)"
+	common_dir="$("$MODERN_GIT" -C "$source" rev-parse --git-common-dir)"
 	mkdir -p "$repo/.git"
 	(
 		cd "$source" &&
 		{ cp -Rl "$objects_dir" "$repo/.git/" 2>/dev/null ||
 			cp -R "$objects_dir" "$repo/.git/"; } &&
-		for stuff in "$source_git"/*; do
-			case "$stuff" in
-				*/objects|*/hooks|*/config|*/commondir)
-					;;
-				*)
-					cp -R "$stuff" "$repo/.git/" || exit 1
-					;;
-			esac
-		done
+
+		# common_dir must come first here, since we want source_git to
+		# take precedence and overwrite any overlapping files
+		test_perf_copy_repo_contents "$common_dir"
+		if test "$source_git" != "$common_dir"
+		then
+			test_perf_copy_repo_contents "$source_git"
+		fi
 	) &&
 	(
 		cd "$repo" &&
```
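The ordering comment in the patch (copy `common_dir` first so that `source_git` wins on overlaps) can be exercised on hand-made directories. This sketch reuses the helper from the patch above verbatim, but the directory layout standing in for a main repository and its `worktrees/foo` partial repository is hypothetical, and no git commands are involved.

```shell
#!/bin/sh
# Hypothetical stand-ins: $common_dir plays the main repo's .git and
# $source_git plays the linked worktree's partial repo inside it.
set -e

tmp=$(mktemp -d)
common_dir="$tmp/main.git"
source_git="$common_dir/worktrees/foo"
repo="$tmp/scratch"

mkdir -p "$common_dir/refs/heads" "$source_git" "$repo/.git"
echo "ref: refs/heads/main" >"$common_dir/HEAD"
echo 1111 >"$common_dir/refs/heads/main"
echo 2222 >"$common_dir/refs/heads/foo"
echo "ref: refs/heads/foo" >"$source_git/HEAD"
echo ../.. >"$source_git/commondir"

# Helper from the patch; like the original, it writes into "$repo/.git".
test_perf_copy_repo_contents () {
	for stuff in "$1"/*
	do
		case "$stuff" in
		*/objects|*/hooks|*/config|*/commondir)
			;;
		*)
			cp -R "$stuff" "$repo/.git/" || exit 1
			;;
		esac
	done
}

# Main repo first (brings refs), then the worktree overwrites HEAD.
test_perf_copy_repo_contents "$common_dir"
if test "$source_git" != "$common_dir"
then
	test_perf_copy_repo_contents "$source_git"
fi

cat "$repo/.git/HEAD"    # prints: ref: refs/heads/foo
```

The scratch repo ends up with the main repository's refs and the worktree's HEAD, and the worktree's top-level `commondir` is skipped. Note that `worktrees/foo` (including its nested `commondir`) is still copied, since the exclusions apply only to top-level names; that leftover is what the follow-up about excluding `gitdir` and `worktrees` addresses.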
The perf suite gets confused when test_perf_default_repo is pointed at a
worktree (which includes when it is run from within a worktree at all,
since the default is to use the current repository).

Here's an example:

```
$ git worktree add ~/foo
Preparing worktree (new branch 'foo')
HEAD is now at 328c109303 The eighth batch
$ cd ~/foo
$ make
[...build output...]
$ cd t/perf
$ ./p0000-perf-lib-sanity.sh -v -i
[...]
perf 1 - test_perf_default_repo works:
running:
	foo=$(git rev-parse HEAD) &&
	test_export foo

fatal: ambiguous argument 'HEAD': unknown revision or path not in the working tree.
Use '--' to separate paths from revisions, like this:
'git <command> [<revision>...] -- [<file>...]'
```

The problem is that we didn't copy all of the necessary files from the
source repository (in this case we got HEAD, but we have no refs!). We
discover the git-dir with "rev-parse --git-dir", but this points to the
worktree's partial repository in .../.git/worktrees/foo.

That partial repository has a "commondir" file which points to the main
repository, where the actual refs are stored, but we don't copy it. This
is the correct thing to do, though! If we did copy it, then our scratch
test repo would be pointing back to the original main repo, and any ref
updates we made in the tests would impact that original repo.

Instead, we need to either:

  1. Make a scratch copy of the original main repo (in addition to the
     worktree repo), and point the scratch worktree repo's commondir at
     it. This preserves the original relationship, but it's doubtful any
     script really cares (if they are testing worktree performance,
     they'd probably make their own worktrees). And it's trickier to get
     right.

  2. Collapse the main and worktree repos into a single scratch repo.
     This can be done by copying everything from both, preferring any
     files from the worktree repo.

This patch does the second one. With this applied, the example above
results in p0000 running successfully.
Reported-by: Derrick Stolee <dstolee@microsoft.com>
Signed-off-by: Jeff King <peff@peff.net>
---
Having written that, it occurs to me that an even simpler solution is to
just always use the commondir as the source of the scratch repo. It does
not produce the same outcome, but the point is generally just to find a
suitable starting point for a repository. Grabbing the main repo instead
of one of its worktrees is probably OK for most tests.

 t/perf/perf-lib.sh | 31 ++++++++++++++++++++++---------
 1 file changed, 22 insertions(+), 9 deletions(-)
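The commit message describes the worktree's "partial repository" under `.git/worktrees/foo` and its `commondir` file. A minimal sketch, assuming a git with `git worktree` support, that lists that directory and follows `commondir` back to the main repository, where the branch ref actually lives (the names `main` and `foo` are illustrative):

```shell
#!/bin/sh
# Inspect a linked worktree's git dir and resolve its commondir file.
# Names and paths are illustrative only.
set -e
unset CDPATH

tmp=$(mktemp -d)
git init -q "$tmp/main"
git -C "$tmp/main" -c user.name=Demo -c user.email=demo@example.com \
	commit -q --allow-empty -m initial
git -C "$tmp/main" worktree add -b foo "$tmp/foo" >/dev/null 2>&1

# Absolute path of the partial repository (.git/worktrees/foo)
git_dir=$(cd "$tmp/foo" && cd "$(git rev-parse --git-dir)" && pwd -P)
ls "$git_dir"

# commondir holds a path (usually relative to git_dir) to the main
# repository's .git; shared refs such as refs/heads/foo live there.
main_git=$(cd "$git_dir" && cd "$(cat commondir)" && pwd -P)
git --git-dir="$main_git" show-ref --verify refs/heads/foo
```

Copying only `$git_dir` would bring along HEAD (which names `refs/heads/foo`) but not the ref itself, which matches the "we got HEAD, but we have no refs" failure above.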