diff mbox series

[5/5] maintenance: allow custom refspecs during prefetch

Message ID 7f6c127dac48409ddc8d30ad236182bee21c1957.1617627856.git.gitgitgadget@gmail.com (mailing list archive)
State Superseded
Headers show
Series Maintenance: adapt custom refspecs | expand

Commit Message

Derrick Stolee April 5, 2021, 1:04 p.m. UTC
From: Derrick Stolee <dstolee@microsoft.com>

The prefetch task previously used the default refspec source plus a
custom refspec destination to avoid colliding with remote refs:

	+refs/heads/*:refs/prefetch/<remote>/*

However, some users customize their refspec to reduce how much data they
download from specific remotes. This can involve restrictive patterns
for fetching or negative patterns to avoid downloading some refs.

Modify fetch_remote() to iterate over the remote's refspec list and
translate that into the appropriate prefetch scenario. Specifically,
re-parse the raw form of the refspec into a new 'struct refspec' and
modify the 'dst' member to replace a leading "refs/" substring with
"refs/prefetch/", or prepend "refs/prefetch/" to 'dst' otherwise.
Negative refspecs do not have a 'dst' so they can be transferred to the
'git fetch' command unmodified.

This prefix change provides the benefit of keeping whatever collisions
may exist in the custom refspecs, if that is a desirable outcome.

This changes the names of the refs that would be fetched by the default
refspec. Instead of "refs/prefetch/<remote>/<branch>" they will now go
to "refs/prefetch/remotes/<remote>/<branch>". While this is a change, it
is not a seriously breaking one: these refs are intended to be hidden
and not used.

Update the documentation to be more generic about the destination refs.
Do not mention custom refpecs explicitly, as that does not need to be
highlighted in this documentation. The important part of placing refs in
refs/prefetch remains.

Reported-by: Tom Saeger <tom.saeger@oracle.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 Documentation/git-maintenance.txt |  3 +--
 builtin/gc.c                      | 34 +++++++++++++++++++++++-
 t/t7900-maintenance.sh            | 43 ++++++++++++++++++++++++++-----
 3 files changed, 71 insertions(+), 9 deletions(-)

Comments

Tom Saeger April 5, 2021, 5:16 p.m. UTC | #1
On Mon, Apr 05, 2021 at 01:04:15PM +0000, Derrick Stolee via GitGitGadget wrote:
> From: Derrick Stolee <dstolee@microsoft.com>
> 
> The prefetch task previously used the default refspec source plus a
> custom refspec destination to avoid colliding with remote refs:
> 
> 	+refs/heads/*:refs/prefetch/<remote>/*
> 
> However, some users customize their refspec to reduce how much data they
> download from specific remotes. This can involve restrictive patterns
> for fetching or negative patterns to avoid downloading some refs.
> 
> Modify fetch_remote() to iterate over the remote's refspec list and
> translate that into the appropriate prefetch scenario. Specifically,
> re-parse the raw form of the refspec into a new 'struct refspec' and
> modify the 'dst' member to replace a leading "refs/" substring with
> "refs/prefetch/", or prepend "refs/prefetch/" to 'dst' otherwise.
> Negative refspecs do not have a 'dst' so they can be transferred to the
> 'git fetch' command unmodified.
> 
> This prefix change provides the benefit of keeping whatever collisions
> may exist in the custom refspecs, if that is a desirable outcome.
> 
> This changes the names of the refs that would be fetched by the default
> refspec. Instead of "refs/prefetch/<remote>/<branch>" they will now go
> to "refs/prefetch/remotes/<remote>/<branch>". While this is a change, it
> is not a seriously breaking one: these refs are intended to be hidden
> and not used.
> 
> Update the documentation to be more generic about the destination refs.
> Do not mention custom refpecs explicitly, as that does not need to be
> highlighted in this documentation. The important part of placing refs in
> refs/prefetch remains.
> 
> Reported-by: Tom Saeger <tom.saeger@oracle.com>
> Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
> ---
>  Documentation/git-maintenance.txt |  3 +--
>  builtin/gc.c                      | 34 +++++++++++++++++++++++-
>  t/t7900-maintenance.sh            | 43 ++++++++++++++++++++++++++-----
>  3 files changed, 71 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
> index 80ddd33ceba0..95a24264eb10 100644
> --- a/Documentation/git-maintenance.txt
> +++ b/Documentation/git-maintenance.txt
> @@ -94,8 +94,7 @@ prefetch::
>  	objects from all registered remotes. For each remote, a `git fetch`
>  	command is run. The refmap is custom to avoid updating local or remote
>  	branches (those in `refs/heads` or `refs/remotes`). Instead, the
> -	remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
> -	not updated.
> +	refs are stored in `refs/prefetch/`. Also, tags are not updated.
>  +
>  This is done to avoid disrupting the remote-tracking branches. The end users
>  expect these refs to stay unmoved unless they initiate a fetch.  With prefetch
> diff --git a/builtin/gc.c b/builtin/gc.c
> index fa8128de9ae1..92cb8b4e0bfa 100644
> --- a/builtin/gc.c
> +++ b/builtin/gc.c
> @@ -32,6 +32,7 @@
>  #include "remote.h"
>  #include "object-store.h"
>  #include "exec-cmd.h"
> +#include "refspec.h"
>  
>  #define FAILED_RUN "failed to run %s"
>  
> @@ -877,6 +878,7 @@ static int fetch_remote(struct remote *remote, void *cbdata)
>  {
>  	struct maintenance_run_opts *opts = cbdata;
>  	struct child_process child = CHILD_PROCESS_INIT;
> +	int i;
>  
>  	child.git_cmd = 1;
>  	strvec_pushl(&child.args, "fetch", remote->name, "--prune", "--no-tags",
> @@ -886,7 +888,37 @@ static int fetch_remote(struct remote *remote, void *cbdata)
>  	if (opts->quiet)
>  		strvec_push(&child.args, "--quiet");
>  
> -	strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote->name);
> +	for (i = 0; i < remote->fetch.nr; i++) {
> +		struct refspec_item replace;
> +		struct refspec_item *rsi = &remote->fetch.items[i];
> +		struct strbuf new_dst = STRBUF_INIT;
> +		size_t ignore_len = 0;
> +
> +		if (rsi->negative) {
> +			strvec_push(&child.args, remote->fetch.raw[i]);
> +			continue;
> +		}
> +
> +		refspec_item_init(&replace, remote->fetch.raw[i], 1);
> +
> +		/*
> +		 * If a refspec dst starts with "refs/" at the start,
> +		 * then we will replace "refs/" with "refs/prefetch/".
> +		 * Otherwise, we will prepend the dst string with
> +		 * "refs/prefetch/".
> +		 */
> +		if (!strncmp(replace.dst, "refs/", 5))
> +			ignore_len = 5;
> +
> +		strbuf_addstr(&new_dst, "refs/prefetch/");
> +		strbuf_addstr(&new_dst, replace.dst + ignore_len);
> +		free(replace.dst);
> +		replace.dst = strbuf_detach(&new_dst, NULL);
> +
> +		strvec_push(&child.args, refspec_item_format(&replace));

see comment on 3/5, think refspec_item_format is leaking here.
this code looks fine though.

> +
> +		refspec_item_clear(&replace);
> +	}
>  
>  	return !!run_command(&child);
>  }
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index fc2315edec11..3366ea188782 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -142,20 +142,51 @@ test_expect_success 'prefetch multiple remotes' '
>  	test_commit -C clone2 two &&
>  	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
>  	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
> -	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remote1/* <run-prefetch.txt &&
> -	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remote2/* <run-prefetch.txt &&
> +	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote1/* <run-prefetch.txt &&
> +	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote2/* <run-prefetch.txt &&
>  	test_path_is_missing .git/refs/remotes &&
> -	git log prefetch/remote1/one &&
> -	git log prefetch/remote2/two &&
> +	git log prefetch/remotes/remote1/one &&
> +	git log prefetch/remotes/remote2/two &&
>  	git fetch --all &&
> -	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remote1/one &&
> -	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remote2/two &&
> +	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remotes/remote1/one &&
> +	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remotes/remote2/two &&
>  
>  	test_cmp_config refs/prefetch/ log.excludedecoration &&
>  	git log --oneline --decorate --all >log &&
>  	! grep "prefetch" log
>  '
>  
> +test_expect_success 'prefetch custom refspecs' '
> +	git -C clone1 branch -f special/fetched HEAD &&
> +	git -C clone1 branch -f special/secret/not-fetched HEAD &&
> +
> +	# create multiple refspecs for remote1
> +	git config --add remote.remote1.fetch +refs/heads/special/fetched:refs/heads/fetched &&
> +	git config --add remote.remote1.fetch ^refs/heads/special/secret/not-fetched &&
> +
> +	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&
> +
> +	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
> +
> +	# skips second refspec because it is not a pattern type
> +	rs1="+refs/heads/*:refs/prefetch/remotes/remote1/*" &&
> +	rs2="+refs/heads/special/fetched:refs/prefetch/heads/fetched" &&
> +	rs3="^refs/heads/special/secret/not-fetched" &&
> +
> +	test_subcommand git fetch remote1 $fetchargs $rs1 $rs2 $rs3 <prefetch-refspec.txt &&
> +	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote2/* <prefetch-refspec.txt &&
> +
> +	# first refspec is overridden by second
> +	test_must_fail git rev-parse refs/prefetch/special/fetched &&
> +	git rev-parse refs/prefetch/heads/fetched &&
> +
> +	# possible incorrect places for the non-fetched ref
> +	test_must_fail git rev-parse refs/prefetch/remotes/remote1/secret/not-fetched &&
> +	test_must_fail git rev-parse refs/prefetch/remotes/remote1/not-fetched &&
> +	test_must_fail git rev-parse refs/heads/secret/not-fetched &&
> +	test_must_fail git rev-parse refs/heads/not-fetched
> +'
> +
>  test_expect_success 'prefetch and existing log.excludeDecoration values' '
>  	git config --unset-all log.excludeDecoration &&
>  	git config log.excludeDecoration refs/remotes/remote1/ &&
> -- 
> gitgitgadget
Derrick Stolee April 6, 2021, 11:15 a.m. UTC | #2
On 4/5/2021 1:16 PM, Tom Saeger wrote:
> On Mon, Apr 05, 2021 at 01:04:15PM +0000, Derrick Stolee via GitGitGadget wrote:
>> From: Derrick Stolee <dstolee@microsoft.com>
>> +		strvec_push(&child.args, refspec_item_format(&replace));
> 
> see comment on 3/5, think refspec_item_format is leaking here.
> this code looks fine though.

I will respond to the comments on patch 3, but this is the reason
a static strbuf is used: we can print like this without needing
to store the buffer in a variable and free() it here. Seemed like
an easier-to-use API for a non-critical area of code. I'll
continue the discussion over on that patch thread.

Thanks,
-Stolee
Ævar Arnfjörð Bjarmason April 7, 2021, 8:53 a.m. UTC | #3
On Mon, Apr 05 2021, Derrick Stolee via GitGitGadget wrote:

> [...]
> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
> index fc2315edec11..3366ea188782 100755
> --- a/t/t7900-maintenance.sh
> +++ b/t/t7900-maintenance.sh
> @@ -142,20 +142,51 @@ test_expect_success 'prefetch multiple remotes' '
>  	test_commit -C clone2 two &&
>  	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
>  	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
> -	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remote1/* <run-prefetch.txt &&
> -	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remote2/* <run-prefetch.txt &&
> +	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote1/* <run-prefetch.txt &&
> +	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote2/* <run-prefetch.txt &&
>  	test_path_is_missing .git/refs/remotes &&
> -	git log prefetch/remote1/one &&
> -	git log prefetch/remote2/two &&
> +	git log prefetch/remotes/remote1/one &&
> +	git log prefetch/remotes/remote2/two &&
>  	git fetch --all &&
> -	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remote1/one &&
> -	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remote2/two &&
> +	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remotes/remote1/one &&
> +	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remotes/remote2/two &&
>  
>  	test_cmp_config refs/prefetch/ log.excludedecoration &&
>  	git log --oneline --decorate --all >log &&
>  	! grep "prefetch" log
>  '
>  
> +test_expect_success 'prefetch custom refspecs' '
> +	git -C clone1 branch -f special/fetched HEAD &&
> +	git -C clone1 branch -f special/secret/not-fetched HEAD &&
> +
> +	# create multiple refspecs for remote1
> +	git config --add remote.remote1.fetch +refs/heads/special/fetched:refs/heads/fetched &&
> +	git config --add remote.remote1.fetch ^refs/heads/special/secret/not-fetched &&
> +
> +	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&

I see this is following some established convention in the file, but is
there really not a way to make this pass without directing stderr to
/dev/null? It makes ad-hoc debugging when reviewing harder.


I tried just removing it, but then (in an earlier test case) the
"test_subcommand" fails because it can't find the line we're looking
for, so us piping stderr to /dev/null impacts our trace2 output?
Ævar Arnfjörð Bjarmason April 7, 2021, 10:26 a.m. UTC | #4
On Wed, Apr 07 2021, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Apr 05 2021, Derrick Stolee via GitGitGadget wrote:
>
>> [...]
>> diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
>> index fc2315edec11..3366ea188782 100755
>> --- a/t/t7900-maintenance.sh
>> +++ b/t/t7900-maintenance.sh
>> @@ -142,20 +142,51 @@ test_expect_success 'prefetch multiple remotes' '
>>  	test_commit -C clone2 two &&
>>  	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
>>  	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
>> -	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remote1/* <run-prefetch.txt &&
>> -	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remote2/* <run-prefetch.txt &&
>> +	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote1/* <run-prefetch.txt &&
>> +	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote2/* <run-prefetch.txt &&
>>  	test_path_is_missing .git/refs/remotes &&
>> -	git log prefetch/remote1/one &&
>> -	git log prefetch/remote2/two &&
>> +	git log prefetch/remotes/remote1/one &&
>> +	git log prefetch/remotes/remote2/two &&
>>  	git fetch --all &&
>> -	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remote1/one &&
>> -	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remote2/two &&
>> +	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remotes/remote1/one &&
>> +	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remotes/remote2/two &&
>>  
>>  	test_cmp_config refs/prefetch/ log.excludedecoration &&
>>  	git log --oneline --decorate --all >log &&
>>  	! grep "prefetch" log
>>  '
>>  
>> +test_expect_success 'prefetch custom refspecs' '
>> +	git -C clone1 branch -f special/fetched HEAD &&
>> +	git -C clone1 branch -f special/secret/not-fetched HEAD &&
>> +
>> +	# create multiple refspecs for remote1
>> +	git config --add remote.remote1.fetch +refs/heads/special/fetched:refs/heads/fetched &&
>> +	git config --add remote.remote1.fetch ^refs/heads/special/secret/not-fetched &&
>> +
>> +	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&
>
> I see this is following some established convention in the file, but is
> there really not a way to make this pass without directing stderr to
> /dev/null? It makes ad-hoc debugging when reviewing harder.

As I later found out this is copy/pasted to get around the fact that
--quiet is dependent on isatty(), so without this the result would be
different under --verbose and non-verbose testing.

So that dates back to 3ddaad0e060 (maintenance: add --quiet option,
2020-09-17), but I see other quiet=isatty(2) in related code. I wish we
could isolate that particular behavior so removing the 2>/dev/null when
debugging the tests doesn't cause you to run into this, maybe an
explicit --quiet or --no-quiet option for all but one test that's
checking that isatty() behavior?

> I tried just removing it, but then (in an earlier test case) the
> "test_subcommand" fails because it can't find the line we're looking
> for, so us piping stderr to /dev/null impacts our trace2 output?

I hadn't seen seen test_subcommand before, sorry to be blunt, but "ew!".

So we're ad-hoc grepping trace2 JSON output just to find out whether we
invoked some subcommand. But unlike test_expect_code etc. this one
doesn't run git for you, but instead we have temp *.txt files and the
command disconnected from the run.

And because you're using "grep" and "! grep" to test, you're hiding the
difference between "did not find this line" v.s. "did not find anything
at all".

Because of that the second test using test_subcommand is either buggy or
painfully non-obvious. We check that "run --auto" doesn't contain a
"auto --quiet", but in reality it doesn't contain any subcommands at
all. We didn't run any because it exited with "nothing to pack".

I think converting the whole thing to something like the WIP/RFC patch
below is much better and more readable.

The pattern is basically stolen from test_commit and
check_sub_test_lib_test_err, respectively.

As an aside: The new test_expect_process_tree function would be much
less painful if we had a helper that took the JSON output and emitted
some sensible subset of the information therein. For now I just ad-hoc
grepped the "perf" output. AFAICT the only way to get "depth" from
trace2's JSON so to count slashes in SIDs.

In the case of your tests you're mostly/(only?) interested in a slice of
the "child_start" events. If we had a helper to spew out a pretty-print
version of that (or a subset of it) we could test_cmp against that
directly.

diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 3366ea18878..d03fb361562 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -30,28 +30,41 @@ test_expect_success 'help text' '
 '
 
 test_expect_success 'run [--auto|--quiet]' '
-	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
-		git maintenance run 2>/dev/null &&
-	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
-		git maintenance run --auto 2>/dev/null &&
-	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
-		git maintenance run --no-quiet 2>/dev/null &&
-	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
-	test_subcommand git gc --no-quiet <run-no-quiet.txt
+	test_expect_process_tree --depth 0 git maintenance run <<-\OUT 3<<-\ERR &&
+	git gc --quiet
+	OUT
+	ERR
+
+	test_expect_process_tree --depth 0 git maintenance run --auto <<-\OUT 3<<-\ERR &&
+	OUT
+	ERR
+
+	test_expect_process_tree --depth 0 git maintenance run --quiet <<-\OUT 3<<-\ERR &&
+	git gc --quiet
+	OUT
+	ERR
+
+	test_expect_process_tree --depth 0 git maintenance run --no-quiet <<-\OUT 3<<-\ERR
+	git gc --no-quiet
+	OUT
+	ERR
 '
 
 test_expect_success 'maintenance.auto config option' '
-	GIT_TRACE2_EVENT="$(pwd)/default" git commit --quiet --allow-empty -m 1 &&
-	test_subcommand git maintenance run --auto --quiet <default &&
-	GIT_TRACE2_EVENT="$(pwd)/true" \
-		git -c maintenance.auto=true \
-		commit --quiet --allow-empty -m 2 &&
-	test_subcommand git maintenance run --auto --quiet  <true &&
-	GIT_TRACE2_EVENT="$(pwd)/false" \
-		git -c maintenance.auto=false \
-		commit --quiet --allow-empty -m 3 &&
-	test_subcommand ! git maintenance run --auto --quiet  <false
+	test_expect_process_tree git commit --quiet --allow-empty -m 1 <<-\OUT 3<<-\ERR &&
+	git maintenance run --auto --quiet
+	OUT
+	ERR
+
+	test_expect_process_tree git commit --quiet --allow-empty -m 2 <<-\OUT 3<<-\ERR &&
+	git maintenance run --auto --quiet
+	OUT
+	ERR
+
+	test_expect_process_tree git commit --quiet --allow-empty -m 3 <<-\OUT 3<<-\ERR
+	git maintenance run --auto --quiet
+	OUT
+	ERR
 '
 
 test_expect_success 'maintenance.<task>.enabled' '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index a5915dec22d..cd1187b473c 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1625,6 +1625,38 @@ test_path_is_hidden () {
 	return 1
 }
 
+test_expect_process_tree () {
+	depth= &&
+	>actual &&
+	cat >expect &&
+	cat <&3 >expect.err
+	while test $# != 0
+	do
+		case "$1" in
+		--depth)
+			depth="$2"
+			shift
+			;;
+		*)
+			break
+			;;
+		esac
+		shift
+	done &&
+	log="$(pwd)/proc-tree.txt" &&
+	>"$log" &&
+	GIT_TRACE2_PERF="$log" "$@" 2>actual.err &&
+	grep "child_start" proc-tree.txt >proc-tree-start.txt || : &&
+	if test -n "$depth"
+	then
+		grep " d$depth " proc-tree-start.txt >tmp.txt || : &&
+		mv tmp.txt proc-tree-start.txt
+	fi &&
+	sed -e 's/^.*argv:\[//' -e 's/\]$//' <proc-tree-start.txt >actual &&
+	test_cmp expect actual &&
+	test_cmp expect.err actual.err
+} 7>&2 2>&4
+
 # Check that the given command was invoked as part of the
 # trace2-format trace on stdin.
 #
Ævar Arnfjörð Bjarmason April 7, 2021, 1:47 p.m. UTC | #5
On Mon, Apr 05 2021, Derrick Stolee via GitGitGadget wrote:

> This changes the names of the refs that would be fetched by the default
> refspec. Instead of "refs/prefetch/<remote>/<branch>" they will now go
> to "refs/prefetch/remotes/<remote>/<branch>". While this is a change, it
> is not a seriously breaking one: these refs are intended to be hidden
> and not used.

Not "a seriously breaking one" just because we'll assume nobody had a
remote they'd named "remotes" and they'll need to manually clean that
mess up (if needed), or ...?

> [...]
>  	objects from all registered remotes. For each remote, a `git fetch`
>  	command is run. The refmap is custom to avoid updating local or remote
>  	branches (those in `refs/heads` or `refs/remotes`). Instead, the
> -	remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
> -	not updated.
> +	refs are stored in `refs/prefetch/`. Also, tags are not updated.

So, "tags are not updated", but:

>  
> -	strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote->name);
> +	for (i = 0; i < remote->fetch.nr; i++) {
> +		struct refspec_item replace;
> +		struct refspec_item *rsi = &remote->fetch.items[i];
> +		struct strbuf new_dst = STRBUF_INIT;
> +		size_t ignore_len = 0;
> +
> +		if (rsi->negative) {
> +			strvec_push(&child.args, remote->fetch.raw[i]);
> +			continue;
> +		}
> +
> +		refspec_item_init(&replace, remote->fetch.raw[i], 1);
> +
> +		/*
> +		 * If a refspec dst starts with "refs/" at the start,
> +		 * then we will replace "refs/" with "refs/prefetch/".
> +		 * Otherwise, we will prepend the dst string with
> +		 * "refs/prefetch/".
> +		 */
> +		if (!strncmp(replace.dst, "refs/", 5))
> +			ignore_len = 5;
> +
> +		strbuf_addstr(&new_dst, "refs/prefetch/");
> +		strbuf_addstr(&new_dst, replace.dst + ignore_len);
> +		free(replace.dst);
> +		replace.dst = strbuf_detach(&new_dst, NULL);
> +
> +		strvec_push(&child.args, refspec_item_format(&replace));
> +
> +		refspec_item_clear(&replace);
> +	}

Isn't a blanket replacement of refs/heads/* with refs/* going to change
that? I haven't tested this so maybe it still doesn't work, but:

>  	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
>  	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
> -	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remote1/* <run-prefetch.txt &&
> -	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remote2/* <run-prefetch.txt &&
> +	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote1/* <run-prefetch.txt &&
> +	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote2/* <run-prefetch.txt &&
> [...]
>
> +	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&

It seems we should at least have a test for the case of having a refspec
that pulls down tags.

I suspect that we could document this as an absolute before, as
refs/heads/* is the only namespace that'll refuse tags, but now that we
fetch refs/<whatever> that'll no longer be the case.

I'd think that this new behavior (if I'm right) is a feature, but that
we need to update the docs/tests appropriately.
Derrick Stolee April 9, 2021, 11:48 a.m. UTC | #6
On 4/7/2021 6:26 AM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Wed, Apr 07 2021, Ævar Arnfjörð Bjarmason wrote:
> 
>> On Mon, Apr 05 2021, Derrick Stolee via GitGitGadget wrote:
>>> +	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&
>>
>> I see this is following some established convention in the file, but is
>> there really not a way to make this pass without directing stderr to
>> /dev/null? It makes ad-hoc debugging when reviewing harder.
> 
> As I later found out this is copy/pasted to get around the fact that
> --quiet is dependent on isatty(), so without this the result would be
> different under --verbose and non-verbose testing.

Yes, adding --quiet directly is a better pattern.

> So that dates back to 3ddaad0e060 (maintenance: add --quiet option,
> 2020-09-17), but I see other quiet=isatty(2) in related code. I wish we
> could isolate that particular behavior so removing the 2>/dev/null when
> debugging the tests doesn't cause you to run into this, maybe an
> explicit --quiet or --no-quiet option for all but one test that's
> checking that isatty() behavior?
> 
>> I tried just removing it, but then (in an earlier test case) the
>> "test_subcommand" fails because it can't find the line we're looking
>> for, so us piping stderr to /dev/null impacts our trace2 output?
> 
> I hadn't seen seen test_subcommand before, sorry to be blunt, but "ew!".

Saying "I'm about to be rude" before being rude doesn't excuse it.
I had to step away and get over this comment before I could examine
the actually constructive feedback below.

A better way to communicate the same information could be "This test
helper seems more complicated than necessary, and has some gaps that
could be filled." I completely agree with this statement.

> So we're ad-hoc grepping trace2 JSON output just to find out whether we
> invoked some subcommand. But unlike test_expect_code etc. this one
> doesn't run git for you, but instead we have temp *.txt files and the
> command disconnected from the run.
> 
> And because you're using "grep" and "! grep" to test, you're hiding the
> difference between "did not find this line" v.s. "did not find anything
> at all".

You're right that there is value in comparing the ordered list of
subcommands run by a given Git command. That will catch buggy tests
that are checking that a subcommand doesn't run.

> Because of that the second test using test_subcommand is either buggy or
> painfully non-obvious. We check that "run --auto" doesn't contain a
> "auto --quiet", but in reality it doesn't contain any subcommands at
> all. We didn't run any because it exited with "nothing to pack".

That exit is from the third command, which does not pass the --auto
command. The "nothing to pack" output is from the 'git gc --no-quiet'
subcommand that is being checked in this third test.

> I think converting the whole thing to something like the WIP/RFC patch
> below is much better and more readable.

This is an interesting approach. I don't see you using the ERR that you
are inputting anywhere, so that seems like an unnecessary bloat to the
consumers. But maybe I haven't discovered all of the places where this
would be useful, but it seems better to pipe stderr to a file for later
comparison when needed.
> +test_expect_process_tree () {
> +	depth= &&
> +	>actual &&
> +	cat >expect &&
> +	cat <&3 >expect.err
> +	while test $# != 0
> +	do
> +		case "$1" in
> +		--depth)
> +			depth="$2"
> +			shift
> +			;;
> +		*)
> +			break
> +			;;
> +		esac
> +		shift
> +	done &&
Do you have an example where this is being checked? Or can depth
be left as 1 for now?

> +	log="$(pwd)/proc-tree.txt" &&
> +	>"$log" &&
> +	GIT_TRACE2_PERF="$log" "$@" 2>actual.err &&
> +	grep "child_start" proc-tree.txt >proc-tree-start.txt || : &&
> +	if test -n "$depth"
> +	then
> +		grep " d$depth " proc-tree-start.txt >tmp.txt || : &&
> +		mv tmp.txt proc-tree-start.txt
> +	fi &&
> +	sed -e 's/^.*argv:\[//' -e 's/\]$//' <proc-tree-start.txt >actual &&
> +	test_cmp expect actual &&
> +	test_cmp expect.err actual.err
> +} 7>&2 2>&4

I think similar ideas could apply to test_region. Giving it a try
now.

-Stolee
Ævar Arnfjörð Bjarmason April 9, 2021, 7:28 p.m. UTC | #7
On Fri, Apr 09 2021, Derrick Stolee wrote:

> On 4/7/2021 6:26 AM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Wed, Apr 07 2021, Ævar Arnfjörð Bjarmason wrote:
>> 
>>> On Mon, Apr 05 2021, Derrick Stolee via GitGitGadget wrote:
>>>> +	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&
>>>
>>> I see this is following some established convention in the file, but is
>>> there really not a way to make this pass without directing stderr to
>>> /dev/null? It makes ad-hoc debugging when reviewing harder.
>> 
>> As I later found out this is copy/pasted to get around the fact that
>> --quiet is dependent on isatty(), so without this the result would be
>> different under --verbose and non-verbose testing.
>
> Yes, adding --quiet directly is a better pattern.
>
>> So that dates back to 3ddaad0e060 (maintenance: add --quiet option,
>> 2020-09-17), but I see other quiet=isatty(2) in related code. I wish we
>> could isolate that particular behavior so removing the 2>/dev/null when
>> debugging the tests doesn't cause you to run into this, maybe an
>> explicit --quiet or --no-quiet option for all but one test that's
>> checking that isatty() behavior?
>> 
>>> I tried just removing it, but then (in an earlier test case) the
>>> "test_subcommand" fails because it can't find the line we're looking
>>> for, so us piping stderr to /dev/null impacts our trace2 output?
>> 
>> I hadn't seen seen test_subcommand before, sorry to be blunt, but "ew!".
>
> Saying "I'm about to be rude" before being rude doesn't excuse it.
> I had to step away and get over this comment before I could examine
> the actually constructive feedback below.
>
> A better way to communicate the same information could be "This test
> helper seems more complicated than necessary, and has some gaps that
> could be filled." I completely agree with this statement.

I didn't mean to be rude, my understanding of "blunt" is more like
"forthright" or "without mincing words" or something to that effect, but
I'm not a native speaker of English, so I'll take your understanding
over mine and apologize, sorry.

What I meant to say is: When we look at some prior art in the test suite
that does similar things this pattern is needlessly convoluted, and
separates the command invocation from the test assert itself. Something
similar-ish to check_sub_test_lib_test would allow you to run the
command, but do any asserting of the proc tree in the helper.

>> So we're ad-hoc grepping trace2 JSON output just to find out whether we
>> invoked some subcommand. But unlike test_expect_code etc. this one
>> doesn't run git for you, but instead we have temp *.txt files and the
>> command disconnected from the run.
>> 
>> And because you're using "grep" and "! grep" to test, you're hiding the
>> difference between "did not find this line" v.s. "did not find anything
>> at all".
>
> You're right that there is value in comparing the ordered list of
> subcommands run by a given Git command. That will catch buggy tests
> that are checking that a subcommand doesn't run.
>
>> Because of that the second test using test_subcommand is either buggy or
>> painfully non-obvious. We check that "run --auto" doesn't contain a
>> "auto --quiet", but in reality it doesn't contain any subcommands at
>> all. We didn't run any because it exited with "nothing to pack".
>
> That exit is from the third command, which does not pass the --auto
> command. The "nothing to pack" output is from the 'git gc --no-quiet'
> subcommand that is being checked in this third test.
>
>> I think converting the whole thing to something like the WIP/RFC patch
>> below is much better and more readable.
>
> This is an interesting approach. I don't see you using the ERR that you
> are inputting anywhere, so that seems like an unnecessary bloat to the
> consumers. But maybe I haven't discovered all of the places where this
> would be useful, but it seems better to pipe stderr to a file for later
> comparison when needed.

Yes, it's probably not a good default here. For the test-lib.sh tests
there's check_sub_test_lib_test and check_sub_test_lib_test_err, most of
the tests only test stdout.

>> +test_expect_process_tree () {
>> +	depth= &&
>> +	>actual &&
>> +	cat >expect &&
>> +	cat <&3 >expect.err
>> +	while test $# != 0
>> +	do
>> +		case "$1" in
>> +		--depth)
>> +			depth="$2"
>> +			shift
>> +			;;
>> +		*)
>> +			break
>> +			;;
>> +		esac
>> +		shift
>> +	done &&
> Do you have an example where this is being checked? Or can depth
> be left as 1 for now?

It can probably be hardcoded, but I was hoping someone more familiar
with trace2 would chime in, but I'm fairly sure there's not a way to do
it without parsing the existing output with either some clever
grep/awk-ing of the PERF output, or stateful parsing of the JSON.

I thought that for git maintenance tests perhaps something wanted to
assert that we didn't have maintenance invoking maintenance, or that
something expected to prune refs really invoked the relevant prune
command via "gc".

>> +	log="$(pwd)/proc-tree.txt" &&
>> +	>"$log" &&
>> +	GIT_TRACE2_PERF="$log" "$@" 2>actual.err &&
>> +	grep "child_start" proc-tree.txt >proc-tree-start.txt || : &&
>> +	if test -n "$depth"
>> +	then
>> +		grep " d$depth " proc-tree-start.txt >tmp.txt || : &&
>> +		mv tmp.txt proc-tree-start.txt
>> +	fi &&
>> +	sed -e 's/^.*argv:\[//' -e 's/\]$//' <proc-tree-start.txt >actual &&
>> +	test_cmp expect actual &&
>> +	test_cmp expect.err actual.err
>> +} 7>&2 2>&4
>
> I think similar ideas could apply to test_region. Giving it a try
> now.

Probably, I didn't even notice that one...
Derrick Stolee April 10, 2021, 12:56 a.m. UTC | #8
On 4/9/2021 3:28 PM, Ævar Arnfjörð Bjarmason wrote:
> 
> On Fri, Apr 09 2021, Derrick Stolee wrote:
> 
>> On 4/7/2021 6:26 AM, Ævar Arnfjörð Bjarmason wrote:
>>> I think converting the whole thing to something like the WIP/RFC patch
>>> below is much better and more readable.
>>
>> This is an interesting approach. I don't see you using the ERR that you
>> are inputting anywhere, so that seems like an unnecessary bloat to the
>> consumers. But maybe I haven't discovered all of the places where this
>> would be useful, but it seems better to pipe stderr to a file for later
>> comparison when needed.
> 
> Yes, it's probably not a good default here. For the test-lib.sh tests
> there's check_sub_test_lib_test and check_sub_test_lib_test_err, most of
> the tests only test stdout.
> 
>>> +test_expect_process_tree () {
>>> +	depth= &&
>>> +	>actual &&
>>> +	cat >expect &&
>>> +	cat <&3 >expect.err
>>> +	while test $# != 0
>>> +	do
>>> +		case "$1" in
>>> +		--depth)
>>> +			depth="$2"
>>> +			shift
>>> +			;;
>>> +		*)
>>> +			break
>>> +			;;
>>> +		esac
>>> +		shift
>>> +	done &&
>> Do you have an example where this is being checked? Or can depth
>> be left as 1 for now?
> 
> It can probably be hardcoded, but I was hoping someone more familiar
> with trace2 would chime in, but I'm fairly sure there's not a way to do
> it without parsing the existing output with either some clever
> grep/awk-ing of the PERF output, or stateful parsing of the JSON.
> 
> I thought that for git maintenance tests perhaps something wanted to
> assert that we didn't have maintenance invoking maintenance, or that
> something expected to prune refs really invoked the relevant prune
> command via "gc".
> 
>>> +	log="$(pwd)/proc-tree.txt" &&
>>> +	>"$log" &&
>>> +	GIT_TRACE2_PERF="$log" "$@" 2>actual.err &&
>>> +	grep "child_start" proc-tree.txt >proc-tree-start.txt || : &&
>>> +	if test -n "$depth"
>>> +	then
>>> +		grep " d$depth " proc-tree-start.txt >tmp.txt || : &&
>>> +		mv tmp.txt proc-tree-start.txt
>>> +	fi &&
>>> +	sed -e 's/^.*argv:\[//' -e 's/\]$//' <proc-tree-start.txt >actual &&
>>> +	test_cmp expect actual &&
>>> +	test_cmp expect.err actual.err
>>> +} 7>&2 2>&4
>>
>> I think similar ideas could apply to test_region. Giving it a try
>> now.
> 
> Probably, I didn't even notice that one...

I gave this a few hours today, and I'm giving up. I'm the first to
admit that I don't have the correct scripting skills to do some of
these things.

I've got what I tried below. It certainly looks like it would work.
It solves the problem of "what if the test is flaky?" by ensuring that
all subcommands (at depth 0) match the inputs exactly.

However, the problem comes when trying to make that work for all of
the maintenance tests, specifically the 'incremental-repack' task.
That task dynamically computes a --batch-size=X parameter, and that
is not stable across runs of the script.

This was avoided in the past by only checking for the first of three
subcommands when verifying that the 'incremental-repack' task worked.
That is, except for the EXPENSIVE test that checks that the --batch-size
maxes out at 2g.

The thing that might make these changing parameters work is to allow
the specified lines be a _prefix_ of the actual parameters. Or, let
each line be a pattern that is checked against that line. Issues come
up with how to handle this line-by-line check that I was unable to
overcome.

The good news is that the idea of adding a '--prefetch' option to
'git fetch' makes the change to t7900-maintenance.sh much easier,
making this change to test_subcommand less of a priority.

I include my attempt here as a patch. Feel free to take whatever
you want of it, or none of it and start over. I do think that it
makes the test script look much nicer.

Thanks,
-Stolee

-- >8 --

From 449d098f2a13860f44b2e6fb96fb2dd5872b511b Mon Sep 17 00:00:00 2001
From: Derrick Stolee <dstolee@microsoft.com>
Date: Fri, 9 Apr 2021 08:42:26 -0400
Subject: [PATCH 1/2] test-lib: add test_subcommands
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The test_subcommand helper in test-lib-functions.sh satisfied a need to
check for certain subcommands in Git processes. This is especially
needed in t7900-maintenance.sh to ensure that the maintenance builtin
properly calls certain subcommands based on command-line options, config
values, and the state of the repository.

However, test_subcommand has some complexities in its use that can be
improved. First, it requires that the caller knows to create a log file
with GIT_TRACE2_EVENT then supply that to test_subcommand. Further, it
only checks that some exact subcommands exist or do not exist. It does
not guarantee that the list of subcommands exactly matches a given list.
Because of this drawback, the tests that check a subcommand does _not_
run are particularly flaky to slight changes in behavior.

Introduce a new helper, test_subcommands, that resolves these drawbacks:

1. It runs the supplied command and handles the trace log itself.

2. It takes a list of commands overs stdin and compares this to the
   complete list of subcommands run from the top-level command.

The helper does not test that the full subcommand tree matches, because
that would cause tests to be too fragile to changes unrelated to the
component being tested. This could easily be extended to allow the full
tree with an option, if desired.

To ensure we only check the first level, use GIT_TRACE2_PERF output and
scan for " d0 " in the rows that include the "child_start" event. The
last column includes a way to scrape the subcommand itself from the
trace. Sometimes arguments are quoted, such as when passing a refspec
with '*' to the subcommand. This makes it difficult to create a matching
string within the single-quoted test definitions, so strip these single
quotes from the arguments before matching the input.

Only modify one test in t7900-maintenance.sh. The rest of the callers to
test_subcommand will be converted to test_subcommands in a later change.

Helped-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Derrick Stolee <dstolee@microsoft.com>
---
 t/t7900-maintenance.sh  | 16 +++++++---------
 t/test-lib-functions.sh | 24 ++++++++++++++++++++++++
 2 files changed, 31 insertions(+), 9 deletions(-)

diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index 2412d8c5c0..e170ab7862 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -30,15 +30,13 @@ test_expect_success 'help text' '
 '
 
 test_expect_success 'run [--auto|--quiet]' '
-	GIT_TRACE2_EVENT="$(pwd)/run-no-auto.txt" \
-		git maintenance run 2>/dev/null &&
-	GIT_TRACE2_EVENT="$(pwd)/run-auto.txt" \
-		git maintenance run --auto 2>/dev/null &&
-	GIT_TRACE2_EVENT="$(pwd)/run-no-quiet.txt" \
-		git maintenance run --no-quiet 2>/dev/null &&
-	test_subcommand git gc --quiet <run-no-auto.txt &&
-	test_subcommand ! git gc --auto --quiet <run-auto.txt &&
-	test_subcommand git gc --no-quiet <run-no-quiet.txt
+	test_subcommands git maintenance run --quiet <<-EOF &&
+	git gc --quiet
+	EOF
+	test_subcommands git maintenance run --auto --quiet </dev/null &&
+	test_subcommands git maintenance run --no-quiet <<-EOF
+	git gc --no-quiet
+	EOF
 '
 
 test_expect_success 'maintenance.auto config option' '
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 6348e8d733..53ffeb07f8 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1658,6 +1658,30 @@ test_subcommand () {
 	fi
 }
 
+# Run a command and ensure it succeeds. Use the
+# GIT_TRACE2_PERF logs to ensure that every subcommand
+# run by this top-level Git process is exactly the
+# set supplied over stdin.
+#
+# Redirects stderr to a file named "err". This can
+# be used by tests, but it also provides consistent
+# use of isatty(2) which can affect subcommand calls.
+test_subcommands () {
+	local log line &&
+
+	cat >expect &&
+
+	log="$(pwd)"/subcommand-trace.txt &&
+
+	GIT_TRACE2_PERF="$log" "$@" 2>err &&
+	grep "child_start" "$log" | grep " d0 " >processes || : &&
+
+	sed -e 's/^.*argv:\[//' -e 's/\]$//' -e "s/'//g" <processes >actual &&
+	test_cmp expect actual &&
+
+	rm -f "$log" expect actual processes
+}
+
 # Check that the given command was invoked as part of the
 # trace2-format trace on stdin.
 #
Ævar Arnfjörð Bjarmason April 10, 2021, 11:37 a.m. UTC | #9
On Sat, Apr 10 2021, Derrick Stolee wrote:

> On 4/9/2021 3:28 PM, Ævar Arnfjörð Bjarmason wrote:
>> 
>> On Fri, Apr 09 2021, Derrick Stolee wrote:
>> 
>>> On 4/7/2021 6:26 AM, Ævar Arnfjörð Bjarmason wrote:
>>>> I think converting the whole thing to something like the WIP/RFC patch
>>>> below is much better and more readable.
>>>
>>> This is an interesting approach. I don't see you using the ERR that you
>>> are inputting anywhere, so that seems like an unnecessary bloat to the
>>> consumers. But maybe I haven't discovered all of the places where this
>>> would be useful, but it seems better to pipe stderr to a file for later
>>> comparison when needed.
>> 
>> Yes, it's probably not a good default here. For the test-lib.sh tests
>> there's check_sub_test_lib_test and check_sub_test_lib_test_err, most of
>> the tests only test stdout.
>> 
>>>> +test_expect_process_tree () {
>>>> +	depth= &&
>>>> +	>actual &&
>>>> +	cat >expect &&
>>>> +	cat <&3 >expect.err
>>>> +	while test $# != 0
>>>> +	do
>>>> +		case "$1" in
>>>> +		--depth)
>>>> +			depth="$2"
>>>> +			shift
>>>> +			;;
>>>> +		*)
>>>> +			break
>>>> +			;;
>>>> +		esac
>>>> +		shift
>>>> +	done &&
>>> Do you have an example where this is being checked? Or can depth
>>> be left as 1 for now?
>> 
>> It can probably be hardcoded, but I was hoping someone more familiar
>> with trace2 would chime in, but I'm fairly sure there's not a way to do
>> it without parsing the existing output with either some clever
>> grep/awk-ing of the PERF output, or stateful parsing of the JSON.
>> 
>> I thought that for git maintenance tests perhaps something wanted to
>> assert that we didn't have maintenance invoking maintenance, or that
>> something expected to prune refs really invoked the relevant prune
>> command via "gc".
>> 
>>>> +	log="$(pwd)/proc-tree.txt" &&
>>>> +	>"$log" &&
>>>> +	GIT_TRACE2_PERF="$log" "$@" 2>actual.err &&
>>>> +	grep "child_start" proc-tree.txt >proc-tree-start.txt || : &&
>>>> +	if test -n "$depth"
>>>> +	then
>>>> +		grep " d$depth " proc-tree-start.txt >tmp.txt || : &&
>>>> +		mv tmp.txt proc-tree-start.txt
>>>> +	fi &&
>>>> +	sed -e 's/^.*argv:\[//' -e 's/\]$//' <proc-tree-start.txt >actual &&
>>>> +	test_cmp expect actual &&
>>>> +	test_cmp expect.err actual.err
>>>> +} 7>&2 2>&4
>>>
>>> I think similar ideas could apply to test_region. Giving it a try
>>> now.
>> 
>> Probably, I didn't even notice that one...
>
> I gave this a few hours today, and I'm giving up. I'm the first to
> admit that I don't have the correct scripting skills to do some of
> these things.
>
> I've got what I tried below. It certainly looks like it would work.
> It solves the problem of "what if the test is flaky?" by ensuring that
> all subcommands (at depth 0) match the inputs exactly.

Looks good!

> However, the problem comes when trying to make that work for all of
> the maintenance tests, specifically the 'incremental-repack' task.
> That task dynamically computes a --batch-size=X parameter, and that
> is not stable across runs of the script.
>
> This was avoided in the past by only checking for the first of three
> subcommands when verifying that the 'incremental-repack' task worked.
> That is, except for the EXPENSIVE test that checks that the --batch-size
> maxes out at 2g.
>
> The thing that might make these changing parameters work is to allow
> the specified lines be a _prefix_ of the actual parameters. Or, let
> each line be a pattern that is checked against that line. Issues come
> up with how to handle this line-by-line check that I was unable to
> overcome.
>
> The good news is that the idea of adding a '--prefetch' option to
> 'git fetch' makes the change to t7900-maintenance.sh much easier,
> making this change to test_subcommand less of a priority.
>
> I include my attempt here as a patch. Feel free to take whatever
> you want of it, or none of it and start over. I do think that it
> makes the test script look much nicer.

I think a good way to deal with that is to have a lower-level helper
function that doesn't do the test_cmp, and instead just runs the
command, and leaves the stdout/stderr files in-place for another "check"
helper.

On a WIP topic I split up the t0000-basic.sh "test test-lib.sh itself"
code to do pretty much that:
https://github.com/avar/git/blob/avar/support-test-verbose-under-prove-2/t/lib-subtest.sh

It's basically back to the existing model of "run a command and grep it
later", except that we can pass the JSON through some basic parser, and
extract common cases like test_cmp, prefix munging before test_cmp etc.
diff mbox series

Patch

diff --git a/Documentation/git-maintenance.txt b/Documentation/git-maintenance.txt
index 80ddd33ceba0..95a24264eb10 100644
--- a/Documentation/git-maintenance.txt
+++ b/Documentation/git-maintenance.txt
@@ -94,8 +94,7 @@  prefetch::
 	objects from all registered remotes. For each remote, a `git fetch`
 	command is run. The refmap is custom to avoid updating local or remote
 	branches (those in `refs/heads` or `refs/remotes`). Instead, the
-	remote refs are stored in `refs/prefetch/<remote>/`. Also, tags are
-	not updated.
+	refs are stored in `refs/prefetch/`. Also, tags are not updated.
 +
 This is done to avoid disrupting the remote-tracking branches. The end users
 expect these refs to stay unmoved unless they initiate a fetch.  With prefetch
diff --git a/builtin/gc.c b/builtin/gc.c
index fa8128de9ae1..92cb8b4e0bfa 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -32,6 +32,7 @@ 
 #include "remote.h"
 #include "object-store.h"
 #include "exec-cmd.h"
+#include "refspec.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -877,6 +878,7 @@  static int fetch_remote(struct remote *remote, void *cbdata)
 {
 	struct maintenance_run_opts *opts = cbdata;
 	struct child_process child = CHILD_PROCESS_INIT;
+	int i;
 
 	child.git_cmd = 1;
 	strvec_pushl(&child.args, "fetch", remote->name, "--prune", "--no-tags",
@@ -886,7 +888,37 @@  static int fetch_remote(struct remote *remote, void *cbdata)
 	if (opts->quiet)
 		strvec_push(&child.args, "--quiet");
 
-	strvec_pushf(&child.args, "+refs/heads/*:refs/prefetch/%s/*", remote->name);
+	for (i = 0; i < remote->fetch.nr; i++) {
+		struct refspec_item replace;
+		struct refspec_item *rsi = &remote->fetch.items[i];
+		struct strbuf new_dst = STRBUF_INIT;
+		size_t ignore_len = 0;
+
+		if (rsi->negative) {
+			strvec_push(&child.args, remote->fetch.raw[i]);
+			continue;
+		}
+
+		refspec_item_init(&replace, remote->fetch.raw[i], 1);
+
+		/*
+		 * If a refspec dst starts with "refs/" at the start,
+		 * then we will replace "refs/" with "refs/prefetch/".
+		 * Otherwise, we will prepend the dst string with
+		 * "refs/prefetch/".
+		 */
+		if (!strncmp(replace.dst, "refs/", 5))
+			ignore_len = 5;
+
+		strbuf_addstr(&new_dst, "refs/prefetch/");
+		strbuf_addstr(&new_dst, replace.dst + ignore_len);
+		free(replace.dst);
+		replace.dst = strbuf_detach(&new_dst, NULL);
+
+		strvec_push(&child.args, refspec_item_format(&replace));
+
+		refspec_item_clear(&replace);
+	}
 
 	return !!run_command(&child);
 }
diff --git a/t/t7900-maintenance.sh b/t/t7900-maintenance.sh
index fc2315edec11..3366ea188782 100755
--- a/t/t7900-maintenance.sh
+++ b/t/t7900-maintenance.sh
@@ -142,20 +142,51 @@  test_expect_success 'prefetch multiple remotes' '
 	test_commit -C clone2 two &&
 	GIT_TRACE2_EVENT="$(pwd)/run-prefetch.txt" git maintenance run --task=prefetch 2>/dev/null &&
 	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
-	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remote1/* <run-prefetch.txt &&
-	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remote2/* <run-prefetch.txt &&
+	test_subcommand git fetch remote1 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote1/* <run-prefetch.txt &&
+	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote2/* <run-prefetch.txt &&
 	test_path_is_missing .git/refs/remotes &&
-	git log prefetch/remote1/one &&
-	git log prefetch/remote2/two &&
+	git log prefetch/remotes/remote1/one &&
+	git log prefetch/remotes/remote2/two &&
 	git fetch --all &&
-	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remote1/one &&
-	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remote2/two &&
+	test_cmp_rev refs/remotes/remote1/one refs/prefetch/remotes/remote1/one &&
+	test_cmp_rev refs/remotes/remote2/two refs/prefetch/remotes/remote2/two &&
 
 	test_cmp_config refs/prefetch/ log.excludedecoration &&
 	git log --oneline --decorate --all >log &&
 	! grep "prefetch" log
 '
 
+test_expect_success 'prefetch custom refspecs' '
+	git -C clone1 branch -f special/fetched HEAD &&
+	git -C clone1 branch -f special/secret/not-fetched HEAD &&
+
+	# create multiple refspecs for remote1
+	git config --add remote.remote1.fetch +refs/heads/special/fetched:refs/heads/fetched &&
+	git config --add remote.remote1.fetch ^refs/heads/special/secret/not-fetched &&
+
+	GIT_TRACE2_EVENT="$(pwd)/prefetch-refspec.txt" git maintenance run --task=prefetch 2>/dev/null &&
+
+	fetchargs="--prune --no-tags --no-write-fetch-head --recurse-submodules=no --refmap= --quiet" &&
+
+	# skips second refspec because it is not a pattern type
+	rs1="+refs/heads/*:refs/prefetch/remotes/remote1/*" &&
+	rs2="+refs/heads/special/fetched:refs/prefetch/heads/fetched" &&
+	rs3="^refs/heads/special/secret/not-fetched" &&
+
+	test_subcommand git fetch remote1 $fetchargs $rs1 $rs2 $rs3 <prefetch-refspec.txt &&
+	test_subcommand git fetch remote2 $fetchargs +refs/heads/*:refs/prefetch/remotes/remote2/* <prefetch-refspec.txt &&
+
+	# first refspec is overridden by second
+	test_must_fail git rev-parse refs/prefetch/special/fetched &&
+	git rev-parse refs/prefetch/heads/fetched &&
+
+	# possible incorrect places for the non-fetched ref
+	test_must_fail git rev-parse refs/prefetch/remotes/remote1/secret/not-fetched &&
+	test_must_fail git rev-parse refs/prefetch/remotes/remote1/not-fetched &&
+	test_must_fail git rev-parse refs/heads/secret/not-fetched &&
+	test_must_fail git rev-parse refs/heads/not-fetched
+'
+
 test_expect_success 'prefetch and existing log.excludeDecoration values' '
 	git config --unset-all log.excludeDecoration &&
 	git config log.excludeDecoration refs/remotes/remote1/ &&