[05/15] t/Makefile: optimize chainlint self-test

Message ID	20211213063059.19424-6-sunshine@sunshineco.com (mailing list archive)
State	Accepted
Commit	f30c1d5eb1ceeec460ea4cd2089ece63156f5eb0
Headers	show Return-Path: <git-owner@kernel.org> Sender: Eric Sunshine <ericsunshine@gmail.com> From: Eric Sunshine <sunshine@sunshineco.com> To: git@vger.kernel.org Cc: Jeff King <peff@peff.net>, Elijah Newren <newren@gmail.com>, Fabian Stelzer <fs@gigacodes.de>, Eric Sunshine <sunshine@sunshineco.com> Subject: [PATCH 05/15] t/Makefile: optimize chainlint self-test Date: Mon, 13 Dec 2021 01:30:49 -0500 Message-Id: <20211213063059.19424-6-sunshine@sunshineco.com> In-Reply-To: <20211213063059.19424-1-sunshine@sunshineco.com> References: <20211213063059.19424-1-sunshine@sunshineco.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	generalize chainlint self-tests \| expand [00/15] generalize chainlint self-tests [01/15] t/chainlint/.test: don't use invalid shell syntax [02/15] t/chainlint/.test: fix invalid test cases due to mixing quote types [03/15] t/chainlint/*.test: generalize self-test commentary [04/15] t/chainlint/one-liner: avoid overly intimate chainlint.sed knowledge [05/15] t/Makefile: optimize chainlint self-test [06/15] chainlint.sed: improve ?!AMP?! placement accuracy [07/15] chainlint.sed: improve ?!SEMI?! placement accuracy [08/15] chainlint.sed: tolerate harmless ";" at end of last line in block [09/15] chainlint.sed: drop unnecessary distinction between ?!AMP?! and ?!SEMI?! [10/15] chainlint.sed: drop subshell-closing ">" annotation [11/15] chainlint.sed: make here-doc "<<-" operator recognition more POSIX-like [12/15] chainlint.sed: don't mistake `<< word` in string as here-doc operator [13/15] chainlint.sed: stop throwing away here-doc tags [14/15] chainlint.sed: swallow comments consistently [15/15] chainlint.sed: stop splitting "(..." into separate lines "(" and "..."

Eric Sunshine Dec. 13, 2021, 6:30 a.m. UTC

Rather than running `chainlint` and `diff` once per self-test -- which
may become expensive as more tests are added -- instead run `chainlint`
a single time over all tests bodies collectively and compare the result
to the collective "expected" output.

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
---
 t/Makefile | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

Fabian Stelzer Dec. 13, 2021, 10:22 a.m. UTC | #1

On 13.12.2021 01:30, Eric Sunshine wrote:
>Rather than running `chainlint` and `diff` once per self-test -- which
>may become expensive as more tests are added -- instead run `chainlint`
>a single time over all tests bodies collectively and compare the result
>to the collective "expected" output.
>
>Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
>---
> t/Makefile | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
>diff --git a/t/Makefile b/t/Makefile
>index 882d26eee3..f4ae40be46 100644
>--- a/t/Makefile
>+++ b/t/Makefile
>@@ -71,12 +71,10 @@ clean-chainlint:
>
> check-chainlint:
> 	@mkdir -p '$(CHAINLINTTMP_SQ)' && \
>-	err=0 && \
>-	for i in $(CHAINLINTTESTS); do \
>-		$(CHAINLINT) <chainlint/$$i.test | \
>-		sed -e '/^# LINT: /d' >'$(CHAINLINTTMP_SQ)'/$$i.actual && \
>-		diff -u chainlint/$$i.expect '$(CHAINLINTTMP_SQ)'/$$i.actual || err=1; \
>-	done && exit $$err
>+	sed -e '/^# LINT: /d' $(patsubst %,chainlint/%.test,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/tests && \
>+	cat $(patsubst %,chainlint/%.expect,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/expect && \
>+	$(CHAINLINT) '$(CHAINLINTTMP_SQ)'/tests >'$(CHAINLINTTMP_SQ)'/actual && \
>+	diff -u '$(CHAINLINTTMP_SQ)'/expect '$(CHAINLINTTMP_SQ)'/actual
>

If I read this right you are relying on the order of the .test & .expect 
files to match. I did something similar in a test prereq which resulted in a 
bug when setting the test_output_dir to something residing in /dev/shm, 
since the order of files in /dev/shm is reversed (at least on some 
platforms). Even though this should work as is I could see this leading to a 
similar bug in the future.

Eric Sunshine Dec. 13, 2021, 2:27 p.m. UTC | #2

On Mon, Dec 13, 2021 at 5:22 AM Fabian Stelzer <fs@gigacodes.de> wrote:
> On 13.12.2021 01:30, Eric Sunshine wrote:
> > check-chainlint:
> >+      sed -e '/^# LINT: /d' $(patsubst %,chainlint/%.test,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/tests && \
> >+      cat $(patsubst %,chainlint/%.expect,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/expect && \
> >+      $(CHAINLINT) '$(CHAINLINTTMP_SQ)'/tests >'$(CHAINLINTTMP_SQ)'/actual && \
> >+      diff -u '$(CHAINLINTTMP_SQ)'/expect '$(CHAINLINTTMP_SQ)'/actual
>
> If I read this right you are relying on the order of the .test & .expect
> files to match. I did something similar in a test prereq which resulted in a
> bug when setting the test_output_dir to something residing in /dev/shm,
> since the order of files in /dev/shm is reversed (at least on some
> platforms). Even though this should work as is I could see this leading to a
> similar bug in the future.

It's not seen in the patch context, but earlier in the file we have:

    CHAINLINTTESTS = $(sort $(...,$(wildcard chainlint/*.test)))

which provides stability via `sort`, thus ensures that the order of
the ".test" and ".expect" match.

I think that addresses your concern (unless I misunderstand your observation).

Fabian Stelzer Dec. 13, 2021, 3:43 p.m. UTC | #3

On 13.12.2021 09:27, Eric Sunshine wrote:
>On Mon, Dec 13, 2021 at 5:22 AM Fabian Stelzer <fs@gigacodes.de> wrote:
>> On 13.12.2021 01:30, Eric Sunshine wrote:
>> > check-chainlint:
>> >+      sed -e '/^# LINT: /d' $(patsubst %,chainlint/%.test,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/tests && \
>> >+      cat $(patsubst %,chainlint/%.expect,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/expect && \
>> >+      $(CHAINLINT) '$(CHAINLINTTMP_SQ)'/tests >'$(CHAINLINTTMP_SQ)'/actual && \
>> >+      diff -u '$(CHAINLINTTMP_SQ)'/expect '$(CHAINLINTTMP_SQ)'/actual
>>
>> If I read this right you are relying on the order of the .test & .expect
>> files to match. I did something similar in a test prereq which resulted in a
>> bug when setting the test_output_dir to something residing in /dev/shm,
>> since the order of files in /dev/shm is reversed (at least on some
>> platforms). Even though this should work as is I could see this leading to a
>> similar bug in the future.
>
>It's not seen in the patch context, but earlier in the file we have:
>
>    CHAINLINTTESTS = $(sort $(...,$(wildcard chainlint/*.test)))
>
>which provides stability via `sort`, thus ensures that the order of
>the ".test" and ".expect" match.
>
>I think that addresses your concern (unless I misunderstand your observation).

Yes, thats what i meant. I didn't realize $CHAINLINTTESTS is already the 
sorted glob. Thanks for clarifying.

Personally i find the initial for loop variant to be the most readable.  
Ævars makefile targets could be very nice too, but especially:

+$(BUILT_CHAINLINTTESTS): | .build/chainlint
+$(BUILT_CHAINLINTTESTS): .build/%.actual: %
+       $(CHAINLINT) <$< | \
+	 sed -e '/^# LINT: /d' >$@ && \
+       diff -u $(basename $<).expect $@

i find very hard to grasp :/
I have no idea what is going on here: `<$< |` ?

Eric Sunshine Dec. 13, 2021, 4:02 p.m. UTC | #4

On Mon, Dec 13, 2021 at 10:43 AM Fabian Stelzer <fs@gigacodes.de> wrote:
> Personally i find the initial for loop variant to be the most readable.
> Ævars makefile targets could be very nice too, but especially:
>
> +$(BUILT_CHAINLINTTESTS): | .build/chainlint
> +$(BUILT_CHAINLINTTESTS): .build/%.actual: %
> +       $(CHAINLINT) <$< | \
> +        sed -e '/^# LINT: /d' >$@ && \
> +       diff -u $(basename $<).expect $@
>
> i find very hard to grasp :/
> I have no idea what is going on here: `<$< |` ?

Ya, that line-noise is an unfortunate combination of shell and
Makefile gobbledygook. The `$<` is effectively the source file (the
file being passed into chainlint.sed), and the rest of it is just
normal shell. `<` is redirection (using the source file `$<` as
stdin), and `|` is the pipe operator (sending the output of
chainlint.sed to another `sed`), and capturing it all via shell `>`
redirection in `$@` which is the Makefile variable for the target
file.

Anyhow, although the commit message tries to sell this change as some
sort of optimization, it's really in preparation for the new chainlint
which wants to check all tests in all files with a single invocation
rather than being invoked over and over and over. The self-test files
also require more preprocessing to work with the new chainlint, so the
implementation of `check-chainlint` gets rather more complex once the
end state is reached. I'll think about it a bit, but at the moment,
I'm still leaning toward this intermediate step as being beneficial to
reaching the end state. However, my opinion could change since the way
this is done here was probably influenced by an earlier iteration of
the new chainlint, but now that the implementation of the new
chainlint is concrete, it may not be especially important to do it
this way.

Ævar Arnfjörð Bjarmason Dec. 13, 2021, 4:11 p.m. UTC | #5

On Mon, Dec 13 2021, Eric Sunshine wrote:

> On Mon, Dec 13, 2021 at 10:43 AM Fabian Stelzer <fs@gigacodes.de> wrote:
>> Personally i find the initial for loop variant to be the most readable.
>> Ævars makefile targets could be very nice too, but especially:
>>
>> +$(BUILT_CHAINLINTTESTS): | .build/chainlint
>> +$(BUILT_CHAINLINTTESTS): .build/%.actual: %
>> +       $(CHAINLINT) <$< | \
>> +        sed -e '/^# LINT: /d' >$@ && \
>> +       diff -u $(basename $<).expect $@
>>
>> i find very hard to grasp :/
>> I have no idea what is going on here: `<$< |` ?
>
> Ya, that line-noise is an unfortunate combination of shell and
> Makefile gobbledygook. The `$<` is effectively the source file (the
> file being passed into chainlint.sed), and the rest of it is just
> normal shell. `<` is redirection (using the source file `$<` as
> stdin), and `|` is the pipe operator (sending the output of
> chainlint.sed to another `sed`), and capturing it all via shell `>`
> redirection in `$@` which is the Makefile variable for the target
> file.

To add to that;
https://www.gnu.org/software/make/manual/html_node/Rules.html#Rules and
other relevant parts of the GNU make manual are very helpful here.

> Anyhow, although the commit message tries to sell this change as some
> sort of optimization, it's really in preparation for the new chainlint
> which wants to check all tests in all files with a single invocation
> rather than being invoked over and over and over. The self-test files
> also require more preprocessing to work with the new chainlint, so the
> implementation of `check-chainlint` gets rather more complex once the
> end state is reached. I'll think about it a bit, but at the moment,
> I'm still leaning toward this intermediate step as being beneficial to
> reaching the end state. However, my opinion could change since the way
> this is done here was probably influenced by an earlier iteration of
> the new chainlint, but now that the implementation of the new
> chainlint is concrete, it may not be especially important to do it
> this way.

I don't really care about the details of whether it's invoked once or N
times, although I think the N times with proper dependencies tends to
give you better error messages, but maybe you'll be changing it
significantly enough that the current map between chainlint files and
approximately what sort of thing they check won't be there anymore.

In any case, I'd think that a rule that used $< now (i.e. 1=1 file->out
prereq) would be better for the current state, and could just be changed
to use one of $^ or $+ later.

I.e. you can declare a "check.done" that depends on {1..10}.todo, and
get a list of all of those {1..10}.todo files if one changes, or just
the ones whose mtime is newer than a "check.done".

The reason I looked at this to begin with is that it takes it ~100-150ms
to run now, which adds up if you're e.g. using "make test T=<glob>" in
"git rebase -i --exec".

Fabian Stelzer Dec. 13, 2021, 4:14 p.m. UTC | #6

On 13.12.2021 11:02, Eric Sunshine wrote:
>On Mon, Dec 13, 2021 at 10:43 AM Fabian Stelzer <fs@gigacodes.de> wrote:
>> Personally i find the initial for loop variant to be the most readable.
>> Ævars makefile targets could be very nice too, but especially:
>>
>> +$(BUILT_CHAINLINTTESTS): | .build/chainlint
>> +$(BUILT_CHAINLINTTESTS): .build/%.actual: %
>> +       $(CHAINLINT) <$< | \
>> +        sed -e '/^# LINT: /d' >$@ && \
>> +       diff -u $(basename $<).expect $@
>>
>> i find very hard to grasp :/
>> I have no idea what is going on here: `<$< |` ?
>
>Ya, that line-noise is an unfortunate combination of shell and
>Makefile gobbledygook. The `$<` is effectively the source file (the
>file being passed into chainlint.sed), and the rest of it is just
>normal shell. `<` is redirection (using the source file `$<` as
>stdin), and `|` is the pipe operator (sending the output of
>chainlint.sed to another `sed`), and capturing it all via shell `>`
>redirection in `$@` which is the Makefile variable for the target
>file.
>

Thanks, that explains it nicely. I'm not familiar enough with makefile 
syntax.

>Anyhow, although the commit message tries to sell this change as some
>sort of optimization, it's really in preparation for the new chainlint
>which wants to check all tests in all files with a single invocation
>rather than being invoked over and over and over. The self-test files
>also require more preprocessing to work with the new chainlint, so the
>implementation of `check-chainlint` gets rather more complex once the
>end state is reached. I'll think about it a bit, but at the moment,
>I'm still leaning toward this intermediate step as being beneficial to
>reaching the end state. However, my opinion could change since the way
>this is done here was probably influenced by an earlier iteration of
>the new chainlint, but now that the implementation of the new
>chainlint is concrete, it may not be especially important to do it
>this way.

I don't mind much either way and i tend to favor the efficient variant as 
well. On the other hand i can also see some bug mixing up the contents of 
these files producing a huge diff with no good indication for the developer 
of what has happened.

Eric Sunshine Dec. 13, 2021, 5:05 p.m. UTC | #7

On Mon, Dec 13, 2021 at 11:17 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Mon, Dec 13 2021, Eric Sunshine wrote:
> > On Mon, Dec 13, 2021 at 10:43 AM Fabian Stelzer <fs@gigacodes.de> wrote:
> >> I have no idea what is going on here: `<$< |` ?
> >
> > Ya, that line-noise is an unfortunate combination of shell and
> > Makefile gobbledygook. The `$<` is effectively the source file (the
> > file being passed into chainlint.sed), and the rest of it is just
> > normal shell. `<` is redirection (using the source file `$<` as
> > stdin), and `|` is the pipe operator (sending the output of
> > chainlint.sed to another `sed`), and capturing it all via shell `>`
> > redirection in `$@` which is the Makefile variable for the target
> > file.
>
> To add to that;
> https://www.gnu.org/software/make/manual/html_node/Rules.html#Rules and
> other relevant parts of the GNU make manual are very helpful here.

And the Makefile variables $< and $@, in particular, are documented here:
https://www.gnu.org/software/make/manual/html_node/Automatic-Variables.html

> I don't really care about the details of whether it's invoked once or N
> times, although I think the N times with proper dependencies tends to
> give you better error messages, but maybe you'll be changing it
> significantly enough that the current map between chainlint files and
> approximately what sort of thing they check won't be there anymore.
>
> In any case, I'd think that a rule that used $< now (i.e. 1=1 file->out
> prereq) would be better for the current state, and could just be changed
> to use one of $^ or $+ later.
>
> I.e. you can declare a "check.done" that depends on {1..10}.todo, and
> get a list of all of those {1..10}.todo files if one changes, or just
> the ones whose mtime is newer than a "check.done".
>
> The reason I looked at this to begin with is that it takes it ~100-150ms
> to run now, which adds up if you're e.g. using "make test T=<glob>" in
> "git rebase -i --exec".

Regarding this last point, one idea I strongly considered (and have
not rejected) is to stop making `check-chainlin` a dependency of
`test` and `prove`. Unlike most of the test suite, in which a change
to any part of the Git source code could potentially cause any test to
fail -- thus, it is important to run the full test suite for any
source code change -- the `check-chainlint` target is completely
isolated from everything else; it only checks whether `chainlint`
itself functions correctly. The only time it really makes sense to run
`check-chainlint` is when chainlint itself is changed in order to
verify that it still functions as expected. Considering how
infrequently (i.e. never) chainlint is modified, it seems somewhat
silly for every `make test` or `make prove` invoked by anybody
anywhere to repeatedly and forever validate chainlint[*]. Instead, it
could be the responsibility of the person modifying chainlint to run
the `check-chainlint` self-tests.

[*]: There is at least one exception. Various implementations of `sed`
could behave differently, thus impacting the behavior of
chainlint.sed. This is not just a theoretical concern. I did all the
development of this series on macOS, where everything worked as
intended. Shortly before sending the series to the list, I subjected
it to other platforms via CI and found that it failed on Linux due to
minor behavioral differences in `sed` on Linux (though, very oddly, it
worked just fine on Windows). I might not have caught this problem if
`check-chainlint` had not been run automatically by `make test`.

Eric Sunshine Dec. 13, 2021, 5:25 p.m. UTC | #8

On Mon, Dec 13, 2021 at 12:05 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
> On Mon, Dec 13, 2021 at 11:17 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
> > The reason I looked at this to begin with is that it takes it ~100-150ms
> > to run now, which adds up if you're e.g. using "make test T=<glob>" in
> > "git rebase -i --exec".
>
> Regarding this last point, one idea I strongly considered (and have
> not rejected) is to stop making `check-chainlin` a dependency of
> `test` and `prove`. [...]

Another less sledge-hammer approach would be to make t/Makefile
respect GIT_TEST_CHAIN_LINT so that it doesn't run `check-chainlint`
for `test` and `prove` when that variable is `0`. That would allow
your `git rebase -i --exec` case to avoid the wasted extra overhead of
`check-chainlint` (and chainlint in general).

Ævar Arnfjörð Bjarmason Dec. 13, 2021, 7:33 p.m. UTC | #9

On Mon, Dec 13 2021, Eric Sunshine wrote:

> On Mon, Dec 13, 2021 at 12:05 PM Eric Sunshine <sunshine@sunshineco.com> wrote:
>> On Mon, Dec 13, 2021 at 11:17 AM Ævar Arnfjörð Bjarmason
>> <avarab@gmail.com> wrote:
>> > The reason I looked at this to begin with is that it takes it ~100-150ms
>> > to run now, which adds up if you're e.g. using "make test T=<glob>" in
>> > "git rebase -i --exec".
>>
>> Regarding this last point, one idea I strongly considered (and have
>> not rejected) is to stop making `check-chainlin` a dependency of
>> `test` and `prove`. [...]
>
> Another less sledge-hammer approach would be to make t/Makefile
> respect GIT_TEST_CHAIN_LINT so that it doesn't run `check-chainlint`
> for `test` and `prove` when that variable is `0`. That would allow
> your `git rebase -i --exec` case to avoid the wasted extra overhead of
> `check-chainlint` (and chainlint in general).

Yes, but I just don't see why it's needed.

We need to build e.g. t/helpers/ to run the tests, and doing that is
probably always going to take eons of compilation times compared to
these assertions.

But it's a one-off eon because we declare the dependency DAG and don't
recompile them unless the sources or their dependencies change. Likewise
for these chainlint tests we can (and maybe should) make them optional,
but as long as we're not needlessly running them when no changes have
happened...

Eric Sunshine Dec. 13, 2021, 9:37 p.m. UTC | #10

On Mon, Dec 13, 2021 at 2:36 PM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Mon, Dec 13 2021, Eric Sunshine wrote:
> > Another less sledge-hammer approach would be to make t/Makefile
> > respect GIT_TEST_CHAIN_LINT so that it doesn't run `check-chainlint`
> > for `test` and `prove` when that variable is `0`. That would allow
> > your `git rebase -i --exec` case to avoid the wasted extra overhead of
> > `check-chainlint` (and chainlint in general).
>
> Yes, but I just don't see why it's needed.
>
> We need to build e.g. t/helpers/ to run the tests, and doing that is
> probably always going to take eons of compilation times compared to
> these assertions.
>
> But it's a one-off eon because we declare the dependency DAG and don't
> recompile them unless the sources or their dependencies change. Likewise
> for these chainlint tests we can (and maybe should) make them optional,
> but as long as we're not needlessly running them when no changes have
> happened...

That's a good point, and the same observation applies to
t/check-non-portable-shell.pl which is run unconditionally, as well,
whenever `test` and `prove` are run.

Ævar Arnfjörð Bjarmason Dec. 16, 2021, 1:17 p.m. UTC | #11

On Mon, Dec 13 2021, Fabian Stelzer wrote:

> On 13.12.2021 09:27, Eric Sunshine wrote:
>>On Mon, Dec 13, 2021 at 5:22 AM Fabian Stelzer <fs@gigacodes.de> wrote:
>>> On 13.12.2021 01:30, Eric Sunshine wrote:
>>> > check-chainlint:
>>> >+      sed -e '/^# LINT: /d' $(patsubst %,chainlint/%.test,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/tests && \
>>> >+      cat $(patsubst %,chainlint/%.expect,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/expect && \
>>> >+      $(CHAINLINT) '$(CHAINLINTTMP_SQ)'/tests >'$(CHAINLINTTMP_SQ)'/actual && \
>>> >+      diff -u '$(CHAINLINTTMP_SQ)'/expect '$(CHAINLINTTMP_SQ)'/actual
>>>
>>> If I read this right you are relying on the order of the .test & .expect
>>> files to match. I did something similar in a test prereq which resulted in a
>>> bug when setting the test_output_dir to something residing in /dev/shm,
>>> since the order of files in /dev/shm is reversed (at least on some
>>> platforms). Even though this should work as is I could see this leading to a
>>> similar bug in the future.
>>
>>It's not seen in the patch context, but earlier in the file we have:
>>
>>    CHAINLINTTESTS = $(sort $(...,$(wildcard chainlint/*.test)))
>>
>>which provides stability via `sort`, thus ensures that the order of
>>the ".test" and ".expect" match.
>>
>>I think that addresses your concern (unless I misunderstand your observation).
>
> Yes, thats what i meant. I didn't realize $CHAINLINTTESTS is already
> the sorted glob. Thanks for clarifying.
>
> Personally i find the initial for loop variant to be the most
> readable.  Ævars makefile targets could be very nice too, but
> especially:
>
> +$(BUILT_CHAINLINTTESTS): | .build/chainlint
> +$(BUILT_CHAINLINTTESTS): .build/%.actual: %
> +       $(CHAINLINT) <$< | \
> +	 sed -e '/^# LINT: /d' >$@ && \
> +       diff -u $(basename $<).expect $@
>
> i find very hard to grasp :/
> I have no idea what is going on here: `<$< |` ?

It's a minor point, and not relevant to this series proceeding.

But just FWIW I think both of you are wrong about the potenital for a
".test" and ".expect" bug here.

I.e. yes the CHAINLINTTESTS variable is sorted:

    CHAINLINTTESTS = $(sort $(...,$(wildcard chainlint/*.test)))

But in Eric's patch we just have this relevant to this concern of
(paraphrased) "would it not be sorted break it?":

	+	sed -e '/^# LINT: /d' $(patsubst %,chainlint/%.test,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/tests && \
	+	cat $(patsubst %,chainlint/%.expect,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/expect && \

So it doesn't matter if it's sorted our not.

I.e. we've got {A,B,C}.{test,expect} files in a directory, and we're
constructing a "A.test" and "A.expect" via "$(patsubst)".

So if it's "A B C", "C B A", "A C B" etc. won't matter. We'll always get
".test" files corresponding to ".expect".

If it's not sorted we'll get failure output in an unexpected order, but
it won't matter to whether we detect a failure or not.

Eric Sunshine Dec. 16, 2021, 3:47 p.m. UTC | #12

On Thu, Dec 16, 2021 at 8:22 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> > On 13.12.2021 09:27, Eric Sunshine wrote:
> >>It's not seen in the patch context, but earlier in the file we have:
> >>
> >>    CHAINLINTTESTS = $(sort $(...,$(wildcard chainlint/*.test)))
> >>
> >>which provides stability via `sort`, thus ensures that the order of
> >>the ".test" and ".expect" match.
> >>
> >>I think that addresses your concern (unless I misunderstand your observation).
>
> But just FWIW I think both of you are wrong about the potenital for a
> ".test" and ".expect" bug here.
>
> I.e. yes the CHAINLINTTESTS variable is sorted:
>
> But in Eric's patch we just have this relevant to this concern of
> (paraphrased) "would it not be sorted break it?":
>
>         +       sed -e '/^# LINT: /d' $(patsubst %,chainlint/%.test,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/tests && \
>         +       cat $(patsubst %,chainlint/%.expect,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/expect && \
>
> So it doesn't matter if it's sorted our not.
>
> I.e. we've got {A,B,C}.{test,expect} files in a directory, and we're
> constructing a "A.test" and "A.expect" via "$(patsubst)".
>
> So if it's "A B C", "C B A", "A C B" etc. won't matter. We'll always get
> ".test" files corresponding to ".expect".

Yes, sorry, I meant to say something along these lines in my reply, in
addition to mentioning `sort`, but forgot. Taking a look at this
again, though, makes me wonder if the CHAINLINTTESTS assignment should
be done with `:=` rather than `=` (unless GNU make is smart enough to
only invoke the `wildcard` operation only once, in which case it
wouldn't particularly matter).

Ævar Arnfjörð Bjarmason Dec. 16, 2021, 7:26 p.m. UTC | #13

On Thu, Dec 16 2021, Eric Sunshine wrote:

> On Thu, Dec 16, 2021 at 8:22 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> > On 13.12.2021 09:27, Eric Sunshine wrote:
>> >>It's not seen in the patch context, but earlier in the file we have:
>> >>
>> >>    CHAINLINTTESTS = $(sort $(...,$(wildcard chainlint/*.test)))
>> >>
>> >>which provides stability via `sort`, thus ensures that the order of
>> >>the ".test" and ".expect" match.
>> >>
>> >>I think that addresses your concern (unless I misunderstand your observation).
>>
>> But just FWIW I think both of you are wrong about the potenital for a
>> ".test" and ".expect" bug here.
>>
>> I.e. yes the CHAINLINTTESTS variable is sorted:
>>
>> But in Eric's patch we just have this relevant to this concern of
>> (paraphrased) "would it not be sorted break it?":
>>
>>         +       sed -e '/^# LINT: /d' $(patsubst %,chainlint/%.test,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/tests && \
>>         +       cat $(patsubst %,chainlint/%.expect,$(CHAINLINTTESTS)) >'$(CHAINLINTTMP_SQ)'/expect && \
>>
>> So it doesn't matter if it's sorted our not.
>>
>> I.e. we've got {A,B,C}.{test,expect} files in a directory, and we're
>> constructing a "A.test" and "A.expect" via "$(patsubst)".
>>
>> So if it's "A B C", "C B A", "A C B" etc. won't matter. We'll always get
>> ".test" files corresponding to ".expect".
>
> Yes, sorry, I meant to say something along these lines in my reply, in
> addition to mentioning `sort`, but forgot. Taking a look at this
> again, though, makes me wonder if the CHAINLINTTESTS assignment should
> be done with `:=` rather than `=` (unless GNU make is smart enough to
> only invoke the `wildcard` operation only once, in which case it
> wouldn't particularly matter).

It appears to be smart enough to do that, i.e. it'll invoke a $(shell)
assignment N times for -jN unless you use simply-expanded variables, but
not for a simple wildcard.

But I really don't think that'll matter, doing that I/O is trivial
compared to other things involved there.

If for some weird reson (e.g. hundreds of thousands of dependency files)
you find yourself trying to optimize make runs on this basis, this is
the wrong way to go about it.

Instead you'd have a rule depending on the contain directory, whose
mtime will be updated should anything there change. You shouldn't need
that trickery for anything the size of the git codebase, but that
approach will beat trying to optimize the O(n) lstat() calls you'll
otherwise be doing on the contents of the directory.

[05/15] t/Makefile: optimize chainlint self-test

Commit Message

Comments

Patch