diff mbox series

[RFC] t/Makefile: use dependency graph for "check-chainlint"

Message ID RFC-patch-1.1-bb3f1577829-20211213T095456Z-avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series [RFC] t/Makefile: use dependency graph for "check-chainlint" | expand

Commit Message

Ævar Arnfjörð Bjarmason Dec. 13, 2021, 10:09 a.m. UTC
This gets rid of this fixed cost when running "make T=<name>":

$ git hyperfine -L rev HEAD~1,HEAD~0 -s 'make -C t check-chainlint' 'make -C t check-chainlint'
Benchmark 1: make -C t check-chainlint' in 'HEAD~1
  Time (mean ± σ):     111.3 ms ±   0.5 ms    [User: 116.9 ms, System: 26.2 ms]
  Range (min … max):   110.5 ms … 112.5 ms    26 runs

Benchmark 2: make -C t check-chainlint' in 'HEAD~0
  Time (mean ± σ):      12.5 ms ±   0.2 ms    [User: 11.5 ms, System: 1.0 ms]
  Range (min … max):    12.1 ms …  13.2 ms    223 runs

Summary
  'make -C t check-chainlint' in 'HEAD~0' ran
    8.92 ± 0.16 times faster than 'make -C t check-chainlint' in 'HEAD~1'
---
On Mon, Dec 13 2021, Eric Sunshine wrote:

> Rather than running `chainlint` and `diff` once per self-test -- which
> may become expensive as more tests are added -- instead run `chainlint`
> a single time over all tests bodies collectively and compare the result
> to the collective "expected" output.

I think that "optimizing" things like this is an anti-pattern. I.e. we
have N chainlint test files, and N potential outputs from that (ok or
not, and with/without error). If one of the chainlint tests changes
we'd like to re-run it, if not we can re-use an earlier run.

This is something make's dependency logic is perfectly suited for, and
will be faster than any optimization of turning a for-loop into a
"sed" command we run every time, since we'll only need to "stat" a few
things to see that there's nothing to do.

I've had the below as part of my local build for a while. It
conflicts/would be improved by my in-flight ab/make-dependency, which
you also ran into conflicts with.

This also has the advantage that you'll see what specific test failed
from the Makefile output. Aside from the ".build" rule needing to be
fixed up to use the "mkdir" trick in ab/make-dependency it should also
use $(QUIET_GEN) or whatever.

I'm not quite happy with the below, the wildcard/patsubst is a bit
messy, and it doesn't properly work around "make test" and how it runs
"clean" after its run, but you'll find that "make chainlint" will
properly cache things with this.

 t/.gitignore |  1 +
 t/Makefile   | 29 +++++++++++++++--------------
 2 files changed, 16 insertions(+), 14 deletions(-)

Comments

Eric Sunshine Dec. 14, 2021, 7:44 a.m. UTC | #1
On Mon, Dec 13, 2021 at 5:09 AM Ævar Arnfjörð Bjarmason
<avarab@gmail.com> wrote:
> On Mon, Dec 13 2021, Eric Sunshine wrote:
> > Rather than running `chainlint` and `diff` once per self-test -- which
> > may become expensive as more tests are added -- instead run `chainlint`
> > a single time over all tests bodies collectively and compare the result
> > to the collective "expected" output.
>
> I think that "optimizing" things like this is an anti-pattern. I.e. we
> have N chainlint test files, and N potential outputs from that (ok or
> not, and with/without error). If one of the chainlint tests changes
> we'd like to re-run it, if not we can re-use an earlier run.

As mentioned in a reply elsewhere, the commit message sells this as an
optimization, but that's not the real reason for the change, which is
that the rewritten `check-chainlint` target for the upcoming new
chainlint really wants to have a composite file of the "test" input
and a composite of the "expect" output. I didn't know how to sell that
change in this preparatory patch series, so I weakly framed it as an
optimization. The reason for making this a preparatory step is that it
makes for a less noisy patch later on when the new chainlint is
actually plugged into the `check-chainlint` target. At least, it was
less noisy originally... in the final implementation, I think it ends
up being noisy anyhow. So, maybe it makes sense to drop this patch
altogether(?).

> This is something make's dependency logic is perfectly suited for, and
> will be faster than any optimization of turning a for-loop into a
> "sed" command we run every time, since we'll only need to "stat" a few
> things to see that there's nothing to do.
>
> +BUILT_CHAINLINTTESTS = $(patsubst %,.build/%.actual,$(CHAINLINTTESTS))
> +
> +.build/chainlint:
> +       mkdir -p $@
> +
> +$(BUILT_CHAINLINTTESTS): | .build/chainlint
> +$(BUILT_CHAINLINTTESTS): .build/%.actual: %
> +       $(CHAINLINT) <$< | \
> +       sed -e '/^# LINT: /d' >$@ && \
> +       diff -u $(basename $<).expect $@
> +
> +check-chainlint: $(BUILT_CHAINLINTTESTS)

This sort of optimization makes sense (I think) even with the new
chainlint preferring to see composite "test" and "expect" files rather
than the individual files. The individual files would be prerequisites
of the composite files, thus the composites would only be regenerated
if the individual files change. And the composite files would be
prerequisites of the `check-chainlint` target, so it would only run if
the composite files change (or if chainlint itself changes).

In fact, with the new chainlint checking all tests in all scripts at
once, this technique should apply nicely to it, as well, since the
names of test scripts (t????-*.sh) are fed to it as command-line
arguments. Thus, the t????-*.sh files could be prerequisites of the
chainlint rule which would use $? to check only test scripts which
have changed since the previous run.

Having said all that, I don't think think the changes in this series
or the upcoming new chainlint series make the situation any worse
(Makefile-wise) than its current state, and the sort of optimizations
discussed here can easily be made after those series land. (And, as my
Git time is rather limited these days, I'd really like to focus on
getting the core components landed.)
Ævar Arnfjörð Bjarmason Dec. 14, 2021, 12:34 p.m. UTC | #2
On Tue, Dec 14 2021, Eric Sunshine wrote:

> On Mon, Dec 13, 2021 at 5:09 AM Ævar Arnfjörð Bjarmason
> <avarab@gmail.com> wrote:
>> On Mon, Dec 13 2021, Eric Sunshine wrote:
>> > Rather than running `chainlint` and `diff` once per self-test -- which
>> > may become expensive as more tests are added -- instead run `chainlint`
>> > a single time over all tests bodies collectively and compare the result
>> > to the collective "expected" output.
>>
>> I think that "optimizing" things like this is an anti-pattern. I.e. we
>> have N chainlint test files, and N potential outputs from that (ok or
>> not, and with/without error). If one of the chainlint tests changes
>> we'd like to re-run it, if not we can re-use an earlier run.
>
> As mentioned in a reply elsewhere, the commit message sells this as an
> optimization, but that's not the real reason for the change, which is
> that the rewritten `check-chainlint` target for the upcoming new
> chainlint really wants to have a composite file of the "test" input
> and a composite of the "expect" output. I didn't know how to sell that
> change in this preparatory patch series, so I weakly framed it as an
> optimization. The reason for making this a preparatory step is that it
> makes for a less noisy patch later on when the new chainlint is
> actually plugged into the `check-chainlint` target. At least, it was
> less noisy originally... in the final implementation, I think it ends
> up being noisy anyhow. So, maybe it makes sense to drop this patch
> altogether(?).

I think you should do whatever you think makes sense here, I was just
pointing out that alternate Makefile approach in case it was helpful.

>> This is something make's dependency logic is perfectly suited for, and
>> will be faster than any optimization of turning a for-loop into a
>> "sed" command we run every time, since we'll only need to "stat" a few
>> things to see that there's nothing to do.
>>
>> +BUILT_CHAINLINTTESTS = $(patsubst %,.build/%.actual,$(CHAINLINTTESTS))
>> +
>> +.build/chainlint:
>> +       mkdir -p $@
>> +
>> +$(BUILT_CHAINLINTTESTS): | .build/chainlint
>> +$(BUILT_CHAINLINTTESTS): .build/%.actual: %
>> +       $(CHAINLINT) <$< | \
>> +       sed -e '/^# LINT: /d' >$@ && \
>> +       diff -u $(basename $<).expect $@
>> +
>> +check-chainlint: $(BUILT_CHAINLINTTESTS)
>
> This sort of optimization makes sense (I think) even with the new
> chainlint preferring to see composite "test" and "expect" files rather
> than the individual files. The individual files would be prerequisites
> of the composite files, thus the composites would only be regenerated
> if the individual files change. And the composite files would be
> prerequisites of the `check-chainlint` target, so it would only run if
> the composite files change (or if chainlint itself changes).
>
> In fact, with the new chainlint checking all tests in all scripts at
> once, this technique should apply nicely to it, as well, since the
> names of test scripts (t????-*.sh) are fed to it as command-line
> arguments. Thus, the t????-*.sh files could be prerequisites of the
> chainlint rule which would use $? to check only test scripts which
> have changed since the previous run.

*nod*

> Having said all that, I don't think think the changes in this series
> or the upcoming new chainlint series make the situation any worse
> (Makefile-wise) than its current state, and the sort of optimizations
> discussed here can easily be made after those series land. (And, as my
> Git time is rather limited these days, I'd really like to focus on
> getting the core components landed.)

Sounds good to me. We have plenty of these "doesn't have deps" targets
(e.g. the check shell syntax one you noted), we can just fix/change them
some other time.
diff mbox series

Patch

diff --git a/t/.gitignore b/t/.gitignore
index 91cf5772fe5..f142d6d42fd 100644
--- a/t/.gitignore
+++ b/t/.gitignore
@@ -1,3 +1,4 @@ 
+/.build/
 /trash directory*
 /test-results
 /.prove
diff --git a/t/Makefile b/t/Makefile
index 882d26eee30..9bac0e683c9 100644
--- a/t/Makefile
+++ b/t/Makefile
@@ -18,10 +18,8 @@  TEST_LINT ?= test-lint
 
 ifdef TEST_OUTPUT_DIRECTORY
 TEST_RESULTS_DIRECTORY = $(TEST_OUTPUT_DIRECTORY)/test-results
-CHAINLINTTMP = $(TEST_OUTPUT_DIRECTORY)/chainlinttmp
 else
 TEST_RESULTS_DIRECTORY = test-results
-CHAINLINTTMP = chainlinttmp
 endif
 
 # Shell quote;
@@ -29,13 +27,12 @@  SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))
 TEST_SHELL_PATH_SQ = $(subst ','\'',$(TEST_SHELL_PATH))
 PERL_PATH_SQ = $(subst ','\'',$(PERL_PATH))
 TEST_RESULTS_DIRECTORY_SQ = $(subst ','\'',$(TEST_RESULTS_DIRECTORY))
-CHAINLINTTMP_SQ = $(subst ','\'',$(CHAINLINTTMP))
 
 T = $(sort $(wildcard t[0-9][0-9][0-9][0-9]-*.sh))
 TGITWEB = $(sort $(wildcard t95[0-9][0-9]-*.sh))
 THELPERS = $(sort $(filter-out $(T),$(wildcard *.sh)))
 TPERF = $(sort $(wildcard perf/p[0-9][0-9][0-9][0-9]-*.sh))
-CHAINLINTTESTS = $(sort $(patsubst chainlint/%.test,%,$(wildcard chainlint/*.test)))
+CHAINLINTTESTS = $(wildcard chainlint/*.test)
 CHAINLINT = sed -f chainlint.sed
 
 all: $(DEFAULT_TEST_TARGET)
@@ -67,16 +64,20 @@  clean: clean-except-prove-cache
 	$(RM) .prove
 
 clean-chainlint:
-	$(RM) -r '$(CHAINLINTTMP_SQ)'
-
-check-chainlint:
-	@mkdir -p '$(CHAINLINTTMP_SQ)' && \
-	err=0 && \
-	for i in $(CHAINLINTTESTS); do \
-		$(CHAINLINT) <chainlint/$$i.test | \
-		sed -e '/^# LINT: /d' >'$(CHAINLINTTMP_SQ)'/$$i.actual && \
-		diff -u chainlint/$$i.expect '$(CHAINLINTTMP_SQ)'/$$i.actual || err=1; \
-	done && exit $$err
+	$(RM) -r .build/chainlint
+
+BUILT_CHAINLINTTESTS = $(patsubst %,.build/%.actual,$(CHAINLINTTESTS))
+
+.build/chainlint:
+	mkdir -p $@
+
+$(BUILT_CHAINLINTTESTS): | .build/chainlint
+$(BUILT_CHAINLINTTESTS): .build/%.actual: %
+	$(CHAINLINT) <$< | \
+	sed -e '/^# LINT: /d' >$@ && \
+	diff -u $(basename $<).expect $@
+
+check-chainlint: $(BUILT_CHAINLINTTESTS)
 
 test-lint: test-lint-duplicates test-lint-executable test-lint-shell-syntax \
 	test-lint-filenames