diff mbox series

[v4,2/4] Makefile/coccicheck: speed up and fix bug with duplicate hunks

Message ID 3bca3239cb84488a1638f2bb269392f47f78c6da.1616414951.git.avarab@gmail.com (mailing list archive)
State New, archived
Headers show
Series Makefile/coccicheck: fix bugs and speed it up | expand

Commit Message

Ævar Arnfjörð Bjarmason March 22, 2021, 12:11 p.m. UTC
Change the coccicheck target to run on all of our *.c and *.h files
with --include-headers-for-types, instead of trusting it to find *.h
files and other includes to modify from its recursive walking of
includes as it has been doing with only --all-includes.

The --all-includes option introduced in a9a884aea57 (coccicheck: use
--all-includes by default, 2016-09-30) is needed because we have rules
that e.g. use the "type T" construct that wouldn't match unless we
scoured our headers for the definition of the relevant type.

But combining --all-includes it with processing N files at a time with
xargs as we've done since 960154b9c17 (coccicheck: optionally batch
spatch invocations, 2019-05-06) introduced a subtle bug with duplicate
hunks being written to the generated *.patch files. I did not dig down
to the root cause of that, but I think it's due to spatch doing (and
failing to do) some magical locking/racy mtime checking to decide if
it emits a hunk. See [1] for a way to reproduce the issue, and a
discussion of it.

This change speeds up the runtime of "make -j8 coccicheck" from ~1m50s
to ~1m20s for me. See [2] for more timings.

We could also use --no-includes for a runtime of ~55s, but that would
produce broken patches (we miss some hunks) in cases where we need the
type or other definitions from included files.

If anyone cares there's probably an optimization opportunity here to
e.g. detect that the whole file doesn't need to consider includes,
since the rules would be unambiguous without considering them.

Note on portability: The --include-headers-for-types option is not in
my "man spatch", but it's been part of spatch since 2014. See its
fe3a327a (include headers for types option, 2014-07-27), it was first
released with version 1.0.0 of spatch, released in April 2015. The
spatch developers are just inconsistent about updating their TeX docs
and manpages at the same time.

1. https://lore.kernel.org/git/20210305170724.23859-3-avarab@gmail.com/
2. https://lore.kernel.org/git/20210306192525.15197-1-avarab@gmail.com/

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

René Scharfe March 22, 2021, 6:05 p.m. UTC | #1
Am 22.03.21 um 13:11 schrieb Ævar Arnfjörð Bjarmason:
> Change the coccicheck target to run on all of our *.c and *.h files
> with --include-headers-for-types, instead of trusting it to find *.h
> files and other includes to modify from its recursive walking of
> includes as it has been doing with only --all-includes.
>
> The --all-includes option introduced in a9a884aea57 (coccicheck: use
> --all-includes by default, 2016-09-30) is needed because we have rules
> that e.g. use the "type T" construct that wouldn't match unless we
> scoured our headers for the definition of the relevant type.
>
> But combining --all-includes it with processing N files at a time with
> xargs as we've done since 960154b9c17 (coccicheck: optionally batch
> spatch invocations, 2019-05-06) introduced a subtle bug with duplicate
> hunks being written to the generated *.patch files. I did not dig down
> to the root cause of that, but I think it's due to spatch doing (and
> failing to do) some magical locking/racy mtime checking to decide if
> it emits a hunk. See [1] for a way to reproduce the issue, and a
> discussion of it.
>
> This change speeds up the runtime of "make -j8 coccicheck" from ~1m50s
> to ~1m20s for me. See [2] for more timings.

Without this patch, /usr/bin/time -l reports:

       95.08 real       598.29 user        32.62 sys
           103727104  maximum resident set size

With --include-headers-for-types, I get this:

       76.37 real       483.83 user        26.77 sys
            97107968  maximum resident set size

This is similar to your numbers.  And adding that option to the demo
script in your [1] gives consistent results for all xargs -n values,
so that option alone fixes the subtle bug you refer to above, right?

However, with the second part of this patch also applied (adding %.h
to FOUND_C_SOURCES), I get this:

       90.38 real       558.38 user        38.32 sys
           100073472  maximum resident set size

Is this still necessary?

>
> We could also use --no-includes for a runtime of ~55s, but that would
> produce broken patches (we miss some hunks) in cases where we need the
> type or other definitions from included files.
>
> If anyone cares there's probably an optimization opportunity here to
> e.g. detect that the whole file doesn't need to consider includes,
> since the rules would be unambiguous without considering them.
>
> Note on portability: The --include-headers-for-types option is not in
> my "man spatch", but it's been part of spatch since 2014. See its
> fe3a327a (include headers for types option, 2014-07-27), it was first
> released with version 1.0.0 of spatch, released in April 2015. The
> spatch developers are just inconsistent about updating their TeX docs
> and manpages at the same time.
>
> 1. https://lore.kernel.org/git/20210305170724.23859-3-avarab@gmail.com/
> 2. https://lore.kernel.org/git/20210306192525.15197-1-avarab@gmail.com/
>
> Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
> ---
>  Makefile | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index eef99b4705d..e43a9618df5 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1199,7 +1199,8 @@ SPARSE_FLAGS ?=
>  SP_EXTRA_FLAGS = -Wno-universal-initializer
>
>  # For the 'coccicheck' target
> -SPATCH_FLAGS = --all-includes --patch .
> +SPATCH_FLAGS = --all-includes --include-headers-for-types --patch .
> +
>  # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will
>  # usually result in less CPU usage at the cost of higher peak memory.
>  # Setting it to 0 will feed all files in a single spatch invocation.
> @@ -2860,7 +2861,7 @@ check: config-list.h command-list.h
>  		exit 1; \
>  	fi
>
> -FOUND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES)))
> +FOUND_C_SOURCES = $(filter %.c %.h,$(shell $(FIND_SOURCE_FILES)))
>  COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FOUND_C_SOURCES))
>
>  %.cocci.patch: %.cocci $(COCCI_SOURCES)
>
Junio C Hamano March 22, 2021, 7:09 p.m. UTC | #2
Ævar Arnfjörð Bjarmason  <avarab@gmail.com> writes:

> Change the coccicheck target to run on all of our *.c and *.h files
> with --include-headers-for-types, instead of trusting it to find *.h
> files and other includes to modify from its recursive walking of
> includes as it has been doing with only --all-includes.

Meaning '--all-includes' that is fed a C source would use all the
headers included (recursively) in it, but if we add the other
option, --include-headers-for-types, some *.h files are missed?

If so, the above explains both hunks in the patch well (although it
is unclear where that need to include *.h independently comes from,
e.g. if it is working around a bug in spatch that we may expect for
it to be fixed someday).

> diff --git a/Makefile b/Makefile
> index eef99b4705d..e43a9618df5 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1199,7 +1199,8 @@ SPARSE_FLAGS ?=
>  SP_EXTRA_FLAGS = -Wno-universal-initializer
>  
>  # For the 'coccicheck' target
> -SPATCH_FLAGS = --all-includes --patch .
> +SPATCH_FLAGS = --all-includes --include-headers-for-types --patch .
> +
>  # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will
>  # usually result in less CPU usage at the cost of higher peak memory.
>  # Setting it to 0 will feed all files in a single spatch invocation.
> @@ -2860,7 +2861,7 @@ check: config-list.h command-list.h
>  		exit 1; \
>  	fi
>  
> -FOUND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES)))
> +FOUND_C_SOURCES = $(filter %.c %.h,$(shell $(FIND_SOURCE_FILES)))
>  COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FOUND_C_SOURCES))
>  
>  %.cocci.patch: %.cocci $(COCCI_SOURCES)
Jeff King March 24, 2021, 7:19 p.m. UTC | #3
On Mon, Mar 22, 2021 at 07:05:06PM +0100, René Scharfe. wrote:

> > This change speeds up the runtime of "make -j8 coccicheck" from ~1m50s
> > to ~1m20s for me. See [2] for more timings.
> 
> Without this patch, /usr/bin/time -l reports:
> 
>        95.08 real       598.29 user        32.62 sys
>            103727104  maximum resident set size
> 
> With --include-headers-for-types, I get this:
> 
>        76.37 real       483.83 user        26.77 sys
>             97107968  maximum resident set size
> 
> This is similar to your numbers.  And adding that option to the demo
> script in your [1] gives consistent results for all xargs -n values,
> so that option alone fixes the subtle bug you refer to above, right?
> 
> However, with the second part of this patch also applied (adding %.h
> to FOUND_C_SOURCES), I get this:
> 
>        90.38 real       558.38 user        38.32 sys
>            100073472  maximum resident set size
> 
> Is this still necessary?

If I understand what the spatch options are doing (and I'm not at all
sure that I do), then with --include-headers-for-types, aren't we
avoiding looking for transformations in the included files?

In which case not adding *.h to the list of files we pass means that
we'd fail to find instances in the header files that need to be
mentioned in the output.

-Peff
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index eef99b4705d..e43a9618df5 100644
--- a/Makefile
+++ b/Makefile
@@ -1199,7 +1199,8 @@  SPARSE_FLAGS ?=
 SP_EXTRA_FLAGS = -Wno-universal-initializer
 
 # For the 'coccicheck' target
-SPATCH_FLAGS = --all-includes --patch .
+SPATCH_FLAGS = --all-includes --include-headers-for-types --patch .
+
 # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will
 # usually result in less CPU usage at the cost of higher peak memory.
 # Setting it to 0 will feed all files in a single spatch invocation.
@@ -2860,7 +2861,7 @@  check: config-list.h command-list.h
 		exit 1; \
 	fi
 
-FOUND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES)))
+FOUND_C_SOURCES = $(filter %.c %.h,$(shell $(FIND_SOURCE_FILES)))
 COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FOUND_C_SOURCES))
 
 %.cocci.patch: %.cocci $(COCCI_SOURCES)