Message ID | 20210305170724.23859-3-avarab@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Makefile: fix bugs in coccicheck and speed it up | expand |
On Fri, Mar 05, 2021 at 06:07:22PM +0100, Ævar Arnfjörð Bjarmason wrote: > Using --no-includes also fixes a subtle bug introduced in > 960154b9c17 (coccicheck: optionally batch spatch invocations, > 2019-05-06) with duplicate hunks being written to the > generated *.patch files. > > This is because that change altered a file-at-a-time for-loop to an > invocation of "xargs -n X". This would not matter for most other > programs, but it matters for spatch. > > This is because each spatch invocation will maintain shared lock files > in /tmp, check if files being parsed were changed etc. I haven't dug > into why exactly, but it's easy to reproduce the issue[2]. The issue > goes away entirely if we just use --no-includes, which as noted above > would have made sense even without that issue. This part still doesn't make any sense to me. If we are running with SPATCH_BATCH_SIZE=1, which is the default, then "xargs -n" is still going to run it in file-at-a-time mode. From spatch's perspective, there's no difference between a for-loop and "xargs -n 1" (unless it somehow cares about stdin, but it shouldn't). Using strace, I do see it creating files in /tmp, but they are all named after the process id, and cleaned up before exit. So I don't see how they could interfere with each other (certainly not in a sequential run, but even if you were to use "xargs -P" to get parallel runs, they seem distinct). If we increase the batch size, I'd expect _fewer_ duplicates. Because in file-at-a-time mode with --all-includes, wouldn't every file that mentions an include possibly end up emitting a patch for it? The results you show here (which do replicate for me) imply something much weirder is going on: > with xargs -n 1: > 1 +++ b/convert.c > 1 +++ b/strbuf.c > with xargs -n 2: > 1 +++ b/convert.c > 1 +++ b/strbuf.c > with xargs -n 4: > 1 +++ b/convert.c > 1 +++ b/strbuf.c These results are wrong! They are not finding the entry in strbuf.h that should be changed. > with xargs -n 16: > 1 +++ b/convert.c > 1 +++ b/strbuf.c > 2 +++ b/strbuf.h > with xargs -n 64: > 1 +++ b/convert.c > 1 +++ b/strbuf.c > 2 +++ b/strbuf.h > with xargs -n 128: > 1 +++ b/convert.c > 1 +++ b/strbuf.c > 2 +++ b/strbuf.h These ones are also wrong. Now we find the strbuf.h mention, but we are finding it twice. > with xargs -n 512: > 1 +++ b/convert.c > 1 +++ b/strbuf.c > 1 +++ b/strbuf.h And this one, which is given all of the paths in one invocation gets it right. I'd expect that over the "128" version, which is running two independent spatch invocations. But the fact that "64" and "16" produce exactly two duplicates makes little sense. And even less that 1, 2, and 4 don't find the header change at all. Running the same test with a for loop produces the same (wrong) results as "-n 1", as expected: $ for i in *.c; do spatch --sp-file test.cocci --all-includes --patch . $i 2>/dev/null done | grep -F +++ | sort | uniq -c 1 +++ b/convert.c 1 +++ b/strbuf.c So in short I have no idea what spatch is doing. But I remain unconvinced that there is anything wrong with the batch-size patch. Running it like your patch will, feeding the header directly along with --no-includes, does find the correct result: $ for i in *.c *.h; do spatch --sp-file test.cocci --no-includes --patch . $i 2>/dev/null done | grep -F +++ | sort | uniq -c 1 +++ b/convert.c 1 +++ b/strbuf.c 1 +++ b/strbuf.h though I am still concerned by René's example which is missing more results. -Peff
On Sat, Mar 06 2021, Jeff King wrote: > On Fri, Mar 05, 2021 at 06:07:22PM +0100, Ævar Arnfjörð Bjarmason wrote: > >> Using --no-includes also fixes a subtle bug introduced in >> 960154b9c17 (coccicheck: optionally batch spatch invocations, >> 2019-05-06) with duplicate hunks being written to the >> generated *.patch files. >> >> This is because that change altered a file-at-a-time for-loop to an >> invocation of "xargs -n X". This would not matter for most other >> programs, but it matters for spatch. >> >> This is because each spatch invocation will maintain shared lock files >> in /tmp, check if files being parsed were changed etc. I haven't dug >> into why exactly, but it's easy to reproduce the issue[2]. The issue >> goes away entirely if we just use --no-includes, which as noted above >> would have made sense even without that issue. > > This part still doesn't make any sense to me. If we are running with > SPATCH_BATCH_SIZE=1, which is the default, then "xargs -n" is still > going to run it in file-at-a-time mode. From spatch's perspective, > there's no difference between a for-loop and "xargs -n 1" (unless it > somehow cares about stdin, but it shouldn't). > > Using strace, I do see it creating files in /tmp, but they are all named > after the process id, and cleaned up before exit. So I don't see how > they could interfere with each other (certainly not in a sequential run, > but even if you were to use "xargs -P" to get parallel runs, they seem > distinct). > > If we increase the batch size, I'd expect _fewer_ duplicates. Because in > file-at-a-time mode with --all-includes, wouldn't every file that > mentions an include possibly end up emitting a patch for it? > > The results you show here (which do replicate for me) imply something > much weirder is going on: > >> with xargs -n 1: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> with xargs -n 2: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> with xargs -n 4: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c > > These results are wrong! They are not finding the entry in strbuf.h that > should be changed. > >> with xargs -n 16: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> 2 +++ b/strbuf.h >> with xargs -n 64: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> 2 +++ b/strbuf.h >> with xargs -n 128: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> 2 +++ b/strbuf.h > > These ones are also wrong. Now we find the strbuf.h mention, but we are > finding it twice. > >> with xargs -n 512: >> 1 +++ b/convert.c >> 1 +++ b/strbuf.c >> 1 +++ b/strbuf.h > > And this one, which is given all of the paths in one invocation gets it > right. I'd expect that over the "128" version, which is running two > independent spatch invocations. But the fact that "64" and "16" produce > exactly two duplicates makes little sense. And even less that 1, 2, and > 4 don't find the header change at all. Yes, I don't know what's going on with spatch. Just the observed results of duplication before/after youc commit. > Running the same test with a for loop produces the same (wrong) results > as "-n 1", as expected: > > $ for i in *.c; do > spatch --sp-file test.cocci --all-includes --patch . $i 2>/dev/null > done | grep -F +++ | sort | uniq -c > 1 +++ b/convert.c > 1 +++ b/strbuf.c > > So in short I have no idea what spatch is doing. But I remain > unconvinced that there is anything wrong with the batch-size patch. Right, but that's an unrelated "didn't change this thing it should have" bug (which we had before), not the duplicate hunk bug. > Running it like your patch will, feeding the header directly along with > --no-includes, does find the correct result: > > $ for i in *.c *.h; do > spatch --sp-file test.cocci --no-includes --patch . $i 2>/dev/null > done | grep -F +++ | sort | uniq -c > 1 +++ b/convert.c > 1 +++ b/strbuf.c > 1 +++ b/strbuf.h > > though I am still concerned by René's example which is missing more > results. *nod*. I just sent a patch on top to fix that, although it undoes most of the speedup in this series.
diff --git a/Makefile b/Makefile index f881b558c44..798a0517131 100644 --- a/Makefile +++ b/Makefile @@ -1196,7 +1196,8 @@ SPARSE_FLAGS ?= SP_EXTRA_FLAGS = -Wno-universal-initializer # For the 'coccicheck' target -SPATCH_FLAGS = --all-includes --patch . +SPATCH_FLAGS = --no-includes --patch . + # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will # usually result in less CPU usage at the cost of higher peak memory. # Setting it to 0 will feed all files in a single spatch invocation. @@ -2853,7 +2854,7 @@ check: config-list.h command-list.h exit 1; \ fi -FOUND_C_SOURCES = $(filter %.c,$(shell $(FIND_SOURCE_FILES))) +FOUND_C_SOURCES = $(filter %.c %.h,$(shell $(FIND_SOURCE_FILES))) COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FOUND_C_SOURCES)) %.cocci.patch: %.cocci $(COCCI_SOURCES)
Change the coccicheck target to run on all of our *.c and *.h files with --no-includes, instead of only on the *.c files with --all-includes. This speeds it up significantly and reduces its memory usage, since it doesn't need to parse N includes for every file it visits. See [1] for a discussion thread about this commit with some timings for details, but briefly: This change speeds it up by ~2x and makes it use much less memory. Or a reduction of a max of around ~2GB per-process (under the old SPATCH_BATCH_SIZE=0) to around ~200MB. Looking at the history of the coccicheck target I think we've always been running it in the wrong mode for what we wanted to achieve: - When it was added in 63f0a758a06 (add coccicheck make target, 2016-09-15) it explicitly processed only the %.c files. - Then in a9a884aea57 (coccicheck: use --all-includes by default, 2016-09-30) it started processing the %.h files by looking around for its own includes. Let's instead just point it to both our *.c and *.h files, then there's no need to have it recursively look around for included files to change. Setting --no-includes would not work if we expected to set COCCI_SOURCES to a subset of our source files, but that's not what we're doing here. If anyone manually tweaks COCCI_SOURCES they'll now need to tweak SPATCH_FLAGS too. The speed and correctness we gain is worth that small trade-off. Using --no-includes also fixes a subtle bug introduced in 960154b9c17 (coccicheck: optionally batch spatch invocations, 2019-05-06) with duplicate hunks being written to the generated *.patch files. This is because that change altered a file-at-a-time for-loop to an invocation of "xargs -n X". This would not matter for most other programs, but it matters for spatch. This is because each spatch invocation will maintain shared lock files in /tmp, check if files being parsed were changed etc. I haven't dug into why exactly, but it's easy to reproduce the issue[2]. The issue goes away entirely if we just use --no-includes, which as noted above would have made sense even without that issue. 1. https://lore.kernel.org/git/20210302205103.12230-1-avarab@gmail.com/ 2. A session showing racy spatch under xargs -n X: $ cat test.cocci @@ expression E1; @@ - strbuf_avail(E1) + strbuf_has(E1) $ for i in 1 2 4 16 64 128 512 do echo with xargs -n $i: && echo *.c | xargs -n $i spatch --sp-file \ test.cocci --all-includes --patch . 2>/dev/null | \ grep -F +++ | sort | uniq -c done with xargs -n 1: 1 +++ b/convert.c 1 +++ b/strbuf.c with xargs -n 2: 1 +++ b/convert.c 1 +++ b/strbuf.c with xargs -n 4: 1 +++ b/convert.c 1 +++ b/strbuf.c with xargs -n 16: 1 +++ b/convert.c 1 +++ b/strbuf.c 2 +++ b/strbuf.h with xargs -n 64: 1 +++ b/convert.c 1 +++ b/strbuf.c 2 +++ b/strbuf.h with xargs -n 128: 1 +++ b/convert.c 1 +++ b/strbuf.c 2 +++ b/strbuf.h with xargs -n 512: 1 +++ b/convert.c 1 +++ b/strbuf.c 1 +++ b/strbuf.h Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com> --- Makefile | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)