
coccicheck: optionally process every source file at once

Message ID 20190506051148.GB30003@sigill.intra.peff.net (mailing list archive)

Commit Message

Jeff King May 6, 2019, 5:11 a.m. UTC
On Sun, May 05, 2019 at 11:08:35AM -0700, Jacob Keller wrote:

> > Jacob Keller's patches to collapse this to a single invocation do fix it
> > (and I've used them selectively in the past rather than cleaning up the
> > resulting patch manually ;) ).
> >
> > Sort of tangential to your point, I guess, but it might be a topic to
> > revisit.
> 
> Yea, my patches are much faster. However, they trade off a huge
> increase in memory consumption, and from what I remember, the failure
> if you don't have enough RAM was not a good experience.

I think there was a suggestion to make it conditional. So maybe
something like this?

-- >8 --
From: Jacob Keller <jacob.keller@gmail.com>
Subject: [PATCH] coccicheck: optionally process every source file at once

"make coccicheck" applies the Coccinelle semantic patches found within
contrib/coccinelle/ to see whether any of their transformations can be
applied to the current code base.

Provide an option to pass every file to a single invocation of spatch,
instead of running spatch once per source file.

This significantly reduces the time required to run make coccicheck:

Prior timing of make coccicheck
  real    6m14.090s
  user    25m2.606s
  sys     1m22.919s

New timing of make coccicheck
  real    1m36.580s
  user    7m55.933s
  sys     0m18.219s

This is nearly a 4x decrease in the time required to run make
coccicheck. This is due to the overhead of restarting spatch for every
file. By processing all files at once, we can amortize this startup cost
across the total number of files, rather than paying it once per file.

However, it comes at a cost. The RSS of each spatch process goes from
~50MB to ~1500MB (and peak memory usage may be even higher if make runs
multiple rule files in parallel due to "-j"). That's enough to make some
systems (like our Travis build!) fail the whole process, or could make
things slower due to swapping. So let's make the new behavior optional,
and people with a lot of memory can choose to use it.

[peff: modified Jacob's patch to make the behavior optional, since
 people reported complications due to the memory use]

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
Signed-off-by: Jeff King <peff@peff.net>
---
The conditional here is implemented in shell. I couldn't find a way to
do it readably with a make ifdef, since we're in the middle of a
backslash-connected shell command in the recipe. And trying to use $(if)
just turned into a mess.

But maybe some gnu make hotshot wants to show me how it's done. ;)
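For reference, here is a rough, untested sketch of what the $(if) route
might look like, assuming both branches stay free of literal commas
(which $(if) would otherwise swallow, needing a $(comma) variable
workaround); it is arguably no more readable than the shell conditional:

```make
# Hypothetical, untested sketch: choose the whole command at make
# expansion time with $(if). Any literal comma in either branch would
# be eaten by $(if) itself, which is part of why this gets messy.
%.cocci.patch: %.cocci $(COCCI_SOURCES)
	@echo '    ' SPATCH $<; \
	$(if $(USE_SINGLE_SPATCH), \
		$(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS), \
		for f in $(COCCI_SOURCES); do \
			$(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || exit; \
		done) >$@+ 2>$@.log; \
	ret=$$?  # error handling and the $@+ rename as in the real recipe
```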

 Makefile | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

Comments

Duy Nguyen May 6, 2019, 9:34 a.m. UTC | #1
On Mon, May 6, 2019 at 12:13 PM Jeff King <peff@peff.net> wrote:
> This reduces the time required to run make coccicheck by a significant
> amount of time:
>
> Prior timing of make coccicheck
>   real    6m14.090s
>   user    25m2.606s
>   sys     1m22.919s
>
> New timing of make coccicheck
>   real    1m36.580s
>   user    7m55.933s
>   sys     0m18.219s
>
> This is nearly a 4x decrease in the time required to run make

Whoa.

> coccicheck. This is due to the overhead of restarting spatch for every
> file. By processing all files at once, we can amortize this startup cost
> across the total number of files, rather than paying it once per file.
>
> However, it comes at a cost. The RSS of each spatch process goes from
> ~50MB to ~1500MB (and peak memory usage may be even higher if make runs

1.5G should be fine. Trying...

Even with no -j, my htop's RES column goes up to 6GB and puts my laptop
in "swap every bit of memory out, including the bits handling the
screen" mode :( I don't think that was even the peak.

It's probably a bit too much to ask, but is it possible to handle N
files at a time (instead of all files)? That would consume less memory
and run a bit slower, but still beat the default mode. I can see it
already gets tricky doing complicated stuff in the Makefile, so "no" is
perfectly ok.
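Duy's batching idea could be sketched in shell with xargs; everything
below is hypothetical (echo stands in for the real spatch binary, and
the file list and rule name are made-up placeholders):

```shell
# Hypothetical sketch of batching: feed spatch N files per invocation
# via xargs, bounding peak memory while still amortizing startup cost
# across each batch. "echo spatch" stands in for $(SPATCH) so the
# sketch runs standalone; a.c..e.c and rule.cocci are placeholders.
N=2
printf '%s\n' a.c b.c c.c d.c e.c |
	xargs -n "$N" echo spatch --sp-file rule.cocci --all-includes
```

Each output line shows one batched invocation, e.g.
`spatch --sp-file rule.cocci --all-includes a.c b.c`, so raising N
trades memory for fewer process startups.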

> multiple rule files in parallel due to "-j"). That's enough to make some
> systems (like our Travis build!) fail the whole process, or could make
> things slower due to swapping. So let's make the new behavior optional,
> and people with a lot of memory can choose to use it.
>
> [peff: modified Jacob's patch to make the behavior optional, since
>  people reported complications due to the memory use]
>
> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> Signed-off-by: Jeff King <peff@peff.net>

Patch

diff --git a/Makefile b/Makefile
index 9f1b6e8926..5870ea200e 100644
--- a/Makefile
+++ b/Makefile
@@ -1174,8 +1174,11 @@  PTHREAD_CFLAGS =
 SPARSE_FLAGS ?=
 SP_EXTRA_FLAGS =
 
-# For the 'coccicheck' target
+# For the 'coccicheck' target; set USE_SINGLE_SPATCH to invoke a single spatch
+# for all sources, rather than one per source file. That generally runs faster,
+# at the cost of using much more peak memory (on the order of 1-2GB).
 SPATCH_FLAGS = --all-includes --patch .
+USE_SINGLE_SPATCH =
 
 include config.mak.uname
 -include config.mak.autogen
@@ -2790,11 +2793,16 @@  endif
 
 %.cocci.patch: %.cocci $(COCCI_SOURCES)
 	@echo '    ' SPATCH $<; \
-	ret=0; \
-	for f in $(COCCI_SOURCES); do \
-		$(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
-			{ ret=$$?; break; }; \
-	done >$@+ 2>$@.log; \
+	if test -z "$(USE_SINGLE_SPATCH)"; then \
+		ret=0; \
+		for f in $(COCCI_SOURCES); do \
+			$(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
+				{ ret=$$?; break; }; \
+		done >$@+ 2>$@.log; \
+	else \
+		$(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
+		ret=$$?; \
+	fi; \
 	if test $$ret != 0; \
 	then \
 		cat $@.log; \