Message ID | 20190506234334.GA13296@sigill.intra.peff.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | coccicheck: optionally batch spatch invocations | expand |
On Mon, May 6, 2019 at 4:43 PM Jeff King <peff@peff.net> wrote: > > On Mon, May 06, 2019 at 04:34:09PM +0700, Duy Nguyen wrote: > > > > However, it comes at a cost. The RSS of each spatch process goes from > > > ~50MB to ~1500MB (and peak memory usage may be even higher if make runs > > > > 1.5G should be fine. Trying... > > > > Even with no -j, my htop's RES column goes up 6GB and put my laptop in > > "swap every bit of memory out, including the bits handling the screen" > > mode :( I don't think it was even the peak. > > Interesting if you have a different version of spatch. I'm using 1.0.4 > from Debian unstable. > > I had just been eyeballing the values in "top" before, but I actually > measured more carefully. My peak was actually ~1900MB. > > > It's probably a bit too much to ask, but is it possible to handle N > > files at a time (instead of all files), which consumes less memory and > > runs a bit slower, but still better than the default mode? I can see > > it already gets tricky doing complicated stuff in Makefile so "no" is > > perfectly ok. > > I almost did this initially but I feared that nobody would actually use > it. :) So given at least one person who wants it, I took a look. If we > rely on xargs, then it is really not too bad (and is in fact shorter > than the current code). I also wrote up a pure-shell version, but it's > rather verbose even after taking some shortcuts with whitespace > splitting. > > So here's what I think we should apply: > > -- >8 -- > Subject: [PATCH] coccicheck: optionally batch spatch invocations > > In our "make coccicheck" rule, we currently feed each source file to its > own individual invocation of spatch. This has a few downsides: > > - it repeats any overhead spatch has for starting up and reading the > patch file > > - any included header files may get processed from multiple > invocations. This is slow (we see the same header files multiple > times) and may produce a resulting patch with repeated hunks (which > cannot be applied without further cleanup) > > Ideally we'd just invoke a single instance of spatch per rule-file and > feed it all source files. But spatch can be rather memory hungry when > run in this way. I measured the peak RSS going from ~90MB for a single > file to ~1900MB for all files. Multiplied by multiple rule files being > processed at the same time (for "make -j"), this can make things slower > or even cause them to fail (e.g., this is reported to happen on our > Travis builds). > > Instead, let's provide a tunable knob. We'll leave the default at "1", > but it can be cranked up to "999" for maximum CPU/memory tradeoff, or > people can find points in between that serve their particular machines. > > Here are a few numbers running a single rule via: > > SIZES='1 4 16 999' > RULE=contrib/coccinelle/object_id.cocci > for i in $SIZES; do > make clean > /usr/bin/time -o $i.out --format='%e | %U | %S | %M' \ > make $RULE.patch SPATCH_BATCH_SIZE=$i > done > for i in $SIZES; do > printf '%4d | %s\n' $i "$(cat $i.out)" > done > > which yields: > > 1 | 97.73 | 93.38 | 4.33 | 100128 > 4 | 52.80 | 51.14 | 1.69 | 135204 > 16 | 35.82 | 35.09 | 0.76 | 284124 > 999 | 23.30 | 23.13 | 0.20 | 1903852 > > The implementation is done with xargs, which should be widely available; > it's in POSIX, we rely on it already in the test suite. And "coccicheck" > is really a developer-only tool anyway, so it's not a big deal if > obscure systems can't run it. > > Signed-off-by: Jeff King <peff@peff.net> > --- > I left the default at 1 for safety. Probably 4 or 16 would be an OK > default, but I don't have any interest in figuring out exactly what > Travis or some hypothetical average machine can handle. I'll be setting > mine to 999. ;) > > Making "0" work as "unlimited" might be nice, but xargs doesn't support > that and I didn't want to make the recipe any more unreadable than it > already is. > > Makefile | 13 ++++++------- > 1 file changed, 6 insertions(+), 7 deletions(-) > > diff --git a/Makefile b/Makefile > index 9f1b6e8926..daba958b8f 100644 > --- a/Makefile > +++ b/Makefile > @@ -1174,8 +1174,10 @@ PTHREAD_CFLAGS = > SPARSE_FLAGS ?= > SP_EXTRA_FLAGS = > > -# For the 'coccicheck' target > +# For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will > +# usually result in less CPU usage at the cost of higher peak memory. > SPATCH_FLAGS = --all-includes --patch . > +SPATCH_BATCH_SIZE = 1 > > include config.mak.uname > -include config.mak.autogen > @@ -2790,12 +2792,9 @@ endif > > %.cocci.patch: %.cocci $(COCCI_SOURCES) > @echo ' ' SPATCH $<; \ > - ret=0; \ > - for f in $(COCCI_SOURCES); do \ > - $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \ > - { ret=$$?; break; }; \ > - done >$@+ 2>$@.log; \ > - if test $$ret != 0; \ > + if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \ > + $(SPATCH) --sp-file $< $(SPATCH_FLAGS) \ > + >$@+ 2>$@.log; \ > then \ > cat $@.log; \ > exit 1; \ > -- > 2.21.0.1314.g224b191707 > This looks reasonable to me :) Thanks, Jake
On Mon, May 06, 2019 at 06:41:03PM -0700, Jacob Keller wrote: > > So here's what I think we should apply: > > > > -- >8 -- > > Subject: [PATCH] coccicheck: optionally batch spatch invocations > [...] > > This looks reasonable to me :) Thanks. I should have added a: Based-on-a-patch-by: Jacob Keller <jacob.keller@gmail.com> trailer. -Peff
Jeff King <peff@peff.net> writes: > The implementation is done with xargs, which should be widely available; > it's in POSIX, we rely on it already in the test suite. And "coccicheck" > is really a developer-only tool anyway, so it's not a big deal if > obscure systems can't run it. OK. > I left the default at 1 for safety. Probably 4 or 16 would be an OK > default, but I don't have any interest in figuring out exactly what > Travis or some hypothetical average machine can handle. I'll be setting > mine to 999. ;) > > Making "0" work as "unlimited" might be nice, but xargs doesn't support > that and I didn't want to make the recipe any more unreadable than it > already is. Sounds good. After reading the log message, I was curious if there is a mechanism that makes 999 special (like 0 in your hypothetical "0 means unlimited"), but I guess it is just "any number that is greater than the number of source files we have will do". Thanks.
On Tue, May 07, 2019 at 11:42:28AM +0900, Junio C Hamano wrote: > > Making "0" work as "unlimited" might be nice, but xargs doesn't support > > that and I didn't want to make the recipe any more unreadable than it > > already is. > > Sounds good. After reading the log message, I was curious if there > is a mechanism that makes 999 special (like 0 in your hypothetical > "0 means unlimited"), but I guess it is just "any number that is > greater than the number of source files we have will do". Yes, 2^31-1 is probably a better number, but it's harder to write out. :) Here's what a patch might look like to implement "0". By still using xargs in the unlimited code path, it's not too bad. I dunno. --- diff --git a/Makefile b/Makefile index daba958b8f..0765a59b7a 100644 --- a/Makefile +++ b/Makefile @@ -2792,7 +2792,12 @@ endif %.cocci.patch: %.cocci $(COCCI_SOURCES) @echo ' ' SPATCH $<; \ - if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \ + if test $(SPATCH_BATCH_SIZE) = 0; then \ + limit=; \ + else \ + limit='-n $(SPATCH_BATCH_SIZE)'; \ + fi; \ + if ! echo $(COCCI_SOURCES) | xargs $$limit \ $(SPATCH) --sp-file $< $(SPATCH_FLAGS) \ >$@+ 2>$@.log; \ then \ -Peff
On Mon, May 6, 2019 at 7:55 PM Jeff King <peff@peff.net> wrote: > Here's what a patch might look like to implement "0". By still using > xargs in the unlimited code path, it's not too bad. I dunno. > > --- > diff --git a/Makefile b/Makefile > index daba958b8f..0765a59b7a 100644 > --- a/Makefile > +++ b/Makefile > @@ -2792,7 +2792,12 @@ endif > > %.cocci.patch: %.cocci $(COCCI_SOURCES) > @echo ' ' SPATCH $<; \ > - if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \ > + if test $(SPATCH_BATCH_SIZE) = 0; then \ > + limit=; \ > + else \ > + limit='-n $(SPATCH_BATCH_SIZE)'; \ > + fi; \ > + if ! echo $(COCCI_SOURCES) | xargs $$limit \ > $(SPATCH) --sp-file $< $(SPATCH_FLAGS) \ > >$@+ 2>$@.log; \ > then \ > > -Peff This doesn't seem too complicated to me..
Jeff King <peff@peff.net> writes: > Yes, 2^31-1 is probably a better number, but it's harder to write out. :) > > Here's what a patch might look like to implement "0". By still using > xargs in the unlimited code path, it's not too bad. I dunno. As somebody who is too used to run "diff -U999" and be happy, I cannot claim that I care enough, but the result does not look too bad. > > --- > diff --git a/Makefile b/Makefile > index daba958b8f..0765a59b7a 100644 > --- a/Makefile > +++ b/Makefile > @@ -2792,7 +2792,12 @@ endif > > %.cocci.patch: %.cocci $(COCCI_SOURCES) > @echo ' ' SPATCH $<; \ > - if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \ > + if test $(SPATCH_BATCH_SIZE) = 0; then \ > + limit=; \ > + else \ > + limit='-n $(SPATCH_BATCH_SIZE)'; \ > + fi; \ > + if ! echo $(COCCI_SOURCES) | xargs $$limit \ > $(SPATCH) --sp-file $< $(SPATCH_FLAGS) \ > >$@+ 2>$@.log; \ > then \ > > -Peff
On Tue, May 7, 2019 at 6:43 AM Jeff King <peff@peff.net> wrote: > > On Mon, May 06, 2019 at 04:34:09PM +0700, Duy Nguyen wrote: > > > > However, it comes at a cost. The RSS of each spatch process goes from > > > ~50MB to ~1500MB (and peak memory usage may be even higher if make runs > > > > 1.5G should be fine. Trying... > > > > Even with no -j, my htop's RES column goes up 6GB and put my laptop in > > "swap every bit of memory out, including the bits handling the screen" > > mode :( I don't think it was even the peak. > > Interesting if you have a different version of spatch. I'm using 1.0.4 > from Debian unstable. I'm using 1.0.6 > Ideally we'd just invoke a single instance of spatch per rule-file and > feed it all source files. But spatch can be rather memory hungry when > run in this way. I measured the peak RSS going from ~90MB for a single > file to ~1900MB for all files. Multiplied by multiple rule files being > processed at the same time (for "make -j"), this can make things slower > or even cause them to fail (e.g., this is reported to happen on our > Travis builds). > > Instead, let's provide a tunable knob. We'll leave the default at "1", > but it can be cranked up to "999" for maximum CPU/memory tradeoff, or > people can find points in between that serve their particular machines. > > Here are a few numbers running a single rule via: > > SIZES='1 4 16 999' > RULE=contrib/coccinelle/object_id.cocci > for i in $SIZES; do > make clean > /usr/bin/time -o $i.out --format='%e | %U | %S | %M' \ > make $RULE.patch SPATCH_BATCH_SIZE=$i > done > for i in $SIZES; do > printf '%4d | %s\n' $i "$(cat $i.out)" > done > > which yields: > > 1 | 97.73 | 93.38 | 4.33 | 100128 > 4 | 52.80 | 51.14 | 1.69 | 135204 > 16 | 35.82 | 35.09 | 0.76 | 284124 > 999 | 23.30 | 23.13 | 0.20 | 1903852 I think I'm going to settle around 32 which has peak at 2GB for me and I can still make use of 4 hyperthread cores. It's weird for me though that no SIZES=1 is actually the fastest. The larger SPATCH_BATCH_SIZE is, the slower: > ~/w/git/temp $ rm contrib/coccinelle/commit.cocci.patch > ~/w/git/temp $ time make contrib/coccinelle/commit.cocci.patch SPATCH contrib/coccinelle/commit.cocci real 1m20,242s user 1m15,863s sys 0m4,376s > ~/w/git/temp $ rm contrib/coccinelle/commit.cocci.patch > ~/w/git/temp $ time make contrib/coccinelle/commit.cocci.patch SPATCH_BATCH_SIZE=16 SPATCH contrib/coccinelle/commit.cocci real 3m3,131s user 2m59,345s sys 0m3,819s > ~/w/git/temp $ rm contrib/coccinelle/commit.cocci.patch > ~/w/git/temp $ time make contrib/coccinelle/commit.cocci.patch SPATCH_BATCH_SIZE=32 SPATCH contrib/coccinelle/commit.cocci real 5m10,966s user 5m5,434s sys 0m5,500s But that might be just cocci or environmental problem. Since you and Jacob both see good speedup, this patch is good.
On Mon, May 06, 2019 at 07:43:34PM -0400, Jeff King wrote: > Subject: [PATCH] coccicheck: optionally batch spatch invocations > > In our "make coccicheck" rule, we currently feed each source file to its > own individual invocation of spatch. This has a few downsides: > > - it repeats any overhead spatch has for starting up and reading the > patch file > > - any included header files may get processed from multiple > invocations. This is slow (we see the same header files multiple > times) and may produce a resulting patch with repeated hunks (which > cannot be applied without further cleanup) I wonder what would happen (and how it would perform) if we told spatch to ignore all include files from our C sources, but then run it on each of our header files as well. It still would have to parse the semantic patches over and over again, but those are rather small (compared to, say, 'git-compar-util.h' and 'cache.h' that are included in most of our source files). But I don't know how the lack of information about preprocessor macros would affect Coccinelle's analyzis. > Ideally we'd just invoke a single instance of spatch per rule-file and > feed it all source files. Well, ideally we would invoke spatch with '--dir .' to "process all files in directory recursively", and exclude those that we are not interested in (parts of 'compat/', 'sha1dc')... but alas, such an exclude mechanism doesn't seem to exist.
On Tue, May 07, 2019 at 01:52:32PM +0900, Junio C Hamano wrote: > Jeff King <peff@peff.net> writes: > > > Yes, 2^31-1 is probably a better number, but it's harder to write out. :) > > > > Here's what a patch might look like to implement "0". By still using > > xargs in the unlimited code path, it's not too bad. I dunno. > > As somebody who is too used to run "diff -U999" and be happy, I > cannot claim that I care enough, but the result does not look > too bad. OK. With two "not too bad" comments (plus my own similar feeling), let's just do it. Here it is as a patch on top. It can also be squashed into the tip of jk/cocci-batch, but then we should probably s/999/0/ in the commit message. :) -- >8 -- Subject: [PATCH] coccicheck: make batch size of 0 mean "unlimited" If you have the memory to handle it, the ideal case is to run a single spatch invocation with all of the source files. But the only way to do so now is to pick an arbitrarily large batch size. Let's make "0" do this, which is a little friendlier (and doesn't otherwise have a useful meaning). Signed-off-by: Jeff King <peff@peff.net> --- Makefile | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index daba958b8f..9cea614523 100644 --- a/Makefile +++ b/Makefile @@ -1176,6 +1176,7 @@ SP_EXTRA_FLAGS = # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will # usually result in less CPU usage at the cost of higher peak memory. +# Setting it to 0 will feed all files in a single spatch invocation. SPATCH_FLAGS = --all-includes --patch . SPATCH_BATCH_SIZE = 1 @@ -2792,7 +2793,12 @@ endif %.cocci.patch: %.cocci $(COCCI_SOURCES) @echo ' ' SPATCH $<; \ - if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \ + if test $(SPATCH_BATCH_SIZE) = 0; then \ + limit=; \ + else \ + limit='-n $(SPATCH_BATCH_SIZE)'; \ + fi; \ + if ! echo $(COCCI_SOURCES) | xargs $$limit \ $(SPATCH) --sp-file $< $(SPATCH_FLAGS) \ >$@+ 2>$@.log; \ then \
Hi Jeff, On Wed, May 08, 2019 at 03:07:54AM -0400, Jeff King wrote: > On Tue, May 07, 2019 at 01:52:32PM +0900, Junio C Hamano wrote: > > > Jeff King <peff@peff.net> writes: > > > > > Yes, 2^31-1 is probably a better number, but it's harder to write out. :) > > > > > > Here's what a patch might look like to implement "0". By still using > > > xargs in the unlimited code path, it's not too bad. I dunno. > > > > As somebody who is too used to run "diff -U999" and be happy, I > > cannot claim that I care enough, but the result does not look > > too bad. > > OK. With two "not too bad" comments (plus my own similar feeling), let's > just do it. Here it is as a patch on top. It can also be squashed into > the tip of jk/cocci-batch, but then we should probably s/999/0/ in the > commit message. :) > > -- >8 -- > Subject: [PATCH] coccicheck: make batch size of 0 mean "unlimited" > > If you have the memory to handle it, the ideal case is to run a single > spatch invocation with all of the source files. But the only way to do > so now is to pick an arbitrarily large batch size. Let's make "0" do > this, which is a little friendlier (and doesn't otherwise have a useful > meaning). > > Signed-off-by: Jeff King <peff@peff.net> > --- > Makefile | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/Makefile b/Makefile > index daba958b8f..9cea614523 100644 > --- a/Makefile > +++ b/Makefile > @@ -1176,6 +1176,7 @@ SP_EXTRA_FLAGS = > > # For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will > # usually result in less CPU usage at the cost of higher peak memory. > +# Setting it to 0 will feed all files in a single spatch invocation. > SPATCH_FLAGS = --all-includes --patch . > SPATCH_BATCH_SIZE = 1 > > @@ -2792,7 +2793,12 @@ endif > > %.cocci.patch: %.cocci $(COCCI_SOURCES) > @echo ' ' SPATCH $<; \ > - if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \ > + if test $(SPATCH_BATCH_SIZE) = 0; then \ > + limit=; \ > + else \ > + limit='-n $(SPATCH_BATCH_SIZE)'; \ > + fi; \ Could we pull `limit` out of the recipe and into a make variable? You mentioned earlier that you wanted to do this but it was too complicated but now that it's written like this, it seem like it'd be pretty easy to do. > + if ! echo $(COCCI_SOURCES) | xargs $$limit \ > $(SPATCH) --sp-file $< $(SPATCH_FLAGS) \ > >$@+ 2>$@.log; \ > then \ > -- > 2.21.0.1314.g224b191707 >
On Wed, May 08, 2019 at 08:36:10AM -0400, Denton Liu wrote: > > %.cocci.patch: %.cocci $(COCCI_SOURCES) > > @echo ' ' SPATCH $<; \ > > - if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \ > > + if test $(SPATCH_BATCH_SIZE) = 0; then \ > > + limit=; \ > > + else \ > > + limit='-n $(SPATCH_BATCH_SIZE)'; \ > > + fi; \ > > Could we pull `limit` out of the recipe and into a make variable? You > mentioned earlier that you wanted to do this but it was too complicated > but now that it's written like this, it seem like it'd be pretty easy to > do. Yes, we could, because this part of it is a straight text substitution in the recipe, and not a conditional. I don't know that it's any more readable, though. Either that same conditional gets split out like: ifeq ($(SPATCH_BATCH_SIZE),0) SPATCH_LIMIT = else SPATCH_LIMIT = -n $(SPATCH_BATCH_SIZE) endif or we do it inline, but IMHO it's still pretty cluttered: diff --git a/Makefile b/Makefile index 9cea614523..aedc7b80a8 100644 --- a/Makefile +++ b/Makefile @@ -2793,12 +2793,8 @@ endif %.cocci.patch: %.cocci $(COCCI_SOURCES) @echo ' ' SPATCH $<; \ - if test $(SPATCH_BATCH_SIZE) = 0; then \ - limit=; \ - else \ - limit='-n $(SPATCH_BATCH_SIZE)'; \ - fi; \ - if ! echo $(COCCI_SOURCES) | xargs $$limit \ + if ! echo $(COCCI_SOURCES) | \ + xargs $(if $(filter 0,$(SPATCH_BATCH_SIZE)),,-n $(SPATCH_BATCH_SIZE)) \ $(SPATCH) --sp-file $< $(SPATCH_FLAGS) \ >$@+ 2>$@.log; \ then \ It gets a little better if we make the sentinel value the empty string instead of "0" (you can drop the filter). -Peff
diff --git a/Makefile b/Makefile index 9f1b6e8926..daba958b8f 100644 --- a/Makefile +++ b/Makefile @@ -1174,8 +1174,10 @@ PTHREAD_CFLAGS = SPARSE_FLAGS ?= SP_EXTRA_FLAGS = -# For the 'coccicheck' target +# For the 'coccicheck' target; setting SPATCH_BATCH_SIZE higher will +# usually result in less CPU usage at the cost of higher peak memory. SPATCH_FLAGS = --all-includes --patch . +SPATCH_BATCH_SIZE = 1 include config.mak.uname -include config.mak.autogen @@ -2790,12 +2792,9 @@ endif %.cocci.patch: %.cocci $(COCCI_SOURCES) @echo ' ' SPATCH $<; \ - ret=0; \ - for f in $(COCCI_SOURCES); do \ - $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \ - { ret=$$?; break; }; \ - done >$@+ 2>$@.log; \ - if test $$ret != 0; \ + if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \ + $(SPATCH) --sp-file $< $(SPATCH_FLAGS) \ + >$@+ 2>$@.log; \ then \ cat $@.log; \ exit 1; \