[v4,3/4] Makefile/coccicheck: allow for setting xargs concurrency

Message ID 9d5814dacdc281389c4cb163ddbe4b749e6c0852.1616414951.git.avarab@gmail.com (mailing list archive)
State New, archived
Series Makefile/coccicheck: fix bugs and speed it up

Commit Message

Ævar Arnfjörð Bjarmason March 22, 2021, 12:11 p.m. UTC
Extend the SPATCH_BATCH_SIZE facility added in
960154b9c17 (coccicheck: optionally batch spatch invocations,
2019-05-06) and bcb4edf7af7 (coccicheck: make batch size of 0 mean
"unlimited", 2019-05-08) to allow for both setting
SPATCH_BATCH_SIZE=N, and also to have a more advanced SPATCH_XARGS
argument.

The reason to replace the "$$limit" is that now you actually see under
V=1 what argument your program will get invoked with.

The reason for the "9999" limit is that if you e.g. try to define an
"$(XARGS)" which may be an empty string depending on this setting,
then e.g.:

    echo $(FILES) | $(XARGS) $(XARGS_FLAGS) $(SPATCH)

will error out when spread over multiple lines with "\". I think just
setting it to "xargs -n 9999" as a trivial workaround is the best
solution here.
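For illustration (not part of the patch): the batching that
SPATCH_BATCH_SIZE maps onto is just xargs's -n flag. A tiny sketch,
with "echo batch:" standing in for the spatch command:

```shell
# xargs -n N splits its input into batches of at most N arguments and
# runs the command once per batch; a huge -n like 9999 effectively
# means "one batch" for any realistic file list.
printf '%s\n' a.c b.c c.c | xargs -n 2 echo batch: >batches.txt
cat batches.txt
# batch: a.c b.c
# batch: c.c
```

With -n 9999 the whole list fits in one batch, which is what the
SPATCH_BATCH_SIZE=0 case relies on.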

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
---
 Makefile | 26 ++++++++++++++++++++------
 1 file changed, 20 insertions(+), 6 deletions(-)

Comments

Jeff King March 24, 2021, 7:26 p.m. UTC | #1
On Mon, Mar 22, 2021 at 01:11:49PM +0100, Ævar Arnfjörð Bjarmason wrote:

> Extend the SPATCH_BATCH_SIZE facility added in
> 960154b9c17 (coccicheck: optionally batch spatch invocations,
> 2019-05-06) and bcb4edf7af7 (coccicheck: make batch size of 0 mean
> "unlimited", 2019-05-08) to allow both setting
> SPATCH_BATCH_SIZE=N and supplying a more advanced SPATCH_XARGS
> argument.
> 
> The reason to replace the "$$limit" is that now you actually see under
> V=1 what arguments your program will get invoked with.
> 
> The reason for the "9999" limit is that if you e.g. try to define an
> "$(XARGS)" which may be an empty string depending on this setting,
> then e.g.:
> 
>     echo $(FILES) | $(XARGS) $(XARGS_FLAGS) $(SPATCH)
> 
> will error out when spread over multiple lines with "\". I think just
> setting it to "xargs -n 9999" as a trivial workaround is the best
> solution here.

I don't understand this 9999 comment. The original was sometimes setting
$limit to the empty string, and then doing:

 xargs $limit

How is that any different than setting SPATCH_XARGS to just "xargs" for
the unlimited case?

> +# For the 'coccicheck' target; SPATCH_XARGS can be used to manually
> +# tweak the xargs invocation. By default we invoke "xargs -n 1", and
> +# "xargs -n 9999" under SPATCH_BATCH_SIZE=0.
> +#
> +# Setting SPATCH_XARGS overrides SPATCH_BATCH_SIZE. To get concurrency
> +# when targeting a single contrib/coccinelle/%.patch use e.g. "-P" if
> +# your xargs(1) supports it:
> +#
> +#    make contrib/coccinelle/strbuf.cocci.patch SPATCH_XARGS="xargs -P 8 -n 8"
> +#
> +# Or a combination -jN and "xargs -P":
> +#
> +#    make -j4 coccicheck SPATCH_XARGS="xargs -P 2 -n 8"

As I mentioned in the last round, using "-P" is racy. I'm not sure if
it's something we should be recommending to people.

-Peff
Ævar Arnfjörð Bjarmason March 25, 2021, 2:29 a.m. UTC | #2
On Wed, Mar 24 2021, Jeff King wrote:

> On Mon, Mar 22, 2021 at 01:11:49PM +0100, Ævar Arnfjörð Bjarmason wrote:
>
>> Extend the SPATCH_BATCH_SIZE facility added in
>> 960154b9c17 (coccicheck: optionally batch spatch invocations,
>> 2019-05-06) and bcb4edf7af7 (coccicheck: make batch size of 0 mean
>> "unlimited", 2019-05-08) to allow both setting
>> SPATCH_BATCH_SIZE=N and supplying a more advanced SPATCH_XARGS
>> argument.
>> 
>> The reason to replace the "$$limit" is that now you actually see under
>> V=1 what arguments your program will get invoked with.
>> 
>> The reason for the "9999" limit is that if you e.g. try to define an
>> "$(XARGS)" which may be an empty string depending on this setting,
>> then e.g.:
>> 
>>     echo $(FILES) | $(XARGS) $(XARGS_FLAGS) $(SPATCH)
>> 
>> will error out when spread over multiple lines with "\". I think just
>> setting it to "xargs -n 9999" as a trivial workaround is the best
>> solution here.
>
> I don't understand this 9999 comment. The original was sometimes setting
> $limit to the empty string, and then doing:
>
>  xargs $limit
>
> How is that any different than setting SPATCH_XARGS to just "xargs" for
> the unlimited case?

The "over multiple lines" part was important, but it seems it no
longer is. I.e. in an earlier version I had:

    $(XARGS) \
        $(XARGS_FLAGS) \
        $(SPATCH)

And it would break if XARGS_FLAGS was empty. So I set it to -n 9999 as
a fallback.

>> +# For the 'coccicheck' target; SPATCH_XARGS can be used to manually
>> +# tweak the xargs invocation. By default we invoke "xargs -n 1", and
>> +# "xargs -n 9999" under SPATCH_BATCH_SIZE=0.
>> +#
>> +# Setting SPATCH_XARGS overrides SPATCH_BATCH_SIZE. To get concurrency
>> +# when targeting a single contrib/coccinelle/%.patch use e.g. "-P" if
>> +# your xargs(1) supports it:
>> +#
>> +#    make contrib/coccinelle/strbuf.cocci.patch SPATCH_XARGS="xargs -P 8 -n 8"
>> +#
>> +# Or a combination -jN and "xargs -P":
>> +#
>> +#    make -j4 coccicheck SPATCH_XARGS="xargs -P 2 -n 8"
>
> As I mentioned in the last round, using "-P" is racy. I'm not sure if
> it's something we should be recommending to people.

Yes, this would need to use N tempfiles we'd cat together at the end to
be POSIX-reliable.
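A minimal sketch of that tempfile approach (purely illustrative;
the file names are made up and "echo" stands in for the spatch
invocation). Each concurrent job writes to a private file, and the
results are concatenated only after every job has finished:

```shell
# Each background job gets its own output file, so concurrent writers
# never share a file offset and no interleaving is possible.
outdir=$(mktemp -d)
i=0
for batch in "a b" "c d"; do
    i=$((i + 1))
    # run one batch in the background; output is isolated per job
    echo $batch | xargs -n 1 echo >"$outdir/$i.out" &
done
wait
# concatenating in glob order makes the combined output deterministic
cat "$outdir"/*.out >combined.out
rm -rf "$outdir"
cat combined.out
# a
# b
# c
# d
```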

In practice I've never seen this sort of thing be unreliable for a case
like this.

POSIX just guarantees that the output won't be interleaved up to
PIPE_BUF, which is typically 4k. We certainly get patches bigger than
that from spatch in some cases.

But from the OS's perspective emitting this output happens at a glacial
pace. So even if it crosses that boundary it's unlikely to be
interleaved.

Even:

    $ perl -wE 'print "a" for 1..1024*1024*100' >1
    $ perl -wE 'print "b" for 1..1024*1024*100' >2
    $ perl -wE 'print "\n" for 1..1024*1024*100' >3
    $ du -shc 1 2 3
    100M    1
    100M    2
    100M    3
    300M    total

Which at least on this computer I can't get to not print:

    $ echo 1 2 3 | xargs -P 3 -n 1 cat|wc -l
    104857600

Suggesting that even for output of that size the \n's aren't mixed
up. YMMV.

Anyway. I think I'll just redact this topic. Maybe I'll change my mind
on it, or pick it up some other month/year/whatever.

I appreciate everyone's feedback on it, but for something I thought
would be a rather trivial bugfix I've sunk way too much time into it.
At this point I can't even remember why I was trying to run coccicheck
in the first place.

There's e.g. still your outstanding feedback on what's really going on
with that underlying spatch race I was trying to fix with this
--include-headers-for-types workaround.

I think digging into that would be interesting for someone, but right
now that's not me. I really don't want to spend time on digging into
spatch's OCaml guts to figure that & any other outstanding stuff out.
Jeff King March 26, 2021, 4:11 a.m. UTC | #3
On Thu, Mar 25, 2021 at 03:29:11AM +0100, Ævar Arnfjörð Bjarmason wrote:

> > I don't understand this 9999 comment. The original was sometimes setting
> > $limit to the empty string, and then doing:
> >
> >  xargs $limit
> >
> > How is that any different than setting SPATCH_XARGS to just "xargs" for
> > the unlimited case?
> 
> The "over multiple lines" part was important, but it seems it no
> longer is. I.e. in an earlier version I had:
> 
>     $(XARGS) \
>         $(XARGS_FLAGS) \
>         $(SPATCH)
> 
> And it would break if XARGS_FLAGS was empty. So I set it to -n 9999 as
> a fallback.

Ah, OK, that makes more sense. Though I'm still slightly confused, just
because I think the Makefile will eat those backslashed newlines, and
the shell will just see extra whitespace. I.e.:

  $ cat Makefile
  foo:
  	echo \
  		$(A) \
  		$(B) \
  		$(C)

  $ make foo A=one C=three
  echo \
  	one \
  	 \
  	three
  one three

I suspect whatever you wrote that hit the problem was just slightly
different. I doubt this is worth thinking about any further, but it is a
weird curiosity to me. So feel free to explore and respond if you find
it interesting, but don't feel compelled to. :)

> > As I mentioned in the last round, using "-P" is racy. I'm not sure if
> > it's something we should be recommending to people.
> 
> Yes, this would need to use N tempfiles we'd cat together at the end to
> be POSIX-reliable.
> 
> In practice I've never seen this sort of thing be unreliable for a case
> like this.
> 
> POSIX just guarantees that the output won't be interleaved up to
> PIPE_BUF, which is typically 4k. We certainly get patches bigger than
> that from spatch in some cases.

One nit (because I spent quite a long time looking into this a while ago
for an unrelated thing): POSIX only talks about PIPE_BUF for pipes. For
regular files in O_APPEND mode (or two processes sharing a descriptor
from their parent, as we'd have here), I think write() is allowed to do
a short write, at which point you'd lose atomicity.

In practice, I'd expect most small-ish (say, less than a page) writes to
happen in a single go, especially on modern operating systems. But I
wouldn't be too surprised if it depends on details of the filesystem, or
even the file you're writing into. E.g., if there are 20 bytes left in a
filesystem block that the end of the file is currently pointing to, and
you ask to write 30 bytes, it seems plausible that we might write the
first 20 to fill out the block, and then have a point where we could get
interrupted by a signal and return early, etc.

> But from the OS's perspective emitting this output happens at a glacial
> pace. So even if it crosses that boundary it's unlikely to be
> interleaved.

Yes, I think even if it's possible to race, the general lack of volume
of writes is likely to save us.

Everything below is more curiosity, so again, don't sink time into it if
it's not an interesting tangent for you.

> Even:
> 
>     $ perl -wE 'print "a" for 1..1024*1024*100' >1
>     $ perl -wE 'print "b" for 1..1024*1024*100' >2
>     $ perl -wE 'print "\n" for 1..1024*1024*100' >3
>     $ du -shc 1 2 3
>     100M    1
>     100M    2
>     100M    3
>     300M    total
> 
> Which at least on this computer I can't get to not print:
> 
>     $ echo 1 2 3 | xargs -P 3 -n 1 cat|wc -l
>     104857600
> 
> Suggesting that even for output of that size the \n's aren't mixed
> up. YMMV.

I don't think this is telling us much, for two reasons:

  - it's a pipe, not a file, so PIPE_BUF _does_ count here

  - "wc -l" is counting the number of newlines, which always will be the
    same, whether there are interleaved blocks or not. The interesting
    thing is whether a single write() from one of the "cat" calls is
    interleaved with another, but we can't tell that without knowing how
    big a block cat is using.

A more interesting test is something like:

  for i in a b c; do
    perl -e '
      syswrite(STDOUT,"this is $ARGV[0]\n") for 1..1024*1024*10
    ' $i &
  done >out
  wait
  sort <out | uniq -c

We are writing to a shared file here, and we care whether each
individual syswrite ever got interleaved with another (which we would
notice because our sort/uniq output would have more than the expected
three lines).

We can even spice it up with some signals on one of the processes by
putting:

  $SIG{ALRM} = sub { print STDERR "alarm $ARGV[0]\n" };

into the perl, and then doing:

  for t in $(seq 1000); do
    kill -ALRM %1
    sleep 0.01
  done

before the wait. But at least on Linux (with ext4), that seems to always produce
atomic results for each write(). Even if I increase the size of the
message to 4k or larger.

So it seems pretty solid there, but I'm not sure I would guarantee it on
other platforms or filesystems.

-Peff

Patch

diff --git a/Makefile b/Makefile
index e43a9618df5..eaac14275bb 100644
--- a/Makefile
+++ b/Makefile
@@ -1206,6 +1206,25 @@  SPATCH_FLAGS = --all-includes --include-headers-for-types --patch .
 # Setting it to 0 will feed all files in a single spatch invocation.
 SPATCH_BATCH_SIZE = 1
 
+# For the 'coccicheck' target; SPATCH_XARGS can be used to manually
+# tweak the xargs invocation. By default we invoke "xargs -n 1", and
+# "xargs -n 9999" under SPATCH_BATCH_SIZE=0.
+#
+# Setting SPATCH_XARGS overrides SPATCH_BATCH_SIZE. To get concurrency
+# when targeting a single contrib/coccinelle/%.patch use e.g. "-P" if
+# your xargs(1) supports it:
+#
+#    make contrib/coccinelle/strbuf.cocci.patch SPATCH_XARGS="xargs -P 8 -n 8"
+#
+# Or a combination -jN and "xargs -P":
+#
+#    make -j4 coccicheck SPATCH_XARGS="xargs -P 2 -n 8"
+ifeq (0,$(SPATCH_BATCH_SIZE))
+SPATCH_XARGS = xargs -n 9999
+else
+SPATCH_XARGS = xargs -n $(SPATCH_BATCH_SIZE)
+endif
+
 include config.mak.uname
 -include config.mak.autogen
 -include config.mak
@@ -2866,12 +2885,7 @@  COCCI_SOURCES = $(filter-out $(THIRD_PARTY_SOURCES),$(FOUND_C_SOURCES))
 
 %.cocci.patch: %.cocci $(COCCI_SOURCES)
 	$(QUIET_SPATCH) \
-	if test $(SPATCH_BATCH_SIZE) = 0; then \
-		limit=; \
-	else \
-		limit='-n $(SPATCH_BATCH_SIZE)'; \
-	fi; \
-	if ! echo $(COCCI_SOURCES) | xargs $$limit \
+	if ! echo $(COCCI_SOURCES) | $(SPATCH_XARGS) \
 		$(SPATCH) --sp-file $< $(SPATCH_FLAGS) \
 		>$@+ 2>$@.log; \
 	then \