[v3] coccicheck: process every source file at once

Message ID 20181002200710.15721-1-jacob.e.keller@intel.com (mailing list archive)
State New, archived
Series [v3] coccicheck: process every source file at once

Commit Message

Keller, Jacob E Oct. 2, 2018, 8:07 p.m. UTC
From: Jacob Keller <jacob.keller@gmail.com>

make coccicheck is used in order to apply coccinelle semantic patches,
and see if any of the transformations found within contrib/coccinelle/
can be applied to the current code base.

Pass every file to a single invocation of spatch, instead of running
spatch once per source file.

This reduces the time required to run make coccicheck by a significant
amount of time:

Prior timing of make coccicheck
  real    6m14.090s
  user    25m2.606s
  sys     1m22.919s

New timing of make coccicheck
  real    1m36.580s
  user    7m55.933s
  sys     0m18.219s

This is nearly a 4x decrease in the time required to run make
coccicheck. This is due to the overhead of restarting spatch for every
file. By processing all files at once, we can amortize this startup cost
across the total number of files, rather than paying it once per file.

Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
---
 Makefile | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)
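
For anyone reproducing the numbers above, a sketch of the usual workflow
(assuming the standard git.git layout, where each semantic patch in
contrib/coccinelle/ yields a matching *.cocci.patch output file):

  $ time make coccicheck
  $ cat contrib/coccinelle/*.cocci.patch   # non-empty output means spatch suggested changes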

Comments

Jacob Keller Oct. 2, 2018, 8:18 p.m. UTC | #1
On Tue, Oct 2, 2018 at 1:07 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>
> From: Jacob Keller <jacob.keller@gmail.com>
>
> make coccicheck is used in order to apply coccinelle semantic patches,
> and see if any of the transformations found within contrib/coccinelle/
> can be applied to the current code base.
>
> Pass every file to a single invocation of spatch, instead of running
> spatch once per source file.
>
> This reduces the time required to run make coccicheck by a significant
> amount of time:
>
> Prior timing of make coccicheck
>   real    6m14.090s
>   user    25m2.606s
>   sys     1m22.919s
>
> New timing of make coccicheck
>   real    1m36.580s
>   user    7m55.933s
>   sys     0m18.219s
>
> This is nearly a 4x decrease in the time required to run make
> coccicheck. This is due to the overhead of restarting spatch for every
> file. By processing all files at once, we can amortize this startup cost
> across the total number of files, rather than paying it once per file.
>
> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> ---

Forgot to add what changed. I dropped the subshell and "||" bit around
invoking spatch.

Thanks,
Jake


>  Makefile | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/Makefile b/Makefile
> index df1df9db78da..da692ece9e12 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2715,10 +2715,8 @@ endif
>  %.cocci.patch: %.cocci $(COCCI_SOURCES)
>         @echo '    ' SPATCH $<; \
>         ret=0; \
> -       for f in $(COCCI_SOURCES); do \
> -               $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
> -                       { ret=$$?; break; }; \
> -       done >$@+ 2>$@.log; \
> +       $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
> +       ret=$$?; \
>         if test $$ret != 0; \
>         then \
>                 cat $@.log; \
> --
> 2.18.0.219.gaf81d287a9da
>
Jacob Keller Oct. 5, 2018, 2:17 a.m. UTC | #2
On Tue, Oct 2, 2018 at 1:18 PM Jacob Keller <jacob.keller@gmail.com> wrote:
>
> On Tue, Oct 2, 2018 at 1:07 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
> >
> > From: Jacob Keller <jacob.keller@gmail.com>
> >
> > make coccicheck is used in order to apply coccinelle semantic patches,
> > and see if any of the transformations found within contrib/coccinelle/
> > can be applied to the current code base.
> >
> > Pass every file to a single invocation of spatch, instead of running
> > spatch once per source file.
> >
> > This reduces the time required to run make coccicheck by a significant
> > amount of time:
> >
> > Prior timing of make coccicheck
> >   real    6m14.090s
> >   user    25m2.606s
> >   sys     1m22.919s
> >
> > New timing of make coccicheck
> >   real    1m36.580s
> >   user    7m55.933s
> >   sys     0m18.219s
> >
> > This is nearly a 4x decrease in the time required to run make
> > coccicheck. This is due to the overhead of restarting spatch for every
> > file. By processing all files at once, we can amortize this startup cost
> > across the total number of files, rather than paying it once per file.
> >
> > Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
> > ---
>
> Forgot to add what changed. I dropped the subshell and "||" bit around
> invoking spatch.
>
> Thanks,
> Jake
>

Junio, do you want me to update the commit message on my side with the
memory concerns? Or could you update it to mention memory as a noted
trade off.

Thanks,
Jake

>
> >  Makefile | 6 ++----
> >  1 file changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/Makefile b/Makefile
> > index df1df9db78da..da692ece9e12 100644
> > --- a/Makefile
> > +++ b/Makefile
> > @@ -2715,10 +2715,8 @@ endif
> >  %.cocci.patch: %.cocci $(COCCI_SOURCES)
> >         @echo '    ' SPATCH $<; \
> >         ret=0; \
> > -       for f in $(COCCI_SOURCES); do \
> > -               $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
> > -                       { ret=$$?; break; }; \
> > -       done >$@+ 2>$@.log; \
> > +       $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
> > +       ret=$$?; \
> >         if test $$ret != 0; \
> >         then \
> >                 cat $@.log; \
> > --
> > 2.18.0.219.gaf81d287a9da
> >
SZEDER Gábor Oct. 5, 2018, 12:40 p.m. UTC | #3
On Thu, Oct 04, 2018 at 07:17:47PM -0700, Jacob Keller wrote:
> Junio, do you want me to update the commit message on my side with the
> memory concerns? Or could you update it to mention memory as a noted
> trade off.

We have been running 'make -j2 coccicheck' in the static analysis
build job on Travis CI, which worked just fine so far.  The Travis CI
build environments have 3GB of memory available [1], but, as shown in
[2], with this patch the memory consumption jumps up to about
1.3-1.8GB for each of those jobs.  So with two parallel jobs we will
very likely bump into this limit.

So this patch should definitely change that build script to run only a
single job.


1 - https://docs.travis-ci.com/user/common-build-problems/#my-build-script-is-killed-without-any-error
2 - https://public-inbox.org/git/20181003101658.GM23446@localhost/


> > >  Makefile | 6 ++----
> > >  1 file changed, 2 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/Makefile b/Makefile
> > > index df1df9db78da..da692ece9e12 100644
> > > --- a/Makefile
> > > +++ b/Makefile
> > > @@ -2715,10 +2715,8 @@ endif
> > >  %.cocci.patch: %.cocci $(COCCI_SOURCES)
> > >         @echo '    ' SPATCH $<; \
> > >         ret=0; \
> > > -       for f in $(COCCI_SOURCES); do \
> > > -               $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
> > > -                       { ret=$$?; break; }; \
> > > -       done >$@+ 2>$@.log; \
> > > +       $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
> > > +       ret=$$?; \
> > >         if test $$ret != 0; \
> > >         then \
> > >                 cat $@.log; \
> > > --
> > > 2.18.0.219.gaf81d287a9da
> > >
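
The CI-side adjustment called for above would be a one-line change along
these lines (a sketch; it assumes the 'make --jobs=2 coccicheck' invocation
lives in ci/run-static-analysis.sh, the script driving this build job):

  # in ci/run-static-analysis.sh
  -make --jobs=2 coccicheck
  +make coccicheck
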
Jeff King Oct. 5, 2018, 4:25 p.m. UTC | #4
On Fri, Oct 05, 2018 at 02:40:48PM +0200, SZEDER Gábor wrote:

> On Thu, Oct 04, 2018 at 07:17:47PM -0700, Jacob Keller wrote:
> > Junio, do you want me to update the commit message on my side with the
> > memory concerns? Or could you update it to mention memory as a noted
> > trade off.
> 
> We have been running 'make -j2 coccicheck' in the static analysis
> build job on Travis CI, which worked just fine so far.  The Travis CI
> build environments have 3GB of memory available [1], but, as shown in
> [2], with this patch the memory consumption jumps up to about
> 1.3-1.8GB for each of those jobs.  So with two parallel jobs we will
> very likely bump into this limit.
> 
> So this patch should definitely change that build script to run only a
> single job.

It should still be a net win, since the total CPU seems to drop by a
factor of 3-4.

Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
doesn't feel like an exorbitant request for a developer-only tool these
days, but I have noticed some people on the list tend to have lousier
machines than I do. ;)

-Peff
Keller, Jacob E Oct. 5, 2018, 4:53 p.m. UTC | #5
> -----Original Message-----
> From: Jeff King [mailto:peff@peff.net]
> Sent: Friday, October 05, 2018 9:25 AM
> To: SZEDER Gábor <szeder.dev@gmail.com>
> Cc: Jacob Keller <jacob.keller@gmail.com>; Keller, Jacob E
> <jacob.e.keller@intel.com>; Git mailing list <git@vger.kernel.org>
> Subject: Re: [PATCH v3] coccicheck: process every source file at once
> 
> On Fri, Oct 05, 2018 at 02:40:48PM +0200, SZEDER Gábor wrote:
> 
> > On Thu, Oct 04, 2018 at 07:17:47PM -0700, Jacob Keller wrote:
> > > Junio, do you want me to update the commit message on my side with the
> > > memory concerns? Or could you update it to mention memory as a noted
> > > trade off.
> >
> > We have been running 'make -j2 coccicheck' in the static analysis
> > build job on Travis CI, which worked just fine so far.  The Travis CI
> > build environments have 3GB of memory available [1], but, as shown in
> > [2], with this patch the memory consumption jumps up to about
> > 1.3-1.8GB for each of those jobs.  So with two parallel jobs we will
> > very likely bump into this limit.
> >
> > So this patch should definitely change that build script to run only a
> > single job.
> 
> It should still be a net win, since the total CPU seems to drop by a
> factor of 3-4.
> 
> Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> doesn't feel like an exorbitant request for a developer-only tool these
> days, but I have noticed some people on the list tend to have lousier
> machines than I do. ;)
> 
> -Peff

It's probably not worth trying to make this more complicated and scale
up how many files we do at once based on the amount of available memory
on the system...

Thanks,
Jake
Jeff King Oct. 5, 2018, 4:59 p.m. UTC | #6
On Fri, Oct 05, 2018 at 04:53:35PM +0000, Keller, Jacob E wrote:

> > Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> > doesn't feel like an exorbitant request for a developer-only tool these
> > days, but I have noticed some people on the list tend to have lousier
> > machines than I do. ;)
> > 
> > -Peff
> 
> It's probably not worth trying to make this more complicated and scale
> up how many files we do at once based on the amount of available
> memory on the system...

Yeah, that sounds too complicated. At most I'd give a Makefile knob to
say "spatch in batches of $(N)". But I'd prefer to avoid even that
complexity if we can.

-Peff
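
Such a batching knob could be sketched with xargs, which feeds spatch
groups of $(SPATCH_BATCH_SIZE) files; SPATCH_BATCH_SIZE is a hypothetical
name, and recipe lines must be tab-indented in a real Makefile:

  # batches of 1 reproduce the old one-file-per-invocation behaviour
  SPATCH_BATCH_SIZE = 1

  %.cocci.patch: %.cocci $(COCCI_SOURCES)
          @echo '    ' SPATCH $<; \
          if ! echo $(COCCI_SOURCES) | xargs -n $(SPATCH_BATCH_SIZE) \
                  $(SPATCH) --sp-file $< $(SPATCH_FLAGS) >$@+ 2>$@.log; \
          then \
                  cat $@.log; \
                  exit 1; \
          fi

xargs exits non-zero if any of the underlying spatch invocations fails,
so the error handling stays equivalent to the single-invocation version.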
SZEDER Gábor Oct. 5, 2018, 6:39 p.m. UTC | #7
On Fri, Oct 05, 2018 at 12:25:17PM -0400, Jeff King wrote:
> On Fri, Oct 05, 2018 at 02:40:48PM +0200, SZEDER Gábor wrote:
> 
> > On Thu, Oct 04, 2018 at 07:17:47PM -0700, Jacob Keller wrote:
> > > Junio, do you want me to update the commit message on my side with the
> > > memory concerns? Or could you update it to mention memory as a noted
> > > trade off.
> > 
> > We have been running 'make -j2 coccicheck' in the static analysis
> > build job on Travis CI, which worked just fine so far.  The Travis CI
> > build environments have 3GB of memory available [1], but, as shown in
> > [2], with this patch the memory consumption jumps up to about
> > 1.3-1.8GB for each of those jobs.  So with two parallel jobs we will
> > very likely bump into this limit.
> > 
> > So this patch should definitely change that build script to run only a
> > single job.
> 
> It should still be a net win, since the total CPU seems to drop by a
> factor of 3-4.

Well, that's true when you have unlimited resources... :)  or it's
true even then, when I have just enough resources, but not much
contention.  After all, Coccinelle doesn't have to parse the same
header files over and over again.  However, on Travis CI, where who
knows how many other build jobs are running next to our static
analysis, it doesn't seem to be the case.

On current master with an additional 'time' in front:

  time make --jobs=2 coccicheck
  <...>
  695.70user 50.27system 6:27.88elapsed 192%CPU (0avgtext+0avgdata 91448maxresident)k
  5976inputs+2536outputs (42major+18411888minor)pagefaults 0swaps

  https://travis-ci.org/szeder/git/jobs/437733874#L574

With this patch, but without -j2 to fit into 3GB:

  960.50user 22.59system 16:23.74elapsed 99%CPU (0avgtext+0avgdata 1606156maxresident)k
  5976inputs+1320outputs (26major+4548440minor)pagefaults 0swaps

  https://travis-ci.org/szeder/git/jobs/437734003#L575

Note that both the runtime and the CPU time increased. (and RSS, of
course)

> Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> doesn't feel like an exorbitant request for a developer-only tool these
> days, but I have noticed some people on the list tend to have lousier
> machines than I do. ;)
> 
> -Peff
SZEDER Gábor Oct. 5, 2018, 6:50 p.m. UTC | #8
On Fri, Oct 05, 2018 at 12:59:01PM -0400, Jeff King wrote:
> On Fri, Oct 05, 2018 at 04:53:35PM +0000, Keller, Jacob E wrote:
> 
> > > Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> > > doesn't feel like an exorbitant request for a developer-only tool these
> > > days, but I have noticed some people on the list tend to have lousier
> > > machines than I do. ;)
> > > 
> > > -Peff
> > 
> > It's probably not worth trying to make this more complicated and scale
> > up how many files we do at once based on the amount of available
> > memory on the system...
> 
> Yeah, that sounds too complicated. At most I'd give a Makefile knob to
> say "spatch in batches of $(N)". But I'd prefer to avoid even that
> complexity if we can.

But perhaps one more if-else, e.g.:

  if test -n "$(COCCICHECK_ALL_AT_ONCE)"; then \
      <all at once from Jacob>
  else
      <old for loop>
  fi

would be an acceptable compromise?  Dunno.
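
Spelled out against the current rule, that compromise might look like the
following sketch (COCCICHECK_ALL_AT_ONCE being the hypothetical opt-in;
recipe lines tab-indented in a real Makefile):

  %.cocci.patch: %.cocci $(COCCI_SOURCES)
          @echo '    ' SPATCH $<; \
          ret=0; \
          if test -n "$(COCCICHECK_ALL_AT_ONCE)"; then \
                  $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) \
                          >$@+ 2>$@.log; \
                  ret=$$?; \
          else \
                  for f in $(COCCI_SOURCES); do \
                          $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
                                  { ret=$$?; break; }; \
                  done >$@+ 2>$@.log; \
          fi; \
          if test $$ret != 0; \
          then \
                  cat $@.log; \
                  exit 1; \
          fi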
Jeff King Oct. 5, 2018, 7 p.m. UTC | #9
On Fri, Oct 05, 2018 at 08:50:50PM +0200, SZEDER Gábor wrote:

> On Fri, Oct 05, 2018 at 12:59:01PM -0400, Jeff King wrote:
> > On Fri, Oct 05, 2018 at 04:53:35PM +0000, Keller, Jacob E wrote:
> > 
> > > > Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
> > > > doesn't feel like an exorbitant request for a developer-only tool these
> > > > days, but I have noticed some people on the list tend to have lousier
> > > > machines than I do. ;)
> > > > 
> > > > -Peff
> > > 
> > > It's probably not worth trying to make this more complicated and scale
> > > up how many files we do at once based on the amount of available
> > > memory on the system...
> > 
> > Yeah, that sounds too complicated. At most I'd give a Makefile knob to
> > say "spatch in batches of $(N)". But I'd prefer to avoid even that
> > complexity if we can.
> 
> But perhaps one more if-else, e.g.:
> 
>   if test -n "$(COCCICHECK_ALL_AT_ONCE)"; then \
>       <all at once from Jacob>
>   else
>       <old for loop>
>   fi
> 
> would be an acceptable compromise?  Dunno.

That's OK, too, assuming people would actually want to use it. I'm also
OK shipping this (with the "make -j" fix you suggested) and seeing if
anybody actually complains. I assume there are only a handful of people
running coccicheck in the first place.

-Peff
Jeff King Oct. 5, 2018, 7:02 p.m. UTC | #10
On Fri, Oct 05, 2018 at 08:39:04PM +0200, SZEDER Gábor wrote:

> > It should still be a net win, since the total CPU seems to drop by a
> > factor of 3-4.
> 
> Well, that's true when you have unlimited resources... :)  or it's
> true even then, when I have just enough resources, but not much
> contention.  After all, Coccinelle doesn't have to parse the same
> header files over and over again.  However, on Travis CI, where who
> knows how many other build jobs are running next to our static
> analysis, it doesn't seem to be the case.
> 
> On current master with an additional 'time' in front:
> 
>   time make --jobs=2 coccicheck
>   <...>
>   695.70user 50.27system 6:27.88elapsed 192%CPU (0avgtext+0avgdata 91448maxresident)k
>   5976inputs+2536outputs (42major+18411888minor)pagefaults 0swaps
> 
>   https://travis-ci.org/szeder/git/jobs/437733874#L574
> 
> With this patch, but without -j2 to fit into 3GB:
> 
>   960.50user 22.59system 16:23.74elapsed 99%CPU (0avgtext+0avgdata 1606156maxresident)k
>   5976inputs+1320outputs (26major+4548440minor)pagefaults 0swaps
> 
>   https://travis-ci.org/szeder/git/jobs/437734003#L575
> 
> Note that both the runtime and the CPU time increased. (and RSS, of
> course)

I'm not sure what to make of those results. Was the jump in CPU _caused_
by the patch, or does it independently fluctuate based on other things
happening on the Travis servers?

I.e., in the second run, do we know that the time would not have
actually been worse with the first patch?

-Peff
SZEDER Gábor Oct. 5, 2018, 7:54 p.m. UTC | #11
On Fri, Oct 05, 2018 at 03:02:16PM -0400, Jeff King wrote:
> On Fri, Oct 05, 2018 at 08:39:04PM +0200, SZEDER Gábor wrote:
> 
> > > It should still be a net win, since the total CPU seems to drop by a
> > > factor of 3-4.
> > 
> > Well, that's true when you have unlimited resources... :)  or it's
> > true even then, when I have just enough resources, but not much
> > contention.  After all, Coccinelle doesn't have to parse the same
> > header files over and over again.  However, on Travis CI, where who
> > knows how many other build jobs are running next to our static
> > analysis, it doesn't seem to be the case.
> > 
> > On current master with an additional 'time' in front:
> > 
> >   time make --jobs=2 coccicheck
> >   <...>
> >   695.70user 50.27system 6:27.88elapsed 192%CPU (0avgtext+0avgdata 91448maxresident)k
> >   5976inputs+2536outputs (42major+18411888minor)pagefaults 0swaps
> > 
> >   https://travis-ci.org/szeder/git/jobs/437733874#L574
> > 
> > With this patch, but without -j2 to fit into 3GB:
> > 
> >   960.50user 22.59system 16:23.74elapsed 99%CPU (0avgtext+0avgdata 1606156maxresident)k
> >   5976inputs+1320outputs (26major+4548440minor)pagefaults 0swaps
> > 
> >   https://travis-ci.org/szeder/git/jobs/437734003#L575
> > 
> > Note that both the runtime and the CPU time increased. (and RSS, of
> > course)
> 
> I'm not sure what to make of those results. Was the jump in CPU _caused_
> by the patch, or does it independently fluctuate based on other things
> happening on the Travis servers?
> 
> I.e., in the second run, do we know that the time would not have
> actually been worse with the first patch?

Runtimes tend to fluctuate quite a bit more on Travis CI compared to
my machine, but not this much, and it seems to be consistent so far.

After scripting/querying the Travis CI API a bit, I found that from
the last 100 static analysis build jobs 78 did actually run 'make
coccicheck' [1], averaging 470s for the whole build job, with only 4
build jobs exceeding the 10min mark.

I had maybe 6-8 build jobs running this patch over the last 2-3 days,
I think all of them were over 15min.  (I restarted some of them, so I
don't have separate logs for all of them, hence the uncertainty.)


1 - There are a couple of canceled build jobs, and we skip the build
    job of branches when they happen to match a tag.
Jacob Keller Oct. 5, 2018, 11:10 p.m. UTC | #12
On Fri, Oct 5, 2018 at 12:00 PM Jeff King <peff@peff.net> wrote:
> That's OK, too, assuming people would actually want to use it. I'm also
> OK shipping this (with the "make -j" fix you suggested) and seeing if
> anybody actually complains. I assume there are only a handful of people
> running coccicheck in the first place.
>
> -Peff

Ok. I can go this route if we have consensus on "break it and see
if someone complains".

Regards,
Jake
René Scharfe Oct. 6, 2018, 8:42 a.m. UTC | #13
Am 05.10.2018 um 21:00 schrieb Jeff King:
> On Fri, Oct 05, 2018 at 08:50:50PM +0200, SZEDER Gábor wrote:
> 
>> On Fri, Oct 05, 2018 at 12:59:01PM -0400, Jeff King wrote:
>>> On Fri, Oct 05, 2018 at 04:53:35PM +0000, Keller, Jacob E wrote:
>>>
>>>>> Are we OK with saying 1.3-1.8GB is necessary to run coccicheck? That
>>>>> doesn't feel like an exorbitant request for a developer-only tool these
>>>>> days, but I have noticed some people on the list tend to have lousier
>>>>> machines than I do. ;)
>>>>>
>>>>> -Peff
>>>>
>>>> It's probably not worth trying to make this more complicated and scale
>>>> up how many files we do at once based on the amount of available
>>>> memory on the system...
>>>
>>> Yeah, that sounds too complicated. At most I'd give a Makefile knob to
>>> say "spatch in batches of $(N)". But I'd prefer to avoid even that
>>> complexity if we can.
>>
>> But perhaps one more if-else, e.g.:
>>
>>   if test -n "$(COCCICHECK_ALL_AT_ONCE)"; then \
>>       <all at once from Jacob>
>>   else
>>       <old for loop>
>>   fi
>>
>> would be an acceptable compromise?  Dunno.
> 
> That's OK, too, assuming people would actually want to use it. I'm also
> OK shipping this (with the "make -j" fix you suggested) and seeing if
> anybody actually complains. I assume there are only a handful of people
> running coccicheck in the first place.

FWIW, my development environment is a virtual machine with 1200MB RAM
and 900MB swap space.  coccicheck takes almost eight minutes
sequentially, and four and a half minutes with -j4.

Unsurprisingly, it fails after almost three minutes with the patch,
reporting that it ran out of memory.  With 2900MB it fails after almost
two minutes; with 3000MB it succeeds after a good two minutes.

time(1) says (for -j1):

433.30user 36.17system 7:49.84elapsed 99%CPU (0avgtext+0avgdata 108212maxresident)k
192inputs+1512outputs (0major+16409056minor)pagefaults 0swaps

129.74user 2.06system 2:13.27elapsed 98%CPU (0avgtext+0avgdata 1884568maxresident)k
236896inputs+1096outputs (795major+462129minor)pagefaults 0swaps

So with the patch it's more than three times faster, but needs more
than seventeen times more memory.  And I need a bigger VM. :-/

René
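
A rough way to probe for such a ceiling without resizing a VM is to cap
the address space of the build; bash's ulimit -v takes kilobytes, and the
3000000 below is an illustrative value, not a measured threshold:

  $ (ulimit -v 3000000; time make coccicheck)

This limits virtual address space rather than RAM plus swap, so it only
approximates the setup described above.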
Beat Bolli Oct. 7, 2018, 11:36 a.m. UTC | #14
On 02.10.18 22:18, Jacob Keller wrote:
> On Tue, Oct 2, 2018 at 1:07 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>>
>> From: Jacob Keller <jacob.keller@gmail.com>
>>
>> make coccicheck is used in order to apply coccinelle semantic patches,
>> and see if any of the transformations found within contrib/coccinelle/
>> can be applied to the current code base.
>>
>> Pass every file to a single invocation of spatch, instead of running
>> spatch once per source file.
>>
>> This reduces the time required to run make coccicheck by a significant
>> amount of time:
>>
>> Prior timing of make coccicheck
>>   real    6m14.090s
>>   user    25m2.606s
>>   sys     1m22.919s
>>
>> New timing of make coccicheck
>>   real    1m36.580s
>>   user    7m55.933s
>>   sys     0m18.219s
>>
>> This is nearly a 4x decrease in the time required to run make
>> coccicheck. This is due to the overhead of restarting spatch for every
>> file. By processing all files at once, we can amortize this startup cost
>> across the total number of files, rather than paying it once per file.
>>
>> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
>> ---
> 
> Forgot to add what changed. I dropped the subshell and "||" bit around
> invoking spatch.
> 
> Thanks,
> Jake
> 
> 
>>  Makefile | 6 ++----
>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index df1df9db78da..da692ece9e12 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -2715,10 +2715,8 @@ endif
>>  %.cocci.patch: %.cocci $(COCCI_SOURCES)
>>         @echo '    ' SPATCH $<; \
>>         ret=0; \
>> -       for f in $(COCCI_SOURCES); do \
>> -               $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
>> -                       { ret=$$?; break; }; \
>> -       done >$@+ 2>$@.log; \
>> +       $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
>> +       ret=$$?; \
>>         if test $$ret != 0; \
>>         then \
>>                 cat $@.log; \
>> --
>> 2.18.0.219.gaf81d287a9da
>>

Wouldn't the following be even simpler?

diff --git a/Makefile b/Makefile
index 5c8307b7c479..a37b2724d526 100644
--- a/Makefile
+++ b/Makefile
@@ -2701,12 +2701,7 @@ endif

 %.cocci.patch: %.cocci $(COCCI_SOURCES)
        @echo '    ' SPATCH $<; \
-       ret=0; \
-       for f in $(COCCI_SOURCES); do \
-               $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
-                       { ret=$$?; break; }; \
-       done >$@+ 2>$@.log; \
-       if test $$ret != 0; \
+       if ! $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS)
>$@+ 2>$@.log; \
        then \
                cat $@.log; \
                exit 1; \

Cheers,
Beat
Beat Bolli Oct. 7, 2018, 11:49 a.m. UTC | #15
On 07.10.18 13:36, Beat Bolli wrote:
> On 02.10.18 22:18, Jacob Keller wrote:
>> On Tue, Oct 2, 2018 at 1:07 PM Jacob Keller <jacob.e.keller@intel.com> wrote:
>>>
>>> From: Jacob Keller <jacob.keller@gmail.com>
>>>
>>> make coccicheck is used in order to apply coccinelle semantic patches,
>>> and see if any of the transformations found within contrib/coccinelle/
>>> can be applied to the current code base.
>>>
>>> Pass every file to a single invocation of spatch, instead of running
>>> spatch once per source file.
>>>
>>> This reduces the time required to run make coccicheck by a significant
>>> amount of time:
>>>
>>> Prior timing of make coccicheck
>>>   real    6m14.090s
>>>   user    25m2.606s
>>>   sys     1m22.919s
>>>
>>> New timing of make coccicheck
>>>   real    1m36.580s
>>>   user    7m55.933s
>>>   sys     0m18.219s
>>>
>>> This is nearly a 4x decrease in the time required to run make
>>> coccicheck. This is due to the overhead of restarting spatch for every
>>> file. By processing all files at once, we can amortize this startup cost
>>> across the total number of files, rather than paying it once per file.
>>>
>>> Signed-off-by: Jacob Keller <jacob.keller@gmail.com>
>>> ---
>>
>> Forgot to add what changed. I dropped the subshell and "||" bit around
>> invoking spatch.
>>
>> Thanks,
>> Jake
>>
>>
>>>  Makefile | 6 ++----
>>>  1 file changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/Makefile b/Makefile
>>> index df1df9db78da..da692ece9e12 100644
>>> --- a/Makefile
>>> +++ b/Makefile
>>> @@ -2715,10 +2715,8 @@ endif
>>>  %.cocci.patch: %.cocci $(COCCI_SOURCES)
>>>         @echo '    ' SPATCH $<; \
>>>         ret=0; \
>>> -       for f in $(COCCI_SOURCES); do \
>>> -               $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
>>> -                       { ret=$$?; break; }; \
>>> -       done >$@+ 2>$@.log; \
>>> +       $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
>>> +       ret=$$?; \
>>>         if test $$ret != 0; \
>>>         then \
>>>                 cat $@.log; \
>>> --
>>> 2.18.0.219.gaf81d287a9da
>>>
> 
> Wouldn't the following be even simpler?
> 
> diff --git a/Makefile b/Makefile
> index 5c8307b7c479..a37b2724d526 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2701,12 +2701,7 @@ endif
> 
>  %.cocci.patch: %.cocci $(COCCI_SOURCES)
>         @echo '    ' SPATCH $<; \
> -       ret=0; \
> -       for f in $(COCCI_SOURCES); do \
> -               $(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
> -                       { ret=$$?; break; }; \
> -       done >$@+ 2>$@.log; \
> -       if test $$ret != 0; \
> +       if ! $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) 
>> $@+ 2>$@.log; \

The "If !" and the output redirection should be on one line,
obviously... Sorry about this.

Beat
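
With that join applied, the proposed recipe would read (the same sketch,
minus the wrapping artifact):

  %.cocci.patch: %.cocci $(COCCI_SOURCES)
          @echo '    ' SPATCH $<; \
          if ! $(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
          then \
                  cat $@.log; \
                  exit 1; \
          fi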
Jeff King Oct. 9, 2018, 3:11 a.m. UTC | #16
On Sat, Oct 06, 2018 at 10:42:57AM +0200, René Scharfe wrote:

> > That's OK, too, assuming people would actually want to use it. I'm also
> > OK shipping this (with the "make -j" fix you suggested) and seeing if
> > anybody actually complains. I assume there are only a handful of people
> > running coccicheck in the first place.
> 
> FWIW, my development environment is a virtual machine with 1200MB RAM
> and 900MB swap space.  coccicheck takes almost eight minutes
> sequentially, and four and a half minutes with -j4.
> 
> Unsurprisingly, it fails after almost three minutes with the patch,
> reporting that it ran out of memory.  With 2900MB it fails after almost
> two minutes; with 3000MB it succeeds after a good two minutes.
> 
> time(1) says (for -j1):
> 
> 433.30user 36.17system 7:49.84elapsed 99%CPU (0avgtext+0avgdata 108212maxresident)k
> 192inputs+1512outputs (0major+16409056minor)pagefaults 0swaps
> 
> 129.74user 2.06system 2:13.27elapsed 98%CPU (0avgtext+0avgdata 1884568maxresident)k
> 236896inputs+1096outputs (795major+462129minor)pagefaults 0swaps
> 
> So with the patch it's more than three times faster, but needs more
> than seventeen times more memory.  And I need a bigger VM. :-/

Yuck. :) So if we want to take this as a complaint, then I guess we can
jump straight to implementing the fallback to the existing behavior
(though it may be worth it for you to expand your VM to get the
decreased CPU time).

I'm still puzzled by Gábor's counter-intuitive CI numbers, though.

-Peff
Jeff King Oct. 9, 2018, 3:15 a.m. UTC | #17
On Fri, Oct 05, 2018 at 09:54:13PM +0200, SZEDER Gábor wrote:

> Runtimes tend to fluctuate quite a bit more on Travis CI compared to
> my machine, but not this much, and it seems to be consistent so far.
> 
> After scripting/querying the Travis CI API a bit, I found that from
> the last 100 static analysis build jobs 78 did actually run 'make
> coccicheck' [1], averaging 470s for the whole build job, with only 4
> build jobs exceeding the 10min mark.
> 
> I had maybe 6-8 build jobs running this patch over the last 2-3 days,
> I think all of them were over 15min.  (I restarted some of them, so I
> don't have separate logs for all of them, hence the uncertainty.)

So that's really weird and counter-intuitive, since we should be doing
strictly less work. I know that spatch tries to parallelize itself,
though from my tests, 1.0.4 does not. I wonder if the version in Travis
differs in that respect and starts too many threads, and the extra time
is going to contention and context switches.

Have you tried passing "-j1" to spatch? My 1.0.4 does not even recognize
it.

That seems like a pretty unlikely explanation to me, but I am having
trouble coming up with another one.

I guess the other plausible thing is that the extra memory is forcing us
into some slower path. E.g., a hypervisor may even be swapping,
unbeknownst to the child OS, and it gets accounted in the child OS as
"boy, that memory load was really slow", which becomes used CPU.

That actually sounds more credible to me.

-Peff
SZEDER Gábor Oct. 10, 2018, 11:44 a.m. UTC | #18
On Mon, Oct 08, 2018 at 11:15:42PM -0400, Jeff King wrote:
> On Fri, Oct 05, 2018 at 09:54:13PM +0200, SZEDER Gábor wrote:
> 
> > Runtimes tend to fluctuate quite a bit more on Travis CI compared to
> > my machine, but not this much, and it seems to be consistent so far.
> > 
> > After scripting/querying the Travis CI API a bit, I found that from
> > the last 100 static analysis build jobs 78 did actually run 'make
> > coccicheck' [1], averaging 470s for the whole build job, with only 4
> > build jobs exceeding the 10min mark.
> > 
> > I had maybe 6-8 build jobs running this patch over the last 2-3 days,
> > I think all of them were over 15min.  (I restarted some of them, so I
> > don't have separate logs for all of them, hence the uncertainty.)
> 
> So that's really weird and counter-intuitive, since we should be doing
> strictly less work. I know that spatch tries to parallelize itself,
> though from my tests, 1.0.4 does not. I wonder if the version in Travis
> differs in that respect and starts too many threads, and the extra time
> is going to contention and context switches.

I don't think it does any parallel work.

Here is the timing again from my previous email:

  960.50user 22.59system 16:23.74elapsed 99%CPU (0avgtext+0avgdata 1606156maxresident)k

Notice that 16:23 is 983s, and that it matches the sum of the user and
system times.  I usually saw this kind of timing with CPU-intensive
single-threaded programs, and if there were any parallelization, then I
would expect the elapsed time to be at least somewhat smaller than the
other two.

> Have you tried passing "-j1" to spatch? My 1.0.4 does not even recognize
> it.

I just gave it a try, but the v1.0.0 on Travis CI errored out with
"unknown option `-j'".

  https://travis-ci.org/szeder/git/jobs/439532822#L566

> That seems like a pretty unlikely explanation to me, but I am having
> trouble coming up with another one.
> 
> I guess the other plausible thing is that the extra memory is forcing us
> into some slower path. E.g., a hypervisor may even be swapping,
> unbeknownst to the child OS, and it gets accounted in the child OS as
> "boy, that memory load was really slow", which becomes used CPU.
> 
> That actually sounds more credible to me.
> 
> -Peff
Jeff King Oct. 10, 2018, 1:59 p.m. UTC | #19
On Wed, Oct 10, 2018 at 01:44:41PM +0200, SZEDER Gábor wrote:

> > So that's really weird and counter-intuitive, since we should be doing
> > strictly less work. I know that spatch tries to parallelize itself,
> > though from my tests, 1.0.4 does not. I wonder if the version in Travis
> > differs in that respect and starts too many threads, and the extra time
> > is going to contention and context switches.
> 
> I don't think it does any parallel work.
> 
> Here is the timing again from my previous email:
> 
>   960.50user 22.59system 16:23.74elapsed 99%CPU (0avgtext+0avgdata 1606156maxresident)k
> 
> Notice that 16:23 is 983s, and that it matches the sum of the user and
> system times.  I usually saw this kind of timing with CPU-intensive
> single-threaded programs, and if there were any parallelization, then I
> would expect the elapsed time to be at least somewhat smaller than the
> other two.

Ah, right, I should have been able to figure that out myself. So scratch
that theory. My "hypervisor stalling our memory reads" theory is still
plausible, but I don't know how we would test it.

I guess in some sense it doesn't matter. If it's slower, we're not
likely to be able to fix that. So I guess we just need the fallback to
the current behavior.

-Peff

Patch

diff --git a/Makefile b/Makefile
index df1df9db78da..da692ece9e12 100644
--- a/Makefile
+++ b/Makefile
@@ -2715,10 +2715,8 @@  endif
 %.cocci.patch: %.cocci $(COCCI_SOURCES)
 	@echo '    ' SPATCH $<; \
 	ret=0; \
-	for f in $(COCCI_SOURCES); do \
-		$(SPATCH) --sp-file $< $$f $(SPATCH_FLAGS) || \
-			{ ret=$$?; break; }; \
-	done >$@+ 2>$@.log; \
+	$(SPATCH) --sp-file $< $(COCCI_SOURCES) $(SPATCH_FLAGS) >$@+ 2>$@.log; \
+	ret=$$?; \
 	if test $$ret != 0; \
 	then \
 		cat $@.log; \