diff mbox series

New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base]

Message ID 20181022173935.GG30222@szeder.dev
State New, archived
Headers show
Series New semantic patches vs. in-flight topics [was: Re: [PATCH 00/19] Bring more repository handles into our code base] | expand

Commit Message

SZEDER Gábor Oct. 22, 2018, 5:39 p.m. UTC
On Tue, Oct 16, 2018 at 04:35:31PM -0700, Stefan Beller wrote:
> the last patch (applying the semantic patches) has been omitted as that
> would produce a lot of merge conflicts. Without that patch, this merges
> cleanly to next.
> 
> As for when to apply the semantic patches, I wondered if we would prefer a dirty merge
> (created via "make coccicheck && git apply contrib/coccinelle/the_repository.cocci.patch")
> of the semantic patches, or if we'd actually trickle in the changes over some time?

>     This series takes another approach as it doesn't change the signature of
>     functions, but introduces new functions that can deal with arbitrary 
>     repositories, keeping the old function signature around using a shallow wrapper.
>     
>     Additionally each patch adds a semantic patch, that would port from the old to
>     the new function. These semantic patches are all applied in the very last patch,
>     but we could omit applying the last patch if it causes too many merge conflicts
>     and trickl in the semantic patches over time when there are no merge conflicts.

I don't really like how this or the previous RFC patch series deal
with semantic patches (or how some past patch series dealt with them,
for that matter), for various reasons:

  - Applying the transformations from several semantic patches in one
    single patch makes it harder to review it, because we won't know
    which change came from which semantic patch.
    
    For comparison, see the patch series adding hasheq()/oideq(),
    merged in 769af0fd9e (Merge branch 'jk/cocci', 2018-09-17), in
    particular the four "convert <this> to <that>" patches.

  - 'make coccicheck' won't run clean (and the static analysis build
    job on Travis CI will fail) for all commits following adding the
    new semantic patches but before applying the resulting
    transformations.

  - These semantic patches interact badly with 'pu' and 'next',
    because those integration branches can contain topic branches
    adding new code that should be transformed by these semanic
    patches.  Consequently, 'make coccicheck' won't run clean and the
    static analysis build job will fail until all those topics reach
    'master', and the remaining transformations are applied on top.
    
    This was (and still is!) an issue with the hasheq()/oideq() series
    as well: that series was added on 2018-08-28, and the static
    analysis build job is red on 'pu' ever since.  See the follow-up
    patch e43d2dcce1 (more oideq/hasheq conversions, 2018-10-02), and
    one more follow-up will be necessary after the builtin stash topic
    is merged to 'master'.

    This makes it harder to review other patch series.

  - Is it really necessary to carry these semantic patches in-tree?
    Let me ellaborate.  There are basically two main use cases for
    semantic patches:

      - To avoid undesirable code patterns, e.g. we should not use
        sha1_to_hex(oid.hash) or strbuf_addf(&sb, "fixed string"), but
        use oid_to_hex(&oid) or strbuf_addstr(&sb, "fixed string")
        instead.  Note that in these cases we don't remove the
        functions sha1_to_hex() or strbuf_addf(), because there are
        good reasons to use them in other scenarios.

        Our semantic patches under 'contrib/coccinelle/' fall into
        this category, and we have 'make coccicheck' and the static
        analysis build job on Travis CI to catch these undesirable
        code patterns preferably early, and to prevent them from
        entering our codebase.

      - To perform one-off code transformations, e.g. to modify a
        function's name and/or signature and convert all its
        callsites; see e.g. commits abef9020e3 (sha1_file: convert
        sha1_object_info* to object_id, 2018-03-12) and b4f5aca40e
        (sha1_file: convert read_sha1_file to struct object_id,
        2018-03-12).

        As far as I understand this patch series falls into this
        category: once the conversion is complete the old functions
        will be removed.  After that there will be no use for these
        semanic patches.

        Having said that, it's certainly easier to double-check the
        resulting transformations when one can apply the semantic
        patches locally, and doing so is easier when the semantic
        patches are in tree than when they must be copy-pasted from a
        commit message.

OK, that was already long.  Now, can we do better?

How about introducing the concept of "pending" semantic patches,
stored in 'contrib/coccinelle/<name>.pending.cocci' files, modifying
'make coccicheck' to skip them, and adding the new 'make
coccicheck-pending' target to make it convenient to apply them, e.g.
something like the simple patch at the end.

So the process would go something like this:

  - A new semantic patch should be added as "pending", e.g. to the
    file 'the_repository.pending.cocci', together with the resulting
    transformations in the same commit.

    This way neither 'make coccicheck' nor the static analysis build
    job would complain in the topic branch or in the two integration
    branches.  And if they do complain, then we would know right away
    that they complain because of a well-established semantic patch.
    Yet, anyone interested could run 'make coccicheck-pending' to see
    where are we heading.

  - The author of the "pending" semanting patch should then keep an
    eye on already cooking topics: whether any of them contain new
    code that should be transformed, and how they progress to
    'master', and sending followup patch(es) with the remaining
    transformations when applicable.
    
    Futhermore, the author should also pay attention to any new topics
    that branch off after the "pending" semantic patch, and whether
    any of them introduce code to be transformed, warning their
    authors as necessary.

  - Finally, after all the dust settled, the dev should follow up with
    a patch to:
    
      - promote the "penging" patch to '<name>.cocci', if its purpose
        is to avoid undesirable code patterns in the future, or
    
      - remove the semantic patch, if it was used in a one-off
        transformation.

Thoughts?


  -- >8 --

Subject: [PATCH] coccinelle: add support for "pending" semantic patches

(should add a 'contrib/coccinelle/README' to explain what
'.pending.cocci' means)

---
 Makefile | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Stefan Beller Oct. 22, 2018, 6:54 p.m. UTC | #1
On Mon, Oct 22, 2018 at 10:39 AM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Tue, Oct 16, 2018 at 04:35:31PM -0700, Stefan Beller wrote:
> > the last patch (applying the semantic patches) has been omitted as that
> > would produce a lot of merge conflicts. Without that patch, this merges
> > cleanly to next.
> >
> > As for when to apply the semantic patches, I wondered if we would prefer a dirty merge
> > (created via "make coccicheck && git apply contrib/coccinelle/the_repository.cocci.patch")
> > of the semantic patches, or if we'd actually trickle in the changes over some time?
>
> >     This series takes another approach as it doesn't change the signature of
> >     functions, but introduces new functions that can deal with arbitrary
> >     repositories, keeping the old function signature around using a shallow wrapper.
> >
> >     Additionally each patch adds a semantic patch, that would port from the old to
> >     the new function. These semantic patches are all applied in the very last patch,
> >     but we could omit applying the last patch if it causes too many merge conflicts
> >     and trickl in the semantic patches over time when there are no merge conflicts.
>
> I don't really like how this or the previous RFC patch series deal
> with semantic patches (or how some past patch series dealt with them,
> for that matter), for various reasons:
>
>   - Applying the transformations from several semantic patches in one
>     single patch makes it harder to review it, because we won't know
>     which change came from which semantic patch.

Good point, so to improve the series sb/more-repo-in-api, I could
send the application of the semantic patch just after each patch
that adds another semantic patching rule? I personally dislike
applying the semantic patch in the same patch as where the
semantic rule was introduced, as then the mechanical conversions
(from the semantic patch) drown out reviewers attention to the
manual changes.

>     For comparison, see the patch series adding hasheq()/oideq(),
>     merged in 769af0fd9e (Merge branch 'jk/cocci', 2018-09-17), in
>     particular the four "convert <this> to <that>" patches.

The four patches "convert <this> to <that> only contain the semantic
patch as well as its effects, the manual changes are separated out
to later patches, which is quite nice.

>   - 'make coccicheck' won't run clean (and the static analysis build
>     job on Travis CI will fail) for all commits following adding the
>     new semantic patches but before applying the resulting
>     transformations.
>
>   - These semantic patches interact badly with 'pu' and 'next',
>     because those integration branches can contain topic branches
>     adding new code that should be transformed by these semanic
>     patches.

And I thought of this as a feature. There is no merge conflict and it
still compiles, which makes Junios work easy.

Of course it put the load elsewhere. :/

For the sake of a good history, I would think running 'make coccicheck'
and applying the resulting patches would be best as part of the (dirty)
merge of any topic that proposes new semantic patches, but that would
add load to Junio as it would be an extra step during the merge.

One could argue that the step of applying such transformations into
the dirty merge is cheaper than resolving merge conflicts that are
had when the topic includes the transformation.

>     Consequently, 'make coccicheck' won't run clean and the
>     static analysis build job will fail until all those topics reach
>     'master', and the remaining transformations are applied on top.
>
>     This was (and still is!) an issue with the hasheq()/oideq() series
>     as well: that series was added on 2018-08-28, and the static
>     analysis build job is red on 'pu' ever since.  See the follow-up
>     patch e43d2dcce1 (more oideq/hasheq conversions, 2018-10-02), and
>     one more follow-up will be necessary after the builtin stash topic
>     is merged to 'master'.

In my understanding this follow up is a feature, as it helps to avoid
merge conflicts with other topics in flight.

>     This makes it harder to review other patch series.

as 'make coccicheck' is an integral part of your review?

>   - Is it really necessary to carry these semantic patches in-tree?
>     Let me ellaborate.  There are basically two main use cases for
>     semantic patches:
>
>       - To avoid undesirable code patterns, e.g. we should not use
>         sha1_to_hex(oid.hash) or strbuf_addf(&sb, "fixed string"), but
>         use oid_to_hex(&oid) or strbuf_addstr(&sb, "fixed string")
>         instead.  Note that in these cases we don't remove the
>         functions sha1_to_hex() or strbuf_addf(), because there are
>         good reasons to use them in other scenarios.
>
>         Our semantic patches under 'contrib/coccinelle/' fall into
>         this category, and we have 'make coccicheck' and the static
>         analysis build job on Travis CI to catch these undesirable
>         code patterns preferably early, and to prevent them from
>         entering our codebase.
>
>       - To perform one-off code transformations, e.g. to modify a
>         function's name and/or signature and convert all its
>         callsites; see e.g. commits abef9020e3 (sha1_file: convert
>         sha1_object_info* to object_id, 2018-03-12) and b4f5aca40e
>         (sha1_file: convert read_sha1_file to struct object_id,
>         2018-03-12).
>
>         As far as I understand this patch series falls into this
>         category: once the conversion is complete the old functions
>         will be removed.  After that there will be no use for these
>         semanic patches.
>
>         Having said that, it's certainly easier to double-check the
>         resulting transformations when one can apply the semantic
>         patches locally, and doing so is easier when the semantic
>         patches are in tree than when they must be copy-pasted from a
>         commit message.
>
> OK, that was already long.  Now, can we do better?
>
> How about introducing the concept of "pending" semantic patches,
> stored in 'contrib/coccinelle/<name>.pending.cocci' files, modifying
> 'make coccicheck' to skip them, and adding the new 'make
> coccicheck-pending' target to make it convenient to apply them, e.g.
> something like the simple patch at the end.
>
> So the process would go something like this:
>
>   - A new semantic patch should be added as "pending", e.g. to the
>     file 'the_repository.pending.cocci', together with the resulting
>     transformations in the same commit.
>
>     This way neither 'make coccicheck' nor the static analysis build
>     job would complain in the topic branch or in the two integration
>     branches.  And if they do complain, then we would know right away
>     that they complain because of a well-established semantic patch.
>     Yet, anyone interested could run 'make coccicheck-pending' to see
>     where are we heading.
>
>   - The author of the "pending" semanting patch should then keep an
>     eye on already cooking topics: whether any of them contain new
>     code that should be transformed, and how they progress to
>     'master', and sending followup patch(es) with the remaining
>     transformations when applicable.
>
>     Futhermore, the author should also pay attention to any new topics
>     that branch off after the "pending" semantic patch, and whether
>     any of them introduce code to be transformed, warning their
>     authors as necessary.
>
>   - Finally, after all the dust settled, the dev should follow up with
>     a patch to:
>
>       - promote the "penging" patch to '<name>.cocci', if its purpose
>         is to avoid undesirable code patterns in the future, or
>
>       - remove the semantic patch, if it was used in a one-off
>         transformation.
>
> Thoughts?

I like the approach of having separate classes of semantic patches:
(a) the regular "we need to keep checking these" as they address
    undesirable code patterns, which is what we currently have,
    and what 'make coccicheck' would complain about.
(b) The pending patches as you propose. However I would
    argue that we'd not want to include the transformation into
    the same patch as then the patch will have merge conflicts.
    Ideally we'd have an automated process/bot that would apply
    all pending semantic patches onto master and then checks for
    conflicts in HEAD..pu, and only sends off the non-conflicting
    diffs as a topic.
    Then after a couple integration cycles we'd have all pending
    changes in, with no conflicts on Junios side.

So I think we should add a patch like you post, but we would
need to discuss the exact approach how to deal with pending
patches. Is it the original dev who should push forward on their
own pending patches, or does it become a pooled effort?

Thanks for starting this discussion!
Stefan
Junio C Hamano Oct. 22, 2018, 10:49 p.m. UTC | #2
SZEDER Gábor <szeder.dev@gmail.com> writes:

> I don't really like how this or the previous RFC patch series deal
> with semantic patches (or how some past patch series dealt with them,
> for that matter), for various reasons:
> ...
> How about introducing the concept of "pending" semantic patches,
> stored in 'contrib/coccinelle/<name>.pending.cocci' files, modifying
> 'make coccicheck' to skip them, and adding the new 'make
> coccicheck-pending' target to make it convenient to apply them, e.g.
> something like the simple patch at the end.
>
> So the process would go something like this:
>
>   - A new semantic patch should be added as "pending", e.g. to the
>     file 'the_repository.pending.cocci', together with the resulting
>     transformations in the same commit.
>
>     This way neither 'make coccicheck' nor the static analysis build
>     job would complain in the topic branch or in the two integration
>     branches.  And if they do complain, then we would know right away
>     that they complain because of a well-established semantic patch.
>     Yet, anyone interested could run 'make coccicheck-pending' to see
>     where are we heading.
>
>   - The author of the "pending" semanting patch should then keep an
>     eye on already cooking topics: whether any of them contain new
>     code that should be transformed, and how they progress to
>     'master', and sending followup patch(es) with the remaining
>     transformations when applicable.
>     
>     Futhermore, the author should also pay attention to any new topics
>     that branch off after the "pending" semantic patch, and whether
>     any of them introduce code to be transformed, warning their
>     authors as necessary.
>
>   - Finally, after all the dust settled, the dev should follow up with
>     a patch to:
>     
>       - promote the "penging" patch to '<name>.cocci', if its purpose
>         is to avoid undesirable code patterns in the future, or
>     
>       - remove the semantic patch, if it was used in a one-off
>         transformation.
> ...
> Thoughts?

I actually think this round does a far nicer job playing well with
other topics than any earlier series.  The pain you are observing I
think come primarily from my not making the best use of these
patches.

Steppng back a bit, I'd imagine in an ideal world where "make
coccicheck" can be done instantaneously _and_ the spatch machinery
is just as reliable as C compilers.  In such a world, I may rename
all the *.c files in my tree to *.c.in, make the very first step of
the "make all" to run coccicheck and transform *.c.in to *.c before
starting the compilation.  There is no need to have changes to
*.c.in that spatch would fix.  Such an idealized set-up has a few
nice property.

 - Mechanical conflict resolutions between topics in flight and a
   series that modifies or adds new .cocci rules would greatly be
   reduced.

 - Still, *.c files that are "compiled" from *.c.in file used by
   each build will by definition cocci-clean.

 - Bugs like the one that required 6afedba8 ("object_id.cocci: match
   only expressions of type 'struct object_id'", 2018-10-15) are
   still possible, but some of them may be caught by C compilers
   that inspect the result of spatch compilation from *.c.in to *.c

Now we do not live in that ideal world and there is no separate
"turn *.c.in into *.c" step in our build procedure.  In such a
"real" world, if we had a rule like "each individual commit must
pass 'make coccicheck' cleanly", anybody who does such a merge and
resolve huge conflics would essentially be wasting time on something
that the tool could do, and then the result of the merge would
further need to be amended for semantic conflicts (i.e. the other
branch may have introduced new instances of old pattern .cocci
patches in our branch would want to transform)).  By not insisting
on cocci-cleanness at each commit level, we can gain (or at least
sumulate gaining) some benefit that we would reap in the ideal
world, and Stefan's latest series already does so for the former.
If we insisted that these patches must be accompanied with the
result of the cocci transformations, such a series will have zero
chance of landing in 'pu', unless we freeze the world.

What I _could_ do (and what I did do only for one round of pushing
out 'pu') is to queue a coccinelle transformation at the tip of
integration branches.  If "make coccicheck" ran in subsecond, I
could even automate it in the script that is used to rebuild 'pu'
every day, so that after merging each and every topic, do the "make
coccicheck" and apply resulting .cocci.patch files and squash that
into the merge commit.

But with the current "make coccicheck" performance, that is not
realistic.

I am wondering if it is feasible to do it at the tip of 'pu' (which
is rebuilt two to three times a day), 'next' (which is updated once
or twice a week) and 'master'.

I find that your "pending" idea may be nicer, as it distributes the
load.  Whoever wants to change the world order by updating the .cocci
rules is primarily responsible for making it happen without breaking
the world during the transition.  That's more scalable.

But at some point in the ideal future, it may turn out that it is
making the process scalable by distributing a task that does not
have to be done by any human, not by the integrator or by individual
developer---but the new compilation step from *.c.in to *.c should
be able to do.  We obviously are not there yet, so...
Stefan Beller Oct. 23, 2018, 12:26 a.m. UTC | #3
> Stepping back a bit, I'd imagine in an ideal world where "make
> coccicheck" can be done instantaneously _and_ the spatch machinery
> is just as reliable as C compilers.
> [...]
> Now we do not live in that ideal world and [...]
>  such a series will have zero
> chance of landing in 'pu', unless we freeze the world.

I wonder if we could approximate the ideal world with
rerere+spatch a bit more:

1) I resend the series that includes "apply cocci patches"
    as the last patch and you queue it as usual

2) The first time such a series is merged, you'd merge
    HEAD^ (i.e. excluding the "apply the semantic patch)
    to pu instead. I view this as a yet-to-be invented mode
    '--theirs-is-stale-use-tree-instead=THEIRS~1^{tree}',
    then run spatch to reproduce the last patch into the
    dirty merge (which has pu and the last patch as parent)

    This step is done to 'pre-heat' the rerere cache.

3) Any further integration (e.g. rebuilding pu) would
    benefit from the hot rerere cache and very little work
    is actually required as the conflicts are resolved by rerere.

Am I overestimating or misunderstanding rerere here?

> What I _could_ do (and what I did do only for one round of pushing
> out 'pu') is to queue a coccinelle transformation at the tip of
> integration branches.  If "make coccicheck" ran in subsecond, I
> could even automate it in the script that is used to rebuild 'pu'
> every day, so that after merging each and every topic, do the "make
> coccicheck" and apply resulting .cocci.patch files and squash that
> into the merge commit.
>
> But with the current "make coccicheck" performance, that is not
> realistic.

Would it be realistic for next and master branch instead of pu?

I'd be wary for the master branch, as we may not want to rely on
spatch without review. (It can produce funny white space issues,
but seems to produce working/correct code)

> I am wondering if it is feasible to do it at the tip of 'pu' (which
> is rebuilt two to three times a day), 'next' (which is updated once
> or twice a week) and 'master'.

We could even optimize that, by checking if contrib/cocci/ has
changes for the new tip of next/master respectively.

Another thing I wonder is if we care about the distinction between
the (a) pending changes as described by SZEDER, as we introduce
these deliberately, whereas (b) undesirable code patterns
(e.g. free and null instead of FREE_AND_NULL macro) might be
caught and reported in pu/next and then someone learns from it.
Automatic rewriting the (b) cases seems not just as desirable as
(a), where we do it purely to avoid resolving merge conflicts by
hand.

> I find that your "pending" idea may be nicer, as it distributes the
> load.  Whoever wants to change the world order by updating the .cocci
> rules is primarily responsible for making it happen without breaking
> the world during the transition.  That's more scalable.

... and I think SZEDER considers the current world broken as
'make coccicheck' returns non-empty, so it sounds to me as if
the current transition is thought less-than-optimal.
Junio C Hamano Oct. 23, 2018, 4:24 a.m. UTC | #4
Stefan Beller <sbeller@google.com> writes:

> Am I overestimating or misunderstanding rerere here?

Yes.

> Would it be realistic for next and master branch instead of pu?
>
> I'd be wary for the master branch, as we may not want to rely on
> spatch without review. (It can produce funny white space issues,
> but seems to produce working/correct code)

Yes, that is why I think it is a bit too late to do the "fixup with
spatch" after merging to 'next' and 'master'.
Junio C Hamano Oct. 23, 2018, 9:38 a.m. UTC | #5
Junio C Hamano <gitster@pobox.com> writes:

> I actually think this round does a far nicer job playing well with
> other topics than any earlier series.  The pain you are observing I
> think come primarily from my not making the best use of these
> patches.
>
> Steppng back a bit, I'd imagine in an ideal world where "make
> coccicheck" can be done instantaneously _and_ the spatch machinery
> is just as reliable as C compilers....
>
> What I _could_ do (and what I did do only for one round of pushing
> out 'pu') is to queue a coccinelle transformation at the tip of
> integration branches.  If "make coccicheck" ran in subsecond, I
> could even automate it in the script that is used to rebuild 'pu'
> every day, ...

Anyway, even though "make coccicheck" does not run in subsecond,
I've updated my machinery to rebuild the integration branches so
that I can optionally queue generated coccicheck patches, and what I
pushed out tonight has one at the tip of 'pu' and also another at
the tip of 'next'.  The latter seems to be passing all archs and
executing Windows run.

The tip of 'pu' has trouble with -Wunused on Apple around the
delta-islands series.


next: https://travis-ci.org/git/git/builds/444969406
pu:   https://travis-ci.org/git/git/builds/444969342
Carlo Arenas Oct. 23, 2018, 10:15 a.m. UTC | #6
On Tue, Oct 23, 2018 at 2:40 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> The tip of 'pu' has trouble with -Wunused on Apple around the
> delta-islands series.

FWIW the "problem" is actually with -Wunused-function and is AFAIK not
related to the semantic changes or Apple (AKA macOS)

Indeed, I saw this issue before also in Linux (Fedora Rawhide) when
using the latest clang and presumed it was just expected because the
topic branch was most likely incomplete.

Carlo
Junio C Hamano Oct. 23, 2018, 10:21 a.m. UTC | #7
Carlo Arenas <carenas@gmail.com> writes:

> On Tue, Oct 23, 2018 at 2:40 AM Junio C Hamano <gitster@pobox.com> wrote:
>>
>> The tip of 'pu' has trouble with -Wunused on Apple around the
>> delta-islands series.
>
> FWIW the "problem" is actually with -Wunused-function and is AFAIK not
> related to the semantic changes or Apple (AKA macOS)

Oh, don't get me wrong.  By Apple, I meant "the versions of compiler
used on the Apple build at TravisCI site".  I could have sent the
above two lines in a separate topic, as the issue does not have
anything to do with "new semantic patches" discussion, but I thought
it would be obvious to everybody who is reading the thread---it
seems I thought wrong ;-)
Stefan Beller Oct. 23, 2018, 5:30 p.m. UTC | #8
On Tue, Oct 23, 2018 at 2:38 AM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > I actually think this round does a far nicer job playing well with
> > other topics than any earlier series.  The pain you are observing I
> > think come primarily from my not making the best use of these
> > patches.
> >
> > Steppng back a bit, I'd imagine in an ideal world where "make
> > coccicheck" can be done instantaneously _and_ the spatch machinery
> > is just as reliable as C compilers....
> >
> > What I _could_ do (and what I did do only for one round of pushing
> > out 'pu') is to queue a coccinelle transformation at the tip of
> > integration branches.  If "make coccicheck" ran in subsecond, I
> > could even automate it in the script that is used to rebuild 'pu'
> > every day, ...
>
> Anyway, even though "make coccicheck" does not run in subsecond,
> I've updated my machinery to rebuild the integration branches so
> that I can optionally queue generated coccicheck patches, and what I
> pushed out tonight has one at the tip of 'pu' and also another at
> the tip of 'next'.  The latter seems to be passing all archs and
> executing Windows run.

That is pretty exciting!

Looking at the commit in next, you also included the suggestion
from [1] to use a postincrement instead of a preincrement and I got
excited to see how we express such a thing in coccinelle,
but it turns out that it slipped in unrelated to the coccinelle patches.

How would we go from here?
It is not obvious to me how such changes would be integrated,
as regenerating them on top of pu will not help getting these changes
merged down, and applying the semantic patch on next (once
sb/more-repo-in-api lands in next) would created the merge conflicts for
all topics that are merged to next after that series.

[1] https://public-inbox.org/git/xmqqin1wyxvz.fsf@gitster-ct.c.googlers.com/
Junio C Hamano Oct. 24, 2018, 1:22 a.m. UTC | #9
Stefan Beller <sbeller@google.com> writes:

>> Anyway, even though "make coccicheck" does not run in subsecond,
>> I've updated my machinery to rebuild the integration branches so
>> that I can optionally queue generated coccicheck patches, and what I
>> pushed out tonight has one at the tip of 'pu' and also another at
>> the tip of 'next'.  The latter seems to be passing all archs and
>> executing Windows run.
>
> That is pretty exciting!
>
> Looking at the commit in next, you also included the suggestion
> from [1] to use a postincrement instead of a preincrement and I got
> excited to see how we express such a thing in coccinelle,
> but it turns out that it slipped in unrelated to the coccinelle patches.

See below, which was sitting in my working tree.

> How would we go from here?
> It is not obvious to me how such changes would be integrated,
> as regenerating them on top of pu will not help getting these changes
> merged down, and applying the semantic patch on next (once
> sb/more-repo-in-api lands in next) would created the merge conflicts for
> all topics that are merged to next after that series.

Conflicts with later topics is indeed worrysome.  That is why I did
it as an experiment.  If it becomes too painful, I'd probably stop
doing it while merging to anything other than 'pu', and then we can
follow the more distributed approach along the lines of what Szeder
suggested, to see how smoothly it goes.

-- >8 --
Subject: [PATCH] cocci: simplify "if (++u > 1)" to "if (u++)"

It is more common to use post-increment than pre-increment when the
side effect is the primary thing we want in our code and in C in
general (unlike C++).

Initializing a variable to 0, incrementing it every time we do
something, and checking if we have already done that thing to guard
the code to do that thing, is easier to understand when written

	if (u++)
		; /* we've done that! */
	else
		do_it(); /* just once. */

but if you try to use pre-increment, you end up with a less natural
looking

	if (++u > 1)

Signed-off-by: Junio C Hamano <gitster@pobox.com>
---
 contrib/coccinelle/preincr.cocci | 5 +++++
 1 file changed, 5 insertions(+)
 create mode 100644 contrib/coccinelle/preincr.cocci

diff --git a/contrib/coccinelle/preincr.cocci b/contrib/coccinelle/preincr.cocci
new file mode 100644
index 0000000000..7fe1e8d2d9
--- /dev/null
+++ b/contrib/coccinelle/preincr.cocci
@@ -0,0 +1,5 @@
+@ preincrement @
+identifier i;
+@@
+- ++i > 1
++ i++
SZEDER Gábor Oct. 25, 2018, 1:59 a.m. UTC | #10
On Mon, Oct 22, 2018 at 11:54:06AM -0700, Stefan Beller wrote:

> For the sake of a good history, I would think running 'make coccicheck'
> and applying the resulting patches would be best as part of the (dirty)
> merge of any topic that proposes new semantic patches, but that would
> add load to Junio as it would be an extra step during the merge.
> 
> One could argue that the step of applying such transformations into
> the dirty merge is cheaper than resolving merge conflicts that are
> had when the topic includes the transformation.

Please consider that merge commits' have uglier diffs than regular
commits, and that merge commits cause additional complications when
'git bisect' points the finger at them, both of which are exacerbated
by additional changes squeezed into evil merges.

> >     Consequently, 'make coccicheck' won't run clean and the
> >     static analysis build job will fail until all those topics reach
> >     'master', and the remaining transformations are applied on top.
> >
> >     This was (and still is!) an issue with the hasheq()/oideq() series
> >     as well: that series was added on 2018-08-28, and the static
> >     analysis build job is red on 'pu' ever since.  See the follow-up
> >     patch e43d2dcce1 (more oideq/hasheq conversions, 2018-10-02), and
> >     one more follow-up will be necessary after the builtin stash topic
> >     is merged to 'master'.
> 
> In my understanding this follow up is a feature, as it helps to avoid
> merge conflicts with other topics in flight.

I don't see how such a follow up patch helps to avoid merge conflicts.

There were topics that branched off before the introduction of oideq()
into 'master', therefore they couldn't make use of this new function
until they were merged to 'master' as well, so they added their own
!oidcmp() calls.  That follow up patch was necessary to transform
these new !oidcmp() calls after those topics reached 'master'.  Merge
conflicts had nothing to do with it.

So this follow up patch is not a feature, but rather an inherent
consequence of the project's branching model, with lots of parallel
running topics branching off at different points and progressing at
different speeds.

> >     This makes it harder to review other patch series.
> 
> as 'make coccicheck' is an integral part of your review?

Erm, right, "review" was not the right word here.  Anyway, as it is,
'make coccicheck' is an integral part of our automated tests, not only
on Travis CI but on the upcoming Azure thing as well.  I just try to
pay attention to its results and the results of a bunch of my
additional builds, and complain or even send a fix when something goes
reproducibly wrong.  This has certainly became more cumbersome with
the permanently failing static analysis build job in the last couple
of weeks.

> > How about introducing the concept of "pending" semantic patches,
> > stored in 'contrib/coccinelle/<name>.pending.cocci' files, modifying
> > 'make coccicheck' to skip them, and adding the new 'make
> > coccicheck-pending' target to make it convenient to apply them, e.g.
> > something like the simple patch at the end.
> >
> > So the process would go something like this:
> >
> >   - A new semantic patch should be added as "pending", e.g. to the
> >     file 'the_repository.pending.cocci', together with the resulting
> >     transformations in the same commit.
> >
> >     This way neither 'make coccicheck' nor the static analysis build
> >     job would complain in the topic branch or in the two integration
> >     branches.  And if they do complain, then we would know right away
> >     that they complain because of a well-established semantic patch.
> >     Yet, anyone interested could run 'make coccicheck-pending' to see
> >     where are we heading.
> >
> >   - The author of the "pending" semanting patch should then keep an
> >     eye on already cooking topics: whether any of them contain new
> >     code that should be transformed, and how they progress to
> >     'master', and sending followup patch(es) with the remaining
> >     transformations when applicable.
> >
> >     Futhermore, the author should also pay attention to any new topics
> >     that branch off after the "pending" semantic patch, and whether
> >     any of them introduce code to be transformed, warning their
> >     authors as necessary.
> >
> >   - Finally, after all the dust settled, the dev should follow up with
> >     a patch to:
> >
> >       - promote the "penging" patch to '<name>.cocci', if its purpose
> >         is to avoid undesirable code patterns in the future, or
> >
> >       - remove the semantic patch, if it was used in a one-off
> >         transformation.
> >
> > Thoughts?
> 
> I like the approach of having separate classes of semantic patches:
> (a) the regular "we need to keep checking these" as they address
>     undesirable code patterns, which is what we currently have,
>     and what 'make coccicheck' would complain about.
> (b) The pending patches as you propose. However I would
>     argue that we'd not want to include the transformation into
>     the same patch as then the patch will have merge conflicts.

Since we have a lot of parallel running topics, merge conflicts are
basically unavoidable anyway.  If the conflicts from the
transformation are really that severe, then perhaps the whole series
should be postponed to a calmer, more suitable time.

In the case of 'the_repository.cocci', merging its transformations
into 'pu' resulted in only four conflicts, and I found all four on the
easy side to resolve.  I don't think it's worth waiting with the
transformations in this particular case.

>     Ideally we'd have an automated process/bot that would apply
>     all pending semantic patches onto master and then checks for
>     conflicts in HEAD..pu, and only sends off the non-conflicting
>     diffs as a topic.

New semantic patches didn't pop up all that frequently in the past, so
I'm not sure it's worth investing in such an automation.  Of course
they can become more frequent in the future, and in that case we might
want to reconsider it.  Unfortunately, however, Coccinelle's results
can't be completely trusted, either because our semantic patches or
because Coccinelle itself are buggy...

>     Then after a couple integration cycles we'd have all pending
>     changes in, with no conflicts on Junios side.
> 
> So I think we should add a patch like you post, but we would
> need to discuss the exact approach how to deal with pending
> patches. Is it the original dev who should push forward on their
> own pending patches, or does it become a pooled effort?

Well, it makes sense to me that whoever proposes a change with an
accompanying new semantic patch should also deal with the necessary
followups.  However, it doesn't really matter who deals with them, as
long as somebody deals with them.  I don't think it's much different
from e.g. sending a followup bugfix to someone else's patch series.
Jeff King Oct. 25, 2018, 5:39 a.m. UTC | #11
On Mon, Oct 22, 2018 at 07:39:35PM +0200, SZEDER Gábor wrote:

> I don't really like how this or the previous RFC patch series deal
> with semantic patches (or how some past patch series dealt with them,
> for that matter), for various reasons:
> [..]

I am a little late to this thread, but it really seems to me that the
crux of the issue here:

>   - 'make coccicheck' won't run clean (and the static analysis build
>     job on Travis CI will fail) for all commits following adding the
>     new semantic patches but before applying the resulting
>     transformations.
> 
>   - These semantic patches interact badly with 'pu' and 'next',
>     because those integration branches can contain topic branches
>     adding new code that should be transformed by these semanic
>     patches.  Consequently, 'make coccicheck' won't run clean and the
>     static analysis build job will fail until all those topics reach
>     'master', and the remaining transformations are applied on top.

Is that we are considering it a failure for "pu" to have pending
coccinelle patches. Are there any coccicheck transformations that aren't
just "we prefer to do it like X instead of Y"?

Changes that actually break things should be caught by the compiler
(either by their nature, or because we take care to rename or change
interfaces when making a semantic change to a function).

I think that's getting at what you're saying here:

>   - Is it really necessary to carry these semantic patches in-tree?
>     Let me ellaborate.  There are basically two main use cases for
>     semantic patches:
> 
>       - To avoid undesirable code patterns, e.g. we should not use
> [...]
>       - To perform one-off code transformations, e.g. to modify a

I am mostly thinking of your first type here. And I think it makes sense
for people to avoid submitting code that does not pass "make coccicheck"
_as it was at the state of their branch_. But for cocci changes that are
in flight at the same time as their branch, I do not see any need to
strictly adhere to them. The code is "undesirable", not "wrong".

For the second type, I agree that we do not necessarily need to carry
them in-tree. Eventually the transformation happens, and nobody would
use the "old" way because it doesn't work anymore. Problem solved.

I do not mind carrying them for a while as a convenience to people who
will need to fix up their topics in flight (or more likely to the
maintainer, who will need to fixup the merge). But we should make sure
not to forget to remove them when their usefulness has passed.

Likewise, we should not forget to occasionally run "make coccicheck" on
master to make sure people have a clean base to build on. If Junio is
able to do that, then great. But other people can also do so and submit
patches (to go on their respective topics, or to be a new mass-cleanup
topic).

I guess there is some lag there if Junio pushes out a master branch that
fails coccicheck, because contributors may build on it before somebody
gets around to fixing it.

>         Having said that, it's certainly easier to double-check the
>         resulting transformations when one can apply the semantic
>         patches locally, and doing so is easier when the semantic
>         patches are in tree than when they must be copy-pasted from a
>         commit message.

I've wondered if we could have a script that pulls a coccinelle snippet
from a commit message and runs it. It may be a hassle to find the right
commit, though (you'd start the procedure from "oops, my compile now
seems broken; what was the change that I need to apply to adapt?").

>   - A new semantic patch should be added as "pending", e.g. to the
>     file 'the_repository.pending.cocci', together with the resulting
>     transformations in the same commit.
> 
>     This way neither 'make coccicheck' nor the static analysis build
>     job would complain in the topic branch or in the two integration
>     branches.  And if they do complain, then we would know right away
>     that they complain because of a well-established semantic patch.
>     Yet, anyone interested could run 'make coccicheck-pending' to see
>     where are we heading.

OK, makes sense.

>   - The author of the "pending" semanting patch should then keep an
>     eye on already cooking topics: whether any of them contain new
>     code that should be transformed, and how they progress to
>     'master', and sending followup patch(es) with the remaining
>     transformations when applicable.
>     
>     Futhermore, the author should also pay attention to any new topics
>     that branch off after the "pending" semantic patch, and whether
>     any of them introduce code to be transformed, warning their
>     authors as necessary.

This part seems tricky, though. There's a race condition between
promoting the patch from pending to not-pending and other topics
branching off. And remember that we do not always see other people's
branches, which they may work on in private for a long time (though I
suppose "when Junio applies it" is one effective cutoff).

>   - Finally, after all the dust settled, the dev should follow up with
>     a patch to:
>     
>       - promote the "penging" patch to '<name>.cocci', if its purpose
>         is to avoid undesirable code patterns in the future, or
>     
>       - remove the semantic patch, if it was used in a one-off
>         transformation.

I'm not wild about leaving these loose ends that need to be tied up
manually. It adds a lot of friction and effort.

-Peff
Stefan Beller Oct. 25, 2018, 7:25 p.m. UTC | #12
On Wed, Oct 24, 2018 at 6:59 PM SZEDER Gábor <szeder.dev@gmail.com> wrote:
>
> On Mon, Oct 22, 2018 at 11:54:06AM -0700, Stefan Beller wrote:
>
> > For the sake of a good history, I would think running 'make coccicheck'
> > and applying the resulting patches would be best as part of the (dirty)
> > merge of any topic that proposes new semantic patches, but that would
> > add load to Junio as it would be an extra step during the merge.
> >
> > One could argue that the step of applying such transformations into
> > the dirty merge is cheaper than resolving merge conflicts that are
> > had when the topic includes the transformation.
>
> Please consider that merge commits' have uglier diffs than regular
> commits, and that merge commits cause additional complications when
> 'git bisect' points the finger at them, both of which are exacerbated
> by additional changes squeezed into evil merges.
>
> > >     Consequently, 'make coccicheck' won't run clean and the
> > >     static analysis build job will fail until all those topics reach
> > >     'master', and the remaining transformations are applied on top.
> > >
> > >     This was (and still is!) an issue with the hasheq()/oideq() series
> > >     as well: that series was added on 2018-08-28, and the static
> > >     analysis build job is red on 'pu' ever since.  See the follow-up
> > >     patch e43d2dcce1 (more oideq/hasheq conversions, 2018-10-02), and
> > >     one more follow-up will be necessary after the builtin stash topic
> > >     is merged to 'master'.
> >
> > In my understanding this follow up is a feature, as it helps to avoid
> > merge conflicts with other topics in flight.
>
> I don't see how such a follow up patch helps to avoid merge conflicts.

Well, you merge first (the new topic and the cocci patches), and then
do the transformation. But that is putting a lot more work on Junio
as the way to integrate is not just merge a new topic into the pile of
topics (whether it is pu/next/master), but to first perform a merge
of the topic and the coccinelle patches, apply the transformation
and then merge to the pile, assuming the pile is already transformed
(as it happened in "treewide: apply cocci patch" in next/pu).

> > as 'make coccicheck' is an integral part of your review?
>
> Erm, right, "review" was not the right word here.  Anyway, as it is,
> 'make coccicheck' is an integral part of our automated tests, not only
> on Travis CI but on the upcoming Azure thing as well.  I just try to
> pay attention to its results and the results of a bunch of my
> additional builds, and complain or even send a fix when something goes
> reproducibly wrong.  This has certainly became more cumbersome with
> the permanently failing static analysis build job in the last couple
> of weeks.

I seem to not pay as much attention as I should to these,
mostly because they are unreliable  on the aggregate level (a failure
there most likely means another topic than the one I am interested
broke; except in this case, where we discuss the fallout there via
this topic.)

> > I like the approach of having separate classes of semantic patches:
> > (a) the regular "we need to keep checking these" as they address
> >     undesirable code patterns, which is what we currently have,
> >     and what 'make coccicheck' would complain about.
> > (b) The pending patches as you propose. However I would
> >     argue that we'd not want to include the transformation into
> >     the same patch as then the patch will have merge conflicts.
>
> Since we have a lot of parallel running topics, merge conflicts are
> basically unavoidable anyway.  If the conflicts from the
> transformation are really that severe, then perhaps the whole series
> should be postponed to a calmer, more suitable time.

Merge conflicts of this kind could be avoided, by running the
transformation on both sides before merging (or not running them
on both sides, but only after merging).

So maybe for these larger 'the_repository.pending.cocci' patches
we could get them into the tree and wait until all (most) of the topics
are including that semantic patch in their base, such that application
of the patch is easy whether before or after writing a series
(as the semantic patch is in its base).

Another short term plan would be renaming the_repository.cocci
such that it would break the 'make coccicheck'

> In the case of 'the_repository.cocci', merging its transformations
> into 'pu' resulted in only four conflicts, and I found all four on the
> easy side to resolve.  I don't think it's worth waiting with the
> transformations in this particular case.

I am not worried about the current conflicts, but about those to come
in new series.

>
> >     Ideally we'd have an automated process/bot that would apply
> >     all pending semantic patches onto master and then checks for
> >     conflicts in HEAD..pu, and only sends off the non-conflicting
> >     diffs as a topic.
>
> New semantic patches didn't pop up all that frequently in the past, so
> I'm not sure it's worth investing in such an automation.  Of course
> they can become more frequent in the future, and in that case we might
> want to reconsider it.  Unfortunately, however, Coccinelle's results
> can't be completely trusted, either because our semantic patches or
> because Coccinelle itself are buggy...

For the first, we could just be become better at reviewing the Cocci
patch. ;-) (Well, they are harder to review than usual patches, so
this doesn't surprise me. Also not fully understanding the whole
tool may have impact on reviewability)

>
> >     Then after a couple integration cycles we'd have all pending
> >     changes in, with no conflicts on Junios side.
> >
> > So I think we should add a patch like you post, but we would
> > need to discuss the exact approach how to deal with pending
> > patches. Is it the original dev who should push forward on their
> > own pending patches, or does it become a pooled effort?
>
> Well, it makes sense to me that whoever proposes a change with an
> accompanying new semantic patch should also deal with the necessary
> followups.  However, it doesn't really matter who deals with them, as
> long as somebody deals with them.  I don't think it's much different
> from e.g. sending a followup bugfix to someone else's patch series.

Ok, I plan on resending these patches after I get
origin/sb/submodule-recursive-fetch-gets-the-tip going
resent.

Stefan
diff mbox series

Patch

diff --git a/Makefile b/Makefile
index d18ab0fe78..67a4441f39 100644
--- a/Makefile
+++ b/Makefile
@@ -2728,9 +2728,11 @@  endif
 	then \
 		echo '    ' SPATCH result: $@; \
 	fi
-coccicheck: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.cocci))
+coccicheck: $(addsuffix .patch,$(filter-out %.pending.cocci,$(wildcard contrib/coccinelle/*.cocci)))
 
-.PHONY: coccicheck
+coccicheck-pending: $(addsuffix .patch,$(wildcard contrib/coccinelle/*.pending.cocci))
+
+.PHONY: coccicheck coccicheck-pending
 
 ### Installation rules