blame.c: replace instance of !oidcmp for oideq

Message ID 20200907171639.766547-1-eantoranz@gmail.com (mailing list archive)
State Superseded
Commit 1302badd16ad36bc9441367b240e053130d15f7a
Series blame.c: replace instance of !oidcmp for oideq

Commit Message

Edmundo Carmona Antoranz Sept. 7, 2020, 5:16 p.m. UTC
---
 blame.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Edmundo Carmona Antoranz Sept. 7, 2020, 5:21 p.m. UTC | #1
On Mon, Sep 7, 2020 at 11:16 AM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
Blamed the wrong branch. I should have looped Derrick instead of Jeff.
Sorry about that.
Edmundo Carmona Antoranz Sept. 8, 2020, 1:55 p.m. UTC | #2
On Mon, Sep 7, 2020 at 11:21 AM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
> Blamed the wrong branch. I should have looped Derrick instead of Jeff.
> Sorry about that.

I realized I didn't sign it off. Should I send it again, or is it OK
as is, given that it's almost a one-liner?
If I send it again, I will provide a little more context about the
!oidcmp calls having been replaced with oideq in previous versions.
Derrick Stolee Sept. 8, 2020, 7:07 p.m. UTC | #3
On 9/7/2020 1:16 PM, Edmundo Carmona Antoranz wrote:
> ---

Please include sign-off. I saw you reported your intention there
in another message, but it's probably best to just send it again.

This message could also mention 14438c4 (introduce hasheq() and
oideq(), 2018-08-28) which introduced oideq().

This use of !oidcmp() was introduced by 0906ac2b (blame: use
changed-path Bloom filters, 2020-04-16). My bad. There is no
good reason to introduce this use since it is well after the
oideq() method was introduced.

> @@ -1353,8 +1353,8 @@ static struct blame_origin *find_origin(struct repository *r,
>  	else {
>  		int compute_diff = 1;
>  		if (origin->commit->parents &&
> -		    !oidcmp(&parent->object.oid,
> -			    &origin->commit->parents->item->object.oid))
> +		    oideq(&parent->object.oid,
> +			  &origin->commit->parents->item->object.oid))
>  			compute_diff = maybe_changed_path(r, origin, bd);

The code itself looks correct.
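
To spell out why the replacement preserves behavior, here is a minimal sketch. The `sketch_` names and the fixed 32-byte size are assumptions made for illustration; they are not git's actual definitions, and the equality helper is written in terms of the comparison only to make the relationship obvious.

```c
#include <string.h>

/* Hypothetical stand-in for git's struct object_id (illustration only). */
#define SKETCH_RAWSZ 32

struct sketch_oid {
	unsigned char hash[SKETCH_RAWSZ];
};

/* Three-way comparison in the style of oidcmp(): <0, 0 or >0. */
int sketch_oidcmp(const struct sketch_oid *a, const struct sketch_oid *b)
{
	return memcmp(a->hash, b->hash, SKETCH_RAWSZ);
}

/* Equality test in the style of oideq(): non-zero when both IDs match. */
int sketch_oideq(const struct sketch_oid *a, const struct sketch_oid *b)
{
	return !sketch_oidcmp(a, b);
}
```

With these definitions, `!sketch_oidcmp(a, b)` and `sketch_oideq(a, b)` always agree, so the condition in find_origin() selects the same branch before and after the patch; the second spelling just states the intent directly.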

Thanks,
-Stolee
Jeff King Sept. 9, 2020, 9:11 a.m. UTC | #4
On Tue, Sep 08, 2020 at 03:07:34PM -0400, Derrick Stolee wrote:

> This message could also mention 14438c4 (introduce hasheq() and
> oideq(), 2018-08-28) which introduced oideq().
> 
> This use of !oidcmp() was introduced by 0906ac2b (blame: use
> changed-path Bloom filters, 2020-04-16). My bad. There is no
> good reason to introduce this use since it is well after the
> oideq() method was introduced.
> 
> > @@ -1353,8 +1353,8 @@ static struct blame_origin *find_origin(struct repository *r,
> >  	else {
> >  		int compute_diff = 1;
> >  		if (origin->commit->parents &&
> > -		    !oidcmp(&parent->object.oid,
> > -			    &origin->commit->parents->item->object.oid))
> > +		    oideq(&parent->object.oid,
> > +			  &origin->commit->parents->item->object.oid))
> >  			compute_diff = maybe_changed_path(r, origin, bd);
> 
> The code itself looks correct.

Yeah, it looks obviously correct. I am puzzled why "make coccicheck"
doesn't find this, though. +cc René, as my favorite target for
coccinelle nerd-snipes. :)

(But clearly we should make the change with or without figuring out the
coccinelle part).

-Peff
Edmundo Carmona Antoranz Sept. 9, 2020, 2 p.m. UTC | #5
On Wed, Sep 9, 2020 at 3:11 AM Jeff King <peff@peff.net> wrote:
>
> Yeah, it looks obviously correct. I am puzzled why "make coccicheck"
> doesn't find this, though. +cc René, as my favorite target for
> coccinelle nerd-snipes. :)
>

I added this to contrib/coccinelle/object_id.cocci in v2.27.0

@@
identifier f != oideq;
expression E1, E2;
@@
- !oidcmp(E1, E2)
+ oideq(E1, E2)

And it found it:

$ cat contrib/coccinelle/object_id.cocci.patch
diff -u -p a/blame.c b/blame.c
--- a/blame.c
+++ b/blame.c
@@ -1352,8 +1352,7 @@ static struct blame_origin *find_origin(
       else {
               int compute_diff = 1;
               if (origin->commit->parents &&
-                   !oidcmp(&parent->object.oid,
-                           &origin->commit->parents->item->object.oid))
+                   oideq(&parent->object.oid, &origin->commit->parents->item->object.oid))
                       compute_diff = maybe_changed_path(r, origin, bd);

               if (compute_diff)


Do I need to add more things to the coccinelle definition so that it
is more restrictive about the expression we are hunting down?
Jeff Smith Sept. 9, 2020, 5:48 p.m. UTC | #6
I haven't had a chance to look at the cocci script, but I did have one
thought...

As Derrick pointed out, 14438c4 added both oideq and hasheq.
It might be good to have a similar check for hasheq, if there is not
one already.

On Wed, Sep 9, 2020 at 9:01 AM Edmundo Carmona Antoranz
<eantoranz@gmail.com> wrote:
>
> On Wed, Sep 9, 2020 at 3:11 AM Jeff King <peff@peff.net> wrote:
> >
> > Yeah, it looks obviously correct. I am puzzled why "make coccicheck"
> > doesn't find this, though. +cc René, as my favorite target for
> > coccinelle nerd-snipes. :)
> >
>
> I added this to contrib/coccinelle/object_id.cocci in v2.27.0
>
> @@
> identifier f != oideq;
> expression E1, E2;
> @@
> - !oidcmp(E1, E2)
> + oideq(E1, E2)
>
> And it found it:
>
> $ cat contrib/coccinelle/object_id.cocci.patch
> diff -u -p a/blame.c b/blame.c
> --- a/blame.c
> +++ b/blame.c
> @@ -1352,8 +1352,7 @@ static struct blame_origin *find_origin(
>        else {
>                int compute_diff = 1;
>                if (origin->commit->parents &&
> -                   !oidcmp(&parent->object.oid,
> -                           &origin->commit->parents->item->object.oid))
> +                   oideq(&parent->object.oid, &origin->commit->parents->item->object.oid))
>                        compute_diff = maybe_changed_path(r, origin, bd);
>
>                if (compute_diff)
>
>
> Do I need to add more things into the coccinelle definition so that it
> is more restrictive in terms of the
> expression we are hunting down?
Jeff King Sept. 9, 2020, 7:13 p.m. UTC | #7
On Wed, Sep 09, 2020 at 08:00:57AM -0600, Edmundo Carmona Antoranz wrote:

> On Wed, Sep 9, 2020 at 3:11 AM Jeff King <peff@peff.net> wrote:
> >
> > Yeah, it looks obviously correct. I am puzzled why "make coccicheck"
> > doesn't find this, though. +cc René, as my favorite target for
> > coccinelle nerd-snipes. :)
> >
> 
> I added this to contrib/coccinelle/object_id.cocci in v2.27.0
> 
> @@
> identifier f != oideq;
> expression E1, E2;
> @@
> - !oidcmp(E1, E2)
> + oideq(E1, E2)
> 
> And it found it:

Interesting. The existing rule is:

  struct object_id *OIDPTR1;
  struct object_id *OIDPTR2;
  @@
  - oidcmp(OIDPTR1, OIDPTR2) == 0
  + oideq(OIDPTR1, OIDPTR2)

The "== 0" part looks like it might be significant, but it's not.
Coccinelle knows that "!foo" is the same as "foo == 0" (and you can
confirm by tweaking it).

The addition of "identifier f != oideq" here isn't necessary (we don't
even define an "f" in the semantic patch part). And anyway, we use
hasheq() inside oideq(), so no need to override the rule there.

So the relevant part is probably that our existing rule specifies the
exact type, whereas your rule allows any expression.

And indeed, if I do this, it works:

diff --git a/contrib/coccinelle/object_id.cocci b/contrib/coccinelle/object_id.cocci
index ddf4f22bd7..62a6cee0eb 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -55,8 +55,8 @@ struct object_id OID;
 + oidcmp(&OID, OIDPTR)
 
 @@
-struct object_id *OIDPTR1;
-struct object_id *OIDPTR2;
+expression OIDPTR1;
+expression OIDPTR2;
 @@
 - oidcmp(OIDPTR1, OIDPTR2) == 0
 + oideq(OIDPTR1, OIDPTR2)

Which really _seems_ like a bug in coccinelle, unless I am missing
something. Because both of those parameters look like object_id pointers
(and the compiler would be complaining if it were not the case).  But I
also wonder if giving the specific types in the coccinelle rule is
buying us anything. If you passed two void pointers or ints or whatever
to !oidcmp(), we'd still want to rewrite it as oideq().

-Peff
Jeff King Sept. 9, 2020, 7:17 p.m. UTC | #8
On Wed, Sep 09, 2020 at 03:13:46PM -0400, Jeff King wrote:

> Which really _seems_ like a bug in coccinelle, unless I am missing
> something. Because both of those parameters look like object_id pointers
> (and the compiler would be complaining if it were not the case).  But I
> also wonder if giving the specific types in the coccinelle rule is
> buying us anything. If you passed two void pointers or ints or whatever
> to !oidcmp(), we'd still want to rewrite it as oideq().

And indeed, just blindly swapping out "struct object_id" for
"expression" in the coccinelle file (patch below), shows another spot
that was missed:

diff -u -p a/packfile.c b/packfile.c
--- a/packfile.c
+++ b/packfile.c
@@ -735,7 +735,7 @@ struct packed_git *add_packed_git(const
 	p->mtime = st.st_mtime;
 	if (path_len < the_hash_algo->hexsz ||
 	    get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->hash))
-		hashclr(p->hash);
+		oidclr(p);
 	return p;
 }
 

Maybe it's worth being looser in our cocci patch definitions. I'm having
trouble thinking of a downside...

-Peff

-- >8 --
Here's the patch to loosen object_id.cocci. Perhaps we'd want to do the
same in other files.

diff --git a/contrib/coccinelle/object_id.cocci b/contrib/coccinelle/object_id.cocci
index ddf4f22bd7..738c60923e 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -1,62 +1,62 @@
 @@
-struct object_id OID;
+expression OID;
 @@
 - is_null_sha1(OID.hash)
 + is_null_oid(&OID)
 
 @@
-struct object_id *OIDPTR;
+expression *OIDPTR;
 @@
 - is_null_sha1(OIDPTR->hash)
 + is_null_oid(OIDPTR)
 
 @@
-struct object_id OID;
+expression OID;
 @@
 - hashclr(OID.hash)
 + oidclr(&OID)
 
 @@
 identifier f != oidclr;
-struct object_id *OIDPTR;
+expression *OIDPTR;
 @@
   f(...) {<...
 - hashclr(OIDPTR->hash)
 + oidclr(OIDPTR)
   ...>}
 
 @@
-struct object_id OID1, OID2;
+expression OID1, OID2;
 @@
 - hashcmp(OID1.hash, OID2.hash)
 + oidcmp(&OID1, &OID2)
 
 @@
 identifier f != oidcmp;
-struct object_id *OIDPTR1, OIDPTR2;
+expression *OIDPTR1, OIDPTR2;
 @@
   f(...) {<...
 - hashcmp(OIDPTR1->hash, OIDPTR2->hash)
 + oidcmp(OIDPTR1, OIDPTR2)
   ...>}
 
 @@
-struct object_id *OIDPTR;
-struct object_id OID;
+expression *OIDPTR;
+expression OID;
 @@
 - hashcmp(OIDPTR->hash, OID.hash)
 + oidcmp(OIDPTR, &OID)
 
 @@
-struct object_id *OIDPTR;
-struct object_id OID;
+expression *OIDPTR;
+expression OID;
 @@
 - hashcmp(OID.hash, OIDPTR->hash)
 + oidcmp(&OID, OIDPTR)
 
 @@
-struct object_id *OIDPTR1;
-struct object_id *OIDPTR2;
+expression OIDPTR1;
+expression OIDPTR2;
 @@
 - oidcmp(OIDPTR1, OIDPTR2) == 0
 + oideq(OIDPTR1, OIDPTR2)
@@ -71,8 +71,8 @@ expression E1, E2;
   ...>}
 
 @@
-struct object_id *OIDPTR1;
-struct object_id *OIDPTR2;
+expression *OIDPTR1;
+expression *OIDPTR2;
 @@
 - oidcmp(OIDPTR1, OIDPTR2) != 0
 + !oideq(OIDPTR1, OIDPTR2)
René Scharfe Sept. 9, 2020, 7:54 p.m. UTC | #9
Am 09.09.20 um 21:17 schrieb Jeff King:
> On Wed, Sep 09, 2020 at 03:13:46PM -0400, Jeff King wrote:
>
>> Which really _seems_ like a bug in coccinelle, unless I am missing
>> something. Because both of those parameters look like object_id pointers
>> (and the compiler would be complaining if it were not the case).  But I
>> also wonder if giving the specific types in the coccinelle rule is
>> buying us anything. If you passed two void pointers or ints or whatever
>> to !oidcmp(), we'd still want to rewrite it as oideq().

Right, using expressions for such a like-for-like transformation is safe
and practical in the sense that it won't break correct code, and broken
code will be flagged by the compiler.

>
> And indeed, just blindly swapping out "struct object_id" for
> "expression" in the coccinelle file (patch below), shows another spot
> that was missed:
>
> diff -u -p a/packfile.c b/packfile.c
> --- a/packfile.c
> +++ b/packfile.c
> @@ -735,7 +735,7 @@ struct packed_git *add_packed_git(const
>  	p->mtime = st.st_mtime;
>  	if (path_len < the_hash_algo->hexsz ||
>  	    get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->hash))
> -		hashclr(p->hash);
> +		oidclr(p);
>  	return p;
>  }
>
>
> Maybe it's worth being looser in our cocci patch definitions. I'm having
> trouble thinking of a downside...

For transformations that change the type as in the example above we
should insist on getting the right one, otherwise we might introduce
bugs -- like in the example above.  p points to a struct packed_git and
not to a struct object_id, so this introduces a type mismatch.
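
A minimal sketch of the mismatch, using hypothetical `demo_` stand-ins rather than git's real structs (the field layout and the 32-byte size are assumptions for illustration):

```c
#include <string.h>

#define DEMO_RAWSZ 32

/* Hypothetical stand-in for struct object_id. */
struct demo_oid {
	unsigned char hash[DEMO_RAWSZ];
};

/* Hypothetical stand-in for struct packed_git: it carries a raw hash
 * array, not an object ID struct. */
struct demo_pack {
	long mtime;
	unsigned char hash[DEMO_RAWSZ];
};

/* Clears a raw hash buffer, in the style of hashclr(). */
void demo_hashclr(unsigned char *hash)
{
	memset(hash, 0, DEMO_RAWSZ);
}

/* Clears an object ID, in the style of oidclr(). */
void demo_oidclr(struct demo_oid *oid)
{
	memset(oid->hash, 0, DEMO_RAWSZ);
}

void demo_reset_pack(struct demo_pack *p)
{
	/* Fine: p->hash is an unsigned char array. */
	demo_hashclr(p->hash);

	/*
	 * demo_oidclr(p) would pass a struct demo_pack * where a
	 * struct demo_oid * is expected. An untyped "expression"
	 * metavariable cannot tell the two apart, which is how the
	 * loosened rule ends up proposing the bad rewrite.
	 */
}
```

The rule matched only on the `->hash` member access, so any struct with a member of that name qualifies once the type constraint is dropped.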

We had better make sure our semantic patches are safe, otherwise we have to
check all conversions very carefully, and then we might be better off
doing them manually.

René
Jeff King Sept. 9, 2020, 7:58 p.m. UTC | #10
On Wed, Sep 09, 2020 at 09:54:55PM +0200, René Scharfe wrote:

> > diff -u -p a/packfile.c b/packfile.c
> > --- a/packfile.c
> > +++ b/packfile.c
> > @@ -735,7 +735,7 @@ struct packed_git *add_packed_git(const
> >  	p->mtime = st.st_mtime;
> >  	if (path_len < the_hash_algo->hexsz ||
> >  	    get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->hash))
> > -		hashclr(p->hash);
> > +		oidclr(p);
> >  	return p;
> >  }
> >
> >
> > Maybe it's worth being looser in our cocci patch definitions. I'm having
> > trouble thinking of a downside...
> 
> For transformations that change the type as in the example above we
> should insist on getting the right one, otherwise we might introduce
> bugs -- like in the example above.  p points to a struct packed_git and
> not to a struct object_id, so this introduces a type mismatch.

Heh. You'd think that I would have applied that patch and run "make". Or
even read it carefully.

Thanks for pointing that out. I guess now we have a real example of a
downside (the compiler _would_ still catch it, but it means "make
coccicheck" is useless if it's repeatedly suggesting a bad
transformation).

-Peff
Junio C Hamano Sept. 9, 2020, 8:03 p.m. UTC | #11
René Scharfe <l.s.r@web.de> writes:

>> diff -u -p a/packfile.c b/packfile.c
>> --- a/packfile.c
>> +++ b/packfile.c
>> @@ -735,7 +735,7 @@ struct packed_git *add_packed_git(const
>>  	p->mtime = st.st_mtime;
>>  	if (path_len < the_hash_algo->hexsz ||
>>  	    get_sha1_hex(path + path_len - the_hash_algo->hexsz, p->hash))
>> -		hashclr(p->hash);
>> +		oidclr(p);
>>  	return p;
>>  }
>>
>>
>> Maybe it's worth being looser in our cocci patch definitions. I'm having
>> trouble thinking of a downside...
>
> For transformations that change the type as in the example above we
> should insist on getting the right one, otherwise we might introduce
> bugs -- like in the example above.  p points to a struct packed_git and
> not to a struct object_id, so this introduces a type mismatch.

;-)  A good counter-example.

> We better make sure our semantic patches are safe, otherwise we have to
> check all conversions very carefully, and then we might be better off
> doing them manually..

Yes, that is a sensible suggestion.
Junio C Hamano Sept. 9, 2020, 8:06 p.m. UTC | #12
Jeff King <peff@peff.net> writes:

>  @@
> -struct object_id *OIDPTR1;
> -struct object_id *OIDPTR2;
> +expression OIDPTR1;
> +expression OIDPTR2;
>  @@
>  - oidcmp(OIDPTR1, OIDPTR2) == 0
>  + oideq(OIDPTR1, OIDPTR2)
> @@ -71,8 +71,8 @@ expression E1, E2;
>    ...>}
>  
>  @@
> -struct object_id *OIDPTR1;
> -struct object_id *OIDPTR2;
> +expression *OIDPTR1;
> +expression *OIDPTR2;
>  @@
>  - oidcmp(OIDPTR1, OIDPTR2) != 0
>  + !oideq(OIDPTR1, OIDPTR2)

With an extra insight from the counter-example René pointed out in
your message, I think the above two are safe but all the others are
unsafe.
René Scharfe Sept. 9, 2020, 8:43 p.m. UTC | #13
Am 09.09.20 um 21:13 schrieb Jeff King:
> On Wed, Sep 09, 2020 at 08:00:57AM -0600, Edmundo Carmona Antoranz wrote:
>
>> On Wed, Sep 9, 2020 at 3:11 AM Jeff King <peff@peff.net> wrote:
>>>
>>> Yeah, it looks obviously correct. I am puzzled why "make coccicheck"
>>> doesn't find this, though. +cc René, as my favorite target for
>>> coccinelle nerd-snipes. :)
>>>
>>
>> I added this to contrib/coccinelle/object_id.cocci in v2.27.0
>>
>> @@
>> identifier f != oideq;
>> expression E1, E2;
>> @@
>> - !oidcmp(E1, E2)
>> + oideq(E1, E2)
>>
>> And it found it:
>
> Interesting. The existing rule is:
>
>   struct object_id *OIDPTR1;
>   struct object_id *OIDPTR2;
>   @@
>   - oidcmp(OIDPTR1, OIDPTR2) == 0
>   + oideq(OIDPTR1, OIDPTR2)
>
> The "== 0" part looks like it might be significant, but it's not.
> Coccinelle knows that "!foo" is the same as "foo == 0" (and you can
> confirm by tweaking it).

It is significant in the sense that "x == 0" in the semantic patch also
matches "!x" in the code, but "!x" in the semantic patch doesn't match
"x == 0".  That's because coccinelle has this isomorphism built in
(in /usr/lib/coccinelle/standard.iso on my machine):

Expression
@ not_int1 @
int X;
@@
 !X => X == 0

It's a one-way isomorphism (i.e. a rule that says that certain
expressions have the same meaning).  So we should use "x == 0" over "!x"
in semantic patches to cover both cases.

> So the relevant part is probably that our existing rule specifies the
> exact type, whereas your rule allows any expression.
>
> And indeed, if I do this, it works:
>
> diff --git a/contrib/coccinelle/object_id.cocci b/contrib/coccinelle/object_id.cocci
> index ddf4f22bd7..62a6cee0eb 100644
> --- a/contrib/coccinelle/object_id.cocci
> +++ b/contrib/coccinelle/object_id.cocci
> @@ -55,8 +55,8 @@ struct object_id OID;
>  + oidcmp(&OID, OIDPTR)
>
>  @@
> -struct object_id *OIDPTR1;
> -struct object_id *OIDPTR2;
> +expression OIDPTR1;
> +expression OIDPTR2;
>  @@
>  - oidcmp(OIDPTR1, OIDPTR2) == 0
>  + oideq(OIDPTR1, OIDPTR2)
>
> Which really _seems_ like a bug in coccinelle, unless I am missing
> something. Because both of those parameters look like object_id pointers
> (and the compiler would be complaining if it were not the case).

Yes, it looks like coccinelle gives up trying to determine the
type of these things.

And while this one here matches the example in blame.c:

@@
expression A, B;
@@
- 0 == oidcmp(A, B)
+ oideq(A, B)

... and this one does as well:

@@
expression A, B;
@@
- !oidcmp
+ oideq
  (A, B)

... the following one doesn't:

@@
expression A, B;
@@
- 0 == oidcmp
+ oideq
  (A, B)

... and neither does this one:

@@
expression A, B;
@@
- oidcmp
+ oideq
  (A, B)
- == 0

So it helps to try some variants in the hope to bypass some of the
restrictions/bugs/misunderstandings. O_o
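
For reference, the spellings tried above differ only in how Coccinelle matches them; as C expressions they are equivalent. A minimal sketch, where `sketch_cmp()` is a hypothetical comparison helper standing in for oidcmp():

```c
#include <string.h>

/* Hypothetical three-way comparison standing in for oidcmp(). */
int sketch_cmp(const char *a, const char *b)
{
	return strcmp(a, b);
}

/* The three source spellings discussed in the thread. */
int eq_not(const char *a, const char *b)
{
	return !sketch_cmp(a, b);
}

int eq_rhs_zero(const char *a, const char *b)
{
	return sketch_cmp(a, b) == 0;
}

int eq_lhs_zero(const char *a, const char *b)
{
	return 0 == sketch_cmp(a, b);
}
```

All three functions return 1 for equal inputs and 0 otherwise, so a semantic patch that misses one spelling is a matching limitation, not a difference in meaning.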

René
Patch

diff --git a/blame.c b/blame.c
index 1be1cd82a2..b475bfa1c0 100644
--- a/blame.c
+++ b/blame.c
@@ -1353,8 +1353,8 @@  static struct blame_origin *find_origin(struct repository *r,
 	else {
 		int compute_diff = 1;
 		if (origin->commit->parents &&
-		    !oidcmp(&parent->object.oid,
-			    &origin->commit->parents->item->object.oid))
+		    oideq(&parent->object.oid,
+			  &origin->commit->parents->item->object.oid))
 			compute_diff = maybe_changed_path(r, origin, bd);
 
 		if (compute_diff)