
xen/lib: Fix strcmp() and strncmp()

Message ID 20210727184715.17699-1-jane.malalane@citrix.com
State New, archived
Series xen/lib: Fix strcmp() and strncmp()

Commit Message

Jane Malalane July 27, 2021, 6:47 p.m. UTC
The C standard requires that each character be compared as unsigned
char. Xen's current behaviour compares as signed char, which changes
the answer when chars with a value greater than 0x7f are used.

Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
---
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: George Dunlap <george.dunlap@citrix.com>
CC: Ian Jackson <iwj@xenproject.org>
CC: Jan Beulich <jbeulich@suse.com>
CC: Julien Grall <julien@xen.org>
CC: Stefano Stabellini <sstabellini@kernel.org>
CC: Wei Liu <wl@xen.org>
---
 xen/lib/strcmp.c  | 8 +++++---
 xen/lib/strncmp.c | 8 +++++---
 2 files changed, 10 insertions(+), 6 deletions(-)

Comments

Ian Jackson July 28, 2021, 10:42 a.m. UTC | #1
Jane Malalane writes ("[PATCH] xen/lib: Fix strcmp() and strncmp()"):
> The C standard requires that each character be compared as unsigned
> char. Xen's current behaviour compares as signed char, which changes
> the answer when chars with a value greater than 0x7f are used.
> 
> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
> Signed-off-by: Jane Malalane <jane.malalane@citrix.com>

Thanks for this.

What are the practical effects of this bug ?  AFAICT in the hypervisor
code all the call sites simply test for zero/nonzero.

Of course we should fix this because

> -		if ((__res = *cs - *ct++) != 0 || !*cs++)

this subtraction is UB if it overflows.  So in theory the compiler
could miscompile it - although in practice I can't see how the
assumption that this doesn't overflow would "help" the compiler.

Ian.
Jane Malalane July 30, 2021, 6:52 a.m. UTC | #2
On 28/07/2021 11:42, Ian Jackson wrote:
>
> Jane Malalane writes ("[PATCH] xen/lib: Fix strcmp() and strncmp()"):
>> The C standard requires that each character be compared as unsigned
>> char. Xen's current behaviour compares as signed char, which changes
>> the answer when chars with a value greater than 0x7f are used.
>>
>> Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
>> Signed-off-by: Jane Malalane <jane.malalane@citrix.com>
> Thanks for this.
>
> What are the practical effects of this bug ?  AFAICT in the hypervisor
> code all the call sites simply test for zero/nonzero.
>
> Of course we should fix this because
>
>> -		if ((__res = *cs - *ct++) != 0 || !*cs++)
> this subtraction is UB if it overflows.  So in theory the compiler
> could miscompile it - although in practice I can't see how the
> assumption that this doesn't overflow would "help" the compiler.
>
> Ian.

This fix was just to make the code spec compliant, and mainly for
practice, as I'm currently being introduced to Xen.

Jane
Ian Jackson July 30, 2021, 9:50 a.m. UTC | #3
Jane Malalane writes ("Re: [PATCH] xen/lib: Fix strcmp() and strncmp()"):
> On 28/07/2021 11:42, Ian Jackson wrote:
>     What are the practical effects of this bug ?  AFAICT in the hypervisor
>     code all the call sites simply test for zero/nonzero.
...
>     This fix was just to make the code spec compliant and mainly for practice as I'm currently being introduced to Xen.

OK, great.  As I say it looks correct to me.  I just wanted to make
sure I wasn't missing anything.

So,

Reviewed-by: Ian Jackson <iwj@xenproject.org>

and I will queue this.

Ian.
Andrew Cooper July 30, 2021, 10:12 a.m. UTC | #4
On 27/07/2021 19:47, Jane Malalane wrote:
> diff --git a/xen/lib/strcmp.c b/xen/lib/strcmp.c
> index 465f1c4191..f85c1e8741 100644
> --- a/xen/lib/strcmp.c
> +++ b/xen/lib/strcmp.c
> @@ -11,14 +11,16 @@
>   */
>  int (strcmp)(const char *cs, const char *ct)
>  {
> -	register signed char __res;
> +	unsigned char *csu = (unsigned char *)cs;
> +	unsigned char *ctu = (unsigned char *)ct;

So there was actually one final thing, but it is holiday season, hence
the lack of replies from others.

We should not be casting away const-ness on the pointers, because that
is undefined behaviour and compilers are starting to warn about it. 
Therefore, we want something like:

const unsigned char *csu = (const unsigned char *)cs;

~Andrew
Ian Jackson July 30, 2021, 12:29 p.m. UTC | #5
Andrew Cooper writes ("Re: [PATCH] xen/lib: Fix strcmp() and strncmp()"):
> On 27/07/2021 19:47, Jane Malalane wrote:
> > -	register signed char __res;
> > +	unsigned char *csu = (unsigned char *)cs;
> > +	unsigned char *ctu = (unsigned char *)ct;
> 
> So there was actually one final thing, but it is holiday season, hence
> the lack of replies from others.

Oh.

> We should not be casting away const-ness on the pointers, because that
> is undefined behaviour and compilers are starting to warn about it. 

I don't think casting away const is UB.  Perhaps you (and perhaps
others) are seeing this in 6.3.2.3(2):

 | For any qualifier q, a pointer to a non-q-qualified type may be
 | converted to a pointer to the q-qualified version of the type; the
 | values stored in the original and converted pointers shall compare
 | equal.

This does indeed define the meaning of *adding* qualifiers to a
pointer type, but does not define the meaning of removing them.  But that
whole paragraph is almost redundant, because in 6.3.2.3(7):

 | A pointer to an object or incomplete type may be converted to a
 | pointer to a different object or incomplete type. If the resulting
 | pointer is not correctly aligned57) for the pointed-to type, the
 | behavior is undefined. Otherwise, when converted back again, the
 | result shall compare equal to the original pointer.

This defines the meaning of conversions of pointers to object types
(like char*) regardless of the qualifiers.

I read that as "a pointer to an object type or to an incomplete type".
But the precise reading doesn't matter because these pointers are
actually to objects.

There's also this in 6.7.3(5):

 | If an attempt is made to modify an object defined with a
 | const-qualified type through use of an lvalue with
 | non-const-qualified type, the behavior is undefined.

But there is no attempt to modify.  (Also this paragraph doesn't apply
because characters in string literals have type char, not type const
char, but 6.4.5(6) directly prohibits modification of characters in
string literals.)

6.2.7(2) says

 | All declarations that refer to the same object or function shall
 | have compatible type; otherwise, the behavior is undefined.

but I don't think these pointer variables are declarations of the
chars pointed to.

> Therefore, we want something like:
> 
> const unsigned char *csu = (const unsigned char *)cs;

Having said all this, I agree that not casting away const would
be better (especially if it generates compiler warnings).

I pushed it already.  If this is UB we should revert it, but as I say
I think it isn't, so we can wait for a followup.

Thanks,
Ian.

Patch

diff --git a/xen/lib/strcmp.c b/xen/lib/strcmp.c
index 465f1c4191..f85c1e8741 100644
--- a/xen/lib/strcmp.c
+++ b/xen/lib/strcmp.c
@@ -11,14 +11,16 @@ 
  */
 int (strcmp)(const char *cs, const char *ct)
 {
-	register signed char __res;
+	unsigned char *csu = (unsigned char *)cs;
+	unsigned char *ctu = (unsigned char *)ct;
+	int res;
 
 	while (1) {
-		if ((__res = *cs - *ct++) != 0 || !*cs++)
+		if ((res = *csu - *ctu++) != 0 || !*csu++)
 			break;
 	}
 
-	return __res;
+	return res;
 }
 
 /*
diff --git a/xen/lib/strncmp.c b/xen/lib/strncmp.c
index 9af7fa1c99..1480f58c2e 100644
--- a/xen/lib/strncmp.c
+++ b/xen/lib/strncmp.c
@@ -12,15 +12,17 @@ 
  */
 int (strncmp)(const char *cs, const char *ct, size_t count)
 {
-	register signed char __res = 0;
+	unsigned char *csu = (unsigned char *)cs;
+	unsigned char *ctu = (unsigned char *)ct;
+	int res = 0;
 
 	while (count) {
-		if ((__res = *cs - *ct++) != 0 || !*cs++)
+		if ((res = *csu - *ctu++) != 0 || !*csu++)
 			break;
 		count--;
 	}
 
-	return __res;
+	return res;
 }
 
 /*