diff mbox series

warn when zero-extending a negation

Message ID 20201216222448.2054-1-luc.vanoostenryck@gmail.com (mailing list archive)
State New
Headers show
Series warn when zero-extending a negation | expand

Commit Message

Luc Van Oostenryck Dec. 16, 2020, 10:24 p.m. UTC
When an unsigned value is negated (logical or arithmetical) and
then converted to a wider type, this value will be zero-extended,
not sign-extended. In other words, upper bits won't be negated.

This may be the intention but may also be a source of errors.

So, add a warning for this. Also, because this warning may be too
noise because most catches will possibly be false-positives, add a
specific warning flag to disable it: -Wno-zero-extend-negation.

Link: https://lore.kernel.org/r/CAHk-=wjiC6UejP6xob9BMQy98O6OLGDhy-qDfaFcOJxo90iOFg@mail.gmail.com
Originally-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
---

On my usual test setup (defconfig + allyesconfig) this gives 199 warnings.
I only checked a couple of them, they we're false positives but somehow
error-prone if some definitions are changed. For example:
* struct super_block::s_flags is defined as 'unsigned long', all flags
  are hold in 32-bits but struct fs_context::sb_flags_mask is defined
  as 'unsigned int'.
* struct inode::i_stat is defined as 'unsigned long', all I_* are defined
  as (signed) 'int' but some code do 'unsigned dirty = I_DIRTY;'

For the moment, I've left the warning enabled by default but it should
probably only be enabled at W=1.


@Linus,

I suppose that it is fine for you that I your SoB instead of the
'Originally-by' I used here?

-- Luc


 linearize.c | 25 +++++++++++++++++++++++++
 options.c   |  2 ++
 options.h   |  1 +
 sparse.1    |  8 ++++++++
 4 files changed, 36 insertions(+)

Comments

Linus Torvalds Dec. 16, 2020, 10:37 p.m. UTC | #1
On Wed, Dec 16, 2020 at 2:25 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> I suppose that it is fine for you that I your SoB instead of the
> 'Originally-by' I used here?

Either works for me.

Some of the cases I saw (from my very quick look) were because of
annoying zero extensions that should have been optimized away.

Example:

    static inline unsigned char bytemask(int bit)
    {
        return ~(1<<bit);
    }

    unsigned char maskout(unsigned char a, int bit)
    {
        return a & bytemask(bit);
    }

note how this is all safe, because everything is operating on just bytes.

But sparse complains:

    t.c:3:21: warning: zero-extending a negation - upper bits not negated

because obviously the usual C type expansion to 'int'.

But that really is because sparse is benign stupid, and leaves those
type expansions around even though they then get truncated down again.
You can see it in the code generation too:

Zero-extend, and then truncate:

  zext.32     %r2 <- (8) %arg1
  shl.32      %r5 <- $1, %arg2
  trunc.8     %r6 <- (32) %r5

then do the 'not' in 8 bits, because we did that part ok:

   not.8       %r7 <- %r6

and then the zero-extend and truncate thing again:

    zext.32     %r9 <- (8) %r7
    and.32      %r10 <- %r2, %r9
    trunc.8     %r11 <- (32) %r10

and then the return in 8 bits:

    ret.8       %r11

because sparse doesn't do the simplification to just do the shl and
and in 8 bits (but sparse *does* do the simplification to do the 'not'
in 8 bits).

So the warning comes from the fact that we kept that zero extend
around, even though it really wasn't relevant..

I don't know how many of the false positives were due to things like
this, but at least a couple were.

                Linus
Luc Van Oostenryck Dec. 16, 2020, 11:51 p.m. UTC | #2
On Wed, Dec 16, 2020 at 02:37:04PM -0800, Linus Torvalds wrote:
> On Wed, Dec 16, 2020 at 2:25 PM Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > I suppose that it is fine for you that I your SoB instead of the
> > 'Originally-by' I used here?
> 
> Either works for me.
> 
> Some of the cases I saw (from my very quick look) were because of
> annoying zero extensions that should have been optimized away.

...
 
> Zero-extend, and then truncate:
> 
>   zext.32     %r2 <- (8) %arg1
>   shl.32      %r5 <- $1, %arg2
>   trunc.8     %r6 <- (32) %r5
> 
> then do the 'not' in 8 bits, because we did that part ok:
> 
>    not.8       %r7 <- %r6
> 
> and then the zero-extend and truncate thing again:
> 
>     zext.32     %r9 <- (8) %r7
>     and.32      %r10 <- %r2, %r9
>     trunc.8     %r11 <- (32) %r10
> 
> and then the return in 8 bits:
> 
>     ret.8       %r11
> 
> because sparse doesn't do the simplification to just do the shl and
> and in 8 bits (but sparse *does* do the simplification to do the 'not'
> in 8 bits).

But replacing a trunc + zext by the corresponding masking, very
little, if anything is done for such 'mixed-width' expressions.
So, I'm even a bit surprised by the not.8 but well ... 

I also confess, that coming from an ARM background, seeing a
not.8 or a shl.8 seems quite unnatural to me. I would tend to
force everything to at least the same width as 'int'.

> So the warning comes from the fact that we kept that zero extend
> around, even though it really wasn't relevant..
> 
> I don't know how many of the false positives were due to things like
> this, but at least a couple were.

Yes, most probably.
I suppose my (old old) series about bitfield simplification will
help a little bit here. I'll try to look at this during the holidays.

-- Luc
Luc Van Oostenryck Dec. 17, 2020, 12:35 a.m. UTC | #3
On Thu, Dec 17, 2020 at 12:51:52AM +0100, Luc Van Oostenryck wrote:
> 
> But replacing a trunc + zext by the corresponding masking, very
> little, if anything is done for such 'mixed-width' expressions.
> So, I'm even a bit surprised by the not.8 but well ... 

This bothered me a bit and kept me awake, so I had to check.

I think that the situation is caused by some premature optimization
for the ~ operator in expression.c:cast_to(). It saves the allocation
and initialization of one expression but makes things more complicated
at linearization and simplification. If this is disabled, then the IR
simplification returns what I was expecting:
	maskout:
		zext.32     %r2 <- (8) %arg1
		shl.32      %r5 <- $1, %arg2
		not.32      %r6 <- %r5
		and.32      %r9 <- %r6, $255
		and.32      %r10 <- %r2, %r9
		trunc.8     %r11 <- (32) %r10
		ret.8       %r11

and some reassociation patches (coming soon) will simplify away
the masking with $255 and the trunc.8

-- Luc
diff mbox series

Patch

diff --git a/linearize.c b/linearize.c
index 0250c6bb17ef..b9faac78ebb7 100644
--- a/linearize.c
+++ b/linearize.c
@@ -2520,6 +2520,27 @@  static void check_tainted_insn(struct instruction *insn)
 	}
 }
 
+static void check_zero_extend(struct instruction *insn)
+{
+	struct instruction *def;
+	pseudo_t src = insn->src1;
+
+	if (!Wzero_extend_negation)
+		return;
+	if (src->type != PSEUDO_REG)
+		return;
+	def = src->def;
+	if (!def)
+		return;
+	switch (def->opcode) {
+	case OP_NEG: case OP_NOT:
+		warning(insn->pos, "zero-extending a negation - upper bits not negated");
+		break;
+	default:
+		break;
+	}
+}
+
 ///
 // issue warnings after all possible DCE
 static void late_warnings(struct entrypoint *ep)
@@ -2537,6 +2558,10 @@  static void late_warnings(struct entrypoint *ep)
 				// Check for illegal offsets.
 				check_access(insn);
 				break;
+			case OP_ZEXT:
+				// Check for missing sign extension..
+				check_zero_extend(insn);
+				break;
 			}
 		} END_FOR_EACH_PTR(insn);
 	} END_FOR_EACH_PTR(bb);
diff --git a/options.c b/options.c
index 17da5f367e24..5323ddc05861 100644
--- a/options.c
+++ b/options.c
@@ -139,6 +139,7 @@  int Wunion_cast = 0;
 int Wuniversal_initializer = 0;
 int Wunknown_attribute = 0;
 int Wvla = 1;
+int Wzero_extend_negation = 1;
 
 ////////////////////////////////////////////////////////////////////////////////
 // Helpers for option parsing
@@ -884,6 +885,7 @@  static const struct flag warnings[] = {
 	{ "universal-initializer", &Wuniversal_initializer },
 	{ "unknown-attribute", &Wunknown_attribute },
 	{ "vla", &Wvla },
+	{ "zero-extend-negation", &Wzero_extend_negation },
 	{ }
 };
 
diff --git a/options.h b/options.h
index 0aec8764d27d..3403c9518ead 100644
--- a/options.h
+++ b/options.h
@@ -138,6 +138,7 @@  extern int Wunion_cast;
 extern int Wuniversal_initializer;
 extern int Wunknown_attribute;
 extern int Wvla;
+extern int Wzero_extend_negation;
 
 extern char **handle_switch(char *arg, char **next);
 extern void handle_switch_finalize(void);
diff --git a/sparse.1 b/sparse.1
index 430b3710b260..928e3513b9b6 100644
--- a/sparse.1
+++ b/sparse.1
@@ -494,6 +494,14 @@  Warn on casts to union types.
 
 Sparse does not issues these warnings by default.
 .
+.TP
+.B -Wzero-extend-negation
+Warn when an unsigned value is first negated (logical or arithmetical)
+and then converted to a wider type.
+
+Sparse issues these warnings by default.
+To turn them off, use \fB-Wno-zero-extend-negation\fR.
+.
 .SH MISC OPTIONS
 .TP
 .B \-\-arch=\fIARCH\fR