Message ID | 5345273.JScMX9oohG@wuerfel (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, May 27, 2016 at 5:33 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > > gcc can not always figure out which code is only used in an error > condition an assignment to indirect argument is only done after the > use of IS_ERR() catches errors. In gfs2, this results in a warning > about correct code: I figured out why gcc warns. The code is correct, but gcc isn't able to tell that, because when we return "int" for the error case from get_leaf_nr(), gcc thinks that that *could* be a zero. This code: if (IS_ERR(hash)) return PTR_ERR(hash); is obviously "return non-zero" to a kernel developer (because that's how our error codes work), but to a compiler that "return PTR_ERR()" ends up casting a "long" to an "int" return value. And the compiler thinks that while the "long" was clearly non-zero and negative, the "int" it cast things to might be zero. Yes, a very smart compiler might have figured out that the IS_ERR_VALUE check also means that the value will be non-zero in "int" as well, but that actually takes value range analysis, not just "we already ttested it". And yes, the error goes away if I turn the "int" into "long" into the affected codepaths. I'm not entirely happy about your patch, because I think it makes the code worse. You fix it by effectively making gcc test the resulting value after the type conversion. I'd really prefer to figure out some way to let gcc actually understand this error handling model. Because it's not just the warning, it also generates unnecessary double tests (first comparing a 64-bit value against the error range, and then comparing the truncated 32-bit error code against zero). Making errors be "long" does fix not just the warning but also the code generation. But we've traditionally used "int" for error returns, even though the system call path then again turns it into "long" (for somewhat related reasons, actually - we want to make sure that the "int" error code is valid in all 64 bits). So we *could* just start encouraging people to use "long" for error handling. It would fix warnings and improve the results. But let me think about this a bit more. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 27, 2016 at 10:23 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote: > > This code: > > if (IS_ERR(hash)) > return PTR_ERR(hash); > > is obviously "return non-zero" to a kernel developer (because that's > how our error codes work), but to a compiler that "return PTR_ERR()" > ends up casting a "long" to an "int" return value. It doesn't help that gfs2 is using the IS_ERR_VALUE() macro on an "int", whih is bogus to begin with. It happens to *work*, but by the time you've cast an error pointer to "int", you've lost enough bits that "IS_ERR_VALUE()" isn't a valid thing to do. You should just check against zero (possibly against negative). But fixing that and just checking against a non-zero int doesn't actually fix the warning. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, May 27, 2016 10:23:00 AM CEST Linus Torvalds wrote: > On Fri, May 27, 2016 at 5:33 AM, Arnd Bergmann <arnd@arndb.de> wrote: > > > > > > gcc can not always figure out which code is only used in an error > > condition an assignment to indirect argument is only done after the > > use of IS_ERR() catches errors. In gfs2, this results in a warning > > about correct code: > > I figured out why gcc warns. > > The code is correct, but gcc isn't able to tell that, because when we > return "int" for the error case from get_leaf_nr(), gcc thinks that > that *could* be a zero. > > This code: > > if (IS_ERR(hash)) > return PTR_ERR(hash); > > is obviously "return non-zero" to a kernel developer (because that's > how our error codes work), but to a compiler that "return PTR_ERR()" > ends up casting a "long" to an "int" return value. > > And the compiler thinks that while the "long" was clearly non-zero and > negative, the "int" it cast things to might be zero. Exactly, this was my conclusion back in January when I did the 67893f12e537 patch that fixed it for 32-bit architectures: gcc cannot figure out that 'if (IS_ERR(ptr))' and the '!err' check are equivalent but it does manage to find 'IS_ERR(ptr)' and 'IS_ERR_VAL(PTR_ERR(ptr))' are the same, across multiple inlined functions. > Yes, a very smart compiler might have figured out that the > IS_ERR_VALUE check also means that the value will be non-zero in "int" > as well, but that actually takes value range analysis, not just "we > already ttested it". > > And yes, the error goes away if I turn the "int" into "long" into the > affected codepaths. > > I'm not entirely happy about your patch, because I think it makes the > code worse. You fix it by effectively making gcc test the resulting > value after the type conversion. I'd really prefer to figure out some > way to let gcc actually understand this error handling model. Because > it's not just the warning, it also generates unnecessary double tests > (first comparing a 64-bit value against the error range, and then > comparing the truncated 32-bit error code against zero). I see what you mean. On 32-bit ARM, we get the double compare for any of the versions, while on 64-bit x86, the current version in mainline is the worst, while the original code (with the warning) and the version after my second patch still need the second comparison against zero. > Making errors be "long" does fix not just the warning but also the > code generation. But we've traditionally used "int" for error returns, > even though the system call path then again turns it into "long" (for > somewhat related reasons, actually - we want to make sure that the > "int" error code is valid in all 64 bits). > > So we *could* just start encouraging people to use "long" for error > handling. It would fix warnings and improve the results. But let me > think about this a bit more. > On Friday, May 27, 2016 11:28:48 AM CEST Linus Torvalds wrote: > It doesn't help that gfs2 is using the IS_ERR_VALUE() macro on an > "int", whih is bogus to begin with. It happens to *work*, but by the > time you've cast an error pointer to "int", you've lost enough bits > that "IS_ERR_VALUE()" isn't a valid thing to do. You should just check > against zero (possibly against negative). > > But fixing that and just checking against a non-zero int doesn't > actually fix the warning. Right, removing the IS_ERR_VALUE() gets us back to the state before my original patch, and we get the warning on all architectures, and we certainly don't want to promote the use of IS_ERR_VALUE() further because of the problems you mention with unsigned types that are shorter than 'long'. In fact, the patch that I have in my private tree that was hiding the warning for me on x86 is one that removes all instances of IS_ERR_VALUE() with arguments other than 'unsigned long', see http://pastebin.com/uYa2mkgC for reference. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 27, 2016 at 1:04 PM, Arnd Bergmann <arnd@arndb.de> wrote: > > In fact, the patch that I have in my private tree that was hiding the > warning for me on x86 is one that removes all instances of IS_ERR_VALUE() > with arguments other than 'unsigned long', see http://pastebin.com/uYa2mkgC > for reference. Please just send me that patch, we need to do this (and then add a cast to pointer (and back to unsigned long) in IS_ERR_VALUE() so that we get warnings for when that macro is mis-used. I didn't look at the details of your patch, but I did look at several IS_ERR_VALUE() uses in the standard kernel, and they were basically all wrong. Even the ones that used it for the rigth reason (vm_brk() that returns a pointer or an error in an "unsigned long") had actively screwed up and truncated that (correct) unsigned long value to "int" before doing the IS_ERR_VALUE(), which made it all wrong again. And the other users just looked entirely bogus, and were all just "zero or negative error code" that has nothing to do with IS_ERR_VALUE(). The code should just check against zero, not use that macro that was designed for something different. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, May 27, 2016 1:20:29 PM CEST Linus Torvalds wrote: > On Fri, May 27, 2016 at 1:04 PM, Arnd Bergmann <arnd@arndb.de> wrote: > > > > In fact, the patch that I have in my private tree that was hiding the > > warning for me on x86 is one that removes all instances of IS_ERR_VALUE() > > with arguments other than 'unsigned long', see http://pastebin.com/uYa2mkgC > > for reference. > > Please just send me that patch, we need to do this (and then add a > cast to pointer (and back to unsigned long) in IS_ERR_VALUE() so that > we get warnings for when that macro is mis-used. Ok, I've tried to come up with a summary of what happened so far on this, and sent the patch your way, leaving out the modified IS_ERR_VALUE() definition for the moment. I've added the people on Cc whose drivers had the most invasive changes. > I didn't look at the details of your patch, but I did look at several > IS_ERR_VALUE() uses in the standard kernel, and they were basically > all wrong. Yes, that matches what we found earlier this year when Andrzej Hajda did some work on fixing the worst problems he found. Arnd -- To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, May 27, 2016 at 01:20:29PM -0700, Linus Torvalds wrote: > I didn't look at the details of your patch, but I did look at several > IS_ERR_VALUE() uses in the standard kernel, and they were basically > all wrong. Even the ones that used it for the rigth reason (vm_brk() > that returns a pointer or an error in an "unsigned long") had actively > screwed up and truncated that (correct) unsigned long value to "int" > before doing the IS_ERR_VALUE(), which made it all wrong again. > > And the other users just looked entirely bogus, and were all just > "zero or negative error code" that has nothing to do with > IS_ERR_VALUE(). The code should just check against zero, not use that > macro that was designed for something different. Both gfs2 ones and the one in fs/romfs are crap. AFS one is simply ridiculous - result = generic_file_write_iter(iocb, from); if (IS_ERR_VALUE(result)) { _leave(" = %zd", result); return result; } _leave(" = %zd", result); return result; In binfmt*.c some callers are valid (vm_mmap and vm_brk values; BTW, the code in binfmt_flat.c checks for vm_mmap returning 0, which should never happen, AFAICS). Most of binfmt_flat.c ones are pure cargo-culting. net/9p/client.c ones are results of serious mistake in design of 9P.L extensions to 9P - they assume that numeric values of E... are architecture-independent and send them over the wire. The uses of IS_ERR_VALUE there are papering over that. Badly. A couple of sound/* ones should be simply "is negative". netfilter ones are caused by bad calling conventions of xt_percpu_counter_alloc(). I hadn't looked through drivers/* users, but judging by fs/, net/ and sound/, I would expect them to be pointless garbage in the best case and red flags for broken code in the worst. IMO IS_ERR_VALUE() should be renamed to something less generic and be used a lot less than it is right now. -- To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/fs/gfs2/dir.c b/fs/gfs2/dir.c index 4a01f30e9995..271d93905bac 100644 --- a/fs/gfs2/dir.c +++ b/fs/gfs2/dir.c @@ -783,12 +783,15 @@ static int get_leaf_nr(struct gfs2_inode *dip, u32 index, u64 *leaf_out) { __be64 *hash; + int error; hash = gfs2_dir_get_hash_table(dip); - if (IS_ERR(hash)) - return PTR_ERR(hash); - *leaf_out = be64_to_cpu(*(hash + index)); - return 0; + error = PTR_ERR_OR_ZERO(hash); + + if (!error) + *leaf_out = be64_to_cpu(*(hash + index)); + + return error; } static int get_first_leaf(struct gfs2_inode *dip, u32 index, @@ -798,7 +801,7 @@ static int get_first_leaf(struct gfs2_inode *dip, u32 index, int error; error = get_leaf_nr(dip, index, &leaf_no); - if (!IS_ERR_VALUE(error)) + if (!error) error = get_leaf(dip, leaf_no, bh_out); return error; @@ -1014,7 +1017,7 @@ static int dir_split_leaf(struct inode *inode, const struct qstr *name) index = name->hash >> (32 - dip->i_depth); error = get_leaf_nr(dip, index, &leaf_no); - if (IS_ERR_VALUE(error)) + if (error) return error; /* Get the old leaf block */