Message ID | Y5uprmSmSfYechX2@yury-laptop (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | [GIT,PULL] bitmap changes for v6.2-rc1 | expand |
On Thu, Dec 15, 2022 at 3:11 PM Yury Norov <yury.norov@gmail.com> wrote: > > Please pull bitmap patches for v6.2. They spent in -next for more than > a week without any issues. The branch consists of: So I've been holding off on this because these bitmap pulls have always scared me, and I wanted to have the time to actually look through them in detail before pulling. I'm back home, over the travel chaos, and while I have other pulls pending, they seem to be benign fixes so I started looking at this. And when looking at it, I did indeed finx what I think is a fundamental arithmetic bug. That small_const_nbits_off() is simply buggy. Try this: small_const_nbits_off(64,-1); and see it return true. The thing is, doing divides in C is something you need to be very careful about, because they do *not* behave nicely with signed values. They do the "mathematically intuitive" thing of truncating towards zero, but when you are talking about bit ranges like this, that is *NOT* what you want. The comment is fairly good, but it just doesn't match the code. If you want to check that 'nbits-1' and 'off' are in the same word, you simply should not use division at all. Sure, it would work, if you verify that they are both non-negative, but the fact is, even then it's just wrong, wrong, wrong. You want a shift operation or a masking operation. Honestly, in this case, I think the logical thing to do is "check that the upper bits are the same". The way you do that is probably something like !((off) ^ ((nbits)-1) & ~(BITS_PER_LONG-1)) which doesn't have the fundamental bug that the divide version has. So anyway, I won't be pulling this. I appreciate that you are trying to micro-optimize the bitop functions, but this is not the first time your pull requests have made me worried and caused me to not pull. This is such basic functionality that I really need these pull requests to be less worrisome. In other words, not only don't I want to see these kinds of bugs - please make sure that there isn't anythign that looks even *remotely* scary. The micro-optimizations had better be *so* obvious and so well thought through that I not only don't find bugs in them, I don't even get nervous about finding bugs. Side note: that whole small_const_nbits_off() thing could be possibly improved in other ways. It's actually unnecessarily strict (if it was correct, that is): we don't really require 'off' to be constant, what we *do* want is - "nbits > off" needs to be a compile-time true constant - that "off is in the same word as nbits-1" also needs to be a compile-time true constant. and the compiler could determine both of those to be the case even if 'off' itself isn't a constant, if there is code around it that verifies it. For example, if you have code like this: off = a & 7; n = find_next_bit(bitmap, 64, off); then the compiler could still see that it's that "last word only" case, even though 'off' isn't actually a constant value. So you could try using a helper macro like #define COMPILE_TIME_TRUE(x) (__builtin_constant_p(x) && (x)) to handle those kinds of things. But again - the code needs to be OBVIOUSLY CORRECT. Make me feel those warm and fuzzies about it because it's so obviously safe and correct. That said, I see your test-cases for your micro-optimizations, but I'd also like to see references to where it actually improves real code. I know for a fact that 'small_const_nbits()' actually triggers in real life. I don't see that your 'small_const_nbits_off()' would trigger in real life in any situation that isn't already covered by 'small_const_nbits()'. So convince me not only that the optimizations are obviously correct, but also that they actually matter. Linus
On Fri, Dec 23, 2022 at 10:44 AM Linus Torvalds <torvalds@linux-foundation.org> wrote: > > Honestly, in this case, I think the logical thing to do is "check that > the upper bits are the same". The way you do that is probably > something like > > !((off) ^ ((nbits)-1) & ~(BITS_PER_LONG-1)) Note that while the above is probably correct (but you always need to double-check my emailed "something like this" code - I literally write it in the MUA, and I make mistakes too), I'd never want to see that as part of one big complex macro. In fact, I think I am missing a set of parentheses, because '&' has a higher precedence than '^', so the above is actually buggy. So I'd much rather see something like this #define COMPILE_TIME_TRUE(x) (__builtin_constant_p(x) && (x)) #define bits_in_same_word(x,y) \ (!(((x)^(y))&~(BITS_PER_LONG-1))) #define bitmap_off_in_last_word(nbits,off) \ bits_in_same_word((nbits)-1,off) #define small_const_nbits_off(nbits, off) \ (__builtin_constant_p(nbits) && (nbits) > 0 && \ COMPILE_TIME_TRUE(bitmap_off_in_last_word(nbits,off))) where each step does one thing and one thing only, and you don't have one complicated thing that is hard to read. And again, don't take my word blindly for the above. I *think* the above may be correct, but there's a "think" and a "may" there. Plus I'd still like to hear about where the above would actually matter and make a code generation difference in real life (compared to just the simple "optimize the single-word bitmap" case). Linus
On Fri, Dec 23, 2022 at 10:44:07AM -0800, Linus Torvalds wrote: > On Thu, Dec 15, 2022 at 3:11 PM Yury Norov <yury.norov@gmail.com> wrote: > > > > Please pull bitmap patches for v6.2. They spent in -next for more than > > a week without any issues. The branch consists of: > > So I've been holding off on this because these bitmap pulls have > always scared me, and I wanted to have the time to actually look > through them in detail before pulling. > > I'm back home, over the travel chaos, and while I have other pulls > pending, they seem to be benign fixes so I started looking at this. > > And when looking at it, I did indeed finx what I think is a > fundamental arithmetic bug. > > That small_const_nbits_off() is simply buggy. > > Try this: > > small_const_nbits_off(64,-1); > > and see it return true. Hi Linus, Sorry for a delayed reply. small_const_nbits{_off} is used only for bitmap and find_bit functions where both offset and size are unsigned types. -1 there will turn into UINT_MAX or ULONG_MAX, and small_const_nbits_off(64, -1) returns false. The bitops.h functions (set_bit et all) also use unsigned type for nr. Negative offsets are not used in bit operations from very basic level. Notice that '(nbits) > 0' part in small_const_nbits() is there to exclude 0, not negative numbers. So, support for negative offset/size looks irrelevant for all existing users of small_const(). I doubt there'll be new non-bitmap users for the macro anytime soon. small_const_nbits() and proposed small_const_nbits_off() are in fact very bitmap-related macros. 'Small' in this context refers to a single-word bitmap. 0 is definitely a small and constant number, but small_const_nbits(0) will return false - only because inline versions of bitmap functions don't handle 0 correctly. I think, small_const_nbits confuses because it sounds too generic and is located in a very generic place. There's a historical reason for that. Originally, the macro was hosted in include/linux/bitmap.h and at that time find.h was in include/asm-generic/bitops/. In commit 586eaebea5988 (lib: extend the scope of small_const_nbits() macro) I moved the macro to include/asm-generic/bitsperlong.h to optimize find_bit functions too. After that, working on other parts I found that having bitmap.h and find.h in different include paths is a permanent headache due to things like circular dependencies, and moved find.h to include/linux, where it should be. And even made it an internal header for bitmap.h. But didn't move small_const_nbits(). Looks like I have to move it to somewhere in include/linux/bitops.h. [...] > So convince me not only that the optimizations are obviously correct, > but also that they actually matter. There're no existing users for small_const_nbits_off(). I've been reworking bitmap_find_free_region(), added a pile of tests and found that some quite trivial cases are not inlined, for example find_next_bit(addr, 128, 124). Let's ignore this patch unless we'll have real users. Regarding the rest of the series. Can you please take a look? It includes an optimization for CPUs allocations. With quite a simple rework of cpumask_local_spread() we are gaining measurable and significant improvement for many subsystems on NUMA machines. Tariq measured impact of NUMA-based locality on networking in his environment, and from his commit message: Performance tests: TCP multi-stream, using 16 iperf3 instances pinned to 16 cores (with aRFS on). Active cores: 64,65,72,73,80,81,88,89,96,97,104,105,112,113,120,121 +-------------------------+-----------+------------------+------------------+ | | BW (Gbps) | TX side CPU util | RX side CPU util | +-------------------------+-----------+------------------+------------------+ | Baseline | 52.3 | 6.4 % | 17.9 % | +-------------------------+-----------+------------------+------------------+ | Applied on TX side only | 52.6 | 5.2 % | 18.5 % | +-------------------------+-----------+------------------+------------------+ | Applied on RX side only | 94.9 | 11.9 % | 27.2 % | +-------------------------+-----------+------------------+------------------+ | Applied on both sides | 95.1 | 8.4 % | 27.3 % | +-------------------------+-----------+------------------+------------------+ Bottleneck in RX side is released, reached linerate (~1.8x speedup). ~30% less cpu util on TX. We'd really like to have this work in next kernel release. Thanks, Yury