Message ID | 4126C08B-532F-4B6A-A3C2-9F7928EA2806@fb.com (mailing list archive)
---|---
State | Not Applicable, archived
Series | "warning: context imbalance" in kernel/bpf/hashtab.c
On Thu, Nov 05, 2020 at 07:34:55PM +0000, Song Liu wrote:
> Hi,
>
> We are trying to fix some sparse warnings in kernel/bpf/hashtab.c (in the
> bpf-next tree). The sparse I used was v0.6.3-76-gf680124b.
>
> These new warnings are introduced by [1]. Before [1], hashtab.c got:
>
> [...]
> kernel/bpf/hashtab.c:2089:19: error: subtraction of functions? Share your drugs
> kernel/bpf/hashtab.c:1421:9: warning: context imbalance in '__htab_map_lookup_and_delete_batch' - different lock contexts for basic block
> kernel/bpf/hashtab.c: note: in included file (through include/linux/workqueue.h, include/linux/bpf.h):
> ./include/linux/rcupdate.h:693:9: warning: context imbalance in 'bpf_hash_map_seq_find_next' - unexpected unlock
> ./include/linux/rcupdate.h:693:9: warning: context imbalance in 'bpf_hash_map_seq_stop' - unexpected unlock
>
> With [1], we got a few more warnings:
>
> [...]
> kernel/bpf/hashtab.c:2141:19: error: subtraction of functions? Share your drugs
> kernel/bpf/hashtab.c:1292:27: warning: context imbalance in 'htab_map_delete_elem' - unexpected unlock
> kernel/bpf/hashtab.c:1325:27: warning: context imbalance in 'htab_lru_map_delete_elem' - unexpected unlock
> kernel/bpf/hashtab.c: note: in included file (through include/linux/workqueue.h, include/linux/bpf.h):
> ./include/linux/rcupdate.h:693:9: warning: context imbalance in '__htab_map_lookup_and_delete_batch' - unexpected unlock
> ./include/linux/rcupdate.h:693:9: warning: context imbalance in 'bpf_hash_map_seq_find_next' - unexpected unlock
> ./include/linux/rcupdate.h:693:9: warning: context imbalance in 'bpf_hash_map_seq_stop' - unexpected unlock
>
> So htab_map_delete_elem() and htab_lru_map_delete_elem() got new warnings,
> and __htab_map_lookup_and_delete_batch() got a slightly different warning.
>
> After trying different annotations, including the attached foo.diff by
> Daniel, we found the simplest fix was something like:

Annotations can't fix anything here. The patch [1] changed htab_lock_bucket()
into taking a lock conditionally, where before it took it unconditionally.
So each point where its return value is tested now becomes the confluence of
two distinct paths: one with the lock taken and one with the lock not taken.
This is what is meant by 'context imbalance'.

It should be noted that this 'context imbalance' doesn't mean things like
'this section of code can be executed without the lock held'. In the present
case, if Sparse were smarter (moved condition testing around), it would be
able to see something like 'OK, there is an imbalance, but it has no impact
on the lock being held when needed'. But in the general case that's
impossible to do, so Sparse doesn't even try.

-- Luc Van Oostenryck
On Thu, Nov 5, 2020 at 1:13 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Annotations can't fix anything here. The patch [1] changes htab_lock_bucket(),
> into conditionally taking a lock while before it took it unconditionally.
> So, each point where its value is tested now becomes the confluence of 2
> distinct paths: one with the lock taken and one with the lock not taken.
> This is what is meant by 'context imbalance'.

I think the point Song makes is that if sparse did a few more
optimizations, it would see the context imbalance go away, because the
value it tests would become constant.

Linus
On Thu, Nov 05, 2020 at 01:18:02PM -0800, Linus Torvalds wrote:
> On Thu, Nov 5, 2020 at 1:13 PM Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > Annotations can't fix anything here. The patch [1] changes htab_lock_bucket(),
> > into conditionally taking a lock while before it took it unconditionally.
> > So, each point where its value is tested now becomes the confluence of 2
> > distinct paths: one with the lock taken and one with the lock not taken.
> > This is what is meant by 'context imbalance'.
>
> I think the point Song makes is that if sparse did a few more
> optimizations, it would see the context imbalance go away, because the
> value it tests would become constant.

It's not how I understood it, but yes, I agree. It's what I tried to
briefly explain in the last paragraph.

From what I know of these situations, in most cases simple 'expression'
optimizations are not enough, but simple forms of code hoisting would
often do all that's needed.

-- Luc
> On Nov 5, 2020, at 1:50 PM, Luc Van Oostenryck <luc.vanoostenryck@gmail.com> wrote:
>
> On Thu, Nov 05, 2020 at 01:18:02PM -0800, Linus Torvalds wrote:
>> On Thu, Nov 5, 2020 at 1:13 PM Luc Van Oostenryck
>> <luc.vanoostenryck@gmail.com> wrote:
>>>
>>> Annotations can't fix anything here. The patch [1] changes htab_lock_bucket(),
>>> into conditionally taking a lock while before it took it unconditionally.
>>> So, each point where its value is tested now becomes the confluence of 2
>>> distinct paths: one with the lock taken and one with the lock not taken.
>>> This is what is meant by 'context imbalance'.
>>
>> I think the point Song makes is that if sparse did a few more
>> optimizations, it would see the context imbalance go away, because the
>> value it tests would become constant.
>
> It's not how I understood it, but yes, I agree. It's what I tried to
> briefly explain the in the last paragraph.
>
> From what I know from these situations, in most cases, simple
> 'expression' optimizations are not enough but simple forms of
> code hoisting would often anything that's needed.

Hi Luc and Linus,

Thanks for your comments! I know very little about sparse internals, but I
feel this is more like a minor bug. Here is the simplified inlined code of
htab_map_delete_elem():

	if (some_condition) {
		ret = -EBUSY;
	} else {
		spin_lock();
		ret = 0;
	}
	if (ret)
		return ret;
+	ret = 0;
	[...]
	if (some_other_condition)
		ret = -ENOENT;
	spin_unlock();
	return ret;

When we reach the point of "+ ret = 0;", we will do spin_unlock() regardless
of the value of ret. So whether we know ret is definitely equal to zero
should not matter at all?

It feels like sparse knows there are two paths before "if (ret)":
  1) ret == 0, and we did spin_lock();
  2) ret == -EBUSY, and we didn't do spin_lock().

Then, after the "if (ret)", even without the explicit "ret = 0;", it is
possible to prune path 2), but I guess sparse is not doing it, which is
unfortunate but not a bug.

However, it seems the explicit "ret = 0;" helps sparse prune path 2), which
doesn't seem right. An explicit "ret = 0" should only overwrite the value of
ret, not prune any path.

Does this make sense?

Thanks,
Song
On Fri, Nov 6, 2020 at 9:30 AM Song Liu <songliubraving@fb.com> wrote:
>
> I know very little about sparse internals, but I feel this is more
> like a minor bug.

It's not really a bug, and the pattern you show isn't actually all
that easy for a compiler to see.

> Here is the simplified inlined code of htab_map_delete_elem():
>
> 	if (some_condition) {
> 		ret = -EBUSY;
> 	} else {
> 		spin_lock();
> 		ret = 0;
> 	}

So here the issue for the sparse optimizer is that while both "ret"
and "spin_lock" are tied to "some_condition", it's not really obvious.

A compiler would have to follow all the value analysis - and good
compilers do that, but sparse is not a "great" compiler, it's more of
a "solid and not horribly stupid" one.

> 	if (ret)
> 		return ret;

So I'd really suggest you make the code more obvious - it will be more
obvious to humans too. Write it as

	if (some_condition)
		return -EBUSY;
	spin_lock();

instead, and now sparse will see that spin_lock() and "some_condition"
are obviously mutually incompatible.

Linus
> On Nov 6, 2020, at 9:45 AM, Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> So I'd really suggest you make the code more obvious - it will be more
> obvious to humans too.
>
> Write it as
>
> 	if (some_condition)
> 		return -EBUSY;
> 	spin_lock();
>
> instead, and now sparse will see that obviously spin_lock() and
> "some_condition" are mutually incompatible.

Thanks for your suggestion! This pattern doesn't trigger the warning.
We still need some work to apply it to the actual code, because the
lock part is in a helper:

	inline int htab_lock_bucket()
	{
		if (some_condition)
			return -EBUSY;
		spin_lock();
		return 0;
	}

	int htab_map_delete_elem()
	{
		ret = htab_lock_bucket();
		if (ret)
			return ret;
		[...]
	}

We can probably split htab_lock_bucket() into two helpers, like

	int htab_map_delete_elem()
	{
		ret = htab_lock_is_busy();  // check some_condition
		if (ret)
			return ret;
		htab_do_lock();             // calls spin_lock()
		[...]
	}

I will do more experiments with this.

Thanks again!
Song
On Fri, Nov 6, 2020 at 11:05 AM Song Liu <songliubraving@fb.com> wrote:
>
> Thanks for your suggestion! This pattern didn't trigger the warning.
> We still need some work to apply this to actual code, because the lock
> part is in a helper:
>
> 	inline int htab_lock_bucket()
> 	{
> 		if (some_condition)
> 			return -EBUSY;
> 		spin_lock();
> 		return 0;
> 	}

So inlining does end up being important, because then sparse can see
the pattern in the caller.

But yes, as-is you end up with the same issue that sparse then may not
be able to see that the return value goes together with the spinlock.

That said, in the form you have it now, it might be easier for sparse
to see that, by just adding a branch-following phase. Branch following
is much simpler than value analysis, but "much simpler" is still not
the same as "entirely trivial".

Here's a test-case for this behavior for sparse, in the hope that Luc
goes "that's easy enough to do":

	extern void spin_lock(void);
	extern void spin_unlock(void);

	static inline int errno(int arg)
	{
		if (arg)
			return -1;
		spin_lock();
		return 0;
	}

	int cmp(int i)
	{
		if (errno(i))
			return -1;
		spin_unlock();
		return 0;
	}

and right now sparse linearizes this to

	cmp:
	.L0:
		<entry-point>
		select.32   %r4 <- %arg1, $0xffffffff, $0
		cbr         %arg1, .L5, .L4

	.L4:
		call        spin_lock
		br          .L5

	.L5:
		# call      %r4 <- errno, %r1
		select.32   %r6 <- %r4, $0xffffffff, $0
		cbr         %arg1, .L6, .L2

	.L2:
		call        spin_unlock
		br          .L6

	.L6:
		ret.32      %r6

because sparse isn't smart enough to do a couple of simple optimizations:

 (a) the conditional following for the return value:

	select.32   %r4 <- %arg1, $0xffffffff, $0
	select.32   %r6 <- %r4, $0xffffffff, $0

Note how it doesn't trigger CSE, because it's not the same instruction
(%arg1 vs %r4), but sparse *could* see that a select that depends on a
previous select with constant arguments can be simplified to be
conditional on the original select's condition instead, so it *could*
be CSE'd into one single thing.

So the second "select" could be optimized away by realizing that it
really gets the same value as the first one. That *should* be trivial
to do in sparse by just having a simple pattern for select
simplification.

And once that optimization is in place, the second optimization would be

 (b) trivial branch following: "conditional branch to conditional
branch with the same condition".

ie we'd have the pattern of

	cbr %arg1, .L5, .L4
	...
	.L5:
	cbr %arg1, .L6, .L2

and now a trivial branch-following optimization would see "oh, the
first target of the first cbr is another cbr with the same condition",
so we can rewrite the first cbr as

	cbr %arg1, .L6, .L4

instead.

And once you've done that trivial branch following, the .L5 target has
only that fallthrough source from .L4, and the code becomes

	cmp:
	.L0:
		<entry-point>
		select.32   %r4 <- %arg1, $0xffffffff, $0
		cbr         %arg1, .L6, .L4

	.L4:
		call        spin_lock
		cbr         %arg1, .L6, .L2

	.L2:
		call        spin_unlock
		br          .L6

	.L6:
		ret.32      %r4

which still isn't good enough, because it still has that conditional
branch that is now pointless.

Removing the second conditional branch entirely would require seeing
"look, it's now dominated by a previous conditional branch on the same
condition, so it can't trigger".

That's the only remaining optimization that isn't purely local.

Dominance analysis is always painful. sparse does do it (for memory
accesses), but it's a *lot* more complex than some local
single-instruction pattern that looks at the source or the target of
that single instruction.

Anyway - this is all doable, but it's the end of the week, so my pull
requests for the kernel are starting to pile up, and I already spent
more time than I should have looking at sparse.

Maybe this gives Luc some ideas, though. Or maybe Luc will point out
that even my "simple" optimizations are slightly more painful than I
think they are, for some reason I haven't looked at.

Linus
On Fri, Nov 06, 2020 at 11:44:00AM -0800, Linus Torvalds wrote:
> On Fri, Nov 6, 2020 at 11:05 AM Song Liu <songliubraving@fb.com> wrote:
> (a) the conditional following for the return value:
>
> 	select.32   %r4 <- %arg1, $0xffffffff, $0
> 	select.32   %r6 <- %r4, $0xffffffff, $0
>
> Note how it doesn't trigger CSE, because it's not the same instruction
> (%arg1 vs %r4), but sparse *could* see that a select that depends on a
> previous select that has constant arguments can be simplified to be
> conditional on the original select value instead, so it *could* be
> CSE'd into one single thing.
>
> So the second "select" could be optimized away by realizing that it
> really gets the same value as the first one. That *should* be trivial
> to do in sparse by just having a simple pattern for select
> simplification.
>
> And once that optimization is in place, the second optimization would be

I think I may even already have this in a half-polished form.

> (b) trivial branch following: "conditional branch to conditional
> branch with the same condition".
>
> ie we'd have the pattern of
>
> 	cbr %arg1, .L5, .L4
> 	...
> 	.L5:
> 	cbr %arg1, .L6, .L2
>
> and now a trivial branch following optimization would see "oh, the
> first target of the first cbr is to another cbr that has the same
> condition, so we can rewrite the first cbr as
>
> 	cbr %arg1, .L6, .L4
>
> instead.

This should already be done (by yourself, many years ago).

> And once you've done that trivial branch following, the .L5 target has
> only that fallthrough source from .L4, and the code becomes
>
> 	cmp:
> 	.L0:
> 		<entry-point>
> 		select.32   %r4 <- %arg1, $0xffffffff, $0
> 		cbr         %arg1, .L6, .L4
>
> 	.L4:
> 		call        spin_lock
> 		cbr         %arg1, .L6, .L2
>
> 	.L2:
> 		call        spin_unlock
> 		br          .L6
>
> 	.L6:
> 		ret.32      %r4
>
> which still isn't good enough, because it still has that conditional
> branch that is now pointless.
>
> Removing the second conditional branch entirely would require having a
> "look, it's now dominated by a previous conditional branch on the same
> condition, so it can't trigger".
>
> That's the only remaining optimization that isn't purely local.

Yes.

> Dominance analysis is always painful. sparse does do it (for memory
> accesses), but it's a *lot* more complex than just some local
> single-instruction pattern that looks at the source or the target of
> that single instruction.

This shouldn't be a big problem, since we already have the dominance
tree (but it's not yet updated when there are changes in the CFG).
OTOH, the memops simplification creates invalid phi-nodes, which
indirectly creates all sorts of problems. I have several fixes for
this, but none I'm really happy with.

> Maybe this gives Luc some ideas, though.
>
> Or maybe Luc will point out that even my "simple" optimizations are
> slightly more painful than I think they are for some reason that I
> haven't looked at.

No no, it's certainly more than a few hours of work, but it's also not
an immense task. Of course, the devil is always in the details.

There is also the option of explicitly handling conditional contexts.
I did this maybe 6 months ago, in 2 or 3 different ways. It worked
well, and would probably solve the problem here, but:
 * it needs new annotations;
 * it's OK when the condition is simple (= 0 or != 0); it could be
   adapted for some others (like, for example, >= 0), but more general
   conditions become hard to handle (the annotations need to know
   about the condition);
 * the benefits were disappointing because, in most cases I saw, it
   displaced the problem to somewhere else where a 'real' optimization
   was needed.

This is why I stopped looking at it. But so, yes, what you propose
here makes a lot of sense and would be very valuable.

I'll take a look at it soon (but my stack is already so full that I'm
quite reluctant to begin anything new - I currently have 208 topic
branches, with 120 of them being valid, interesting series waiting for
some polishing, validation and patch descriptions :( ).

-- Luc
On Fri, Nov 6, 2020 at 2:19 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> On Fri, Nov 06, 2020 at 11:44:00AM -0800, Linus Torvalds wrote:
> > On Fri, Nov 6, 2020 at 11:05 AM Song Liu <songliubraving@fb.com> wrote:
> > (a) the conditional following for the return value:
> >
> > 	select.32   %r4 <- %arg1, $0xffffffff, $0
> > 	select.32   %r6 <- %r4, $0xffffffff, $0
> >
> > Note how it doesn't trigger CSE, because it's not the same instruction
> > (%arg1 vs %r4), but sparse *could* see that a select that depends on a
> > previous select that has constant arguments can be simplified to be
> > conditional on the original select value instead, so it *could* be
> > CSE'd into one single thing.
> >
> > So the second "select" could be optimized away by realizing that it
> > really gets the same value as the first one. That *should* be trivial
> > to do in sparse by just having a simple pattern for select
> > simplification.
> >
> > And once that optimization is in place, the second optimization would be
>
> I think I may even have already this in an half-polished form.

Ugh. I tried to see what I could do, and it does simplify, but not the
way I expected.

The attached patch looks superficially sane (but is probably broken),
and results in one less "select" statement:

	.L0:
		<entry-point>
		cbr         %arg1, .L5, .L4

	.L4:
		call        spin_lock
		br          .L5

	.L5:
		# call      %r4 <- errno, %r1
		select.32   %r6 <- %arg1, $0xffffffff, $0
		cbr         %arg1, .L6, .L2

	.L2:
		call        spin_unlock
		br          .L6

	.L6:
		ret.32      %r6

but note how it didn't remove the select statement I expected, so you
don't actually end up with a branch to a branch anyway.

Oh well. It's close. That "select" could be anywhere, so the branch
following could know that. And my patch may be garbage too; I just
decided to take a look at how easy the "combine select conditional"
might be.

(In fact, my patch *IS* garbage, because it should do the same for
"OP_SET_NE" as a def to the select too.)

Linus
On Fri, Nov 06, 2020 at 02:46:55PM -0800, Linus Torvalds wrote:
> On Fri, Nov 6, 2020 at 2:19 PM Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > On Fri, Nov 06, 2020 at 11:44:00AM -0800, Linus Torvalds wrote:
> > > On Fri, Nov 6, 2020 at 11:05 AM Song Liu <songliubraving@fb.com> wrote:
> > > (a) the conditional following for the return value:
> > >
> > > 	select.32   %r4 <- %arg1, $0xffffffff, $0
> > > 	select.32   %r6 <- %r4, $0xffffffff, $0
> > >
> > > Note how it doesn't trigger CSE, because it's not the same instruction
> > > (%arg1 vs %r4), but sparse *could* see that a select that depends on a
> > > previous select that has constant arguments can be simplified to be
> > > conditional on the original select value instead, so it *could* be
> > > CSE'd into one single thing.
> > >
> > > So the second "select" could be optimized away by realizing that it
> > > really gets the same value as the first one. That *should* be trivial
> > > to do in sparse by just having a simple pattern for select
> > > simplification.
> > >
> > > And once that optimization is in place, the second optimization would be
> >
> > I think I may even have already this in an half-polished form.
>
> Ugh. I tried to see what I can do, and it does simplify, but not the
> way I expected.
>
> The attached patch looks superficially sane (but probably is broken),
> and results in one less "select" statement:
>
> 	.L0:
> 		<entry-point>
> 		cbr         %arg1, .L5, .L4
>
> 	.L4:
> 		call        spin_lock
> 		br          .L5
>
> 	.L5:
> 		# call      %r4 <- errno, %r1
> 		select.32   %r6 <- %arg1, $0xffffffff, $0
> 		cbr         %arg1, .L6, .L2
>
> 	.L2:
> 		call        spin_unlock
> 		br          .L6
>
> 	.L6:
> 		ret.32      %r6
>
> but note how it removed not the select statement I expected, so you
> don't actually end up with a branch to a branch anyway.
>
> Oh well. It's close. That "select" could be anywhere, so the branch
> following could know that.

Yes, the core of the problem is to move things up.
Here is a patch that kinda does this - a bit too much, even - but the
idea is there. With it, this gives the following:

	cmp:
	.L0:
		<entry-point>
		select.32   %r4 <- %arg1, $0xffffffff, $0
		select.32   %r6 <- %r4, $0xffffffff, $0
		cbr         %arg1, .L6, .L4

	.L4:
		call        spin_lock
		# call      %r4 <- errno, %r1
		call        spin_unlock
		br          .L6

	.L6:
		ret.32      %r6

The patch moves *all* selects as high as possible using the dominance
tree. It complements the select simplification. The only really
interesting part is the very last one; the rest is just infrastructure.

So, I think that the next questions are:
 * which selects should be moved up?
 * up to where should they be moved?

I'll think a bit about all this this weekend.

-- Luc

From 7c0d5a759756989919eb2799e7e50d36941d47b3 Mon Sep 17 00:00:00 2001
From: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Date: Sat, 7 Nov 2020 02:23:21 +0100
Subject: [PATCH] EXP: move up all selects as much as possible

---
 linearize.c | 21 +++++++++++++++++++++
 linearize.h |  1 +
 simplify.c  | 34 ++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+)

diff --git a/linearize.c b/linearize.c
index c1e3455adb67..005d90b3fb45 100644
--- a/linearize.c
+++ b/linearize.c
@@ -737,6 +737,27 @@ void insert_select(struct basic_block *bb, struct instruction *br, struct instru
 	add_instruction(&bb->insns, br);
 }
 
+void copy_select(struct basic_block *bb, struct instruction *insn)
+{
+	struct instruction *br;
+	pseudo_t target;
+	struct instruction *select;
+
+	/* Remove the 'br' */
+	br = delete_last_instruction(&bb->insns);
+
+	select = alloc_typed_instruction(OP_SEL, insn->type);
+	select->bb = bb;
+
+	use_pseudo(select, insn->src1, &select->src1);
+	use_pseudo(select, insn->src2, &select->src2);
+	use_pseudo(select, insn->src3, &select->src3);
+	select->target = insn->target;
+
+	add_instruction(&bb->insns, select);
+	add_instruction(&bb->insns, br);
+}
+
 static inline int bb_empty(struct basic_block *bb)
 {
 	return !bb->insns;
diff --git a/linearize.h b/linearize.h
index 77ae7c9ac976..cc38fe07c974 100644
--- a/linearize.h
+++ b/linearize.h
@@ -311,6 +311,7 @@ struct entrypoint {
 };
 
 extern void insert_select(struct basic_block *bb, struct instruction *br, struct instruction *phi, pseudo_t if_true, pseudo_t if_false);
+extern void copy_select(struct basic_block *bb, struct instruction *br);
 extern void insert_branch(struct basic_block *bb, struct instruction *br, struct basic_block *target);
 
 struct instruction *alloc_phisrc(pseudo_t pseudo, struct symbol *type);
diff --git a/simplify.c b/simplify.c
index f2aaa52de84b..633ebc6fad2c 100644
--- a/simplify.c
+++ b/simplify.c
@@ -1745,8 +1745,24 @@ static int simplify_cast(struct instruction *insn)
 	return 0;
 }
 
+#include "flowgraph.h"
+static int is_defined_at(pseudo_t src, struct basic_block *bb)
+{
+	if (src->type != PSEUDO_REG)
+		return 1;
+	return domtree_dominates(bb, src->def->bb);
+}
+
+static int move_select_to(struct instruction *insn, struct basic_block *bb)
+{
+	copy_select(bb, insn);
+	kill_instruction(insn);
+	return REPEAT_CSE;
+}
+
 static int simplify_select(struct instruction *insn)
 {
+	struct basic_block *bb;
 	pseudo_t cond, src1, src2;
 
 	cond = insn->src1;
@@ -1782,6 +1798,24 @@ static int simplify_select(struct instruction *insn)
 		kill_use(&insn->src3);
 		return replace_with_value(insn, 0);
 	}
+
+	// Try to move this select 'up'.
+	bb = insn->bb;
+	while (bb->idom) {
+		bb = bb->idom;
+		if (bb == insn->bb)
+			goto out;
+		if (!is_defined_at(cond, bb))
+			goto out;
+		if (!is_defined_at(src1, bb))
+			goto out;
+		if (!is_defined_at(src2, bb))
+			goto out;
+	}
+	if (bb != insn->bb)
+		return move_select_to(insn, bb);
+
+out:
 	return 0;
 }
On Fri, Nov 6, 2020 at 5:33 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> So, I think that the next questions are:
> * which selects should be moved up?
> * up to where should them be moved?

Honestly, it's not even clear they should be moved up. They could be
moved down too, particularly in this kind of case where there is only
one user of the result (ie move it down to just before the use).

Moving down to the use is in many ways much easier than moving up,
because then you don't need to worry about whether the sources of the
select have been calculated yet. No need for any dominance analysis or
anything like that for that case.

But my personal guess is that by default we shouldn't try to
excessively move instructions unless we have an actual reason to do
so.

So I don't think your patch is a good simplification in general.
Particularly the "move up" part, where you might just end up doing an
entirely unnecessary select that would normally have been done only in
some unusual conditional case.

Now, instead, I think the proper select simplification is to tie it
together with the conditional branch. Look at what I had after the
other select simplifications:

	.L0:
		<entry-point>
		cbr         %arg1, .L5, .L4

	.L4:
		call        spin_lock
		br          .L5

	.L5:
		# call      %r4 <- errno, %r1
		select.32   %r6 <- %arg1, $0xffffffff, $0
		cbr         %arg1, .L6, .L2

	.L2:
		call        spin_unlock
		br          .L6

	.L6:
		ret.32      %r6

and notice the

	cbr %arg1, .L5, .L4
	.L5:
	select.32 %r6 <- %arg1, $0xffffffff, $0

sequence. In particular, notice how we have a conditional branch to a
"select" that has the *same* condition as the branch!

So I think the *proper* fix is to not think of this case as "move the
select", but as a special case of branch following: the same way that
a conditional branch to another conditional branch can be simplified
if the condition is the same, you can simplify the case of a
"conditional branch to a select with the same condition".

That said, that "proper fix" looks a bit painful.

My gut feeling is that we shouldn't have generated that "select" in
the first place at all. We should have kept it as a conditional
branch, then done branch following, and done the whole "turn a
branch-over into a select" at a later stage.

Have we done that if_convert_phi() simplification too early? I didn't
really follow the whole chain of how we got to this point.

Linus
On Sat, Nov 07, 2020 at 11:39:42AM -0800, Linus Torvalds wrote:
> On Fri, Nov 6, 2020 at 5:33 PM Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > So, I think that the next questions are:
> > * which selects should be moved up?
> > * up to where should them be moved?
>
> Honestly, it's not even clear they should be moved up. It could be
> moved down too, particularly in this kind of case where there is only
> one user of the result (ie move it down to just before the use).
>
> Moving down to the use is in many ways is much easier than moving up,
> because then you don't need to worry about whether the sources to the
> select have been calculated yet. No need for any dominance analysis or
> anything like that for that case.

It's just that in my experience, in these situations with context
imbalance, very often 'something' had to move up, just before the 'if'
where the imbalance is created.

> But my personal guess is that by default we shouldn't try to
> excessively move instructions unless we have an actual reason to do
> so.
>
> So I don't think your patch a good simplification in general.
> Particularly the "move up" part, when you might just end up doing an
> entirely unnecessary select that would normally have been done only in
> some unusual conditional case.

I totally agree with this.

> Now, instead, I think the proper select simplification is to tie it
> together with the conditional branch. Look at what I had after the
> other select simplifications:
>
> 	.L0:
> 		<entry-point>
> 		cbr         %arg1, .L5, .L4
>
> 	.L4:
> 		call        spin_lock
> 		br          .L5
>
> 	.L5:
> 		# call      %r4 <- errno, %r1
> 		select.32   %r6 <- %arg1, $0xffffffff, $0
> 		cbr         %arg1, .L6, .L2
>
> 	.L2:
> 		call        spin_unlock
> 		br          .L6
>
> 	.L6:
> 		ret.32      %r6
>
> and notice that
>
> 	cbr %arg1, .L5, .L4
> 	.L5:
> 	select.32 %r6 <- %arg1, $0xffffffff, $0
>
> sequence. In particular, notice how we have a conditional branch to a
> "select" that has the *same* conditional as the branch!

Yes, once the SEL(SEL(x,C,0),C,0) case is handled, this all simplifies to

	.L0:
		<entry-point>
		select.32   %r6 <- %arg1, $0xffffffff, $0
		cbr         %arg1, .L6, .L4
	.L4:
		call        spin_lock
		# call      %r4 <- errno, %r1
		call        spin_unlock
		br          .L6
	.L6:
		ret.32      %r6

but to me, this seems a very, very special case, so not much interesting.

> So I think the *proper* fix is to not think of this case as "move the
> select", but as a special case of branch following: the same way that
> a conditional branch to another conditional branch can be simplified
> if the condition is the same, you can simplify the case of
> "conditional branch to select with the same condition".
>
> That said, that "proper fix" looks a bit painful. My gut feel is that
> we shouldn't have generated that "select" in the first place at all.
> We should have kept it as a conditional branch, then done branch
> following, and done the whole "turn a branch-over into a select" at a
> later stage.
>
> Have we done that if_convert_phi() simplification too early?

Absolutely. I've often observed that it inhibits other simplifications
related to conditionals. And even when it doesn't, I think that many
select simplifications would already be handled at the SETxx + CBR
level (like your previous SEL(x,x,0)).

So, yes, the if-conversion is a very valuable simplification, but I
also think it's done too early. And there are some other
simplifications that are valuable only if not done too early (sorry,
I've no concrete examples at hand).

It's slowly becoming time to organize the optimizations into several
phases.

-- Luc
On Sat, Nov 7, 2020 at 12:09 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Yes, once the SEL(SEL(x,C,0),C,0) is handled this all simplify to
>
> 	.L0:
> 		<entry-point>
> 		select.32   %r6 <- %arg1, $0xffffffff, $0
> 		cbr         %arg1, .L6, .L4
> 	.L4:
> 		call        spin_lock
> 		# call      %r4 <- errno, %r1
> 		call        spin_unlock
> 		br          .L6
> 	.L6:
> 		ret.32      %r6
>
> but to me, this seems a very very special case, so not much interesting.

Oh, OK, so just fixing the missing simplification does actually fix it.

And I'm not sure how special-case this is. Because I think we do
actually generally do the if-conversion "later", even if it's not a
separate phase, simply because we end up doing it at the "OP_PHI"
instruction, which should always end up linearizing after the branches
that should have been simplified.

So sometimes the ordering can be simply due to the order in which we
see and react to instructions.

> > Have we done that if_convert_phi() simplification too early?
>
> Absolutely. I've often observed that it inhibits other simplifications
> related to conditionals. And, even when it doesn't, I think that many
> select simplifications would already be handled at SETxx + CBR level
> (like your previous SEL(x,x,0)).

Yeah, the problem with doing multiple passes, though, is that you
inevitably end up wanting to repeat them, because a later phase will
cause you to be able to do earlier phases again.

So there's never any one "correct" order to do things in, and I think
you always end up with those "oh, unlucky, we did that one
simplification that then made another one harder" cases regardless of
what you do.

Although we do already have "passes" with some things clearly done
before others (like turning symbols into pseudos before doing any
other optimizations). But those tend to be very clear "convert the
tree into the proper format for the next stage" passes, not this kind
of "one optimization then causes/hinders others".

Linus
On Sat, Nov 07, 2020 at 12:33:52PM -0800, Linus Torvalds wrote:
> On Sat, Nov 7, 2020 at 12:09 PM Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > Yes, once the SEL(SEL(x,C,0),C,0) is handled this all simplify to
> >	.L0:
> >		<entry-point>
> >		select.32   %r6 <- %arg1, $0xffffffff, $0
> >		cbr         %arg1, .L6, .L4
> >
> >	.L4:
> >		call        spin_lock
> >		# call      %r4 <- errno, %r1
> >		call        spin_unlock
> >		br          .L6
> >
> >	.L6:
> >		ret.32      %r6
> >
> > but to me, this seems a very very special case, so not much interesting.
>
> Oh, ok, so just fixing the missing simplification does actually fix it.

This case, yes. And I suppose it will solve the problem Song had.
But, for example, none of the select simplifications I posted yesterday
(not the 'move-everything-up', of course) have any effect on v5.10-rc1.
I mean, I'm sure they have some effect on the generated code, but not
on the context imbalance warnings (or any other warning).

> And I'm not sure how special-case this is. Because I think we _do_
> actually generally doing the if-conversion "later", even if it's not a
> separate phase, simply because we end up doing it at the "OP_PHI"
> instruction, which should always end up linearizing after the branches
> that should have been simplified.

Yes, but the branch simplifications depend on the simplifications that
can be made on the condition. For the moment the focus is on select
because it often appears when the condition is simple, but for these
context imbalances the general case is something like:

	stuff
	if (arbitrary condition)
		take lock
	do other stuff
	if (same arbitrary condition)
		release lock

and the problem is (at least) twofold (the two parts being partially
related):
1) recognizing that the second condition is the same as the first one
2) avoiding that partial optimizations of the first condition 'leak'
   into the common part, because this often inhibits doing 1)
[Sorry if this is not very clear. I need to find some small examples
to illustrate this.]
When the condition is very simple, it is converted into a select, but
very often it's more complex, involving a memory access and/or some
function calls or some asm.

> So sometimes the ordering can be simply due to the order we see and
> react to instructions.
>
> > > Have we done that if_convert_phi() simplification too early?
> >
> > Absolutely. I've often observed that it inhibits other simplifications
> > related to conditionals. And, even when it doesn't, I think that many
> > select simplifications would already be handled at SETxx + CBR level
> > (like your previous SEL(x,x,0)).
>
> Yeah, the problem with doing multiple passes, though, is that you
> inevitably end up wanting to repeat them, because a later phase will
> cause you to be able to then do earlier phases again.
>
> So there's never any one "correct" order to do things in, and I think
> you always end up with those "oh, unlucky, we did that one
> simplification that then made another one harder" regardless of what
> you do.

Yes, sure, I think that's inevitable in general. But I think that for
the if-conversion here it wouldn't be much of a problem because, in
itself, it doesn't create other 'functional'/value-related
simplifications. It may only create an empty BB that can then be
simplified away with some branch simplifications. Maybe I'm just an
optimist. But sure, it's hard to predict the effect exactly. I should
experiment a bit to see the impact on quality, complexity and time.

-- Luc
On Sat, Nov 7, 2020 at 1:23 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> Yes, but the branch simplifications depend on the simplifications
> that can be made on the condition. For the moment there is a focus
> on select because it often happens when the condition is simple but
> for these context imbalance the general case is something like:
>	stuff
>	if (arbitrary condition)
>		take lock
>	do other stuff
>	if (same arbitrary condition)
>		release lock
>
> and the problem is (at least) twofold (but partially related):
> 1) recognize that the second condition is the same as the first one
> 2) avoid that partial optimizations of the first condition 'leak'
>    into the common part because this often inhibits doing 1)
> [Sorry, if this is not very clear. I need to find some small
>  examples to illustrate this].
>
> When the condition is very simple, it is converted into a select
> but very often it's more complex, involves a memory access and/or
> some function calls or some asm.

Note that if the condition is complex and involves memory etc, it's
generally actually a bug in the kernel.

Because if it could possibly change between the lock taking and
releasing, you are now broken.

So I wouldn't want to accept code like that _anyway_. Any complex
conditional needs to be written as

	lock = complex_conditional;
	if (lock)
		take lock
	do other stuff
	if (lock)
		release lock

and if sparse handles _that_ case well, then all is ok.

If sparse complains about the fragile "repeat a complex conditional
that couldn't be simplified to show that it was obviously identical"
pattern, then that's actually a _good_ thing.

Because we want to complain about stuff where it's not obviously clear
that the conditional is 100% the same.

Now, even in the above, if we'd want sparse to say that it is ok, we'd
actually need to make the context tracking itself able to deal with
"conditional on this pseudo", which it doesn't do right now.
So right now it requires the much simpler case where you actually end
up not ever having that shared case in the middle at all, and the
simplification really ends up showing that the real path is really

	error = complex_conditional;
	if (error)
		return error;
	take lock
	do other stuff
	release lock

which apparently sparse now - with your simplification - can see is
what the code actually ends up doing.

Which is even better.

I'd much rather have the lock context tracking show that locking is
simple, and that we don't have any code that is sometimes called
locked, and sometimes not.

Because even if you can prove - with some kind of conditional context
tracking - that the context at least _nests_ correctly (ie you always
properly unlock something you locked), that isn't actually the big
problem. The big problem really is "why did that 'do-other-stuff'
sometimes run without locking, and sometimes with locking?"

So the lock context tracking is fairly inflexible on purpose. It's
_meant_ to not just do "unlock matches a lock" matching. It's meant to
also track "ok, there is no odd 'sometimes locked, sometimes not'
code".

		Linus
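The shape Linus recommends can be mocked up outside the kernel (a toy lock and hypothetical names, purely for illustration): the conditional is evaluated exactly once into a local, and both the lock and the unlock test that same local, so nothing can change between the two tests.

```c
#include <assert.h>

static int lock_held;	/* toy stand-in for a spinlock's state */
static void take_lock(void)    { lock_held = 1; }
static void release_lock(void) { lock_held = 0; }

/* Hypothetical arbitrary condition; in real code it could involve
 * memory accesses, function calls or asm. */
static int complex_conditional(int arg)
{
	return arg > 0;
}

/* The condition is evaluated once; both tests use the same local, so
 * lock and unlock are trivially matched. */
static int do_stuff(int arg)
{
	int lock = complex_conditional(arg);

	if (lock)
		take_lock();
	/* do other stuff */
	if (lock)
		release_lock();
	return lock_held;	/* 0 on every balanced exit */
}
```

Whether the condition holds or not, the function always exits with the lock released, which is the invariant the context tracking wants to verify.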
On Sat, Nov 07, 2020 at 02:20:03PM -0800, Linus Torvalds wrote:
> On Sat, Nov 7, 2020 at 1:23 PM Luc Van Oostenryck
> <luc.vanoostenryck@gmail.com> wrote:
> >
> > Yes, but the branch simplifications depend on the simplifications
> > that can be made on the condition. For the moment there is a focus
> > on select because it often happens when the condition is simple but
> > for these context imbalance the general case is something like:
> >	stuff
> >	if (arbitrary condition)
> >		take lock
> >	do other stuff
> >	if (same arbitrary condition)
> >		release lock
> >
> > and the problem is (at least) twofold (but partially related):
> > 1) recognize that the second condition is the same as the first one
> > 2) avoid that partial optimizations of the first condition 'leak'
> >    into the common part because this often inhibits doing 1)
> > [Sorry, if this is not very clear. I need to find some small
> >  examples to illustrate this].
> >
> > When the condition is very simple, it is converted into a select
> > but very often it's more complex, involves a memory access and/or
> > some function calls or some asm.
>
> Note that if the condition is complex and involves memory etc, it's
> generally actually a bug in the kernel.
>
> Because if it could possibly change between the lock taking and
> releasing, you are now broken.
>
> So I wouldn't want to accept code like that _anyway_. Any complex
> conditional needs to be written as
>
>	lock = complex_conditional;
>	if (lock)
>		take lock
>	do other stuff
>	if (lock)
>		release lock
>
> and if sparse handles _that_ case well, then all is ok.
>
> If sparse complains about the fragile "repeat complex conditional that
> couldn't be simplified to show that it was obviously identical", then
> that's actually a _good_ thing.
>
> Because we want to complain about stuff where it's not obviously clear
> that the conditional is 100% the same.
>
> Now, even in the above, if we'd want sparse to say that is ok, then
> we'd actually need to make the context tracking itself able to deal
> with "conditional on this pseudo", which it doesn't do right now.
>
> So right now it requires that much simpler case where you actually end
> up not ever having that shared case in the middle at all, and the
> simplification really ends up showing that the real path is really
>
>	error = complex_conditional;
>	if (error)
>		return error;
>	take lock
>	do other stuff
>	release lock
>
> which apparently sparse now - with your simplification - can see that
> the code actually ends up doing.
>
> Which is even better.
>
> I'd much rather have the lock context tracking show that locking is
> simple, and that we don't have any code that sometimes is called
> locked, and sometimes is not.
>
> Because even if you can prove - with some kind of conditional context
> tracking - that the context at least _nests_ correctly (ie you always
> properly unlock something you locked), that isn't actually the big
> problem. The big problem really is "why did that 'do-other-stuff'
> sometimes run without locking, and sometimes with locking?"

But isn't a lot of code written to explicitly support this, with an
argument or some other condition selecting between the locked mode and
the unlocked mode?

> So the lock context tracking is fairly inflexible on purpose. It's
> _meant_ to not just do "unlock matches a lock" matching. It's meant to
> also track "ok, there is no odd 'sometimes locked, sometimes not'
> code".

I agree with all this, but I think the situation is not so clear-cut.
I've made a few examples, oversimplified and totally made-up but
representative of the situations the context tracking encounters.
	void take_lock(void) __attribute__((context(x, 0, 1)));
	void drop_lock(void) __attribute__((context(x, 1, 0)));
	void minor(void);
	void major(void);
	int complex_cond(void);

This one is fine for sparse:

	static inline int cond_lock1(void)
	{
		if (complex_cond()) {
			take_lock();
			minor();
			return 1;
		}
		return 0;
	}

	static void ok1(void)
	{
		int cond = cond_lock1();
		if (cond) {
			major();
			drop_lock();
		}
	}

It produces the following:

	ok1:
		call.32     %r1 <- complex_cond
		cbr         %r1, .L1, .L5

	.L1:
		call        take_lock
		context     1
		call        minor
		# call      %r3 <- cond_lock1
		call        major
		call        drop_lock
		context     -1
		br          .L5

	.L5:
		ret

Good.

The next one corresponds to the situation Song Liu reported:

	static inline int cond_lock2(void)
	{
		if (!complex_cond())
			return -1;

		take_lock();
		minor();
		return 0;
	}

	static int okish(void)
	{
		int cond = cond_lock2();
		if (cond)
			return 0;
		major();
		drop_lock();
		return 1;
	}

Sparse currently produces the following:

	okish:
		call.32     %r6 <- complex_cond
		select.32   %r8 <- %r6, $0, $0xffffffff
		cbr         %r6, .L8, .L9

	.L8:
		call        take_lock
		context     1
		call        minor
		br          .L9

	.L9:
		# call      %r8 <- cond_lock2
		seteq.32    %r11 <- %r8, $0
		cbr         %r6, .L11, .L12

	.L11:
		call        major
		call        drop_lock
		context     -1
		br          .L12

	.L12:
		ret.32      %r11

Sparse complains about a context imbalance because of .L8/.L9. The
seteq is not merged with the select, but something can be done about
it, or about the conditional branch that depends on it. I'll look at
it right away. [But from what I remember when I analyzed some of
these, it's in similar situations that things get messy, when
cond_lock() is slightly more complex and partial simplifications begin
to mix things from the surrounding blocks.]

The one that really annoys me is this one. It's very simple and, as
far as I know, quite common in the kernel, but kinda hopeless.
	static void ko(int cond)
	{
		if (cond)
			take_lock();
		major();
		if (cond)
			drop_lock();
	}

It produces the following:

	ko:
		cbr         %arg1, .L14, .L15

	.L14:
		call        take_lock
		context     1
		br          .L15

	.L15:
		call        major
		cbr         %arg1, .L16, .L17

	.L16:
		call        drop_lock
		context     -1
		br          .L17

	.L17:
		ret

There is no select, no seteq/setne, no complex condition, nothing.
It's an idealization, but I think that a lot of code involving
conditional locks boils down to this and, as I said, it is quite
common. What do we want to do about this? I don't think there is
anything that can be done except warning, or duplicating .L15, making
it as if the code had been written as:

	static void ko(int cond)
	{
		if (cond) {
			take_lock();
			major();
			drop_lock();
		} else {
			major();
		}
	}

It's doable, of course, but ... well, I don't think we want to do this.

-- Luc
On Sat, Nov 7, 2020 at 4:42 PM Luc Van Oostenryck
<luc.vanoostenryck@gmail.com> wrote:
>
> But isn't a lot of code written to explicitly support this, with an
> argument or some other condition selecting between the locked mode
> or the unlocked mode?

A lot? No. I've actively discouraged it.

It does exist - don't get me wrong - but it shouldn't be the normal
pattern.

> This one is fine for Sparse
>	static inline int cond_lock1(void)
>	{
>		if (complex_cond()) {
>			take_lock();
>			minor();
>			return 1;
>		}
>		return 0;
>	}
>	static void ok1(void)
>	{
>		int cond = cond_lock1();
>		if (cond) {
>			major();
>			drop_lock();
>		}
>	}

That _is_ a somewhat common case, and it's basically the "helper tells
the caller that it took the lock" pattern.

The _really_ basic version of that is the various "trylock()" things,
which have that "__cond_lock()" macro in the kernel, so that sparse can
catch that very special case.

That said, it's still not the _common_ case, which is just
straightforward code:

	take_lock();
	do something under the lock
	drop_lock();

> The next one corresponds to the situation Song Liu reported
>	static inline int cond_lock2(void)
>	{
>		if (!complex_cond())
>			return -1;
>
>		take_lock();
>		minor();
>		return 0;
>	}
>	static int okish(void)
>	{
>		int cond = cond_lock2();
>		if (cond)
>			return 0;
>		major();
>		drop_lock();
>		return 1;
>	}

Yeah, this is a more complex version of the same. Clearly sparse
doesn't grok it today, but the fact that your patches make sparse able
to track it is a good improvement.

> The one that really annoys me is this one. It's very simple and, as
> far as I know, quite common in the kernel but kinda hopeless.
>	static void ko(int cond)
>	{
>		if (cond)
>			take_lock();
>		major();
>		if (cond)
>			drop_lock();
>	}

This really is invalid code. "major()" is done in two different lock
contexts. Sparse _should_ complain.

		Linus
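For reference, the trylock special case mentioned above is handled in the kernel by a `__cond_lock()` macro along the following lines. This is a sketch with a toy counter standing in for sparse's `__acquire()` context bookkeeping (the real macro only expands this way under `__CHECKER__`, and it relies on the GNU C statement-expression extension, as here):

```c
#include <assert.h>

/* Toy stand-in for sparse's context counter: bumped on "acquire". */
static int ctx;
#define __acquire(x)	(ctx++)

/* Sketch of the kernel's __cond_lock(): the acquire annotation fires
 * only on the success path of a trylock-style conditional. */
#define __cond_lock(x, c)	((c) ? ({ __acquire(x); 1; }) : 0)

/* Hypothetical trylock wrapper for illustration. */
static int try_take(int available)
{
	return __cond_lock(some_lock, available);
}
```

The point is that the acquire is tied to the truth value of the condition, so sparse can see that a successful trylock leaves the context incremented while a failed one leaves it untouched.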
diff --git i/kernel/bpf/hashtab.c w/kernel/bpf/hashtab.c
index 23f73d4649c9c..faad2061f1167 100644
--- i/kernel/bpf/hashtab.c
+++ w/kernel/bpf/hashtab.c
@@ -1279,6 +1279,7 @@ static int htab_map_delete_elem(struct bpf_map *map, void *key)
 	ret = htab_lock_bucket(htab, b, hash, &flags);
 	if (ret)
 		return ret;
+	ret = 0;
 
 	l = lookup_elem_raw(head, hash, key, key_size);
 
@@ -1314,6 +1315,7 @@ static int htab_lru_map_delete_elem(struct bpf_map *map, void *key)
 	ret = htab_lock_bucket(htab, b, hash, &flags);
 	if (ret)
 		return ret;
+	ret = 0;
 
 	l = lookup_elem_raw(head, hash, key, key_size);
 
@@ -1476,6 +1478,7 @@ __htab_map_lookup_and_delete_batch(struct bpf_map *map,
 		ret = htab_lock_bucket(htab, b, batch, &flags);
 		if (ret)
 			goto next_batch;
+		ret = 0;
 	}
 
 	bucket_cnt = 0;
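Why the extra `ret = 0;` in this patch helps can be sketched in isolation (a toy mock of the hashtab pattern, with hypothetical names, not the kernel code itself): after `if (ret) return ret;` the fall-through value of `ret` is provably zero, but sparse does not track that value information, so spelling it out explicitly lets later uses of `ret` be folded and the lock/unlock paths match up.

```c
#include <assert.h>

static int lock_held;	/* toy lock state */

/* Toy stand-in for htab_lock_bucket(): fails with -EBUSY (-16) on demand. */
static int lock_bucket(int fail)
{
	if (fail)
		return -16;
	lock_held = 1;
	return 0;
}

static void unlock_bucket(void)
{
	lock_held = 0;
}

static int delete_elem(int fail)
{
	int ret = lock_bucket(fail);

	if (ret)
		return ret;
	ret = 0;	/* redundant for a compiler, but makes the
			 * known-zero value explicit for sparse */

	/* ... lookup and delete under the lock ... */

	unlock_bucket();
	return ret;
}
```

Behaviorally the assignment is a no-op: the function still returns the lock error on failure and 0 on success, with the lock released on every exit path.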