Message ID | 20201118204007.269943012@linutronix.de |
---|---|
State | New, archived |
Series | mm/highmem: Preemptible variant of kmap_atomic & friends |
On Wed, Nov 18, 2020 at 08:48:42PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
>
> Now that the scheduler can deal with migrate disable properly, there is no
> real compelling reason to make it only available for RT.
>
> There are quite some code paths which needlessly disable preemption in
> order to prevent migration and some constructs like kmap_atomic() enforce
> it implicitly.
>
> Making it available independent of RT allows to provide a preemptible
> variant of kmap_atomic() and makes the code more consistent in general.
>
> FIXME: Rework the comment in preempt.h - Peter?

I didn't keep up to date and there is clearly a dependency on patches in
tip for migrate_enable/migrate_disable. It's not 100% clear to me what
reworking you're asking for but then again, I'm not Peter!

From tip:

/**
 * migrate_disable - Prevent migration of the current task
 *
 * Maps to preempt_disable() which also disables preemption. Use
 * migrate_disable() to annotate that the intent is to prevent migration,
 * but not necessarily preemption.
 *
 * Can be invoked nested like preempt_disable() and needs the corresponding
 * number of migrate_enable() invocations.
 */

I assume that the rework is to document the distinction between
migrate_disable() and preempt_disable(), because it may not be clear to
some people why one should be used over the other, and because of the
risk of cut&paste cargo-cult programming. So I assume the rework is for
the middle paragraph:

 * Maps to preempt_disable() which also disables preemption. Use
 * migrate_disable() to annotate that the intent is to prevent migration,
 * but not necessarily preemption. The distinction is that preemption
 * disabling will protect a per-cpu structure from concurrent
 * modifications due to preemption. migrate_disable() partially protects
 * the task's address space and potentially preserves the TLB entries
 * even if preempted, such as is needed for a local IO mapping or a
 * kmap_atomic() referenced by on-stack pointers, to avoid interference
 * between user threads or kernel threads sharing the same address space.

I know it can have other examples that are RT-specific, and some tricks on
percpu page alloc draining rely on a combination of migrate_disable() and
interrupt disabling to protect the structures, but the above example might
be understandable to a non-RT audience.
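The distinction Mel spells out above can be illustrated with a short
sketch. The names (my_counter, local_map_page(), local_unmap()) are
invented for illustration and are not part of the series; the point is
only which hazard each annotation guards against.

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/string.h>

static DEFINE_PER_CPU(unsigned long, my_counter);

static void counter_update(void)
{
	/*
	 * preempt_disable(): another task preempting us on this CPU could
	 * otherwise corrupt the per-cpu variable.
	 */
	preempt_disable();
	__this_cpu_inc(my_counter);
	preempt_enable();
}

static void buffer_copy(struct page *page, void *buf, size_t len)
{
	void *vaddr;

	/*
	 * migrate_disable(): the on-stack pointer refers to a mapping (and
	 * TLB entry) that is only guaranteed on this CPU. Being preempted
	 * is fine as long as the task resumes on the same CPU.
	 */
	migrate_disable();
	vaddr = local_map_page(page);		/* hypothetical helper */
	memcpy(buf, vaddr, len);
	local_unmap(vaddr);			/* hypothetical helper */
	migrate_enable();
}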
On Thu, Nov 19, 2020 at 09:38:34AM +0000, Mel Gorman wrote:
> On Wed, Nov 18, 2020 at 08:48:42PM +0100, Thomas Gleixner wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> >
> > Now that the scheduler can deal with migrate disable properly, there is no
> > real compelling reason to make it only available for RT.
> >
> > There are quite some code paths which needlessly disable preemption in
> > order to prevent migration and some constructs like kmap_atomic() enforce
> > it implicitly.
> >
> > Making it available independent of RT allows to provide a preemptible
> > variant of kmap_atomic() and makes the code more consistent in general.
> >
> > FIXME: Rework the comment in preempt.h - Peter?
>
> I didn't keep up to date and there is clearly a dependency on patches in
> tip for migrate_enable/migrate_disable. It's not 100% clear to me what
> reworking you're asking for but then again, I'm not Peter!

He's talking about the big one: "Migrate-Disable and why it is
undesired."

I still hate all of this, and I really fear that with migrate_disable()
available, people will be lazy and usage will increase :/

Case at hand is this series, the only reason we need it here is because
per-cpu page-tables are expensive...

I really do think we want to limit the usage and get rid of the implicit
migrate_disable() in spinlock_t/rwlock_t for example.

AFAICT the scenario described there is entirely possible; and it has to
show up for workloads that rely on multi-cpu bandwidth for correctness.

Switching from preempt_disable() to migrate_disable() hides the
immediate / easily visible high priority latency, but you move the
interference term into a place where it is much harder to detect, you
don't lose the term, it stays in the system.

So no, I don't want to make the comment less scary. Usage is
discouraged.
On Thu, Nov 19, 2020 at 12:14:11PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 09:38:34AM +0000, Mel Gorman wrote:
> > On Wed, Nov 18, 2020 at 08:48:42PM +0100, Thomas Gleixner wrote:
> > > From: Thomas Gleixner <tglx@linutronix.de>
> > >
> > > Now that the scheduler can deal with migrate disable properly, there is no
> > > real compelling reason to make it only available for RT.
> > >
> > > There are quite some code paths which needlessly disable preemption in
> > > order to prevent migration and some constructs like kmap_atomic() enforce
> > > it implicitly.
> > >
> > > Making it available independent of RT allows to provide a preemptible
> > > variant of kmap_atomic() and makes the code more consistent in general.
> > >
> > > FIXME: Rework the comment in preempt.h - Peter?
> >
> > I didn't keep up to date and there is clearly a dependency on patches in
> > tip for migrate_enable/migrate_disable. It's not 100% clear to me what
> > reworking you're asking for but then again, I'm not Peter!
>
> He's talking about the big one: "Migrate-Disable and why it is
> undesired."

Ah yes, that makes more sense. I was thinking in terms of what is
protected, but the PREEMPT_RT hazard is severe.

> I still hate all of this, and I really fear that with migrate_disable()
> available, people will be lazy and usage will increase :/
>
> Case at hand is this series, the only reason we need it here is because
> per-cpu page-tables are expensive...

I guessed, it was the only thing that made sense.

> I really do think we want to limit the usage and get rid of the implicit
> migrate_disable() in spinlock_t/rwlock_t for example.
>
> AFAICT the scenario described there is entirely possible; and it has to
> show up for workloads that rely on multi-cpu bandwidth for correctness.
>
> Switching from preempt_disable() to migrate_disable() hides the
> immediate / easily visible high priority latency, but you move the
> interference term into a place where it is much harder to detect, you
> don't lose the term, it stays in the system.
>
> So no, I don't want to make the comment less scary. Usage is
> discouraged.

More scary, then, by adding this to the kerneldoc section for
migrate_disable?

 * Usage of migrate_disable is heavily discouraged as it is extremely
 * hazardous on PREEMPT_RT kernels and any usage needs to be heavily
 * justified. Before even thinking about using this, read
 * "Migrate-Disable and why it is undesired" in
 * include/linux/preempt.h and include both a comment and document
 * in the changelog why the use case is an exception.

It's not necessary for the current series because the interface hides it,
and anyone poking at the internals of kmap_atomic probably should be aware
of the address space and TLB hazards associated with it. There are few
in-tree users and presumably any future preempt-rt related merges already
know why migrate_disable is required. However, with the kerneldoc, there
is no excuse for missing it for new users that are not PREEMPT_RT-aware.
It makes it easier to NAK/revert a patch without proper justification,
similar to how undocumented usages of memory barriers tend to get a poor
reception.
On Thu, 19 Nov 2020 12:14:13 +0000
Mel Gorman <mgorman@suse.de> wrote:

> * Usage of migrate_disable is heavily discouraged as it is extremely
> * hazardous on PREEMPT_RT kernels and any usage needs to be heavily

I don't believe it's just PREEMPT_RT. It's RT tasks that are concerned,
especially when you are dealing with SCHED_DEADLINE. PREEMPT_RT just
allows better determinism for RT tasks, but the issue with
migrate_disable is not limited to just PREEMPT_RT.

-- Steve

> * justified. Before even thinking about using this, read
> * "Migrate-Disable and why it is undesired" in
> * include/linux/preempt.h and include both a comment and document
> * in the changelog why the use case is an exception.
On Thu, Nov 19, 2020 at 3:14 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> I still hate all of this, and I really fear that with migrate_disable()
> available, people will be lazy and usage will increase :/
>
> Case at hand is this series, the only reason we need it here is because
> per-cpu page-tables are expensive...

No, I think you as a scheduler person just need to accept it.

Because this is certainly not the only time migration limiting has come
up, and no, it has absolutely nothing to do with per-cpu page tables
being completely unacceptable.

The scheduler people need to get used to this. Really. Because ASMP is
just going to be a fact.

There are few things more futile than railing against reality, Peter.

Honestly, the only argument I've ever heard against limiting migration
is the whole "our scheduling theory doesn't cover it".

So either throw the broken theory away, or live with it. Theory that
doesn't match reality isn't theory, it's religion.

Linus
On Thu, Nov 19, 2020 at 09:23:47AM -0800, Linus Torvalds wrote:
> On Thu, Nov 19, 2020 at 3:14 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > I still hate all of this, and I really fear that with migrate_disable()
> > available, people will be lazy and usage will increase :/
> >
> > Case at hand is this series, the only reason we need it here is because
> > per-cpu page-tables are expensive...
>
> No, I think you as a scheduler person just need to accept it.

Well, I did write the patches.

> Because this is certainly not the only time migration limiting has
> come up, and no, it has absolutely nothing to do with per-cpu page
> tables being completely unacceptable.

It is for this instance; but sure, it's come up before in other
contexts.

> The scheduler people need to get used to this. Really. Because ASMP is
> just going to be a fact.

ASMP is different in that it is a hardware constraint, you're just not
going to be able to run more of X than there's X capable hardware units
on (be it FPUs, Vector units, 32bit or whatever).

> There are few things more futile than railing against reality, Peter.

But, but, my windmills! :-)

> Honestly, the only argument I've ever heard against limiting migration
> is the whole "our scheduling theory doesn't cover it".
>
> So either throw the broken theory away, or live with it. Theory that
> doesn't match reality isn't theory, it's religion.

The first stage of throwing it out is understanding the problem, which is
just about where we're at. Next is creating a new formalism (if possible)
that covers this new issue. That might take a while.

Thing is though, without a formalism to reason about timeliness
guarantees, there is no Real-Time.

So sure, I've written the patches; that doesn't mean I have to like the
place we're in due to it.
On Thu, Nov 19 2020 at 19:28, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 09:23:47AM -0800, Linus Torvalds wrote:
>> Because this is certainly not the only time migration limiting has
>> come up, and no, it has absolutely nothing to do with per-cpu page
>> tables being completely unacceptable.
>
> It is for this instance; but sure, it's come up before in other
> contexts.

Indeed. And one of the really bad outcomes of this is that people are
forced to use preempt_disable() to prevent migration, which entails a
slew of consequences:

  - Using spinlocks where they wouldn't be needed otherwise
  - Spinwaiting instead of sleeping
  - The whole craziness of doing copy_to/from_user_inatomic() along
    with the necessary out of line error handling
  - ....

The introduction of per-cpu storage happened almost 20 years ago (2002)
and still the only answer we have is preempt_disable().

I know the scheduling theory folks still try to wrap their heads around
the introduction of SMP, which dates back to 1962 IIRC...

>> The scheduler people need to get used to this. Really. Because ASMP is
>> just going to be a fact.
>
> ASMP is different in that it is a hardware constraint, you're just not
> going to be able to run more of X than there's X capable hardware units
> on (be it FPUs, Vector units, 32bit or whatever).

ASMP is as old as SMP. The first SMP systems were in fact ASMP. The
reasons for ASMP 60 years ago were not that different from the reasons
for ASMP today. Just the scale and the effectiveness are different.

>> There are few things more futile than railing against reality, Peter.
>
> But, but, my windmills! :-)

At least you have windmills where you live, so you can pull off the real
Don Quixote while other people have to find substitutes :)

Thanks,

	tglx
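The third bullet in the list above is the pattern the preemptible kmap
variant removes. A minimal sketch, assuming the kmap_local_page() /
kunmap_local() names this series introduces; the function names and the
shape of the error handling are illustrative, not taken from an in-tree
user.

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/uaccess.h>

/*
 * With kmap_atomic() the section is atomic, so a faulting user copy must
 * use the _inatomic variant and fall back to an out-of-line slow path:
 */
static int copy_page_to_user_atomic(void __user *uaddr, struct page *page,
				    size_t len)
{
	void *kaddr = kmap_atomic(page);
	unsigned long left;

	left = __copy_to_user_inatomic(uaddr, kaddr, len);
	kunmap_atomic(kaddr);

	if (left) {
		/* Slow path: map again in a sleepable context and retry. */
		kaddr = kmap(page);
		left = copy_to_user(uaddr, kaddr, len);
		kunmap(page);
	}
	return left ? -EFAULT : 0;
}

/*
 * With the preemptible kmap_local() variant the copy may simply sleep on
 * a fault, and the slow path disappears:
 */
static int copy_page_to_user_local(void __user *uaddr, struct page *page,
				   size_t len)
{
	void *kaddr = kmap_local_page(page);
	unsigned long left;

	left = copy_to_user(uaddr, kaddr, len);
	kunmap_local(kaddr);

	return left ? -EFAULT : 0;
}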
On Fri, Nov 20, 2020 at 02:33:58AM +0100, Thomas Gleixner wrote:
> On Thu, Nov 19 2020 at 19:28, Peter Zijlstra wrote:
> > On Thu, Nov 19, 2020 at 09:23:47AM -0800, Linus Torvalds wrote:
> >> Because this is certainly not the only time migration limiting has
> >> come up, and no, it has absolutely nothing to do with per-cpu page
> >> tables being completely unacceptable.
> >
> > It is for this instance; but sure, it's come up before in other
> > contexts.
>
> Indeed. And one of the really bad outcomes of this is that people are
> forced to use preempt_disable() to prevent migration, which entails a
> slew of consequences:
>
>   - Using spinlocks where they wouldn't be needed otherwise
>   - Spinwaiting instead of sleeping
>   - The whole craziness of doing copy_to/from_user_inatomic() along
>     with the necessary out of line error handling
>   - ....
>
> The introduction of per-cpu storage happened almost 20 years ago (2002)
> and still the only answer we have is preempt_disable().

IIRC the first time this migrate_disable() stuff came up was when Chris
Lameter did SLUB. Eventually he settled for that cmpxchg_double()
approach (which is somewhat similar to userspace rseq), which is vastly
superior and wouldn't have happened had we provided migrate_disable().

As already stated, per-cpu page-tables would allow for a much saner kmap
approach, but alas, x86 really can't sanely do that (the archs that have
separate kernel and user page-tables could do this, and how we cursed
that x86 didn't have that when Meltdown happened).

[ and using fixmaps in the per-cpu memory space _could_ work, but is a
  giant pain because then all accesses need a GS prefix and blah... ]

And I'm sure there's creative ways for other problems too, but yes, it's
hard.

Anyway, clearly I'm the only one that cares, so I'll just crawl back
under my rock...
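For reference, a heavily simplified sketch of the cmpxchg_double-style
fastpath Peter refers to. All names are invented; real SLUB additionally
encodes the CPU number into the transaction id and has a slow path, so
treat this purely as an illustration of how the retry loop avoids both
preempt_disable() and migrate_disable().

#include <linux/percpu.h>

struct my_cpu_cache {
	void		*freelist;	/* first free object */
	unsigned long	tid;		/* transaction id */
} __aligned(2 * sizeof(void *));	/* required for cmpxchg_double */

static DEFINE_PER_CPU(struct my_cpu_cache, my_cache);

static void *my_alloc_fastpath(void)
{
	void *object, *next;
	unsigned long tid;

	do {
		/* Snapshot the per-cpu state; may race with migration. */
		tid = this_cpu_read(my_cache.tid);
		object = this_cpu_read(my_cache.freelist);
		if (!object)
			return NULL;		/* slow path omitted */
		next = *(void **)object;	/* next pointer in the object */

		/*
		 * Publish both new values only if freelist AND tid are
		 * unchanged on the CPU this ends up executing on; any
		 * intervening preemption or migration changes the tid and
		 * forces a retry.
		 */
	} while (!this_cpu_cmpxchg_double(my_cache.freelist, my_cache.tid,
					  object, tid,
					  next, tid + 1));
	return object;
}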
On Fri, Nov 20, 2020 at 1:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> As already stated, per-cpu page-tables would allow for a much saner kmap
> approach, but alas, x86 really can't sanely do that (the archs that have
> separate kernel and user page-tables could do this, and how we cursed
> that x86 didn't have that when Meltdown happened).
>
> [ and using fixmaps in the per-cpu memory space _could_ work, but is a
>   giant pain because then all accesses need a GS prefix and blah... ]
>
> And I'm sure there's creative ways for other problems too, but yes, it's
> hard.
>
> Anyway, clearly I'm the only one that cares, so I'll just crawl back
> under my rock...

I'll poke my head out of the rock for a moment, though...

Several years ago, we discussed (in person at some conference IIRC)
having percpu pagetables to get sane kmaps, percpu memory, etc. The
conclusion was that Linus thought the performance would suck and we
shouldn't do it.

Since then, though, we added really fancy infrastructure for keeping
track of a per-CPU list of recently used mms and efficiently tracking
when they need to be invalidated. We called these "ASIDs". It would be
fairly straightforward to have an entire pgd for each (cpu, asid) pair.
Newly added second-level (p4d/pud/whatever -- have I ever mentioned how
much I dislike the Linux pagetable naming conventions and folding
tricks?) tables could be lazily faulted in, and copies of the full 2kB
mess would only be needed when a new (cpu, asid) is allocated because
either a flush happened while the mm was inactive on the CPU in question
or because the mm fell off the percpu cache.

The total overhead would be a bit more cache usage, 4kB * num cpus * num
ASIDs per CPU (or 8k for PTI), and a few extra page faults (max num cpus
* 256 per mm over the entire lifetime of that mm). The common case of a
CPU switching back and forth between a small number of mms would have no
significant overhead.

On an unrelated note, what happens if you migrate_disable(), sleep for a
looooong time, and someone tries to offline your CPU?
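Plugging example numbers into that formula: with 64 CPUs (an arbitrary
figure, not from the thread) and the 6 ASIDs per CPU that x86 currently
uses (see Thomas's reply below), the pgd copies would occupy
64 * 6 * 4kB = 1.5MB (3MB with PTI's 8kB), and the lazy second-level
faults would add at most 64 * 256 = 16384 extra page faults over the
entire lifetime of an mm.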
On Sun, Nov 22 2020 at 15:16, Andy Lutomirski wrote:
> On Fri, Nov 20, 2020 at 1:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>> Anyway, clearly I'm the only one that cares, so I'll just crawl back
>> under my rock...
>
> I'll poke my head out of the rock for a moment, though...
>
> Several years ago, we discussed (in person at some conference IIRC)
> having percpu pagetables to get sane kmaps, percpu memory, etc.

Yes, I remember. That was our initial reaction in Prague to the looming
PTI challenge 3 years ago.

> The conclusion was that Linus thought the performance would suck and
> we shouldn't do it.

Linus had opinions, but we all agreed that depending on the workload and
the CPU features (think !PCID) the copy/pagefault overhead could be
significant.

> Since then, though, we added really fancy infrastructure for keeping
> track of a per-CPU list of recently used mms and efficiently tracking
> when they need to be invalidated. We called these "ASIDs". It would
> be fairly straightforward to have an entire pgd for each (cpu, asid)
> pair. Newly added second-level (p4d/pud/whatever -- have I ever
> mentioned how much I dislike the Linux pagetable naming conventions
> and folding tricks?) tables could be lazily faulted in, and copies of
> the full 2kB mess would only be needed when a new (cpu, asid) is
> allocated because either a flush happened while the mm was inactive on
> the CPU in question or because the mm fell off the percpu cache.
>
> The total overhead would be a bit more cache usage, 4kB * num cpus *
> num ASIDs per CPU (or 8k for PTI), and a few extra page faults (max
> num cpus * 256 per mm over the entire lifetime of that mm).
> The common case of a CPU switching back and forth between a small
> number of mms would have no significant overhead.

For CPUs which do not support PCID this sucks, which is everything pre
Westmere and all of 32bit. Yes, 32bit. If we go there then 32bit has to
bite the bullet and use the very same mechanism. Not that I care much
TBH.

Even for those CPUs which support it, we'd need to increase the number
of ASIDs significantly. Right now we use only 6 ASIDs, which is not a
lot. There are process-heavy workloads out there which do quite some
context switching, so avoiding the copy matters. I'm not worried about
fork as the copy will probably be just noise.

That said, I'm not saying it shouldn't be done, but there are quite a
few things which need to be looked at. TBH, I really would love to see
that just to make GS kernel usage and the related mess in the ASM code
go away completely.

For the task at hand, i.e. replacing kmap_atomic() by kmap_local(), this
is not really helpful because we'd need to make all highmem-using
architectures do the same thing. But if we can pull it off on x86 the
required changes for the kmap_local() code are not really significant.

> On an unrelated note, what happens if you migrate_disable(), sleep for
> a looooong time, and someone tries to offline your CPU?

The hotplug code will prevent the CPU from going offline in that case,
i.e. it waits until the last task has left its migrate-disabled section.
But you are not supposed to invoke sleep($ETERNAL) in such a context.
Emphasis on 'not supposed' :)

Thanks,

	tglx
On Mon, Nov 23 2020 at 22:15, Thomas Gleixner wrote:
> On Sun, Nov 22 2020 at 15:16, Andy Lutomirski wrote:
>> On Fri, Nov 20, 2020 at 1:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>> The common case of a CPU switching back and forth between a small
>> number of mms would have no significant overhead.
>
> For CPUs which do not support PCID this sucks, which is everything pre
> Westmere and all of 32bit. Yes, 32bit. If we go there then 32bit has to
> bite the bullet and use the very same mechanism. Not that I care much
> TBH.

Bah, I completely forgot that AMD does not support PCID before Zen3
which is a major showstopper.

Thanks,

	tglx
> On Nov 23, 2020, at 1:25 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>
> On Mon, Nov 23 2020 at 22:15, Thomas Gleixner wrote:
>>> On Sun, Nov 22 2020 at 15:16, Andy Lutomirski wrote:
>>>> On Fri, Nov 20, 2020 at 1:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>> The common case of a CPU switching back and forth between a small
>>> number of mms would have no significant overhead.
>>
>> For CPUs which do not support PCID this sucks, which is everything pre
>> Westmere and all of 32bit. Yes, 32bit. If we go there then 32bit has to
>> bite the bullet and use the very same mechanism. Not that I care much
>> TBH.
>
> Bah, I completely forgot that AMD does not support PCID before Zen3
> which is a major showstopper.

Why? Couldn’t we rig up the code so we still track all the ASIDs even if
there is no CPU support? We would take the TLB flush hit on every context
switch, but we pay that cost anyway. We would avoid the extra copy in the
same cases in which we would avoid it if we had PCID.

> Thanks,
>
> tglx
On Mon, Nov 23 2020 at 14:07, Andy Lutomirski wrote:
>> On Nov 23, 2020, at 1:25 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Mon, Nov 23 2020 at 22:15, Thomas Gleixner wrote:
>>>> On Sun, Nov 22 2020 at 15:16, Andy Lutomirski wrote:
>>>>
>>>> The common case of a CPU switching back and forth between a small
>>>> number of mms would have no significant overhead.
>>>
>>> For CPUs which do not support PCID this sucks, which is everything pre
>>> Westmere and all of 32bit. Yes, 32bit. If we go there then 32bit has to
>>> bite the bullet and use the very same mechanism. Not that I care much
>>> TBH.
>>
>> Bah, I completely forgot that AMD does not support PCID before Zen3
>> which is a major showstopper.
>
> Why? Couldn’t we rig up the code so we still track all the ASIDs even
> if there is no CPU support? We would take the TLB flush hit on every
> context switch, but we pay that cost anyway. We would avoid the extra
> copy in the same cases in which we would avoid it if we had PCID.

Did not think about that indeed. Yes, that should do the trick and
should not be worse than what we have now.

Sometimes one just can't see the forest for the trees :)

Thanks,

	tglx
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -204,6 +204,7 @@ extern int _cond_resched(void);
 extern void ___might_sleep(const char *file, int line, int preempt_offset);
 extern void __might_sleep(const char *file, int line, int preempt_offset);
 extern void __cant_sleep(const char *file, int line, int preempt_offset);
+extern void __cant_migrate(const char *file, int line);
 
 /**
  * might_sleep - annotation for functions that can sleep
@@ -227,6 +228,18 @@ extern void __cant_sleep(const char *fil
 # define cant_sleep() \
 	do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
 # define sched_annotate_sleep()	(current->task_state_change = 0)
+
+/**
+ * cant_migrate - annotation for functions that cannot migrate
+ *
+ * Will print a stack trace if executed in code which is migratable
+ */
+# define cant_migrate()							\
+	do {								\
+		if (IS_ENABLED(CONFIG_SMP))				\
+			__cant_migrate(__FILE__, __LINE__);		\
+	} while (0)
+
 /**
  * non_block_start - annotate the start of section where sleeping is prohibited
  *
@@ -251,6 +264,7 @@ extern void __cant_sleep(const char *fil
 				   int preempt_offset) { }
 # define might_sleep() do { might_resched(); } while (0)
 # define cant_sleep() do { } while (0)
+# define cant_migrate()		do { } while (0)
 # define sched_annotate_sleep() do { } while (0)
 # define non_block_start() do { } while (0)
 # define non_block_end() do { } while (0)
@@ -258,13 +272,6 @@ extern void __cant_sleep(const char *fil
 
 #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
 
-#ifndef CONFIG_PREEMPT_RT
-# define cant_migrate()		cant_sleep()
-#else
-  /* Placeholder for now */
-# define cant_migrate()		do { } while (0)
-#endif
-
 /**
  * abs - return absolute value of an argument
  * @x: the value. If it is unsigned type, it is converted to signed type first.
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -322,7 +322,7 @@ static inline void preempt_notifier_init
 
 #endif
 
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+#ifdef CONFIG_SMP
 
 /*
  * Migrate-Disable and why it is undesired.
@@ -382,43 +382,11 @@ static inline void preempt_notifier_init
 extern void migrate_disable(void);
 extern void migrate_enable(void);
 
-#elif defined(CONFIG_PREEMPT_RT)
+#else
 
 static inline void migrate_disable(void) { }
 static inline void migrate_enable(void) { }
 
-#else /* !CONFIG_PREEMPT_RT */
-
-/**
- * migrate_disable - Prevent migration of the current task
- *
- * Maps to preempt_disable() which also disables preemption. Use
- * migrate_disable() to annotate that the intent is to prevent migration,
- * but not necessarily preemption.
- *
- * Can be invoked nested like preempt_disable() and needs the corresponding
- * number of migrate_enable() invocations.
- */
-static __always_inline void migrate_disable(void)
-{
-	preempt_disable();
-}
-
-/**
- * migrate_enable - Allow migration of the current task
- *
- * Counterpart to migrate_disable().
- *
- * As migrate_disable() can be invoked nested, only the outermost invocation
- * reenables migration.
- *
- * Currently mapped to preempt_enable().
- */
-static __always_inline void migrate_enable(void)
-{
-	preempt_enable();
-}
-
-#endif /* CONFIG_SMP && CONFIG_PREEMPT_RT */
+#endif /* CONFIG_SMP */
 
 #endif /* __LINUX_PREEMPT_H */
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -715,7 +715,7 @@ struct task_struct {
 	const cpumask_t			*cpus_ptr;
 	cpumask_t			cpus_mask;
 	void				*migration_pending;
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+#ifdef CONFIG_SMP
 	unsigned short			migration_disabled;
 #endif
 	unsigned short			migration_flags;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1696,8 +1696,6 @@ void check_preempt_curr(struct rq *rq, s
 
 #ifdef CONFIG_SMP
 
-#ifdef CONFIG_PREEMPT_RT
-
 static void
 __do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask, u32 flags);
 
@@ -1768,8 +1766,6 @@ static inline bool rq_has_pinned_tasks(s
 	return rq->nr_pinned;
 }
 
-#endif
-
 /*
  * Per-CPU kthreads are allowed to run on !active && online CPUs, see
  * __set_cpus_allowed_ptr() and select_fallback_rq().
@@ -2839,7 +2835,7 @@ void sched_set_stop_task(int cpu, struct
 	}
 }
 
-#else
+#else /* CONFIG_SMP */
 
 static inline int __set_cpus_allowed_ptr(struct task_struct *p,
 					 const struct cpumask *new_mask,
@@ -2848,10 +2844,6 @@ static inline int __set_cpus_allowed_ptr
 	return set_cpus_allowed_ptr(p, new_mask);
 }
 
-#endif /* CONFIG_SMP */
-
-#if !defined(CONFIG_SMP) || !defined(CONFIG_PREEMPT_RT)
-
 static inline void migrate_disable_switch(struct rq *rq, struct task_struct *p) { }
 
 static inline bool rq_has_pinned_tasks(struct rq *rq)
@@ -2859,7 +2851,7 @@ static inline bool rq_has_pinned_tasks(s
 	return false;
 }
 
-#endif
+#endif /* !CONFIG_SMP */
 
 static void
 ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
@@ -7881,6 +7873,39 @@ void __cant_sleep(const char *file, int
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 }
 EXPORT_SYMBOL_GPL(__cant_sleep);
+
+#ifdef CONFIG_SMP
+void __cant_migrate(const char *file, int line)
+{
+	static unsigned long prev_jiffy;
+
+	if (irqs_disabled())
+		return;
+
+	if (is_migration_disabled(current))
+		return;
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
+		return;
+
+	if (preempt_count() > 0)
+		return;
+
+	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
+		return;
+	prev_jiffy = jiffies;
+
+	pr_err("BUG: assuming non migratable context at %s:%d\n", file, line);
+	pr_err("in_atomic(): %d, irqs_disabled(): %d, migration_disabled() %u pid: %d, name: %s\n",
+	       in_atomic(), irqs_disabled(), is_migration_disabled(current),
+	       current->pid, current->comm);
+
+	debug_show_held_locks(current);
+	dump_stack();
+	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+}
+EXPORT_SYMBOL_GPL(__cant_migrate);
+#endif
 #endif
 
 #ifdef CONFIG_MAGIC_SYSRQ
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1056,7 +1056,7 @@ struct rq {
 	struct cpuidle_state	*idle_state;
 #endif
 
-#if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
 	unsigned int		nr_pinned;
 #endif
 	unsigned int		push_busy;
@@ -1092,7 +1092,7 @@ static inline int cpu_of(struct rq *rq)
 
 static inline bool is_migration_disabled(struct task_struct *p)
 {
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+#ifdef CONFIG_SMP
 	return p->migration_disabled;
 #else
 	return false;
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -26,7 +26,7 @@ unsigned int check_preemption_disabled(c
 	if (current->nr_cpus_allowed == 1)
 		goto out;
 
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+#ifdef CONFIG_SMP
 	if (current->migration_disabled)
 		goto out;
 #endif
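To round this off, a sketch of how the cant_migrate() annotation added
above pairs with migrate_disable()/migrate_enable(). The helper and its
sleep are invented for illustration; note that the lib/smp_processor_id.c
hunk is what makes smp_processor_id() legal in such a section.

#include <linux/delay.h>
#include <linux/kernel.h>
#include <linux/preempt.h>
#include <linux/smp.h>

/*
 * Illustrative only: the helper relies on the CPU number staying stable
 * across a section that is allowed to be preempted and even to sleep.
 */
static void log_from_this_cpu(const char *what)
{
	/* Debug assertion: the caller must have disabled migration. */
	cant_migrate();

	pr_info("%s starting on CPU%d\n", what, smp_processor_id());
	msleep(10);		/* sleeping is fine, migrating is not */
	pr_info("%s finished on CPU%d\n", what, smp_processor_id());
}

static void do_work(void)
{
	migrate_disable();	/* pin the task to the current CPU */
	log_from_this_cpu("example");
	migrate_enable();
}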