[V4,4/8] sched: Make migrate_disable/enable() independent of RT

Message ID 20201118204007.269943012@linutronix.de (mailing list archive)
State New, archived
Series mm/highmem: Preemptible variant of kmap_atomic & friends

Commit Message

Thomas Gleixner Nov. 18, 2020, 7:48 p.m. UTC
From: Thomas Gleixner <tglx@linutronix.de>

Now that the scheduler can deal with migrate disable properly, there is no
real compelling reason to make it only available for RT.

There are quite a few code paths which needlessly disable preemption in
order to prevent migration and some constructs like kmap_atomic() enforce
it implicitly.

Making it available independent of RT allows providing a preemptible
variant of kmap_atomic() and makes the code more consistent in general.

FIXME: Rework the comment in preempt.h - Peter?

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ben Segall <bsegall@google.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
---
 include/linux/kernel.h  |   21 ++++++++++++++-------
 include/linux/preempt.h |   38 +++-----------------------------------
 include/linux/sched.h   |    2 +-
 kernel/sched/core.c     |   45 +++++++++++++++++++++++++++++++++++----------
 kernel/sched/sched.h    |    4 ++--
 lib/smp_processor_id.c  |    2 +-
 6 files changed, 56 insertions(+), 56 deletions(-)

Comments

Mel Gorman Nov. 19, 2020, 9:38 a.m. UTC | #1
On Wed, Nov 18, 2020 at 08:48:42PM +0100, Thomas Gleixner wrote:
> From: Thomas Gleixner <tglx@linutronix.de>
> 
> Now that the scheduler can deal with migrate disable properly, there is no
> real compelling reason to make it only available for RT.
> 
> There are quite a few code paths which needlessly disable preemption in
> order to prevent migration and some constructs like kmap_atomic() enforce
> it implicitly.
> 
> Making it available independent of RT allows providing a preemptible
> variant of kmap_atomic() and makes the code more consistent in general.
> 
> FIXME: Rework the comment in preempt.h - Peter?
> 

I didn't keep up to date and there is clearly a dependency on patches in
tip for migrate_enable/migrate_disable . It's not 100% clear to me what
reworking you're asking for but then again, I'm not Peter!

From tip;

/**
 * migrate_disable - Prevent migration of the current task
 *
 * Maps to preempt_disable() which also disables preemption. Use
 * migrate_disable() to annotate that the intent is to prevent migration,
 * but not necessarily preemption.
 *
 * Can be invoked nested like preempt_disable() and needs the corresponding
 * number of migrate_enable() invocations.
 */

I assume that the rework is to document the distinction between
migrate_disable and preempt_disable() because it may not be clear to some
people why one should be used over another and the risk of cut&paste
cargo cult programming.

So I assume the rework is for the middle paragraph

 * Maps to preempt_disable() which also disables preemption. Use
 * migrate_disable() to annotate that the intent is to prevent migration,
 * but not necessarily preemption. The distinction is that preemption
 * disabling will protect a per-cpu structure from concurrent
 * modifications due to preemption. migrate_disable() partially protects
 * the task's address space and potentially preserves the TLB entries
 * even if preempted, such as is needed for a local IO mapping or a
 * kmap_atomic() referenced by on-stack pointers to avoid interference
 * between user threads or kernel threads sharing the same address space.

I know it could include other examples that are RT-specific, and some tricks
in percpu page alloc draining rely on a combination of migrate_disable
and interrupt disabling to protect the structures, but the above example
might be understandable to a non-RT audience.
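
As a rough illustration of the distinction above (a sketch only, not code
from this series; demo_stat, demo_map_page() and demo_unmap_page() are
made-up names, not kernel APIs):

#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/mm.h>
#include <linux/string.h>

static DEFINE_PER_CPU(unsigned long, demo_stat);

/* Hypothetical helpers standing in for a CPU-local mapping facility. */
static void *demo_map_page(struct page *page);
static void demo_unmap_page(void *vaddr);

static void demo_update_stat(void)
{
        /*
         * Per-CPU data modified in place: a task preempting us on this
         * CPU could corrupt it, so preemption has to stay disabled.
         */
        preempt_disable();
        __this_cpu_inc(demo_stat);
        preempt_enable();
}

static void demo_use_local_mapping(struct page *page)
{
        void *vaddr;

        /*
         * A CPU-local mapping referenced only through the on-stack
         * pointer 'vaddr' does not care about being preempted, only
         * about being migrated to a CPU where the mapping is not valid.
         */
        migrate_disable();
        vaddr = demo_map_page(page);
        memset(vaddr, 0, PAGE_SIZE);
        demo_unmap_page(vaddr);
        migrate_enable();
}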
Peter Zijlstra Nov. 19, 2020, 11:14 a.m. UTC | #2
On Thu, Nov 19, 2020 at 09:38:34AM +0000, Mel Gorman wrote:
> On Wed, Nov 18, 2020 at 08:48:42PM +0100, Thomas Gleixner wrote:
> > From: Thomas Gleixner <tglx@linutronix.de>
> > 
> > Now that the scheduler can deal with migrate disable properly, there is no
> > real compelling reason to make it only available for RT.
> > 
> > There are quite a few code paths which needlessly disable preemption in
> > order to prevent migration and some constructs like kmap_atomic() enforce
> > it implicitly.
> > 
> > Making it available independent of RT allows providing a preemptible
> > variant of kmap_atomic() and makes the code more consistent in general.
> > 
> > FIXME: Rework the comment in preempt.h - Peter?
> > 
> 
> I didn't keep up to date and there is clearly a dependency on patches in
> tip for migrate_enable/migrate_disable . It's not 100% clear to me what
> reworking you're asking for but then again, I'm not Peter!

He's talking about the big one: "Migrate-Disable and why it is
undesired.".

I still hate all of this, and I really fear that with migrate_disable()
available, people will be lazy and usage will increase :/

Case at hand is this series, the only reason we need it here is because
per-cpu page-tables are expensive...

I really do think we want to limit the usage and get rid of the implicit
migrate_disable() in spinlock_t/rwlock_t for example.


AFAICT the scenario described there is entirely possible; and it has to
show up for workloads that rely on multi-cpu bandwidth for correctness.

Switching from preempt_disable() to migrate_disable() hides the
immediate / easily visible high priority latency, but you move the
interference term into a place where it is much harder to detect, you
don't lose the term, it stays in the system.
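
For example (paraphrasing the "Migrate-Disable and why it is undesired"
comment): a task that is preempted while migrate-disabled stays affine to
that CPU and has to wait behind the preempting task even if other CPUs
sit idle; the delay has not disappeared, it has merely moved onto a task
where it is much harder to spot and account for.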


So no, I don't want to make the comment less scary. Usage is
discouraged.
Mel Gorman Nov. 19, 2020, 12:14 p.m. UTC | #3
On Thu, Nov 19, 2020 at 12:14:11PM +0100, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 09:38:34AM +0000, Mel Gorman wrote:
> > On Wed, Nov 18, 2020 at 08:48:42PM +0100, Thomas Gleixner wrote:
> > > From: Thomas Gleixner <tglx@linutronix.de>
> > > 
> > > Now that the scheduler can deal with migrate disable properly, there is no
> > > real compelling reason to make it only available for RT.
> > > 
> > > There are quite a few code paths which needlessly disable preemption in
> > > order to prevent migration and some constructs like kmap_atomic() enforce
> > > it implicitly.
> > > 
> > > Making it available independent of RT allows providing a preemptible
> > > variant of kmap_atomic() and makes the code more consistent in general.
> > > 
> > > FIXME: Rework the comment in preempt.h - Peter?
> > > 
> > 
> > I didn't keep up to date and there is clearly a dependency on patches in
> > tip for migrate_enable/migrate_disable . It's not 100% clear to me what
> > reworking you're asking for but then again, I'm not Peter!
> 
> He's talking about the big one: "Migrate-Disable and why it is
> undesired.".
> 

Ah yes, that makes more sense. I was thinking in terms of what is protected
but the PREEMPT_RT hazard is severe.

> I still hate all of this, and I really fear that with migrate_disable()
> available, people will be lazy and usage will increase :/
> 
> Case at hand is this series, the only reason we need it here is because
> per-cpu page-tables are expensive...
> 

I guessed, it was the only thing that made sense.

> I really do think we want to limit the usage and get rid of the implicit
> migrate_disable() in spinlock_t/rwlock_t for example.
> 
> AFAICT the scenario described there is entirely possible; and it has to
> show up for workloads that rely on multi-cpu bandwidth for correctness.
> 
> Switching from preempt_disable() to migrate_disable() hides the
> immediate / easily visible high priority latency, but you move the
> interference term into a place where it is much harder to detect, you
> don't lose the term, it stays in the system.
> 
> So no, I don't want to make the comment less scary. Usage is
> discouraged.

More scary, then, by adding this to the kerneldoc section for
migrate_disable?

* Usage of migrate_disable is heavily discouraged as it is extremely
* hazardous on PREEMPT_RT kernels and any usage needs to be heavily
* justified. Before even thinking about using this, read
* "Migrate-Disable and why it is undesired" in
* include/linux/preempt.h and include both a comment and document
* in the changelog why the use case is an exception.

It's not necessary for the current series because the interface hides
it and anyone poking at the internals of kmap_atomic probably should be
aware of the address space and TLB hazards associated with it. There are
few in-tree users and presumably any future preempt-rt related merges
already know why migrate_disable is required.

However, with the kerneldoc, there is no excuse for missing it for new
users that are not PREEMPT_RT-aware. It makes it easier to NAK/revert a
patch without proper justification similar to how undocumented usages of
memory barriers tend to get a poor reception.
Steven Rostedt Nov. 19, 2020, 2:17 p.m. UTC | #4
On Thu, 19 Nov 2020 12:14:13 +0000
Mel Gorman <mgorman@suse.de> wrote:

> * Usage of migrate_disable is heavily discouraged as it is extremely
> * hazardous on PREEMPT_RT kernels and any usage needs to be heavily

I don't believe it's just PREEMPT_RT. It's RT tasks that are concerned,
especially when you are dealing with SCHED_DEADLINE.

PREEMPT_RT just allows better determinism for RT tasks, but the issue with
migrate_disable is not limited to just PREEMPT_RT.

-- Steve


> * justified. Before even thinking about using this, read
> * "Migrate-Disable and why it is undesired" in
> * include/linux/preempt.h and include both a comment and document
> * in the changelog why the use case is an exception.
Linus Torvalds Nov. 19, 2020, 5:23 p.m. UTC | #5
On Thu, Nov 19, 2020 at 3:14 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> I still hate all of this, and I really fear that with migrate_disable()
> available, people will be lazy and usage will increase :/
>
> Case at hand is this series, the only reason we need it here is because
> per-cpu page-tables are expensive...

No, I think you as a scheduler person just need to accept it.

Because this is certainly not the only time migration limiting has
come up, and no, it has absolutely nothing to do with per-cpu page
tables being completely unacceptable.

The scheduler people need to get used to this. Really. Because ASMP is
just going to be a fact.

There are few things more futile than railing against reality, Peter.

Honestly, the only argument I've ever heard against limiting migration
is the whole "our scheduling theory doesn't cover it".

So either throw the broken theory away, or live with it. Theory that
doesn't match reality isn't theory, it's religion.

          Linus
Peter Zijlstra Nov. 19, 2020, 6:28 p.m. UTC | #6
On Thu, Nov 19, 2020 at 09:23:47AM -0800, Linus Torvalds wrote:
> On Thu, Nov 19, 2020 at 3:14 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > I still hate all of this, and I really fear that with migrate_disable()
> > available, people will be lazy and usage will increase :/
> >
> > Case at hand is this series, the only reason we need it here is because
> > per-cpu page-tables are expensive...
> 
> No, I think you as a scheduler person just need to accept it.

Well, I did write the patches.

> Because this is certainly not the only time migration limiting has
> come up, and no, it has absolutely nothing to do with per-cpu page
> tables being completely unacceptable.

It is for this instance; but sure, it's come up before in other
contexts.

> The scheduler people need to get used to this. Really. Because ASMP is
> just going to be a fact.

ASMP is different in that it is a hardware constraint, you're just not
going to be able to run more of X than there are X-capable hardware units
(be it FPUs, Vector units, 32bit or whatever).

> There are few things more futile than railing against reality, Peter.

But, but, my windmills! :-)

> Honestly, the only argument I've ever heard against limiting migration
> is the whole "our scheduling theory doesn't cover it".
> 
> So either throw the broken theory away, or live with it. Theory that
> doesn't match reality isn't theory, it's religion.

The first stage of throwing it out is understanding the problem, which
is just about where we're at. Next is creating a new formalism (if
possible) that covers this new issue. That might take a while.

Thing is though; without a formalism to reason about timeliness
guarantees, there is no Real-Time.

So sure, I've written the patches, doesn't mean I have to like the place
we're in due to it.
Thomas Gleixner Nov. 20, 2020, 1:33 a.m. UTC | #7
On Thu, Nov 19 2020 at 19:28, Peter Zijlstra wrote:
> On Thu, Nov 19, 2020 at 09:23:47AM -0800, Linus Torvalds wrote:
>> Because this is certainly not the only time migration limiting has
>> come up, and no, it has absolutely nothing to do with per-cpu page
>> tables being completely unacceptable.
>
> It is for this instance; but sure, it's come up before in other
> contexts.

Indeed. And one of the really bad outcomes of this is that people are
forced to use preempt_disable() to prevent migration which entails a
slew of consequences:

     - Using spinlocks where it wouldn't be needed otherwise
     - Spinwaiting instead of sleeping
     - The whole craziness of doing copy_to/from_user_in_atomic() along
       with the necessary out of line error handling.
     - ....

The introduction of per-cpu storage happened almost 20 years ago (2002)
and still the only answer we have is preempt_disable().
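
A rough sketch of the copy_to/from_user point above (illustrative only;
demo_cpu_local_buf() is a made-up stand-in for whatever ties the section
to the current CPU, not a kernel API):

#include <linux/preempt.h>
#include <linux/uaccess.h>
#include <linux/errno.h>

/* Hypothetical helper returning a buffer tied to the current CPU. */
static void *demo_cpu_local_buf(size_t len);

/* Today: preemption off, so the user copy must not fault or sleep. */
static int demo_old_way(const void __user *uaddr, size_t len)
{
        unsigned long left;

        preempt_disable();
        pagefault_disable();
        left = __copy_from_user_inatomic(demo_cpu_local_buf(len), uaddr, len);
        pagefault_enable();
        preempt_enable();

        if (left) {
                /* Out of line error handling: fault the page in, retry ... */
                return -EFAULT;
        }
        return 0;
}

/* With migrate_disable(): still pinned to this CPU, but the copy may fault and sleep. */
static int demo_new_way(const void __user *uaddr, size_t len)
{
        unsigned long left;

        migrate_disable();
        left = copy_from_user(demo_cpu_local_buf(len), uaddr, len);
        migrate_enable();

        return left ? -EFAULT : 0;
}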

I know the scheduling theory folks still try to wrap their heads around
the introduction of SMP which dates back to 1962 IIRC...

>> The scheduler people need to get used to this. Really. Because ASMP is
>> just going to be a fact.
>
> ASMP is different in that it is a hardware constraint, you're just not
> going to be able to run more of X than there are X-capable hardware units
> (be it FPUs, Vector units, 32bit or whatever).

ASMP is as old as SMP. The first SMP systems were in fact ASMP.  The
reasons for ASMP 60 years ago were not that different from the reasons
for ASMP today. Just the scale and the effectiveness are different.

>> There are few things more futile than railing against reality, Peter.
>
> But, but, my windmills! :-)

At least you have windmills where you live so you can pull off the real
Don Quixote while other people have to find substitutes :)

Thanks,

        tglx
Peter Zijlstra Nov. 20, 2020, 9:29 a.m. UTC | #8
On Fri, Nov 20, 2020 at 02:33:58AM +0100, Thomas Gleixner wrote:
> On Thu, Nov 19 2020 at 19:28, Peter Zijlstra wrote:
> > On Thu, Nov 19, 2020 at 09:23:47AM -0800, Linus Torvalds wrote:
> >> Because this is certainly not the only time migration limiting has
> >> come up, and no, it has absolutely nothing to do with per-cpu page
> >> tables being completely unacceptable.
> >
> > It is for this instance; but sure, it's come up before in other
> > contexts.
> 
> Indeed. And one of the really bad outcomes of this is that people are
> forced to use preempt_disable() to prevent migration which entails a
> slew of consequences:
> 
>      - Using spinlocks where it wouldn't be needed otherwise
>      - Spinwaiting instead of sleeping
>      - The whole craziness of doing copy_to/from_user_in_atomic() along
>        with the necessary out of line error handling.
>      - ....
> 
> The introduction of per-cpu storage happened almost 20 years ago (2002)
> and still the only answer we have is preempt_disable().

IIRC the first time this migrate_disable() stuff came up was when Chris
Lameter did SLUB. Eventually he settled on that cmpxchg_double()
approach (which is somewhat similar to userspace rseq), which is vastly
superior and wouldn't have happened had we provided migrate_disable().
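
For reference, a much simplified sketch of that kind of lockless per-CPU
fastpath (a plain counter rather than the real SLUB freelist code;
demo_counter is a made-up name):

#include <linux/percpu.h>

static DEFINE_PER_CPU(unsigned long, demo_counter);

static void demo_counter_inc(void)
{
        unsigned long old;

        /*
         * No preempt_disable()/migrate_disable(): this_cpu_cmpxchg() is
         * atomic against preemption on the local CPU, so the increment
         * either lands atomically on whichever CPU the task currently
         * runs on, or the compare fails and the loop retries.
         */
        do {
                old = this_cpu_read(demo_counter);
        } while (this_cpu_cmpxchg(demo_counter, old, old + 1) != old);
}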

As already stated, per-cpu page-tables would allow for a much saner kmap
approach, but alas, x86 really can't sanely do that (the archs that have
separate kernel and user page-tables could do this, and how we cursed
x86 didn't have that when meltdown happened).

[ and using fixmaps in the per-cpu memory space _could_ work, but is a
  giant pain because then all accesses need GS prefix and blah... ]

And I'm sure there's creative ways for other problems too, but yes, it's
hard.

Anyway, clearly I'm the only one that cares, so I'll just crawl back
under my rock...
Andy Lutomirski Nov. 22, 2020, 11:16 p.m. UTC | #9
On Fri, Nov 20, 2020 at 1:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>

> As already stated, per-cpu page-tables would allow for a much saner kmap
> approach, but alas, x86 really can't sanely do that (the archs that have
> separate kernel and user page-tables could do this, and how we cursed
> x86 didn't have that when meltdown happened).
>
> [ and using fixmaps in the per-cpu memory space _could_ work, but is a
>   giant pain because then all accesses need GS prefix and blah... ]
>
> And I'm sure there's creative ways for other problems too, but yes, it's
> hard.
>
> Anyway, clearly I'm the only one that cares, so I'll just crawl back
> under my rock...

I'll poke my head out of the rock for a moment, though...

Several years ago, we discussed (in person at some conference IIRC)
having percpu pagetables to get sane kmaps, percpu memory, etc.  The
conclusion was that Linus thought the performance would suck and we
shouldn't do it.  Since then, though, we added really fancy
infrastructure for keeping track of a per-CPU list of recently used
mms and efficiently tracking when they need to be invalidated.  We
called these "ASIDs".  It would be fairly straightforward to have an
entire pgd for each (cpu, asid) pair.  Newly added second-level
(p4d/pud/whatever -- have I ever mentioned how much I dislike the
Linux pagetable naming conventions and folding tricks?) tables could
be lazily faulted in, and copies of the full 2kB mess would only be
needed when a new (cpu,asid) is allocated because either a flush
happened while the mm was inactive on the CPU in question or because
the mm fell off the percpu cache.

The total overhead would be a bit more cache usage, 4kB * num cpus *
num ASIDs per CPU (or 8k for PTI), and a few extra page faults (max
num cpus * 256 per mm over the entire lifetime of that mm).  The
common case of a CPU switching back and forth between a small number
of mms would have no significant overhead.
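
To put rough numbers on that (assumed for illustration, not from the
thread): a 64-CPU machine with 6 ASIDs per CPU and PTI would spend
64 * 6 * 8 kB = 3 MB on the per-(cpu, asid) pgd copies, and a given mm
would take at most 64 * 256 = 16384 extra faults over its whole lifetime.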

On an unrelated note, what happens if you migrate_disable(), sleep for
a looooong time, and someone tries to offline your CPU?
Thomas Gleixner Nov. 23, 2020, 9:15 p.m. UTC | #10
On Sun, Nov 22 2020 at 15:16, Andy Lutomirski wrote:
> On Fri, Nov 20, 2020 at 1:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>> Anyway, clearly I'm the only one that cares, so I'll just crawl back
>> under my rock...
>
> I'll poke my head out of the rock for a moment, though...
>
> Several years ago, we discussed (in person at some conference IIRC)
> having percpu pagetables to get sane kmaps, percpu memory, etc.

Yes, I remember. That was our initial reaction in Prague to the looming
PTI challenge 3 years ago.

> The conclusion was that Linus thought the performance would suck and
> we shouldn't do it.

Linus had opinions, but we all agreed that depending on the workload and
the CPU features (think !PCID) the copy/pagefault overhead could be
significant.

> Since then, though, we added really fancy infrastructure for keeping
> track of a per-CPU list of recently used mms and efficiently tracking
> when they need to be invalidated.  We called these "ASIDs".  It would
> be fairly straightforward to have an entire pgd for each (cpu, asid)
> pair.  Newly added second-level (p4d/pud/whatever -- have I ever
> mentioned how much I dislike the Linux pagetable naming conventions
> and folding tricks?) tables could be lazily faulted in, and copies of
> the full 2kB mess would only be neeced when a new (cpu,asid) is
> allocated because either a flush happened while the mm was inactive on
> the CPU in question or because the mm fell off the percpu cache.
>
> The total overhead would be a bit more cache usage, 4kB * num cpus *
> num ASIDs per CPU (or 8k for PTI), and a few extra page faults (max
> num cpus * 256 per mm over the entire lifetime of that mm).

> The common case of a CPU switching back and forth between a small
> number of mms would have no significant overhead.

For CPUs which do not support PCID this sucks, which is everything pre
Westmere and all of 32bit. Yes, 32bit. If we go there then 32bit has to
bite the bullet and use the very same mechanism. Not that I care much
TBH.

Even for those CPUs which support it we'd need to increase the number of
ASIDs significantly.  Right now we use only 6 ASIDs, which is not a
lot. There are process heavy workloads out there which do quite some
context switching so avoiding the copy matters. I'm not worried about
fork as the copy will probably be just noise.

That said, I'm not saying it shouldn't be done, but there are quite a
few things which need to be looked at.

TBH, I really would love to see that just to make GS kernel usage and
the related mess in the ASM code go away completely.

For the task at hand, i.e. replacing kmap_atomic() by kmap_local(), this
is not really helpful because we'd need to make all highmem using
architectures do the same thing. But if we can pull it off on x86 the
required changes for the kmap_local() code are not really significant.

> On an unrelated note, what happens if you migrate_disable(), sleep for
> a looooong time, and someone tries to offline your CPU?

The hotplug code will prevent the CPU from going offline in that case,
i.e. it waits until the last task has left its migrate-disabled section.

But you are not supposed to invoke sleep($ETERNAL) in such a
context. Emphasis on 'not supposed' :)

Thanks,

        tglx
Thomas Gleixner Nov. 23, 2020, 9:25 p.m. UTC | #11
On Mon, Nov 23 2020 at 22:15, Thomas Gleixner wrote:
> On Sun, Nov 22 2020 at 15:16, Andy Lutomirski wrote:
>> On Fri, Nov 20, 2020 at 1:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>> The common case of a CPU switching back and forth between a small
>> number of mms would have no significant overhead.
>
> For CPUs which do not support PCID this sucks, which is everything pre
> Westmere and all of 32bit. Yes, 32bit. If we go there then 32bit has to
> bite the bullet and use the very same mechanism. Not that I care much
> TBH.

Bah, I completely forgot that AMD does not support PCID before Zen3
which is a major showstopper.

Thanks,

        tglx
Andy Lutomirski Nov. 23, 2020, 10:07 p.m. UTC | #12
> On Nov 23, 2020, at 1:25 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> 
> On Mon, Nov 23 2020 at 22:15, Thomas Gleixner wrote:
>>> On Sun, Nov 22 2020 at 15:16, Andy Lutomirski wrote:
>>>> On Fri, Nov 20, 2020 at 1:29 AM Peter Zijlstra <peterz@infradead.org> wrote:
>>> The common case of a CPU switching back and forth between a small
>>> number of mms would have no significant overhead.
>> 
>> For CPUs which do not support PCID this sucks, which is everything pre
>> Westmere and all of 32bit. Yes, 32bit. If we go there then 32bit has to
>> bite the bullet and use the very same mechanism. Not that I care much
>> TBH.
> 
> Bah, I completely forgot that AMD does not support PCID before Zen3
> which is a major showstopper.

Why?  Couldn’t we rig up the code so we still track all the ASIDs even if there is no CPU support?  We would take the TLB flush hit on every context switch, but we pay that cost anyway. We would avoid the extra copy in the same cases in which we would avoid it if we had PCID.

> 
> Thanks,
> 
>        tglx
Thomas Gleixner Nov. 23, 2020, 11:10 p.m. UTC | #13
On Mon, Nov 23 2020 at 14:07, Andy Lutomirski wrote:
>> On Nov 23, 2020, at 1:25 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
>> On Mon, Nov 23 2020 at 22:15, Thomas Gleixner wrote:
>>>> On Sun, Nov 22 2020 at 15:16, Andy Lutomirski wrote:
>>>>
>>>> The common case of a CPU switching back and forth between a small
>>>> number of mms would have no significant overhead.
>>> 
>>> For CPUs which do not support PCID this sucks, which is everything pre
>>> Westmere and all of 32bit. Yes, 32bit. If we go there then 32bit has to
>>> bite the bullet and use the very same mechanism. Not that I care much
>>> TBH.
>> 
>> Bah, I completely forgot that AMD does not support PCID before Zen3
>> which is a major showstopper.
>
> Why?  Couldn’t we rig up the code so we still track all the ASIDs even
> if there is no CPU support?  We would take the TLB flush hit on every
> context switch, but we pay that cost anyway. We would avoid the extra
> copy in the same cases in which we would avoid it if we had PCID.

Did not think about that indeed. Yes, that should do the trick and
should not be worse than what we have now.

Sometimes one just can't see the forest for the trees :)

Thanks,

        tglx

Patch

--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -204,6 +204,7 @@  extern int _cond_resched(void);
 extern void ___might_sleep(const char *file, int line, int preempt_offset);
 extern void __might_sleep(const char *file, int line, int preempt_offset);
 extern void __cant_sleep(const char *file, int line, int preempt_offset);
+extern void __cant_migrate(const char *file, int line);
 
 /**
  * might_sleep - annotation for functions that can sleep
@@ -227,6 +228,18 @@  extern void __cant_sleep(const char *fil
 # define cant_sleep() \
 	do { __cant_sleep(__FILE__, __LINE__, 0); } while (0)
 # define sched_annotate_sleep()	(current->task_state_change = 0)
+
+/**
+ * cant_migrate - annotation for functions that cannot migrate
+ *
+ * Will print a stack trace if executed in code which is migratable
+ */
+# define cant_migrate()							\
+	do {								\
+		if (IS_ENABLED(CONFIG_SMP))				\
+			__cant_migrate(__FILE__, __LINE__);		\
+	} while (0)
+
 /**
  * non_block_start - annotate the start of section where sleeping is prohibited
  *
@@ -251,6 +264,7 @@  extern void __cant_sleep(const char *fil
 				   int preempt_offset) { }
 # define might_sleep() do { might_resched(); } while (0)
 # define cant_sleep() do { } while (0)
+# define cant_migrate()		do { } while (0)
 # define sched_annotate_sleep() do { } while (0)
 # define non_block_start() do { } while (0)
 # define non_block_end() do { } while (0)
@@ -258,13 +272,6 @@  extern void __cant_sleep(const char *fil
 
 #define might_sleep_if(cond) do { if (cond) might_sleep(); } while (0)
 
-#ifndef CONFIG_PREEMPT_RT
-# define cant_migrate()		cant_sleep()
-#else
-  /* Placeholder for now */
-# define cant_migrate()		do { } while (0)
-#endif
-
 /**
  * abs - return absolute value of an argument
  * @x: the value.  If it is unsigned type, it is converted to signed type first.
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -322,7 +322,7 @@  static inline void preempt_notifier_init
 
 #endif
 
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+#ifdef CONFIG_SMP
 
 /*
  * Migrate-Disable and why it is undesired.
@@ -382,43 +382,11 @@  static inline void preempt_notifier_init
 extern void migrate_disable(void);
 extern void migrate_enable(void);
 
-#elif defined(CONFIG_PREEMPT_RT)
+#else
 
 static inline void migrate_disable(void) { }
 static inline void migrate_enable(void) { }
 
-#else /* !CONFIG_PREEMPT_RT */
-
-/**
- * migrate_disable - Prevent migration of the current task
- *
- * Maps to preempt_disable() which also disables preemption. Use
- * migrate_disable() to annotate that the intent is to prevent migration,
- * but not necessarily preemption.
- *
- * Can be invoked nested like preempt_disable() and needs the corresponding
- * number of migrate_enable() invocations.
- */
-static __always_inline void migrate_disable(void)
-{
-	preempt_disable();
-}
-
-/**
- * migrate_enable - Allow migration of the current task
- *
- * Counterpart to migrate_disable().
- *
- * As migrate_disable() can be invoked nested, only the outermost invocation
- * reenables migration.
- *
- * Currently mapped to preempt_enable().
- */
-static __always_inline void migrate_enable(void)
-{
-	preempt_enable();
-}
-
-#endif /* CONFIG_SMP && CONFIG_PREEMPT_RT */
+#endif /* CONFIG_SMP */
 
 #endif /* __LINUX_PREEMPT_H */
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -715,7 +715,7 @@  struct task_struct {
 	const cpumask_t			*cpus_ptr;
 	cpumask_t			cpus_mask;
 	void				*migration_pending;
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+#ifdef CONFIG_SMP
 	unsigned short			migration_disabled;
 #endif
 	unsigned short			migration_flags;
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1696,8 +1696,6 @@  void check_preempt_curr(struct rq *rq, s
 
 #ifdef CONFIG_SMP
 
-#ifdef CONFIG_PREEMPT_RT
-
 static void
 __do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask, u32 flags);
 
@@ -1768,8 +1766,6 @@  static inline bool rq_has_pinned_tasks(s
 	return rq->nr_pinned;
 }
 
-#endif
-
 /*
  * Per-CPU kthreads are allowed to run on !active && online CPUs, see
  * __set_cpus_allowed_ptr() and select_fallback_rq().
@@ -2839,7 +2835,7 @@  void sched_set_stop_task(int cpu, struct
 	}
 }
 
-#else
+#else /* CONFIG_SMP */
 
 static inline int __set_cpus_allowed_ptr(struct task_struct *p,
 					 const struct cpumask *new_mask,
@@ -2848,10 +2844,6 @@  static inline int __set_cpus_allowed_ptr
 	return set_cpus_allowed_ptr(p, new_mask);
 }
 
-#endif /* CONFIG_SMP */
-
-#if !defined(CONFIG_SMP) || !defined(CONFIG_PREEMPT_RT)
-
 static inline void migrate_disable_switch(struct rq *rq, struct task_struct *p) { }
 
 static inline bool rq_has_pinned_tasks(struct rq *rq)
@@ -2859,7 +2851,7 @@  static inline bool rq_has_pinned_tasks(s
 	return false;
 }
 
-#endif
+#endif /* !CONFIG_SMP */
 
 static void
 ttwu_stat(struct task_struct *p, int cpu, int wake_flags)
@@ -7881,6 +7873,39 @@  void __cant_sleep(const char *file, int
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 }
 EXPORT_SYMBOL_GPL(__cant_sleep);
+
+#ifdef CONFIG_SMP
+void __cant_migrate(const char *file, int line)
+{
+	static unsigned long prev_jiffy;
+
+	if (irqs_disabled())
+		return;
+
+	if (is_migration_disabled(current))
+		return;
+
+	if (!IS_ENABLED(CONFIG_PREEMPT_COUNT))
+		return;
+
+	if (preempt_count() > 0)
+		return;
+
+	if (time_before(jiffies, prev_jiffy + HZ) && prev_jiffy)
+		return;
+	prev_jiffy = jiffies;
+
+	pr_err("BUG: assuming non migratable context at %s:%d\n", file, line);
+	pr_err("in_atomic(): %d, irqs_disabled(): %d, migration_disabled() %u pid: %d, name: %s\n",
+	       in_atomic(), irqs_disabled(), is_migration_disabled(current),
+	       current->pid, current->comm);
+
+	debug_show_held_locks(current);
+	dump_stack();
+	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
+}
+EXPORT_SYMBOL_GPL(__cant_migrate);
+#endif
 #endif
 
 #ifdef CONFIG_MAGIC_SYSRQ
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1056,7 +1056,7 @@  struct rq {
 	struct cpuidle_state	*idle_state;
 #endif
 
-#if defined(CONFIG_PREEMPT_RT) && defined(CONFIG_SMP)
+#ifdef CONFIG_SMP
 	unsigned int		nr_pinned;
 #endif
 	unsigned int		push_busy;
@@ -1092,7 +1092,7 @@  static inline int cpu_of(struct rq *rq)
 
 static inline bool is_migration_disabled(struct task_struct *p)
 {
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+#ifdef CONFIG_SMP
 	return p->migration_disabled;
 #else
 	return false;
--- a/lib/smp_processor_id.c
+++ b/lib/smp_processor_id.c
@@ -26,7 +26,7 @@  unsigned int check_preemption_disabled(c
 	if (current->nr_cpus_allowed == 1)
 		goto out;
 
-#if defined(CONFIG_SMP) && defined(CONFIG_PREEMPT_RT)
+#ifdef CONFIG_SMP
 	if (current->migration_disabled)
 		goto out;
 #endif