
[1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

Message ID 20200624080248.3701-1-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Series [1/2] mm/mmu_notifier: Mark up direct reclaim paths with MAYFAIL

Commit Message

Chris Wilson June 24, 2020, 8:02 a.m. UTC
When direct reclaim enters the shrinker and tries to reclaim pages, it
has to opportunistically unmap them [try_to_unmap_one]. For direct
reclaim, the calling context is unknown and may include attempts to
unmap one page of a dma object while attempting to allocate more pages
for that object. Pass the information along that we are inside an
opportunistic unmap that can allow that page to remain referenced and
mapped, and let the callback opt in to avoiding a recursive wait.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Andrew Morton <akpm@linux-foundation.org>
CC: Jason Gunthorpe <jgg@ziepe.ca>
---
 include/linux/mmu_notifier.h | 15 ++++++++++++++-
 mm/mmu_notifier.c            |  3 +++
 mm/rmap.c                    |  5 +++--
 3 files changed, 20 insertions(+), 3 deletions(-)

Comments

Jason Gunthorpe June 24, 2020, 12:10 p.m. UTC | #1
On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> When direct reclaim enters the shrinker and tries to reclaim pages, it
> has to opportunitically unmap them [try_to_unmap_one]. For direct
> reclaim, the calling context is unknown and may include attempts to
> unmap one page of a dma object while attempting to allocate more pages
> for that object. Pass the information along that we are inside an
> opportunistic unmap that can allow that page to remain referenced and
> mapped, and let the callback opt in to avoiding a recursive wait.

i915 should already not be holding locks shared with the notifiers
across allocations that can trigger reclaim. This is already required
to use notifiers correctly anyhow - why do we need something in the
notifiers?

I really don't like this patch, the purpose of notifiers is only to
*track changes* not to influence policy of the callers.

Jason
Chris Wilson June 24, 2020, 12:21 p.m. UTC | #2
Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > reclaim, the calling context is unknown and may include attempts to
> > unmap one page of a dma object while attempting to allocate more pages
> > for that object. Pass the information along that we are inside an
> > opportunistic unmap that can allow that page to remain referenced and
> > mapped, and let the callback opt in to avoiding a recursive wait.
> 
> i915 should already not be holding locks shared with the notifiers
> across allocations that can trigger reclaim. This is already required
> to use notifiers correctly anyhow - why do we need something in the
> notifiers?

for (n = 0; n < num_pages; n++)
	pin_user_page()

may call try_to_unmap_page from the lru shrinker for [0, n-1].

We're in the middle of allocating the object, how are we best to untangle
that?

Or during allocation of something that is using the pages pinned by that
object, how are we best not to shrink that object as there is a
mutual dependency?
-Chris
Jason Gunthorpe June 24, 2020, 12:39 p.m. UTC | #3
On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > reclaim, the calling context is unknown and may include attempts to
> > > unmap one page of a dma object while attempting to allocate more pages
> > > for that object. Pass the information along that we are inside an
> > > opportunistic unmap that can allow that page to remain referenced and
> > > mapped, and let the callback opt in to avoiding a recursive wait.
> > 
> > i915 should already not be holding locks shared with the notifiers
> > across allocations that can trigger reclaim. This is already required
> > to use notifiers correctly anyhow - why do we need something in the
> > notifiers?
> 
> for (n = 0; n < num_pages; n++)
> 	pin_user_page()
> 
> may call try_to_unmap_page from the lru shrinker for [0, n-1].

Yes, of course you can't hold any locks that intersect with notifiers
across pin_user_page()/get_user_page()

It has always been that way.

I consolidated all this tricky locking into interval notifiers, maybe
updating i915 to use them will give it a solution. I looked at it
once, it was straightforward enough until it got to all the #ifdefery

> We're in the middle of allocating the object, how are we best to untangle
> that?

I don't know anything about i915, but this is clearly i915 not using
notifiers properly, it needs proper fixing, not hacking up notifiers.

Jason
Chris Wilson June 24, 2020, 2:12 p.m. UTC | #4
Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > reclaim, the calling context is unknown and may include attempts to
> > > > unmap one page of a dma object while attempting to allocate more pages
> > > > for that object. Pass the information along that we are inside an
> > > > opportunistic unmap that can allow that page to remain referenced and
> > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > 
> > > i915 should already not be holding locks shared with the notifiers
> > > across allocations that can trigger reclaim. This is already required
> > > to use notifiers correctly anyhow - why do we need something in the
> > > notifiers?
> > 
> > for (n = 0; n < num_pages; n++)
> >       pin_user_page()
> > 
> > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> 
> Yes, of course you can't hold any locks that intersect with notifiers
> across pin_user_page()/get_user_page()

What lock though? It's just the page refcount, shrinker asks us to drop
it [via mmu], we reply we would like to keep using that page as freeing
it for the current allocation is "robbing Peter to pay Paul".
-Chris
Jason Gunthorpe June 24, 2020, 2:16 p.m. UTC | #5
On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > for that object. Pass the information along that we are inside an
> > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > 
> > > > i915 should already not be holding locks shared with the notifiers
> > > > across allocations that can trigger reclaim. This is already required
> > > > to use notifiers correctly anyhow - why do we need something in the
> > > > notifiers?
> > > 
> > > for (n = 0; n < num_pages; n++)
> > >       pin_user_page()
> > > 
> > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > 
> > Yes, of course you can't hold any locks that intersect with notifiers
> > across pin_user_page()/get_user_page()
> 
> What lock though? It's just the page refcount, shrinker asks us to drop
> it [via mmu], we reply we would like to keep using that page as freeing
> it for the current allocation is "robbing Peter to pay Paul".

Maybe I'm unclear what this series is actually trying to fix? 

You said "avoiding a recursive wait" which sounds like some locking
deadlock to me.

Still, again, notifiers are for tracking, not for influencing MM policy.

Jason
Chris Wilson June 24, 2020, 2:21 p.m. UTC | #6
Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > for that object. Pass the information along that we are inside an
> > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > 
> > > > > i915 should already not be holding locks shared with the notifiers
> > > > > across allocations that can trigger reclaim. This is already required
> > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > notifiers?
> > > > 
> > > > for (n = 0; n < num_pages; n++)
> > > >       pin_user_page()
> > > > 
> > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > 
> > > Yes, of course you can't hold any locks that intersect with notifiers
> > > across pin_user_page()/get_user_page()
> > 
> > What lock though? It's just the page refcount, shrinker asks us to drop
> > it [via mmu], we reply we would like to keep using that page as freeing
> > it for the current allocation is "robbing Peter to pay Paul".
> 
> Maybe I'm unclear what this series is actually trying to fix? 
> 
> You said "avoiding a recursive wait" which sounds like some locking
> deadlock to me.

It's the shrinker being called while we are allocating for/on behalf of
the object. As we are actively using the object, we don't want to free
it -- the partial object allocation being the clearest: if the object
consists of 2 pages, trying to free page 0 in order to allocate page 1
has to fail (and the shrinker should find another candidate to reclaim,
or fail the allocation).
-Chris
Jason Gunthorpe June 24, 2020, 2:25 p.m. UTC | #7
On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > > for that object. Pass the information along that we are inside an
> > > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > > 
> > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > across allocations that can trigger reclaim. This is already required
> > > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > > notifiers?
> > > > > 
> > > > > for (n = 0; n < num_pages; n++)
> > > > >       pin_user_page()
> > > > > 
> > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > 
> > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > across pin_user_page()/get_user_page()
> > > 
> > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > it [via mmu], we reply we would like to keep using that page as freeing
> > > it for the current allocation is "robbing Peter to pay Paul".
> > 
> > Maybe I'm unclear what this series is actually trying to fix? 
> > 
> > You said "avoiding a recursive wait" which sounds like some locking
> > deadlock to me.
> 
> It's the shrinker being called while we are allocating for/on behalf of
> the object. As we are actively using the object, we don't want to free
> it -- the partial object allocation being the clearest, if the object
> consists of 2 pages, trying to free page 0 in order to allocate page 1
> has to fail (and the shrinker should find another candidate to reclaim,
> or fail the allocation).

mmu notifiers are not for influencing policy of the mm.

Jason
Chris Wilson June 24, 2020, 2:37 p.m. UTC | #8
Quoting Jason Gunthorpe (2020-06-24 15:25:44)
> On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > > > for that object. Pass the information along that we are inside an
> > > > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > > > 
> > > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > > across allocations that can trigger reclaim. This is already required
> > > > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > > > notifiers?
> > > > > > 
> > > > > > for (n = 0; n < num_pages; n++)
> > > > > >       pin_user_page()
> > > > > > 
> > > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > > 
> > > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > > across pin_user_page()/get_user_page()
> > > > 
> > > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > > it [via mmu], we reply we would like to keep using that page as freeing
> > > > it for the current allocation is "robbing Peter to pay Paul".
> > > 
> > > Maybe I'm unclear what this series is actually trying to fix? 
> > > 
> > > You said "avoiding a recursive wait" which sounds like some locking
> > > deadlock to me.
> > 
> > It's the shrinker being called while we are allocating for/on behalf of
> > the object. As we are actively using the object, we don't want to free
> > it -- the partial object allocation being the clearest, if the object
> > consists of 2 pages, trying to free page 0 in order to allocate page 1
> > has to fail (and the shrinker should find another candidate to reclaim,
> > or fail the allocation).
> 
> mmu notifiers are not for influencing policy of the mm.

Its policy is "this may fail" regardless of the mmu notifier at this
point. That is not changed.

Your suggestion is that we move the pages to the unevictable mapping so
that the shrinker LRU is never invoked on pages we have grabbed with
pin_user_page. Does that work with the rest of the mmu notifiers?
-Chris
Jason Gunthorpe June 24, 2020, 4:50 p.m. UTC | #9
On Wed, Jun 24, 2020 at 03:37:32PM +0100, Chris Wilson wrote:
> Quoting Jason Gunthorpe (2020-06-24 15:25:44)
> > On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> > > Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > > > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > > > > for that object. Pass the information along that we are inside an
> > > > > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > > > > 
> > > > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > > > across allocations that can trigger reclaim. This is already required
> > > > > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > > > > notifiers?
> > > > > > > 
> > > > > > > for (n = 0; n < num_pages; n++)
> > > > > > >       pin_user_page()
> > > > > > > 
> > > > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > > > 
> > > > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > > > across pin_user_page()/get_user_page()
> > > > > 
> > > > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > > > it [via mmu], we reply we would like to keep using that page as freeing
> > > > > it for the current allocation is "robbing Peter to pay Paul".
> > > > 
> > > > Maybe I'm unclear what this series is actually trying to fix? 
> > > > 
> > > > You said "avoiding a recursive wait" which sounds like some locking
> > > > deadlock to me.
> > > 
> > > It's the shrinker being called while we are allocating for/on behalf of
> > > the object. As we are actively using the object, we don't want to free
> > > it -- the partial object allocation being the clearest, if the object
> > > consists of 2 pages, trying to free page 0 in order to allocate page 1
> > > has to fail (and the shrinker should find another candidate to reclaim,
> > > or fail the allocation).
> > 
> > mmu notifiers are not for influencing policy of the mm.
> 
> It's policy is "this may fail" regardless of the mmu notifier at this
> point. That is not changed.

MMU notifiers are for tracking updates, they are not allowed to fail.
The one slightly weird case of non-blocking is the only exception.

> Your suggestion is that we move the pages to the unevictable mapping so
> that the shrinker LRU is never invoked on pages we have grabbed with
> pin_user_page. Does that work with the rest of the mmu notifiers?

That is beyond what I'm familiar with - but generally - if you want to
influence decisions the MM is making then it needs to be at the
front of the process and not inside notifiers. 

So what you describe seems broadly appropriate to me.

I'm still a little unclear on what you are trying to fix - pinned
pages are definitely not freed, do you have some case where pages
which are pinned are being cleaned out from the MM despite being
pinned? Sounds a bit strange, maybe that is worth addressing directly?

Jason
Chris Wilson June 24, 2020, 5:58 p.m. UTC | #10
Quoting Jason Gunthorpe (2020-06-24 17:50:57)
> On Wed, Jun 24, 2020 at 03:37:32PM +0100, Chris Wilson wrote:
> > Quoting Jason Gunthorpe (2020-06-24 15:25:44)
> > > On Wed, Jun 24, 2020 at 03:21:49PM +0100, Chris Wilson wrote:
> > > > Quoting Jason Gunthorpe (2020-06-24 15:16:04)
> > > > > On Wed, Jun 24, 2020 at 03:12:42PM +0100, Chris Wilson wrote:
> > > > > > Quoting Jason Gunthorpe (2020-06-24 13:39:10)
> > > > > > > On Wed, Jun 24, 2020 at 01:21:03PM +0100, Chris Wilson wrote:
> > > > > > > > Quoting Jason Gunthorpe (2020-06-24 13:10:53)
> > > > > > > > > On Wed, Jun 24, 2020 at 09:02:47AM +0100, Chris Wilson wrote:
> > > > > > > > > > When direct reclaim enters the shrinker and tries to reclaim pages, it
> > > > > > > > > > has to opportunitically unmap them [try_to_unmap_one]. For direct
> > > > > > > > > > reclaim, the calling context is unknown and may include attempts to
> > > > > > > > > > unmap one page of a dma object while attempting to allocate more pages
> > > > > > > > > > for that object. Pass the information along that we are inside an
> > > > > > > > > > opportunistic unmap that can allow that page to remain referenced and
> > > > > > > > > > mapped, and let the callback opt in to avoiding a recursive wait.
> > > > > > > > > 
> > > > > > > > > i915 should already not be holding locks shared with the notifiers
> > > > > > > > > across allocations that can trigger reclaim. This is already required
> > > > > > > > > to use notifiers correctly anyhow - why do we need something in the
> > > > > > > > > notifiers?
> > > > > > > > 
> > > > > > > > for (n = 0; n < num_pages; n++)
> > > > > > > >       pin_user_page()
> > > > > > > > 
> > > > > > > > may call try_to_unmap_page from the lru shrinker for [0, n-1].
> > > > > > > 
> > > > > > > Yes, of course you can't hold any locks that intersect with notifiers
> > > > > > > across pin_user_page()/get_user_page()
> > > > > > 
> > > > > > What lock though? It's just the page refcount, shrinker asks us to drop
> > > > > > it [via mmu], we reply we would like to keep using that page as freeing
> > > > > > it for the current allocation is "robbing Peter to pay Paul".
> > > > > 
> > > > > Maybe I'm unclear what this series is actually trying to fix? 
> > > > > 
> > > > > You said "avoiding a recursive wait" which sounds like some locking
> > > > > deadlock to me.
> > > > 
> > > > It's the shrinker being called while we are allocating for/on behalf of
> > > > the object. As we are actively using the object, we don't want to free
> > > > it -- the partial object allocation being the clearest, if the object
> > > > consists of 2 pages, trying to free page 0 in order to allocate page 1
> > > > has to fail (and the shrinker should find another candidate to reclaim,
> > > > or fail the allocation).
> > > 
> > > mmu notifiers are not for influencing policy of the mm.
> > 
> > It's policy is "this may fail" regardless of the mmu notifier at this
> > point. That is not changed.
> 
> MMU notifiers are for tracking updates, they are not allowed to fail.
> The one slightly weird case of non-blocking is the only exception.
> 
> > Your suggestion is that we move the pages to the unevictable mapping so
> > that the shrinker LRU is never invoked on pages we have grabbed with
> > pin_user_page. Does that work with the rest of the mmu notifiers?
> 
> That is beyond what I'm familiar with - but generally - if you want to
> influence decisions the MM is making then it needs to be at the
> front of the process and not inside notifiers. 
> 
> So what you describe seems broadly appropriate to me.

Sadly, it's a mlock_vma_page problem all over again.
 
> I'm still a little unclear on what you are trying to fix - pinned
> pages are definitely not freed, do you have some case where pages
> which are pinned are being cleaned out from the MM despite being
> pinned? Sounds a bit strange, maybe that is worth adressing directly?

It suffices to say that pin_user_pages does not prevent try_to_unmap_one
from trying to revoke the page. But we could perhaps slip a
page_maybe_dma_pinned() in around there and see what happens.
-Chris
Jason Gunthorpe June 24, 2020, 6:48 p.m. UTC | #11
On Wed, Jun 24, 2020 at 06:58:49PM +0100, Chris Wilson wrote:
> > I'm still a little unclear on what you are trying to fix - pinned
> > pages are definitely not freed, do you have some case where pages
> > which are pinned are being cleaned out from the MM despite being
> > pinned? Sounds a bit strange, maybe that is worth adressing directly?
> 
> It suffices to say that pin_user_pages does not prevent try_to_unmap_one
> from trying to revoke the page.

This doesn't sound right.. Maybe there are some odd ball cases, but in
common things like page out we absolutely cannot page out or discard
pages under DMA. This would be a significant bug.

What is the actual problem here?

Jason

Patch

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index fc68f3570e19..ee1ad008951c 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -48,7 +48,8 @@  enum mmu_notifier_event {
 	MMU_NOTIFY_RELEASE,
 };
 
-#define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
+#define MMU_NOTIFIER_RANGE_BLOCKABLE	BIT(0)
+#define MMU_NOTIFIER_RANGE_MAYFAIL	BIT(1)
 
 struct mmu_notifier_ops {
 	/*
@@ -169,6 +170,12 @@  struct mmu_notifier_ops {
 	 * a non-blocking behavior then the same applies to
 	 * invalidate_range_end.
 	 *
+	 * If mayfail is set then the callback may return -EAGAIN while still
+	 * holding its page references. This flag is set inside direct
+	 * reclaim paths that are opportunistically trying to unmap pages
+	 * from unknown contexts. The callback must be prepared to handle
+	 * the matching invalidate_range_end even after failing the
+	 * invalidate_range_start.
 	 */
 	int (*invalidate_range_start)(struct mmu_notifier *subscription,
 				      const struct mmu_notifier_range *range);
@@ -397,6 +404,12 @@  mmu_notifier_range_blockable(const struct mmu_notifier_range *range)
 	return (range->flags & MMU_NOTIFIER_RANGE_BLOCKABLE);
 }
 
+static inline bool
+mmu_notifier_range_mayfail(const struct mmu_notifier_range *range)
+{
+	return (range->flags & MMU_NOTIFIER_RANGE_MAYFAIL);
+}
+
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
 	if (mm_has_notifiers(mm))
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 352bb9f3ecc0..95b89cee7af4 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -493,6 +493,9 @@  static int mn_hlist_invalidate_range_start(
 			_ret = ops->invalidate_range_start(subscription, range);
 			if (!mmu_notifier_range_blockable(range))
 				non_block_end();
+			if (_ret == -EAGAIN &&
+			    mmu_notifier_range_mayfail(range))
+				_ret = 0;
 			if (_ret) {
 				pr_info("%pS callback failed with %d in %sblockable context.\n",
 					ops->invalidate_range_start, _ret,
diff --git a/mm/rmap.c b/mm/rmap.c
index 5fe2dedce1fc..912b737a3353 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1406,8 +1406,9 @@  static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	 * Note that the page can not be free in this function as call of
 	 * try_to_unmap() must hold a reference on the page.
 	 */
-	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
-				address,
+	mmu_notifier_range_init(&range,
+				MMU_NOTIFY_CLEAR, MMU_NOTIFIER_RANGE_MAYFAIL,
+				vma, vma->vm_mm, address,
 				min(vma->vm_end, address + page_size(page)));
 	if (PageHuge(page)) {
 		/*