mm, oom: Always call tlb_finish_mmu().

Message ID 1535023848-5554-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp (mailing list archive)
State New, archived
Series mm, oom: Always call tlb_finish_mmu().

Commit Message

Tetsuo Handa Aug. 23, 2018, 11:30 a.m. UTC
Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
notifiers") added a "continue;" statement without calling
tlb_finish_mmu(). I don't know whether the resulting tlb_flush_pending
imbalance causes problems other than extra cost, but at least it looks
strange.
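
For reference, a minimal sketch of the pre-fix flow in
__oom_reap_task_mm() (simplified from mm/oom_kill.c as of that commit;
the vma filtering is omitted), showing how the failure path leaks the
pending count:

	struct mmu_gather tlb;

	for (vma = mm->mmap; vma; vma = vma->vm_next) {
		const unsigned long start = vma->vm_start;
		const unsigned long end = vma->vm_end;

		tlb_gather_mmu(&tlb, mm, start, end);	/* inc_tlb_flush_pending(mm) */
		if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
			ret = false;
			/* skips tlb_finish_mmu(), so mm->tlb_flush_pending stays elevated */
			continue;
		}
		unmap_page_range(&tlb, vma, start, end, NULL);
		mmu_notifier_invalidate_range_end(mm, start, end);
		tlb_finish_mmu(&tlb, start, end);	/* dec_tlb_flush_pending(mm) */
	}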

A more worrisome part of that patch is that I don't know whether using
a trylock when blockable == false at entry is really sufficient. For
example, the call chain mn_invl_range_start() -> unmap_if_in_range() ->
unmap_grant_pages() -> __unmap_grant_pages() ->
gnttab_unmap_refs_sync() -> gnttab_unmap_refs_async() ->
__gnttab_unmap_refs_async() involves schedule_delayed_work(), which
could be blocked on memory allocation under an OOM situation; might the
wait_for_completion() in gnttab_unmap_refs_sync() then deadlock?
I don't know...
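
For reference, the pattern in question, heavily simplified from
drivers/xen/grant-table.c (field setup and error handling omitted):

	/* gnttab_unmap_refs_sync(), roughly */
	struct unmap_refs_callback_data data;

	init_completion(&data.completion);
	item->data = &data;
	item->done = &unmap_refs_callback;	/* fires complete(&data.completion) */
	gnttab_unmap_refs_async(item);		/* may schedule_delayed_work() while pages are busy */
	wait_for_completion(&data.completion);	/* blocks until the deferred work finishes */

The concern is that if the deferred work cannot make progress under
OOM, the wait_for_completion() above never returns.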

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 mm/oom_kill.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Tetsuo Handa Aug. 23, 2018, 1:48 p.m. UTC | #1
On 2018/08/23 20:59, Michal Hocko wrote:
> On Thu 23-08-18 20:30:48, Tetsuo Handa wrote:
>> Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
>> notifiers") added a "continue;" statement without calling
>> tlb_finish_mmu(). I don't know whether the resulting tlb_flush_pending
>> imbalance causes problems other than extra cost, but at least it looks
>> strange.
> 
> tlb_flush_pending has mm scope, so an imbalance would confuse
> mm_tlb_flush_pending(). At least ptep_clear_flush() could get confused
> and flush unnecessarily for prot_none entries AFAICS. Other paths
> shouldn't trigger for oom victims, and even ptep_clear_flush() is
> unlikely to happen. So nothing really earth-shattering, but I do agree
> that it looks weird and should be fixed.
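
For reference, assuming I am reading mm/memory.c and
arch/x86/include/asm/pgtable.h correctly: tlb_gather_mmu() ends with
inc_tlb_flush_pending(mm) and tlb_finish_mmu() ends with
dec_tlb_flush_pending(mm), so a skipped tlb_finish_mmu() leaves the
counter elevated. On x86, pte_accessible() then keeps reporting
prot_none ptes as accessible:

	static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
	{
		if (pte_flags(a) & _PAGE_PRESENT)
			return true;

		if ((pte_flags(a) & _PAGE_PROTNONE) && mm_tlb_flush_pending(mm))
			return true;

		return false;
	}

which is what makes ptep_clear_flush() flush unnecessarily.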

OK. But what is the reason we call tlb_gather_mmu() before
mmu_notifier_invalidate_range_start_nonblock()? I want the fix to
explain why we can't do

-			tlb_gather_mmu(&tlb, mm, start, end);
 			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
 				ret = false;
 				continue;
 			}
+			tlb_gather_mmu(&tlb, mm, start, end);

instead.
Michal Hocko Aug. 23, 2018, 2:02 p.m. UTC | #2
On Thu 23-08-18 22:48:22, Tetsuo Handa wrote:
> On 2018/08/23 20:59, Michal Hocko wrote:
> > On Thu 23-08-18 20:30:48, Tetsuo Handa wrote:
> >> Commit 93065ac753e44438 ("mm, oom: distinguish blockable mode for mmu
> >> notifiers") added a "continue;" statement without calling
> >> tlb_finish_mmu(). I don't know whether the resulting tlb_flush_pending
> >> imbalance causes problems other than extra cost, but at least it looks
> >> strange.
> > 
> > tlb_flush_pending has mm scope, so an imbalance would confuse
> > mm_tlb_flush_pending(). At least ptep_clear_flush() could get confused
> > and flush unnecessarily for prot_none entries AFAICS. Other paths
> > shouldn't trigger for oom victims, and even ptep_clear_flush() is
> > unlikely to happen. So nothing really earth-shattering, but I do agree
> > that it looks weird and should be fixed.
> 
> OK. But what is the reason we call tlb_gather_mmu() before
> mmu_notifier_invalidate_range_start_nonblock()? I want the fix to
> explain why we can't do
> 
> -			tlb_gather_mmu(&tlb, mm, start, end);
>  			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
>  				ret = false;
>  				continue;
>  			}
> +			tlb_gather_mmu(&tlb, mm, start, end);

This should indeed be doable, because mmu notifiers have no way to know
about tlb_gather. I have no idea why we set up tlb_gather_mmu like that
before; most probably a copy&paste from the munmap path, where it
didn't make any difference either. A quick check shows that
tlb_flush_pending is the only mm-scope state, and none of the notifiers
really depend on it.

I would be calmer if both paths were in sync in that regard. So I think
it would be better to go with your previous version first. Maybe it
makes sense to switch the order, but I do not really see a huge win in
doing so.

Patch

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index b5b25e4..4f431c1 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -522,6 +522,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 
 			tlb_gather_mmu(&tlb, mm, start, end);
 			if (mmu_notifier_invalidate_range_start_nonblock(mm, start, end)) {
+				tlb_finish_mmu(&tlb, start, end);
 				ret = false;
 				continue;
 			}