From patchwork Mon Jul 6 07:23:29 2020
X-Patchwork-Submitter: Nicholas Piggin
X-Patchwork-Id: 11645109
Date: Mon, 06 Jul 2020 17:23:29 +1000
From: Nicholas Piggin
Subject: [RFC][PATCH] avoid refcounting the lazy tlb mm struct
To: linux-mm@kvack.org
Cc: linux-arch@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, Anton Blanchard
Message-Id: <1594019787.286knc5cet.astroid@bobo.none>

On big systems, the mm refcount can become highly contended when doing
a lot of context switching with threaded applications (particularly
switching between the idle thread and an application thread).
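To make the contention concrete, here is a minimal single-threaded userspace model (not kernel code) of the refcounted lazy tlb scheme this patch avoids. It assumes C11 `<stdatomic.h>`. The `refcount_mm` type and the `rc_*` helpers are hypothetical stand-ins for `struct mm_struct`, `mmgrab()`, `mmdrop()` and the user -> kernel / kernel -> user cases of `context_switch()`. The model also simplifies one detail: the real drop is deferred to `finish_task_switch()`. Every round trip through the idle thread costs two atomic read-modify-writes on the same `mm_count`. When many CPUs do this for one threaded process, that cache line bounces between them.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mm_struct: only mm_count is modelled. */
struct refcount_mm {
	atomic_int mm_count;	/* like mm->mm_count ("grab" references) */
	bool freed;		/* set when the last reference is dropped */
};

static void rc_mm_init(struct refcount_mm *mm)
{
	atomic_init(&mm->mm_count, 1);	/* the owning process's reference */
	mm->freed = false;
}

/* Like mmgrab(): one atomic increment on the shared counter. */
static void rc_mmgrab(struct refcount_mm *mm)
{
	atomic_fetch_add(&mm->mm_count, 1);
}

/* Like mmdrop(): atomic decrement; whoever drops the last ref frees it. */
static void rc_mmdrop(struct refcount_mm *mm)
{
	/* atomic_fetch_sub returns the old value: 1 means this was the last ref */
	if (atomic_fetch_sub(&mm->mm_count, 1) == 1)
		mm->freed = true;	/* stands in for __mmdrop() */
}

/* user -> kernel (app thread -> idle): keep the app's mm as active_mm. */
static void rc_switch_user_to_kernel(struct refcount_mm *active_mm)
{
	rc_mmgrab(active_mm);
}

/* kernel -> user (idle -> app thread): release the lazily held mm. */
static void rc_switch_kernel_to_user(struct refcount_mm *lazy_mm)
{
	rc_mmdrop(lazy_mm);
}
```

A balanced grab/drop pair leaves the count unchanged. The cost lies in the shared atomic traffic, not in the bookkeeping.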

Not doing lazy tlb at all slows switching down quite a bit, so I wonder
if we can avoid the refcount for the lazy tlb, but have __mmdrop() IPI
all CPUs that might be using this mm lazily.

This patch has only had light testing so far, but seems to work okay.

Thanks,
Nick

Tested-by: Anton Blanchard

diff --git a/arch/Kconfig b/arch/Kconfig
index 8cc35dc556c7..69ea7172db3d 100644
--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -411,6 +411,16 @@ config MMU_GATHER_NO_GATHER
 	bool
 	depends on MMU_GATHER_TABLE_FREE
 
+config MMU_LAZY_TLB_SHOOTDOWN
+	bool
+	help
+	  Instead of refcounting the "lazy tlb" mm struct, which can cause
+	  contention with multi-threaded apps on large multiprocessor systems,
+	  this option causes __mmdrop to IPI all CPUs in the mm_cpumask and
+	  switch to init_mm if they were using the to-be-freed mm as the lazy
+	  tlb. Architectures which do not track all possible lazy tlb CPUs in
+	  mm_cpumask can not use this (without modification).
+
 config ARCH_HAVE_NMI_SAFE_CMPXCHG
 	bool
 
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 920c4e3ca4ef..24ac85c868db 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -225,6 +225,7 @@ config PPC
 	select HAVE_PERF_USER_STACK_DUMP
 	select MMU_GATHER_RCU_TABLE_FREE
 	select MMU_GATHER_PAGE_SIZE
+	select MMU_LAZY_TLB_SHOOTDOWN
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC_BOOK3S_64 && CPU_LITTLE_ENDIAN
 	select HAVE_SYSCALL_TRACEPOINTS
diff --git a/arch/powerpc/mm/book3s64/radix_tlb.c b/arch/powerpc/mm/book3s64/radix_tlb.c
index b5cc9b23cf02..52730629b3eb 100644
--- a/arch/powerpc/mm/book3s64/radix_tlb.c
+++ b/arch/powerpc/mm/book3s64/radix_tlb.c
@@ -652,10 +652,10 @@ static void do_exit_flush_lazy_tlb(void *arg)
 		 * Must be a kernel thread because sender is single-threaded.
 		 */
 		BUG_ON(current->mm);
-		mmgrab(&init_mm);
+		mmgrab_lazy_tlb(&init_mm);
 		switch_mm(mm, &init_mm, current);
 		current->active_mm = &init_mm;
-		mmdrop(mm);
+		mmdrop_lazy_tlb(mm);
 	}
 	_tlbiel_pid(pid, RIC_FLUSH_ALL);
 }
diff --git a/fs/exec.c b/fs/exec.c
index e6e8a9a70327..6c96c8feba1f 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1119,7 +1119,7 @@ static int exec_mmap(struct mm_struct *mm)
 		mmput(old_mm);
 		return 0;
 	}
-	mmdrop(active_mm);
+	mmdrop_lazy_tlb(active_mm);
 	return 0;
 }
 
diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index 480a4d1b7dd8..ef28059086a1 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -51,6 +51,25 @@ static inline void mmdrop(struct mm_struct *mm)
 
 void mmdrop(struct mm_struct *mm);
 
+static inline void mmgrab_lazy_tlb(struct mm_struct *mm)
+{
+	if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+		mmgrab(mm);
+}
+
+static inline void mmdrop_lazy_tlb(struct mm_struct *mm)
+{
+	if (!IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+		mmdrop(mm);
+}
+
+static inline void mmdrop_lazy_tlb_smp_mb(struct mm_struct *mm)
+{
+	mmdrop_lazy_tlb(mm);
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN))
+		smp_mb();
+}
+
 /*
  * This has to be called after a get_task_mm()/mmget_not_zero()
  * followed by taking the mmap_lock for writing before modifying the
diff --git a/kernel/fork.c b/kernel/fork.c
index 142b23645d82..e3f1039cee9f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -685,6 +685,34 @@ static void check_mm(struct mm_struct *mm)
 #define allocate_mm()	(kmem_cache_alloc(mm_cachep, GFP_KERNEL))
 #define free_mm(mm)	(kmem_cache_free(mm_cachep, (mm)))
 
+static void do_shoot_lazy_tlb(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	if (current->active_mm == mm) {
+		BUG_ON(current->mm);
+		switch_mm(mm, &init_mm, current);
+		current->active_mm = &init_mm;
+	}
+}
+
+static void do_check_lazy_tlb(void *arg)
+{
+	struct mm_struct *mm = arg;
+
+	BUG_ON(current->active_mm == mm);
+}
+
+void shoot_lazy_tlbs(struct mm_struct *mm)
+{
+	if (IS_ENABLED(CONFIG_MMU_LAZY_TLB_SHOOTDOWN)) {
+		smp_call_function_many(mm_cpumask(mm), do_shoot_lazy_tlb, (void *)mm, 1);
+		do_shoot_lazy_tlb(mm);
+	}
+	smp_call_function(do_check_lazy_tlb, (void *)mm, 1);
+	do_check_lazy_tlb(mm);
+}
+
 /*
  * Called when the last reference to the mm
  * is dropped: either by a lazy thread or by
@@ -692,6 +720,7 @@ static void check_mm(struct mm_struct *mm)
  */
 void __mmdrop(struct mm_struct *mm)
 {
+	shoot_lazy_tlbs(mm);
 	BUG_ON(mm == &init_mm);
 	WARN_ON_ONCE(mm == current->mm);
 	WARN_ON_ONCE(mm == current->active_mm);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index ca5db40392d4..4d615e0be9e0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -3308,7 +3308,7 @@ static struct rq *finish_task_switch(struct task_struct *prev)
 	 */
 	if (mm) {
 		membarrier_mm_sync_core_before_usermode(mm);
-		mmdrop(mm);
+		mmdrop_lazy_tlb_smp_mb(mm);
 	}
 	if (unlikely(prev_state == TASK_DEAD)) {
 		if (prev->sched_class->task_dead)
@@ -3413,9 +3413,9 @@ context_switch(struct rq *rq, struct task_struct *prev,
 
 	/*
 	 * kernel -> kernel   lazy + transfer active
-	 *   user -> kernel   lazy + mmgrab() active
+	 *   user -> kernel   lazy + mmgrab_lazy_tlb() active
 	 *
-	 * kernel ->   user   switch + mmdrop() active
+	 * kernel ->   user   switch + mmdrop_lazy_tlb() active
 	 *   user ->   user   switch
 	 */
 	if (!next->mm) {                                // to kernel
@@ -3423,7 +3423,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 
 		next->active_mm = prev->active_mm;
 		if (prev->mm)                           // from user
-			mmgrab(prev->active_mm);
+			mmgrab_lazy_tlb(prev->active_mm);
 		else
 			prev->active_mm = NULL;
 	} else {                                        // to user
@@ -3439,7 +3439,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
 		switch_mm_irqs_off(prev->active_mm, next->mm, next);
 
 		if (!prev->mm) {                        // from kernel
-			/* will mmdrop() in finish_task_switch(). */
+			/* will mmdrop_lazy_tlb() in finish_task_switch(). */
 			rq->prev_mm = prev->active_mm;
 			prev->active_mm = NULL;
 		}
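The shootdown that replaces the refcount can be modelled the same way. What follows is a single-threaded userspace sketch under stated assumptions, not the kernel implementation:

- each CPU's `current->mm` / `current->active_mm` becomes an entry in a hypothetical `sketch_cpus[]` array;
- `mm_cpumask()` becomes a 64-bit mask;
- each IPI sent by `smp_call_function_many()` / `smp_call_function()` becomes a direct function call.

The functions mirror `do_shoot_lazy_tlb()`, `do_check_lazy_tlb()` and `shoot_lazy_tlbs()` from the kernel/fork.c hunk. The model omits `switch_mm()` itself, memory ordering, and the separate handling of the local CPU.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 8

/* Hypothetical stand-in for struct mm_struct: only mm_cpumask() is modelled. */
struct sketch_mm {
	uint64_t cpumask;	/* bit n set: CPU n may be using this mm */
};

/* Hypothetical per-CPU state: what "current" looks like on that CPU. */
struct sketch_cpu {
	struct sketch_mm *mm;		/* current->mm, NULL for a kernel thread */
	struct sketch_mm *active_mm;	/* current->active_mm */
};

static struct sketch_mm sketch_init_mm;
static struct sketch_cpu sketch_cpus[SKETCH_NR_CPUS];

/* Mirrors do_shoot_lazy_tlb(): executed "on" one CPU. */
static void sketch_shoot_lazy_tlb(int cpu, struct sketch_mm *mm)
{
	struct sketch_cpu *c = &sketch_cpus[cpu];

	if (c->active_mm == mm) {
		/* Only a kernel thread can hold mm lazily (the patch's BUG_ON). */
		assert(c->mm == NULL);
		/* Stands in for switch_mm(mm, &init_mm, current). */
		c->active_mm = &sketch_init_mm;
	}
}

/* Mirrors do_check_lazy_tlb(): no CPU may still be using mm. */
static void sketch_check_lazy_tlb(int cpu, struct sketch_mm *mm)
{
	assert(sketch_cpus[cpu].active_mm != mm);
}

/* Mirrors shoot_lazy_tlbs(): "IPI" CPUs in the mask, then check every CPU. */
static void sketch_shoot_lazy_tlbs(struct sketch_mm *mm)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		if (mm->cpumask & (UINT64_C(1) << cpu))
			sketch_shoot_lazy_tlb(cpu, mm);

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++)
		sketch_check_lazy_tlb(cpu, mm);
}
```

For example, suppose CPU 1 runs an idle thread with `active_mm` pointing at an application's mm and has its bit set in the mask. After `sketch_shoot_lazy_tlbs()`, CPU 1 is left on `sketch_init_mm`.

The final check corresponds to the restriction in the Kconfig help text. If a CPU holding the mm lazily is missing from the mask, the shootdown skips it and the check fails. The patch's `do_check_lazy_tlb()` would hit its `BUG_ON()` in the same situation.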