From patchwork Thu Oct 3 01:33:15 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Leonardo Bras
X-Patchwork-Id: 11171957
From: Leonardo Bras <leonardo@linux.ibm.com>
To: linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	kvm-ppc@vger.kernel.org, linux-arch@vger.kernel.org, linux-mm@kvack.org
Cc: Leonardo Bras, Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman,
	Arnd Bergmann, "Aneesh Kumar K.V", Christophe Leroy, Nicholas Piggin,
	Andrew Morton, Mahesh Salgaonkar, Reza Arbab, Santosh Sivaraj,
	Balbir Singh, Thomas Gleixner, Greg Kroah-Hartman, Mike Rapoport,
	Allison Randal, Jason Gunthorpe, Dan Williams, Vlastimil Babka,
	Christoph Lameter, Logan Gunthorpe, Andrey Ryabinin, Alexey Dobriyan,
	Souptick Joarder, Mathieu Desnoyers, Ralph Campbell,
	Jesper Dangaard Brouer, Jann Horn, Davidlohr Bueso,
	"Peter Zijlstra (Intel)", Ingo Molnar, Christian Brauner, Michal Hocko,
	Elena Reshetova, Roman Gushchin, Andrea Arcangeli, Al Viro,
	"Dmitry V. Levin", Jérôme Glisse, Song Liu, Bartlomiej Zolnierkiewicz,
	Ira Weiny, "Kirill A. Shutemov", John Hubbard, Keith Busch
Subject: [PATCH v5 01/11] asm-generic/pgtable: Adds generic functions to monitor lockless pgtable walks
Date: Wed, 2 Oct 2019 22:33:15 -0300
Message-Id: <20191003013325.2614-2-leonardo@linux.ibm.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20191003013325.2614-1-leonardo@linux.ibm.com>
References: <20191003013325.2614-1-leonardo@linux.ibm.com>
MIME-Version: 1.0

It's necessary to monitor lockless pagetable walks in order to avoid
doing THP splitting/collapsing during them.

Some methods rely on irq enable/disable, but that can be slow in cases
where many CPUs are used by the process, given that all these CPUs have
to run an IPI.

To speed up such cases, I propose a refcount-based approach that counts
the number of lockless pagetable walks happening in the process. If this
count is zero, the irq-oriented method can be skipped.

Given that there are lockless pagetable walks in generic code, it's
necessary to create documented generic functions that should be enough
for most archs, while staying open to arch-specific implementations.

This method does not exclude the current irq-oriented method. It works
as a complement to skip unnecessary waiting.
begin_lockless_pgtbl_walk(mm)
	Insert before starting any lockless pgtable walk
end_lockless_pgtbl_walk(mm)
	Insert after the end of any lockless pgtable walk
	(Mostly after the ptep is last used)
running_lockless_pgtbl_walk(mm)
	Returns the number of lockless pgtable walks running

While the config option is not set, the method is disabled and these
functions only do what was already needed by lockless pagetable walks
(disabling interrupts). A memory barrier was also added just to make
sure there is no speculative read outside the interrupt-disabled area.

Signed-off-by: Leonardo Bras <leonardo@linux.ibm.com>
---
 include/asm-generic/pgtable.h | 58 +++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h      | 11 +++++++
 kernel/fork.c                 |  3 ++
 3 files changed, 72 insertions(+)

diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 818691846c90..3043ea9812d5 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1171,6 +1171,64 @@ static inline bool arch_has_pfn_modify_check(void)
 #endif
 #endif
 
+#ifndef __HAVE_ARCH_LOCKLESS_PGTBL_WALK_CONTROL
+static inline unsigned long begin_lockless_pgtbl_walk(struct mm_struct *mm)
+{
+	unsigned long irq_mask;
+
+	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING))
+		atomic_inc(&mm->lockless_pgtbl_walkers);
+
+	/*
+	 * Interrupts must be disabled during the lockless page table walk.
+	 * That's because the deleting or splitting involves flushing TLBs,
+	 * which in turn issues interrupts, that will block when disabled.
+	 */
+	local_irq_save(irq_mask);
+
+	/*
+	 * This memory barrier pairs with any code that is either trying to
+	 * delete page tables, or split huge pages. Without this barrier,
+	 * the page tables could be read speculatively outside of interrupt
+	 * disabling.
+	 */
+	smp_mb();
+
+	return irq_mask;
+}
+
+static inline void end_lockless_pgtbl_walk(struct mm_struct *mm,
+					   unsigned long irq_mask)
+{
+	/*
+	 * This memory barrier pairs with any code that is either trying to
+	 * delete page tables, or split huge pages. Without this barrier,
+	 * the page tables could be read speculatively outside of interrupt
+	 * disabling.
+	 */
+	smp_mb();
+
+	/*
+	 * Interrupts must be disabled during the lockless page table walk.
+	 * That's because the deleting or splitting involves flushing TLBs,
+	 * which in turn issues interrupts, that will block when disabled.
+	 */
+	local_irq_restore(irq_mask);
+
+	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING))
+		atomic_dec(&mm->lockless_pgtbl_walkers);
+}
+
+static inline int running_lockless_pgtbl_walk(struct mm_struct *mm)
+{
+	if (IS_ENABLED(CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING))
+		return atomic_read(&mm->lockless_pgtbl_walkers);
+
+	/* If disabled, must return > 0, so it falls back to sync method */
+	return 1;
+}
+#endif
+
 /*
  * On some architectures it depends on the mm if the p4d/pud or pmd
  * layer of the page table hierarchy is folded or not.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 2222fa795284..277462f0b4fd 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -521,6 +521,17 @@ struct mm_struct {
 		struct work_struct async_put_work;
 	} __randomize_layout;
 
+#ifdef CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING
+	/*
+	 * Number of callers who are doing a lockless walk of the
+	 * page tables. Typically arches might enable this in order to
+	 * help optimize performance, by possibly avoiding expensive
+	 * IPIs at the wrong times.
+	 */
+	atomic_t lockless_pgtbl_walkers;
+
+#endif
+
 	/*
 	 * The mm_cpumask needs to be at the end of mm_struct, because it
 	 * is dynamically sized based on nr_cpu_ids.
diff --git a/kernel/fork.c b/kernel/fork.c
index f9572f416126..2cbca867f5a5 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1029,6 +1029,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 #endif
 	mm_init_uprobes_state(mm);
 
+#ifdef CONFIG_LOCKLESS_PAGE_TABLE_WALK_TRACKING
+	atomic_set(&mm->lockless_pgtbl_walkers, 0);
+#endif
 	if (current->mm) {
 		mm->flags = current->mm->flags & MMF_INIT_MASK;
 		mm->def_flags = current->mm->def_flags & VM_INIT_DEF_MASK;