From patchwork Thu Feb 15 10:32:03 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13557866
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
    James Morse, Andrey Ryabinin, Andrew Morton, Matthew Wilcox,
    Mark Rutland, David Hildenbrand, Kefeng Wang, John Hubbard, Zi Yan,
    Barry Song <21cnbao@gmail.com>, Alistair Popple, Yang Shi,
    Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    "H. Peter Anvin"
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, x86@kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org
Subject: [PATCH v6 16/18] arm64/mm: Implement pte_batch_hint()
Date: Thu, 15 Feb 2024 10:32:03 +0000
Message-Id: <20240215103205.2607016-17-ryan.roberts@arm.com>
In-Reply-To: <20240215103205.2607016-1-ryan.roberts@arm.com>
References: <20240215103205.2607016-1-ryan.roberts@arm.com>

When core code iterates over a range of ptes and calls ptep_get() for
each of them, if the range happens to cover contpte mappings, the number
of pte reads becomes amplified by a factor of the number of PTEs in a
contpte block. This is because for each call to ptep_get(), the
implementation must read all of the ptes in the contpte block to which
it belongs to gather the access and dirty bits.

This causes a hotspot for fork(), as well as operations that unmap
memory such as munmap(), exit and madvise(MADV_DONTNEED). Fortunately we
can fix this by implementing pte_batch_hint() which allows their
iterators to skip getting the contpte tail ptes when gathering the batch
of ptes to operate on. This results in the number of PTE reads returning
to 1 per pte.

Acked-by: Mark Rutland
Reviewed-by: David Hildenbrand
Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
Acked-by: Catalin Marinas
---
 arch/arm64/include/asm/pgtable.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index a8f1a35e3086..d759a20d2929 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1213,6 +1213,15 @@ static inline void contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
 		__contpte_try_unfold(mm, addr, ptep, pte);
 }
 
+#define pte_batch_hint pte_batch_hint
+static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte)
+{
+	if (!pte_valid_cont(pte))
+		return 1;
+
+	return CONT_PTES - (((unsigned long)ptep >> 3) & (CONT_PTES - 1));
+}
+
 /*
  * The below functions constitute the public API that arm64 presents to the
  * core-mm to manipulate PTE entries within their page tables (or at least this
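
For readers who want to check the index arithmetic in pte_batch_hint()
outside the kernel, here is a rough user-space sketch. It is not part of
the patch and makes several assumptions: CONT_PTES is taken to be 16 (as
for 4K pages), a pte is modelled as a plain 8-byte integer, the
pte_valid_cont() check is reduced to a flag, and batch_hint() and
count_pte_reads() are hypothetical names used only for illustration. The
second helper models a walker in which every simulated ptep_get() on a
contpte-mapped entry costs CONT_PTES reads, as the commit message
describes.

```c
/*
 * User-space sketch only, not kernel code.
 * Assumptions: CONT_PTES is 16 (arm64 with 4K pages), a pte is 8 bytes,
 * and "is this a valid contiguous pte" is reduced to a flag.
 */
#include <assert.h>
#include <stdint.h>

#define CONT_PTES 16u

typedef uint64_t pte_t;	/* stand-in for the kernel's 8-byte pte_t */

/* Hypothetical stand-in for pte_batch_hint(). */
static unsigned int batch_hint(const pte_t *ptep, int is_cont)
{
	/* A pte pointer is expected to be 8-byte aligned. */
	assert(((uintptr_t)ptep & (sizeof(pte_t) - 1)) == 0);

	if (!is_cont)
		return 1;

	/*
	 * ">> 3" turns the byte address into a pte index (8-byte entries);
	 * masking with CONT_PTES - 1 gives the position inside the contpte
	 * block, so the result is the number of ptes left in the block,
	 * counting the current one.
	 */
	return CONT_PTES -
	       (unsigned int)(((uintptr_t)ptep >> 3) & (CONT_PTES - 1));
}

/*
 * Model of a loop over nr ptes that are all contpte-mapped. Each simulated
 * ptep_get() is charged CONT_PTES reads, because it has to visit the whole
 * block to gather the access and dirty bits.
 */
static unsigned long count_pte_reads(const pte_t *ptep, unsigned int nr,
				     int use_hint)
{
	unsigned long reads = 0;
	unsigned int i = 0;

	while (i < nr) {
		reads += CONT_PTES;
		i += use_hint ? batch_hint(ptep + i, 1) : 1;
	}
	return reads;
}
```

With a table aligned to one contpte block (128 bytes under these
assumptions), entry 0 gives a hint of 16, entry 5 gives 11 and entry 15
gives 1. Walking 32 such entries costs 512 modelled reads without the
hint and 32 with it, which is the "1 per pte" figure quoted above.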