From patchwork Fri Feb 2 08:07:54 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13542358
From: Ryan Roberts
To: Catalin Marinas, Will Deacon, Ard Biesheuvel, Marc Zyngier,
 James Morse, Andrey Ryabinin, Andrew Morton, Matthew Wilcox,
 Mark Rutland, David Hildenbrand, Kefeng Wang, John Hubbard, Zi Yan,
 Barry Song <21cnbao@gmail.com>, Alistair Popple, Yang Shi,
 Nicholas Piggin, Christophe Leroy, "Aneesh Kumar K.V",
 "Naveen N. Rao", Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, "H. Peter Anvin"
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org, x86@kernel.org,
 linuxppc-dev@lists.ozlabs.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH v5 23/25] arm64/mm: Implement pte_batch_hint()
Date: Fri, 2 Feb 2024 08:07:54 +0000
Message-Id: <20240202080756.1453939-24-ryan.roberts@arm.com>
X-Mailer: git-send-email 2.25.1
In-Reply-To: <20240202080756.1453939-1-ryan.roberts@arm.com>
References: <20240202080756.1453939-1-ryan.roberts@arm.com>

When core code iterates over a range of ptes and calls ptep_get() for
each of them, if the range happens to cover contpte mappings, the number
of pte reads becomes amplified by a factor of the number of PTEs in a
contpte block. This is because for each call to ptep_get(), the
implementation must read all of the ptes in the contpte block to which
it belongs to gather the access and dirty bits.

This causes a hotspot for fork(), as well as operations that unmap
memory such as munmap(), exit and madvise(MADV_DONTNEED). Fortunately we
can fix this by implementing pte_batch_hint() which allows their
iterators to skip getting the contpte tail ptes when gathering the batch
of ptes to operate on. This results in the number of PTE reads returning
to 1 per pte.

Tested-by: John Hubbard
Signed-off-by: Ryan Roberts
Reviewed-by: David Hildenbrand
Acked-by: Mark Rutland
---
 arch/arm64/include/asm/pgtable.h | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index ad04adb7b87f..353ea67b5d75 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1220,6 +1220,15 @@ static inline void contpte_try_unfold(struct mm_struct *mm, unsigned long addr,
 		__contpte_try_unfold(mm, addr, ptep, pte);
 }
 
+#define pte_batch_hint pte_batch_hint
+static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte)
+{
+	if (!pte_valid_cont(pte))
+		return 1;
+
+	return CONT_PTES - (((unsigned long)ptep >> 3) & (CONT_PTES - 1));
+}
+
 /*
  * The below functions constitute the public API that arm64 presents to the
  * core-mm to manipulate PTE entries within their page tables (or at least this
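
For illustration only, the index arithmetic in pte_batch_hint() and its
effect on the read count can be modelled in userspace. This is a rough
sketch, not kernel code: it assumes 4K pages (so CONT_PTES is 16) and
8-byte ptes (hence the shift by 3), and the names PTE_SHIFT,
batch_hint_model() and raw_pte_reads() are hypothetical helpers that do
not exist in the kernel. The walker is a simplified stand-in for a
core-mm iterator, and it treats every pte in the range as either contpte
or not.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4K pages, where 16 ptes form one contpte block. */
#define CONT_PTES 16u
/* Assumption: each pte is 8 bytes, which is why the patch shifts by 3. */
#define PTE_SHIFT 3

/*
 * Model of pte_batch_hint(): given the address of a pte slot and whether
 * the pte is a valid contiguous entry, return the number of ptes from this
 * slot up to and including the last slot of its contpte block.
 */
static unsigned int batch_hint_model(uintptr_t ptep_addr, int is_valid_cont)
{
	if (!is_valid_cont)
		return 1;

	return CONT_PTES - ((ptep_addr >> PTE_SHIFT) & (CONT_PTES - 1));
}

/*
 * Hypothetical walker: visit nr ptes starting at ptep_addr and count raw
 * pte reads. Each ptep_get()-style lookup on a contpte entry is charged
 * CONT_PTES reads, because it scans the whole block for access/dirty bits.
 * With use_hint set, the walker skips the tail ptes of the block.
 */
static unsigned int raw_pte_reads(uintptr_t ptep_addr, unsigned int nr,
				  int is_valid_cont, int use_hint)
{
	unsigned int reads = 0;

	while (nr) {
		unsigned int step = use_hint ?
			batch_hint_model(ptep_addr, is_valid_cont) : 1;

		if (step > nr)
			step = nr;
		reads += is_valid_cont ? CONT_PTES : 1;
		ptep_addr += (uintptr_t)step << PTE_SHIFT;
		nr -= step;
	}
	return reads;
}
```

Under these assumptions, walking one 16-entry contpte block that starts
at a block-aligned slot costs 256 raw reads without the hint and 16 with
it, which is the "1 per pte" figure from the commit message. A slot at
index 5 within its block yields a hint of 11, and a non-contiguous pte
always yields 1.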