From patchwork Thu Feb 15 12:17:52 2024
X-Patchwork-Submitter: Ryan Roberts
X-Patchwork-Id: 13558300
From: Ryan Roberts
To: David Hildenbrand, Mark Rutland, Catalin Marinas, Will Deacon,
    Alexander Shishkin, Jiri Olsa, Ian Rogers, Adrian Hunter,
    Andrew Morton, Muchun Song
Cc: Ryan Roberts, linux-arm-kernel@lists.infradead.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: [RFC PATCH v1 0/4] Reduce cost of ptep_get_lockless on arm64
Date: Thu, 15 Feb 2024 12:17:52 +0000
Message-Id: <20240215121756.2734131-1-ryan.roberts@arm.com>

This is an RFC for a series that aims to reduce the cost and complexity of
ptep_get_lockless() for arm64 when supporting transparent contpte mappings
[1]. The approach came from discussion with Mark and David [2].

It introduces a new helper, ptep_get_lockless_norecency(), which allows the
access and dirty bits in the returned pte to be incorrect.
This relaxation permits arm64's implementation to just read the single
target pte, and avoids having to iterate over the full contpte block to
gather the access and dirty bits, for the contpte case.

It turns out that none of the call sites using ptep_get_lockless() require
accurate access and dirty bit information, so we can also convert those
sites, although a couple of places need care (see patches 2 and 3).

Arguably patch 3 is a bit fragile, given the wide accessibility of
vmf->orig_pte. So it might make sense to drop this patch and stick to using
ptep_get_lockless() in the page fault path. I'm keen to hear opinions.

I've chosen the name "recency" because it's shortish and somewhat
descriptive, and is already used in a couple of places to mean similar
things (see mglru and damon). I'm open to other names if anyone has better
ideas.

If the consensus is that this approach is generally acceptable, I intend to
create a series in future to do a similar thing with ptep_get() ->
ptep_get_norecency().

---

This series applies on top of [1].

[1] https://lore.kernel.org/linux-mm/20240215103205.2607016-1-ryan.roberts@arm.com/
[2] https://lore.kernel.org/linux-mm/a91cfe1c-289e-4828-8cfc-be34eb69a71b@redhat.com/

Thanks,
Ryan

Ryan Roberts (4):
  mm: Introduce ptep_get_lockless_norecency()
  mm/gup: Use ptep_get_lockless_norecency()
  mm/memory: Use ptep_get_lockless_norecency() for orig_pte
  arm64/mm: Override ptep_get_lockless_norecency()

 arch/arm64/include/asm/pgtable.h |  6 ++++
 include/linux/pgtable.h          | 55 ++++++++++++++++++++++++++++--
 kernel/events/core.c             |  2 +-
 mm/gup.c                         |  7 ++--
 mm/hugetlb.c                     |  2 +-
 mm/khugepaged.c                  |  2 +-
 mm/memory.c                      | 57 ++++++++++++++++++++------------
 mm/swap_state.c                  |  2 +-
 mm/swapfile.c                    |  2 +-
 9 files changed, 102 insertions(+), 33 deletions(-)

-- 
2.25.1
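
For readers unfamiliar with the trade-off described in the cover letter, the
following is a minimal userspace sketch of the idea, not the kernel code from
this series. Everything in it is an assumption made for illustration: the
model_* function names, the bit positions of PTE_YOUNG, PTE_DIRTY and PTE_CONT
(which do not match the real arm64 descriptor layout), and the block size of
16 entries. It is also single-threaded, so it says nothing about the
consistency checks a real lockless read needs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model values; NOT the real arm64 pte layout. */
#define CONT_PTES 16            /* assumed entries per contpte block */
#define PTE_YOUNG (1ull << 0)   /* assumed "accessed" bit */
#define PTE_DIRTY (1ull << 1)   /* assumed "dirty" bit */
#define PTE_CONT  (1ull << 2)   /* assumed "part of a contiguous block" bit */

typedef uint64_t pte_t;

/*
 * Accurate read: for a contpte entry, walk the whole block and OR in the
 * access/dirty bits of every entry, since any one of them may carry them.
 */
static pte_t model_get_accurate(const pte_t *table, size_t idx)
{
	pte_t pte = table[idx];
	size_t start, i;

	if (!(pte & PTE_CONT))
		return pte;

	/* Align down to the first entry of the block. */
	start = idx & ~(size_t)(CONT_PTES - 1);
	for (i = 0; i < CONT_PTES; i++)
		pte |= table[start + i] & (PTE_YOUNG | PTE_DIRTY);

	return pte;
}

/*
 * Relaxed read: return only the target entry. The access/dirty bits in the
 * result may not reflect the rest of the block, so callers must not rely
 * on them.
 */
static pte_t model_get_norecency(const pte_t *table, size_t idx)
{
	return table[idx];
}
```

In this model, if only entry 5 of a 16-entry block has the young and dirty
bits set, model_get_accurate(table, 0) reports both bits after reading 16
entries, while model_get_norecency(table, 0) reads one entry and reports
neither. That is acceptable only for callers that ignore those bits.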