From patchwork Tue Jan 4 20:22:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Yu Zhao X-Patchwork-Id: 12703821 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from bombadil.infradead.org (bombadil.infradead.org [198.137.202.133]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id A317DC433EF for ; Tue, 4 Jan 2022 20:25:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=lists.infradead.org; s=bombadil.20210309; h=Sender: Content-Transfer-Encoding:Content-Type:List-Subscribe:List-Help:List-Post: List-Archive:List-Unsubscribe:List-Id:Cc:To:From:Subject:References: Mime-Version:Message-Id:In-Reply-To:Date:Reply-To:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Owner; bh=A/itJqn60/+aEvyjzhYk1v+/1SZWCXP02OHylisEq4Q=; b=qL4IJlKbkNCUmiiGnbuCFtVSQo nXzeYX6Fpf+v9ZOdrhxePwYRkgFjgKS5rINV8KLrdJnMK1PyZHXE8CLVibEmQk1SWRV3A0r9DKPwY GzhsUvGrmHXex40/z3rvj0selXogWemNxckJHK+1w9yqcc4/JfN0x+o0h8cwhW4oT8CZiEkAlPqzD ziQuDU91CCTHTXRoPwXi70VmzVv3eOUpUW9juNCLd1veqGIc30b4OOqejW/cTolED4O/1fkHOaAGK VkYBTwUrUAjfnaj3ChJSJPZzwtfH86K+PvzeT83LqBP1A/v1HBu8ixSu498a+xOBLVrOcRQACh6Mm C472Wq5Q==; Received: from localhost ([::1] helo=bombadil.infradead.org) by bombadil.infradead.org with esmtp (Exim 4.94.2 #2 (Red Hat Linux)) id 1n4qLQ-00CmWE-6a; Tue, 04 Jan 2022 20:23:44 +0000 Received: from mail-yb1-xb49.google.com ([2607:f8b0:4864:20::b49]) by bombadil.infradead.org with esmtps (Exim 4.94.2 #2 (Red Hat Linux)) id 1n4qL7-00CmHv-1i for linux-arm-kernel@lists.infradead.org; Tue, 04 Jan 2022 20:23:27 +0000 Received: by mail-yb1-xb49.google.com with SMTP id n2-20020a255902000000b0060f9d75eafeso13160664ybb.1 for ; Tue, 04 Jan 2022 12:23:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=2fzYgKHjTtrxGw0GB60HtJq3pkJVmwXmT5LDUgkS65M=; b=YI7VLpCNyyiUbnArPXH4Vog2iUj3Py1GveGy5aNLD018G6UTJ5D220cf3S8iAqFZM+ ChAgVvl7odJAso8iiC8Gk0jW6zt3nXSLXLFwhahNv16mu72Qr/tPjaoNxJsh0sf99Tog zzLbdc/omE1wmDBCYoYXcsX9KI5r+qZclmfYOKgcEEVYtRo4fTCUb/AqeQpCTi24srgE WXXwVoof997pmI299Q4VRR809/TbKAPHfWcDdJ+pKT66zHCqIfwmnvDPJtv8NppW1v23 kezUSb2GfdvOVZ/hNDqh3QznuHfUtxOBxEaueZjlEIM0iuTtXQeA0DXfSB7YiBOdpJQ5 BHIA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=2fzYgKHjTtrxGw0GB60HtJq3pkJVmwXmT5LDUgkS65M=; b=zwvvxUJs+b4KMlcpwwr40C0WAO/aG4ntlzAcyCS8iAmJIM6pjDIzPmNiCW33TMtEyA DYEk1oF1UkeBJAVtOguJd98ejCh1ztQkOWTQXm3bk4aqY9mTv46U3k+W7y6elsxfHkOw 8gZugXbFnQSmAbj/yAf5pMAaHHtcPtpXgFgiLPbpHfXTmFgpR2pP+OcZ4PK5SG1+UZGy DzFEK8GJ/N6bAA7x1IhJS2knJwwLigDhdcGJFzyQ96Okc5sp2BL8r6z0et0CzEnvwiHF +tLGR/uBtVBlGRuANBnxHGXI2ZQr7U96+bwxF6mtfjzgBuNy9lnVu0ozEIdOz/zcdAgD vWwg== X-Gm-Message-State: AOAM530udzPxd+rgm7Ay8siz1Y6BAwk7U8mJa1Zj1hOZ01tUzZVen/6+ i8evHMfV1+zLzl3XOfiaFaNu3R0L0QU= X-Google-Smtp-Source: ABdhPJzJjG/RD50GN61XoI5Bcnp6HfOxLQQyP8uuoJlHDydVLPIBL6v9CM3RuZykJkxMwwXnrdc0GXLYZr4= X-Received: from yuzhao.bld.corp.google.com ([2620:15c:183:200:6c8c:5506:7ca2:9dfd]) (user=yuzhao job=sendgmr) by 2002:a25:84c2:: with SMTP id x2mr17042809ybm.51.1641327802572; Tue, 04 Jan 2022 12:23:22 -0800 (PST) Date: Tue, 4 Jan 2022 13:22:20 -0700 In-Reply-To: <20220104202227.2903605-1-yuzhao@google.com> Message-Id: <20220104202227.2903605-2-yuzhao@google.com> Mime-Version: 1.0 References: <20220104202227.2903605-1-yuzhao@google.com> X-Mailer: git-send-email 2.34.1.448.ga2b2bfdf31-goog Subject: [PATCH v6 1/9] mm: x86, arm64: add arch_has_hw_pte_young() From: Yu Zhao To: Andrew Morton , Linus Torvalds Cc: Andi Kleen , Catalin Marinas , Dave Hansen , Hillf Danton , Jens Axboe , Jesse Barnes , Johannes Weiner , Jonathan Corbet , Matthew Wilcox , Mel Gorman , Michael Larabel , Michal Hocko , Rik van Riel , Vlastimil Babka , Will Deacon , Ying Huang , linux-arm-kernel@lists.infradead.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, page-reclaim@google.com, x86@kernel.org, Yu Zhao , Konstantin Kharlamov X-CRM114-Version: 20100106-BlameMichelson ( TRE 0.8.0 (BSD) ) MR-646709E3 X-CRM114-CacheID: sfid-20220104_122325_121198_AADA11BE X-CRM114-Status: GOOD ( 22.46 ) X-BeenThere: linux-arm-kernel@lists.infradead.org X-Mailman-Version: 2.1.34 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Sender: "linux-arm-kernel" Errors-To: linux-arm-kernel-bounces+linux-arm-kernel=archiver.kernel.org@lists.infradead.org Some architectures automatically set the accessed bit in PTEs, e.g., x86 and arm64 v8.2. On architectures that don't have this capability, clearing the accessed bit in a PTE usually triggers a page fault following the TLB miss of this PTE. Being aware of this capability can help make better decisions, e.g., whether to spread the work out over a period of time to avoid bursty page faults when trying to clear the accessed bit in a large number of PTEs. Signed-off-by: Yu Zhao Tested-by: Konstantin Kharlamov --- arch/arm64/include/asm/cpufeature.h | 5 +++++ arch/arm64/include/asm/pgtable.h | 13 ++++++++----- arch/arm64/kernel/cpufeature.c | 19 +++++++++++++++++++ arch/arm64/tools/cpucaps | 1 + arch/x86/include/asm/pgtable.h | 6 +++--- include/linux/pgtable.h | 13 +++++++++++++ mm/memory.c | 14 +------------- 7 files changed, 50 insertions(+), 21 deletions(-) diff --git a/arch/arm64/include/asm/cpufeature.h b/arch/arm64/include/asm/cpufeature.h index ef6be92b1921..99518b4b2a9e 100644 --- a/arch/arm64/include/asm/cpufeature.h +++ b/arch/arm64/include/asm/cpufeature.h @@ -779,6 +779,11 @@ static inline bool system_supports_tlb_range(void) cpus_have_const_cap(ARM64_HAS_TLB_RANGE); } +static inline bool system_has_hw_af(void) +{ + return IS_ENABLED(CONFIG_ARM64_HW_AFDBM) && cpus_have_const_cap(ARM64_HW_AF); +} + extern int do_emulate_mrs(struct pt_regs *regs, u32 sys_reg, u32 rt); static inline u32 id_aa64mmfr0_parange_to_phys_shift(int parange) diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index c4ba047a82d2..e736f47436c7 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -999,13 +999,16 @@ static inline void update_mmu_cache(struct vm_area_struct *vma, * page after fork() + CoW for pfn mappings. We don't always have a * hardware-managed access flag on arm64. */ -static inline bool arch_faults_on_old_pte(void) +static inline bool arch_has_hw_pte_young(bool local) { - WARN_ON(preemptible()); + if (local) { + WARN_ON(preemptible()); + return cpu_has_hw_af(); + } - return !cpu_has_hw_af(); + return system_has_hw_af(); } -#define arch_faults_on_old_pte arch_faults_on_old_pte +#define arch_has_hw_pte_young arch_has_hw_pte_young /* * Experimentally, it's cheap to set the access flag in hardware and we @@ -1013,7 +1016,7 @@ static inline bool arch_faults_on_old_pte(void) */ static inline bool arch_wants_old_prefaulted_pte(void) { - return !arch_faults_on_old_pte(); + return arch_has_hw_pte_young(true); } #define arch_wants_old_prefaulted_pte arch_wants_old_prefaulted_pte diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c index 6f3e677d88f1..5bb553ee2c0e 100644 --- a/arch/arm64/kernel/cpufeature.c +++ b/arch/arm64/kernel/cpufeature.c @@ -2171,6 +2171,25 @@ static const struct arm64_cpu_capabilities arm64_features[] = { .matches = has_hw_dbm, .cpu_enable = cpu_enable_hw_dbm, }, + { + /* + * __cpu_setup always enables this capability. But if the boot + * CPU has it and a late CPU doesn't, the absent + * ARM64_CPUCAP_OPTIONAL_FOR_LATE_CPU will prevent this late CPU + * from going online. There is neither known hardware does that + * nor obvious reasons to design hardware works that way, hence + * no point leaving the door open here. If the need arises, a + * new weak system feature flag should do the trick. + */ + .desc = "Hardware update of the Access flag", + .type = ARM64_CPUCAP_SYSTEM_FEATURE, + .capability = ARM64_HW_AF, + .sys_reg = SYS_ID_AA64MMFR1_EL1, + .sign = FTR_UNSIGNED, + .field_pos = ID_AA64MMFR1_HADBS_SHIFT, + .min_field_value = 1, + .matches = has_cpuid_feature, + }, #endif { .desc = "CRC32 instructions", diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps index 870c39537dd0..56e4ef5d95fa 100644 --- a/arch/arm64/tools/cpucaps +++ b/arch/arm64/tools/cpucaps @@ -36,6 +36,7 @@ HAS_STAGE2_FWB HAS_SYSREG_GIC_CPUIF HAS_TLB_RANGE HAS_VIRT_HOST_EXTN +HW_AF HW_DBM KVM_PROTECTED_MODE MISMATCHED_CACHE_TYPE diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h index 448cd01eb3ec..c60b16f8b741 100644 --- a/arch/x86/include/asm/pgtable.h +++ b/arch/x86/include/asm/pgtable.h @@ -1397,10 +1397,10 @@ static inline bool arch_has_pfn_modify_check(void) return boot_cpu_has_bug(X86_BUG_L1TF); } -#define arch_faults_on_old_pte arch_faults_on_old_pte -static inline bool arch_faults_on_old_pte(void) +#define arch_has_hw_pte_young arch_has_hw_pte_young +static inline bool arch_has_hw_pte_young(bool local) { - return false; + return true; } #endif /* __ASSEMBLY__ */ diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e24d2c992b11..53bd6a26918f 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -258,6 +258,19 @@ static inline int pmdp_clear_flush_young(struct vm_area_struct *vma, #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ #endif +#ifndef arch_has_hw_pte_young +/* + * Return whether the accessed bit is supported by the local CPU or system-wide. + * + * This stub assumes accessing thru an old PTE triggers a page fault. + * Architectures that automatically set the access bit should overwrite it. + */ +static inline bool arch_has_hw_pte_young(bool local) +{ + return false; +} +#endif + #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long address, diff --git a/mm/memory.c b/mm/memory.c index 8f1de811a1dc..ead6c7d4b9a1 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -121,18 +121,6 @@ int randomize_va_space __read_mostly = 2; #endif -#ifndef arch_faults_on_old_pte -static inline bool arch_faults_on_old_pte(void) -{ - /* - * Those arches which don't have hw access flag feature need to - * implement their own helper. By default, "true" means pagefault - * will be hit on old pte. - */ - return true; -} -#endif - #ifndef arch_wants_old_prefaulted_pte static inline bool arch_wants_old_prefaulted_pte(void) { @@ -2755,7 +2743,7 @@ static inline bool cow_user_page(struct page *dst, struct page *src, * On architectures with software "accessed" bits, we would * take a double page fault, so mark it accessed here. */ - if (arch_faults_on_old_pte() && !pte_young(vmf->orig_pte)) { + if (!arch_has_hw_pte_young(true) && !pte_young(vmf->orig_pte)) { pte_t entry; vmf->pte = pte_offset_map_lock(mm, vmf->pmd, addr, &vmf->ptl);