From patchwork Wed Feb 23 05:21:37 2022
X-Patchwork-Submitter: Junaid Shahid
X-Patchwork-Id: 12756358
Date: Tue, 22 Feb 2022 21:21:37 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-2-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 01/47] mm: asi: Introduce ASI core
API From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: DD73BC0003 X-Rspam-User: Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=m37cSIhe; spf=pass (imf28.hostedemail.com: domain of 35MQVYgcKCNsGRK7FAPDLLDIB.9LJIFKRU-JJHS79H.LOD@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=35MQVYgcKCNsGRK7FAPDLLDIB.9LJIFKRU-JJHS79H.LOD@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: rdk8p4ss4wd366a6eztu8g9dcre96oxn X-HE-Tag: 1645593828-941869 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Introduce core API for Address Space Isolation (ASI). Kernel address space isolation provides the ability to run some kernel code with a reduced kernel address space. There can be multiple classes of such restricted kernel address spaces (e.g. KPTI, KVM-PTI etc.). Each ASI class is identified by an index. The ASI class can register some hooks to be called when entering/exiting the restricted address space. Currently, there is a fixed maximum number of ASI classes supported. In addition, each process can have at most one restricted address space from each ASI class. Neither of these are inherent limitations and are merely simplifying assumptions for the time being. (The Kconfig and the high-level ASI API are derived from the original ASI RFC by Alexandre Chartre). 
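For illustration, the intended call flow looks roughly like the sketch below. This is a hypothetical consumer, not part of the series: the "example" class name, the empty hook bodies, the index bookkeeping and the error handling are placeholders; only the declarations used (asi_register_class(), asi_init(), asi_enter(), asi_exit(), asi_destroy()) come from this patch.

	#include <asm/asi.h>
	#include <linux/mm_types.h>

	/* Illustrative hooks; per the API they must be idempotent and re-entrant. */
	static void example_post_asi_enter(void) { }
	static void example_pre_asi_exit(void) { }

	static const struct asi_hooks example_hooks = {
		.post_asi_enter = example_post_asi_enter,
		.pre_asi_exit   = example_pre_asi_exit,
	};

	/* asi_register_class() returns a positive class index, or -ENOSPC. */
	static int example_asi_index;

	static int __init example_register(void)
	{
		example_asi_index = asi_register_class("example", 0, &example_hooks);
		return example_asi_index < 0 ? example_asi_index : 0;
	}

	/* Give one mm a restricted address space of this class (allocates its PGD). */
	static int example_attach(struct mm_struct *mm)
	{
		return asi_init(mm, example_asi_index);
	}

	/* Run a critical section under the restricted address space. */
	static void example_critical_section(struct mm_struct *mm)
	{
		/* __asi_enter() expects preemption to be disabled. */
		preempt_disable();
		asi_enter(&mm->asi[example_asi_index]);
		/* ... code that should only see the reduced address space ... */
		asi_exit();
		preempt_enable();
	}

	static void example_detach(struct mm_struct *mm)
	{
		asi_destroy(&mm->asi[example_asi_index]);
	}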
Originally-by: Alexandre Chartre Signed-off-by: Junaid Shahid --- arch/alpha/include/asm/Kbuild | 1 + arch/arc/include/asm/Kbuild | 1 + arch/arm/include/asm/Kbuild | 1 + arch/arm64/include/asm/Kbuild | 1 + arch/csky/include/asm/Kbuild | 1 + arch/h8300/include/asm/Kbuild | 1 + arch/hexagon/include/asm/Kbuild | 1 + arch/ia64/include/asm/Kbuild | 1 + arch/m68k/include/asm/Kbuild | 1 + arch/microblaze/include/asm/Kbuild | 1 + arch/mips/include/asm/Kbuild | 1 + arch/nds32/include/asm/Kbuild | 1 + arch/nios2/include/asm/Kbuild | 1 + arch/openrisc/include/asm/Kbuild | 1 + arch/parisc/include/asm/Kbuild | 1 + arch/powerpc/include/asm/Kbuild | 1 + arch/riscv/include/asm/Kbuild | 1 + arch/s390/include/asm/Kbuild | 1 + arch/sh/include/asm/Kbuild | 1 + arch/sparc/include/asm/Kbuild | 1 + arch/um/include/asm/Kbuild | 1 + arch/x86/include/asm/asi.h | 81 +++++++++++++++ arch/x86/include/asm/tlbflush.h | 2 + arch/x86/mm/Makefile | 1 + arch/x86/mm/asi.c | 152 +++++++++++++++++++++++++++++ arch/x86/mm/init.c | 5 +- arch/x86/mm/tlb.c | 2 +- arch/xtensa/include/asm/Kbuild | 1 + include/asm-generic/asi.h | 51 ++++++++++ include/linux/mm_types.h | 3 + kernel/fork.c | 3 + security/Kconfig | 10 ++ 32 files changed, 329 insertions(+), 3 deletions(-) create mode 100644 arch/x86/include/asm/asi.h create mode 100644 arch/x86/mm/asi.c create mode 100644 include/asm-generic/asi.h diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild index 42911c8340c7..e3cd063d9cca 100644 --- a/arch/alpha/include/asm/Kbuild +++ b/arch/alpha/include/asm/Kbuild @@ -4,3 +4,4 @@ generated-y += syscall_table.h generic-y += export.h generic-y += kvm_para.h generic-y += mcs_spinlock.h +generic-y += asi.h diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild index 3c1afa524b9c..60bdeffa7c31 100644 --- a/arch/arc/include/asm/Kbuild +++ b/arch/arc/include/asm/Kbuild @@ -4,3 +4,4 @@ generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += parport.h generic-y += user.h +generic-y += asi.h diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild index 03657ff8fbe3..1e2c3d8dbbd9 100644 --- a/arch/arm/include/asm/Kbuild +++ b/arch/arm/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y += parport.h generated-y += mach-types.h generated-y += unistd-nr.h +generic-y += asi.h diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild index 64202010b700..086e94f00f94 100644 --- a/arch/arm64/include/asm/Kbuild +++ b/arch/arm64/include/asm/Kbuild @@ -4,5 +4,6 @@ generic-y += mcs_spinlock.h generic-y += qrwlock.h generic-y += qspinlock.h generic-y += user.h +generic-y += asi.h generated-y += cpucaps.h diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild index 904a18a818be..b4af49fa48c3 100644 --- a/arch/csky/include/asm/Kbuild +++ b/arch/csky/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y += kvm_para.h generic-y += qrwlock.h generic-y += user.h generic-y += vmlinux.lds.h +generic-y += asi.h diff --git a/arch/h8300/include/asm/Kbuild b/arch/h8300/include/asm/Kbuild index e23139c8fc0d..f1e937df4c8e 100644 --- a/arch/h8300/include/asm/Kbuild +++ b/arch/h8300/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += parport.h generic-y += spinlock.h +generic-y += asi.h diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild index 3ece3c93fe08..744ffbeeb7ae 100644 --- a/arch/hexagon/include/asm/Kbuild +++ b/arch/hexagon/include/asm/Kbuild @@ -3,3 +3,4 @@ generic-y += extable.h generic-y += iomap.h generic-y += 
kvm_para.h generic-y += mcs_spinlock.h +generic-y += asi.h diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild index f994c1daf9d4..897a388f3e85 100644 --- a/arch/ia64/include/asm/Kbuild +++ b/arch/ia64/include/asm/Kbuild @@ -3,3 +3,4 @@ generated-y += syscall_table.h generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += vtime.h +generic-y += asi.h diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild index 0dbf9c5c6fae..faf0f135df4a 100644 --- a/arch/m68k/include/asm/Kbuild +++ b/arch/m68k/include/asm/Kbuild @@ -4,3 +4,4 @@ generic-y += extable.h generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += spinlock.h +generic-y += asi.h diff --git a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild index a055f5dbe00a..012e4bf83c13 100644 --- a/arch/microblaze/include/asm/Kbuild +++ b/arch/microblaze/include/asm/Kbuild @@ -8,3 +8,4 @@ generic-y += parport.h generic-y += syscalls.h generic-y += tlb.h generic-y += user.h +generic-y += asi.h diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild index dee172716581..b2c7b62536b4 100644 --- a/arch/mips/include/asm/Kbuild +++ b/arch/mips/include/asm/Kbuild @@ -14,3 +14,4 @@ generic-y += parport.h generic-y += qrwlock.h generic-y += qspinlock.h generic-y += user.h +generic-y += asi.h diff --git a/arch/nds32/include/asm/Kbuild b/arch/nds32/include/asm/Kbuild index 82a4453c9c2d..e8c4cf63db79 100644 --- a/arch/nds32/include/asm/Kbuild +++ b/arch/nds32/include/asm/Kbuild @@ -6,3 +6,4 @@ generic-y += gpio.h generic-y += kvm_para.h generic-y += parport.h generic-y += user.h +generic-y += asi.h diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild index 7fe7437555fb..bfdc4026c5b1 100644 --- a/arch/nios2/include/asm/Kbuild +++ b/arch/nios2/include/asm/Kbuild @@ -5,3 +5,4 @@ generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += spinlock.h generic-y += user.h +generic-y += asi.h diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild index ca5987e11053..3d365bec74d0 100644 --- a/arch/openrisc/include/asm/Kbuild +++ b/arch/openrisc/include/asm/Kbuild @@ -7,3 +7,4 @@ generic-y += qspinlock.h generic-y += qrwlock_types.h generic-y += qrwlock.h generic-y += user.h +generic-y += asi.h diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild index e6e7f74c8ac9..b14e4f727331 100644 --- a/arch/parisc/include/asm/Kbuild +++ b/arch/parisc/include/asm/Kbuild @@ -4,3 +4,4 @@ generated-y += syscall_table_64.h generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += user.h +generic-y += asi.h diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild index bcf95ce0964f..2aff0fa469c4 100644 --- a/arch/powerpc/include/asm/Kbuild +++ b/arch/powerpc/include/asm/Kbuild @@ -8,3 +8,4 @@ generic-y += mcs_spinlock.h generic-y += qrwlock.h generic-y += vtime.h generic-y += early_ioremap.h +generic-y += asi.h diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild index 445ccc97305a..3e2022a5a6c5 100644 --- a/arch/riscv/include/asm/Kbuild +++ b/arch/riscv/include/asm/Kbuild @@ -5,3 +5,4 @@ generic-y += flat.h generic-y += kvm_para.h generic-y += user.h generic-y += vmlinux.lds.h +generic-y += asi.h diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild index 1a18d7b82f86..ef80906ed195 100644 --- a/arch/s390/include/asm/Kbuild +++ b/arch/s390/include/asm/Kbuild @@ -8,3 +8,4 @@ generic-y += asm-offsets.h generic-y += export.h generic-y += kvm_types.h 
generic-y += mcs_spinlock.h +generic-y += asi.h diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild index fc44d9c88b41..ea19e4515828 100644 --- a/arch/sh/include/asm/Kbuild +++ b/arch/sh/include/asm/Kbuild @@ -3,3 +3,4 @@ generated-y += syscall_table.h generic-y += kvm_para.h generic-y += mcs_spinlock.h generic-y += parport.h +generic-y += asi.h diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild index 0b9d98ced34a..08730a26aaed 100644 --- a/arch/sparc/include/asm/Kbuild +++ b/arch/sparc/include/asm/Kbuild @@ -4,3 +4,4 @@ generated-y += syscall_table_64.h generic-y += export.h generic-y += kvm_para.h generic-y += mcs_spinlock.h +generic-y += asi.h diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild index e5a7b552bb38..b62245b2445a 100644 --- a/arch/um/include/asm/Kbuild +++ b/arch/um/include/asm/Kbuild @@ -27,3 +27,4 @@ generic-y += word-at-a-time.h generic-y += kprobes.h generic-y += mm_hooks.h generic-y += vga.h +generic-y += asi.h diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h new file mode 100644 index 000000000000..f9fc928a555d --- /dev/null +++ b/arch/x86/include/asm/asi.h @@ -0,0 +1,81 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef _ASM_X86_ASI_H +#define _ASM_X86_ASI_H + +#include + +#include +#include + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +#define ASI_MAX_NUM_ORDER 2 +#define ASI_MAX_NUM (1 << ASI_MAX_NUM_ORDER) + +struct asi_state { + struct asi *curr_asi; + struct asi *target_asi; +}; + +struct asi_hooks { + /* Both of these functions MUST be idempotent and re-entrant. */ + + void (*post_asi_enter)(void); + void (*pre_asi_exit)(void); +}; + +struct asi_class { + struct asi_hooks ops; + uint flags; + const char *name; +}; + +struct asi { + pgd_t *pgd; + struct asi_class *class; + struct mm_struct *mm; +}; + +DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); + +void asi_init_mm_state(struct mm_struct *mm); + +int asi_register_class(const char *name, uint flags, + const struct asi_hooks *ops); +void asi_unregister_class(int index); + +int asi_init(struct mm_struct *mm, int asi_index); +void asi_destroy(struct asi *asi); + +void asi_enter(struct asi *asi); +void asi_exit(void); + +static inline void asi_set_target_unrestricted(void) +{ + barrier(); + this_cpu_write(asi_cpu_state.target_asi, NULL); +} + +static inline struct asi *asi_get_current(void) +{ + return this_cpu_read(asi_cpu_state.curr_asi); +} + +static inline struct asi *asi_get_target(void) +{ + return this_cpu_read(asi_cpu_state.target_asi); +} + +static inline bool is_asi_active(void) +{ + return (bool)asi_get_current(); +} + +static inline bool asi_is_target_unrestricted(void) +{ + return !asi_get_target(); +} + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +#endif diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index b587a9ee9cb2..3c43ad46c14a 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -259,6 +259,8 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch, extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); +unsigned long build_cr3(pgd_t *pgd, u16 asid); + #endif /* !MODULE */ #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile index 5864219221ca..09d5e65e47c8 100644 --- a/arch/x86/mm/Makefile +++ b/arch/x86/mm/Makefile @@ -51,6 +51,7 @@ obj-$(CONFIG_NUMA_EMU) += numa_emulation.o obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS) += pkeys.o 
obj-$(CONFIG_RANDOMIZE_MEMORY) += kaslr.o obj-$(CONFIG_PAGE_TABLE_ISOLATION) += pti.o +obj-$(CONFIG_ADDRESS_SPACE_ISOLATION) += asi.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt.o obj-$(CONFIG_AMD_MEM_ENCRYPT) += mem_encrypt_identity.o diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c new file mode 100644 index 000000000000..9928325f3787 --- /dev/null +++ b/arch/x86/mm/asi.c @@ -0,0 +1,152 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include +#include +#include + +#undef pr_fmt +#define pr_fmt(fmt) "ASI: " fmt + +static struct asi_class asi_class[ASI_MAX_NUM]; +static DEFINE_SPINLOCK(asi_class_lock); + +DEFINE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); +EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); + +int asi_register_class(const char *name, uint flags, + const struct asi_hooks *ops) +{ + int i; + + VM_BUG_ON(name == NULL); + + spin_lock(&asi_class_lock); + + for (i = 1; i < ASI_MAX_NUM; i++) { + if (asi_class[i].name == NULL) { + asi_class[i].name = name; + asi_class[i].flags = flags; + if (ops != NULL) + asi_class[i].ops = *ops; + break; + } + } + + spin_unlock(&asi_class_lock); + + if (i == ASI_MAX_NUM) + i = -ENOSPC; + + return i; +} +EXPORT_SYMBOL_GPL(asi_register_class); + +void asi_unregister_class(int index) +{ + spin_lock(&asi_class_lock); + + WARN_ON(asi_class[index].name == NULL); + memset(&asi_class[index], 0, sizeof(struct asi_class)); + + spin_unlock(&asi_class_lock); +} +EXPORT_SYMBOL_GPL(asi_unregister_class); + +int asi_init(struct mm_struct *mm, int asi_index) +{ + struct asi *asi = &mm->asi[asi_index]; + + /* Index 0 is reserved for special purposes. */ + WARN_ON(asi_index == 0 || asi_index >= ASI_MAX_NUM); + WARN_ON(asi->pgd != NULL); + + /* + * For now, we allocate 2 pages to avoid any potential problems with + * KPTI code. This won't be needed once KPTI is folded into the ASI + * framework. 
+ */ + asi->pgd = (pgd_t *)__get_free_pages(GFP_PGTABLE_USER, + PGD_ALLOCATION_ORDER); + if (!asi->pgd) + return -ENOMEM; + + asi->class = &asi_class[asi_index]; + asi->mm = mm; + + return 0; +} +EXPORT_SYMBOL_GPL(asi_init); + +void asi_destroy(struct asi *asi) +{ + free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); + memset(asi, 0, sizeof(struct asi)); +} +EXPORT_SYMBOL_GPL(asi_destroy); + +static void __asi_enter(void) +{ + u64 asi_cr3; + struct asi *target = this_cpu_read(asi_cpu_state.target_asi); + + VM_BUG_ON(preemptible()); + + if (!target || target == this_cpu_read(asi_cpu_state.curr_asi)) + return; + + VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) == + LOADED_MM_SWITCHING); + + this_cpu_write(asi_cpu_state.curr_asi, target); + + asi_cr3 = build_cr3(target->pgd, + this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + write_cr3(asi_cr3); + + if (target->class->ops.post_asi_enter) + target->class->ops.post_asi_enter(); +} + +void asi_enter(struct asi *asi) +{ + VM_WARN_ON_ONCE(!asi); + + this_cpu_write(asi_cpu_state.target_asi, asi); + barrier(); + + __asi_enter(); +} +EXPORT_SYMBOL_GPL(asi_enter); + +void asi_exit(void) +{ + u64 unrestricted_cr3; + struct asi *asi; + + preempt_disable(); + + VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) == + LOADED_MM_SWITCHING); + + asi = this_cpu_read(asi_cpu_state.curr_asi); + + if (asi) { + if (asi->class->ops.pre_asi_exit) + asi->class->ops.pre_asi_exit(); + + unrestricted_cr3 = + build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, + this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + + write_cr3(unrestricted_cr3); + this_cpu_write(asi_cpu_state.curr_asi, NULL); + } + + preempt_enable(); +} +EXPORT_SYMBOL_GPL(asi_exit); + +void asi_init_mm_state(struct mm_struct *mm) +{ + memset(mm->asi, 0, sizeof(mm->asi)); +} diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 1895986842b9..000cbe5315f5 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -238,8 +238,9 @@ static void __init probe_page_size_mask(void) /* By the default is everything supported: */ __default_kernel_pte_mask = __supported_pte_mask; - /* Except when with PTI where the kernel is mostly non-Global: */ - if (cpu_feature_enabled(X86_FEATURE_PTI)) + /* Except when with PTI or ASI where the kernel is mostly non-Global: */ + if (cpu_feature_enabled(X86_FEATURE_PTI) || + IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION)) __default_kernel_pte_mask &= ~_PAGE_GLOBAL; /* Enable 1 GB linear kernel mappings if available: */ diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 59ba2968af1b..88d9298720dc 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -153,7 +153,7 @@ static inline u16 user_pcid(u16 asid) return ret; } -static inline unsigned long build_cr3(pgd_t *pgd, u16 asid) +inline unsigned long build_cr3(pgd_t *pgd, u16 asid) { if (static_cpu_has(X86_FEATURE_PCID)) { return __sme_pa(pgd) | kern_pcid(asid); diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild index 854c5e07e867..49fcdf9d83f5 100644 --- a/arch/xtensa/include/asm/Kbuild +++ b/arch/xtensa/include/asm/Kbuild @@ -7,3 +7,4 @@ generic-y += param.h generic-y += qrwlock.h generic-y += qspinlock.h generic-y += user.h +generic-y += asi.h diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h new file mode 100644 index 000000000000..e5ba51d30b90 --- /dev/null +++ b/include/asm-generic/asi.h @@ -0,0 +1,51 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef __ASM_GENERIC_ASI_H +#define __ASM_GENERIC_ASI_H + +/* ASI class flags */ +#define ASI_MAP_STANDARD_NONSENSITIVE 1 + +#ifndef 
CONFIG_ADDRESS_SPACE_ISOLATION + +#define ASI_MAX_NUM_ORDER 0 +#define ASI_MAX_NUM 0 + +#ifndef _ASSEMBLY_ + +struct asi_hooks {}; +struct asi {}; + +static inline +int asi_register_class(const char *name, uint flags, + const struct asi_hooks *ops) +{ + return 0; +} + +static inline void asi_unregister_class(int asi_index) { } + +static inline void asi_init_mm_state(struct mm_struct *mm) { } + +static inline int asi_init(struct mm_struct *mm, int asi_index) { return 0; } + +static inline void asi_destroy(struct asi *asi) { } + +static inline void asi_enter(struct asi *asi) { } + +static inline void asi_set_target_unrestricted(void) { } + +static inline bool asi_is_target_unrestricted(void) { return true; } + +static inline void asi_exit(void) { } + +static inline bool is_asi_active(void) { return false; } + +static inline struct asi *asi_get_target(void) { return NULL; } + +static inline struct asi *asi_get_current(void) { return NULL; } + +#endif /* !_ASSEMBLY_ */ + +#endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ + +#endif diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c3a6e6209600..3de1afa57289 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -18,6 +18,7 @@ #include #include +#include #ifndef AT_VECTOR_SIZE_ARCH #define AT_VECTOR_SIZE_ARCH 0 @@ -495,6 +496,8 @@ struct mm_struct { atomic_t membarrier_state; #endif + struct asi asi[ASI_MAX_NUM]; + /** * @mm_users: The number of users including userspace. * diff --git a/kernel/fork.c b/kernel/fork.c index 3244cc56b697..3695a32ee9bd 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -102,6 +102,7 @@ #include #include #include +#include #include @@ -1071,6 +1072,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mm->def_flags = 0; } + asi_init_mm_state(mm); + if (mm_alloc_pgd(mm)) goto fail_nopgd; diff --git a/security/Kconfig b/security/Kconfig index 0b847f435beb..21b15ecaf2c1 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -65,6 +65,16 @@ config PAGE_TABLE_ISOLATION See Documentation/x86/pti.rst for more details. +config ADDRESS_SPACE_ISOLATION + bool "Allow code to run with a reduced kernel address space" + default n + depends on X86_64 && !UML + depends on !PARAVIRT + help + This feature provides the ability to run some kernel code + with a reduced kernel address space. This can be used to + mitigate some speculative execution attacks. 
+ config SECURITY_INFINIBAND bool "Infiniband Security Hooks" depends on SECURITY && INFINIBAND
From patchwork Wed Feb 23 05:21:38 2022
X-Patchwork-Submitter: Junaid Shahid
X-Patchwork-Id: 12756359
Date: Tue, 22 Feb 2022 21:21:38 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-3-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer:
git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 02/47] mm: asi: Add command-line parameter to enable/disable ASI From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=davxLdvb; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf23.hostedemail.com: domain of 35sQVYgcKCN0ITM9HCRFNNFKD.BNLKHMTW-LLJU9BJ.NQF@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=35sQVYgcKCN0ITM9HCRFNNFKD.BNLKHMTW-LLJU9BJ.NQF@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 2B475140005 X-Stat-Signature: mg7yey7bj7q7wc6ewwqxubk3x49sepek X-HE-Tag: 1645593831-674789 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A parameter named "asi" is added, disabled by default. A feature flag X86_FEATURE_ASI is set if ASI is enabled. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 17 ++++++++++---- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 ++++++- arch/x86/mm/asi.c | 29 ++++++++++++++++++++++++ arch/x86/mm/init.c | 2 +- include/asm-generic/asi.h | 2 ++ 6 files changed, 53 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index f9fc928a555d..0a4af23ed0eb 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -6,6 +6,7 @@ #include #include +#include #ifdef CONFIG_ADDRESS_SPACE_ISOLATION @@ -52,18 +53,24 @@ void asi_exit(void); static inline void asi_set_target_unrestricted(void) { - barrier(); - this_cpu_write(asi_cpu_state.target_asi, NULL); + if (static_cpu_has(X86_FEATURE_ASI)) { + barrier(); + this_cpu_write(asi_cpu_state.target_asi, NULL); + } } static inline struct asi *asi_get_current(void) { - return this_cpu_read(asi_cpu_state.curr_asi); + return static_cpu_has(X86_FEATURE_ASI) + ? this_cpu_read(asi_cpu_state.curr_asi) + : NULL; } static inline struct asi *asi_get_target(void) { - return this_cpu_read(asi_cpu_state.target_asi); + return static_cpu_has(X86_FEATURE_ASI) + ? 
this_cpu_read(asi_cpu_state.target_asi) + : NULL; } static inline bool is_asi_active(void) @@ -76,6 +83,8 @@ static inline bool asi_is_target_unrestricted(void) return !asi_get_target(); } +#define static_asi_enabled() cpu_feature_enabled(X86_FEATURE_ASI) + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ #endif diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index d5b5f2ab87a0..0b0ead3cdd48 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -295,6 +295,7 @@ #define X86_FEATURE_PER_THREAD_MBA (11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */ #define X86_FEATURE_SGX1 (11*32+ 8) /* "" Basic SGX */ #define X86_FEATURE_SGX2 (11*32+ 9) /* "" SGX Enclave Dynamic Memory Management (EDMM) */ +#define X86_FEATURE_ASI (11*32+10) /* Kernel Address Space Isolation */ /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */ #define X86_FEATURE_AVX_VNNI (12*32+ 4) /* AVX VNNI instructions */ diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index 8f28fafa98b3..9659cd9f867d 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -56,6 +56,12 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +# define DISABLE_ASI 0 +#else +# define DISABLE_ASI (1 << (X86_FEATURE_ASI & 31)) +#endif + /* Force disable because it's broken beyond repair */ #define DISABLE_ENQCMD (1 << (X86_FEATURE_ENQCMD & 31)) @@ -79,7 +85,7 @@ #define DISABLED_MASK8 0 #define DISABLED_MASK9 (DISABLE_SMAP|DISABLE_SGX) #define DISABLED_MASK10 0 -#define DISABLED_MASK11 0 +#define DISABLED_MASK11 (DISABLE_ASI) #define DISABLED_MASK12 0 #define DISABLED_MASK13 0 #define DISABLED_MASK14 0 diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 9928325f3787..d274c86f89b7 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -1,5 +1,7 @@ // SPDX-License-Identifier: GPL-2.0 +#include + #include #include #include @@ -18,6 +20,9 @@ int asi_register_class(const char *name, uint flags, { int i; + if (!boot_cpu_has(X86_FEATURE_ASI)) + return 0; + VM_BUG_ON(name == NULL); spin_lock(&asi_class_lock); @@ -43,6 +48,9 @@ EXPORT_SYMBOL_GPL(asi_register_class); void asi_unregister_class(int index) { + if (!boot_cpu_has(X86_FEATURE_ASI)) + return; + spin_lock(&asi_class_lock); WARN_ON(asi_class[index].name == NULL); @@ -52,10 +60,22 @@ void asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); +static int __init set_asi_param(char *str) +{ + if (strcmp(str, "on") == 0) + setup_force_cpu_cap(X86_FEATURE_ASI); + + return 0; +} +early_param("asi", set_asi_param); + int asi_init(struct mm_struct *mm, int asi_index) { struct asi *asi = &mm->asi[asi_index]; + if (!boot_cpu_has(X86_FEATURE_ASI)) + return 0; + /* Index 0 is reserved for special purposes. 
*/ WARN_ON(asi_index == 0 || asi_index >= ASI_MAX_NUM); WARN_ON(asi->pgd != NULL); @@ -79,6 +99,9 @@ EXPORT_SYMBOL_GPL(asi_init); void asi_destroy(struct asi *asi) { + if (!boot_cpu_has(X86_FEATURE_ASI)) + return; + free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); memset(asi, 0, sizeof(struct asi)); } @@ -109,6 +132,9 @@ static void __asi_enter(void) void asi_enter(struct asi *asi) { + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + VM_WARN_ON_ONCE(!asi); this_cpu_write(asi_cpu_state.target_asi, asi); @@ -123,6 +149,9 @@ void asi_exit(void) u64 unrestricted_cr3; struct asi *asi; + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + preempt_disable(); VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) == diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 000cbe5315f5..dfff17363365 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -240,7 +240,7 @@ static void __init probe_page_size_mask(void) __default_kernel_pte_mask = __supported_pte_mask; /* Except when with PTI or ASI where the kernel is mostly non-Global: */ if (cpu_feature_enabled(X86_FEATURE_PTI) || - IS_ENABLED(CONFIG_ADDRESS_SPACE_ISOLATION)) + cpu_feature_enabled(X86_FEATURE_ASI)) __default_kernel_pte_mask &= ~_PAGE_GLOBAL; /* Enable 1 GB linear kernel mappings if available: */ diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index e5ba51d30b90..dae1403ee1d0 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -44,6 +44,8 @@ static inline struct asi *asi_get_target(void) { return NULL; } static inline struct asi *asi_get_current(void) { return NULL; } +#define static_asi_enabled() false + #endif /* !_ASSEMBLY_ */ #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ From patchwork Wed Feb 23 05:21:39 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756363 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7868EC4332F for ; Wed, 23 Feb 2022 05:23:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0F5A38D0006; Wed, 23 Feb 2022 00:23:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0572C8D0001; Wed, 23 Feb 2022 00:23:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E3B148D0006; Wed, 23 Feb 2022 00:23:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.25]) by kanga.kvack.org (Postfix) with ESMTP id D6C158D0001 for ; Wed, 23 Feb 2022 00:23:53 -0500 (EST) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id AED4722E5B for ; Wed, 23 Feb 2022 05:23:53 +0000 (UTC) X-FDA: 79172902746.01.001E268 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf02.hostedemail.com (Postfix) with ESMTP id 3E1E380005 for ; Wed, 23 Feb 2022 05:23:53 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id e129-20020a25d387000000b006245d830ca6so12322628ybf.13 for ; Tue, 22 Feb 2022 21:23:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=o+K8zCWcV59JiWXD4aWine0/FDtELjiPI/RbdPNAHQc=; 
Date: Tue, 22 Feb 2022 21:21:39 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-4-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 03/47] mm: asi: Switch to unrestricted address space when entering scheduler
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
To keep things simpler, we run the scheduler only in the full unrestricted address space for the time being. Signed-off-by: Junaid Shahid --- kernel/sched/core.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 77563109c0ea..44ea197c16ea 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -19,6 +19,7 @@ #include #include +#include #include "../workqueue_internal.h" #include "../../fs/io-wq.h" @@ -6141,6 +6142,10 @@ static void __sched notrace __schedule(unsigned int sched_mode) rq = cpu_rq(cpu); prev = rq->curr; + /* This could possibly be delayed to just before the context switch.
*/ VM_WARN_ON(!asi_is_target_unrestricted()); + asi_exit(); + schedule_debug(prev, !!sched_mode); if (sched_feat(HRTICK) || sched_feat(HRTICK_DL))
From patchwork Wed Feb 23 05:21:40 2022
X-Patchwork-Submitter: Junaid Shahid
X-Patchwork-Id: 12756364
Date: Tue, 22 Feb 2022 21:21:40 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-5-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 04/47] mm: asi: ASI support in interrupts/exceptions
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
Add support for potentially switching address spaces from within interrupts/exceptions/NMIs etc. An interrupt does not automatically switch to the unrestricted address space. It can switch if needed to access some memory not available in the restricted address space, using the normal asi_exit call. On return from the outermost interrupt, if the target address space was the restricted address space (e.g. we were in the critical code path between ASI Enter and VM Enter), the restricted address space will be automatically restored. Otherwise, execution will continue in the unrestricted address space until the next explicit ASI Enter. In order to keep track of when to restore the restricted address space, an interrupt/exception nesting depth counter is maintained per-task. An alternative implementation without needing this counter is also possible, but the counter unlocks an additional nice-to-have benefit by allowing detection of whether or not we are currently executing inside an exception context, which would be useful in a later patch.
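As a condensed, hypothetical illustration of that flow (the real calls in this patch are the asi_intr_enter()/asi_intr_exit() uses added to the DEFINE_IDTENTRY_RAW* macros and to irqentry_enter()/irqentry_exit() in the diff below):

	/*
	 * Illustrative handler only -- not part of this patch. It shows how the
	 * per-task nesting depth decides when the restricted address space is
	 * restored; in reality the calls are emitted by the idtentry/irqentry
	 * plumbing, not written by each handler.
	 */
	noinstr void example_interrupt_handler(struct pt_regs *regs)
	{
		asi_intr_enter();	/* current->thread.intr_nest_depth++ */

		/*
		 * The handler initially keeps running in whatever address space
		 * was loaded. If it needs memory that is not mapped in the
		 * restricted address space, it switches out explicitly:
		 */
		asi_exit();

		/* ... handle the interrupt in the full kernel address space ... */

		/*
		 * Outermost exit: the nesting depth returns to zero. If the
		 * target ASI is still a restricted one (we interrupted the
		 * window between ASI Enter and VM Enter), __asi_enter() reloads
		 * it here; otherwise execution stays unrestricted until the
		 * next explicit asi_enter().
		 */
		asi_intr_exit();
	}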
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 35 ++++++++++++++++++++++++++++++++ arch/x86/include/asm/idtentry.h | 25 +++++++++++++++++++++-- arch/x86/include/asm/processor.h | 5 +++++ arch/x86/kernel/process.c | 2 ++ arch/x86/kernel/traps.c | 2 ++ arch/x86/mm/asi.c | 3 ++- kernel/entry/common.c | 6 ++++++ 7 files changed, 75 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 0a4af23ed0eb..7702332c62e8 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -4,6 +4,8 @@ #include +#include + #include #include #include @@ -51,6 +53,11 @@ void asi_destroy(struct asi *asi); void asi_enter(struct asi *asi); void asi_exit(void); +static inline void asi_init_thread_state(struct thread_struct *thread) +{ + thread->intr_nest_depth = 0; +} + static inline void asi_set_target_unrestricted(void) { if (static_cpu_has(X86_FEATURE_ASI)) { @@ -85,6 +92,34 @@ static inline bool asi_is_target_unrestricted(void) #define static_asi_enabled() cpu_feature_enabled(X86_FEATURE_ASI) +static inline void asi_intr_enter(void) +{ + if (static_cpu_has(X86_FEATURE_ASI)) { + current->thread.intr_nest_depth++; + barrier(); + } +} + +static inline void asi_intr_exit(void) +{ + void __asi_enter(void); + + if (static_cpu_has(X86_FEATURE_ASI)) { + barrier(); + + if (--current->thread.intr_nest_depth == 0) + __asi_enter(); + } +} + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +static inline void asi_intr_enter(void) { } + +static inline void asi_intr_exit(void) { } + +static inline void asi_init_thread_state(struct thread_struct *thread) { } + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ #endif diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 1345088e9902..ea5cdc90403d 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -10,6 +10,7 @@ #include #include +#include /** * DECLARE_IDTENTRY - Declare functions for simple IDT entry points @@ -133,7 +134,16 @@ static __always_inline void __##func(struct pt_regs *regs, \ * is required before the enter/exit() helpers are invoked. */ #define DEFINE_IDTENTRY_RAW(func) \ -__visible noinstr void func(struct pt_regs *regs) +static __always_inline void __##func(struct pt_regs *regs); \ + \ +__visible noinstr void func(struct pt_regs *regs) \ +{ \ + asi_intr_enter(); \ + __##func (regs); \ + asi_intr_exit(); \ +} \ + \ +static __always_inline void __##func(struct pt_regs *regs) /** * DECLARE_IDTENTRY_RAW_ERRORCODE - Declare functions for raw IDT entry points @@ -161,7 +171,18 @@ __visible noinstr void func(struct pt_regs *regs) * is required before the enter/exit() helpers are invoked. 
*/ #define DEFINE_IDTENTRY_RAW_ERRORCODE(func) \ -__visible noinstr void func(struct pt_regs *regs, unsigned long error_code) +static __always_inline void __##func(struct pt_regs *regs, \ + unsigned long error_code); \ + \ +__visible noinstr void func(struct pt_regs *regs, unsigned long error_code)\ +{ \ + asi_intr_enter(); \ + __##func (regs, error_code); \ + asi_intr_exit(); \ +} \ + \ +static __always_inline void __##func(struct pt_regs *regs, \ + unsigned long error_code) /** * DECLARE_IDTENTRY_IRQ - Declare functions for device interrupt IDT entry diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 355d38c0cf60..20116efd2756 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -519,6 +519,11 @@ struct thread_struct { unsigned int iopl_warn:1; unsigned int sig_on_uaccess_err:1; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* The nesting depth of exceptions/interrupts */ + int intr_nest_depth; +#endif + /* * Protection Keys Register for Userspace. Loaded immediately on * context switch. Store it in thread_struct to avoid a lookup in diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index 04143a653a8a..c8d4a00a4de7 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -90,6 +90,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) #ifdef CONFIG_VM86 dst->thread.vm86 = NULL; #endif + asi_init_thread_state(&dst->thread); + /* Drop the copied pointer to current's fpstate */ dst->thread.fpu.fpstate = NULL; diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index c9d566dcf89a..acf675ddda96 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -61,6 +61,7 @@ #include #include #include +#include #ifdef CONFIG_X86_64 #include @@ -413,6 +414,7 @@ DEFINE_IDTENTRY_DF(exc_double_fault) } #endif + asi_exit(); irqentry_nmi_enter(regs); instrumentation_begin(); notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index d274c86f89b7..2453124f221d 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -107,12 +107,13 @@ void asi_destroy(struct asi *asi) } EXPORT_SYMBOL_GPL(asi_destroy); -static void __asi_enter(void) +void __asi_enter(void) { u64 asi_cr3; struct asi *target = this_cpu_read(asi_cpu_state.target_asi); VM_BUG_ON(preemptible()); + VM_BUG_ON(current->thread.intr_nest_depth != 0); if (!target || target == this_cpu_read(asi_cpu_state.curr_asi)) return; diff --git a/kernel/entry/common.c b/kernel/entry/common.c index d5a61d565ad5..9064253085c7 100644 --- a/kernel/entry/common.c +++ b/kernel/entry/common.c @@ -9,6 +9,8 @@ #include "common.h" +#include + #define CREATE_TRACE_POINTS #include @@ -321,6 +323,8 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs) .exit_rcu = false, }; + asi_intr_enter(); + if (user_mode(regs)) { irqentry_enter_from_user_mode(regs); return ret; @@ -416,6 +420,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) instrumentation_end(); rcu_irq_exit(); lockdep_hardirqs_on(CALLER_ADDR0); + asi_intr_exit(); return; } @@ -438,6 +443,7 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state) if (state.exit_rcu) rcu_irq_exit(); } + asi_intr_exit(); } irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs) From patchwork Wed Feb 23 05:21:41 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 
12756365
Date: Tue, 22 Feb 2022 21:21:41 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-6-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 05/47] mm: asi: Make __get_current_cr3_fast() ASI-aware
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com,
oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: 8168F20005 X-Stat-Signature: z3xcwf3mwkntoecgo8rn7sp93mnwymo9 X-Rspam-User: Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=U7xDPHt7; spf=pass (imf13.hostedemail.com: domain of 37MQVYgcKCOMOZSFNIXLTTLQJ.HTRQNSZc-RRPaFHP.TWL@flex--junaids.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=37MQVYgcKCOMOZSFNIXLTTLQJ.HTRQNSZc-RRPaFHP.TWL@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam05 X-HE-Tag: 1645593837-535245 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When ASI is active, __get_current_cr3_fast() adjusts the returned CR3 value accordingly to reflect the actual ASI CR3. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 7 +++++++ arch/x86/mm/tlb.c | 20 ++++++++++++++++++-- 2 files changed, 25 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 7702332c62e8..95557211dabd 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -112,6 +112,11 @@ static inline void asi_intr_exit(void) } } +static inline pgd_t *asi_pgd(struct asi *asi) +{ + return asi->pgd; +} + #else /* CONFIG_ADDRESS_SPACE_ISOLATION */ static inline void asi_intr_enter(void) { } @@ -120,6 +125,8 @@ static inline void asi_intr_exit(void) { } static inline void asi_init_thread_state(struct thread_struct *thread) { } +static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; } + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ #endif diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 88d9298720dc..25bee959d1d3 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "mm_internal.h" @@ -1073,12 +1074,27 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end) */ unsigned long __get_current_cr3_fast(void) { - unsigned long cr3 = build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + unsigned long cr3; + pgd_t *pgd; + u16 asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + struct asi *asi = asi_get_current(); + + if (asi) + pgd = asi_pgd(asi); + else + pgd = this_cpu_read(cpu_tlbstate.loaded_mm)->pgd; + + cr3 = build_cr3(pgd, asid); /* For now, be very restrictive about when this can be called. */ VM_WARN_ON(in_nmi() || preemptible()); + /* + * CR3 is unstable if the target ASI is unrestricted + * and a restricted ASI is currently loaded. 
+ */ + VM_WARN_ON_ONCE(asi && asi_is_target_unrestricted()); + VM_BUG_ON(cr3 != __read_cr3()); return cr3; } From patchwork Wed Feb 23 05:21:42 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756366 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 12AF0C433EF for ; Wed, 23 Feb 2022 05:24:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 821D48D0009; Wed, 23 Feb 2022 00:24:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7D1488D0001; Wed, 23 Feb 2022 00:24:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 697ED8D0009; Wed, 23 Feb 2022 00:24:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0104.hostedemail.com [216.40.44.104]) by kanga.kvack.org (Postfix) with ESMTP id 59CFE8D0001 for ; Wed, 23 Feb 2022 00:24:00 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 1CB1E9F847 for ; Wed, 23 Feb 2022 05:24:00 +0000 (UTC) X-FDA: 79172903040.11.B8BA771 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf31.hostedemail.com (Postfix) with ESMTP id BDD1620002 for ; Wed, 23 Feb 2022 05:23:59 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id m10-20020a25800a000000b0061daa5b7151so26449037ybk.10 for ; Tue, 22 Feb 2022 21:23:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=PIzleJhvx1e4in5SU2cz59acEkt7I4gZXxP/VePrhcs=; b=UOhpXDHCnae+DRnauRxZtdjiTln+VoGZhoV1LC2HwZ0wE6hCMEDx6ZsM5m4zhmzU8C zHAJJxzPwGGULan8LLoI7C3OSmGzlkgZIkeoNkliXHKxR2zxRtRuDwE2nmiSS1c4BXRL MiuUyWSM6+TUYE5RqqQekigHUWwoMR7es7zfFE2KzZ5ooe/XoTE9TQgA48klkJO1uSey E656AcQf7rLEbAA6uzuZiKaqjksL8iwXpjTOMvdQBp3iyrK4xwvnjKTRQ5lhnl0vyEeu pioGnqi8o2SeqNRitqaZjawpEPOB4cWAR7yNk2BDoyC4/ldrIIsdHyj3n/nSEdlK5M0a gypQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=PIzleJhvx1e4in5SU2cz59acEkt7I4gZXxP/VePrhcs=; b=2/NkY9Ty1rtJPURjyvdft9qw0zyQSR5yZdCHK1G0H9AUTsnmD76A1uNnSHwI6zzBXy yk5NXZl0ogJ8fIqGmFp0ImeZf2nPDkK79x9eNvQwubgzBPVevE3Ok4wt8eRrLr886YFD 3Nu5zYkzunf8iIAHKj2cOMjylVHWk5NNIn21/z6kflZIkgM19DgjJc9OzeRFg79Na/Hw AgI988qq7A01R8ZrFqXT8yWqdilztWTK6NTx3gEZH9BuBlyjaQE+L+f5uWBg7zJW3dxd olnKYKQGmJ+zdxeJ2aAmqbQUFKCw5xstHu2qm8gg8pVaXYVLDxqt8dNBUM6S+TficOSI vvBw== X-Gm-Message-State: AOAM533KN8Z+aISocyBrGU6t7X0KEOnxhVL2TPYQEKwyUuGNR6hKAGE3 jnap6Se/Xx4cal6NzsRfHkB/3LAXK62P X-Google-Smtp-Source: ABdhPJyZtnc6SgOpidUjkYpln4RLDN4W6lG8iW37Mn66M+p2QX3sw1OB0/Qqkeh8xQlAuLXN36L1X25C92Rt X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:c607:0:b0:2ca:287c:6b6c with SMTP id i7-20020a0dc607000000b002ca287c6b6cmr28060793ywd.17.1645593839000; Tue, 22 Feb 2022 21:23:59 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:42 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-7-junaids@google.com> Mime-Version: 1.0 References: 
<20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 06/47] mm: asi: ASI page table allocation and free functions From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: BDD1620002 X-Stat-Signature: qtptgootaf5qyo1fer4hbckg8ogrwj35 Authentication-Results: imf31.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=UOhpXDHC; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf31.hostedemail.com: domain of 378QVYgcKCOYRcVIQLaOWWOTM.KWUTQVcf-UUSdIKS.WZO@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=378QVYgcKCOYRcVIQLaOWWOTM.KWUTQVcf-UUSdIKS.WZO@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam11 X-HE-Tag: 1645593839-108071 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This adds custom allocation and free functions for ASI page tables. The alloc functions support allocating memory using different GFP reclaim flags, in order to be able to support non-sensitive allocations from both standard and atomic contexts. They also install the page tables locklessly, which makes it slightly simpler to handle non-sensitive allocations from interrupts/exceptions. The free functions recursively free the page tables when the ASI instance is being torn down. Signed-off-by: Junaid Shahid --- arch/x86/mm/asi.c | 109 +++++++++++++++++++++++++++++++++++++++- include/linux/pgtable.h | 3 ++ 2 files changed, 111 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 2453124f221d..40d772b2e2a8 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -60,6 +60,113 @@ void asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); +#ifndef mm_inc_nr_p4ds +#define mm_inc_nr_p4ds(mm) do {} while (false) +#endif + +#ifndef mm_dec_nr_p4ds +#define mm_dec_nr_p4ds(mm) do {} while (false) +#endif + +#define pte_offset pte_offset_kernel + +#define DEFINE_ASI_PGTBL_ALLOC(base, level) \ +static level##_t * asi_##level##_alloc(struct asi *asi, \ + base##_t *base, ulong addr, \ + gfp_t flags) \ +{ \ + if (unlikely(base##_none(*base))) { \ + ulong pgtbl = get_zeroed_page(flags); \ + phys_addr_t pgtbl_pa; \ + \ + if (pgtbl == 0) \ + return NULL; \ + \ + pgtbl_pa = __pa(pgtbl); \ + paravirt_alloc_##level(asi->mm, PHYS_PFN(pgtbl_pa)); \ + \ + if (cmpxchg((ulong *)base, 0, \ + pgtbl_pa | _PAGE_TABLE) == 0) { \ + mm_inc_nr_##level##s(asi->mm); \ + } else { \ + paravirt_release_##level(PHYS_PFN(pgtbl_pa)); \ + free_page(pgtbl); \ + } \ + \ + /* NOP on native. PV call on Xen. 
*/ \ + set_##base(base, *base); \ + } \ + VM_BUG_ON(base##_large(*base)); \ + return level##_offset(base, addr); \ +} + +DEFINE_ASI_PGTBL_ALLOC(pgd, p4d) +DEFINE_ASI_PGTBL_ALLOC(p4d, pud) +DEFINE_ASI_PGTBL_ALLOC(pud, pmd) +DEFINE_ASI_PGTBL_ALLOC(pmd, pte) + +#define asi_free_dummy(asi, addr) +#define __pmd_free(mm, pmd) free_page((ulong)(pmd)) +#define pud_page_vaddr(pud) ((ulong)pud_pgtable(pud)) +#define p4d_page_vaddr(p4d) ((ulong)p4d_pgtable(p4d)) + +static inline unsigned long pte_page_vaddr(pte_t pte) +{ + return (unsigned long)__va(pte_val(pte) & PTE_PFN_MASK); +} + +#define DEFINE_ASI_PGTBL_FREE(level, LEVEL, next, free) \ +static void asi_free_##level(struct asi *asi, ulong pgtbl_addr) \ +{ \ + uint i; \ + level##_t *level = (level##_t *)pgtbl_addr; \ + \ + for (i = 0; i < PTRS_PER_##LEVEL; i++) { \ + ulong vaddr; \ + \ + if (level##_none(level[i])) \ + continue; \ + \ + vaddr = level##_page_vaddr(level[i]); \ + \ + if (!level##_leaf(level[i])) \ + asi_free_##next(asi, vaddr); \ + else \ + VM_WARN(true, "Lingering mapping in ASI %p at %lx",\ + asi, vaddr); \ + } \ + paravirt_release_##level(PHYS_PFN(__pa(pgtbl_addr))); \ + free(asi->mm, level); \ + mm_dec_nr_##level##s(asi->mm); \ +} + +DEFINE_ASI_PGTBL_FREE(pte, PTE, dummy, pte_free_kernel) +DEFINE_ASI_PGTBL_FREE(pmd, PMD, pte, __pmd_free) +DEFINE_ASI_PGTBL_FREE(pud, PUD, pmd, pud_free) +DEFINE_ASI_PGTBL_FREE(p4d, P4D, pud, p4d_free) + +static void asi_free_pgd_range(struct asi *asi, uint start, uint end) +{ + uint i; + + for (i = start; i < end; i++) + if (pgd_present(asi->pgd[i])) + asi_free_p4d(asi, (ulong)p4d_offset(asi->pgd + i, 0)); +} + +/* + * Free the page tables allocated for the given ASI instance. + * The caller must ensure that all the mappings have already been cleared + * and appropriate TLB flushes have been issued before calling this function. 
+ */ +static void asi_free_pgd(struct asi *asi) +{ + VM_BUG_ON(asi->mm == &init_mm); + + asi_free_pgd_range(asi, KERNEL_PGD_BOUNDARY, PTRS_PER_PGD); + free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); +} + static int __init set_asi_param(char *str) { if (strcmp(str, "on") == 0) @@ -102,7 +209,7 @@ void asi_destroy(struct asi *asi) if (!boot_cpu_has(X86_FEATURE_ASI)) return; - free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); + asi_free_pgd(asi); memset(asi, 0, sizeof(struct asi)); } EXPORT_SYMBOL_GPL(asi_destroy); diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index e24d2c992b11..2fff17a939f0 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1593,6 +1593,9 @@ typedef unsigned int pgtbl_mod_mask; #ifndef pmd_leaf #define pmd_leaf(x) 0 #endif +#ifndef pte_leaf +#define pte_leaf(x) 1 +#endif #ifndef pgd_leaf_size #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT) From patchwork Wed Feb 23 05:21:43 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756367 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4A139C4332F for ; Wed, 23 Feb 2022 05:24:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CDC4F8D0002; Wed, 23 Feb 2022 00:24:02 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C8BCB8D0001; Wed, 23 Feb 2022 00:24:02 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A8EDF8D0002; Wed, 23 Feb 2022 00:24:02 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0109.hostedemail.com [216.40.44.109]) by kanga.kvack.org (Postfix) with ESMTP id 982868D0001 for ; Wed, 23 Feb 2022 00:24:02 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 3C82C180FC129 for ; Wed, 23 Feb 2022 05:24:02 +0000 (UTC) X-FDA: 79172903124.25.D75FE5F Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf20.hostedemail.com (Postfix) with ESMTP id B9ACE1C0006 for ; Wed, 23 Feb 2022 05:24:01 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id k7-20020a255607000000b00621afc793b8so26766768ybb.1 for ; Tue, 22 Feb 2022 21:24:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=4OZ+JPmRPuew/8tnwetStHaUs2Nh17IXa7mxxiAMwj8=; b=kb4u7VjrnfodFdsXodDcDOsogb6ps0VFuoU2Ch/2lXixZ/HfftZw3kdktGLZNqJaaS sS3He2L5FPw70IJb8AKJo1NDhJWYKpPBXq/OhYV1okDxsl0mvsUmyEav/HUJeog/Gj9y Q5Dc8pVb07blL49n409GgYRDrcKVQVQRVFNARfm5uMsXWp3Yd02cr7kkTrGmuAR1Xu1E sxdm0LFpj5VhSie+6tVTejQjNAf6nTCGVLsZ8GtzBSgzjYfKnMkFSPkXEgjwcV5k4Zwo pwZnNuNgZRjJww5tQ8PUPFXrFLOBqqQa1ttWIBXg8Sv/mjjUaM/rS2VlzHPfgRo69Xqa n3Jg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=4OZ+JPmRPuew/8tnwetStHaUs2Nh17IXa7mxxiAMwj8=; b=c3EJ9aMQiM8OUwUUWvpCYo6UVaMNx3KjgixZy7EmycH5VJFBOAA+/7CeNMIR/ET1lj fMjvIfkPZqXLwUOwbqj5KV+sCeIYpki3cZnOb+ajC6N7HOiym95DS7BHDbqikvgqKFCX IvRzS7CDJZiDbH+qP9jPAuNQFmL0CAciAEah2QefAocSc4hGhD1ftZQgLWSCqNR3mk/8 
YoZobHmOvB3NWEYyo43MYjQWCD2gnHaVSxZFeiG/GSUzjbgK5lHxqHETH2PZUZwLN4rg c/+FTKHC7o8I/Wu/PVAPh0WlHwXCco7hQZbSOhozRTV/tz/9xrgf0qm1OIL1FDEM6OwA bqiA== X-Gm-Message-State: AOAM531MbnCtL3SdVXE8VHcEUDjsYxQbVD6gQ56a6Ly1ukJZ7NlAv06M 8/dfDz5CrU7X34N5ryz6FL16bXzM7ZA3 X-Google-Smtp-Source: ABdhPJznarXNTwMlwRo1R3Ew9ob69h2spGBZggFbqvzN1k2HjtUx3qHkoNOb8UlGcjxUj887oMmqSttakWi9 X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:eb09:0:b0:2d1:e0df:5104 with SMTP id u9-20020a0deb09000000b002d1e0df5104mr27667696ywe.250.1645593841036; Tue, 22 Feb 2022 21:24:01 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:43 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-8-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 07/47] mm: asi: Functions to map/unmap a memory range into ASI page tables From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=kb4u7Vjr; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf20.hostedemail.com: domain of 38cQVYgcKCOgTeXKSNcQYYQVO.MYWVSXeh-WWUfKMU.YbQ@flex--junaids.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=38cQVYgcKCOgTeXKSNcQYYQVO.MYWVSXeh-WWUfKMU.YbQ@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: B9ACE1C0006 X-Stat-Signature: bm5hramwj6icgnucgxbaibt49wdqfqzc X-HE-Tag: 1645593841-854175 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Two functions, asi_map() and asi_map_gfp(), are added to allow mapping memory into ASI page tables. The mapping will be identical to the one for the same virtual address in the unrestricted page tables. This is necessary to allow switching between the page tables at any arbitrary point in the kernel. 
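
For illustration only (not part of this patch; example_share_page() and its calling context are hypothetical), a caller holding an already-initialized struct asi could use asi_map() roughly like this to keep one page of kernel memory accessible after asi_enter(); note that asi_map() expects page-aligned kernel addresses and lengths:

        /*
         * Illustrative sketch: map one zeroed page into an ASI instance so
         * it stays visible inside the restricted address space.
         */
        static int example_share_page(struct asi *asi, void **out)
        {
                void *buf = (void *)get_zeroed_page(GFP_KERNEL);
                int err;

                if (!buf)
                        return -ENOMEM;

                err = asi_map(asi, buf, PAGE_SIZE);
                if (err) {
                        free_page((unsigned long)buf);
                        return err;
                }

                *out = buf;
                return 0;
        }

From contexts that cannot sleep, asi_map_gfp() can be used instead with more restrictive reclaim flags, as described above.
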
Another function, asi_unmap() is added to allow unmapping memory mapped via asi_map* Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 5 + arch/x86/mm/asi.c | 196 +++++++++++++++++++++++++++++++++++++ include/asm-generic/asi.h | 19 ++++ mm/internal.h | 3 + mm/vmalloc.c | 60 +++++++----- 5 files changed, 261 insertions(+), 22 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 95557211dabd..521b40d1864b 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -53,6 +53,11 @@ void asi_destroy(struct asi *asi); void asi_enter(struct asi *asi); void asi_exit(void); +int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags); +int asi_map(struct asi *asi, void *addr, size_t len); +void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb); +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len); + static inline void asi_init_thread_state(struct thread_struct *thread) { thread->intr_nest_depth = 0; diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 40d772b2e2a8..84d220cbdcfc 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -6,6 +6,8 @@ #include #include +#include "../../../mm/internal.h" + #undef pr_fmt #define pr_fmt(fmt) "ASI: " fmt @@ -287,3 +289,197 @@ void asi_init_mm_state(struct mm_struct *mm) { memset(mm->asi, 0, sizeof(mm->asi)); } + +static bool is_page_within_range(size_t addr, size_t page_size, + size_t range_start, size_t range_end) +{ + size_t page_start, page_end, page_mask; + + page_mask = ~(page_size - 1); + page_start = addr & page_mask; + page_end = page_start + page_size; + + return page_start >= range_start && page_end <= range_end; +} + +static bool follow_physaddr(struct mm_struct *mm, size_t virt, + phys_addr_t *phys, size_t *page_size, ulong *flags) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + +#define follow_addr_at_level(base, level, LEVEL) \ + do { \ + *page_size = LEVEL##_SIZE; \ + level = level##_offset(base, virt); \ + if (!level##_present(*level)) \ + return false; \ + \ + if (level##_large(*level)) { \ + *phys = PFN_PHYS(level##_pfn(*level)) | \ + (virt & ~LEVEL##_MASK); \ + *flags = level##_flags(*level); \ + return true; \ + } \ + } while (false) + + follow_addr_at_level(mm, pgd, PGDIR); + follow_addr_at_level(pgd, p4d, P4D); + follow_addr_at_level(p4d, pud, PUD); + follow_addr_at_level(pud, pmd, PMD); + + *page_size = PAGE_SIZE; + pte = pte_offset_map(pmd, virt); + if (!pte) + return false; + + if (!pte_present(*pte)) { + pte_unmap(pte); + return false; + } + + *phys = PFN_PHYS(pte_pfn(*pte)) | (virt & ~PAGE_MASK); + *flags = pte_flags(*pte); + + pte_unmap(pte); + return true; + +#undef follow_addr_at_level +} + +/* + * Map the given range into the ASI page tables. The source of the mapping + * is the regular unrestricted page tables. + * Can be used to map any kernel memory. + * + * The caller MUST ensure that the source mapping will not change during this + * function. For dynamic kernel memory, this is generally ensured by mapping + * the memory within the allocator. + * + * If the source mapping is a large page and the range being mapped spans the + * entire large page, then it will be mapped as a large page in the ASI page + * tables too. If the range does not span the entire huge page, then it will + * be mapped as smaller pages. 
In that case, the implementation is slightly + * inefficient, as it will walk the source page tables again for each small + * destination page, but that should be ok for now, as usually in such cases, + * the range would consist of a small-ish number of pages. + */ +int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) +{ + size_t virt; + size_t start = (size_t)addr; + size_t end = start + len; + size_t page_size; + + if (!static_cpu_has(X86_FEATURE_ASI)) + return 0; + + VM_BUG_ON(start & ~PAGE_MASK); + VM_BUG_ON(len & ~PAGE_MASK); + VM_BUG_ON(start < TASK_SIZE_MAX); + + gfp_flags &= GFP_RECLAIM_MASK; + + if (asi->mm != &init_mm) + gfp_flags |= __GFP_ACCOUNT; + + for (virt = start; virt < end; virt = ALIGN(virt + 1, page_size)) { + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + phys_addr_t phys; + ulong flags; + + if (!follow_physaddr(asi->mm, virt, &phys, &page_size, &flags)) + continue; + +#define MAP_AT_LEVEL(base, BASE, level, LEVEL) { \ + if (base##_large(*base)) { \ + VM_BUG_ON(PHYS_PFN(phys & BASE##_MASK) != \ + base##_pfn(*base)); \ + continue; \ + } \ + \ + level = asi_##level##_alloc(asi, base, virt, gfp_flags);\ + if (!level) \ + return -ENOMEM; \ + \ + if (page_size >= LEVEL##_SIZE && \ + (level##_none(*level) || level##_leaf(*level)) && \ + is_page_within_range(virt, LEVEL##_SIZE, \ + start, end)) { \ + page_size = LEVEL##_SIZE; \ + phys &= LEVEL##_MASK; \ + \ + if (level##_none(*level)) \ + set_##level(level, \ + __##level(phys | flags)); \ + else \ + VM_BUG_ON(level##_pfn(*level) != \ + PHYS_PFN(phys)); \ + continue; \ + } \ + } + + pgd = pgd_offset_pgd(asi->pgd, virt); + + MAP_AT_LEVEL(pgd, PGDIR, p4d, P4D); + MAP_AT_LEVEL(p4d, P4D, pud, PUD); + MAP_AT_LEVEL(pud, PUD, pmd, PMD); + MAP_AT_LEVEL(pmd, PMD, pte, PAGE); + + VM_BUG_ON(true); /* Should never reach here. */ +#undef MAP_AT_LEVEL + } + + return 0; +} + +int asi_map(struct asi *asi, void *addr, size_t len) +{ + return asi_map_gfp(asi, addr, len, GFP_KERNEL); +} + +/* + * Unmap a kernel address range previously mapped into the ASI page tables. + * The caller must ensure appropriate TLB flushing. + * + * The area being unmapped must be a whole previously mapped region (or regions) + * Unmapping a partial subset of a previously mapped region is not supported. + * That will work, but may end up unmapping more than what was asked for, if + * the mapping contained huge pages. + * + * Note that higher order direct map allocations are allowed to be partially + * freed. If it turns out that that actually happens for any of the + * non-sensitive allocations, then the above limitation may be a problem. For + * now, vunmap_pgd_range() will emit a warning if this situation is detected. + */ +void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) +{ + size_t start = (size_t)addr; + size_t end = start + len; + pgtbl_mod_mask mask = 0; + + if (!static_cpu_has(X86_FEATURE_ASI) || !len) + return; + + VM_BUG_ON(start & ~PAGE_MASK); + VM_BUG_ON(len & ~PAGE_MASK); + VM_BUG_ON(start < TASK_SIZE_MAX); + + vunmap_pgd_range(asi->pgd, start, end, &mask, false); + + if (flush_tlb) + asi_flush_tlb_range(asi, addr, len); +} + +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +{ + /* Later patches will do a more optimized flush. 
*/ + flush_tlb_kernel_range((ulong)addr, (ulong)addr + len); +} diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index dae1403ee1d0..7da91cbe075d 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -2,6 +2,8 @@ #ifndef __ASM_GENERIC_ASI_H #define __ASM_GENERIC_ASI_H +#include + /* ASI class flags */ #define ASI_MAP_STANDARD_NONSENSITIVE 1 @@ -44,6 +46,23 @@ static inline struct asi *asi_get_target(void) { return NULL; } static inline struct asi *asi_get_current(void) { return NULL; } +static inline +int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) +{ + return 0; +} + +static inline int asi_map(struct asi *asi, void *addr, size_t len) +{ + return 0; +} + +static inline +void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) { } + +static inline +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } + #define static_asi_enabled() false #endif /* !_ASSEMBLY_ */ diff --git a/mm/internal.h b/mm/internal.h index 3b79a5c9427a..ae8799d86dd3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -79,6 +79,9 @@ void unmap_page_range(struct mmu_gather *tlb, unsigned long addr, unsigned long end, struct zap_details *details); +void vunmap_pgd_range(pgd_t *pgd_table, unsigned long addr, unsigned long end, + pgtbl_mod_mask *mask, bool sleepable); + void do_page_cache_ra(struct readahead_control *, unsigned long nr_to_read, unsigned long lookahead_size); void force_page_cache_ra(struct readahead_control *, unsigned long nr); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index d2a00ad4e1dd..f2ef719f1cba 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -336,7 +336,7 @@ static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, } static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, - pgtbl_mod_mask *mask) + pgtbl_mod_mask *mask, bool sleepable) { pmd_t *pmd; unsigned long next; @@ -350,18 +350,22 @@ static void vunmap_pmd_range(pud_t *pud, unsigned long addr, unsigned long end, if (cleared || pmd_bad(*pmd)) *mask |= PGTBL_PMD_MODIFIED; - if (cleared) + if (cleared) { + WARN_ON(addr & ~PMD_MASK); + WARN_ON(next & ~PMD_MASK); continue; + } if (pmd_none_or_clear_bad(pmd)) continue; vunmap_pte_range(pmd, addr, next, mask); - cond_resched(); + if (sleepable) + cond_resched(); } while (pmd++, addr = next, addr != end); } static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, - pgtbl_mod_mask *mask) + pgtbl_mod_mask *mask, bool sleepable) { pud_t *pud; unsigned long next; @@ -375,16 +379,19 @@ static void vunmap_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end, if (cleared || pud_bad(*pud)) *mask |= PGTBL_PUD_MODIFIED; - if (cleared) + if (cleared) { + WARN_ON(addr & ~PUD_MASK); + WARN_ON(next & ~PUD_MASK); continue; + } if (pud_none_or_clear_bad(pud)) continue; - vunmap_pmd_range(pud, addr, next, mask); + vunmap_pmd_range(pud, addr, next, mask, sleepable); } while (pud++, addr = next, addr != end); } static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, - pgtbl_mod_mask *mask) + pgtbl_mod_mask *mask, bool sleepable) { p4d_t *p4d; unsigned long next; @@ -398,14 +405,35 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, if (cleared || p4d_bad(*p4d)) *mask |= PGTBL_P4D_MODIFIED; - if (cleared) + if (cleared) { + WARN_ON(addr & ~P4D_MASK); + WARN_ON(next & ~P4D_MASK); continue; + } if (p4d_none_or_clear_bad(p4d)) continue; - vunmap_pud_range(p4d, addr, next, mask); + vunmap_pud_range(p4d, 
addr, next, mask, sleepable); } while (p4d++, addr = next, addr != end); } +void vunmap_pgd_range(pgd_t *pgd_table, unsigned long addr, unsigned long end, + pgtbl_mod_mask *mask, bool sleepable) +{ + unsigned long next; + pgd_t *pgd = pgd_offset_pgd(pgd_table, addr); + + BUG_ON(addr >= end); + + do { + next = pgd_addr_end(addr, end); + if (pgd_bad(*pgd)) + *mask |= PGTBL_PGD_MODIFIED; + if (pgd_none_or_clear_bad(pgd)) + continue; + vunmap_p4d_range(pgd, addr, next, mask, sleepable); + } while (pgd++, addr = next, addr != end); +} + /* * vunmap_range_noflush is similar to vunmap_range, but does not * flush caches or TLBs. @@ -420,21 +448,9 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, */ void vunmap_range_noflush(unsigned long start, unsigned long end) { - unsigned long next; - pgd_t *pgd; - unsigned long addr = start; pgtbl_mod_mask mask = 0; - BUG_ON(addr >= end); - pgd = pgd_offset_k(addr); - do { - next = pgd_addr_end(addr, end); - if (pgd_bad(*pgd)) - mask |= PGTBL_PGD_MODIFIED; - if (pgd_none_or_clear_bad(pgd)) - continue; - vunmap_p4d_range(pgd, addr, next, &mask); - } while (pgd++, addr = next, addr != end); + vunmap_pgd_range(init_mm.pgd, start, end, &mask, true); if (mask & ARCH_PAGE_TABLE_SYNC_MASK) arch_sync_kernel_mappings(start, end); From patchwork Wed Feb 23 05:21:44 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756368 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8C328C43219 for ; Wed, 23 Feb 2022 05:24:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1B8ED8D000A; Wed, 23 Feb 2022 00:24:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 169948D0001; Wed, 23 Feb 2022 00:24:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 007EE8D000A; Wed, 23 Feb 2022 00:24:04 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0121.hostedemail.com [216.40.44.121]) by kanga.kvack.org (Postfix) with ESMTP id E3C1E8D0001 for ; Wed, 23 Feb 2022 00:24:04 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 85B0F181AEF3F for ; Wed, 23 Feb 2022 05:24:04 +0000 (UTC) X-FDA: 79172903208.21.A0B1448 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf29.hostedemail.com (Postfix) with ESMTP id 10C90120002 for ; Wed, 23 Feb 2022 05:24:03 +0000 (UTC) Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-2d07ae11462so163390657b3.8 for ; Tue, 22 Feb 2022 21:24:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=rf+PEwSCAjx96OEampym4lZ49QLmyoHK2vvPo63tKDU=; b=ClDNkRcR05xwUIt+/3suGkjgzNs7jx8crmYYSBy7YIfyMwXpzxpsuO7L5+YX8vxRfh 1E4PSbB3uEBG+XkPHeMZghVckls3s0XgENlVkTaPWZjkwqbjhFcBMXRc0pNft0u4lgb5 OHkuB86fnETxs+Q/fn9lSwDJhVOhXOzfA1+wbVB6gzJyDg9tXEAFaT4CQ0u+HBjQvZbK xaLThn55HGR54AO8ID3Zi3c8s3jxcmEzd75ic5aZDYq6ErsSaIKBY/8EVr+KDDNXwZv5 bVHGd/T3R6cR1QrhXBnqEawoY6M01XywGeiIoQ1z1l9A03h+cXgMumLSGriK6mCj3H15 pxlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=rf+PEwSCAjx96OEampym4lZ49QLmyoHK2vvPo63tKDU=; b=ocgXIUNhLfLR6S232R30bnlJWQcOQQ618qMegsscJmgIcMjvDtMtUoyEUWGrLhW7xB HAZAAGti5W5TsBysF6x83zmu1BhG9oKcjBV+HE85quU0ZWAEcmNWR6oj4s/ZEtC6LY7j AN8jTz2apitKH1lffZcjaG323R/tK0LfEDKhmZ4EYBU2VFj1nLVwai4CPIOt4dyxnxU9 bMoNhWNzoPiD0zE4GzyycOCiF9+2QO4Lljvp5psteOh76/qNwUyffuF/2Pl/14ytZFf6 80ZqZKVNv9sX0eKunW8so7JLCXcUMfpbNRBdvmKVI7sngjV2zi/V4jlAy63qSV9ZXf6d 1Y3A== X-Gm-Message-State: AOAM532a4IbGgsO4ln/s12N3mebo4x7AW2IJSSvWgzz64gHUsifDp0UL pvieueFVrJyYxyr0iUbez4In3sdMwYEV X-Google-Smtp-Source: ABdhPJxAE8giXqGIDJh7Mviar4JsaaTJZtQQuDBTZF65UTbW8rTNqxt0+PJdM/jBytNMKHkW6V66FJSXIQ6n X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a05:6902:108:b0:621:165e:5c1e with SMTP id o8-20020a056902010800b00621165e5c1emr25436069ybh.204.1645593843385; Tue, 22 Feb 2022 21:24:03 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:44 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-9-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 08/47] mm: asi: Add basic infrastructure for global non-sensitive mappings From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspam-User: Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=ClDNkRcR; spf=pass (imf29.hostedemail.com: domain of 388QVYgcKCOoVgZMUPeSaaSXQ.OaYXUZgj-YYWhMOW.adS@flex--junaids.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=388QVYgcKCOoVgZMUPeSaaSXQ.OaYXUZgj-YYWhMOW.adS@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 10C90120002 X-Stat-Signature: id4hz4r8843gna8zxi9ar5eunzqtwz3m X-HE-Tag: 1645593843-763322 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A pseudo-PGD is added to store global non-sensitive ASI mappings. Actual ASI PGDs copy entries from this pseudo-PGD during asi_init(). Memory can be mapped as globally non-sensitive by calling asi_map() with ASI_GLOBAL_NONSENSITIVE. Page tables allocated for global non-sensitive mappings are never freed. 
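
As a rough sketch of how the sharing is intended to work (illustrative only; example_mark_global_nonsensitive() is a hypothetical helper, not part of this patch): mappings installed under ASI_GLOBAL_NONSENSITIVE land in the pseudo-PGD, and since each standard ASI instance copies those top-level entries at asi_init() time, the lower-level page tables are shared rather than duplicated, so the mapping becomes visible to all of them without per-process work:

        /*
         * Hypothetical helper: expose an already-mapped kernel buffer to
         * every standard ASI instance by installing it in the shared
         * global non-sensitive pseudo-PGD.
         */
        static int example_mark_global_nonsensitive(void *buf, size_t len)
        {
                return asi_map(ASI_GLOBAL_NONSENSITIVE, buf, len);
        }
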
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 12 ++++++++++++ arch/x86/mm/asi.c | 36 +++++++++++++++++++++++++++++++++++- arch/x86/mm/init_64.c | 26 +++++++++++++++++--------- arch/x86/mm/mm_internal.h | 3 +++ include/asm-generic/asi.h | 5 +++++ mm/init-mm.c | 2 ++ 6 files changed, 74 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 521b40d1864b..64c2b4d1dba2 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -15,6 +15,8 @@ #define ASI_MAX_NUM_ORDER 2 #define ASI_MAX_NUM (1 << ASI_MAX_NUM_ORDER) +#define ASI_GLOBAL_NONSENSITIVE (&init_mm.asi[0]) + struct asi_state { struct asi *curr_asi; struct asi *target_asi; @@ -41,6 +43,8 @@ struct asi { DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); +extern pgd_t asi_global_nonsensitive_pgd[]; + void asi_init_mm_state(struct mm_struct *mm); int asi_register_class(const char *name, uint flags, @@ -117,6 +121,14 @@ static inline void asi_intr_exit(void) } } +#define INIT_MM_ASI(init_mm) \ + .asi = { \ + [0] = { \ + .pgd = asi_global_nonsensitive_pgd, \ + .mm = &init_mm \ + } \ + }, + static inline pgd_t *asi_pgd(struct asi *asi) { return asi->pgd; diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 84d220cbdcfc..d381ae573af9 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -1,11 +1,13 @@ // SPDX-License-Identifier: GPL-2.0 #include +#include #include #include #include +#include "mm_internal.h" #include "../../../mm/internal.h" #undef pr_fmt @@ -17,6 +19,8 @@ static DEFINE_SPINLOCK(asi_class_lock); DEFINE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); +__aligned(PAGE_SIZE) pgd_t asi_global_nonsensitive_pgd[PTRS_PER_PGD]; + int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops) { @@ -160,12 +164,17 @@ static void asi_free_pgd_range(struct asi *asi, uint start, uint end) * Free the page tables allocated for the given ASI instance. * The caller must ensure that all the mappings have already been cleared * and appropriate TLB flushes have been issued before calling this function. + * + * For standard non-sensitive ASI classes, the page tables shared with the + * master pseudo-PGD are not freed. 
*/ static void asi_free_pgd(struct asi *asi) { VM_BUG_ON(asi->mm == &init_mm); - asi_free_pgd_range(asi, KERNEL_PGD_BOUNDARY, PTRS_PER_PGD); + if (!(asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE)) + asi_free_pgd_range(asi, KERNEL_PGD_BOUNDARY, PTRS_PER_PGD); + free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); } @@ -178,6 +187,24 @@ static int __init set_asi_param(char *str) } early_param("asi", set_asi_param); +static int __init asi_global_init(void) +{ + if (!boot_cpu_has(X86_FEATURE_ASI)) + return 0; + + preallocate_toplevel_pgtbls(asi_global_nonsensitive_pgd, + PAGE_OFFSET, + PAGE_OFFSET + PFN_PHYS(max_possible_pfn) - 1, + "ASI Global Non-sensitive direct map"); + + preallocate_toplevel_pgtbls(asi_global_nonsensitive_pgd, + VMALLOC_START, VMALLOC_END, + "ASI Global Non-sensitive vmalloc"); + + return 0; +} +subsys_initcall(asi_global_init) + int asi_init(struct mm_struct *mm, int asi_index) { struct asi *asi = &mm->asi[asi_index]; @@ -202,6 +229,13 @@ int asi_init(struct mm_struct *mm, int asi_index) asi->class = &asi_class[asi_index]; asi->mm = mm; + if (asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) { + uint i; + + for (i = KERNEL_PGD_BOUNDARY; i < PTRS_PER_PGD; i++) + set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); + } + return 0; } EXPORT_SYMBOL_GPL(asi_init); diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 36098226a957..ebd512c64ed0 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1277,18 +1277,15 @@ static void __init register_page_bootmem_info(void) #endif } -/* - * Pre-allocates page-table pages for the vmalloc area in the kernel page-table. - * Only the level which needs to be synchronized between all page-tables is - * allocated because the synchronization can be expensive. - */ -static void __init preallocate_vmalloc_pages(void) +void __init preallocate_toplevel_pgtbls(pgd_t *pgd_table, + ulong start, ulong end, + const char *name) { unsigned long addr; const char *lvl; - for (addr = VMALLOC_START; addr <= VMALLOC_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) { - pgd_t *pgd = pgd_offset_k(addr); + for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE)) { + pgd_t *pgd = pgd_offset_pgd(pgd_table, addr); p4d_t *p4d; pud_t *pud; @@ -1324,7 +1321,18 @@ static void __init preallocate_vmalloc_pages(void) * The pages have to be there now or they will be missing in * process page-tables later. */ - panic("Failed to pre-allocate %s pages for vmalloc area\n", lvl); + panic("Failed to pre-allocate %s pages for %s area\n", lvl, name); +} + +/* + * Pre-allocates page-table pages for the vmalloc area in the kernel page-table. + * Only the level which needs to be synchronized between all page-tables is + * allocated because the synchronization can be expensive. 
+ */ +static void __init preallocate_vmalloc_pages(void) +{ + preallocate_toplevel_pgtbls(init_mm.pgd, VMALLOC_START, VMALLOC_END, + "vmalloc"); } void __init mem_init(void) diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h index 3f37b5c80bb3..a1e8c523ab08 100644 --- a/arch/x86/mm/mm_internal.h +++ b/arch/x86/mm/mm_internal.h @@ -19,6 +19,9 @@ unsigned long kernel_physical_mapping_change(unsigned long start, unsigned long page_size_mask); void zone_sizes_init(void); +void preallocate_toplevel_pgtbls(pgd_t *pgd_table, ulong start, ulong end, + const char *name); + extern int after_bootmem; void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 7da91cbe075d..012691e29895 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -12,6 +12,8 @@ #define ASI_MAX_NUM_ORDER 0 #define ASI_MAX_NUM 0 +#define ASI_GLOBAL_NONSENSITIVE NULL + #ifndef _ASSEMBLY_ struct asi_hooks {}; @@ -63,8 +65,11 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) { } static inline void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } +#define INIT_MM_ASI(init_mm) + #define static_asi_enabled() false + #endif /* !_ASSEMBLY_ */ #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ diff --git a/mm/init-mm.c b/mm/init-mm.c index b4a6f38fb51d..47a6a66610fb 100644 --- a/mm/init-mm.c +++ b/mm/init-mm.c @@ -11,6 +11,7 @@ #include #include #include +#include #ifndef INIT_MM_CONTEXT #define INIT_MM_CONTEXT(name) @@ -38,6 +39,7 @@ struct mm_struct init_mm = { .mmlist = LIST_HEAD_INIT(init_mm.mmlist), .user_ns = &init_user_ns, .cpu_bitmap = CPU_BITS_NONE, + INIT_MM_ASI(init_mm) INIT_MM_CONTEXT(init_mm) }; From patchwork Wed Feb 23 05:21:45 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756369 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CAA85C433F5 for ; Wed, 23 Feb 2022 05:24:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6762D8D000B; Wed, 23 Feb 2022 00:24:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5D4218D0001; Wed, 23 Feb 2022 00:24:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 44D948D000B; Wed, 23 Feb 2022 00:24:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 339D98D0001 for ; Wed, 23 Feb 2022 00:24:07 -0500 (EST) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 0595620765 for ; Wed, 23 Feb 2022 05:24:07 +0000 (UTC) X-FDA: 79172903334.07.94556EA Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf11.hostedemail.com (Postfix) with ESMTP id 857BA40004 for ; Wed, 23 Feb 2022 05:24:06 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d6b6cf0cafso150366007b3.21 for ; Tue, 22 Feb 2022 21:24:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=A2kZT9A4P8bW0lI9YOtf1wQA9jJkQ1koWvhOh//ocUo=; 
b=fbSsueZGuBScgzqcY02VIl0ILFLXpPlKdSo5ctJYl8kSPI7w0kDZ1uGY8ujtC+fl+e PEzsS5j+6h2lyRr/UrTBatrXvU4f3NErvUAR7v4X1dzKu33iDdHE2QCAop5cuA3atf5U EJ4V1Fx706MmP14iSjNuJdP3U5u2B16C5Fj3e/UoRc8fZC4srXBJUArKZJ67Qxmu0hNA BsPOvLUKtyHj9nuWAemSw1G1+wDLXUYyYOFHHc5fb0JSBY7PBPNYf4tS05DMNxCxy6ed 1VKdWkC7HesYHDOsvNnF5BnhxEmTplkx92VqdXNvb7r03sGqA/SDTfz8vVfIQeADD5L1 ir/Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=A2kZT9A4P8bW0lI9YOtf1wQA9jJkQ1koWvhOh//ocUo=; b=1D80Akdi91IKOIJKSJSzvXFiJlf5nStQPZzWIYLyZVc/ENd7GXAY3Cz+PksI2V8kuU yVNY6uOY5T98GDZm6v5ljfKEr5CnllgPXNYjpExNWVz9TSkDg6S5mOY82g7xCksJNmum i5EOcGtW0xzTOzTiNSO67k+tP5TQ3mScJk9wkKQC+8pw/W6BH/JgD8uYxqWidKus7NbF MSylqYDgrDZ3205Zo1GzXoiqHqC2boX6df669OA4QXXNHKeiy2icW6hWj1yObJ3lqF9Z XOkcmZWZwLd5r5sEzoM79+oMJSa/2IJS2A/yAbZbTvQ4R1kn6mczHkiKJWVwXi3mL2i1 m8LQ== X-Gm-Message-State: AOAM530IZOWhZVBhrBKAHVL6DB0JdVd9Xg6WbJgXcG83BXO3/E5lhC+q UuRb0duzBoDhL8fhHxT3Zhl7Y2rMy6Xi X-Google-Smtp-Source: ABdhPJx1fLXJ2Nd+XyAyj3SwtD/6yx9YWclO/RpwDKJJBLCMeXbY/FNMqCpTASogEOd+LsJ96uLWXmi1HY+9 X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:1d8c:0:b0:2cb:da76:5da8 with SMTP id d134-20020a811d8c000000b002cbda765da8mr27707177ywd.165.1645593845809; Tue, 22 Feb 2022 21:24:05 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:45 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-10-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 09/47] mm: Add __PAGEFLAG_FALSE From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: 857BA40004 X-Stat-Signature: ypwo4n9oaswtq3xd6tmemhao6kmy333f X-Rspam-User: Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=fbSsueZG; spf=pass (imf11.hostedemail.com: domain of 39cQVYgcKCOwXibOWRgUccUZS.QcaZWbil-aaYjOQY.cfU@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=39cQVYgcKCOwXibOWRgUccUZS.QcaZWbil-aaYjOQY.cfU@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam05 X-HE-Tag: 1645593846-660926 X-Bogosity: Ham, tests=bogofilter, spamicity=0.008290, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: __PAGEFLAG_FALSE is a non-atomic equivalent of PAGEFLAG_FALSE. 
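
For context, and mirroring the existing PAGEFLAG_FALSE pattern, a flag declared with the new macro compiles down to no-op non-atomic stubs when the flag is not configured in; roughly (folio variants omitted), __PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive) would expand to:

        static inline int PageGlobalNonSensitive(const struct page *page) { return 0; }
        static inline void __SetPageGlobalNonSensitive(struct page *page) { }
        static inline void __ClearPageGlobalNonSensitive(struct page *page) { }
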
Signed-off-by: Junaid Shahid --- include/linux/page-flags.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index b5f14d581113..b90a17e9796d 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -390,6 +390,10 @@ static inline int Page##uname(const struct page *page) { return 0; } static inline void folio_set_##lname(struct folio *folio) { } \ static inline void SetPage##uname(struct page *page) { } +#define __SETPAGEFLAG_NOOP(uname, lname) \ +static inline void __folio_set_##lname(struct folio *folio) { } \ +static inline void __SetPage##uname(struct page *page) { } + #define CLEARPAGEFLAG_NOOP(uname, lname) \ static inline void folio_clear_##lname(struct folio *folio) { } \ static inline void ClearPage##uname(struct page *page) { } @@ -411,6 +415,9 @@ static inline int TestClearPage##uname(struct page *page) { return 0; } #define PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname) \ SETPAGEFLAG_NOOP(uname, lname) CLEARPAGEFLAG_NOOP(uname, lname) +#define __PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname) \ + __SETPAGEFLAG_NOOP(uname, lname) __CLEARPAGEFLAG_NOOP(uname, lname) + #define TESTSCFLAG_FALSE(uname, lname) \ TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname) From patchwork Wed Feb 23 05:21:46 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756370 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EA44C433FE for ; Wed, 23 Feb 2022 05:24:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 90BEC8D000C; Wed, 23 Feb 2022 00:24:09 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7F7B28D0001; Wed, 23 Feb 2022 00:24:09 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 697588D000C; Wed, 23 Feb 2022 00:24:09 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0007.hostedemail.com [216.40.44.7]) by kanga.kvack.org (Postfix) with ESMTP id 58D6F8D0001 for ; Wed, 23 Feb 2022 00:24:09 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 1D0709F84C for ; Wed, 23 Feb 2022 05:24:09 +0000 (UTC) X-FDA: 79172903418.11.3CA12AA Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf18.hostedemail.com (Postfix) with ESMTP id A185C1C0002 for ; Wed, 23 Feb 2022 05:24:08 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id e129-20020a25d387000000b006245d830ca6so12323101ybf.13 for ; Tue, 22 Feb 2022 21:24:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=gMgbv5PxdERvilfG13W1/59w3Y4MPufgzH9ipT5mZcE=; b=b6urhxkWGfnvarQIxVi9atsKIrKLwBXqKa9HpQGtESTNRqGIm9vkGf3iqdrTrF3WZR 5m1o2lHCH4V6NfBZsxlk8GhGN3fJPzlR3rYLD19suBuPDuixp1NPgPg6U6hpUzZvTY5L IgBE58eLo2ZZzbqp5xEwBHiVFVtDJ87nrKLza6quhghsvVIu54Q0IHEcJ3cNhC/sxTwh jZftmgJSZ3zhdFt219bc3VfY34rKbPUl3RGNBdUPaeK8YOXIcRYVBQ85kA7yXhGQqR5+ DMF/TwW/QA+HqXhMMcxPWsADaTGbuLYcULrIcRh5s/jS84XHXJ0eFx/xBfm7XL6hFrwa +MKQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; 
d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=gMgbv5PxdERvilfG13W1/59w3Y4MPufgzH9ipT5mZcE=; b=JhImuL7YSUylUjkvYz41H0tvmbqRCsVKDBL/HQWoLB5zAW2YSNX0pypnJyYzMPahJz G4sqkJbQSixpeipaPhe7yJt4nJR8SwhPG55kRT7+lszbMyoMhf1INqCXkWvtg72qWqIm +7icmO8YzxX2lGs/yPt/zOPLjz+So+ZyonlwZXfqI6YJsSVV4o50Jd2IE7OOchboVAUC NFVNhZF8TnLfuMwvu0Bq1nqIGFzHSZxtBPhpIB/tkItjygCH2t58otKPDnNFOdYa8wF6 lx8NWUEpU78a8OUEcrZ7ZamZ0hF+ztY9oyn+5qlrwQ97jbS9GrYDFLulJ200HXEyBCse RfkA== X-Gm-Message-State: AOAM531zFU8/BzK5Tg3dxdJCxcN0tMOvuUVNWt7FLxIsdkNB1ZV5+2sN vw5HFcM8Tkm94AbEOA7HETR2t5JB5Ed5 X-Google-Smtp-Source: ABdhPJxT/vWX0OjiErFxevki8JUky5j473dKCodPTix92VCbsXfh3r9zdc2C7QnCBcd9ntaYhKL50joC8J3g X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:5d0:0:b0:2d0:d056:c703 with SMTP id 199-20020a8105d0000000b002d0d056c703mr28080196ywf.288.1645593847896; Tue, 22 Feb 2022 21:24:07 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:46 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-11-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 10/47] mm: asi: Support for global non-sensitive direct map allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: A185C1C0002 X-Stat-Signature: zhjc6u3t5zmdxe5rbkqayx7rkfk39im4 X-Rspam-User: Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=b6urhxkW; spf=pass (imf18.hostedemail.com: domain of 398QVYgcKCO4ZkdQYTiWeeWbU.SecbYdkn-ccalQSa.ehW@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=398QVYgcKCO4ZkdQYTiWeeWbU.SecbYdkn-ccalQSa.ehW@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam05 X-HE-Tag: 1645593848-82230 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A new GFP flag is added to specify that an allocation should be considered globally non-sensitive. The pages will be mapped into the ASI global non-sensitive pseudo-PGD, which is shared between all standard ASI instances. A new page flag is also added so that when these pages are freed, they can also be unmapped from the ASI page tables. 
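
As a usage sketch (illustrative only; example_alloc_global_nonsensitive() is a hypothetical call site, not part of this patch), an allocation that must remain accessible while running in the restricted address space just adds the new GFP flag:

        static void *example_alloc_global_nonsensitive(void)
        {
                /*
                 * The flag implies __GFP_ZERO and causes the pages to be
                 * mapped into the ASI global non-sensitive pseudo-PGD.
                 */
                struct page *page = alloc_pages(GFP_KERNEL |
                                                __GFP_GLOBAL_NONSENSITIVE, 0);

                if (!page)
                        return NULL;

                return page_to_virt(page);
        }

When such a page is eventually freed, the free path sees PG_global_nonsensitive and unmaps the range from the ASI page tables, deferring the unmap and TLB flush to a workqueue if the free happens in interrupt context or with interrupts disabled.
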
Signed-off-by: Junaid Shahid --- include/linux/gfp.h | 10 ++- include/linux/mm_types.h | 5 ++ include/linux/page-flags.h | 9 ++ include/trace/events/mmflags.h | 12 ++- mm/page_alloc.c | 145 ++++++++++++++++++++++++++++++++- tools/perf/builtin-kmem.c | 1 + 6 files changed, 178 insertions(+), 4 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 8fcc38467af6..07a99a463a34 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -60,6 +60,11 @@ struct vm_area_struct; #else #define ___GFP_NOLOCKDEP 0 #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define ___GFP_GLOBAL_NONSENSITIVE 0x4000000u +#else +#define ___GFP_GLOBAL_NONSENSITIVE 0 +#endif /* If the above are modified, __GFP_BITS_SHIFT may need updating */ /* @@ -248,8 +253,11 @@ struct vm_area_struct; /* Disable lockdep for GFP context tracking */ #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP) +/* Allocate non-sensitive memory */ +#define __GFP_GLOBAL_NONSENSITIVE ((__force gfp_t)___GFP_GLOBAL_NONSENSITIVE) + /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT (25 + IS_ENABLED(CONFIG_LOCKDEP)) +#define __GFP_BITS_SHIFT 27 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) /** diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 3de1afa57289..5b8028fcfe67 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -191,6 +191,11 @@ struct page { /** @rcu_head: You can use this to free a page by RCU. */ struct rcu_head rcu_head; + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Links the pages_to_free_async list */ + struct llist_node async_free_node; +#endif }; union { /* This union is 4 bytes in size. */ diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index b90a17e9796d..a07434cc679c 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -140,6 +140,9 @@ enum pageflags { #endif #ifdef CONFIG_KASAN_HW_TAGS PG_skip_kasan_poison, +#endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + PG_global_nonsensitive, #endif __NR_PAGEFLAGS, @@ -542,6 +545,12 @@ TESTCLEARFLAG(Young, young, PF_ANY) PAGEFLAG(Idle, idle, PF_ANY) #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +__PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY); +#else +__PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive); +#endif + #ifdef CONFIG_KASAN_HW_TAGS PAGEFLAG(SkipKASanPoison, skip_kasan_poison, PF_HEAD) #else diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 116ed4d5d0f8..73a49197ef54 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -50,7 +50,8 @@ {(unsigned long)__GFP_DIRECT_RECLAIM, "__GFP_DIRECT_RECLAIM"},\ {(unsigned long)__GFP_KSWAPD_RECLAIM, "__GFP_KSWAPD_RECLAIM"},\ {(unsigned long)__GFP_ZEROTAGS, "__GFP_ZEROTAGS"}, \ - {(unsigned long)__GFP_SKIP_KASAN_POISON,"__GFP_SKIP_KASAN_POISON"}\ + {(unsigned long)__GFP_SKIP_KASAN_POISON,"__GFP_SKIP_KASAN_POISON"},\ + {(unsigned long)__GFP_GLOBAL_NONSENSITIVE, "__GFP_GLOBAL_NONSENSITIVE"}\ #define show_gfp_flags(flags) \ (flags) ? 
__print_flags(flags, "|", \ @@ -93,6 +94,12 @@ #define IF_HAVE_PG_SKIP_KASAN_POISON(flag,string) #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define IF_HAVE_ASI(flag, string) ,{1UL << flag, string} +#else +#define IF_HAVE_ASI(flag, string) +#endif + #define __def_pageflag_names \ {1UL << PG_locked, "locked" }, \ {1UL << PG_waiters, "waiters" }, \ @@ -121,7 +128,8 @@ IF_HAVE_PG_HWPOISON(PG_hwpoison, "hwpoison" ) \ IF_HAVE_PG_IDLE(PG_young, "young" ) \ IF_HAVE_PG_IDLE(PG_idle, "idle" ) \ IF_HAVE_PG_ARCH_2(PG_arch_2, "arch_2" ) \ -IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison") +IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison") \ +IF_HAVE_ASI(PG_global_nonsensitive, "global_nonsensitive") #define show_page_flags(flags) \ (flags) ? __print_flags(flags, "|", \ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c5952749ad40..a4048fa1868a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -697,7 +697,7 @@ static inline bool pcp_allowed_order(unsigned int order) return false; } -static inline void free_the_page(struct page *page, unsigned int order) +static inline void __free_the_page(struct page *page, unsigned int order) { if (pcp_allowed_order(order)) /* Via pcp? */ free_unref_page(page, order); @@ -705,6 +705,14 @@ static inline void free_the_page(struct page *page, unsigned int order) __free_pages_ok(page, order, FPI_NONE); } +static bool asi_unmap_freed_pages(struct page *page, unsigned int order); + +static inline void free_the_page(struct page *page, unsigned int order) +{ + if (asi_unmap_freed_pages(page, order)) + __free_the_page(page, order); +} + /* * Higher-order pages are called "compound pages". They are structured thusly: * @@ -5162,6 +5170,129 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order, return true; } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static DEFINE_PER_CPU(struct work_struct, async_free_work); +static DEFINE_PER_CPU(struct llist_head, pages_to_free_async); +static bool async_free_work_initialized; + +static void __free_the_page(struct page *page, unsigned int order); + +static void async_free_work_fn(struct work_struct *work) +{ + struct page *page, *tmp; + struct llist_node *pages_to_free; + void *va; + size_t len; + uint order; + + pages_to_free = llist_del_all(this_cpu_ptr(&pages_to_free_async)); + + /* A later patch will do a more optimized TLB flush. */ + + llist_for_each_entry_safe(page, tmp, pages_to_free, async_free_node) { + va = page_to_virt(page); + order = page->private; + len = PAGE_SIZE * (1 << order); + + asi_flush_tlb_range(ASI_GLOBAL_NONSENSITIVE, va, len); + __free_the_page(page, order); + } +} + +static int __init asi_page_alloc_init(void) +{ + int cpu; + + if (!static_asi_enabled()) + return 0; + + for_each_possible_cpu(cpu) + INIT_WORK(per_cpu_ptr(&async_free_work, cpu), + async_free_work_fn); + + /* + * This function is called before SMP is initialized, so we can assume + * that this is the only running CPU at this point. 
+ */ + + barrier(); + async_free_work_initialized = true; + barrier(); + + if (!llist_empty(this_cpu_ptr(&pages_to_free_async))) + queue_work_on(smp_processor_id(), mm_percpu_wq, + this_cpu_ptr(&async_free_work)); + + return 0; +} +early_initcall(asi_page_alloc_init); + +static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_mask) +{ + uint i; + + if (!static_asi_enabled()) + return 0; + + if (gfp_mask & __GFP_GLOBAL_NONSENSITIVE) { + for (i = 0; i < (1 << order); i++) + __SetPageGlobalNonSensitive(page + i); + + return asi_map_gfp(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page), + PAGE_SIZE * (1 << order), gfp_mask); + } + + return 0; +} + +static bool asi_unmap_freed_pages(struct page *page, unsigned int order) +{ + void *va; + size_t len; + bool async_flush_needed; + + if (!static_asi_enabled()) + return true; + + if (!PageGlobalNonSensitive(page)) + return true; + + va = page_to_virt(page); + len = PAGE_SIZE * (1 << order); + async_flush_needed = irqs_disabled() || in_interrupt(); + + asi_unmap(ASI_GLOBAL_NONSENSITIVE, va, len, !async_flush_needed); + + if (!async_flush_needed) + return true; + + page->private = order; + llist_add(&page->async_free_node, this_cpu_ptr(&pages_to_free_async)); + + if (async_free_work_initialized) + queue_work_on(smp_processor_id(), mm_percpu_wq, + this_cpu_ptr(&async_free_work)); + + return false; +} + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +static inline +int asi_map_alloced_pages(struct page *pages, uint order, gfp_t gfp_mask) +{ + return 0; +} + +static inline +bool asi_unmap_freed_pages(struct page *page, unsigned int order) +{ + return true; +} + +#endif + /* * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array * @gfp: GFP flags for the allocation @@ -5345,6 +5476,9 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid, return NULL; } + if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE)) + gfp |= __GFP_ZERO; + gfp &= gfp_allowed_mask; /* * Apply scoped allocation constraints. 
This is mainly about GFP_NOFS @@ -5388,6 +5522,15 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid, page = NULL; } + if (page) { + int err = asi_map_alloced_pages(page, order, gfp); + + if (unlikely(err)) { + __free_pages(page, order); + page = NULL; + } + } + trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype); return page; diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index da03a341c63c..5857953cd5c1 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -660,6 +660,7 @@ static const struct { { "__GFP_RECLAIM", "R" }, { "__GFP_DIRECT_RECLAIM", "DR" }, { "__GFP_KSWAPD_RECLAIM", "KR" }, + { "__GFP_GLOBAL_NONSENSITIVE", "GNS" }, }; static size_t max_gfp_len; From patchwork Wed Feb 23 05:21:47 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756371 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 45DE4C433EF for ; Wed, 23 Feb 2022 05:24:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C18E18D000D; Wed, 23 Feb 2022 00:24:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BEF158D0001; Wed, 23 Feb 2022 00:24:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A8F798D000D; Wed, 23 Feb 2022 00:24:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 986058D0001 for ; Wed, 23 Feb 2022 00:24:11 -0500 (EST) Received: from smtpin04.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 6212022E5B for ; Wed, 23 Feb 2022 05:24:11 +0000 (UTC) X-FDA: 79172903502.04.40B8CC3 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf09.hostedemail.com (Postfix) with ESMTP id E0539140004 for ; Wed, 23 Feb 2022 05:24:10 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id s133-20020a252c8b000000b0062112290d0bso26612617ybs.23 for ; Tue, 22 Feb 2022 21:24:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ZeYN5P9RWE85rWG989hD+TytaZQUQYUlju5cKer3rg4=; b=ojUJsUEL9fv13WjcZCmN21LxwR3Z6zqiLpmAQSzxCgc5C9RlKzd4dJEPqIdea+S3R3 SjjKai+zXn4VwTi+vC8vw+NDJWHGEkF4oYybv8WUgGBs6X4Bacon5BeJ8KDM6+Kjcqg7 fYXSLYWwuUfF3ZzaHWiW5r5fUjU5hUbi5i8udL9OUGPQxFLT6BZJN8cKiUG2KSidmSdI 7UxdullkH2J71pHhwQMQjuVH7Fyee/6awuNUgdr31148ohQL1xZz1MmOnDGr6y/5AC05 atbUvniVVa19eg7E36/l3gFWXXYduMYcF6MD2CAuUTHeqJzSqvtlhwblL5Fzis+kQ6SQ claQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ZeYN5P9RWE85rWG989hD+TytaZQUQYUlju5cKer3rg4=; b=cbxC7umTKfxT/tsphXBQM1QP/wZmnUXeSk6qXNic8zhZTgGoN2iBmfsDIAiFexeJWv dF8M1JY0DXmCE+iCoamf8LKm5aqow0di2Ic0npemw521n8sIC2Zlovn4TN3sgcWZFYlB of3Dj/9HxjQZm9Kq4Y1y5bhX74QMulczBjc1QkPTN6vu2Z06ja1pJTuv2OpHxPdnK0VK cf9gr1U0F6RpM5kf+4dlQfmhZfqtKgoaUKPXEy9ODbAnTl/CRYxCopdeOAkenVpipRmo YiEH1i1TKQs8LZtCyPxowYdfo7JyZ4W8nniCow35040GPiNV+D+Zt2fw7hSnZfdqnqHD t64w== X-Gm-Message-State: 
AOAM533HnLRGGA0VZniA4C8z9E5LckoZ5/GpDQ1Cw4EjAXetEn145X2p mwZnFr4JkQMa8eMKeh9iNFA4716JZCsO X-Google-Smtp-Source: ABdhPJx7Y4lP1+DVrEJyeaCPqdjSHoYWrsMiEOQNVQ/p8prXsXPbox8dmisMikykjaQnA+B25uCbwTFY5Vh8 X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a5b:589:0:b0:61d:de51:9720 with SMTP id l9-20020a5b0589000000b0061dde519720mr26317731ybp.167.1645593850281; Tue, 22 Feb 2022 21:24:10 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:47 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-12-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 11/47] mm: asi: Global non-sensitive vmalloc/vmap support From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Stat-Signature: jii13x65hbwsu87jxbpfgggeaqwqtt48 X-Rspam-User: Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=ojUJsUEL; spf=pass (imf09.hostedemail.com: domain of 3-sQVYgcKCPEcngTbWlZhhZeX.Vhfebgnq-ffdoTVd.hkZ@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3-sQVYgcKCPEcngTbWlZhhZeX.Vhfebgnq-ffdoTVd.hkZ@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: E0539140004 X-HE-Tag: 1645593850-63305 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A new flag, VM_GLOBAL_NONSENSITIVE is added to designate globally non-sensitive vmalloc/vmap areas. When using the __vmalloc / __vmalloc_node APIs, if the corresponding GFP flag is specified, the VM flag is automatically added. When using the __vmalloc_node_range API, either flag can be specified independently. The VM flag will only map the vmalloc area as non-sensitive, while the GFP flag will only map the underlying direct map area as non-sensitive. When using the __vmalloc_node_range API, instead of VMALLOC_START/END, VMALLOC_GLOBAL_NONSENSITIVE_START/END should be used. This is to keep these mappings separate from locally non-sensitive vmalloc areas, which will be added later. Areas outside of the standard vmalloc range can specify the range as before. 
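For illustration only (not part of this patch), a caller could request a
globally non-sensitive allocation in either of these two ways, using the
flags introduced here:

	/* GFP flag alone; the VM flag is then added automatically. */
	buf = __vmalloc(size, GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE);

	/* Or pass both flags explicitly, with the dedicated range. */
	buf = __vmalloc_node_range(size, 1,
				   VMALLOC_GLOBAL_NONSENSITIVE_START,
				   VMALLOC_GLOBAL_NONSENSITIVE_END,
				   GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE,
				   PAGE_KERNEL, VM_GLOBAL_NONSENSITIVE,
				   NUMA_NO_NODE, __builtin_return_address(0));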
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/pgtable_64_types.h | 5 +++ arch/x86/mm/asi.c | 3 +- include/asm-generic/asi.h | 3 ++ include/linux/vmalloc.h | 6 +++ mm/vmalloc.c | 53 ++++++++++++++++++++++--- 5 files changed, 64 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 91ac10654570..0fc380ba25b8 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -141,6 +141,11 @@ extern unsigned int ptrs_per_p4d; #define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1) +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START +#define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END +#endif + #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) /* The module sections ends with the start of the fixmap */ #ifndef CONFIG_DEBUG_KMAP_LOCAL_FORCE_MAP diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index d381ae573af9..71348399baf1 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -198,7 +198,8 @@ static int __init asi_global_init(void) "ASI Global Non-sensitive direct map"); preallocate_toplevel_pgtbls(asi_global_nonsensitive_pgd, - VMALLOC_START, VMALLOC_END, + VMALLOC_GLOBAL_NONSENSITIVE_START, + VMALLOC_GLOBAL_NONSENSITIVE_END, "ASI Global Non-sensitive vmalloc"); return 0; diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 012691e29895..f918cd052722 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -14,6 +14,9 @@ #define ASI_GLOBAL_NONSENSITIVE NULL +#define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START +#define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END + #ifndef _ASSEMBLY_ struct asi_hooks {}; diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 6e022cc712e6..c7c66decda3e 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -39,6 +39,12 @@ struct notifier_block; /* in notifier.h */ * determine which allocations need the module shadow freed. */ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define VM_GLOBAL_NONSENSITIVE 0x00000800 /* Similar to __GFP_GLOBAL_NONSENSITIVE */ +#else +#define VM_GLOBAL_NONSENSITIVE 0 +#endif + /* bits [20..32] reserved for arch specific ioremap internals */ /* diff --git a/mm/vmalloc.c b/mm/vmalloc.c index f2ef719f1cba..ba588a37ee75 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2393,6 +2393,33 @@ void __init vmalloc_init(void) vmap_initialized = true; } +static int asi_map_vm_area(struct vm_struct *area) +{ + if (!static_asi_enabled()) + return 0; + + if (area->flags & VM_GLOBAL_NONSENSITIVE) + return asi_map(ASI_GLOBAL_NONSENSITIVE, area->addr, + get_vm_area_size(area)); + + return 0; +} + +static void asi_unmap_vm_area(struct vm_struct *area) +{ + if (!static_asi_enabled()) + return; + + /* + * TODO: The TLB flush here could potentially be avoided in + * the case when the existing flush from try_purge_vmap_area_lazy() + * and/or vm_unmap_aliases() happens non-lazily. + */ + if (area->flags & VM_GLOBAL_NONSENSITIVE) + asi_unmap(ASI_GLOBAL_NONSENSITIVE, area->addr, + get_vm_area_size(area), true); +} + static inline void setup_vmalloc_vm_locked(struct vm_struct *vm, struct vmap_area *va, unsigned long flags, const void *caller) { @@ -2570,6 +2597,7 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages) int flush_dmap = 0; int i; + asi_unmap_vm_area(area); remove_vm_area(area->addr); /* If this is not VM_FLUSH_RESET_PERMS memory, no need for the below. 
*/ @@ -2787,16 +2815,20 @@ void *vmap(struct page **pages, unsigned int count, addr = (unsigned long)area->addr; if (vmap_pages_range(addr, addr + size, pgprot_nx(prot), - pages, PAGE_SHIFT) < 0) { - vunmap(area->addr); - return NULL; - } + pages, PAGE_SHIFT) < 0) + goto err; + + if (asi_map_vm_area(area)) + goto err; if (flags & VM_MAP_PUT_PAGES) { area->pages = pages; area->nr_pages = count; } return area->addr; +err: + vunmap(area->addr); + return NULL; } EXPORT_SYMBOL(vmap); @@ -2991,6 +3023,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, goto fail; } + if (asi_map_vm_area(area)) + goto fail; + return area->addr; fail: @@ -3038,6 +3073,9 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align, if (WARN_ON_ONCE(!size)) return NULL; + if (static_asi_enabled() && (vm_flags & VM_GLOBAL_NONSENSITIVE)) + gfp_mask |= __GFP_ZERO; + if ((size >> PAGE_SHIFT) > totalram_pages()) { warn_alloc(gfp_mask, NULL, "vmalloc error: size %lu, exceeds total pages", @@ -3127,8 +3165,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align, void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, int node, const void *caller) { + ulong vm_flags = 0; + + if (static_asi_enabled() && (gfp_mask & __GFP_GLOBAL_NONSENSITIVE)) + vm_flags |= VM_GLOBAL_NONSENSITIVE; + return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, - gfp_mask, PAGE_KERNEL, 0, node, caller); + gfp_mask, PAGE_KERNEL, vm_flags, node, caller); } /* * This is only for performance analysis of vmalloc and stress purpose. From patchwork Wed Feb 23 05:21:48 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756372 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D05EAC433EF for ; Wed, 23 Feb 2022 05:24:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 69A8F8D000E; Wed, 23 Feb 2022 00:24:14 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 64A2D8D0001; Wed, 23 Feb 2022 00:24:14 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 510E48D000E; Wed, 23 Feb 2022 00:24:14 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0092.hostedemail.com [216.40.44.92]) by kanga.kvack.org (Postfix) with ESMTP id 3C9E18D0001 for ; Wed, 23 Feb 2022 00:24:14 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id EDDC69F5DD for ; Wed, 23 Feb 2022 05:24:13 +0000 (UTC) X-FDA: 79172903586.13.8E9DF74 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf30.hostedemail.com (Postfix) with ESMTP id 6759980003 for ; Wed, 23 Feb 2022 05:24:13 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id a12-20020a056902056c00b0061dc0f2a94aso26531573ybt.6 for ; Tue, 22 Feb 2022 21:24:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=h48HgrV9Ms87ueeFjUPVjN/Yw+KwlPL1CcnJb5APGwE=; b=Hmt0w/kiWsp/YP+equQ01AdLjZMh/o/2Z8oEGQ6Lq4Nt157PlHz1XpRl+A7GPQXL2i FRSX/mXn548mQM/TC1p/MGu32EFPYWhSLwU4OefqIbwBe+iA80Pue4vU6e/ZB7nsR4Y4 
523kigSioD12IuJ5T0licYK9j98yC1QXVRYs3anUzXJCRNXAH9qqii0hyNma152NX9Ps ggEjc8Pz3upnBint0xIR6P2QSeIhMzI8K84X5fX/AWfAvKn+5L2W9liNKxXaSWKaCFma +45lbJ2fC4qur3JJ5wHMSfnoIjceTnzqkA7atH2Cu3t0CWoRI/zCfcvLHfSjM3PIpuRT HILQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=h48HgrV9Ms87ueeFjUPVjN/Yw+KwlPL1CcnJb5APGwE=; b=kAISBLQRZONQ5+hOrkQokSBef1pjyaTRWY5RgMPTFeNtiSbxcAPjzXiM1sPc4PLUw9 Tgn0+LYOU/ugFhmsoCXly5chvEC99WkHth9XQLiJ8fkH/3ag/iSJti0c1BfM6Oa6N4nd 2D2MoLoPL3i9U98JBMBn+i++U4xUkc6s9wHG+2Nq3nuAwBJE+KYFkhnAlfI3/kWRfjVY 5xNmlE3pkBHS3AXHY2EQE+qFfE6YxSa4XXxpPYElCIsQpbSgV7JMHuUA9/Tga4nUTNG/ fRfjevx3s1pdsKbg7/aNtVxpihGrTC88zM77zCOvExheS9L1ksVYrm+aCApKyUkNW/Id KRZA== X-Gm-Message-State: AOAM530Uioma/u4aKm0frMaqZL+uFXpu8k9HXx+bEVT8/4gUrPfS5PKa a5VrlAJQtj6ybX1RlLyl3A3qOOEWyur1 X-Google-Smtp-Source: ABdhPJyfC5FSpMaAtl/McL2SxzvvGsqtTFjrUAQSrC01vyuUjm1vV+WNWVvhncOg+7K7oU2DAPMbItihfugH X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:1c47:0:b0:2d7:5822:1739 with SMTP id c68-20020a811c47000000b002d758221739mr11411035ywc.502.1645593852744; Tue, 22 Feb 2022 21:24:12 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:48 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-13-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 12/47] mm: asi: Support for global non-sensitive slab caches From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 6759980003 X-Rspam-User: Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="Hmt0w/ki"; spf=pass (imf30.hostedemail.com: domain of 3_MQVYgcKCPMepiVdYnbjjbgZ.Xjhgdips-hhfqVXf.jmb@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3_MQVYgcKCPMepiVdYnbjjbgZ.Xjhgdips-hhfqVXf.jmb@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: xe5qjm88waguybd5h5piy4user619sgd X-HE-Tag: 1645593853-91763 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A new flag SLAB_GLOBAL_NONSENSITIVE is added, which would designate all objects within that slab cache to be globally non-sensitive. Another flag SLAB_NONSENSITIVE is also added, which is currently just an alias for SLAB_GLOBAL_NONSENSITIVE, but will eventually be used to designate slab caches which can allocate either global or local non-sensitive objects. In addition, new kmalloc caches have been added that can be used to allocate non-sensitive objects. 
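As a sketch of the intended usage (not part of this patch; "foo_cache" and
struct foo are made-up names):

	/* A dedicated cache whose objects are all globally non-sensitive. */
	cache = kmem_cache_create("foo_cache", sizeof(struct foo), 0,
				  SLAB_GLOBAL_NONSENSITIVE, NULL);
	obj = kmem_cache_alloc(cache, GFP_KERNEL);

	/* Or allocate directly from the new non-sensitive kmalloc caches. */
	p = kmalloc(64, GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE);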
Signed-off-by: Junaid Shahid --- include/linux/slab.h | 32 +++++++++++++++---- mm/slab.c | 5 +++ mm/slab.h | 14 ++++++++- mm/slab_common.c | 73 +++++++++++++++++++++++++++++++++----------- security/Kconfig | 2 +- 5 files changed, 101 insertions(+), 25 deletions(-) diff --git a/include/linux/slab.h b/include/linux/slab.h index 181045148b06..7b8a3853d827 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -120,6 +120,12 @@ /* Slab deactivation flag */ #define SLAB_DEACTIVATED ((slab_flags_t __force)0x10000000U) +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define SLAB_GLOBAL_NONSENSITIVE ((slab_flags_t __force)0x20000000U) +#else +#define SLAB_GLOBAL_NONSENSITIVE 0 +#endif + /* * ZERO_SIZE_PTR will be returned for zero sized kmalloc requests. * @@ -329,6 +335,11 @@ enum kmalloc_cache_type { extern struct kmem_cache * kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +extern struct kmem_cache * +nonsensitive_kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1]; +#endif + /* * Define gfp bits that should not be set for KMALLOC_NORMAL. */ @@ -361,6 +372,17 @@ static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags) return KMALLOC_CGROUP; } +static __always_inline struct kmem_cache *get_kmalloc_cache(gfp_t flags, + uint index) +{ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + + if (static_asi_enabled() && (flags & __GFP_GLOBAL_NONSENSITIVE)) + return nonsensitive_kmalloc_caches[kmalloc_type(flags)][index]; +#endif + return kmalloc_caches[kmalloc_type(flags)][index]; +} + /* * Figure out which kmalloc slab an allocation of a certain size * belongs to. @@ -587,9 +609,8 @@ static __always_inline __alloc_size(1) void *kmalloc(size_t size, gfp_t flags) if (!index) return ZERO_SIZE_PTR; - return kmem_cache_alloc_trace( - kmalloc_caches[kmalloc_type(flags)][index], - flags, size); + return kmem_cache_alloc_trace(get_kmalloc_cache(flags, index), + flags, size); #endif } return __kmalloc(size, flags); @@ -605,9 +626,8 @@ static __always_inline __alloc_size(1) void *kmalloc_node(size_t size, gfp_t fla if (!i) return ZERO_SIZE_PTR; - return kmem_cache_alloc_node_trace( - kmalloc_caches[kmalloc_type(flags)][i], - flags, node, size); + return kmem_cache_alloc_node_trace(get_kmalloc_cache(flags, i), + flags, node, size); } #endif return __kmalloc_node(size, flags, node); diff --git a/mm/slab.c b/mm/slab.c index ca4822f6b2b6..5a928d95d67b 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1956,6 +1956,9 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags) size = ALIGN(size, REDZONE_ALIGN); } + if (!static_asi_enabled()) + flags &= ~SLAB_NONSENSITIVE; + /* 3) caller mandated alignment */ if (ralign < cachep->align) { ralign = cachep->align; @@ -2058,6 +2061,8 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags) cachep->allocflags |= GFP_DMA32; if (flags & SLAB_RECLAIM_ACCOUNT) cachep->allocflags |= __GFP_RECLAIMABLE; + if (flags & SLAB_GLOBAL_NONSENSITIVE) + cachep->allocflags |= __GFP_GLOBAL_NONSENSITIVE; cachep->size = size; cachep->reciprocal_buffer_size = reciprocal_value(size); diff --git a/mm/slab.h b/mm/slab.h index 56ad7eea3ddf..f190f4fc0286 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -77,6 +77,10 @@ extern struct kmem_cache *kmem_cache; /* A table of kmalloc cache names and sizes */ extern const struct kmalloc_info_struct { const char *name[NR_KMALLOC_TYPES]; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + const char *nonsensitive_name[NR_KMALLOC_TYPES]; +#endif + slab_flags_t flags[NR_KMALLOC_TYPES]; unsigned 
int size; } kmalloc_info[]; @@ -124,11 +128,14 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size, } #endif +/* This will also include SLAB_LOCAL_NONSENSITIVE in a later patch. */ +#define SLAB_NONSENSITIVE SLAB_GLOBAL_NONSENSITIVE /* Legal flag mask for kmem_cache_create(), for various configurations */ #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \ SLAB_CACHE_DMA32 | SLAB_PANIC | \ - SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS ) + SLAB_TYPESAFE_BY_RCU | SLAB_DEBUG_OBJECTS | \ + SLAB_NONSENSITIVE) #if defined(CONFIG_DEBUG_SLAB) #define SLAB_DEBUG_FLAGS (SLAB_RED_ZONE | SLAB_POISON | SLAB_STORE_USER) @@ -491,6 +498,11 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, might_alloc(flags); + if (static_asi_enabled()) { + VM_BUG_ON(!(s->flags & SLAB_GLOBAL_NONSENSITIVE) && + (flags & __GFP_GLOBAL_NONSENSITIVE)); + } + if (should_failslab(s, flags)) return NULL; diff --git a/mm/slab_common.c b/mm/slab_common.c index e5d080a93009..72dee2494bf8 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -50,7 +50,7 @@ static DECLARE_WORK(slab_caches_to_rcu_destroy_work, SLAB_FAILSLAB | kasan_never_merge()) #define SLAB_MERGE_SAME (SLAB_RECLAIM_ACCOUNT | SLAB_CACHE_DMA | \ - SLAB_CACHE_DMA32 | SLAB_ACCOUNT) + SLAB_CACHE_DMA32 | SLAB_ACCOUNT | SLAB_NONSENSITIVE) /* * Merge control. If this is set then no merging of slab caches will occur. @@ -681,6 +681,15 @@ kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init = { /* initialization for https://bugs.llvm.org/show_bug.cgi?id=42570 */ }; EXPORT_SYMBOL(kmalloc_caches); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +struct kmem_cache * +nonsensitive_kmalloc_caches[NR_KMALLOC_TYPES][KMALLOC_SHIFT_HIGH + 1] __ro_after_init = +{ /* initialization for https://bugs.llvm.org/show_bug.cgi?id=42570 */ }; +EXPORT_SYMBOL(nonsensitive_kmalloc_caches); + +#endif + /* * Conversion table for small slabs sizes / 8 to the index in the * kmalloc array. 
This is necessary for slabs < 192 since we have non power @@ -738,25 +747,34 @@ struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags) index = fls(size - 1); } - return kmalloc_caches[kmalloc_type(flags)][index]; + return get_kmalloc_cache(flags, index); } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define __KMALLOC_NAME(type, base_name, sz) \ + .name[type] = base_name "-" #sz, \ + .nonsensitive_name[type] = "ns-" base_name "-" #sz, +#else +#define __KMALLOC_NAME(type, base_name, sz) \ + .name[type] = base_name "-" #sz, +#endif + #ifdef CONFIG_ZONE_DMA -#define KMALLOC_DMA_NAME(sz) .name[KMALLOC_DMA] = "dma-kmalloc-" #sz, +#define KMALLOC_DMA_NAME(sz) __KMALLOC_NAME(KMALLOC_DMA, "dma-kmalloc", sz) #else #define KMALLOC_DMA_NAME(sz) #endif #ifdef CONFIG_MEMCG_KMEM -#define KMALLOC_CGROUP_NAME(sz) .name[KMALLOC_CGROUP] = "kmalloc-cg-" #sz, +#define KMALLOC_CGROUP_NAME(sz) __KMALLOC_NAME(KMALLOC_CGROUP, "kmalloc-cg", sz) #else #define KMALLOC_CGROUP_NAME(sz) #endif #define INIT_KMALLOC_INFO(__size, __short_size) \ { \ - .name[KMALLOC_NORMAL] = "kmalloc-" #__short_size, \ - .name[KMALLOC_RECLAIM] = "kmalloc-rcl-" #__short_size, \ + __KMALLOC_NAME(KMALLOC_NORMAL, "kmalloc", __short_size) \ + __KMALLOC_NAME(KMALLOC_RECLAIM, "kmalloc-rcl", __short_size) \ KMALLOC_CGROUP_NAME(__short_size) \ KMALLOC_DMA_NAME(__short_size) \ .size = __size, \ @@ -846,18 +864,30 @@ void __init setup_kmalloc_cache_index_table(void) static void __init new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) { + struct kmem_cache *(*caches)[KMALLOC_SHIFT_HIGH + 1] = kmalloc_caches; + const char *name = kmalloc_info[idx].name[type]; + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + + if (flags & SLAB_NONSENSITIVE) { + caches = nonsensitive_kmalloc_caches; + name = kmalloc_info[idx].nonsensitive_name[type]; + } +#endif + if (type == KMALLOC_RECLAIM) { flags |= SLAB_RECLAIM_ACCOUNT; } else if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type == KMALLOC_CGROUP)) { if (cgroup_memory_nokmem) { - kmalloc_caches[type][idx] = kmalloc_caches[KMALLOC_NORMAL][idx]; + caches[type][idx] = caches[KMALLOC_NORMAL][idx]; return; } flags |= SLAB_ACCOUNT; + } else if (IS_ENABLED(CONFIG_ZONE_DMA) && (type == KMALLOC_DMA)) { + flags |= SLAB_CACHE_DMA; } - kmalloc_caches[type][idx] = create_kmalloc_cache( - kmalloc_info[idx].name[type], + caches[type][idx] = create_kmalloc_cache(name, kmalloc_info[idx].size, flags, 0, kmalloc_info[idx].size); @@ -866,7 +896,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) * KMALLOC_NORMAL caches. */ if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type == KMALLOC_NORMAL)) - kmalloc_caches[type][idx]->refcount = -1; + caches[type][idx]->refcount = -1; } /* @@ -908,15 +938,24 @@ void __init create_kmalloc_caches(slab_flags_t flags) for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) { struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i]; - if (s) { - kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache( - kmalloc_info[i].name[KMALLOC_DMA], - kmalloc_info[i].size, - SLAB_CACHE_DMA | flags, 0, - kmalloc_info[i].size); - } + if (s) + new_kmalloc_cache(i, KMALLOC_DMA, flags); } #endif + /* + * TODO: We may want to make slab allocations without exiting ASI. + * In that case, the cache metadata itself would need to be + * treated as non-sensitive and mapped as such, and we would need to + * do the bootstrap much more carefully. We can do that if we find + * that slab allocations while inside a restricted address space are + * frequent enough to warrant the additional complexity. 
+ */ + if (static_asi_enabled()) + for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) + for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) + if (kmalloc_caches[type][i]) + new_kmalloc_cache(i, type, + flags | SLAB_NONSENSITIVE); } #endif /* !CONFIG_SLOB */ diff --git a/security/Kconfig b/security/Kconfig index 21b15ecaf2c1..0a3e49d6a331 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -68,7 +68,7 @@ config PAGE_TABLE_ISOLATION config ADDRESS_SPACE_ISOLATION bool "Allow code to run with a reduced kernel address space" default n - depends on X86_64 && !UML + depends on X86_64 && !UML && SLAB depends on !PARAVIRT help This feature provides the ability to run some kernel code From patchwork Wed Feb 23 05:21:49 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756373 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E6450C433FE for ; Wed, 23 Feb 2022 05:24:16 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 75CE08D000F; Wed, 23 Feb 2022 00:24:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 70C468D0001; Wed, 23 Feb 2022 00:24:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 537EB8D000F; Wed, 23 Feb 2022 00:24:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0041.hostedemail.com [216.40.44.41]) by kanga.kvack.org (Postfix) with ESMTP id 4571D8D0001 for ; Wed, 23 Feb 2022 00:24:16 -0500 (EST) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id F1EE28249980 for ; Wed, 23 Feb 2022 05:24:15 +0000 (UTC) X-FDA: 79172903712.19.BA18CE7 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf06.hostedemail.com (Postfix) with ESMTP id 8D58E180004 for ; Wed, 23 Feb 2022 05:24:15 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id b64-20020a256743000000b0061e169a5f19so26554695ybc.11 for ; Tue, 22 Feb 2022 21:24:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=J1PJHeAg+5GsFDWdvzoAJvbqIGqngaUErLJYlufPa+k=; b=E8ylcvEY7HObuUi7YReOD7Xi+5ii//8STuZIJO/P8zS7AqqFvr59ONAjhLdHaJFCVd ANMtkHwUkTCW3M7pycyZwbgMNj908KdgtDwrheZQ6geVat7tNVM68QbPQQZm1CRl1l6l Y4ISDUE0Oi8e8zBW/PcuKqI33TayofpiZRwi9zPR+wcWJ0CNMO0p+82XCjaXWKVoLCcG P0m9VyO7WwmFPDExN5wo9Zkywpckb/NizMoYl/Y1S6i3OD/kq/dMFyBf0yRW04D/GG97 5W9at4d8tgmUGtV7LAQny8Hl7A704zcz/QnIbquS3/UrNABcFf6svGwZpVuu40Fb0cN9 xygg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=J1PJHeAg+5GsFDWdvzoAJvbqIGqngaUErLJYlufPa+k=; b=k3BBbubHVzLKoB7v5Q/JuL1GsGfU2orMFvHv19vurLU6R5suMA9sLrqrLkaE0tLjZm qKaV+DPmzTM5ctuEE/wQ5ORL8Nm1hGe7yjRP7NhnEo0Y/8wwTjvwvPzaEr+DKUP8g8la ZyZX65YkiPPBp6t5XTqrmYo/ey+MBJ/PPBpmRo9wlXUW2KGxZUlsdhxM/LO+nVnT0YHj FFx8whSh5t9+NEbxWjKcHNgjY57exCQrpgob1MvN9ORfuZID5HrGKI2NMQ40AdEuyEiG WK7apZ7/8kRXf4SbvmPWzmaq8aMSI8aYTSvMfUzJp+IS24ax20qgKwzTlzLLMS+rQ31S bvMg== X-Gm-Message-State: AOAM532sJmotdUfdDNksUz5vi7QMRk5jE4MFFxMlB46xDdRLDdE61fvn 9aX7+ZFVYR6xLZ3uBOv4oHZdRMGKrNBE 
X-Google-Smtp-Source: ABdhPJwO2zSkIt0Rgd4eSBL4j29VK3OAOmCOM+xoXLfQvXzLhs6JlCYQqS7jIhDe/nPgasCVydPPru7Y6T7c X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:106:0:b0:2d0:e682:8a7a with SMTP id 6-20020a810106000000b002d0e6828a7amr27939534ywb.257.1645593854910; Tue, 22 Feb 2022 21:24:14 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:49 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-14-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 13/47] asi: Added ASI memory cgroup flag From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 8D58E180004 X-Stat-Signature: dm6es3nssemqntxex6eop54q34p86fs3 Authentication-Results: imf06.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=E8ylcvEY; spf=pass (imf06.hostedemail.com: domain of 3_sQVYgcKCPUgrkXfapdlldib.Zljifkru-jjhsXZh.lod@flex--junaids.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3_sQVYgcKCPUgrkXfapdlldib.Zljifkru-jjhsXZh.lod@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-HE-Tag: 1645593855-543604 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse Adds a cgroup flag to control if ASI is enabled for processes in that cgroup. Can be set or cleared by writing to the memory.use_asi file in the memory cgroup. The flag only affects new processes created after the flag was set. In addition to the cgroup flag, we may also want to add a per-process flag, though it will have to be something that can be set at process creation time. Signed-off-by: Ofir Weisse Co-developed-by: Junaid Shahid Signed-off-by: Junaid Shahid --- arch/x86/mm/asi.c | 14 ++++++++++++++ include/linux/memcontrol.h | 3 +++ include/linux/mm_types.h | 17 +++++++++++++++++ mm/memcontrol.c | 30 ++++++++++++++++++++++++++++++ 4 files changed, 64 insertions(+) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 71348399baf1..ca50a32ecd7e 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -2,6 +2,7 @@ #include #include +#include #include #include @@ -322,7 +323,20 @@ EXPORT_SYMBOL_GPL(asi_exit); void asi_init_mm_state(struct mm_struct *mm) { + struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm); + memset(mm->asi, 0, sizeof(mm->asi)); + mm->asi_enabled = false; + + /* + * TODO: In addition to a cgroup flag, we may also want a per-process + * flag. 
+ */ + if (memcg) { + mm->asi_enabled = boot_cpu_has(X86_FEATURE_ASI) && + memcg->use_asi; + css_put(&memcg->css); + } } static bool is_page_within_range(size_t addr, size_t page_size, diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0c5c403f4be6..a883cb458b06 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -259,6 +259,9 @@ struct mem_cgroup { */ bool oom_group; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + bool use_asi; +#endif /* protected by memcg_oom_lock */ bool oom_lock; int under_oom; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 5b8028fcfe67..8624d2783661 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -607,6 +607,14 @@ struct mm_struct { * new_owner->alloc_lock is held */ struct task_struct __rcu *owner; + +#endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Is ASI enabled for this mm? ASI requires allocating extra + * resources, such as ASI page tables. To prevent allocationg + * these resources for every mm in the system, we expect that + * only VM mm's will have this flag set. */ + bool asi_enabled; #endif struct user_namespace *user_ns; @@ -665,6 +673,15 @@ struct mm_struct { extern struct mm_struct init_mm; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static inline bool mm_asi_enabled(struct mm_struct *mm) +{ + return mm->asi_enabled; +} +#else +static inline bool mm_asi_enabled(struct mm_struct *mm) { return false; } +#endif + /* Pointer magic because the dynamic array size confuses some compilers. */ static inline void mm_init_cpumask(struct mm_struct *mm) { diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 2ed5f2a0879d..a66d6b222ecf 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -3539,6 +3539,29 @@ static int mem_cgroup_hierarchy_write(struct cgroup_subsys_state *css, return -EINVAL; } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static u64 mem_cgroup_asi_read(struct cgroup_subsys_state *css, + struct cftype *cft) +{ + return mem_cgroup_from_css(css)->use_asi; +} + +static int mem_cgroup_asi_write(struct cgroup_subsys_state *css, + struct cftype *cft, u64 val) +{ + struct mem_cgroup *memcg = mem_cgroup_from_css(css); + + if (val == 1 || val == 0) + memcg->use_asi = val; + else + return -EINVAL; + + return 0; +} + +#endif + static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap) { unsigned long val; @@ -4888,6 +4911,13 @@ static struct cftype mem_cgroup_legacy_files[] = { .write_u64 = mem_cgroup_hierarchy_write, .read_u64 = mem_cgroup_hierarchy_read, }, +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + { + .name = "use_asi", + .write_u64 = mem_cgroup_asi_write, + .read_u64 = mem_cgroup_asi_read, + }, +#endif { .name = "cgroup.event_control", /* XXX: for compat */ .write = memcg_write_event_control, From patchwork Wed Feb 23 05:21:50 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756374 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 273FEC433F5 for ; Wed, 23 Feb 2022 05:24:19 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A1EB38D0010; Wed, 23 Feb 2022 00:24:18 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9A5AA8D0001; Wed, 23 Feb 2022 00:24:18 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) 
id 75DBA8D0010; Wed, 23 Feb 2022 00:24:18 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0087.hostedemail.com [216.40.44.87]) by kanga.kvack.org (Postfix) with ESMTP id 66D778D0001 for ; Wed, 23 Feb 2022 00:24:18 -0500 (EST) Received: from smtpin24.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 2D4C79F845 for ; Wed, 23 Feb 2022 05:24:18 +0000 (UTC) X-FDA: 79172903796.24.89F1009 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf16.hostedemail.com (Postfix) with ESMTP id CE1C5180002 for ; Wed, 23 Feb 2022 05:24:17 +0000 (UTC) Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-2d726bd83a2so91080807b3.20 for ; Tue, 22 Feb 2022 21:24:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ObwiCZnkzVE8MG/ZKfaaMbJx1EDS+iQex1h+i9r1lWc=; b=gZ9xDnCTT9J2il7FQZjHDQXU7P24qq79E3XEa26PGLeqDS7nHRklN55Z0YrbQllfvk a54Tnc7QS05X1TZviBvKI6s2J4e0JaxU2Mc4n2JddMKt6uhs1bA5ThP/H5jQeUIa+Q5L YzXtSBRLRL23nYFayWpilSuGrgyaEtzeVJtrSMs+fJOYlGhJTGeNFUiGfho3OKAJCQvL xqhDtH8EyJDw3FY1cTPmzwnJaSBjdeIgm+usm4Xbnvmu4zwwAyO/ya/YpXR5JHhjT9H0 Wpbgj5s5UiyqBd60tJZ8fxwBXyX5DCw1p9aGG11pd0ckG2SC7DJ9q4R1uxUVnHb2V4YR +hNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ObwiCZnkzVE8MG/ZKfaaMbJx1EDS+iQex1h+i9r1lWc=; b=jdH1pbUlGxXULP3cQ9Xzk53XOqiWGZsTIyFkU6sbnOddlK4zcBab0h86fG6zDAYQC1 YanR63aKSKqsbZwSeSTRZy4KayfR6rkhQG0se4UzMXJWTB6pbOLTc9dTmcE6geHGEKu9 y/A7Sg3uxhXIxfA5YnzqcCmevcT1f/m0nPX9J15x1U/cGbi7ea51Fu1YOUFcB5O9q85u 4BtpM63HUcQrFVFN8WmVWV9womZWtIKs4TSMv77kHJ2FLpgfsnDqKq+5iJMUqA3mLYhZ DUnr9kw4v5Fhve/310zSkp23tBFVVlF93Gj1WOr/nkii2MkltMzhoiOPTuhwtwaXKZNH ncig== X-Gm-Message-State: AOAM531F4ZW03+KJY8bfGbZLZK8Xe9o11fatfpGyoPBuINxVzKNam3Cf bprc7s/XhG7mfKW3hkGgBD+nIHt12FsN X-Google-Smtp-Source: ABdhPJwnMSRMCv1r9WCSdtorggC4aqKc4d1lpyRHc4R/HT3Ens8gLxTDNQT3s1kJiX6qfOat6YB18uabGISu X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:db0d:0:b0:2d0:e912:3e47 with SMTP id d13-20020a0ddb0d000000b002d0e9123e47mr27008531ywe.23.1645593857064; Tue, 22 Feb 2022 21:24:17 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:50 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-15-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 14/47] mm: asi: Disable ASI API when ASI is not enabled for a process From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: CE1C5180002 X-Stat-Signature: 8adpbzikntccbbjbrmco4y4ri6gtpciz Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=gZ9xDnCT; spf=pass (imf16.hostedemail.com: domain of 3AcUVYgcKCPgozsfnixlttlqj.htrqnsz2-rrp0fhp.twl@flex--junaids.bounces.google.com designates 209.85.128.202 as permitted sender) 
smtp.mailfrom=3AcUVYgcKCPgozsfnixlttlqj.htrqnsz2-rrp0fhp.twl@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1645593857-710029 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: If ASI is not enabled for a process, then asi_init() will return a NULL ASI pointer as output, though it will return a 0 error code. All other ASI API functions will return without an error when they get a NULL ASI pointer. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 2 +- arch/x86/mm/asi.c | 18 ++++++++++-------- include/asm-generic/asi.h | 7 ++++++- 3 files changed, 17 insertions(+), 10 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 64c2b4d1dba2..f69e1f2f09a4 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -51,7 +51,7 @@ int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops); void asi_unregister_class(int index); -int asi_init(struct mm_struct *mm, int asi_index); +int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi); void asi_destroy(struct asi *asi); void asi_enter(struct asi *asi); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index ca50a32ecd7e..58d1c532274a 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -207,11 +207,13 @@ static int __init asi_global_init(void) } subsys_initcall(asi_global_init) -int asi_init(struct mm_struct *mm, int asi_index) +int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) { struct asi *asi = &mm->asi[asi_index]; - if (!boot_cpu_has(X86_FEATURE_ASI)) + *out_asi = NULL; + + if (!boot_cpu_has(X86_FEATURE_ASI) || !mm->asi_enabled) return 0; /* Index 0 is reserved for special purposes. 
*/ @@ -238,13 +240,15 @@ int asi_init(struct mm_struct *mm, int asi_index) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); } + *out_asi = asi; + return 0; } EXPORT_SYMBOL_GPL(asi_init); void asi_destroy(struct asi *asi) { - if (!boot_cpu_has(X86_FEATURE_ASI)) + if (!boot_cpu_has(X86_FEATURE_ASI) || !asi) return; asi_free_pgd(asi); @@ -278,11 +282,9 @@ void __asi_enter(void) void asi_enter(struct asi *asi) { - if (!static_cpu_has(X86_FEATURE_ASI)) + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) return; - VM_WARN_ON_ONCE(!asi); - this_cpu_write(asi_cpu_state.target_asi, asi); barrier(); @@ -423,7 +425,7 @@ int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) size_t end = start + len; size_t page_size; - if (!static_cpu_has(X86_FEATURE_ASI)) + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) return 0; VM_BUG_ON(start & ~PAGE_MASK); @@ -514,7 +516,7 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) size_t end = start + len; pgtbl_mod_mask mask = 0; - if (!static_cpu_has(X86_FEATURE_ASI) || !len) + if (!static_cpu_has(X86_FEATURE_ASI) || !asi || !len) return; VM_BUG_ON(start & ~PAGE_MASK); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index f918cd052722..51c9c4a488e8 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -33,7 +33,12 @@ static inline void asi_unregister_class(int asi_index) { } static inline void asi_init_mm_state(struct mm_struct *mm) { } -static inline int asi_init(struct mm_struct *mm, int asi_index) { return 0; } +static inline +int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) +{ + *out_asi = NULL; + return 0; +} static inline void asi_destroy(struct asi *asi) { } From patchwork Wed Feb 23 05:21:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756375 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F99BC433F5 for ; Wed, 23 Feb 2022 05:24:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D77E18D0011; Wed, 23 Feb 2022 00:24:20 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id D267B8D0001; Wed, 23 Feb 2022 00:24:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BC7DB8D0011; Wed, 23 Feb 2022 00:24:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0101.hostedemail.com [216.40.44.101]) by kanga.kvack.org (Postfix) with ESMTP id A9D018D0001 for ; Wed, 23 Feb 2022 00:24:20 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 639988249980 for ; Wed, 23 Feb 2022 05:24:20 +0000 (UTC) X-FDA: 79172903880.23.EB5BA57 Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf11.hostedemail.com (Postfix) with ESMTP id DEAE640007 for ; Wed, 23 Feb 2022 05:24:19 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d7b96d74f8so26389627b3.16 for ; Tue, 22 Feb 2022 21:24:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=CYYrqCq/zRkZaGiiFDeL2WZqXbaiwLweAdvv3KPw30Q=; 
b=gttTspcTbeOxem9vIHTEp8HxBKLNL6kOKKunFMomQY7gQjAmmA7zLLKeW3pD5BHCG4 n+cavMcNG3z81q2MsF1OUm8LD7b7lNc/CiICe2jeX2JuESrjhWO3zBSCqlZo0v8ri/13 wSpPkKBkIUU4Lon1cqB6YDkf8UKIBOe6n1yOTYHkccD+ZHQbtAOffPifXTCZr0f1Etmj 7oxfXxlKJ56Y/Q570Q0HNRmXemUr5RxMgLdTEadmhWn3pzJBdmkD+yWBtTx1hkL9PZjC s4h9v+aGt/9IsWDNJnCTYuoYPrr3fqgrmkUg8lETXm3ikW63y2HAoRDcoo7ArQV+H7ZF BqDg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=CYYrqCq/zRkZaGiiFDeL2WZqXbaiwLweAdvv3KPw30Q=; b=d9TMGCTj9joAy4jL4lbOcsKPwdxkzYWWC5RHhODwTe0DHjM4ApxtfwJTOx2IUenANH xfAP9E+0H/oNuifa0cjT4wFL1d0LK/TfEiYmfez6KYEzkutw/Q2dw0WUBj19iQ4fB7Ia yVJEC0D55i6CnoqHVFMdfRcydSGCHh7M5GL63FiFxP1jGt1P0drCN5Ic1KPvLG58DBW/ lbISUp6t0WY3m6hAa5rIiY8fZF6wu96Yaa5s8Yw+WD2LCZae6qbqShXj/ZkVhGmm2VCO 2RqGlcurW8P8HlJ84MzJIV+YH1ZJZZcojdzP81gZY8F4qCGJ9frd2gPpP6KrSLq5YKud GNUg== X-Gm-Message-State: AOAM532C+oVdrc882N3a2HB9C0vbWUpdNFacS2NLFk7x/OMd7NtbJ6F3 bnb+UNX6DYN1PUN7JmZOJC9/n+3jSY5U X-Google-Smtp-Source: ABdhPJyHSwIU1ypDJg3UAIk4OCagkwsYrp4hfbAB5svLD0KoToAB7kgIvdsaPCw4R5P2Bj++u4Uv/BXKWrDE X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:9842:0:b0:2cb:86f2:560d with SMTP id p63-20020a819842000000b002cb86f2560dmr27884434ywg.375.1645593859219; Tue, 22 Feb 2022 21:24:19 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:51 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-16-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 15/47] kvm: asi: Restricted address space for VM execution From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: DEAE640007 X-Rspam-User: Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=gttTspcT; spf=pass (imf11.hostedemail.com: domain of 3A8UVYgcKCPolwpckfuiqqing.eqonkpwz-oomxcem.qti@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3A8UVYgcKCPolwpckfuiqqing.eqonkpwz-oomxcem.qti@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: tcefsfjtz9jfjdxn8tgdwmnj6bqctgut X-HE-Tag: 1645593859-126773 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: An ASI restricted address space is added for KVM. It is currently only enabled for Intel CPUs. The ASI hooks have been setup to do an L1D cache flush and MDS clear when entering the restricted address space. The hooks are also meant to stun and unstun the sibling hyperthread when exiting and entering the restricted address space. Internally, we do have a full stunning implementation available, but it hasn't yet been determined whether it is fully compatible with the upstream core scheduling implementation, so it is not included in this patch series and instead this patch just includes corresponding stub functions to demonstrate where the stun/unstun would happen. 
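Condensed from the diff below for illustration, the lifecycle this patch
sets up is roughly:

	/* Module init: register the ASI class and its hooks. */
	kvm_asi_index = asi_register_class("KVM",
					   ASI_MAP_STANDARD_NONSENSITIVE,
					   &kvm_asi_hooks);

	/* VM creation: set up the restricted address space for this mm. */
	ret = asi_init(kvm->mm, kvm_asi_index, &kvm->asi);

	/* Guest entry path (vmx_vcpu_enter_exit): flush sensitive CPU
	 * state, then switch to the restricted address space. */
	vmx_flush_sensitive_cpu_state(vcpu);
	asi_enter(vcpu->kvm->asi);
	/* ... VMLAUNCH/VMRESUME ... */
	asi_set_target_unrestricted();	/* actual asi_exit() happens lazily */

	/* VM destruction: */
	asi_destroy(kvm->asi);

The pre_asi_exit/post_asi_enter hooks are where the sibling hyperthread
would be stunned/unstunned (stubbed out here) and where the cache flushes
are repeated when the restricted address space is re-entered.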
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/vmx/vmx.c | 41 ++++++++++++----- arch/x86/kvm/x86.c | 81 ++++++++++++++++++++++++++++++++- include/linux/kvm_host.h | 2 + 4 files changed, 113 insertions(+), 13 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 555f4de47ef2..98cbd6447e3e 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1494,6 +1494,8 @@ struct kvm_x86_ops { int (*complete_emulated_msr)(struct kvm_vcpu *vcpu, int err); void (*vcpu_deliver_sipi_vector)(struct kvm_vcpu *vcpu, u8 vector); + + void (*flush_sensitive_cpu_state)(struct kvm_vcpu *vcpu); }; struct kvm_x86_nested_ops { diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 0dbf94eb954f..e0178b57be75 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -47,6 +47,7 @@ #include #include #include +#include #include "capabilities.h" #include "cpuid.h" @@ -300,7 +301,7 @@ static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf) else static_branch_disable(&vmx_l1d_should_flush); - if (l1tf == VMENTER_L1D_FLUSH_COND) + if (l1tf == VMENTER_L1D_FLUSH_COND && !boot_cpu_has(X86_FEATURE_ASI)) static_branch_enable(&vmx_l1d_flush_cond); else static_branch_disable(&vmx_l1d_flush_cond); @@ -6079,6 +6080,8 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu) if (static_branch_likely(&vmx_l1d_flush_cond)) { bool flush_l1d; + VM_BUG_ON(vcpu->kvm->asi); + /* * Clear the per-vcpu flush bit, it gets set again * either from vcpu_run() or from one of the unsafe @@ -6590,16 +6593,31 @@ static fastpath_t vmx_exit_handlers_fastpath(struct kvm_vcpu *vcpu) } } -static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, - struct vcpu_vmx *vmx) +static void vmx_flush_sensitive_cpu_state(struct kvm_vcpu *vcpu) { - kvm_guest_enter_irqoff(); - /* L1D Flush includes CPU buffer clear to mitigate MDS */ if (static_branch_unlikely(&vmx_l1d_should_flush)) vmx_l1d_flush(vcpu); else if (static_branch_unlikely(&mds_user_clear)) mds_clear_cpu_buffers(); +} + +static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, + struct vcpu_vmx *vmx) +{ + unsigned long cr3; + + kvm_guest_enter_irqoff(); + + vmx_flush_sensitive_cpu_state(vcpu); + + asi_enter(vcpu->kvm->asi); + + cr3 = __get_current_cr3_fast(); + if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) { + vmcs_writel(HOST_CR3, cr3); + vmx->loaded_vmcs->host_state.cr3 = cr3; + } if (vcpu->arch.cr2 != native_read_cr2()) native_write_cr2(vcpu->arch.cr2); @@ -6609,13 +6627,16 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, vcpu->arch.cr2 = native_read_cr2(); + VM_WARN_ON_ONCE(vcpu->kvm->asi && !is_asi_active()); + asi_set_target_unrestricted(); + kvm_guest_exit_irqoff(); } static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - unsigned long cr3, cr4; + unsigned long cr4; /* Record the guest's net vcpu time for enforced NMI injections. 
*/ if (unlikely(!enable_vnmi && @@ -6657,12 +6678,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu) if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP)) vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]); - cr3 = __get_current_cr3_fast(); - if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) { - vmcs_writel(HOST_CR3, cr3); - vmx->loaded_vmcs->host_state.cr3 = cr3; - } - cr4 = cr4_read_shadow(); if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) { vmcs_writel(HOST_CR4, cr4); @@ -7691,6 +7706,8 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = { .complete_emulated_msr = kvm_complete_insn_gp, .vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector, + + .flush_sensitive_cpu_state = vmx_flush_sensitive_cpu_state, }; static __init void vmx_setup_user_return_msrs(void) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index e50e97ac4408..dd07f677d084 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -81,6 +81,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include "trace.h" @@ -297,6 +298,8 @@ EXPORT_SYMBOL_GPL(supported_xcr0); static struct kmem_cache *x86_emulator_cache; +static int __read_mostly kvm_asi_index; + /* * When called, it means the previous get/set msr reached an invalid msr. * Return true if we want to ignore/silent this failed msr access. @@ -8620,6 +8623,50 @@ static struct notifier_block pvclock_gtod_notifier = { }; #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +/* + * We have an HT-stunning implementation available internally, + * but it is yet to be determined if it is fully compatible with the + * upstream core scheduling implementation. So leaving it out for now + * and just leaving these stubs here. + */ +static void stun_sibling(void) { } +static void unstun_sibling(void) { } + +/* + * This function must be fully re-entrant and idempotent. + * Though the idempotency requirement could potentially be relaxed for stuff + * like stats where complete accuracy is not needed. + */ +static void kvm_pre_asi_exit(void) +{ + stun_sibling(); +} + +/* + * This function must be fully re-entrant and idempotent. + * Though the idempotency requirement could potentially be relaxed for stuff + * like stats where complete accuracy is not needed. 
+ */ +static void kvm_post_asi_enter(void) +{ + struct kvm_vcpu *vcpu = raw_cpu_read(*kvm_get_running_vcpus()); + + kvm_x86_ops.flush_sensitive_cpu_state(vcpu); + + unstun_sibling(); +} + +#endif + +static const struct asi_hooks kvm_asi_hooks = { +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + .pre_asi_exit = kvm_pre_asi_exit, + .post_asi_enter = kvm_post_asi_enter +#endif +}; + int kvm_arch_init(void *opaque) { struct kvm_x86_init_ops *ops = opaque; @@ -8674,6 +8721,15 @@ int kvm_arch_init(void *opaque) if (r) goto out_free_percpu; + if (ops->runtime_ops->flush_sensitive_cpu_state) { + r = asi_register_class("KVM", ASI_MAP_STANDARD_NONSENSITIVE, + &kvm_asi_hooks); + if (r < 0) + goto out_mmu_exit; + + kvm_asi_index = r; + } + kvm_timer_init(); perf_register_guest_info_callbacks(&kvm_guest_cbs); @@ -8694,6 +8750,8 @@ int kvm_arch_init(void *opaque) return 0; +out_mmu_exit: + kvm_mmu_module_exit(); out_free_percpu: free_percpu(user_return_msrs); out_free_x86_emulator_cache: @@ -8720,6 +8778,11 @@ void kvm_arch_exit(void) irq_work_sync(&pvclock_irq_work); cancel_work_sync(&pvclock_gtod_work); #endif + if (kvm_asi_index > 0) { + asi_unregister_class(kvm_asi_index); + kvm_asi_index = 0; + } + kvm_x86_ops.hardware_enable = NULL; kvm_mmu_module_exit(); free_percpu(user_return_msrs); @@ -11391,11 +11454,26 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn); kvm_apicv_init(kvm); + + if (kvm_asi_index > 0) { + ret = asi_init(kvm->mm, kvm_asi_index, &kvm->asi); + if (ret) + goto error; + } + kvm_hv_init_vm(kvm); kvm_mmu_init_vm(kvm); kvm_xen_init_vm(kvm); - return static_call(kvm_x86_vm_init)(kvm); + ret = static_call(kvm_x86_vm_init)(kvm); + if (ret) + goto error; + + return 0; +error: + kvm_page_track_cleanup(kvm); + asi_destroy(kvm->asi); + return ret; } int kvm_arch_post_init_vm(struct kvm *kvm) @@ -11549,6 +11627,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm) kvm_page_track_cleanup(kvm); kvm_xen_destroy_vm(kvm); kvm_hv_destroy_vm(kvm); + asi_destroy(kvm->asi); } static void memslot_rmap_free(struct kvm_memory_slot *slot) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index c310648cc8f1..9dd63ed21f75 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -38,6 +38,7 @@ #include #include +#include #ifndef KVM_MAX_VCPU_IDS #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS @@ -551,6 +552,7 @@ struct kvm { */ struct mutex slots_arch_lock; struct mm_struct *mm; /* userspace tied to this vm */ + struct asi *asi; struct kvm_memslots __rcu *memslots[KVM_ADDRESS_SPACE_NUM]; struct kvm_vcpu *vcpus[KVM_MAX_VCPUS]; From patchwork Wed Feb 23 05:21:52 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756376 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BE122C433EF for ; Wed, 23 Feb 2022 05:24:23 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 49EE28D0012; Wed, 23 Feb 2022 00:24:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 44DA08D0001; Wed, 23 Feb 2022 00:24:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 2EE7F8D0012; Wed, 23 Feb 2022 00:24:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com 
(smtprelay0153.hostedemail.com [216.40.44.153]) by kanga.kvack.org (Postfix) with ESMTP id 195A58D0001 for ; Wed, 23 Feb 2022 00:24:23 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id C11A1181AEF3F for ; Wed, 23 Feb 2022 05:24:22 +0000 (UTC) X-FDA: 79172903964.21.A1188C2 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf25.hostedemail.com (Postfix) with ESMTP id 0D302A0003 for ; Wed, 23 Feb 2022 05:24:21 +0000 (UTC) Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-2d07ae11464so162256367b3.14 for ; Tue, 22 Feb 2022 21:24:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=aBDkmRlW2wO+VZF+yuNfjgMBKgOQnmUW3eUrwjZrCS8=; b=YOGSR3d5gsNbX6iUzwto0/6/twFuZLdwqQ+OyGBmDQ+Y0Z9tIxowj80ZJNXhvJxX+Y 39mdIpZnjCbsNOAMZ3WcDDj7mi8QZH0c/RsyMzdcAwnc6PykioINuQH19G0EC5hHlDik 23zXjrmfS9wbxjdu3O5ye1/Ud1PUIpPDZK7t9Nl7PN6MOX3LEYFZ3t9yeI4RY0EL+Zno WAZ3yPIe48Tjd2EqQRhcT9O3QomAL8NNi0J1Bk6EaY/8ic2RRlHQl+dYmmLeJ/OqUWOZ mOxvcqRrmNjCUbbMYTjvZgG5UsBId8RgCT1I73bDH8qdngvHAJMKuao3IUJPZHo/KDG+ YImA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=aBDkmRlW2wO+VZF+yuNfjgMBKgOQnmUW3eUrwjZrCS8=; b=73PgWN2YdOX9snTcttHN2h5Nr08NmH+8nVwJk3xGWRraEHmOmlJqGgzDCcpEuLFYZF G6P7Gsa76PlhuF7im4i5IR5yEgKBM9cs6I8iNcNYfZPgMT99V1K0ieF1cRfNhAjr3hwQ ANLwWW9IwUjusLhAkTBZLBuBvThgd9zLl7iQY7OADmKzR72dJRBQLhP34Umq28jezJpS 7wUXWMljyx8DNF2aZJwqLIjRYEpMbb38dZjI9kTj5XtwXr2mMGnolXh0nYi3qtO4MCrJ ALzig6xsVGO2meZjyqDkedblYQQ+6skP1ZVeokR8vCCH2VwDouCy+L6njZo0NbXkpA+G o5hw== X-Gm-Message-State: AOAM533kJMG/TPBlzOon3nA0TnG27DKxV0cUHty6j13/W8NwyiEcNudq Jw33b0Y1v/d9X6YkUM04GRFrhrGD0sGQ X-Google-Smtp-Source: ABdhPJxviHHfMBTme5NCgrk3enqRX6Pa+N3RjWDQxUKeUW0rsQhJHGhPnnpDhjrkSitIn82U+v9xIBog7t0o X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7489:0:b0:2d1:518:8c57 with SMTP id p131-20020a817489000000b002d105188c57mr27268814ywc.69.1645593861309; Tue, 22 Feb 2022 21:24:21 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:52 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-17-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 16/47] mm: asi: Support for mapping non-sensitive pcpu chunks From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 0D302A0003 X-Rspam-User: Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=YOGSR3d5; spf=pass (imf25.hostedemail.com: domain of 3BcUVYgcKCPwnyremhwksskpi.gsqpmry1-qqozego.svk@flex--junaids.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3BcUVYgcKCPwnyremhwksskpi.gsqpmry1-qqozego.svk@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: io7ak5fj8ntb77cs11z63hw86ceoauur 
X-HE-Tag: 1645593861-248311 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This adds support for mapping and unmapping dynamic percpu chunks as globally non-sensitive. A later patch will modify the percpu allocator to use this for dynamically allocating non-sensitive percpu memory. Signed-off-by: Junaid Shahid --- include/linux/vmalloc.h | 4 ++-- mm/percpu-vm.c | 51 +++++++++++++++++++++++++++++++++-------- mm/vmalloc.c | 17 ++++++++++---- security/Kconfig | 2 +- 4 files changed, 58 insertions(+), 16 deletions(-) diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index c7c66decda3e..5f85690f27b6 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -260,14 +260,14 @@ extern __init void vm_area_register_early(struct vm_struct *vm, size_t align); # ifdef CONFIG_MMU struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, const size_t *sizes, int nr_vms, - size_t align); + size_t align, ulong flags); void pcpu_free_vm_areas(struct vm_struct **vms, int nr_vms); # else static inline struct vm_struct ** pcpu_get_vm_areas(const unsigned long *offsets, const size_t *sizes, int nr_vms, - size_t align) + size_t align, ulong flags) { return NULL; } diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c index 2054c9213c43..5579a96ad782 100644 --- a/mm/percpu-vm.c +++ b/mm/percpu-vm.c @@ -153,8 +153,12 @@ static void __pcpu_unmap_pages(unsigned long addr, int nr_pages) static void pcpu_unmap_pages(struct pcpu_chunk *chunk, struct page **pages, int page_start, int page_end) { + struct vm_struct **vms = (struct vm_struct **)chunk->data; unsigned int cpu; int i; + ulong addr, nr_pages; + + nr_pages = page_end - page_start; for_each_possible_cpu(cpu) { for (i = page_start; i < page_end; i++) { @@ -164,8 +168,14 @@ static void pcpu_unmap_pages(struct pcpu_chunk *chunk, WARN_ON(!page); pages[pcpu_page_idx(cpu, i)] = page; } - __pcpu_unmap_pages(pcpu_chunk_addr(chunk, cpu, page_start), - page_end - page_start); + addr = pcpu_chunk_addr(chunk, cpu, page_start); + + /* TODO: We should batch the TLB flushes */ + if (vms[0]->flags & VM_GLOBAL_NONSENSITIVE) + asi_unmap(ASI_GLOBAL_NONSENSITIVE, (void *)addr, + nr_pages * PAGE_SIZE, true); + + __pcpu_unmap_pages(addr, nr_pages); } } @@ -212,18 +222,30 @@ static int __pcpu_map_pages(unsigned long addr, struct page **pages, * reverse lookup (addr -> chunk). 
*/ static int pcpu_map_pages(struct pcpu_chunk *chunk, - struct page **pages, int page_start, int page_end) + struct page **pages, int page_start, int page_end, + gfp_t gfp) { unsigned int cpu, tcpu; int i, err; + ulong addr, nr_pages; + + nr_pages = page_end - page_start; for_each_possible_cpu(cpu) { - err = __pcpu_map_pages(pcpu_chunk_addr(chunk, cpu, page_start), + addr = pcpu_chunk_addr(chunk, cpu, page_start); + err = __pcpu_map_pages(addr, &pages[pcpu_page_idx(cpu, page_start)], - page_end - page_start); + nr_pages); if (err < 0) goto err; + if (gfp & __GFP_GLOBAL_NONSENSITIVE) { + err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)addr, + nr_pages * PAGE_SIZE); + if (err) + goto err; + } + for (i = page_start; i < page_end; i++) pcpu_set_page_chunk(pages[pcpu_page_idx(cpu, i)], chunk); @@ -231,10 +253,15 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk, return 0; err: for_each_possible_cpu(tcpu) { + addr = pcpu_chunk_addr(chunk, tcpu, page_start); + + if (gfp & __GFP_GLOBAL_NONSENSITIVE) + asi_unmap(ASI_GLOBAL_NONSENSITIVE, (void *)addr, + nr_pages * PAGE_SIZE, false); + + __pcpu_unmap_pages(addr, nr_pages); if (tcpu == cpu) break; - __pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start), - page_end - page_start); } pcpu_post_unmap_tlb_flush(chunk, page_start, page_end); return err; @@ -285,7 +312,7 @@ static int pcpu_populate_chunk(struct pcpu_chunk *chunk, if (pcpu_alloc_pages(chunk, pages, page_start, page_end, gfp)) return -ENOMEM; - if (pcpu_map_pages(chunk, pages, page_start, page_end)) { + if (pcpu_map_pages(chunk, pages, page_start, page_end, gfp)) { pcpu_free_pages(chunk, pages, page_start, page_end); return -ENOMEM; } @@ -334,13 +361,19 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) { struct pcpu_chunk *chunk; struct vm_struct **vms; + ulong vm_flags = 0; + + if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE)) + vm_flags = VM_GLOBAL_NONSENSITIVE; + + gfp &= ~__GFP_GLOBAL_NONSENSITIVE; chunk = pcpu_alloc_chunk(gfp); if (!chunk) return NULL; vms = pcpu_get_vm_areas(pcpu_group_offsets, pcpu_group_sizes, - pcpu_nr_groups, pcpu_atom_size); + pcpu_nr_groups, pcpu_atom_size, vm_flags); if (!vms) { pcpu_free_chunk(chunk); return NULL; diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ba588a37ee75..f13bfe7e896b 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3664,10 +3664,10 @@ pvm_determine_end_from_reverse(struct vmap_area **va, unsigned long align) */ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, const size_t *sizes, int nr_vms, - size_t align) + size_t align, ulong flags) { - const unsigned long vmalloc_start = ALIGN(VMALLOC_START, align); - const unsigned long vmalloc_end = VMALLOC_END & ~(align - 1); + unsigned long vmalloc_start = VMALLOC_START; + unsigned long vmalloc_end = VMALLOC_END; struct vmap_area **vas, *va; struct vm_struct **vms; int area, area2, last_area, term_area; @@ -3677,6 +3677,15 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, /* verify parameters and allocate data structures */ BUG_ON(offset_in_page(align) || !is_power_of_2(align)); + + if (static_asi_enabled() && (flags & VM_GLOBAL_NONSENSITIVE)) { + vmalloc_start = VMALLOC_GLOBAL_NONSENSITIVE_START; + vmalloc_end = VMALLOC_GLOBAL_NONSENSITIVE_END; + } + + vmalloc_start = ALIGN(vmalloc_start, align); + vmalloc_end = vmalloc_end & ~(align - 1); + for (last_area = 0, area = 0; area < nr_vms; area++) { start = offsets[area]; end = start + sizes[area]; @@ -3815,7 +3824,7 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, 
for (area = 0; area < nr_vms; area++) { insert_vmap_area(vas[area], &vmap_area_root, &vmap_area_list); - setup_vmalloc_vm_locked(vms[area], vas[area], VM_ALLOC, + setup_vmalloc_vm_locked(vms[area], vas[area], flags | VM_ALLOC, pcpu_get_vm_areas); } spin_unlock(&vmap_area_lock); diff --git a/security/Kconfig b/security/Kconfig index 0a3e49d6a331..e89c2658e6cf 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -68,7 +68,7 @@ config PAGE_TABLE_ISOLATION config ADDRESS_SPACE_ISOLATION bool "Allow code to run with a reduced kernel address space" default n - depends on X86_64 && !UML && SLAB + depends on X86_64 && !UML && SLAB && !NEED_PER_CPU_KM depends on !PARAVIRT help This feature provides the ability to run some kernel code From patchwork Wed Feb 23 05:21:53 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756377 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E36A9C433FE for ; Wed, 23 Feb 2022 05:24:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 674028D0013; Wed, 23 Feb 2022 00:24:25 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6220A8D0001; Wed, 23 Feb 2022 00:24:25 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4EA9C8D0013; Wed, 23 Feb 2022 00:24:25 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0147.hostedemail.com [216.40.44.147]) by kanga.kvack.org (Postfix) with ESMTP id 388E08D0001 for ; Wed, 23 Feb 2022 00:24:25 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id E87EE8249980 for ; Wed, 23 Feb 2022 05:24:24 +0000 (UTC) X-FDA: 79172904048.30.74FC2BF Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf22.hostedemail.com (Postfix) with ESMTP id 6ED25C0006 for ; Wed, 23 Feb 2022 05:24:24 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id l3-20020a25ad43000000b0062462e2af34so11457045ybe.17 for ; Tue, 22 Feb 2022 21:24:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=orMBAJTu5bZfo+ZjtAIW/XRZfJpT6yrC1YB1ci4mdhQ=; b=i488SXIT1e9g9PD4iGqqQ5k0m2F8EhUxsfxeEhnLAG3wqUbodJtbO+eBa1paYhzivd f6qEUMrlJuQyIim27Unr9cJtD2qOlwrJ3Vb92Q7HTAPrjZK/4FZKPxvkHiWfCGgfeGks wPJV2pOqdvyTeI1kLDQCnafd3hNR4RZrBUmWr4GBRfp9dxcFXFHOnNlY5ByTNaX9Ovn7 medb8pTA3zAjMiK0tjiHROjMfziSZ3ZN7ikLLMgh9Do8+8uTWIEbihQ6o2v0V8CMkelT vayq0rCzDDSabPvpcm7/0+wlNG5uCJLTrXlA3oAlDMciITRnUAJzKr5xLZ2ttczqv0Td DGUA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=orMBAJTu5bZfo+ZjtAIW/XRZfJpT6yrC1YB1ci4mdhQ=; b=a+ofeDPOcslwkLYtqIqTn9G1ZR6y0+fXAGd4aI5PmsPh3Yijr/sFkNPmC/YQsCJ6ZC geB6wBXA0CiOYT1UDaktT2VL3JwoGhm3bHc6qvg1hoHw8R35jru88DMP8sqtBQ04bTJo vHLlofamGpCkQQ2JPOJ/T/nFf2sb+jPi2dMZmzt8maPFfjzpQCk5sOP9HJhnl14IMkHZ v5yAfv82QkK0l0G6zJ09pXG1uKNsNaM8HT/0rJiOBOMSZYeVfYLKBxgtJ6En1QruV2fe NPQfVqCM3VODhGubkqDSgho9ukGQPW6sK/gT+4BT0jXYfLAlH9dDaKErnn7St3sgl3cG 2PAA== X-Gm-Message-State: 
AOAM531wGrcsHBrANIRvPgT+SbBSIij8THi3F6ZwP8JDZeFgJQE6T+p1 DcuBlvkQyU98862tQF4odxog6T8Ag0i3 X-Google-Smtp-Source: ABdhPJz6tUJjJr3TTCqxc+szuwfGgWLkVUJz81MU2cNZP+P/jcoAt0UrWZKKXpvg67XLb/VTmlwpxyNA5Syq X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:a4e8:0:b0:61e:1eb6:19bd with SMTP id g95-20020a25a4e8000000b0061e1eb619bdmr27268416ybi.168.1645593863676; Tue, 22 Feb 2022 21:24:23 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:53 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-18-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 17/47] mm: asi: Aliased direct map for local non-sensitive allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: 6ED25C0006 X-Stat-Signature: xr84pitoukgym643ftju596mzofpc8en X-Rspam-User: Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=i488SXIT; spf=pass (imf22.hostedemail.com: domain of 3B8UVYgcKCP4p0tgojymuumrk.iusrot03-ssq1giq.uxm@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3B8UVYgcKCP4p0tgojymuumrk.iusrot03-ssq1giq.uxm@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam05 X-HE-Tag: 1645593864-499911 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This creates a second copy of the direct map, which mirrors the normal direct map in the regular unrestricted kernel page tables. But in the ASI restricted address spaces, the page tables for this aliased direct map would be local to each process. So this aliased map can be used for locally non-sensitive page allocations. Because of the lack of available kernel virtual address space, we have to reduce the max possible direct map size by half. That should be fine with 5 level page tables but could be an issue with 4 level page tables (as max 32 TB RAM could be supported instead of 64 TB). An alternative vmap-style implementation of an aliased local region is possible without this limitation, but that has some other compromises and would be usable only if we trim down the types of structures marked as local non-sensitive by limiting the designation to only those that really are locally non-sensitive but globally sensitive. That is certainly ideal and likely feasible, and would also allow removal of some other relatively complex infrastructure introduced in later patches. But we are including this implementation here just for demonstration of a fully general mechanism. An altogether different alternative to a separate aliased region is also possible by just partitioning the regular direct map (either statically or dynamically via additional page-block types), which is certainly feasible but would require more effort to implement properly. 
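
To make the new translation rule concrete, here is a minimal sketch (illustration only; it simply restates the asi_va() / PageLocalNonSensitive() logic introduced by this patch, and the example_ function name is invented):

static void *example_lookup(unsigned long pa)
{
	struct page *page = pfn_to_page(PHYS_PFN(pa));

	/* Locally non-sensitive pages are reached through the ASI Local Map. */
	if (PageLocalNonSensitive(page))
		return (void *)(pa + ASI_LOCAL_MAP);

	/* All other pages keep their address in the regular direct map. */
	return (void *)(pa + PAGE_OFFSET);
}

In the unrestricted kernel page tables both aliases map the same physical memory; only in the restricted ASI page tables does the aliased region become per-process.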
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/page.h | 19 +++++++- arch/x86/include/asm/page_64.h | 25 +++++++++- arch/x86/include/asm/page_64_types.h | 20 ++++++++ arch/x86/kernel/e820.c | 7 ++- arch/x86/mm/asi.c | 69 +++++++++++++++++++++++++++- arch/x86/mm/kaslr.c | 34 +++++++++++++- arch/x86/mm/mm_internal.h | 2 + arch/x86/mm/physaddr.c | 8 ++++ include/linux/page-flags.h | 3 ++ include/trace/events/mmflags.h | 3 +- security/Kconfig | 1 + 11 files changed, 183 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/page.h b/arch/x86/include/asm/page.h index 4d5810c8fab7..7688ba9d3542 100644 --- a/arch/x86/include/asm/page.h +++ b/arch/x86/include/asm/page.h @@ -18,6 +18,7 @@ struct page; +#include #include extern struct range pfn_mapped[]; extern int nr_pfn_mapped; @@ -56,8 +57,24 @@ static inline void copy_user_page(void *to, void *from, unsigned long vaddr, __phys_addr_symbol(__phys_reloc_hide((unsigned long)(x))) #ifndef __va -#define __va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET)) + +#define ___va(x) ((void *)((unsigned long)(x)+PAGE_OFFSET)) + +#ifndef CONFIG_ADDRESS_SPACE_ISOLATION +#define __va(x) ___va(x) +#else + +DECLARE_STATIC_KEY_FALSE(asi_local_map_initialized); +void *asi_va(unsigned long pa); + +/* + * This might significantly increase the size of the jump table. + * If that turns out to be a problem, we should use a non-static branch. + */ +#define __va(x) (static_branch_likely(&asi_local_map_initialized) \ + ? asi_va((unsigned long)(x)) : ___va(x)) #endif +#endif /* __va */ #define __boot_va(x) __va(x) #define __boot_pa(x) __pa(x) diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h index 4bde0dc66100..2845eca02552 100644 --- a/arch/x86/include/asm/page_64.h +++ b/arch/x86/include/asm/page_64.h @@ -5,6 +5,7 @@ #include #ifndef __ASSEMBLY__ +#include #include /* duplicated to the one in bootmem.h */ @@ -15,12 +16,34 @@ extern unsigned long page_offset_base; extern unsigned long vmalloc_base; extern unsigned long vmemmap_base; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +extern unsigned long asi_local_map_base; +DECLARE_STATIC_KEY_FALSE(asi_local_map_initialized); + +#else + +/* Should never be used if ASI is not enabled */ +#define asi_local_map_base (*(ulong *)NULL) + +#endif + static inline unsigned long __phys_addr_nodebug(unsigned long x) { unsigned long y = x - __START_KERNEL_map; + unsigned long map_start = PAGE_OFFSET; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* + * This might significantly increase the size of the jump table. + * If that turns out to be a problem, we should use a non-static branch. + */ + if (static_branch_likely(&asi_local_map_initialized) && + x > ASI_LOCAL_MAP) + map_start = ASI_LOCAL_MAP; +#endif /* use the carry flag to determine if x was < __START_KERNEL_map */ - x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET)); + x = y + ((x > y) ? 
phys_base : (__START_KERNEL_map - map_start)); return x; } diff --git a/arch/x86/include/asm/page_64_types.h b/arch/x86/include/asm/page_64_types.h index e9e2c3ba5923..bd27ebe51a8c 100644 --- a/arch/x86/include/asm/page_64_types.h +++ b/arch/x86/include/asm/page_64_types.h @@ -2,6 +2,8 @@ #ifndef _ASM_X86_PAGE_64_DEFS_H #define _ASM_X86_PAGE_64_DEFS_H +#include + #ifndef __ASSEMBLY__ #include #endif @@ -47,6 +49,24 @@ #define __PAGE_OFFSET __PAGE_OFFSET_BASE_L4 #endif /* CONFIG_DYNAMIC_MEMORY_LAYOUT */ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +#define __ASI_LOCAL_MAP_BASE (__PAGE_OFFSET + \ + ALIGN(_BITUL(MAX_PHYSMEM_BITS - 1), PGDIR_SIZE)) + +#ifdef CONFIG_DYNAMIC_MEMORY_LAYOUT +#define ASI_LOCAL_MAP asi_local_map_base +#else +#define ASI_LOCAL_MAP __ASI_LOCAL_MAP_BASE +#endif + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +/* Should never be used if ASI is not enabled */ +#define ASI_LOCAL_MAP (*(ulong *)NULL) + +#endif + #define __START_KERNEL_map _AC(0xffffffff80000000, UL) /* See Documentation/x86/x86_64/mm.rst for a description of the memory map. */ diff --git a/arch/x86/kernel/e820.c b/arch/x86/kernel/e820.c index bc0657f0deed..e2ea4d6bfbdf 100644 --- a/arch/x86/kernel/e820.c +++ b/arch/x86/kernel/e820.c @@ -880,6 +880,11 @@ static void __init early_panic(char *msg) static int userdef __initdata; +u64 __init set_phys_mem_limit(u64 size) +{ + return e820__range_remove(size, ULLONG_MAX - size, E820_TYPE_RAM, 1); +} + /* The "mem=nopentium" boot option disables 4MB page tables on 32-bit kernels: */ static int __init parse_memopt(char *p) { @@ -905,7 +910,7 @@ static int __init parse_memopt(char *p) if (mem_size == 0) return -EINVAL; - e820__range_remove(mem_size, ULLONG_MAX - mem_size, E820_TYPE_RAM, 1); + set_phys_mem_limit(mem_size); #ifdef CONFIG_MEMORY_HOTPLUG max_mem_size = mem_size; diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 58d1c532274a..38eaa650bac1 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -22,6 +22,12 @@ EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); __aligned(PAGE_SIZE) pgd_t asi_global_nonsensitive_pgd[PTRS_PER_PGD]; +DEFINE_STATIC_KEY_FALSE(asi_local_map_initialized); +EXPORT_SYMBOL(asi_local_map_initialized); + +unsigned long asi_local_map_base __ro_after_init; +EXPORT_SYMBOL(asi_local_map_base); + int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops) { @@ -181,8 +187,44 @@ static void asi_free_pgd(struct asi *asi) static int __init set_asi_param(char *str) { - if (strcmp(str, "on") == 0) + if (strcmp(str, "on") == 0) { + /* TODO: We should eventually add support for KASAN. */ + if (IS_ENABLED(CONFIG_KASAN)) { + pr_warn("ASI is currently not supported with KASAN"); + return 0; + } + + /* + * We create a second copy of the direct map for the aliased + * ASI Local Map, so we can support only half of the max + * amount of RAM. That should be fine with 5 level page tables + * but could be an issue with 4 level page tables. + * + * An alternative vmap-style implementation of an aliased local + * region is possible without this limitation, but that has + * some other compromises and would be usable only if + * we trim down the types of structures marked as local + * non-sensitive by limiting the designation to only those that + * really are locally non-sensitive but globally sensitive. + * That is certainly ideal and likely feasible, and would also + * allow removal of some other relatively complex infrastructure + * introduced in later patches. 
But we are including this + * implementation here just for demonstration of a fully general + * mechanism. + * + * An altogether different alternative to a separate aliased + * region is also possible by just partitioning the regular + * direct map (either statically or dynamically via additional + * page-block types), which is certainly feasible but would + * require more effort to implement properly. + */ + if (set_phys_mem_limit(MAXMEM / 2)) + pr_warn("Limiting Memory Size to %llu", MAXMEM / 2); + + asi_local_map_base = __ASI_LOCAL_MAP_BASE; + setup_force_cpu_cap(X86_FEATURE_ASI); + } return 0; } @@ -190,6 +232,8 @@ early_param("asi", set_asi_param); static int __init asi_global_init(void) { + uint i, n; + if (!boot_cpu_has(X86_FEATURE_ASI)) return 0; @@ -203,6 +247,14 @@ static int __init asi_global_init(void) VMALLOC_GLOBAL_NONSENSITIVE_END, "ASI Global Non-sensitive vmalloc"); + /* TODO: We should also handle memory hotplug. */ + n = DIV_ROUND_UP(PFN_PHYS(max_pfn), PGDIR_SIZE); + for (i = 0; i < n; i++) + swapper_pg_dir[pgd_index(ASI_LOCAL_MAP) + i] = + swapper_pg_dir[pgd_index(PAGE_OFFSET) + i]; + + static_branch_enable(&asi_local_map_initialized); + return 0; } subsys_initcall(asi_global_init) @@ -236,7 +288,11 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) if (asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) { uint i; - for (i = KERNEL_PGD_BOUNDARY; i < PTRS_PER_PGD; i++) + for (i = KERNEL_PGD_BOUNDARY; i < pgd_index(ASI_LOCAL_MAP); i++) + set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); + + for (i = pgd_index(VMALLOC_GLOBAL_NONSENSITIVE_START); + i < PTRS_PER_PGD; i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); } @@ -534,3 +590,12 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) /* Later patches will do a more optimized flush. */ flush_tlb_kernel_range((ulong)addr, (ulong)addr + len); } + +void *asi_va(unsigned long pa) +{ + struct page *page = pfn_to_page(PHYS_PFN(pa)); + + return (void *)(pa + (PageLocalNonSensitive(page) + ? ASI_LOCAL_MAP : PAGE_OFFSET)); +} +EXPORT_SYMBOL(asi_va); diff --git a/arch/x86/mm/kaslr.c b/arch/x86/mm/kaslr.c index 557f0fe25dff..2e68ce84767c 100644 --- a/arch/x86/mm/kaslr.c +++ b/arch/x86/mm/kaslr.c @@ -48,6 +48,7 @@ static const unsigned long vaddr_end = CPU_ENTRY_AREA_BASE; static __initdata struct kaslr_memory_region { unsigned long *base; unsigned long size_tb; + unsigned long extra_bytes; } kaslr_regions[] = { { &page_offset_base, 0 }, { &vmalloc_base, 0 }, @@ -57,7 +58,7 @@ static __initdata struct kaslr_memory_region { /* Get size in bytes used by the memory region */ static inline unsigned long get_padding(struct kaslr_memory_region *region) { - return (region->size_tb << TB_SHIFT); + return (region->size_tb << TB_SHIFT) + region->extra_bytes; } /* Initialize base and padding for each memory region randomized with KASLR */ @@ -69,6 +70,8 @@ void __init kernel_randomize_memory(void) struct rnd_state rand_state; unsigned long remain_entropy; unsigned long vmemmap_size; + unsigned int max_physmem_bits = MAX_PHYSMEM_BITS - + !!boot_cpu_has(X86_FEATURE_ASI); vaddr_start = pgtable_l5_enabled() ? 
__PAGE_OFFSET_BASE_L5 : __PAGE_OFFSET_BASE_L4; vaddr = vaddr_start; @@ -85,7 +88,7 @@ void __init kernel_randomize_memory(void) if (!kaslr_memory_enabled()) return; - kaslr_regions[0].size_tb = 1 << (MAX_PHYSMEM_BITS - TB_SHIFT); + kaslr_regions[0].size_tb = 1 << (max_physmem_bits - TB_SHIFT); kaslr_regions[1].size_tb = VMALLOC_SIZE_TB; /* @@ -100,6 +103,18 @@ void __init kernel_randomize_memory(void) if (memory_tb < kaslr_regions[0].size_tb) kaslr_regions[0].size_tb = memory_tb; + if (boot_cpu_has(X86_FEATURE_ASI)) { + ulong direct_map_size = kaslr_regions[0].size_tb << TB_SHIFT; + + /* Reserve additional space for the ASI Local Map */ + direct_map_size = round_up(direct_map_size, PGDIR_SIZE); + direct_map_size *= 2; + VM_BUG_ON(direct_map_size % (1UL << TB_SHIFT)); + + kaslr_regions[0].size_tb = direct_map_size >> TB_SHIFT; + kaslr_regions[0].extra_bytes = PGDIR_SIZE; + } + /* * Calculate the vmemmap region size in TBs, aligned to a TB * boundary. @@ -136,6 +151,21 @@ void __init kernel_randomize_memory(void) vaddr = round_up(vaddr + 1, PUD_SIZE); remain_entropy -= entropy; } + + /* + * This ensures that the ASI Local Map does not share a PGD entry with + * the regular direct map, and also that the alignment of the two + * regions is the same. + * + * We are relying on the fact that the region following the ASI Local + * Map will be the local non-sensitive portion of the VMALLOC region. + * If that were not the case and the next region was a global one, + * then we would need extra padding after the ASI Local Map to ensure + * that it doesn't share a PGD entry with that global region. + */ + if (cpu_feature_enabled(X86_FEATURE_ASI)) + asi_local_map_base = page_offset_base + PGDIR_SIZE + + ((kaslr_regions[0].size_tb / 2) << TB_SHIFT); } void __meminit init_trampoline_kaslr(void) diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h index a1e8c523ab08..ace1d0b6d2d9 100644 --- a/arch/x86/mm/mm_internal.h +++ b/arch/x86/mm/mm_internal.h @@ -28,4 +28,6 @@ void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache); extern unsigned long tlb_single_page_flush_ceiling; +u64 set_phys_mem_limit(u64 size); + #endif /* __X86_MM_INTERNAL_H */ diff --git a/arch/x86/mm/physaddr.c b/arch/x86/mm/physaddr.c index fc3f3d3e2ef2..2cd6cee942da 100644 --- a/arch/x86/mm/physaddr.c +++ b/arch/x86/mm/physaddr.c @@ -21,6 +21,9 @@ unsigned long __phys_addr(unsigned long x) x = y + phys_base; VIRTUAL_BUG_ON(y >= KERNEL_IMAGE_SIZE); + } else if (cpu_feature_enabled(X86_FEATURE_ASI) && x > ASI_LOCAL_MAP) { + x -= ASI_LOCAL_MAP; + VIRTUAL_BUG_ON(!phys_addr_valid(x)); } else { x = y + (__START_KERNEL_map - PAGE_OFFSET); @@ -28,6 +31,7 @@ unsigned long __phys_addr(unsigned long x) VIRTUAL_BUG_ON((x > y) || !phys_addr_valid(x)); } + VIRTUAL_BUG_ON(!pfn_valid(x >> PAGE_SHIFT)); return x; } EXPORT_SYMBOL(__phys_addr); @@ -54,6 +58,10 @@ bool __virt_addr_valid(unsigned long x) if (y >= KERNEL_IMAGE_SIZE) return false; + } else if (cpu_feature_enabled(X86_FEATURE_ASI) && x > ASI_LOCAL_MAP) { + x -= ASI_LOCAL_MAP; + if (!phys_addr_valid(x)) + return false; } else { x = y + (__START_KERNEL_map - PAGE_OFFSET); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index a07434cc679c..e5223a05c41a 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -143,6 +143,7 @@ enum pageflags { #endif #ifdef CONFIG_ADDRESS_SPACE_ISOLATION PG_global_nonsensitive, + PG_local_nonsensitive, #endif __NR_PAGEFLAGS, @@ -547,8 +548,10 @@ PAGEFLAG(Idle, idle, PF_ANY) #ifdef 
CONFIG_ADDRESS_SPACE_ISOLATION __PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY); +__PAGEFLAG(LocalNonSensitive, local_nonsensitive, PF_ANY); #else __PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive); +__PAGEFLAG_FALSE(LocalNonSensitive, local_nonsensitive); #endif #ifdef CONFIG_KASAN_HW_TAGS diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 73a49197ef54..96e61d838bec 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -129,7 +129,8 @@ IF_HAVE_PG_IDLE(PG_young, "young" ) \ IF_HAVE_PG_IDLE(PG_idle, "idle" ) \ IF_HAVE_PG_ARCH_2(PG_arch_2, "arch_2" ) \ IF_HAVE_PG_SKIP_KASAN_POISON(PG_skip_kasan_poison, "skip_kasan_poison") \ -IF_HAVE_ASI(PG_global_nonsensitive, "global_nonsensitive") +IF_HAVE_ASI(PG_global_nonsensitive, "global_nonsensitive") \ +IF_HAVE_ASI(PG_local_nonsensitive, "local_nonsensitive") #define show_page_flags(flags) \ (flags) ? __print_flags(flags, "|", \ diff --git a/security/Kconfig b/security/Kconfig index e89c2658e6cf..070a948b5266 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -70,6 +70,7 @@ config ADDRESS_SPACE_ISOLATION default n depends on X86_64 && !UML && SLAB && !NEED_PER_CPU_KM depends on !PARAVIRT + depends on !MEMORY_HOTPLUG help This feature provides the ability to run some kernel code with a reduced kernel address space. This can be used to From patchwork Wed Feb 23 05:21:54 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756378 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 52BF4C433EF for ; Wed, 23 Feb 2022 05:24:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7364B8D0014; Wed, 23 Feb 2022 00:24:27 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 6E7478D0001; Wed, 23 Feb 2022 00:24:27 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 560E78D0014; Wed, 23 Feb 2022 00:24:27 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0098.hostedemail.com [216.40.44.98]) by kanga.kvack.org (Postfix) with ESMTP id 429698D0001 for ; Wed, 23 Feb 2022 00:24:27 -0500 (EST) Received: from smtpin27.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id F406618194D5B for ; Wed, 23 Feb 2022 05:24:26 +0000 (UTC) X-FDA: 79172904174.27.D270CF4 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf11.hostedemail.com (Postfix) with ESMTP id 7F2D040004 for ; Wed, 23 Feb 2022 05:24:26 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id j17-20020a25ec11000000b0061dabf74012so26621973ybh.15 for ; Tue, 22 Feb 2022 21:24:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=1WKa+KZIcaa+dJcFTsqZOSts6oD//SyJ1kbRYfoMX2c=; b=sZHRLO9kxK5TzWD0X0seDuzrMm3IaK4jRNNmTtrhccA0nwFV9CPHeE86cjQT27AmkI v7sdoVjafaJDxmnspYK1IJDte5ixXIP4NlIpzJcvsi6oStzTdw2xXhTTK6hSh2PE8S4p mF1cpkFTM3aKbNKaOvkN2Z8/pjjAmAg7yV5uvedIm/mcA50u4mXY9mOc7gt5zMPS72Jb jLpByILet1L0UsVaxDROFUGR6gYkZbc/Z5Ipq4VnAgrCI+T/jtIuqYuHmNGwLB/tOVe/ jw6HvYSt2X5Np3Q4sH2T+yBRKnSfg6rhn+2SA7NrFtkru9npVZHpssAQN1BJYkVWWTVb OPZg== 
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=1WKa+KZIcaa+dJcFTsqZOSts6oD//SyJ1kbRYfoMX2c=; b=5EBqWSMW31bXoIgUHXo1pYv8HyBnPZrf4voQ0f+7qL6oyU9X3MQ/kdCC5RvTARqYAf hueqel7D2UjNDLioYiQhV6genFwKucZq9YU/ospuBlEov4wKiftM4SjHK6a2eJ0ygYDi tvPVI8r+Wxo/rZvhpwHc9bjQOwtdeezEgypOR9w5erRoFgHninGg07i7OHOZ9EE8thb2 Xt+iRRq+42i72YKjZQ3zwmNjlL7IKIuA7uj/+PtzwIs9FGdbmZl2GwpxCvFVeatX1PVp OEqPE0klf/gIqHfaAJ4hYhqmht6bo2Nd9nYSTkErAeaC6J0HQ5owqtT/zZ8iWnnZHjjw +D4w== X-Gm-Message-State: AOAM533LsyooxnvDlnLH4+ORcIzLO5A8d0czucnhmMC8jwRGYTWZxeAW WXEdr1OSQ2jH+wDwaibqKvHPNFIy6ItQ X-Google-Smtp-Source: ABdhPJzkV7QmjkoELH2kQlDpniWdBZ4HcBwkQjVMUj/MiMOJhgvGDpbnFOCshsSuGHfkjD35goEqd0h2ttTQ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7848:0:b0:2ca:287c:6ce3 with SMTP id t69-20020a817848000000b002ca287c6ce3mr26938064ywc.392.1645593865848; Tue, 22 Feb 2022 21:24:25 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:54 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-19-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 18/47] mm: asi: Support for pre-ASI-init local non-sensitive allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 7F2D040004 X-Stat-Signature: go7uusb9brwji4j8eoppoxasxsm9a9q1 Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=sZHRLO9k; spf=pass (imf11.hostedemail.com: domain of 3CcUVYgcKCAIlwpckfuiqqing.eqonkpwz-oomxcem.qti@flex--junaids.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3CcUVYgcKCAIlwpckfuiqqing.eqonkpwz-oomxcem.qti@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-HE-Tag: 1645593866-540314 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Local non-sensitive allocations can be made before an actual ASI instance is initialized. To support this, a process-wide pseudo-PGD is created, which contains mappings for all locally non-sensitive allocations. Memory can be mapped into this pseudo-PGD by using ASI_LOCAL_NONSENSITIVE when calling asi_map(). The mappings will be copied to an actual ASI PGD when an ASI instance is initialized in that process, by copying all the PGD entries in the local non-sensitive range from the pseudo-PGD to the ASI PGD. In addition, the page fault handler will copy any new PGD entries that get added after the initialization of the ASI instance. 
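
As a rough usage sketch (a hypothetical caller; the example_ function name is invented, while asi_map(), asi_sync_mapping() and ASI_LOCAL_NONSENSITIVE are the interfaces added in this series), a mapping that must be visible before asi_enter(), such as a thread stack, would be handled like this:

static int example_map_early_buffer(struct asi *asi, void *buf, size_t len)
{
	int err;

	/* Populates the process-wide pseudo-PGD; works before asi_init(). */
	err = asi_map(ASI_LOCAL_NONSENSITIVE, buf, len);
	if (err)
		return err;

	/*
	 * Propagate the PGD entries to an already-initialized ASI instance.
	 * Mappings that are only touched after asi_enter() can instead rely
	 * on the lazy clone done by asi_do_lazy_map() from the page fault
	 * handler.
	 */
	asi_sync_mapping(asi, buf, len);
	return 0;
}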
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 6 +++- arch/x86/mm/asi.c | 74 +++++++++++++++++++++++++++++++++++++- arch/x86/mm/fault.c | 7 ++++ include/asm-generic/asi.h | 12 ++++++- kernel/fork.c | 8 +++-- 5 files changed, 102 insertions(+), 5 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index f69e1f2f09a4..f11010c0334b 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -16,6 +16,7 @@ #define ASI_MAX_NUM (1 << ASI_MAX_NUM_ORDER) #define ASI_GLOBAL_NONSENSITIVE (&init_mm.asi[0]) +#define ASI_LOCAL_NONSENSITIVE (¤t->mm->asi[0]) struct asi_state { struct asi *curr_asi; @@ -45,7 +46,8 @@ DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); extern pgd_t asi_global_nonsensitive_pgd[]; -void asi_init_mm_state(struct mm_struct *mm); +int asi_init_mm_state(struct mm_struct *mm); +void asi_free_mm_state(struct mm_struct *mm); int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops); @@ -61,6 +63,8 @@ int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags); int asi_map(struct asi *asi, void *addr, size_t len); void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb); void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len); +void asi_sync_mapping(struct asi *asi, void *addr, size_t len); +void asi_do_lazy_map(struct asi *asi, size_t addr); static inline void asi_init_thread_state(struct thread_struct *thread) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 38eaa650bac1..3ba0971a318d 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -73,6 +73,17 @@ void asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); +static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr) +{ + pgd_t *src = pgd_offset_pgd(src_table, addr); + pgd_t *dst = pgd_offset_pgd(dst_table, addr); + + if (!pgd_val(*dst)) + set_pgd(dst, *src); + else + VM_BUG_ON(pgd_val(*dst) != pgd_val(*src)); +} + #ifndef mm_inc_nr_p4ds #define mm_inc_nr_p4ds(mm) do {} while (false) #endif @@ -291,6 +302,11 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) for (i = KERNEL_PGD_BOUNDARY; i < pgd_index(ASI_LOCAL_MAP); i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); + for (i = pgd_index(ASI_LOCAL_MAP); + i <= pgd_index(ASI_LOCAL_MAP + PFN_PHYS(max_possible_pfn)); + i++) + set_pgd(asi->pgd + i, mm->asi[0].pgd[i]); + for (i = pgd_index(VMALLOC_GLOBAL_NONSENSITIVE_START); i < PTRS_PER_PGD; i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); @@ -379,7 +395,7 @@ void asi_exit(void) } EXPORT_SYMBOL_GPL(asi_exit); -void asi_init_mm_state(struct mm_struct *mm) +int asi_init_mm_state(struct mm_struct *mm) { struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm); @@ -395,6 +411,28 @@ void asi_init_mm_state(struct mm_struct *mm) memcg->use_asi; css_put(&memcg->css); } + + if (!mm->asi_enabled) + return 0; + + mm->asi[0].mm = mm; + mm->asi[0].pgd = (pgd_t *)__get_free_page(GFP_PGTABLE_USER); + if (!mm->asi[0].pgd) + return -ENOMEM; + + return 0; +} + +void asi_free_mm_state(struct mm_struct *mm) +{ + if (!boot_cpu_has(X86_FEATURE_ASI) || !mm->asi_enabled) + return; + + asi_free_pgd_range(&mm->asi[0], pgd_index(ASI_LOCAL_MAP), + pgd_index(ASI_LOCAL_MAP + + PFN_PHYS(max_possible_pfn)) + 1); + + free_page((ulong)mm->asi[0].pgd); } static bool is_page_within_range(size_t addr, size_t page_size, @@ -599,3 +637,37 @@ void *asi_va(unsigned long pa) ? 
ASI_LOCAL_MAP : PAGE_OFFSET)); } EXPORT_SYMBOL(asi_va); + +static bool is_addr_in_local_nonsensitive_range(size_t addr) +{ + return addr >= ASI_LOCAL_MAP && + addr < VMALLOC_GLOBAL_NONSENSITIVE_START; +} + +void asi_do_lazy_map(struct asi *asi, size_t addr) +{ + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return; + + if ((asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) && + is_addr_in_local_nonsensitive_range(addr)) + asi_clone_pgd(asi->pgd, asi->mm->asi[0].pgd, addr); +} + +/* + * Should be called after asi_map(ASI_LOCAL_NONSENSITIVE,...) for any mapping + * that is required to exist prior to asi_enter() (e.g. thread stacks) + */ +void asi_sync_mapping(struct asi *asi, void *start, size_t len) +{ + size_t addr = (size_t)start; + size_t end = addr + len; + + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return; + + if ((asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) && + is_addr_in_local_nonsensitive_range(addr)) + for (; addr < end; addr = pgd_addr_end(addr, end)) + asi_clone_pgd(asi->pgd, asi->mm->asi[0].pgd, addr); +} diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 4bfed53e210e..8692eb50f4a5 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1498,6 +1498,12 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) { unsigned long address = read_cr2(); irqentry_state_t state; + /* + * There is a very small chance that an NMI could cause an asi_exit() + * before this asi_get_current(), but that is ok, we will just do + * the fixup on the next page fault. + */ + struct asi *asi = asi_get_current(); prefetchw(¤t->mm->mmap_lock); @@ -1539,6 +1545,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) instrumentation_begin(); handle_page_fault(regs, error_code, address); + asi_do_lazy_map(asi, address); instrumentation_end(); irqentry_exit(regs, state); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 51c9c4a488e8..a1c8ebff70e8 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -13,6 +13,7 @@ #define ASI_MAX_NUM 0 #define ASI_GLOBAL_NONSENSITIVE NULL +#define ASI_LOCAL_NONSENSITIVE NULL #define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START #define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END @@ -31,7 +32,9 @@ int asi_register_class(const char *name, uint flags, static inline void asi_unregister_class(int asi_index) { } -static inline void asi_init_mm_state(struct mm_struct *mm) { } +static inline int asi_init_mm_state(struct mm_struct *mm) { return 0; } + +static inline void asi_free_mm_state(struct mm_struct *mm) { } static inline int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) @@ -67,9 +70,16 @@ static inline int asi_map(struct asi *asi, void *addr, size_t len) return 0; } +static inline +void asi_sync_mapping(struct asi *asi, void *addr, size_t len) { } + static inline void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) { } + +static inline +void asi_do_lazy_map(struct asi *asi, size_t addr) { } + static inline void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } diff --git a/kernel/fork.c b/kernel/fork.c index 3695a32ee9bd..dd5a86e913ea 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -699,6 +699,7 @@ void __mmdrop(struct mm_struct *mm) mm_free_pgd(mm); destroy_context(mm); mmu_notifier_subscriptions_destroy(mm); + asi_free_mm_state(mm); check_mm(mm); put_user_ns(mm->user_ns); free_mm(mm); @@ -1072,17 +1073,20 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mm->def_flags = 0; } - asi_init_mm_state(mm); - if 
(mm_alloc_pgd(mm)) goto fail_nopgd; if (init_new_context(p, mm)) goto fail_nocontext; + if (asi_init_mm_state(mm)) + goto fail_noasi; + mm->user_ns = get_user_ns(user_ns); + return mm; +fail_noasi: fail_nocontext: mm_free_pgd(mm); fail_nopgd: From patchwork Wed Feb 23 05:21:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756379 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A911FC433F5 for ; Wed, 23 Feb 2022 05:24:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BCA538D0015; Wed, 23 Feb 2022 00:24:29 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B7AD58D0001; Wed, 23 Feb 2022 00:24:29 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9CB788D0015; Wed, 23 Feb 2022 00:24:29 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0182.hostedemail.com [216.40.44.182]) by kanga.kvack.org (Postfix) with ESMTP id 8A6058D0001 for ; Wed, 23 Feb 2022 00:24:29 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 4812D181AF5D0 for ; Wed, 23 Feb 2022 05:24:29 +0000 (UTC) X-FDA: 79172904258.18.8FF179E Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf03.hostedemail.com (Postfix) with ESMTP id CE02A20002 for ; Wed, 23 Feb 2022 05:24:28 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id i6-20020a255406000000b006241e97e420so18864029ybb.5 for ; Tue, 22 Feb 2022 21:24:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=0QeFswLcX9iEWT+ZD7rOizRcb8mHIR1VarnP+zim9mU=; b=AoOb1g/bwih9VgHUpDsuRj472kE7mvs4Hd5rKJ629FXGqay4OsONzwnNGCn4KSszBQ 7tR0Ehd/9kdVBMulmZqqn0EKhuRWwo5FmNiVUwzvcobRe45iRRTjqmq29bq6BO+aH2aK iVFm5Q63DJUj0wcMPLNxFHfvSAy/DcNXa1BAm45xpqSab7/o/iDvyz2xnot3Gux2i/bz ep5/x08uO4oi4Oa/VRppWnEJLR30fLBFaY+9aHNaO68XzDzENmX6oojUDxShaZpJxCzn AzyiCxL/E+FUObYZMNwPUDfM+bxPv2Rn6QD88aXQUJCPLfukjGOAKhVZbWGz7HVG4WwU EzCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=0QeFswLcX9iEWT+ZD7rOizRcb8mHIR1VarnP+zim9mU=; b=j0FhkAYBYF8b58ZT8v3ispYsoFjTaSCQQ+hdlv82NGUhkN4S7RAxQC08cAPb596r3I xrs+O85L6GbcLbcuukNq/eIkjgJSgdpENQrrxI4jTgD/rcu1WFNELJ6IEGV3KbCCJ8Gd 2+67UaY2myNwzCYu5Lqv8YyXktpFsI1ZOcLaLlV2aWOgYXe/f7R+yTRka2xSL5T96cFh BDuBn87mJ2oAjUKMowetS8dZiIeGx9qE3yV3CBaTux+HB6B08m31onoz1JLnOhF04fN5 0BbKunzam6+8D9Ca+LASiI1cOVVbNuFzesi0fFCpuhlQCe3tAuZt+RLBsfwVx7i1lnr5 duJg== X-Gm-Message-State: AOAM531f8CDGubbZ76QQcGITCDWMG7T+WVv35Xg0J1zLoA81W47HrmBB kfVIgbWXEdWTj+1I5jocf2wFxpBlGBWH X-Google-Smtp-Source: ABdhPJx4QwVPvcY0cnBZCAO3p1suuFvqTv6OhknMLXEOsTBctmZQz9YZ5chFPkEZtJ5h60746WaCODirXSsF X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6fc1:0:b0:624:43a0:c16c with SMTP id k184-20020a256fc1000000b0062443a0c16cmr21681170ybc.222.1645593868088; Tue, 22 Feb 2022 21:24:28 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:55 -0800 In-Reply-To: 
<20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-20-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 19/47] mm: asi: Support for locally nonsensitive page allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: CE02A20002 X-Stat-Signature: x6n5aub3uzh4ge3eunu8qr1sk6ayqw6x Authentication-Results: imf03.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="AoOb1g/b"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf03.hostedemail.com: domain of 3DMUVYgcKCAUozsfnixlttlqj.htrqnsz2-rrp0fhp.twl@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3DMUVYgcKCAUozsfnixlttlqj.htrqnsz2-rrp0fhp.twl@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam11 X-HE-Tag: 1645593868-23008 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A new GFP flag, __GFP_LOCAL_NONSENSITIVE, is added to allocate pages that are considered non-sensitive within the context of the current process, but sensitive in the context of other processes. For these allocations, page->asi_mm is set to the current mm during allocation. It must be set to the same value when the page is freed. Though it can potentially be overwritten and used for some other purpose in the meantime, as long as it is restored before freeing. 
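
As an illustration (hypothetical callers; the example_ names are invented, the rest is the interface described above), allocation and freeing would look roughly like this:

static struct page *example_alloc_local_page(void)
{
	/* The allocator sets page->asi_mm = current->mm for this page. */
	return alloc_pages(GFP_KERNEL | __GFP_LOCAL_NONSENSITIVE, 0);
}

static void example_free_local_page(struct page *page, struct mm_struct *mm)
{
	/*
	 * If asi_mm was reused for something else in the meantime, it must
	 * be restored to the allocating mm before the page is freed.
	 */
	page->asi_mm = mm;
	__free_pages(page, 0);
}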
Signed-off-by: Junaid Shahid --- include/linux/gfp.h | 5 +++- include/linux/mm_types.h | 17 ++++++++++-- include/trace/events/mmflags.h | 1 + mm/page_alloc.c | 47 ++++++++++++++++++++++++++++------ tools/perf/builtin-kmem.c | 1 + 5 files changed, 60 insertions(+), 11 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 07a99a463a34..2ab394adbda3 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -62,8 +62,10 @@ struct vm_area_struct; #endif #ifdef CONFIG_ADDRESS_SPACE_ISOLATION #define ___GFP_GLOBAL_NONSENSITIVE 0x4000000u +#define ___GFP_LOCAL_NONSENSITIVE 0x8000000u #else #define ___GFP_GLOBAL_NONSENSITIVE 0 +#define ___GFP_LOCAL_NONSENSITIVE 0 #endif /* If the above are modified, __GFP_BITS_SHIFT may need updating */ @@ -255,9 +257,10 @@ struct vm_area_struct; /* Allocate non-sensitive memory */ #define __GFP_GLOBAL_NONSENSITIVE ((__force gfp_t)___GFP_GLOBAL_NONSENSITIVE) +#define __GFP_LOCAL_NONSENSITIVE ((__force gfp_t)___GFP_LOCAL_NONSENSITIVE) /* Room for N __GFP_FOO bits */ -#define __GFP_BITS_SHIFT 27 +#define __GFP_BITS_SHIFT 28 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1)) /** diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8624d2783661..f9702d070975 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -193,8 +193,21 @@ struct page { struct rcu_head rcu_head; #ifdef CONFIG_ADDRESS_SPACE_ISOLATION - /* Links the pages_to_free_async list */ - struct llist_node async_free_node; + struct { + /* Links the pages_to_free_async list */ + struct llist_node async_free_node; + + unsigned long _asi_pad_1; + unsigned long _asi_pad_2; + + /* + * Upon allocation of a locally non-sensitive page, set + * to the allocating mm. Must be set to the same mm when + * the page is freed. May potentially be overwritten in + * the meantime, as long as it is restored before free. 
+ */ + struct mm_struct *asi_mm; + }; #endif }; diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 96e61d838bec..c00b8a4e1968 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -51,6 +51,7 @@ {(unsigned long)__GFP_KSWAPD_RECLAIM, "__GFP_KSWAPD_RECLAIM"},\ {(unsigned long)__GFP_ZEROTAGS, "__GFP_ZEROTAGS"}, \ {(unsigned long)__GFP_SKIP_KASAN_POISON,"__GFP_SKIP_KASAN_POISON"},\ + {(unsigned long)__GFP_LOCAL_NONSENSITIVE, "__GFP_LOCAL_NONSENSITIVE"},\ {(unsigned long)__GFP_GLOBAL_NONSENSITIVE, "__GFP_GLOBAL_NONSENSITIVE"}\ #define show_gfp_flags(flags) \ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index a4048fa1868a..01784bff2a80 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5231,19 +5231,33 @@ early_initcall(asi_page_alloc_init); static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_mask) { uint i; + struct asi *asi; + + VM_BUG_ON((gfp_mask & (__GFP_GLOBAL_NONSENSITIVE | + __GFP_LOCAL_NONSENSITIVE)) == + (__GFP_GLOBAL_NONSENSITIVE | __GFP_LOCAL_NONSENSITIVE)); if (!static_asi_enabled()) return 0; + if (!(gfp_mask & (__GFP_GLOBAL_NONSENSITIVE | + __GFP_LOCAL_NONSENSITIVE))) + return 0; + if (gfp_mask & __GFP_GLOBAL_NONSENSITIVE) { + asi = ASI_GLOBAL_NONSENSITIVE; for (i = 0; i < (1 << order); i++) __SetPageGlobalNonSensitive(page + i); - - return asi_map_gfp(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page), - PAGE_SIZE * (1 << order), gfp_mask); + } else { + asi = ASI_LOCAL_NONSENSITIVE; + for (i = 0; i < (1 << order); i++) { + __SetPageLocalNonSensitive(page + i); + page[i].asi_mm = current->mm; + } } - return 0; + return asi_map_gfp(asi, page_to_virt(page), + PAGE_SIZE * (1 << order), gfp_mask); } static bool asi_unmap_freed_pages(struct page *page, unsigned int order) @@ -5251,18 +5265,28 @@ static bool asi_unmap_freed_pages(struct page *page, unsigned int order) void *va; size_t len; bool async_flush_needed; + struct asi *asi; + + VM_BUG_ON(PageGlobalNonSensitive(page) && PageLocalNonSensitive(page)); if (!static_asi_enabled()) return true; - if (!PageGlobalNonSensitive(page)) + if (PageGlobalNonSensitive(page)) + asi = ASI_GLOBAL_NONSENSITIVE; + else if (PageLocalNonSensitive(page)) + asi = &page->asi_mm->asi[0]; + else return true; + /* Heuristic to check that page->asi_mm is actually an mm_struct */ + VM_BUG_ON(PageLocalNonSensitive(page) && asi->mm != page->asi_mm); + va = page_to_virt(page); len = PAGE_SIZE * (1 << order); async_flush_needed = irqs_disabled() || in_interrupt(); - asi_unmap(ASI_GLOBAL_NONSENSITIVE, va, len, !async_flush_needed); + asi_unmap(asi, va, len, !async_flush_needed); if (!async_flush_needed) return true; @@ -5476,8 +5500,15 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid, return NULL; } - if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE)) - gfp |= __GFP_ZERO; + if (static_asi_enabled()) { + if ((gfp & __GFP_LOCAL_NONSENSITIVE) && + !mm_asi_enabled(current->mm)) + gfp &= ~__GFP_LOCAL_NONSENSITIVE; + + if (gfp & (__GFP_GLOBAL_NONSENSITIVE | + __GFP_LOCAL_NONSENSITIVE)) + gfp |= __GFP_ZERO; + } gfp &= gfp_allowed_mask; /* diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c index 5857953cd5c1..a2337fc3404f 100644 --- a/tools/perf/builtin-kmem.c +++ b/tools/perf/builtin-kmem.c @@ -661,6 +661,7 @@ static const struct { { "__GFP_DIRECT_RECLAIM", "DR" }, { "__GFP_KSWAPD_RECLAIM", "KR" }, { "__GFP_GLOBAL_NONSENSITIVE", "GNS" }, + { "__GFP_LOCAL_NONSENSITIVE", "LNS" }, }; static size_t max_gfp_len; From patchwork Wed Feb 
23 05:21:56 2022
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Junaid Shahid
X-Patchwork-Id: 12756380
Date: Tue, 22 Feb 2022 21:21:56 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-21-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 20/47] mm: asi: Support for locally non-sensitive
vmalloc allocations
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org

A new flag, VM_LOCAL_NONSENSITIVE, is added to designate locally non-sensitive vmalloc/vmap areas. When using the __vmalloc / __vmalloc_node APIs, if the corresponding GFP flag is specified, the VM flag is automatically added. When using the __vmalloc_node_range API, either flag can be specified independently. The VM flag will only map the vmalloc area as non-sensitive, while the GFP flag will only map the underlying direct map area as non-sensitive.

When using the __vmalloc_node_range API, VMALLOC_LOCAL_NONSENSITIVE_START/END should be used instead of VMALLOC_START/END. This is the range that will have different ASI page tables for each process, thereby providing the local mapping.

A command line parameter vmalloc_local_nonsensitive_percent is added to specify the approximate division between the per-process and global vmalloc ranges. Note that regular/sensitive vmalloc/vmap allocations are not restricted by this division and can go anywhere in the entire vmalloc range. The division only applies to non-sensitive allocations. Since no attempt is made to balance regular/sensitive allocations across the division, it is possible that one of these ranges gets filled up by regular allocations, leaving no room for the non-sensitive allocations for which that range was designated. But since the vmalloc range is fairly large, hopefully that will not be a problem in practice. If that assumption turns out to be incorrect, we could implement a more sophisticated scheme.
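
To illustrate the two ways of requesting a locally non-sensitive vmalloc area (hypothetical callers; the example_ names are invented, while the flags and range macros are the ones added by this patch):

static void *example_simple(unsigned long size)
{
	/* The GFP flag alone suffices; VM_LOCAL_NONSENSITIVE is added automatically. */
	return __vmalloc(size, GFP_KERNEL | __GFP_LOCAL_NONSENSITIVE);
}

static void *example_explicit(unsigned long size)
{
	/* With __vmalloc_node_range(), the range and the VM flag are given explicitly. */
	return __vmalloc_node_range(size, PAGE_SIZE,
				    VMALLOC_LOCAL_NONSENSITIVE_START,
				    VMALLOC_LOCAL_NONSENSITIVE_END,
				    GFP_KERNEL | __GFP_LOCAL_NONSENSITIVE,
				    PAGE_KERNEL, VM_LOCAL_NONSENSITIVE,
				    NUMA_NO_NODE, __builtin_return_address(0));
}

The first form relies on the automatic promotion of the GFP flag described above; the second spells out both the allowed range and the VM flag.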
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 2 + arch/x86/include/asm/page_64.h | 2 + arch/x86/include/asm/pgtable_64_types.h | 7 ++- arch/x86/mm/asi.c | 57 ++++++++++++++++++ include/asm-generic/asi.h | 5 ++ include/linux/vmalloc.h | 6 ++ mm/vmalloc.c | 78 ++++++++++++++++++++----- 7 files changed, 142 insertions(+), 15 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index f11010c0334b..e3cbf6d8801e 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -46,6 +46,8 @@ DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); extern pgd_t asi_global_nonsensitive_pgd[]; +void asi_vmalloc_init(void); + int asi_init_mm_state(struct mm_struct *mm); void asi_free_mm_state(struct mm_struct *mm); diff --git a/arch/x86/include/asm/page_64.h b/arch/x86/include/asm/page_64.h index 2845eca02552..b17574349572 100644 --- a/arch/x86/include/asm/page_64.h +++ b/arch/x86/include/asm/page_64.h @@ -18,6 +18,8 @@ extern unsigned long vmemmap_base; #ifdef CONFIG_ADDRESS_SPACE_ISOLATION +extern unsigned long vmalloc_global_nonsensitive_start; +extern unsigned long vmalloc_local_nonsensitive_end; extern unsigned long asi_local_map_base; DECLARE_STATIC_KEY_FALSE(asi_local_map_initialized); diff --git a/arch/x86/include/asm/pgtable_64_types.h b/arch/x86/include/asm/pgtable_64_types.h index 0fc380ba25b8..06793f7ef1aa 100644 --- a/arch/x86/include/asm/pgtable_64_types.h +++ b/arch/x86/include/asm/pgtable_64_types.h @@ -142,8 +142,13 @@ extern unsigned int ptrs_per_p4d; #define VMALLOC_END (VMALLOC_START + (VMALLOC_SIZE_TB << 40) - 1) #ifdef CONFIG_ADDRESS_SPACE_ISOLATION -#define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START + +#define VMALLOC_LOCAL_NONSENSITIVE_START VMALLOC_START +#define VMALLOC_LOCAL_NONSENSITIVE_END vmalloc_local_nonsensitive_end + +#define VMALLOC_GLOBAL_NONSENSITIVE_START vmalloc_global_nonsensitive_start #define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END + #endif #define MODULES_VADDR (__START_KERNEL_map + KERNEL_IMAGE_SIZE) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 3ba0971a318d..91e5ff1224ff 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -3,6 +3,7 @@ #include #include #include +#include #include #include @@ -28,6 +29,17 @@ EXPORT_SYMBOL(asi_local_map_initialized); unsigned long asi_local_map_base __ro_after_init; EXPORT_SYMBOL(asi_local_map_base); +unsigned long vmalloc_global_nonsensitive_start __ro_after_init; +EXPORT_SYMBOL(vmalloc_global_nonsensitive_start); + +unsigned long vmalloc_local_nonsensitive_end __ro_after_init; +EXPORT_SYMBOL(vmalloc_local_nonsensitive_end); + +/* Approximate percent only. Rounded to PGDIR_SIZE boundary. 
*/ +static uint vmalloc_local_nonsensitive_percent __ro_after_init = 50; +core_param(vmalloc_local_nonsensitive_percent, + vmalloc_local_nonsensitive_percent, uint, 0444); + int asi_register_class(const char *name, uint flags, const struct asi_hooks *ops) { @@ -307,6 +319,10 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) i++) set_pgd(asi->pgd + i, mm->asi[0].pgd[i]); + for (i = pgd_index(VMALLOC_LOCAL_NONSENSITIVE_START); + i <= pgd_index(VMALLOC_LOCAL_NONSENSITIVE_END); i++) + set_pgd(asi->pgd + i, mm->asi[0].pgd[i]); + for (i = pgd_index(VMALLOC_GLOBAL_NONSENSITIVE_START); i < PTRS_PER_PGD; i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); @@ -432,6 +448,10 @@ void asi_free_mm_state(struct mm_struct *mm) pgd_index(ASI_LOCAL_MAP + PFN_PHYS(max_possible_pfn)) + 1); + asi_free_pgd_range(&mm->asi[0], + pgd_index(VMALLOC_LOCAL_NONSENSITIVE_START), + pgd_index(VMALLOC_LOCAL_NONSENSITIVE_END) + 1); + free_page((ulong)mm->asi[0].pgd); } @@ -671,3 +691,40 @@ void asi_sync_mapping(struct asi *asi, void *start, size_t len) for (; addr < end; addr = pgd_addr_end(addr, end)) asi_clone_pgd(asi->pgd, asi->mm->asi[0].pgd, addr); } + +void __init asi_vmalloc_init(void) +{ + uint start_index = pgd_index(VMALLOC_START); + uint end_index = pgd_index(VMALLOC_END); + uint global_start_index; + + if (!boot_cpu_has(X86_FEATURE_ASI)) { + vmalloc_global_nonsensitive_start = VMALLOC_START; + vmalloc_local_nonsensitive_end = VMALLOC_END; + return; + } + + if (vmalloc_local_nonsensitive_percent == 0) { + vmalloc_local_nonsensitive_percent = 1; + pr_warn("vmalloc_local_nonsensitive_percent must be non-zero"); + } + + if (vmalloc_local_nonsensitive_percent >= 100) { + vmalloc_local_nonsensitive_percent = 99; + pr_warn("vmalloc_local_nonsensitive_percent must be less than 100"); + } + + global_start_index = start_index + (end_index - start_index) * + vmalloc_local_nonsensitive_percent / 100; + global_start_index = max(global_start_index, start_index + 1); + + vmalloc_global_nonsensitive_start = -(PTRS_PER_PGD - global_start_index) + * PGDIR_SIZE; + vmalloc_local_nonsensitive_end = vmalloc_global_nonsensitive_start - 1; + + pr_debug("vmalloc_global_nonsensitive_start = %llx", + vmalloc_global_nonsensitive_start); + + VM_BUG_ON(vmalloc_local_nonsensitive_end >= VMALLOC_END); + VM_BUG_ON(vmalloc_global_nonsensitive_start <= VMALLOC_START); +} diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index a1c8ebff70e8..7c50d8b64fa4 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -18,6 +18,9 @@ #define VMALLOC_GLOBAL_NONSENSITIVE_START VMALLOC_START #define VMALLOC_GLOBAL_NONSENSITIVE_END VMALLOC_END +#define VMALLOC_LOCAL_NONSENSITIVE_START VMALLOC_START +#define VMALLOC_LOCAL_NONSENSITIVE_END VMALLOC_END + #ifndef _ASSEMBLY_ struct asi_hooks {}; @@ -36,6 +39,8 @@ static inline int asi_init_mm_state(struct mm_struct *mm) { return 0; } static inline void asi_free_mm_state(struct mm_struct *mm) { } +static inline void asi_vmalloc_init(void) { } + static inline int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) { diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 5f85690f27b6..2b4eafc21fa5 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -41,8 +41,10 @@ struct notifier_block; /* in notifier.h */ #ifdef CONFIG_ADDRESS_SPACE_ISOLATION #define VM_GLOBAL_NONSENSITIVE 0x00000800 /* Similar to __GFP_GLOBAL_NONSENSITIVE */ +#define VM_LOCAL_NONSENSITIVE 0x00001000 /* Similar to 
__GFP_LOCAL_NONSENSITIVE */ #else #define VM_GLOBAL_NONSENSITIVE 0 +#define VM_LOCAL_NONSENSITIVE 0 #endif /* bits [20..32] reserved for arch specific ioremap internals */ @@ -67,6 +69,10 @@ struct vm_struct { unsigned int nr_pages; phys_addr_t phys_addr; const void *caller; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Valid if flags contain VM_*_NONSENSITIVE */ + struct asi *asi; +#endif }; struct vmap_area { diff --git a/mm/vmalloc.c b/mm/vmalloc.c index f13bfe7e896b..ea94d8a1e2e9 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2391,18 +2391,25 @@ void __init vmalloc_init(void) */ vmap_init_free_space(); vmap_initialized = true; + + asi_vmalloc_init(); } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + static int asi_map_vm_area(struct vm_struct *area) { if (!static_asi_enabled()) return 0; if (area->flags & VM_GLOBAL_NONSENSITIVE) - return asi_map(ASI_GLOBAL_NONSENSITIVE, area->addr, - get_vm_area_size(area)); + area->asi = ASI_GLOBAL_NONSENSITIVE; + else if (area->flags & VM_LOCAL_NONSENSITIVE) + area->asi = ASI_LOCAL_NONSENSITIVE; + else + return 0; - return 0; + return asi_map(area->asi, area->addr, get_vm_area_size(area)); } static void asi_unmap_vm_area(struct vm_struct *area) @@ -2415,11 +2422,17 @@ static void asi_unmap_vm_area(struct vm_struct *area) * the case when the existing flush from try_purge_vmap_area_lazy() * and/or vm_unmap_aliases() happens non-lazily. */ - if (area->flags & VM_GLOBAL_NONSENSITIVE) - asi_unmap(ASI_GLOBAL_NONSENSITIVE, area->addr, - get_vm_area_size(area), true); + if (area->flags & (VM_GLOBAL_NONSENSITIVE | VM_LOCAL_NONSENSITIVE)) + asi_unmap(area->asi, area->addr, get_vm_area_size(area), true); } +#else + +static inline int asi_map_vm_area(struct vm_struct *area) { return 0; } +static inline void asi_unmap_vm_area(struct vm_struct *area) { } + +#endif + static inline void setup_vmalloc_vm_locked(struct vm_struct *vm, struct vmap_area *va, unsigned long flags, const void *caller) { @@ -2463,6 +2476,15 @@ static struct vm_struct *__get_vm_area_node(unsigned long size, if (unlikely(!size)) return NULL; + if (static_asi_enabled()) { + VM_BUG_ON((flags & VM_LOCAL_NONSENSITIVE) && + !(start >= VMALLOC_LOCAL_NONSENSITIVE_START && + end <= VMALLOC_LOCAL_NONSENSITIVE_END)); + + VM_BUG_ON((flags & VM_GLOBAL_NONSENSITIVE) && + start < VMALLOC_GLOBAL_NONSENSITIVE_START); + } + if (flags & VM_IOREMAP) align = 1ul << clamp_t(int, get_count_order_long(size), PAGE_SHIFT, IOREMAP_MAX_ORDER); @@ -3073,8 +3095,22 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align, if (WARN_ON_ONCE(!size)) return NULL; - if (static_asi_enabled() && (vm_flags & VM_GLOBAL_NONSENSITIVE)) - gfp_mask |= __GFP_ZERO; + if (static_asi_enabled()) { + VM_BUG_ON((vm_flags & (VM_LOCAL_NONSENSITIVE | + VM_GLOBAL_NONSENSITIVE)) == + (VM_LOCAL_NONSENSITIVE | VM_GLOBAL_NONSENSITIVE)); + + if ((vm_flags & VM_LOCAL_NONSENSITIVE) && + !mm_asi_enabled(current->mm)) { + vm_flags &= ~VM_LOCAL_NONSENSITIVE; + + if (end == VMALLOC_LOCAL_NONSENSITIVE_END) + end = VMALLOC_END; + } + + if (vm_flags & (VM_GLOBAL_NONSENSITIVE | VM_LOCAL_NONSENSITIVE)) + gfp_mask |= __GFP_ZERO; + } if ((size >> PAGE_SHIFT) > totalram_pages()) { warn_alloc(gfp_mask, NULL, @@ -3166,11 +3202,19 @@ void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask, int node, const void *caller) { ulong vm_flags = 0; + ulong start = VMALLOC_START, end = VMALLOC_END; - if (static_asi_enabled() && (gfp_mask & __GFP_GLOBAL_NONSENSITIVE)) - vm_flags |= VM_GLOBAL_NONSENSITIVE; + if (static_asi_enabled()) { + if (gfp_mask 
& __GFP_GLOBAL_NONSENSITIVE) { + vm_flags |= VM_GLOBAL_NONSENSITIVE; + start = VMALLOC_GLOBAL_NONSENSITIVE_START; + } else if (gfp_mask & __GFP_LOCAL_NONSENSITIVE) { + vm_flags |= VM_LOCAL_NONSENSITIVE; + end = VMALLOC_LOCAL_NONSENSITIVE_END; + } + } - return __vmalloc_node_range(size, align, VMALLOC_START, VMALLOC_END, + return __vmalloc_node_range(size, align, start, end, gfp_mask, PAGE_KERNEL, vm_flags, node, caller); } /* @@ -3678,9 +3722,15 @@ struct vm_struct **pcpu_get_vm_areas(const unsigned long *offsets, /* verify parameters and allocate data structures */ BUG_ON(offset_in_page(align) || !is_power_of_2(align)); - if (static_asi_enabled() && (flags & VM_GLOBAL_NONSENSITIVE)) { - vmalloc_start = VMALLOC_GLOBAL_NONSENSITIVE_START; - vmalloc_end = VMALLOC_GLOBAL_NONSENSITIVE_END; + if (static_asi_enabled()) { + VM_BUG_ON((flags & (VM_LOCAL_NONSENSITIVE | + VM_GLOBAL_NONSENSITIVE)) == + (VM_LOCAL_NONSENSITIVE | VM_GLOBAL_NONSENSITIVE)); + + if (flags & VM_GLOBAL_NONSENSITIVE) + vmalloc_start = VMALLOC_GLOBAL_NONSENSITIVE_START; + else if (flags & VM_LOCAL_NONSENSITIVE) + vmalloc_end = VMALLOC_LOCAL_NONSENSITIVE_END; } vmalloc_start = ALIGN(vmalloc_start, align); From patchwork Wed Feb 23 05:21:57 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756381 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C157C43219 for ; Wed, 23 Feb 2022 05:24:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 11CC98D0017; Wed, 23 Feb 2022 00:24:34 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0CBF58D0001; Wed, 23 Feb 2022 00:24:34 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E5FA78D0017; Wed, 23 Feb 2022 00:24:33 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0088.hostedemail.com [216.40.44.88]) by kanga.kvack.org (Postfix) with ESMTP id CF3ED8D0001 for ; Wed, 23 Feb 2022 00:24:33 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 9B7179F5DD for ; Wed, 23 Feb 2022 05:24:33 +0000 (UTC) X-FDA: 79172904426.20.FC09611 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf10.hostedemail.com (Postfix) with ESMTP id 34D70C0004 for ; Wed, 23 Feb 2022 05:24:33 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id o5-20020a25d705000000b0062499d760easo8077273ybg.7 for ; Tue, 22 Feb 2022 21:24:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=wpN/U5eluX6Jnat+pEDX2C0UlbFhsXsgqK2IMjW00B0=; b=eO68TZsayJUnzF9ppfHaPSwA9f3cpJtiM/bQv1NjIx1viksnomhVCZg/OPfesExCQe GPBpKhmjuCayGTZhMCQafKSmuYZ+smeOmFtUbvbkWS5siPQw7nd2sd5jds4KSpKgIMG1 5rAZMh+2rFOS1AHpWlupbprjH1FaaIPoFdMADMVIZrEncJu5IwxbAAD77jq17vk7nHI/ pHx7Yx3y5+Ja+7mdhxYJ7U+IEaib+spU25N1cAyqYIyEgycYL490w9sjOc0zBhbj55fl oXav/h9TUMbHXVfYpZ1MYXyKyDqxgCsBYAWNUi6HD47X/71tlKLWuj7QNFucy0Ul1WpD v86A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; 
bh=wpN/U5eluX6Jnat+pEDX2C0UlbFhsXsgqK2IMjW00B0=; b=J5PHvvW+XhYpSaqxbA4o1FAdmyfppUF+wnDSdI0aHt6CAgxC6GWOj5BQ+qt/UtlLQh eM/1d4Y8yEe/x4TWBzX9xlTHPRA9Xk909cGgn2+QOyvuQK3y5BW69t+XdrOH6WeZbrcz I+iS5cC2cwHEpcsc1GsxgDa/oYlZuJhiSiV8Xqn0KEWLSRMTInQ431J0TreIJrNkNj3b awBioznCYJJGqOw+1yOcVqdOvk6jkbYizcCU129BGgAKvsUAk7gzKRB3/P5GJsXQHQL2 CNxzMAkSZtBnK2FF6EfWL+BYqq4/q70Pxbd9IwdZu6vR3yUjAoxzmgfn8CtgU8FG+D0U VpRQ== X-Gm-Message-State: AOAM5318Kvb0TgXn2yVVe+f1QoS8b3r+NmHtX3XrR7YUjD3vHnZXf8HR G2lUuD0zWfbq+/Kh+RmUFg4LX0rcHMf3 X-Google-Smtp-Source: ABdhPJy8YBcGGLS1CDa1jQAvWHy6IX+C9Fx9N9t0vFRpgYOq3ptmbXD9mdu9QbdYBQ3ESjXtAc+9DWkF+HCP X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7141:0:b0:2d3:d549:23f8 with SMTP id m62-20020a817141000000b002d3d54923f8mr27573261ywc.87.1645593872454; Tue, 22 Feb 2022 21:24:32 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:57 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-22-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 21/47] mm: asi: Add support for locally non-sensitive VM_USERMAP pages From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 34D70C0004 X-Rspam-User: Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=eO68TZsa; spf=pass (imf10.hostedemail.com: domain of 3EMUVYgcKCAks3wjrm1pxxpun.lxvurw36-vvt4jlt.x0p@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3EMUVYgcKCAks3wjrm1pxxpun.lxvurw36-vvt4jlt.x0p@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: w9fmbkjanc3gh9rus7mu7aofkdz6y35k X-HE-Tag: 1645593873-969337 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: VM_USERMAP pages can be mapped into userspace, which would overwrite the asi_mm field, so we restore that field when freeing these pages. 
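For reference, the kind of area this patch has to handle is a user-mappable, locally non-sensitive vmalloc allocation. A minimal sketch of such an allocation, mirroring vmalloc_user() but using the ranges and flags introduced earlier in this series, is shown below; the helper name is made up for illustration and the exact GFP/VM flag plumbing is assumed from the preceding patches, not added by this one:

    /*
     * Sketch only: allocate a user-mappable buffer whose backing pages are
     * process-locally non-sensitive. Mapping such pages into a user VMA
     * overwrites page->asi_mm, which asi_unmap_vm_area() now restores when
     * the area is torn down.
     */
    static void *foo_alloc_user_buf(unsigned long size)
    {
            return __vmalloc_node_range(size, SHMLBA,
                                        VMALLOC_LOCAL_NONSENSITIVE_START,
                                        VMALLOC_LOCAL_NONSENSITIVE_END,
                                        GFP_KERNEL | __GFP_LOCAL_NONSENSITIVE,
                                        PAGE_KERNEL,
                                        VM_USERMAP | VM_LOCAL_NONSENSITIVE,
                                        NUMA_NO_NODE,
                                        __builtin_return_address(0));
    }
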
Signed-off-by: Junaid Shahid --- mm/vmalloc.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ea94d8a1e2e9..a89866a926f6 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -2424,6 +2424,14 @@ static void asi_unmap_vm_area(struct vm_struct *area) */ if (area->flags & (VM_GLOBAL_NONSENSITIVE | VM_LOCAL_NONSENSITIVE)) asi_unmap(area->asi, area->addr, get_vm_area_size(area), true); + + if (area->flags & VM_USERMAP) { + uint i; + + for (i = 0; i < area->nr_pages; i++) + if (PageLocalNonSensitive(area->pages[i])) + area->pages[i]->asi_mm = area->asi->mm; + } } #else From patchwork Wed Feb 23 05:21:58 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756382 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 22C3DC433EF for ; Wed, 23 Feb 2022 05:24:37 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 930228D0018; Wed, 23 Feb 2022 00:24:36 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8E13B8D0001; Wed, 23 Feb 2022 00:24:36 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 670F98D0018; Wed, 23 Feb 2022 00:24:36 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0068.hostedemail.com [216.40.44.68]) by kanga.kvack.org (Postfix) with ESMTP id 544018D0001 for ; Wed, 23 Feb 2022 00:24:36 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 1728A8249980 for ; Wed, 23 Feb 2022 05:24:36 +0000 (UTC) X-FDA: 79172904552.11.DD7C34C Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf25.hostedemail.com (Postfix) with ESMTP id 901F5A0002 for ; Wed, 23 Feb 2022 05:24:35 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id m10-20020a25800a000000b0061daa5b7151so26450090ybk.10 for ; Tue, 22 Feb 2022 21:24:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=eHvI/S5CbFiuxrTOsf9OKECf8MKQA40+sVHVsgBWXTI=; b=V0N2AQ61NCzV6BZmqiQxUYs5pGX63pZ6VVRWlptLLGQUiVMxzlkVsCzNfrYnsMXVug KGkm/PYQEu9If/h+TwyTP773A843C1vccvswNpzQzC+CWF0lyH9rv4lYVpwygBHxjjxF KtfQ48TD03/CH3MJc6uQdii7U/orjcd6yROvM49vbkxsBatJT7hHIeSSgYOHhBpNRFtZ ikDCPeVq4k3/1dGPRt9rMDvqd7qp2qvlw0rdUxHJz6cMzz6PpmzwFKFFGSl65Hota3P/ hcaYP68KMD05XrecJbTtx+liHxuoMDRkdxyvRDaTUelX+R7q43SiNNrm4qp+y/niXgLp +m3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=eHvI/S5CbFiuxrTOsf9OKECf8MKQA40+sVHVsgBWXTI=; b=lCIApHe+zONByb5yC1y/mty7W77vs3nMlZeydNswgNROh2kjdZ+GKxka6TvUS1af5G G7zh7kHkO0J4PUiCQRkjNRaXSo8T1uj2oCmPBN/4ycBn+a//P0qNjbVs7GkXn0fB4Zms TVK9xmjle1vqAJS+q9CCDpPFhFGbRwu3ES+DLJv5VDzhD6I8xcjiICiYJmygKN1KrQvE 8QYTXR67zfRTtXhpCnk7heCtFzNhBtzYzgfReoJRdY9NZ3lUClswWZQKdxxrSKLFcxIC b7x5KV4jhhqvC4UTbXNRMS/aFsgOeTTmesfyNyxqecpTxU3wINqZJ/G7fJEYb8xpLxS3 sC9w== X-Gm-Message-State: AOAM531Hsj//OMd+abctFAGRBSpyGTm7o98jp41ymZEQWwzVsHNh45SV qUBNRK3SmeFOtCU4N9VlF1fmdX/Kl3BT X-Google-Smtp-Source: 
ABdhPJylISnqnK8ITQ9nblGksWK/PwAayD9axmVlFF61BCPqFWoHn9afvvtl0MNwA3JR60UblOXBCmfjIzzy X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7951:0:b0:2d6:b7bf:216a with SMTP id u78-20020a817951000000b002d6b7bf216amr24436525ywc.258.1645593874907; Tue, 22 Feb 2022 21:24:34 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:58 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-23-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 22/47] mm: asi: Add refcounting when initializing an ASI From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspam-User: Authentication-Results: imf25.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=V0N2AQ61; spf=pass (imf25.hostedemail.com: domain of 3EsUVYgcKCAsu5ylto3rzzrwp.nzxwty58-xxv6lnv.z2r@flex--junaids.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3EsUVYgcKCAsu5ylto3rzzrwp.nzxwty58-xxv6lnv.z2r@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 901F5A0002 X-Stat-Signature: 75cza84rznsa9krmhyz6bdaiktk1rsxd X-HE-Tag: 1645593875-166997 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse Some KVM tests initialize multiple VMs in a single process. For these cases, we want to support multiple calls to asi_init() before a single asi_destroy() is called. We want the initialization to happen exactly once. If asi_destroy() is called, release the resources only when the counter reaches zero. In our current implementation, ASIs are tied to a specific mm. This may change in a future implementation, in which case the mutex for the refcounting will need to move to struct asi. Signed-off-by: Ofir Weisse --- arch/x86/include/asm/asi.h | 1 + arch/x86/mm/asi.c | 52 +++++++++++++++++++++++++++++++++----- include/linux/mm_types.h | 2 ++ kernel/fork.c | 3 +++ 4 files changed, 51 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index e3cbf6d8801e..2dc465f78bcc 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -40,6 +40,7 @@ struct asi { pgd_t *pgd; struct asi_class *class; struct mm_struct *mm; + int64_t asi_ref_count; }; DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 91e5ff1224ff..ac35323193a3 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -282,9 +282,25 @@ static int __init asi_global_init(void) } subsys_initcall(asi_global_init) +/* The caller must hold mm->asi_init_lock. */ +static void __asi_destroy(struct asi *asi) +{ + if (!boot_cpu_has(X86_FEATURE_ASI)) + return; + + /* If refcount is non-zero, it means asi_init() was called multiple + * times. We free the asi pgd only when the last VM is destroyed.
*/ + if (--(asi->asi_ref_count) > 0) + return; + + asi_free_pgd(asi); + memset(asi, 0, sizeof(struct asi)); +} + int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) { - struct asi *asi = &mm->asi[asi_index]; + int err = 0; + struct asi *asi = &mm->asi[asi_index]; *out_asi = NULL; @@ -295,6 +311,15 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) WARN_ON(asi_index == 0 || asi_index >= ASI_MAX_NUM); WARN_ON(asi->pgd != NULL); + /* Currently, mm and asi structs are conceptually tied together. In + * future implementations an asi object might be unrelated to a specicic + * mm. In that future implementation - the mutex will have to be inside + * asi. */ + mutex_lock(&mm->asi_init_lock); + + if (asi->asi_ref_count++ > 0) + goto exit_unlock; /* err is 0 */ + /* * For now, we allocate 2 pages to avoid any potential problems with * KPTI code. This won't be needed once KPTI is folded into the ASI @@ -302,8 +327,10 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) */ asi->pgd = (pgd_t *)__get_free_pages(GFP_PGTABLE_USER, PGD_ALLOCATION_ORDER); - if (!asi->pgd) - return -ENOMEM; + if (!asi->pgd) { + err = -ENOMEM; + goto exit_unlock; + } asi->class = &asi_class[asi_index]; asi->mm = mm; @@ -328,19 +355,30 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); } - *out_asi = asi; +exit_unlock: + if (err) + __asi_destroy(asi); - return 0; + /* This unlock signals future asi_init() callers that we finished. */ + mutex_unlock(&mm->asi_init_lock); + + if (!err) + *out_asi = asi; + return err; } EXPORT_SYMBOL_GPL(asi_init); void asi_destroy(struct asi *asi) { + struct mm_struct *mm; + if (!boot_cpu_has(X86_FEATURE_ASI) || !asi) return; - asi_free_pgd(asi); - memset(asi, 0, sizeof(struct asi)); + mm = asi->mm; + mutex_lock(&mm->asi_init_lock); + __asi_destroy(asi); + mutex_unlock(&mm->asi_init_lock); } EXPORT_SYMBOL_GPL(asi_destroy); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index f9702d070975..e6980ae31323 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -628,6 +629,7 @@ struct mm_struct { * these resources for every mm in the system, we expect that * only VM mm's will have this flag set. 
*/ bool asi_enabled; + struct mutex asi_init_lock; #endif struct user_namespace *user_ns; diff --git a/kernel/fork.c b/kernel/fork.c index dd5a86e913ea..68b3aeab55ac 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1084,6 +1084,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, mm->user_ns = get_user_ns(user_ns); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + mutex_init(&mm->asi_init_lock); +#endif return mm; fail_noasi: From patchwork Wed Feb 23 05:21:59 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756383 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3B82FC433EF for ; Wed, 23 Feb 2022 05:24:39 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 9CE218D0019; Wed, 23 Feb 2022 00:24:38 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 97DB18D0001; Wed, 23 Feb 2022 00:24:38 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7A8F58D0019; Wed, 23 Feb 2022 00:24:38 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.27]) by kanga.kvack.org (Postfix) with ESMTP id 6BC998D0001 for ; Wed, 23 Feb 2022 00:24:38 -0500 (EST) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 3788C6E7 for ; Wed, 23 Feb 2022 05:24:38 +0000 (UTC) X-FDA: 79172904636.10.34BAA38 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf10.hostedemail.com (Postfix) with ESMTP id BAD36C0005 for ; Wed, 23 Feb 2022 05:24:37 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id k10-20020a056902070a00b0062469b00335so10829409ybt.14 for ; Tue, 22 Feb 2022 21:24:37 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=migmFG7UQDaW8MeCncNCopIOLc7/KCi4nAx8Y23Kqls=; b=ZIf827+EcfBkAYznkNAzIxjepuJOo5jGJUpKJJAWVlbJLp/v2xx9WndG0Zfv0Y30pp QVoiE4x5/isYQS6r529V5FuPTwvJXepA8/Ig/UwtVbwxtSPhO5eBWW/xZu7GdKR2hwRD noJsJ3Ur8bMBMS1eL71R7PXbr8K1C4T4AmgEfBVK5/+rkrUOvlO+o5qBs5UJm8vfEFH8 iWvYMGLx9ZnbGLPu3ZtgRIXCqWDVhmvwLhi1ORJZdwX1PxFlhSsfJbPaCpw4raGaKYCV 43iNhXkj/THFgjjQb9mrwbCVq0tdVwLBhSI29IV5lijsBJ9zHJayAmn0j12b9dgg/i30 Q+4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=migmFG7UQDaW8MeCncNCopIOLc7/KCi4nAx8Y23Kqls=; b=urA243WaUe41L6D4wpj8acq9laKvGjHXeeFzSRc8Hn2S+aYrurSGFcn3Lb5CVWunQQ eJ3aNH+nn6t46AhalIaG/hKBL5BO/EB5eWCiWVDMRu+x/tgxcCJOpk5zzOVeMqXU/D/9 rUH0a+je5K2Oh2V5mwoBvJRnqHZfNZaDukRL7ekhWGc22Voy+B+3SgOxxWTNQlINnPQU pnB7IpNhkfBWoajDctV7iBAMNh0Yuy+QNGtmsmKctLy9YmrEpA69SxDDXChIoBrw8IYW P/B1B7+bB4oduwqaQLYvTunVmCvJUYAqMXA49rF702iKyQ/03NQD8ehSkny81U+fgBXf 7Xkg== X-Gm-Message-State: AOAM531xwm/6yn3hxhZ88Mi9U75yxRDJea4atMNQ/IACBXtrfLjjsnbS uVcOETTXILHgI1aaBey0eECr03kN00Ik X-Google-Smtp-Source: ABdhPJzpSBL0VJdokvTlUTI5d1w3wFKyOwiaPENifHATjNeM+qviCG3MD0z5nH5EeDzE1oVe+ERWkd3RD0Jr X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 
2002:a81:84d5:0:b0:2d1:e85:bf04 with SMTP id u204-20020a8184d5000000b002d10e85bf04mr27926930ywf.465.1645593877093; Tue, 22 Feb 2022 21:24:37 -0800 (PST) Date: Tue, 22 Feb 2022 21:21:59 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-24-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 23/47] mm: asi: Add support for mapping all userspace memory into ASI From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: BAD36C0005 X-Stat-Signature: rekkryrfehajpeqou6itdtuypkhhne4y Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=ZIf827+E; spf=pass (imf10.hostedemail.com: domain of 3FcUVYgcKCA4x81owr6u22uzs.q20zw18B-00y9oqy.25u@flex--junaids.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3FcUVYgcKCA4x81owr6u22uzs.q20zw18B-00y9oqy.25u@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1645593877-484414 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This adds a new ASI class flag, ASI_MAP_ALL_USERSPACE, which if set, would automatically map all userspace addresses into that ASI address space. This is achieved by lazily cloning the userspace PGD entries during page faults encountered while in that restricted address space. When the userspace PGD entry is cleared (e.g. in munmap()), we go through all restricted address spaces with the ASI_MAP_ALL_USERSPACE flag and clear the corresponding entry in those address spaces as well. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 2 + arch/x86/mm/asi.c | 81 ++++++++++++++++++++++++++++++++++++++ include/asm-generic/asi.h | 7 ++++ mm/memory.c | 2 + 4 files changed, 92 insertions(+) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 2dc465f78bcc..062ccac07fd9 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -68,6 +68,8 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb); void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len); void asi_sync_mapping(struct asi *asi, void *addr, size_t len); void asi_do_lazy_map(struct asi *asi, size_t addr); +void asi_clear_user_pgd(struct mm_struct *mm, size_t addr); +void asi_clear_user_p4d(struct mm_struct *mm, size_t addr); static inline void asi_init_thread_state(struct thread_struct *thread) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index ac35323193a3..a3d96be76fa9 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -702,6 +702,41 @@ static bool is_addr_in_local_nonsensitive_range(size_t addr) addr < VMALLOC_GLOBAL_NONSENSITIVE_START; } +static void asi_clone_user_pgd(struct asi *asi, size_t addr) +{ + pgd_t *src = pgd_offset_pgd(asi->mm->pgd, addr); + pgd_t *dst = pgd_offset_pgd(asi->pgd, addr); + pgdval_t old_src, curr_src; + + if (pgd_val(*dst)) + return; + + VM_BUG_ON(!irqs_disabled()); + + /* + * This synchronizes against the PGD entry getting cleared by + * free_pgd_range(). 
That path has the following steps: + * 1. pgd_clear + * 2. asi_clear_user_pgd + * 3. Remote TLB Flush + * 4. Free page tables + * + * (3) will be blocked for the duration of this function because the + * IPI will remain pending until interrupts are re-enabled. + * + * The following loop ensures that if we read the PGD value before + * (1) and write it after (2), we will re-read the value and write + * the new updated value. + */ + curr_src = pgd_val(*src); + do { + set_pgd(dst, __pgd(curr_src)); + smp_mb(); + old_src = curr_src; + curr_src = pgd_val(*src); + } while (old_src != curr_src); +} + void asi_do_lazy_map(struct asi *asi, size_t addr) { if (!static_cpu_has(X86_FEATURE_ASI) || !asi) @@ -710,6 +745,9 @@ void asi_do_lazy_map(struct asi *asi, size_t addr) if ((asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) && is_addr_in_local_nonsensitive_range(addr)) asi_clone_pgd(asi->pgd, asi->mm->asi[0].pgd, addr); + else if ((asi->class->flags & ASI_MAP_ALL_USERSPACE) && + addr < TASK_SIZE_MAX) + asi_clone_user_pgd(asi, addr); } /* @@ -766,3 +804,46 @@ void __init asi_vmalloc_init(void) VM_BUG_ON(vmalloc_local_nonsensitive_end >= VMALLOC_END); VM_BUG_ON(vmalloc_global_nonsensitive_start <= VMALLOC_START); } + +static void __asi_clear_user_pgd(struct mm_struct *mm, size_t addr) +{ + uint i; + + if (!static_cpu_has(X86_FEATURE_ASI) || !mm_asi_enabled(mm)) + return; + + /* + * This function is called right after pgd_clear/p4d_clear. + * We need to be sure that the preceding pXd_clear is visible before + * the ASI pgd clears below. Compare with asi_clone_user_pgd(). + */ + smp_mb__before_atomic(); + + /* + * We need to ensure that the ASI PGD tables do not get freed from + * under us. We can potentially use RCU to avoid that, but since + * this path is probably not going to be too performance sensitive, + * so we just acquire the lock to block asi_destroy(). 
+ */ + mutex_lock(&mm->asi_init_lock); + + for (i = 1; i < ASI_MAX_NUM; i++) + if (mm->asi[i].class && + (mm->asi[i].class->flags & ASI_MAP_ALL_USERSPACE)) + set_pgd(pgd_offset_pgd(mm->asi[i].pgd, addr), + native_make_pgd(0)); + + mutex_unlock(&mm->asi_init_lock); +} + +void asi_clear_user_pgd(struct mm_struct *mm, size_t addr) +{ + if (pgtable_l5_enabled()) + __asi_clear_user_pgd(mm, addr); +} + +void asi_clear_user_p4d(struct mm_struct *mm, size_t addr) +{ + if (!pgtable_l5_enabled()) + __asi_clear_user_pgd(mm, addr); +} diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 7c50d8b64fa4..8513d0d7865a 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -6,6 +6,7 @@ /* ASI class flags */ #define ASI_MAP_STANDARD_NONSENSITIVE 1 +#define ASI_MAP_ALL_USERSPACE 2 #ifndef CONFIG_ADDRESS_SPACE_ISOLATION @@ -85,6 +86,12 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) { } static inline void asi_do_lazy_map(struct asi *asi, size_t addr) { } +static inline +void asi_clear_user_pgd(struct mm_struct *mm, size_t addr) { } + +static inline +void asi_clear_user_p4d(struct mm_struct *mm, size_t addr) { } + static inline void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } diff --git a/mm/memory.c b/mm/memory.c index 8f1de811a1dc..667ece86e051 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -296,6 +296,7 @@ static inline void free_pud_range(struct mmu_gather *tlb, p4d_t *p4d, pud = pud_offset(p4d, start); p4d_clear(p4d); + asi_clear_user_p4d(tlb->mm, start); pud_free_tlb(tlb, pud, start); mm_dec_nr_puds(tlb->mm); } @@ -330,6 +331,7 @@ static inline void free_p4d_range(struct mmu_gather *tlb, pgd_t *pgd, p4d = p4d_offset(pgd, start); pgd_clear(pgd); + asi_clear_user_pgd(tlb->mm, start); p4d_free_tlb(tlb, p4d, start); } From patchwork Wed Feb 23 05:22:00 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756384 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A8FB7C433EF for ; Wed, 23 Feb 2022 05:24:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3DB148D001A; Wed, 23 Feb 2022 00:24:41 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 389FC8D0001; Wed, 23 Feb 2022 00:24:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1DE668D001A; Wed, 23 Feb 2022 00:24:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0083.hostedemail.com [216.40.44.83]) by kanga.kvack.org (Postfix) with ESMTP id F3DE08D0001 for ; Wed, 23 Feb 2022 00:24:40 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id B7EC71816393E for ; Wed, 23 Feb 2022 05:24:40 +0000 (UTC) X-FDA: 79172904720.15.5AC1F14 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf23.hostedemail.com (Postfix) with ESMTP id 20D19140002 for ; Wed, 23 Feb 2022 05:24:39 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id j17-20020a25ec11000000b0061dabf74012so26622318ybh.15 for ; Tue, 22 Feb 2022 21:24:39 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; 
h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ztkGsqUqkQcfbxlBdMnsFZA5zf9U5+9bkMDZbcQFZD8=; b=n78DK1+xw1pRWsJW1CFG6TQ6Gavlq4ZQ5W8oD9M2VCMnu8PrU3OyRhoo6bkFuFB9Bp w2aijS13h+Op8okZ6CS3EIZUB1IYhktNxmLMznN4orFxlVaThvl2eVvRB8zbVhIFGOvq jpsqzPo5HOiq6C+toVkqrWMJLKA9J7cIFPDVog0m9lqNblUfFa8H0hzg7NsV5GaTgQ4A 5jBI/lIXeOUGWHVKqN95Y8V5NuPIznfs6f5wBqsu6+GDk2tYuPnXsBHJNOaqpE8YL9hS tuNt6ye+jTS2wLjdTDudRIQlCeKYaeA1rt8TrSn8xLY0Ts3RiaaKU5y9wgWJ9NQ/XyT5 wAiQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ztkGsqUqkQcfbxlBdMnsFZA5zf9U5+9bkMDZbcQFZD8=; b=psjiUfYMxKQUR7Rxjs8zQ6AgbPur0tYVRsDB/8kvLJSLRGW3DQZ5F56pddsKAJOFGy V9ZEsAKZ1R6NR2we6IlLzkY4aPA0jNYn2ObX474lsA/sHrwR+h4aWDlFPNuG6YLhzMET Eaj05LhIlWoAsUS6iESd0GH1TbIFdKDHGPJWNh7vSg3tb9wMgU7JNTEzXxK5q7JX+V0u hgh56w/mWZC+tZvXcTJ7bkf7rSc60EVY2oGJI+SqIH58fiJk4e6NO40zB8xgKrqMGsH0 Bl/3Ep76SrOsPERNejBGH1uYNbuKQKBMZLM42Gm6B6dUIRXQiwL/XH1A2NKPEMJiEPF7 aZCQ== X-Gm-Message-State: AOAM5338pJem1GHdtZc+igqHv/04uXhtCgEK+6g6VPHfdHUJO5o2+FuT 7eXlGiopzxJR/bde6Mqw0xYAXO3/kIFN X-Google-Smtp-Source: ABdhPJzcVkgi3JMg9FekP/jbB0HbpE3UWWhA+GGvHutupthetmKemwuv8RCt3TmZZl6hj5AyHVL2xNuWi5Ck X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:7694:0:b0:624:a2d9:c8f0 with SMTP id r142-20020a257694000000b00624a2d9c8f0mr10070639ybc.523.1645593879400; Tue, 22 Feb 2022 21:24:39 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:00 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-25-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 24/47] mm: asi: Support for local non-sensitive slab caches From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: 20D19140002 X-Stat-Signature: csmfxrqbeu4rgnek4u4nid6de8m7ah6d X-Rspam-User: Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=n78DK1+x; spf=pass (imf23.hostedemail.com: domain of 3F8UVYgcKCBAzA3qyt8w44w1u.s421y3AD-220Bqs0.47w@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3F8UVYgcKCBAzA3qyt8w44w1u.s421y3AD-220Bqs0.47w@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam05 X-HE-Tag: 1645593879-176761 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A new flag SLAB_LOCAL_NONSENSITIVE is added to designate that a slab cache can be used for local non-sensitive allocations. For such caches, a per-process child cache will be created when a process tries to make an allocation from that cache for the first time, similar to the per-memcg child caches that used to exist before the object based memcg charging mechanism. (A lot of the infrastructure for handling these child caches is derived from the original per-memcg cache code). 
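As a rough illustration of the intended usage, a cache owner would simply tag the cache with the new flag; the per-process child cache is then created behind the scenes on the first allocation a process makes from it. The sketch below is not part of this patch, and struct foo, foo_cache and the helper names are made-up examples:

    /* Stand-in object type for illustration only. */
    struct foo { unsigned long data; };

    /* A cache whose objects may be mapped into the process-local ASI
     * address space. */
    static struct kmem_cache *foo_cache;

    static int __init foo_cache_init(void)
    {
            foo_cache = kmem_cache_create("foo", sizeof(struct foo), 0,
                                          SLAB_LOCAL_NONSENSITIVE, NULL);
            return foo_cache ? 0 : -ENOMEM;
    }

    static struct foo *foo_alloc(void)
    {
            /* The first such allocation from a given process creates that
             * process's child cache. */
            return kmem_cache_alloc(foo_cache, GFP_KERNEL);
    }
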
If a cache only has SLAB_LOCAL_NONSENSITIVE, then all allocations from that cache will automatically be considered locally non-sensitive. But if a cache has both SLAB_LOCAL_NONSENSITIVE and SLAB_GLOBAL_NONSENSITIVE, then each allocation must specify one of __GFP_LOCAL_NONSENSITIVE or __GFP_GLOBAL_NONSENSITIVE. Note that the first locally non-sensitive allocation that a process makes from a given slab cache must occur from a sleepable context. If that is not the case, then a new kmem_cache_precreate_local* API must be called from a sleepable context before the first allocation. Signed-off-by: Junaid Shahid --- arch/x86/mm/asi.c | 5 + include/linux/mm_types.h | 4 + include/linux/sched/mm.h | 12 ++ include/linux/slab.h | 38 +++- include/linux/slab_def.h | 4 + kernel/fork.c | 3 +- mm/slab.c | 41 ++++- mm/slab.h | 151 +++++++++++++++- mm/slab_common.c | 363 ++++++++++++++++++++++++++++++++++++++- 9 files changed, 602 insertions(+), 19 deletions(-) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index a3d96be76fa9..6b9a0f5ab391 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -4,6 +4,7 @@ #include #include #include +#include #include #include @@ -455,6 +456,8 @@ int asi_init_mm_state(struct mm_struct *mm) memset(mm->asi, 0, sizeof(mm->asi)); mm->asi_enabled = false; + RCU_INIT_POINTER(mm->local_slab_caches, NULL); + mm->local_slab_caches_array_size = 0; /* * TODO: In addition to a cgroup flag, we may also want a per-process @@ -482,6 +485,8 @@ void asi_free_mm_state(struct mm_struct *mm) if (!boot_cpu_has(X86_FEATURE_ASI) || !mm->asi_enabled) return; + free_local_slab_caches(mm); + asi_free_pgd_range(&mm->asi[0], pgd_index(ASI_LOCAL_MAP), pgd_index(ASI_LOCAL_MAP + PFN_PHYS(max_possible_pfn)) + 1); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index e6980ae31323..56511adc263e 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -517,6 +517,10 @@ struct mm_struct { struct asi asi[ASI_MAX_NUM]; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct kmem_cache * __rcu *local_slab_caches; + uint local_slab_caches_array_size; +#endif /** * @mm_users: The number of users including userspace. * diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h index aca874d33fe6..c9122d4436d4 100644 --- a/include/linux/sched/mm.h +++ b/include/linux/sched/mm.h @@ -37,9 +37,21 @@ static inline void mmgrab(struct mm_struct *mm) } extern void __mmdrop(struct mm_struct *mm); +extern void mmdrop_async(struct mm_struct *mm); static inline void mmdrop(struct mm_struct *mm) { +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* + * We really only need to do this if we are in an atomic context. + * Unfortunately, there doesn't seem to be a reliable way to detect + * atomic context across all kernel configs. So we just always do async. 
+ */ + if (rcu_access_pointer(mm->local_slab_caches)) { + mmdrop_async(mm); + return; + } +#endif /* * The implicit full barrier implied by atomic_dec_and_test() is * required by the membarrier system call before returning to diff --git a/include/linux/slab.h b/include/linux/slab.h index 7b8a3853d827..ef9c73c0d874 100644 --- a/include/linux/slab.h +++ b/include/linux/slab.h @@ -93,6 +93,8 @@ /* Avoid kmemleak tracing */ #define SLAB_NOLEAKTRACE ((slab_flags_t __force)0x00800000U) +/* 0x01000000U is used below for SLAB_LOCAL_NONSENSITIVE */ + /* Fault injection mark */ #ifdef CONFIG_FAILSLAB # define SLAB_FAILSLAB ((slab_flags_t __force)0x02000000U) @@ -121,8 +123,10 @@ #define SLAB_DEACTIVATED ((slab_flags_t __force)0x10000000U) #ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define SLAB_LOCAL_NONSENSITIVE ((slab_flags_t __force)0x01000000U) #define SLAB_GLOBAL_NONSENSITIVE ((slab_flags_t __force)0x20000000U) #else +#define SLAB_LOCAL_NONSENSITIVE 0 #define SLAB_GLOBAL_NONSENSITIVE 0 #endif @@ -377,7 +381,8 @@ static __always_inline struct kmem_cache *get_kmalloc_cache(gfp_t flags, { #ifdef CONFIG_ADDRESS_SPACE_ISOLATION - if (static_asi_enabled() && (flags & __GFP_GLOBAL_NONSENSITIVE)) + if (static_asi_enabled() && + (flags & (__GFP_GLOBAL_NONSENSITIVE | __GFP_LOCAL_NONSENSITIVE))) return nonsensitive_kmalloc_caches[kmalloc_type(flags)][index]; #endif return kmalloc_caches[kmalloc_type(flags)][index]; @@ -800,4 +805,35 @@ int slab_dead_cpu(unsigned int cpu); #define slab_dead_cpu NULL #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +struct kmem_cache *get_local_kmem_cache(struct kmem_cache *s, + struct mm_struct *mm, gfp_t flags); +void free_local_slab_caches(struct mm_struct *mm); +int kmem_cache_precreate_local(struct kmem_cache *s); +int kmem_cache_precreate_local_kmalloc(size_t size, gfp_t flags); + +#else + +static inline +struct kmem_cache *get_local_kmem_cache(struct kmem_cache *s, + struct mm_struct *mm, gfp_t flags) +{ + return NULL; +} + +static inline void free_local_slab_caches(struct mm_struct *mm) { } + +static inline int kmem_cache_precreate_local(struct kmem_cache *s) +{ + return 0; +} + +static inline int kmem_cache_precreate_local_kmalloc(size_t size, gfp_t flags) +{ + return 0; +} + +#endif + #endif /* _LINUX_SLAB_H */ diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h index 3aa5e1e73ab6..53cbc1f40031 100644 --- a/include/linux/slab_def.h +++ b/include/linux/slab_def.h @@ -81,6 +81,10 @@ struct kmem_cache { unsigned int *random_seq; #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct kmem_local_cache_info local_cache_info; +#endif + unsigned int useroffset; /* Usercopy region offset */ unsigned int usersize; /* Usercopy region size */ diff --git a/kernel/fork.c b/kernel/fork.c index 68b3aeab55ac..d7f55de00947 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -714,13 +714,14 @@ static void mmdrop_async_fn(struct work_struct *work) __mmdrop(mm); } -static void mmdrop_async(struct mm_struct *mm) +void mmdrop_async(struct mm_struct *mm) { if (unlikely(atomic_dec_and_test(&mm->mm_count))) { INIT_WORK(&mm->async_put_work, mmdrop_async_fn); schedule_work(&mm->async_put_work); } } +EXPORT_SYMBOL(mmdrop_async); static inline void free_signal_struct(struct signal_struct *sig) { diff --git a/mm/slab.c b/mm/slab.c index 5a928d95d67b..44cf6d127a4c 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -1403,6 +1403,8 @@ static void kmem_freepages(struct kmem_cache *cachep, struct page *page) /* In union with page->mapping where page allocator expects NULL */ page->slab_cache = NULL; + 
restore_page_nonsensitive_metadata(page, cachep); + if (current->reclaim_state) current->reclaim_state->reclaimed_slab += 1 << order; unaccount_slab_page(page, order, cachep); @@ -2061,11 +2063,9 @@ int __kmem_cache_create(struct kmem_cache *cachep, slab_flags_t flags) cachep->allocflags |= GFP_DMA32; if (flags & SLAB_RECLAIM_ACCOUNT) cachep->allocflags |= __GFP_RECLAIMABLE; - if (flags & SLAB_GLOBAL_NONSENSITIVE) - cachep->allocflags |= __GFP_GLOBAL_NONSENSITIVE; cachep->size = size; cachep->reciprocal_buffer_size = reciprocal_value(size); - + set_nonsensitive_cache_params(cachep); #if DEBUG /* * If we're going to use the generic kernel_map_pages() @@ -3846,8 +3846,8 @@ static int setup_kmem_cache_nodes(struct kmem_cache *cachep, gfp_t gfp) } /* Always called with the slab_mutex held */ -static int do_tune_cpucache(struct kmem_cache *cachep, int limit, - int batchcount, int shared, gfp_t gfp) +static int __do_tune_cpucache(struct kmem_cache *cachep, int limit, + int batchcount, int shared, gfp_t gfp) { struct array_cache __percpu *cpu_cache, *prev; int cpu; @@ -3892,6 +3892,29 @@ static int do_tune_cpucache(struct kmem_cache *cachep, int limit, return setup_kmem_cache_nodes(cachep, gfp); } +static int do_tune_cpucache(struct kmem_cache *cachep, int limit, + int batchcount, int shared, gfp_t gfp) +{ + int ret; + struct kmem_cache *c; + + ret = __do_tune_cpucache(cachep, limit, batchcount, shared, gfp); + + if (slab_state < FULL) + return ret; + + if ((ret < 0) || !is_root_cache(cachep)) + return ret; + + lockdep_assert_held(&slab_mutex); + for_each_child_cache(c, cachep) { + /* return value determined by the root cache only */ + __do_tune_cpucache(c, limit, batchcount, shared, gfp); + } + + return ret; +} + /* Called with slab_mutex held always */ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp) { @@ -3904,6 +3927,14 @@ static int enable_cpucache(struct kmem_cache *cachep, gfp_t gfp) if (err) goto end; + if (!is_root_cache(cachep)) { + struct kmem_cache *root = get_root_cache(cachep); + + limit = root->limit; + shared = root->shared; + batchcount = root->batchcount; + } + /* * The head array serves three purposes: * - create a LIFO ordering, i.e. return objects that are cache-warm diff --git a/mm/slab.h b/mm/slab.h index f190f4fc0286..b9e11038be27 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -5,6 +5,45 @@ * Internal slab definitions */ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +struct kmem_local_cache_info { + /* Valid for child caches. NULL for the root cache itself. */ + struct kmem_cache *root_cache; + union { + /* For root caches */ + struct { + int cache_id; + struct list_head __root_caches_node; + struct list_head children; + /* + * For SLAB_LOCAL_NONSENSITIVE root caches, this points + * to the cache to be used for local non-sensitive + * allocations from processes without ASI enabled. + * + * For root caches with only SLAB_LOCAL_NONSENSITIVE, + * the root cache itself is used as the sensitive cache. + * + * For root caches with both SLAB_LOCAL_NONSENSITIVE and + * SLAB_GLOBAL_NONSENSITIVE, the sensitive cache will be + * a child cache allocated on-demand. + * + * For non-sensiitve kmalloc caches, the sensitive cache + * will just be the corresponding regular kmalloc cache. 
+ */ + struct kmem_cache *sensitive_cache; + }; + + /* For child (process-local) caches */ + struct { + struct mm_struct *mm; + struct list_head children_node; + }; + }; +}; + +#endif + #ifdef CONFIG_SLOB /* * Common fields provided in kmem_cache by all slab allocators @@ -128,8 +167,7 @@ static inline slab_flags_t kmem_cache_flags(unsigned int object_size, } #endif -/* This will also include SLAB_LOCAL_NONSENSITIVE in a later patch. */ -#define SLAB_NONSENSITIVE SLAB_GLOBAL_NONSENSITIVE +#define SLAB_NONSENSITIVE (SLAB_GLOBAL_NONSENSITIVE | SLAB_LOCAL_NONSENSITIVE) /* Legal flag mask for kmem_cache_create(), for various configurations */ #define SLAB_CORE_FLAGS (SLAB_HWCACHE_ALIGN | SLAB_CACHE_DMA | \ @@ -251,6 +289,99 @@ static inline bool kmem_cache_debug_flags(struct kmem_cache *s, slab_flags_t fla return false; } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +/* List of all root caches. */ +extern struct list_head slab_root_caches; +#define root_caches_node local_cache_info.__root_caches_node + +/* + * Iterate over all child caches of the given root cache. The caller must hold + * slab_mutex. + */ +#define for_each_child_cache(iter, root) \ + list_for_each_entry(iter, &(root)->local_cache_info.children, \ + local_cache_info.children_node) + +static inline bool is_root_cache(struct kmem_cache *s) +{ + return !s->local_cache_info.root_cache; +} + +static inline bool slab_equal_or_root(struct kmem_cache *s, + struct kmem_cache *p) +{ + return p == s || p == s->local_cache_info.root_cache; +} + +/* + * We use suffixes to the name in child caches because we can't have caches + * created in the system with the same name. But when we print them + * locally, better refer to them with the base name + */ +static inline const char *cache_name(struct kmem_cache *s) +{ + if (!is_root_cache(s)) + s = s->local_cache_info.root_cache; + return s->name; +} + +static inline struct kmem_cache *get_root_cache(struct kmem_cache *s) +{ + if (is_root_cache(s)) + return s; + return s->local_cache_info.root_cache; +} + +static inline +void restore_page_nonsensitive_metadata(struct page *page, + struct kmem_cache *cachep) +{ + if (PageLocalNonSensitive(page)) { + VM_BUG_ON(is_root_cache(cachep)); + page->asi_mm = cachep->local_cache_info.mm; + } +} + +void set_nonsensitive_cache_params(struct kmem_cache *s); + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +#define slab_root_caches slab_caches +#define root_caches_node list + +#define for_each_child_cache(iter, root) \ + for ((void)(iter), (void)(root); 0; ) + +static inline bool is_root_cache(struct kmem_cache *s) +{ + return true; +} + +static inline bool slab_equal_or_root(struct kmem_cache *s, + struct kmem_cache *p) +{ + return s == p; +} + +static inline const char *cache_name(struct kmem_cache *s) +{ + return s->name; +} + +static inline struct kmem_cache *get_root_cache(struct kmem_cache *s) +{ + return s; +} + +static inline void restore_page_nonsensitive_metadata(struct page *page, + struct kmem_cache *cachep) +{ } + +static inline void set_nonsensitive_cache_params(struct kmem_cache *s) { } + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + #ifdef CONFIG_MEMCG_KMEM int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s, gfp_t gfp, bool new_page); @@ -449,11 +580,12 @@ static inline struct kmem_cache *cache_from_obj(struct kmem_cache *s, void *x) struct kmem_cache *cachep; if (!IS_ENABLED(CONFIG_SLAB_FREELIST_HARDENED) && + !(s->flags & SLAB_LOCAL_NONSENSITIVE) && !kmem_cache_debug_flags(s, SLAB_CONSISTENCY_CHECKS)) return s; cachep = 
virt_to_cache(x); - if (WARN(cachep && cachep != s, + if (WARN(cachep && !slab_equal_or_root(cachep, s), "%s: Wrong slab cache. %s but object is from %s\n", __func__, s->name, cachep->name)) print_tracking(cachep, x); @@ -501,11 +633,24 @@ static inline struct kmem_cache *slab_pre_alloc_hook(struct kmem_cache *s, if (static_asi_enabled()) { VM_BUG_ON(!(s->flags & SLAB_GLOBAL_NONSENSITIVE) && (flags & __GFP_GLOBAL_NONSENSITIVE)); + VM_BUG_ON(!(s->flags & SLAB_LOCAL_NONSENSITIVE) && + (flags & __GFP_LOCAL_NONSENSITIVE)); + VM_BUG_ON((s->flags & SLAB_NONSENSITIVE) == SLAB_NONSENSITIVE && + !(flags & (__GFP_LOCAL_NONSENSITIVE | + __GFP_GLOBAL_NONSENSITIVE))); } if (should_failslab(s, flags)) return NULL; + if (static_asi_enabled() && + (!(flags & __GFP_GLOBAL_NONSENSITIVE) && + (s->flags & SLAB_LOCAL_NONSENSITIVE))) { + s = get_local_kmem_cache(s, current->mm, flags); + if (!s) + return NULL; + } + if (!memcg_slab_pre_alloc_hook(s, objcgp, size, flags)) return NULL; diff --git a/mm/slab_common.c b/mm/slab_common.c index 72dee2494bf8..b486b72d6344 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -42,6 +42,13 @@ static void slab_caches_to_rcu_destroy_workfn(struct work_struct *work); static DECLARE_WORK(slab_caches_to_rcu_destroy_work, slab_caches_to_rcu_destroy_workfn); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static DEFINE_IDA(nonsensitive_cache_ids); +static uint max_num_local_slab_caches = 32; + +#endif + /* * Set of flags that will prevent slab merging */ @@ -131,6 +138,69 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr, return i; } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +LIST_HEAD(slab_root_caches); + +static void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) +{ + if (root) { + s->local_cache_info.root_cache = root; + list_add(&s->local_cache_info.children_node, + &root->local_cache_info.children); + } else { + s->local_cache_info.cache_id = -1; + INIT_LIST_HEAD(&s->local_cache_info.children); + list_add(&s->root_caches_node, &slab_root_caches); + } +} + +static void cleanup_local_cache_info(struct kmem_cache *s) +{ + if (is_root_cache(s)) { + VM_BUG_ON(!list_empty(&s->local_cache_info.children)); + + list_del(&s->root_caches_node); + if (s->local_cache_info.cache_id >= 0) + ida_free(&nonsensitive_cache_ids, + s->local_cache_info.cache_id); + } else { + struct mm_struct *mm = s->local_cache_info.mm; + struct kmem_cache *root_cache = s->local_cache_info.root_cache; + int id = root_cache->local_cache_info.cache_id; + + list_del(&s->local_cache_info.children_node); + if (mm) { + struct kmem_cache **local_caches = + rcu_dereference_protected(mm->local_slab_caches, + lockdep_is_held(&slab_mutex)); + local_caches[id] = NULL; + } + } +} + +void set_nonsensitive_cache_params(struct kmem_cache *s) +{ + if (s->flags & SLAB_GLOBAL_NONSENSITIVE) { + s->allocflags |= __GFP_GLOBAL_NONSENSITIVE; + VM_BUG_ON(!is_root_cache(s)); + } else if (s->flags & SLAB_LOCAL_NONSENSITIVE) { + if (is_root_cache(s)) + s->local_cache_info.sensitive_cache = s; + else + s->allocflags |= __GFP_LOCAL_NONSENSITIVE; + } +} + +#else + +static inline +void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) { } + +static inline void cleanup_local_cache_info(struct kmem_cache *s) { } + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + /* * Figure out what the alignment of the objects will be given a set of * flags, a user specified alignment and the size of the objects. 
@@ -168,6 +238,9 @@ int slab_unmergeable(struct kmem_cache *s) if (slab_nomerge || (s->flags & SLAB_NEVER_MERGE)) return 1; + if (!is_root_cache(s)) + return 1; + if (s->ctor) return 1; @@ -202,7 +275,7 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align, if (flags & SLAB_NEVER_MERGE) return NULL; - list_for_each_entry_reverse(s, &slab_caches, list) { + list_for_each_entry_reverse(s, &slab_root_caches, root_caches_node) { if (slab_unmergeable(s)) continue; @@ -254,6 +327,8 @@ static struct kmem_cache *create_cache(const char *name, s->useroffset = useroffset; s->usersize = usersize; + init_local_cache_info(s, root_cache); + err = __kmem_cache_create(s, flags); if (err) goto out_free_cache; @@ -266,6 +341,7 @@ static struct kmem_cache *create_cache(const char *name, return s; out_free_cache: + cleanup_local_cache_info(s); kmem_cache_free(kmem_cache, s); goto out; } @@ -459,6 +535,7 @@ static int shutdown_cache(struct kmem_cache *s) return -EBUSY; list_del(&s->list); + cleanup_local_cache_info(s); if (s->flags & SLAB_TYPESAFE_BY_RCU) { #ifdef SLAB_SUPPORTS_SYSFS @@ -480,6 +557,36 @@ static int shutdown_cache(struct kmem_cache *s) return 0; } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static int shutdown_child_caches(struct kmem_cache *s) +{ + struct kmem_cache *c, *c2; + int r; + + VM_BUG_ON(!is_root_cache(s)); + + lockdep_assert_held(&slab_mutex); + + list_for_each_entry_safe(c, c2, &s->local_cache_info.children, + local_cache_info.children_node) { + r = shutdown_cache(c); + if (r) + return r; + } + + return 0; +} + +#else + +static inline int shutdown_child_caches(struct kmem_cache *s) +{ + return 0; +} + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + void slab_kmem_cache_release(struct kmem_cache *s) { __kmem_cache_release(s); @@ -501,7 +608,10 @@ void kmem_cache_destroy(struct kmem_cache *s) if (s->refcount) goto out_unlock; - err = shutdown_cache(s); + err = shutdown_child_caches(s); + if (!err) + err = shutdown_cache(s); + if (err) { pr_err("%s %s: Slab cache still has objects\n", __func__, s->name); @@ -651,6 +761,8 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name, s->useroffset = useroffset; s->usersize = usersize; + init_local_cache_info(s, NULL); + err = __kmem_cache_create(s, flags); if (err) @@ -897,6 +1009,13 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type, slab_flags_t flags) */ if (IS_ENABLED(CONFIG_MEMCG_KMEM) && (type == KMALLOC_NORMAL)) caches[type][idx]->refcount = -1; + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + + if (flags & SLAB_NONSENSITIVE) + caches[type][idx]->local_cache_info.sensitive_cache = + kmalloc_caches[type][idx]; +#endif } /* @@ -1086,12 +1205,12 @@ static void print_slabinfo_header(struct seq_file *m) void *slab_start(struct seq_file *m, loff_t *pos) { mutex_lock(&slab_mutex); - return seq_list_start(&slab_caches, *pos); + return seq_list_start(&slab_root_caches, *pos); } void *slab_next(struct seq_file *m, void *p, loff_t *pos) { - return seq_list_next(p, &slab_caches, pos); + return seq_list_next(p, &slab_root_caches, pos); } void slab_stop(struct seq_file *m, void *p) @@ -1099,6 +1218,24 @@ void slab_stop(struct seq_file *m, void *p) mutex_unlock(&slab_mutex); } +static void +accumulate_children_slabinfo(struct kmem_cache *s, struct slabinfo *info) +{ + struct kmem_cache *c; + struct slabinfo sinfo; + + for_each_child_cache(c, s) { + memset(&sinfo, 0, sizeof(sinfo)); + get_slabinfo(c, &sinfo); + + info->active_slabs += sinfo.active_slabs; + info->num_slabs += sinfo.num_slabs; + info->shared_avail += 
sinfo.shared_avail; + info->active_objs += sinfo.active_objs; + info->num_objs += sinfo.num_objs; + } +} + static void cache_show(struct kmem_cache *s, struct seq_file *m) { struct slabinfo sinfo; @@ -1106,8 +1243,10 @@ static void cache_show(struct kmem_cache *s, struct seq_file *m) memset(&sinfo, 0, sizeof(sinfo)); get_slabinfo(s, &sinfo); + accumulate_children_slabinfo(s, &sinfo); + seq_printf(m, "%-17s %6lu %6lu %6u %4u %4d", - s->name, sinfo.active_objs, sinfo.num_objs, s->size, + cache_name(s), sinfo.active_objs, sinfo.num_objs, s->size, sinfo.objects_per_slab, (1 << sinfo.cache_order)); seq_printf(m, " : tunables %4u %4u %4u", @@ -1120,9 +1259,9 @@ static void cache_show(struct kmem_cache *s, struct seq_file *m) static int slab_show(struct seq_file *m, void *p) { - struct kmem_cache *s = list_entry(p, struct kmem_cache, list); + struct kmem_cache *s = list_entry(p, struct kmem_cache, root_caches_node); - if (p == slab_caches.next) + if (p == slab_root_caches.next) print_slabinfo_header(m); cache_show(s, m); return 0; @@ -1148,14 +1287,14 @@ void dump_unreclaimable_slab(void) pr_info("Unreclaimable slab info:\n"); pr_info("Name Used Total\n"); - list_for_each_entry(s, &slab_caches, list) { + list_for_each_entry(s, &slab_root_caches, root_caches_node) { if (s->flags & SLAB_RECLAIM_ACCOUNT) continue; get_slabinfo(s, &sinfo); if (sinfo.num_objs > 0) - pr_info("%-17s %10luKB %10luKB\n", s->name, + pr_info("%-17s %10luKB %10luKB\n", cache_name(s), (sinfo.active_objs * s->size) / 1024, (sinfo.num_objs * s->size) / 1024); } @@ -1361,3 +1500,209 @@ int should_failslab(struct kmem_cache *s, gfp_t gfpflags) return 0; } ALLOW_ERROR_INJECTION(should_failslab, ERRNO); + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static int resize_local_slab_caches_array(struct mm_struct *mm, gfp_t flags) +{ + struct kmem_cache **new_array; + struct kmem_cache **old_array = + rcu_dereference_protected(mm->local_slab_caches, + lockdep_is_held(&slab_mutex)); + + new_array = kcalloc(max_num_local_slab_caches, + sizeof(struct kmem_cache *), flags); + if (!new_array) + return -ENOMEM; + + if (old_array) + memcpy(new_array, old_array, mm->local_slab_caches_array_size * + sizeof(struct kmem_cache *)); + + rcu_assign_pointer(mm->local_slab_caches, new_array); + smp_store_release(&mm->local_slab_caches_array_size, + max_num_local_slab_caches); + + if (old_array) { + synchronize_rcu(); + kfree(old_array); + } + + return 0; +} + +static int get_or_alloc_cache_id(struct kmem_cache *root_cache, gfp_t flags) +{ + int id = root_cache->local_cache_info.cache_id; + + if (id >= 0) + return id; + + id = ida_alloc_max(&nonsensitive_cache_ids, + max_num_local_slab_caches - 1, flags); + if (id == -ENOSPC) { + max_num_local_slab_caches *= 2; + id = ida_alloc_max(&nonsensitive_cache_ids, + max_num_local_slab_caches - 1, flags); + } + + if (id >= 0) + root_cache->local_cache_info.cache_id = id; + + return id; +} + +static struct kmem_cache *create_local_kmem_cache(struct kmem_cache *root_cache, + struct mm_struct *mm, + gfp_t flags) +{ + char *name; + struct kmem_cache *s = NULL; + slab_flags_t slab_flags = root_cache->flags & CACHE_CREATE_MASK; + struct kmem_cache **cache_ptr; + + flags &= GFP_RECLAIM_MASK; + + mutex_lock(&slab_mutex); + + if (mm_asi_enabled(mm)) { + struct kmem_cache **caches; + int id = get_or_alloc_cache_id(root_cache, flags); + + if (id < 0) + goto out; + + flags |= __GFP_ACCOUNT; + + if (mm->local_slab_caches_array_size <= id && + resize_local_slab_caches_array(mm, flags) < 0) + goto out; + + caches = 
rcu_dereference_protected(mm->local_slab_caches, + lockdep_is_held(&slab_mutex)); + cache_ptr = &caches[id]; + if (*cache_ptr) { + s = *cache_ptr; + goto out; + } + + slab_flags &= ~SLAB_GLOBAL_NONSENSITIVE; + name = kasprintf(flags, "%s(%d:%s)", root_cache->name, + task_pid_nr(mm->owner), mm->owner->comm); + if (!name) + goto out; + + } else { + cache_ptr = &root_cache->local_cache_info.sensitive_cache; + if (*cache_ptr) { + s = *cache_ptr; + goto out; + } + + slab_flags &= ~SLAB_NONSENSITIVE; + name = kasprintf(flags, "%s(sensitive)", root_cache->name); + if (!name) + goto out; + } + + s = create_cache(name, + root_cache->object_size, + root_cache->align, + slab_flags, + root_cache->useroffset, root_cache->usersize, + root_cache->ctor, root_cache); + if (IS_ERR(s)) { + pr_info("Unable to create child kmem cache %s. Err %ld", + name, PTR_ERR(s)); + kfree(name); + s = NULL; + goto out; + } + + if (mm_asi_enabled(mm)) + s->local_cache_info.mm = mm; + + smp_store_release(cache_ptr, s); +out: + mutex_unlock(&slab_mutex); + + return s; +} + +struct kmem_cache *get_local_kmem_cache(struct kmem_cache *s, + struct mm_struct *mm, gfp_t flags) +{ + struct kmem_cache *local_cache = NULL; + + if (!(s->flags & SLAB_LOCAL_NONSENSITIVE) || !is_root_cache(s)) + return s; + + if (mm_asi_enabled(mm)) { + struct kmem_cache **caches; + int id = READ_ONCE(s->local_cache_info.cache_id); + uint array_size = smp_load_acquire( + &mm->local_slab_caches_array_size); + + if (id >= 0 && array_size > id) { + rcu_read_lock(); + caches = rcu_dereference(mm->local_slab_caches); + local_cache = smp_load_acquire(&caches[id]); + rcu_read_unlock(); + } + } else { + local_cache = + smp_load_acquire(&s->local_cache_info.sensitive_cache); + } + + if (!local_cache) + local_cache = create_local_kmem_cache(s, mm, flags); + + return local_cache; +} + +void free_local_slab_caches(struct mm_struct *mm) +{ + uint i; + struct kmem_cache **caches = + rcu_dereference_protected(mm->local_slab_caches, + atomic_read(&mm->mm_count) == 0); + + if (!caches) + return; + + cpus_read_lock(); + mutex_lock(&slab_mutex); + + for (i = 0; i < mm->local_slab_caches_array_size; i++) + if (caches[i]) + WARN_ON(shutdown_cache(caches[i])); + + mutex_unlock(&slab_mutex); + cpus_read_unlock(); + + kfree(caches); +} + +int kmem_cache_precreate_local(struct kmem_cache *s) +{ + VM_BUG_ON(!is_root_cache(s)); + VM_BUG_ON(!in_task()); + might_sleep(); + + return get_local_kmem_cache(s, current->mm, GFP_KERNEL) ? 
0 : -ENOMEM; +} +EXPORT_SYMBOL(kmem_cache_precreate_local); + +int kmem_cache_precreate_local_kmalloc(size_t size, gfp_t flags) +{ + struct kmem_cache *s = kmalloc_slab(size, + flags | __GFP_LOCAL_NONSENSITIVE); + + if (ZERO_OR_NULL_PTR(s)) + return 0; + + return kmem_cache_precreate_local(s); +} +EXPORT_SYMBOL(kmem_cache_precreate_local_kmalloc); + +#endif From patchwork Wed Feb 23 05:22:01 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756385 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09971C433F5 for ; Wed, 23 Feb 2022 05:24:44 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 33F228D001B; Wed, 23 Feb 2022 00:24:43 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2C62B8D0001; Wed, 23 Feb 2022 00:24:43 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 118978D001B; Wed, 23 Feb 2022 00:24:43 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0233.hostedemail.com [216.40.44.233]) by kanga.kvack.org (Postfix) with ESMTP id 0367C8D0001 for ; Wed, 23 Feb 2022 00:24:43 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id BCCBC1816393E for ; Wed, 23 Feb 2022 05:24:42 +0000 (UTC) X-FDA: 79172904804.25.7C6D754 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf30.hostedemail.com (Postfix) with ESMTP id 4347980003 for ; Wed, 23 Feb 2022 05:24:42 +0000 (UTC) Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-2d2d45c0df7so163208827b3.1 for ; Tue, 22 Feb 2022 21:24:42 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=qiOY3I7MTzDOyydGuOsoL3kibVHTWqt/T8NonRDSdjI=; b=gHD/ZzZ2afDsPzQ43uzTfHDeNudIA9BOq9xrr9L14+jP+dH61iVLZ9hzKNk6oCuBOI frJ4Y/pMW9lHMFuHIDT5LntquphVA/hdUftBkeKzRimB/OZsUvqrqt+OTldNL/5M2dvF +PIjZDytqBI9qNUICzwPtZ9WZreG3la5anScSUKXP0fumh99WG+FQaCeyfbwMohmDN9Z TqQY1Ne53dTgZlticBLJ/nLFoMZJ6npZslQ1FhKLNnIG+gBpH+SHHrgjBmREZgI8pcKn GtFQ6xyENvmtRWa1WOq2159X/570g6HXWtqzHRNADNoxBQZmRvsc5EkR5xI3zImo/8Xk CC5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=qiOY3I7MTzDOyydGuOsoL3kibVHTWqt/T8NonRDSdjI=; b=ZnGuNsckVZpSGazdP44hft8Bt99lJeoqPnrNu3TO54e3alXfAO7vsJ2UydK3xrfjXV aU43+GwykJ1pg8EEaz1/MUy01dJR7MgKozoxAHO//nT4AcsmDC2mSuRev8F1IPB8fBtQ 5Id72OuisWsNbaByc1Hd379ykzbOkjpgfNkYXaxN7TfXvp9wW2YfZ+l/HMjOxBSZizlA j5WKj+XnkPLeYaLahsFi2//YqwswXgiEYW8MWwkhCYJwZFW4+n8FKg6/Ti9LPZTxJHnK xvVxPHzqjkyU1ioS+vtYDvTgFtS+LvcgSfZ8YeXBCQuYc9y2D7LmJqVQi9AF4WFdwr5n cVyQ== X-Gm-Message-State: AOAM530OYd/+BMCRtYgYjYcPHGrb5P37pQnKUpFq7SRoyFSoQrGtH9IV OGX84TYe8MfKmDUgMuEomwSWUU5AWre0 X-Google-Smtp-Source: ABdhPJz0RUH+JeSNvxcFS2t+PaXJsk2Yx25dJd/bxX+okavoC9VlrjteMlukvkj4zkv9eNKI+iSQhOk6jOMd X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:bad2:0:b0:620:fe28:ff53 with SMTP id a18-20020a25bad2000000b00620fe28ff53mr26732357ybk.340.1645593881600; 
Tue, 22 Feb 2022 21:24:41 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:01 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-26-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 25/47] mm: asi: Avoid warning from NMI userspace accesses in ASI context From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="gHD/ZzZ2"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf30.hostedemail.com: domain of 3GcUVYgcKCBI1C5s0vAy66y3w.u64305CF-442Dsu2.69y@flex--junaids.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3GcUVYgcKCBI1C5s0vAy66y3w.u64305CF-442Dsu2.69y@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 4347980003 X-Stat-Signature: w7gpp8pz3q1y9eb47yoaoeba77ffp54k X-HE-Tag: 1645593882-559695 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000229, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: nmi_uaccess_okay() emits a warning if current CR3 != mm->pgd. Limit the warning to only when ASI is not active. Signed-off-by: Junaid Shahid --- arch/x86/mm/tlb.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 25bee959d1d3..628f1cd904ac 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1292,7 +1292,8 @@ bool nmi_uaccess_okay(void) if (loaded_mm != current_mm) return false; - VM_WARN_ON_ONCE(current_mm->pgd != __va(read_cr3_pa())); + VM_WARN_ON_ONCE(current_mm->pgd != __va(read_cr3_pa()) && + !is_asi_active()); return true; } From patchwork Wed Feb 23 05:22:02 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756386 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43605C433EF for ; Wed, 23 Feb 2022 05:24:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5B15F8D001C; Wed, 23 Feb 2022 00:24:45 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 537228D0001; Wed, 23 Feb 2022 00:24:45 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3896A8D001C; Wed, 23 Feb 2022 00:24:45 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0128.hostedemail.com [216.40.44.128]) by kanga.kvack.org (Postfix) with ESMTP id 267798D0001 for ; Wed, 23 Feb 2022 00:24:45 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id DC2059F842 for ; Wed, 23 Feb 2022 05:24:44 +0000 (UTC) X-FDA: 79172904888.25.4A9D2B1 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf07.hostedemail.com (Postfix) with ESMTP id 76B8540003 for ; Wed, 23 Feb 2022 05:24:44 +0000 (UTC) Received: by 
mail-yb1-f201.google.com with SMTP id a12-20020a056902056c00b0061dc0f2a94aso26532437ybt.6 for ; Tue, 22 Feb 2022 21:24:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=mHknd275MnGb/PaPBY3xJHVlcaSr+gb2g5yi23zkCMQ=; b=L60sUeBO3icqMJb7NMSmlVXjmIABeVNLCCtZzlOUINB+XKCgvwoL14gcA1/NI1wk1X /ltNckj3/3p7zfm1t1rOui84yRyyoV+XIZUZcTPfV3dYES6ZKCv3iaEJAriiM57Y+pJ2 8u/ZPuM69uvOpnXWpYYvEAAwrxzB6ihF2OY0/HqTUANIyhOPp6jmVN945B9gOsa+htR7 ghj+eGPzbx50ZpS/8LrSXhCGcEICsrhvK12TqRAuRKoUTOubcXhnkqoaG145HMstky5e ZFHHetF4BUFaO5ULQYQ3U7cD5n4sgmCCalGk7gwPkRzwbBni9d6kLYQ28QmTUA8pH14D Kjrg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=mHknd275MnGb/PaPBY3xJHVlcaSr+gb2g5yi23zkCMQ=; b=Jk3xZcQAO1T8/iJjkQiMCHN/9OUvZk2Hx1jXdVNOkhwkAQNo/TBdXYhKbgLoaxFl8/ NohfgPnL8hxy0uCXn/ipsFl/VdiiD+c+lAmQJEWWRmyNNIBTMJ9wc0REvrfmp3WZ6QxG fN7zBhtqJ67xwvJVG/JSBow5gg5Z+zdoIeZ9ZfvwxlPm+ijsJrJZn6aklxonlDknFVgs GX+8FgGmWaMi0GQRgKUWrE2x1BzWD4DhX4G7vBe0tRw5CHNDJvy9IYMeF4pahnudhb38 7Qw8I8WAhR3ZtzaQHNWcnsqdzIH10kTW2mClg0CwJ/4iYEW4bMFhyxv25R7GMjy72jNi zguw== X-Gm-Message-State: AOAM531hAa0kpbVsZvJREZl02ML7nkKlCev6tjWx/W9pdRIeIwA/WAWu /3AvEt38UqoamfQYAhgZrjjVVjkNhEb1 X-Google-Smtp-Source: ABdhPJw5GHEn07Q8rEslG5JDYumsMKlC/t9Gjkx79Rz/pi8IkAfQS2IAOFNhhCDkd7GvnzpI+G+qB+zNYrIt X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:8414:0:b0:2d0:fdd8:f7e2 with SMTP id u20-20020a818414000000b002d0fdd8f7e2mr27082928ywf.156.1645593883815; Tue, 22 Feb 2022 21:24:43 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:02 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-27-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 26/47] mm: asi: Use separate PCIDs for restricted address spaces From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 76B8540003 X-Stat-Signature: bat5gyptksat59zhppaz93p6g97yn9rx Authentication-Results: imf07.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=L60sUeBO; spf=pass (imf07.hostedemail.com: domain of 3G8UVYgcKCBQ3E7u2xC08805y.w86527EH-664Fuw4.8B0@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3G8UVYgcKCBQ3E7u2xC08805y.w86527EH-664Fuw4.8B0@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-HE-Tag: 1645593884-615815 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Each restricted address space is assigned a separate PCID. Since currently only one ASI instance per-class exists for a given process, the PCID is just derived from the class index. This commit only sets the appropriate PCID when switching CR3, but does not set the NOFLUSH bit. That will be done by later patches. 
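As a rough userspace model of the PCID layout described above (an illustrative sketch only, not part of the patch; the shift width and the +1 offset are assumptions standing in for the kernel's CR3_AVAIL_PCID_BITS and kern_pcid() derivation):

#include <stdint.h>
#include <stdio.h>

/* Assumed widths for the example; the kernel computes these from
 * X86_CR3_PCID_BITS, PTI_CONSUMED_PCID_BITS and ASI_MAX_NUM_ORDER. */
#define EXAMPLE_AVAIL_PCID_BITS	10
#define EXAMPLE_ASI_PCID_SHIFT	EXAMPLE_AVAIL_PCID_BITS

/* Stand-in for kern_pcid(): dynamic ASIDs map to non-zero PCIDs. */
static uint16_t example_kern_pcid(uint16_t asid)
{
	return asid + 1;
}

/* The ASI class index occupies the PCID bits above the dynamic ASID, so each
 * (ASID, ASI class) pair ends up with its own PCID. */
static uint16_t example_asi_pcid(uint16_t pcid_index, uint16_t asid)
{
	return example_kern_pcid(asid) |
	       (uint16_t)(pcid_index << EXAMPLE_ASI_PCID_SHIFT);
}

int main(void)
{
	printf("asid 2, unrestricted -> pcid %#x\n", example_kern_pcid(2));
	printf("asid 2, ASI class 1  -> pcid %#x\n", example_asi_pcid(1, 2));
	return 0;
}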
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 3 ++- arch/x86/include/asm/tlbflush.h | 3 +++ arch/x86/mm/asi.c | 6 +++-- arch/x86/mm/tlb.c | 45 ++++++++++++++++++++++++++++++--- 4 files changed, 50 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 062ccac07fd9..aaa0d0bdbf59 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -40,7 +40,8 @@ struct asi { pgd_t *pgd; struct asi_class *class; struct mm_struct *mm; - int64_t asi_ref_count; + u16 pcid_index; + int64_t asi_ref_count; }; DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 3c43ad46c14a..f9ec5e67e361 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -260,6 +260,9 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch, extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); unsigned long build_cr3(pgd_t *pgd, u16 asid); +unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush); + +u16 asi_pcid(struct asi *asi, u16 asid); #endif /* !MODULE */ diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 6b9a0f5ab391..dbfea3dc4bb1 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -335,6 +335,7 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) asi->class = &asi_class[asi_index]; asi->mm = mm; + asi->pcid_index = asi_index; if (asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) { uint i; @@ -386,6 +387,7 @@ EXPORT_SYMBOL_GPL(asi_destroy); void __asi_enter(void) { u64 asi_cr3; + u16 pcid; struct asi *target = this_cpu_read(asi_cpu_state.target_asi); VM_BUG_ON(preemptible()); @@ -399,8 +401,8 @@ void __asi_enter(void) this_cpu_write(asi_cpu_state.curr_asi, target); - asi_cr3 = build_cr3(target->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + pcid = asi_pcid(target, this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + asi_cr3 = build_cr3_pcid(target->pgd, pcid, false); write_cr3(asi_cr3); if (target->class->ops.post_asi_enter) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 628f1cd904ac..312b9c185a55 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -97,7 +97,12 @@ # define PTI_CONSUMED_PCID_BITS 0 #endif -#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS) +#define ASI_CONSUMED_PCID_BITS ASI_MAX_NUM_ORDER +#define ASI_PCID_BITS_SHIFT CR3_AVAIL_PCID_BITS +#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS - \ + ASI_CONSUMED_PCID_BITS) + +static_assert(TLB_NR_DYN_ASIDS < BIT(CR3_AVAIL_PCID_BITS)); /* * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. 
-1 below to account @@ -154,6 +159,34 @@ static inline u16 user_pcid(u16 asid) return ret; } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +u16 asi_pcid(struct asi *asi, u16 asid) +{ + return kern_pcid(asid) | (asi->pcid_index << ASI_PCID_BITS_SHIFT); +} + +#else /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +u16 asi_pcid(struct asi *asi, u16 asid) +{ + return kern_pcid(asid); +} + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + +unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush) +{ + u64 noflush_bit = 0; + + if (!static_cpu_has(X86_FEATURE_PCID)) + pcid = 0; + else if (noflush) + noflush_bit = CR3_NOFLUSH; + + return __sme_pa(pgd) | pcid | noflush_bit; +} + inline unsigned long build_cr3(pgd_t *pgd, u16 asid) { if (static_cpu_has(X86_FEATURE_PCID)) { @@ -1078,13 +1111,17 @@ unsigned long __get_current_cr3_fast(void) pgd_t *pgd; u16 asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); struct asi *asi = asi_get_current(); + u16 pcid; - if (asi) + if (asi) { pgd = asi_pgd(asi); - else + pcid = asi_pcid(asi, asid); + } else { pgd = this_cpu_read(cpu_tlbstate.loaded_mm)->pgd; + pcid = kern_pcid(asid); + } - cr3 = build_cr3(pgd, asid); + cr3 = build_cr3_pcid(pgd, pcid, false); /* For now, be very restrictive about when this can be called. */ VM_WARN_ON(in_nmi() || preemptible()); From patchwork Wed Feb 23 05:22:03 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756387 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2B7A3C4332F for ; Wed, 23 Feb 2022 05:24:48 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7D45F8D001D; Wed, 23 Feb 2022 00:24:47 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 782BC8D0001; Wed, 23 Feb 2022 00:24:47 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5AC708D001D; Wed, 23 Feb 2022 00:24:47 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.26]) by kanga.kvack.org (Postfix) with ESMTP id 4C6368D0001 for ; Wed, 23 Feb 2022 00:24:47 -0500 (EST) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 1DC8223091 for ; Wed, 23 Feb 2022 05:24:47 +0000 (UTC) X-FDA: 79172905014.01.028A45D Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf30.hostedemail.com (Postfix) with ESMTP id 9249080006 for ; Wed, 23 Feb 2022 05:24:46 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d6bca75aa2so141252107b3.18 for ; Tue, 22 Feb 2022 21:24:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=uZNDncAP+HAcrLlxbbBV0jFwf5VHCLxBgZuEl9jtIsw=; b=r7+qr6EGJVpdhQWG8PE8sNYbh0H6kTLpdH6ctOr0wPyBw/Z56OW3u31LYarb4QOnRX rF4lg3l0IVEqPlgJWaO9d3dWxAEdyJe1HcTUm1KWff1m3xuyDOy9VVB9sTaWZhhffwst nktPdcdn5pMY1JLUsndyvZTdi+apZc+M8CANZ+FleFcjiBh1KMmN3V5cV4UkdqGp6dj7 eto63Nt1iYZsRhYWTV9r7EgsApuw+VuQRiPkO/uEZytDeujtWN47qFOJnrbnHuYu8/o4 722Zq0PbXoAaWsxp0efHDuBM6HhxyWCGqP/XygpW0eZmFRIaDvse+6/DxrUVYKJYRCUo G3aA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; 
h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=uZNDncAP+HAcrLlxbbBV0jFwf5VHCLxBgZuEl9jtIsw=; b=7OKj8U3HsgIqcZL8QNHqZ/r0cUhAhbUX4aNXRm+SKJ8oAPgA246jKsmDFdfCB682/a i+znKioJkJf11S6I+Pd/0ObT/qd8SjezgprLNsUnoAjgtVBjTJtGK+Er5PwOsQnhaqjK dRhiB1Z36xY+QxVaugkG0FOFOQFrdo2Dak4/8MTMCLuahsu5allyvVdaZjh2MMck2ReO 7tdZJotPMYNrlrZf6KHkNFvgSZCBCsYvVcg7kw+CcBRYhy6r+obVZwkOFNXV+J9GwVTQ F8BQNeZJYLlDnaTJFbijwDRuLZamVbF7sMolP48xjlUtyHhC84Ke6UjZY1Xjs+5FEqRG bLFA== X-Gm-Message-State: AOAM530q+hVSXb60y7lVuOfQZSetgqeeS83yr9zoI4PVNA//JSbQa69T 6784bL8aEwJ85+RoVpvJBj1HIBhqYzh1 X-Google-Smtp-Source: ABdhPJyGbSHeEEC3fevrTXJ9nvduL9Idz1OuetENZ+6YGEpuvdg8wlXwAyrhsHfA5UVeantdGlz2RTX/R7Xv X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6993:0:b0:624:55af:336c with SMTP id e141-20020a256993000000b0062455af336cmr19351145ybc.412.1645593885940; Tue, 22 Feb 2022 21:24:45 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:03 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-28-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 27/47] mm: asi: Avoid TLB flushes during ASI CR3 switches when possible From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Stat-Signature: uiwe574dcxsb8k3m4p9i18win77toeof X-Rspam-User: Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=r7+qr6EG; spf=pass (imf30.hostedemail.com: domain of 3HcUVYgcKCBY5G9w4zE2AA270.yA8749GJ-886Hwy6.AD2@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3HcUVYgcKCBY5G9w4zE2AA270.yA8749GJ-886Hwy6.AD2@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 9249080006 X-HE-Tag: 1645593886-13965 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The TLB flush functions are modified to flush the ASI PCIDs in addition to the unrestricted kernel PCID and the KPTI PCID. Some tracking is also added to figure out when the TLB state for ASI PCIDs is out-of-date (e.g. due to lack of INVPCID support), and ASI Enter/Exit use this information to skip a TLB flush during the CR3 switch when the TLB is already up-to-date. 
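The decision made at ASI Enter can be modeled in userspace with a per-(ASID, class) pending flag (a sketch with assumed names and sizes, not the patch itself): a flush that cannot be performed immediately just records itself, and the next ASI Enter consumes the flag to decide whether the CR3 write may keep the NOFLUSH bit set.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_CR3_NOFLUSH	(1ULL << 63)

/* Assumed sizes: 6 dynamic ASIDs, up to 16 ASI classes per ASID. */
static bool flush_pending[6][16];

/* A TLB shootdown aimed at an ASI PCID that is not currently loaded only
 * records the request instead of flushing right away. */
static void example_mark_asi_flush(int asid, int pcid_index)
{
	flush_pending[asid][pcid_index] = true;
}

/* ASI Enter consumes the flag: only when a flush was requested since the last
 * entry does the CR3 switch omit NOFLUSH, forcing the hardware to flush. */
static uint64_t example_asi_enter_cr3(uint64_t pgd_pa, int asid, int pcid_index)
{
	bool need_flush = flush_pending[asid][pcid_index];

	flush_pending[asid][pcid_index] = false;
	return pgd_pa | (uint64_t)((asid + 1) | (pcid_index << 10)) |
	       (need_flush ? 0 : EXAMPLE_CR3_NOFLUSH);
}

int main(void)
{
	printf("clean enter: %#llx\n",
	       (unsigned long long)example_asi_enter_cr3(0x1000, 0, 1));
	example_mark_asi_flush(0, 1);
	printf("stale enter: %#llx\n",
	       (unsigned long long)example_asi_enter_cr3(0x1000, 0, 1));
	return 0;
}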
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 11 ++- arch/x86/include/asm/tlbflush.h | 47 ++++++++++ arch/x86/mm/asi.c | 38 +++++++- arch/x86/mm/tlb.c | 152 ++++++++++++++++++++++++++++++-- 4 files changed, 234 insertions(+), 14 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index aaa0d0bdbf59..1a77917c79c7 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -126,11 +126,18 @@ static inline void asi_intr_exit(void) if (static_cpu_has(X86_FEATURE_ASI)) { barrier(); - if (--current->thread.intr_nest_depth == 0) + if (--current->thread.intr_nest_depth == 0) { + barrier(); __asi_enter(); + } } } +static inline int asi_intr_nest_depth(void) +{ + return current->thread.intr_nest_depth; +} + #define INIT_MM_ASI(init_mm) \ .asi = { \ [0] = { \ @@ -150,6 +157,8 @@ static inline void asi_intr_enter(void) { } static inline void asi_intr_exit(void) { } +static inline int asi_intr_nest_depth(void) { return 0; } + static inline void asi_init_thread_state(struct thread_struct *thread) { } static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; } diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index f9ec5e67e361..295bebdb4395 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -12,6 +12,7 @@ #include #include #include +#include void __flush_tlb_all(void); @@ -59,9 +60,20 @@ static inline void cr4_clear_bits(unsigned long mask) */ #define TLB_NR_DYN_ASIDS 6 +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +struct asi_tlb_context { + bool flush_pending; +}; + +#endif + struct tlb_context { u64 ctx_id; u64 tlb_gen; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct asi_tlb_context asi_context[ASI_MAX_NUM]; +#endif }; struct tlb_state { @@ -100,6 +112,10 @@ struct tlb_state { */ bool invalidate_other; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* If set, ASI Exit needs to do a TLB flush during the CR3 switch */ + bool kern_pcid_needs_flush; +#endif /* * Mask that contains TLB_NR_DYN_ASIDS+1 bits to indicate * the corresponding user PCID needs a flush next time we @@ -262,8 +278,39 @@ extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch); unsigned long build_cr3(pgd_t *pgd, u16 asid); unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush); +u16 kern_pcid(u16 asid); u16 asi_pcid(struct asi *asi, u16 asid); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static inline bool *__asi_tlb_flush_pending(struct asi *asi) +{ + struct tlb_state *tlb_state; + struct tlb_context *tlb_context; + + tlb_state = this_cpu_ptr(&cpu_tlbstate); + tlb_context = &tlb_state->ctxs[tlb_state->loaded_mm_asid]; + return &tlb_context->asi_context[asi->pcid_index].flush_pending; +} + +static inline bool asi_get_and_clear_tlb_flush_pending(struct asi *asi) +{ + bool *tlb_flush_pending_ptr = __asi_tlb_flush_pending(asi); + bool tlb_flush_pending = READ_ONCE(*tlb_flush_pending_ptr); + + if (tlb_flush_pending) + WRITE_ONCE(*tlb_flush_pending_ptr, false); + + return tlb_flush_pending; +} + +static inline void asi_clear_pending_tlb_flush(struct asi *asi) +{ + WRITE_ONCE(*__asi_tlb_flush_pending(asi), false); +} + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + #endif /* !MODULE */ #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index dbfea3dc4bb1..17b8e6e60312 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -388,6 +388,7 @@ void __asi_enter(void) { u64 asi_cr3; u16 pcid; + bool need_flush = false; struct asi *target = 
this_cpu_read(asi_cpu_state.target_asi); VM_BUG_ON(preemptible()); @@ -401,8 +402,18 @@ void __asi_enter(void) this_cpu_write(asi_cpu_state.curr_asi, target); + if (static_cpu_has(X86_FEATURE_PCID)) + need_flush = asi_get_and_clear_tlb_flush_pending(target); + + /* + * It is possible that we may get a TLB flush IPI after + * already reading need_flush, in which case we won't do the + * flush below. However, in that case the interrupt epilog + * will also call __asi_enter(), which will do the flush. + */ + pcid = asi_pcid(target, this_cpu_read(cpu_tlbstate.loaded_mm_asid)); - asi_cr3 = build_cr3_pcid(target->pgd, pcid, false); + asi_cr3 = build_cr3_pcid(target->pgd, pcid, !need_flush); write_cr3(asi_cr3); if (target->class->ops.post_asi_enter) @@ -437,12 +448,31 @@ void asi_exit(void) asi = this_cpu_read(asi_cpu_state.curr_asi); if (asi) { + bool need_flush = false; + if (asi->class->ops.pre_asi_exit) asi->class->ops.pre_asi_exit(); - unrestricted_cr3 = - build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + if (static_cpu_has(X86_FEATURE_PCID) && + !static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + need_flush = this_cpu_read( + cpu_tlbstate.kern_pcid_needs_flush); + this_cpu_write(cpu_tlbstate.kern_pcid_needs_flush, + false); + } + + /* + * It is possible that we may get a TLB flush IPI after + * already reading need_flush. However, in that case the IPI + * will not set flush_pending for the unrestricted address + * space, as that is done by flush_tlb_one_user() only if + * asi_intr_nest_depth() is 0. + */ + + unrestricted_cr3 = build_cr3_pcid( + this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, + kern_pcid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)), + !need_flush); write_cr3(unrestricted_cr3); this_cpu_write(asi_cpu_state.curr_asi, NULL); diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 312b9c185a55..5c9681df3a16 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -114,7 +114,7 @@ static_assert(TLB_NR_DYN_ASIDS < BIT(CR3_AVAIL_PCID_BITS)); /* * Given @asid, compute kPCID */ -static inline u16 kern_pcid(u16 asid) +inline u16 kern_pcid(u16 asid) { VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); @@ -166,6 +166,60 @@ u16 asi_pcid(struct asi *asi, u16 asid) return kern_pcid(asid) | (asi->pcid_index << ASI_PCID_BITS_SHIFT); } +static void invalidate_kern_pcid(void) +{ + this_cpu_write(cpu_tlbstate.kern_pcid_needs_flush, true); +} + +static void invalidate_asi_pcid(struct asi *asi, u16 asid) +{ + uint i; + struct asi_tlb_context *asi_tlb_context; + + if (!static_cpu_has(X86_FEATURE_ASI) || + !static_cpu_has(X86_FEATURE_PCID)) + return; + + asi_tlb_context = this_cpu_ptr(cpu_tlbstate.ctxs[asid].asi_context); + + if (asi) + asi_tlb_context[asi->pcid_index].flush_pending = true; + else + for (i = 1; i < ASI_MAX_NUM; i++) + asi_tlb_context[i].flush_pending = true; +} + +static void flush_asi_pcid(struct asi *asi) +{ + u16 asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + /* + * The flag should be cleared before the INVPCID, to avoid clearing it + * in case an interrupt/exception sets it again after the INVPCID. 
+ */ + asi_clear_pending_tlb_flush(asi); + invpcid_flush_single_context(asi_pcid(asi, asid)); +} + +static void __flush_tlb_one_asi(struct asi *asi, u16 asid, size_t addr) +{ + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + + if (!static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + invalidate_asi_pcid(asi, asid); + } else if (asi) { + invpcid_flush_one(asi_pcid(asi, asid), addr); + } else { + uint i; + struct mm_struct *mm = this_cpu_read(cpu_tlbstate.loaded_mm); + + for (i = 1; i < ASI_MAX_NUM; i++) + if (mm->asi[i].pgd) + invpcid_flush_one(asi_pcid(&mm->asi[i], asid), + addr); + } +} + #else /* CONFIG_ADDRESS_SPACE_ISOLATION */ u16 asi_pcid(struct asi *asi, u16 asid) @@ -173,6 +227,11 @@ u16 asi_pcid(struct asi *asi, u16 asid) return kern_pcid(asid); } +static inline void invalidate_kern_pcid(void) { } +static inline void invalidate_asi_pcid(struct asi *asi, u16 asid) { } +static inline void flush_asi_pcid(struct asi *asi) { } +static inline void __flush_tlb_one_asi(struct asi *asi, u16 asid, size_t addr) { } + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush) @@ -223,7 +282,8 @@ static void clear_asid_other(void) * This is only expected to be set if we have disabled * kernel _PAGE_GLOBAL pages. */ - if (!static_cpu_has(X86_FEATURE_PTI)) { + if (!static_cpu_has(X86_FEATURE_PTI) && + !cpu_feature_enabled(X86_FEATURE_ASI)) { WARN_ON_ONCE(1); return; } @@ -313,6 +373,7 @@ static void load_new_mm_cr3(pgd_t *pgdir, u16 new_asid, bool need_flush) if (need_flush) { invalidate_user_asid(new_asid); + invalidate_asi_pcid(NULL, new_asid); new_mm_cr3 = build_cr3(pgdir, new_asid); } else { new_mm_cr3 = build_cr3_noflush(pgdir, new_asid); @@ -741,11 +802,17 @@ void initialize_tlbstate_and_flush(void) this_cpu_write(cpu_tlbstate.next_asid, 1); this_cpu_write(cpu_tlbstate.ctxs[0].ctx_id, mm->context.ctx_id); this_cpu_write(cpu_tlbstate.ctxs[0].tlb_gen, tlb_gen); + invalidate_asi_pcid(NULL, 0); for (i = 1; i < TLB_NR_DYN_ASIDS; i++) this_cpu_write(cpu_tlbstate.ctxs[i].ctx_id, 0); } +static inline void invlpg(unsigned long addr) +{ + asm volatile("invlpg (%0)" ::"r"(addr) : "memory"); +} + /* * flush_tlb_func()'s memory ordering requirement is that any * TLB fills that happen after we flush the TLB are ordered after we @@ -967,7 +1034,8 @@ void flush_tlb_multi(const struct cpumask *cpumask, * least 95%) of allocations, and is small enough that we are * confident it will not cause too much overhead. Each single * flush is about 100 ns, so this caps the maximum overhead at - * _about_ 3,000 ns. + * _about_ 3,000 ns (plus upto an additional ~3000 ns for each + * ASI instance, or for KPTI). * * This is in units of pages. 
*/ @@ -1157,7 +1225,8 @@ void flush_tlb_one_kernel(unsigned long addr) */ flush_tlb_one_user(addr); - if (!static_cpu_has(X86_FEATURE_PTI)) + if (!static_cpu_has(X86_FEATURE_PTI) && + !cpu_feature_enabled(X86_FEATURE_ASI)) return; /* @@ -1174,9 +1243,45 @@ void flush_tlb_one_kernel(unsigned long addr) */ STATIC_NOPV void native_flush_tlb_one_user(unsigned long addr) { - u32 loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + u16 loaded_mm_asid; - asm volatile("invlpg (%0)" ::"r" (addr) : "memory"); + if (!static_cpu_has(X86_FEATURE_PCID)) { + invlpg(addr); + return; + } + + loaded_mm_asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + + /* + * If we don't have INVPCID support, then we do an ASI Exit so that + * the invlpg happens in the unrestricted address space, and we + * invalidate the ASI PCID so that it is flushed at the next ASI Enter. + * + * But if a valid target ASI is set, then an ASI Exit can be ephemeral + * due to interrupts/exceptions/NMIs (except if we are already inside + * one), so we just invalidate both the ASI and the unrestricted kernel + * PCIDs and let the invlpg flush whichever happens to be the current + * address space. This is a bit more wasteful, but this scenario is not + * actually expected to occur with the current usage of ASI, and is + * handled here just for completeness. (If we wanted to optimize this, + * we could manipulate the intr_nest_depth to guarantee that an ASI + * Exit is not ephemeral). + */ + if (!static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + if (unlikely(!asi_is_target_unrestricted()) && + asi_intr_nest_depth() == 0) + invalidate_kern_pcid(); + else + asi_exit(); + } + + /* Flush the unrestricted kernel address space */ + if (!is_asi_active()) + invlpg(addr); + else + invpcid_flush_one(kern_pcid(loaded_mm_asid), addr); + + __flush_tlb_one_asi(NULL, loaded_mm_asid, addr); if (!static_cpu_has(X86_FEATURE_PTI)) return; @@ -1235,6 +1340,9 @@ STATIC_NOPV void native_flush_tlb_global(void) */ STATIC_NOPV void native_flush_tlb_local(void) { + struct asi *asi; + u16 asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + /* * Preemption or interrupts must be disabled to protect the access * to the per CPU variable and to prevent being preempted between @@ -1242,10 +1350,36 @@ STATIC_NOPV void native_flush_tlb_local(void) */ WARN_ON_ONCE(preemptible()); - invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + /* + * If we don't have INVPCID support, then we have to use + * write_cr3(read_cr3()). However, that is not safe when ASI is active, + * as an interrupt/exception/NMI could cause an ASI Exit in the middle + * and change CR3. So we trigger an ASI Exit beforehand. But if a valid + * target ASI is set, then an ASI Exit can also be ephemeral due to + * interrupts (except if we are already inside one), and thus we have to + * fallback to a global TLB flush. 
+ */ + if (!static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + if (unlikely(!asi_is_target_unrestricted()) && + asi_intr_nest_depth() == 0) { + native_flush_tlb_global(); + return; + } + asi_exit(); + } - /* If current->mm == NULL then the read_cr3() "borrows" an mm */ - native_write_cr3(__native_read_cr3()); + invalidate_user_asid(asid); + invalidate_asi_pcid(NULL, asid); + + asi = asi_get_current(); + + if (!asi) { + /* If current->mm == NULL then the read_cr3() "borrows" an mm */ + native_write_cr3(__native_read_cr3()); + } else { + invpcid_flush_single_context(kern_pcid(asid)); + flush_asi_pcid(asi); + } } void flush_tlb_local(void) From patchwork Wed Feb 23 05:22:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756388 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7413AC433FE for ; Wed, 23 Feb 2022 05:24:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A43858D001E; Wed, 23 Feb 2022 00:24:49 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9CFAC8D0001; Wed, 23 Feb 2022 00:24:49 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7D2498D001E; Wed, 23 Feb 2022 00:24:49 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0159.hostedemail.com [216.40.44.159]) by kanga.kvack.org (Postfix) with ESMTP id 694818D0001 for ; Wed, 23 Feb 2022 00:24:49 -0500 (EST) Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 23CC91816393E for ; Wed, 23 Feb 2022 05:24:49 +0000 (UTC) X-FDA: 79172905098.12.6F5C012 Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf21.hostedemail.com (Postfix) with ESMTP id B7D781C0002 for ; Wed, 23 Feb 2022 05:24:48 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d2d45c0df7so163210227b3.1 for ; Tue, 22 Feb 2022 21:24:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ht+B2vhllT9TUcS9YCbehkBKQfQF+JUOyLFYYS9ZTII=; b=X4LjelXhHrsgUAmOKDupgWtkMo3qDIfheiClzbfmM97+mtJS0Ry/l7ad7kEFvNmAMO ru8xaZ7F2Wij3ltuM/YwpEuAwo9u28wNABujo/9gA7WhQcealUnUhGS4exGNNI00mM/h QXyufTreNJLTmpGLik/8e62TY2RamcG4YVt1a9kq7jRUpJ5gpqeZZz9Udb++OBULoX7j 8m+I6tqS0xRBkRNDUPy1xvHpGNE8YXXtkcOVcVJNJbGCsF4uJmSxV3FYbdH5JUcqjBWM ug6RC3+13tjcavk58yKUhJSjRMK2SZqloPpTOfJuLf4aHFrFV1j6/rXU3Dwv5NYwIvsx Wd6g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ht+B2vhllT9TUcS9YCbehkBKQfQF+JUOyLFYYS9ZTII=; b=wA20J1J+GblOLv2FkcSrd27bW9TzWWIMMgQ/Ep42rcJTMcuOOuU3CcU/aMY0vGtRuj d0U88FWFb7+qBzy7wpul6sruILNGeRmZT4BHL1chXK3BkmkSuTO+pPSZR6zdkAeqZi1X xsa4lS1UMzP2Z6v9m+kMICjn73WQsdhVl4H9meyJhNggK/RUJLJfN+X562HYk6GJkYSf OtzU0fPLTlunSaD0wd33ha1SqOclfOBkLKUnOk44KTEzPcq91lXoKCKDHMTPqJFcx/zJ O/ipgtxTpUgtbGMGZ5DQxoPQE9cumCB5fDH5zTrx8+dC86v0ds65D9PZXAx6iCzUsnRf YVyQ== X-Gm-Message-State: AOAM531TFCYXoz6DkkYDkecawYuaHinf6Nv8lhmZmwwK7kLcWumip+9G 7qUmfjX8Tq27PNb8icEDXSaZgMeb89Nj X-Google-Smtp-Source: 
ABdhPJzjFJlAylJJF1esEWWn/fBYg6FPQgSZQ8x6J/wS+NLWdoHP+2b1OSKZqRs0ZEPrSWXIKV7/9dmH6JQ/ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6b4d:0:b0:624:7295:42ee with SMTP id o13-20020a256b4d000000b00624729542eemr15381999ybm.290.1645593888119; Tue, 22 Feb 2022 21:24:48 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:04 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-29-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 28/47] mm: asi: Avoid TLB flush IPIs to CPUs not in ASI context From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspam-User: Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=X4LjelXh; spf=pass (imf21.hostedemail.com: domain of 3IMUVYgcKCBk8JCz72H5DD5A3.1DBA7CJM-BB9Kz19.DG5@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3IMUVYgcKCBk8JCz72H5DD5A3.1DBA7CJM-BB9Kz19.DG5@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: B7D781C0002 X-Stat-Signature: x64kibr8u18p17n3n6rzhiafx6grncaq X-HE-Tag: 1645593888-92754 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Most CPUs will not be running in a restricted ASI address space at any given time. So when we need to do an ASI TLB flush, we can skip those CPUs and let them do a flush at the time of the next ASI Enter. Furthermore, for flushes related to local non-sensitive memory, we can restrict the CPU set even further to those CPUs that have that specific mm_struct loaded. 
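The deferral boils down to a generation counter per ASI plus each CPU's record of the generation it last flushed to; a single-threaded sketch follows (illustrative names and sizes only, not the patch; the kernel keeps these in atomic64_t and per-CPU tlb_state instead).

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_NR_CPUS	4

/* Latest flush generation for one ASI instance. */
static uint64_t asi_tlb_gen = 1;

/* Per-CPU view: the generation this CPU's ASI PCID was last flushed to, and
 * whether the CPU is currently inside the restricted address space. */
static uint64_t cpu_tlb_gen[EXAMPLE_NR_CPUS] = { 1, 1, 1, 1 };
static bool cpu_in_asi[EXAMPLE_NR_CPUS];

/* Initiating a flush: bump the generation, then "IPI" only the CPUs that are
 * actually inside ASI; the rest catch up at their next ASI Enter. */
static void example_asi_flush(void)
{
	asi_tlb_gen++;
	for (int cpu = 0; cpu < EXAMPLE_NR_CPUS; cpu++)
		if (cpu_in_asi[cpu])
			cpu_tlb_gen[cpu] = asi_tlb_gen;	/* flushed via IPI */
}

/* ASI Enter flushes only if this CPU's recorded generation is stale. */
static bool example_asi_enter(int cpu)
{
	bool need_flush = cpu_tlb_gen[cpu] < asi_tlb_gen;

	cpu_tlb_gen[cpu] = asi_tlb_gen;
	cpu_in_asi[cpu] = true;
	return need_flush;
}

int main(void)
{
	example_asi_enter(0);	/* CPU 0 is already inside ASI */
	example_asi_flush();	/* only CPU 0 receives the "IPI" */
	printf("CPU 1 first enter needs flush: %d\n", example_asi_enter(1));
	printf("CPU 1 next enter needs flush:  %d\n", example_asi_enter(1));
	return 0;
}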
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 9 +- arch/x86/include/asm/tlbflush.h | 47 +++---- arch/x86/mm/asi.c | 73 +++++++++-- arch/x86/mm/tlb.c | 209 ++++++++++++++++++++++++++++++-- 4 files changed, 282 insertions(+), 56 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 1a77917c79c7..35421356584b 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -41,6 +41,8 @@ struct asi { struct asi_class *class; struct mm_struct *mm; u16 pcid_index; + atomic64_t *tlb_gen; + atomic64_t __tlb_gen; int64_t asi_ref_count; }; @@ -138,11 +140,16 @@ static inline int asi_intr_nest_depth(void) return current->thread.intr_nest_depth; } +void asi_get_latest_tlb_gens(struct asi *asi, u64 *latest_local_tlb_gen, + u64 *latest_global_tlb_gen); + #define INIT_MM_ASI(init_mm) \ .asi = { \ [0] = { \ .pgd = asi_global_nonsensitive_pgd, \ - .mm = &init_mm \ + .mm = &init_mm, \ + .__tlb_gen = ATOMIC64_INIT(1), \ + .tlb_gen = &init_mm.asi[0].__tlb_gen \ } \ }, diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 295bebdb4395..85315d1d2d70 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -63,7 +63,8 @@ static inline void cr4_clear_bits(unsigned long mask) #ifdef CONFIG_ADDRESS_SPACE_ISOLATION struct asi_tlb_context { - bool flush_pending; + u64 local_tlb_gen; + u64 global_tlb_gen; }; #endif @@ -223,6 +224,20 @@ struct flush_tlb_info { unsigned int initiating_cpu; u8 stride_shift; u8 freed_tables; + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* + * We can't use the mm pointer above, as there can be some cases where + * the mm is already freed. Of course, a flush wouldn't be necessary + * in that case, and we would know that when we compare the context ID. + * + * If U64_MAX, then a global flush would be done. + */ + u64 mm_context_id; + + /* If non-zero, flush only the ASI instance with this PCID index. 
*/ + u16 asi_pcid_index; +#endif }; void flush_tlb_local(void); @@ -281,36 +296,6 @@ unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush); u16 kern_pcid(u16 asid); u16 asi_pcid(struct asi *asi, u16 asid); -#ifdef CONFIG_ADDRESS_SPACE_ISOLATION - -static inline bool *__asi_tlb_flush_pending(struct asi *asi) -{ - struct tlb_state *tlb_state; - struct tlb_context *tlb_context; - - tlb_state = this_cpu_ptr(&cpu_tlbstate); - tlb_context = &tlb_state->ctxs[tlb_state->loaded_mm_asid]; - return &tlb_context->asi_context[asi->pcid_index].flush_pending; -} - -static inline bool asi_get_and_clear_tlb_flush_pending(struct asi *asi) -{ - bool *tlb_flush_pending_ptr = __asi_tlb_flush_pending(asi); - bool tlb_flush_pending = READ_ONCE(*tlb_flush_pending_ptr); - - if (tlb_flush_pending) - WRITE_ONCE(*tlb_flush_pending_ptr, false); - - return tlb_flush_pending; -} - -static inline void asi_clear_pending_tlb_flush(struct asi *asi) -{ - WRITE_ONCE(*__asi_tlb_flush_pending(asi), false); -} - -#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ - #endif /* !MODULE */ #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 17b8e6e60312..29c74b6d4262 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -355,6 +355,11 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) for (i = pgd_index(VMALLOC_GLOBAL_NONSENSITIVE_START); i < PTRS_PER_PGD; i++) set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); + + asi->tlb_gen = &mm->asi[0].__tlb_gen; + } else { + asi->tlb_gen = &asi->__tlb_gen; + atomic64_set(asi->tlb_gen, 1); } exit_unlock: @@ -384,11 +389,26 @@ void asi_destroy(struct asi *asi) } EXPORT_SYMBOL_GPL(asi_destroy); +void asi_get_latest_tlb_gens(struct asi *asi, u64 *latest_local_tlb_gen, + u64 *latest_global_tlb_gen) +{ + if (likely(asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE)) + *latest_global_tlb_gen = + atomic64_read(ASI_GLOBAL_NONSENSITIVE->tlb_gen); + else + *latest_global_tlb_gen = 0; + + *latest_local_tlb_gen = atomic64_read(asi->tlb_gen); +} + void __asi_enter(void) { u64 asi_cr3; u16 pcid; bool need_flush = false; + u64 latest_local_tlb_gen, latest_global_tlb_gen; + struct tlb_state *tlb_state; + struct asi_tlb_context *tlb_context; struct asi *target = this_cpu_read(asi_cpu_state.target_asi); VM_BUG_ON(preemptible()); @@ -397,17 +417,35 @@ void __asi_enter(void) if (!target || target == this_cpu_read(asi_cpu_state.curr_asi)) return; - VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) == - LOADED_MM_SWITCHING); + tlb_state = this_cpu_ptr(&cpu_tlbstate); + VM_BUG_ON(tlb_state->loaded_mm == LOADED_MM_SWITCHING); this_cpu_write(asi_cpu_state.curr_asi, target); - if (static_cpu_has(X86_FEATURE_PCID)) - need_flush = asi_get_and_clear_tlb_flush_pending(target); + if (static_cpu_has(X86_FEATURE_PCID)) { + /* + * curr_asi write has to happen before the asi->tlb_gen reads + * below. + * + * See comments in asi_flush_tlb_range(). + */ + smp_mb(); + + asi_get_latest_tlb_gens(target, &latest_local_tlb_gen, + &latest_global_tlb_gen); + + tlb_context = &tlb_state->ctxs[tlb_state->loaded_mm_asid] + .asi_context[target->pcid_index]; + + if (READ_ONCE(tlb_context->local_tlb_gen) < latest_local_tlb_gen + || READ_ONCE(tlb_context->global_tlb_gen) < + latest_global_tlb_gen) + need_flush = true; + } /* * It is possible that we may get a TLB flush IPI after - * already reading need_flush, in which case we won't do the + * already calculating need_flush, in which case we won't do the * flush below. 
However, in that case the interrupt epilog * will also call __asi_enter(), which will do the flush. */ @@ -416,6 +454,23 @@ void __asi_enter(void) asi_cr3 = build_cr3_pcid(target->pgd, pcid, !need_flush); write_cr3(asi_cr3); + if (static_cpu_has(X86_FEATURE_PCID)) { + /* + * There is a small possibility that an interrupt happened + * after the read of the latest_*_tlb_gen above and when + * that interrupt did an asi_enter() upon return, it read + * an even higher latest_*_tlb_gen and already updated the + * tlb_context->*tlb_gen accordingly. In that case, the + * following will move back the tlb_context->*tlb_gen. That + * isn't ideal, but it should not cause any correctness issues. + * We may just end up doing an unnecessary TLB flush on the next + * asi_enter(). If we really needed to avoid that, we could + * just do a cmpxchg, but it is likely not necessary. + */ + WRITE_ONCE(tlb_context->local_tlb_gen, latest_local_tlb_gen); + WRITE_ONCE(tlb_context->global_tlb_gen, latest_global_tlb_gen); + } + if (target->class->ops.post_asi_enter) target->class->ops.post_asi_enter(); } @@ -504,6 +559,8 @@ int asi_init_mm_state(struct mm_struct *mm) if (!mm->asi_enabled) return 0; + mm->asi[0].tlb_gen = &mm->asi[0].__tlb_gen; + atomic64_set(mm->asi[0].tlb_gen, 1); mm->asi[0].mm = mm; mm->asi[0].pgd = (pgd_t *)__get_free_page(GFP_PGTABLE_USER); if (!mm->asi[0].pgd) @@ -718,12 +775,6 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) asi_flush_tlb_range(asi, addr, len); } -void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) -{ - /* Later patches will do a more optimized flush. */ - flush_tlb_kernel_range((ulong)addr, (ulong)addr + len); -} - void *asi_va(unsigned long pa) { struct page *page = pfn_to_page(PHYS_PFN(pa)); diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 5c9681df3a16..2a442335501f 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -31,6 +31,8 @@ # define __flush_tlb_multi(msk, info) native_flush_tlb_multi(msk, info) #endif +STATIC_NOPV void native_flush_tlb_global(void); + /* * TLB flushing, formerly SMP-only * c/o Linus Torvalds. @@ -173,7 +175,6 @@ static void invalidate_kern_pcid(void) static void invalidate_asi_pcid(struct asi *asi, u16 asid) { - uint i; struct asi_tlb_context *asi_tlb_context; if (!static_cpu_has(X86_FEATURE_ASI) || @@ -183,21 +184,30 @@ static void invalidate_asi_pcid(struct asi *asi, u16 asid) asi_tlb_context = this_cpu_ptr(cpu_tlbstate.ctxs[asid].asi_context); if (asi) - asi_tlb_context[asi->pcid_index].flush_pending = true; + asi_tlb_context[asi->pcid_index] = + (struct asi_tlb_context) { 0 }; else - for (i = 1; i < ASI_MAX_NUM; i++) - asi_tlb_context[i].flush_pending = true; + memset(asi_tlb_context, 0, + sizeof(struct asi_tlb_context) * ASI_MAX_NUM); } static void flush_asi_pcid(struct asi *asi) { u16 asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); - /* - * The flag should be cleared before the INVPCID, to avoid clearing it - * in case an interrupt/exception sets it again after the INVPCID. - */ - asi_clear_pending_tlb_flush(asi); + struct asi_tlb_context *tlb_context = this_cpu_ptr( + &cpu_tlbstate.ctxs[asid].asi_context[asi->pcid_index]); + u64 latest_local_tlb_gen = atomic64_read(asi->tlb_gen); + u64 latest_global_tlb_gen = atomic64_read( + ASI_GLOBAL_NONSENSITIVE->tlb_gen); + invpcid_flush_single_context(asi_pcid(asi, asid)); + + /* + * This could sometimes move the *_tlb_gen backwards. See comments + * in __asi_enter(). 
+ */ + WRITE_ONCE(tlb_context->local_tlb_gen, latest_local_tlb_gen); + WRITE_ONCE(tlb_context->global_tlb_gen, latest_global_tlb_gen); } static void __flush_tlb_one_asi(struct asi *asi, u16 asid, size_t addr) @@ -1050,7 +1060,7 @@ static DEFINE_PER_CPU(unsigned int, flush_tlb_info_idx); static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned int stride_shift, bool freed_tables, - u64 new_tlb_gen) + u64 new_tlb_gen, u64 mm_ctx_id, u16 asi_pcid_index) { struct flush_tlb_info *info = this_cpu_ptr(&flush_tlb_info); @@ -1071,6 +1081,11 @@ static struct flush_tlb_info *get_flush_tlb_info(struct mm_struct *mm, info->new_tlb_gen = new_tlb_gen; info->initiating_cpu = smp_processor_id(); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + info->mm_context_id = mm_ctx_id; + info->asi_pcid_index = asi_pcid_index; +#endif + return info; } @@ -1104,7 +1119,7 @@ void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, new_tlb_gen = inc_mm_tlb_gen(mm); info = get_flush_tlb_info(mm, start, end, stride_shift, freed_tables, - new_tlb_gen); + new_tlb_gen, 0, 0); /* * flush_tlb_multi() is not optimized for the common case in which only @@ -1157,7 +1172,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end) struct flush_tlb_info *info; preempt_disable(); - info = get_flush_tlb_info(NULL, start, end, 0, false, 0); + info = get_flush_tlb_info(NULL, start, end, 0, false, 0, 0, 0); on_each_cpu(do_kernel_range_flush, info, 1); @@ -1166,6 +1181,174 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end) } } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static inline void invlpg_range(size_t start, size_t end, size_t stride) +{ + size_t addr; + + for (addr = start; addr < end; addr += stride) + invlpg(addr); +} + +static bool asi_needs_tlb_flush(struct asi *asi, struct flush_tlb_info *info) +{ + if (!asi || + (info->mm_context_id != U64_MAX && + info->mm_context_id != asi->mm->context.ctx_id) || + (info->asi_pcid_index && info->asi_pcid_index != asi->pcid_index)) + return false; + + if (unlikely(!(asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE)) && + (info->mm_context_id == U64_MAX || !info->asi_pcid_index)) + return false; + + return true; +} + +static void __flush_asi_tlb_all(struct asi *asi) +{ + if (static_cpu_has(X86_FEATURE_INVPCID_SINGLE)) { + flush_asi_pcid(asi); + return; + } + + /* See comments in native_flush_tlb_local() */ + if (unlikely(!asi_is_target_unrestricted()) && + asi_intr_nest_depth() == 0) { + native_flush_tlb_global(); + return; + } + + /* Let the next ASI Enter do the flush */ + asi_exit(); +} + +static void do_asi_tlb_flush(void *data) +{ + struct flush_tlb_info *info = data; + struct tlb_state *tlb_state = this_cpu_ptr(&cpu_tlbstate); + struct asi_tlb_context *tlb_context; + struct asi *asi = asi_get_current(); + u64 latest_local_tlb_gen, latest_global_tlb_gen; + u64 curr_local_tlb_gen, curr_global_tlb_gen; + u64 new_local_tlb_gen, new_global_tlb_gen; + bool do_flush_all; + + count_vm_tlb_event(NR_TLB_REMOTE_FLUSH_RECEIVED); + + if (!asi_needs_tlb_flush(asi, info)) + return; + + do_flush_all = info->end - info->start > + (tlb_single_page_flush_ceiling << PAGE_SHIFT); + + if (!static_cpu_has(X86_FEATURE_PCID)) { + if (do_flush_all) + __flush_asi_tlb_all(asi); + else + invlpg_range(info->start, info->end, PAGE_SIZE); + return; + } + + tlb_context = &tlb_state->ctxs[tlb_state->loaded_mm_asid] + .asi_context[asi->pcid_index]; + + asi_get_latest_tlb_gens(asi, &latest_local_tlb_gen, + 
&latest_global_tlb_gen); + + curr_local_tlb_gen = READ_ONCE(tlb_context->local_tlb_gen); + curr_global_tlb_gen = READ_ONCE(tlb_context->global_tlb_gen); + + if (info->mm_context_id == U64_MAX) { + new_global_tlb_gen = info->new_tlb_gen; + new_local_tlb_gen = curr_local_tlb_gen; + } else { + new_local_tlb_gen = info->new_tlb_gen; + new_global_tlb_gen = curr_global_tlb_gen; + } + + /* Somebody already did a full flush */ + if (new_local_tlb_gen <= curr_local_tlb_gen && + new_global_tlb_gen <= curr_global_tlb_gen) + return; + + /* + * If we can't bring the TLB up-to-date with a range flush, then do a + * full flush anyway. + */ + if (do_flush_all || !(new_local_tlb_gen == latest_local_tlb_gen && + new_global_tlb_gen == latest_global_tlb_gen && + new_local_tlb_gen <= curr_local_tlb_gen + 1 && + new_global_tlb_gen <= curr_global_tlb_gen + 1)) { + __flush_asi_tlb_all(asi); + return; + } + + invlpg_range(info->start, info->end, PAGE_SIZE); + + /* + * If we are still in ASI context, then all the INVLPGs flushed the + * ASI PCID and so we can update the tlb_gens. + */ + if (asi_get_current() == asi) { + WRITE_ONCE(tlb_context->local_tlb_gen, new_local_tlb_gen); + WRITE_ONCE(tlb_context->global_tlb_gen, new_global_tlb_gen); + } +} + +static bool is_asi_active_on_cpu(int cpu, void *info) +{ + return per_cpu(asi_cpu_state.curr_asi, cpu); +} + +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +{ + size_t start = (size_t)addr; + size_t end = start + len; + struct flush_tlb_info *info; + u64 mm_context_id; + const cpumask_t *cpu_mask; + u64 new_tlb_gen = 0; + + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + + if (static_cpu_has(X86_FEATURE_PCID)) { + new_tlb_gen = atomic64_inc_return(asi->tlb_gen); + + /* + * The increment of tlb_gen must happen before the curr_asi + * reads in is_asi_active_on_cpu(). That ensures that if another + * CPU is in asi_enter() and happens to write to curr_asi after + * is_asi_active_on_cpu() read it, it will see the updated + * tlb_gen and perform a flush during the TLB switch. + */ + smp_mb__after_atomic(); + } + + preempt_disable(); + + if (asi == ASI_GLOBAL_NONSENSITIVE) { + mm_context_id = U64_MAX; + cpu_mask = cpu_online_mask; + } else { + mm_context_id = asi->mm->context.ctx_id; + cpu_mask = mm_cpumask(asi->mm); + } + + info = get_flush_tlb_info(NULL, start, end, 0, false, new_tlb_gen, + mm_context_id, asi->pcid_index); + + on_each_cpu_cond_mask(is_asi_active_on_cpu, do_asi_tlb_flush, info, + true, cpu_mask); + + put_flush_tlb_info(); + preempt_enable(); +} + +#endif + /* * This can be used from process context to figure out what the value of * CR3 is without needing to do a (slow) __read_cr3(). @@ -1415,7 +1598,7 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) int cpu = get_cpu(); - info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false, 0); + info = get_flush_tlb_info(NULL, 0, TLB_FLUSH_ALL, 0, false, 0, 0, 0); /* * flush_tlb_multi() is not optimized for the common case in which only * a local TLB flush is needed. 
Optimize this use-case by calling From patchwork Wed Feb 23 05:22:05 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756389 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BA67FC433F5 for ; Wed, 23 Feb 2022 05:24:52 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D50418D001F; Wed, 23 Feb 2022 00:24:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id CFEC68D0001; Wed, 23 Feb 2022 00:24:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BC7BC8D001F; Wed, 23 Feb 2022 00:24:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0027.hostedemail.com [216.40.44.27]) by kanga.kvack.org (Postfix) with ESMTP id AE3668D0001 for ; Wed, 23 Feb 2022 00:24:51 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 72DE08249980 for ; Wed, 23 Feb 2022 05:24:51 +0000 (UTC) X-FDA: 79172905182.13.3996038 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf27.hostedemail.com (Postfix) with ESMTP id D709D40002 for ; Wed, 23 Feb 2022 05:24:50 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id d17-20020a253611000000b006244e94b7b4so14171749yba.4 for ; Tue, 22 Feb 2022 21:24:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=xuSOdR8cu+3sckZI73No+K9Gsgd7KYxoVcTvOLd9SRE=; b=L1vzi2HE8mlZkGyGTJ2UNkXNrCVKkhnCd6edxiG2679p05lL9RiITgedK2mHCr1gL1 r2NbbiBimY0XMB+ilTAP/tinO8mfPjfgIiKo0Vo4CIbvxKkOOdhmUUJYzmRPnVEAIELz CaVtVNSnDCTPdbHIF9MBinzjbvZUfNhf/kWzWIx9iGyWw1+hTpjVB6Q9fAmZ4GqOYOQ9 UfdgAMaldxCH3MN9zplE0ZKYVWhDkyZhstobQhDLPnFPNehmwZEzxYqfvR65uiELnTbL w+77nDxtgfJrLubYlcykpyqTJJAw4YhWcHGLjL5ZvkLqkeKvUc2iiWBXMc2g3lSez3xL mBMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=xuSOdR8cu+3sckZI73No+K9Gsgd7KYxoVcTvOLd9SRE=; b=O24PQVt2B0KhlKbKypp7XLjIjZd6ZsyBAwuaVIW4kOlEJ5/spo6gFAEqVbKmjLGMnG 5chb1p1zc3smSJQBr2bmm05m0ZNySCGGPUsGLelqEMe4/7Xxb9pQvfdyV3n2JH8GYWuH c+7CZq7l+dB9DLHosri2iFEoptovI+m9X5bpJDV9XTBpXiVmJJAlqNmDD01vdVtaUD22 dPI6BqXkLBYLVRmIdVM2RWS53rsivauVZch2Cf+NpED7pVmJkAqM11CwTHJwYC2le20O k6qSrY3fkIHnoea5aOLytp4yOJQvz0kTrLnXJG6cPhWYvX9OzBJ6FfM44JHPZD2A9ThJ J6bA== X-Gm-Message-State: AOAM533xLt6f84tY7qgYXRNmM5Ihqng1ObgdksmqYQ/FUFGE5DRRza2F t7wRb2JxExZqHywzwI1RfGFZjg5AD7tI X-Google-Smtp-Source: ABdhPJxz+92U+hDuWLEHSSBb08IRQ0vjXntk0j4pQJ0mXKWbSMLKP0n+iJJ9k2gkkoSoqaVm7kKAazHDcEee X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:aa2c:0:b0:624:64ce:8550 with SMTP id s41-20020a25aa2c000000b0062464ce8550mr16649367ybi.105.1645593890279; Tue, 22 Feb 2022 21:24:50 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:05 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-30-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 
2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 29/47] mm: asi: Reduce TLB flushes when freeing pages asynchronously From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: D709D40002 X-Stat-Signature: jxj1ys37jbxzprudmb1k46gnb1n6f9j9 X-Rspam-User: Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=L1vzi2HE; spf=pass (imf27.hostedemail.com: domain of 3IsUVYgcKCBsALE194J7FF7C5.3FDC9ELO-DDBM13B.FI7@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3IsUVYgcKCBsALE194J7FF7C5.3FDC9ELO-DDBM13B.FI7@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam05 X-HE-Tag: 1645593890-523460 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When we are freeing pages asynchronously (because the original free was issued with IRQs disabled), issue only one TLB flush per execution of the async work function. If there is only one page to free, we do a targeted flush for that page only. Otherwise, we just do a full flush. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/tlbflush.h | 8 +++++ arch/x86/mm/tlb.c | 52 ++++++++++++++++++++------------- include/linux/mm_types.h | 30 +++++++++++++------ mm/page_alloc.c | 40 ++++++++++++++++++++----- 4 files changed, 93 insertions(+), 37 deletions(-) diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 85315d1d2d70..7d04aa2a5f86 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -296,6 +296,14 @@ unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, bool noflush); u16 kern_pcid(u16 asid); u16 asi_pcid(struct asi *asi, u16 asid); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +void __asi_prepare_tlb_flush(struct asi *asi, u64 *new_tlb_gen); +void __asi_flush_tlb_range(u64 mm_context_id, u16 pcid_index, u64 new_tlb_gen, + size_t start, size_t end, const cpumask_t *cpu_mask); + +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + #endif /* !MODULE */ #endif /* _ASM_X86_TLBFLUSH_H */ diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 2a442335501f..fcd2c8e92f83 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1302,21 +1302,10 @@ static bool is_asi_active_on_cpu(int cpu, void *info) return per_cpu(asi_cpu_state.curr_asi, cpu); } -void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +void __asi_prepare_tlb_flush(struct asi *asi, u64 *new_tlb_gen) { - size_t start = (size_t)addr; - size_t end = start + len; - struct flush_tlb_info *info; - u64 mm_context_id; - const cpumask_t *cpu_mask; - u64 new_tlb_gen = 0; - - if (!static_cpu_has(X86_FEATURE_ASI)) - return; - if (static_cpu_has(X86_FEATURE_PCID)) { - new_tlb_gen = atomic64_inc_return(asi->tlb_gen); - + *new_tlb_gen = atomic64_inc_return(asi->tlb_gen); /* * The increment of tlb_gen must happen before the curr_asi * reads in is_asi_active_on_cpu(). 
That ensures that if another @@ -1326,8 +1315,35 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) */ smp_mb__after_atomic(); } +} + +void __asi_flush_tlb_range(u64 mm_context_id, u16 pcid_index, u64 new_tlb_gen, + size_t start, size_t end, const cpumask_t *cpu_mask) +{ + struct flush_tlb_info *info; preempt_disable(); + info = get_flush_tlb_info(NULL, start, end, 0, false, new_tlb_gen, + mm_context_id, pcid_index); + + on_each_cpu_cond_mask(is_asi_active_on_cpu, do_asi_tlb_flush, info, + true, cpu_mask); + put_flush_tlb_info(); + preempt_enable(); +} + +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +{ + size_t start = (size_t)addr; + size_t end = start + len; + u64 mm_context_id; + u64 new_tlb_gen = 0; + const cpumask_t *cpu_mask; + + if (!static_cpu_has(X86_FEATURE_ASI)) + return; + + __asi_prepare_tlb_flush(asi, &new_tlb_gen); if (asi == ASI_GLOBAL_NONSENSITIVE) { mm_context_id = U64_MAX; @@ -1337,14 +1353,8 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) cpu_mask = mm_cpumask(asi->mm); } - info = get_flush_tlb_info(NULL, start, end, 0, false, new_tlb_gen, - mm_context_id, asi->pcid_index); - - on_each_cpu_cond_mask(is_asi_active_on_cpu, do_asi_tlb_flush, info, - true, cpu_mask); - - put_flush_tlb_info(); - preempt_enable(); + __asi_flush_tlb_range(mm_context_id, asi->pcid_index, new_tlb_gen, + start, end, cpu_mask); } #endif diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 56511adc263e..7d38229ca85c 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -193,21 +193,33 @@ struct page { /** @rcu_head: You can use this to free a page by RCU. */ struct rcu_head rcu_head; -#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#if defined(CONFIG_ADDRESS_SPACE_ISOLATION) && !defined(BUILD_VDSO32) struct { /* Links the pages_to_free_async list */ struct llist_node async_free_node; unsigned long _asi_pad_1; - unsigned long _asi_pad_2; + u64 asi_tlb_gen; - /* - * Upon allocation of a locally non-sensitive page, set - * to the allocating mm. Must be set to the same mm when - * the page is freed. May potentially be overwritten in - * the meantime, as long as it is restored before free. - */ - struct mm_struct *asi_mm; + union { + /* + * Upon allocation of a locally non-sensitive + * page, set to the allocating mm. Must be set + * to the same mm when the page is freed. May + * potentially be overwritten in the meantime, + * as long as it is restored before free. + */ + struct mm_struct *asi_mm; + + /* + * Set to the above mm's context ID if the page + * is being freed asynchronously. Can't directly + * use the mm_struct, unless we take additional + * steps to avoid it from being freed while the + * async work is pending. + */ + u64 asi_mm_ctx_id; + }; }; #endif }; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 01784bff2a80..998ff6a56732 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5182,20 +5182,41 @@ static void async_free_work_fn(struct work_struct *work) { struct page *page, *tmp; struct llist_node *pages_to_free; - void *va; - size_t len; + size_t addr; uint order; pages_to_free = llist_del_all(this_cpu_ptr(&pages_to_free_async)); - /* A later patch will do a more optimized TLB flush. */ + if (!pages_to_free) + return; + + /* If we only have one page to free, then do a targeted TLB flush. 
*/ + if (!llist_next(pages_to_free)) { + page = llist_entry(pages_to_free, struct page, async_free_node); + addr = (size_t)page_to_virt(page); + order = page->private; + + __asi_flush_tlb_range(page->asi_mm_ctx_id, 0, page->asi_tlb_gen, + addr, addr + PAGE_SIZE * (1 << order), + cpu_online_mask); + /* Need to clear, since it shares space with page->mapping. */ + page->asi_tlb_gen = 0; + + __free_the_page(page, order); + return; + } + + /* + * Otherwise, do a full flush. We could potentially try to optimize it + * via taking a union of what needs to be flushed, but it may not be + * worth the additional complexity. + */ + asi_flush_tlb_range(ASI_GLOBAL_NONSENSITIVE, 0, TLB_FLUSH_ALL); llist_for_each_entry_safe(page, tmp, pages_to_free, async_free_node) { - va = page_to_virt(page); order = page->private; - len = PAGE_SIZE * (1 << order); - - asi_flush_tlb_range(ASI_GLOBAL_NONSENSITIVE, va, len); + /* Need to clear, since it shares space with page->mapping. */ + page->asi_tlb_gen = 0; __free_the_page(page, order); } } @@ -5291,6 +5312,11 @@ static bool asi_unmap_freed_pages(struct page *page, unsigned int order) if (!async_flush_needed) return true; + page->asi_mm_ctx_id = PageGlobalNonSensitive(page) + ? U64_MAX : asi->mm->context.ctx_id; + + __asi_prepare_tlb_flush(asi, &page->asi_tlb_gen); + page->private = order; llist_add(&page->async_free_node, this_cpu_ptr(&pages_to_free_async)); From patchwork Wed Feb 23 05:22:06 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756390 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E03A2C433EF for ; Wed, 23 Feb 2022 05:24:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 3DBAE8D0003; Wed, 23 Feb 2022 00:24:54 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2A00C8D0001; Wed, 23 Feb 2022 00:24:54 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 13F828D0003; Wed, 23 Feb 2022 00:24:54 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0061.hostedemail.com [216.40.44.61]) by kanga.kvack.org (Postfix) with ESMTP id 056568D0001 for ; Wed, 23 Feb 2022 00:24:54 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id B30919CD53 for ; Wed, 23 Feb 2022 05:24:53 +0000 (UTC) X-FDA: 79172905266.25.228B7BD Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf25.hostedemail.com (Postfix) with ESMTP id 4FA35A0003 for ; Wed, 23 Feb 2022 05:24:53 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d6baed6aafso144146867b3.3 for ; Tue, 22 Feb 2022 21:24:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=vK4stcDcFWOmnS7XFdl48O2PCgqnR6b+h+9UyqGSKHo=; b=WeE4/kJf1JBwoHaoRE1hNnNZMWwTToI7DEJHTID7WIxYsRDDOQgqC+1Tt7v7oX0Gpi ZmRLZAZ2vDdsuIDB2S8uWUw1YsFiIl0loYVOvIAhqXAjeeol/TFwLp8ORGxcWp10Znnv Fm89xpJAHuDIyCGkiR6qSwg/Ed2RYqbaYiZbeHx1AcGw15lQM8m5E4AwvqrpiEZ4xaWB fo5SYVGmwmAcFx/4QD/6NyE3+JjhT+8OXDlEZf2AotxVIkqYFj/Rh40KkPJ8CI4n5jQK 
Date: Tue, 22 Feb 2022 21:22:06 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-31-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 30/47] mm: asi: Add API for mapping userspace address ranges
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org

asi_map_user()/asi_unmap_user() can be used to map userspace address ranges for ASI classes that do not specify ASI_MAP_ALL_USERSPACE. In addition, another structure, asi_pgtbl_pool, allows for pre-allocating a set of pages to avoid having to allocate memory for page tables within asi_map_user(), which makes it easier to use that function while holding locks.
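To illustrate the intended calling pattern, here is a minimal, hypothetical caller sketch (editorial addition, not part of the patch; the function and lock names are invented, only the asi_* calls are the API added by this series). Page-table pages are reserved up front with asi_fill_pgtbl_pool(), so that asi_map_user(), which allocates with GFP_NOWAIT internally, never needs to allocate memory while the lock is held:

static int example_map_under_lock(struct asi *asi, void *uaddr, size_t len,
				  size_t allowed_start, size_t allowed_end,
				  spinlock_t *lock)
{
	struct asi_pgtbl_pool pool;
	int err;

	asi_init_pgtbl_pool(&pool);

	/*
	 * Reserve one page-table page per level below the PGD, which is
	 * enough for a small range (the KVM patch later in this series does
	 * the same before mapping a single guest page).
	 */
	err = asi_fill_pgtbl_pool(&pool, CONFIG_PGTABLE_LEVELS - 1, GFP_KERNEL);
	if (err)
		return err;

	spin_lock(lock);
	err = asi_map_user(asi, uaddr, len, &pool, allowed_start, allowed_end);
	spin_unlock(lock);

	/* Free any pre-allocated pages that the mapping did not consume. */
	asi_clear_pgtbl_pool(&pool);
	return err;
}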
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/asi.h | 19 +++ arch/x86/mm/asi.c | 252 ++++++++++++++++++++++++++++++++++--- include/asm-generic/asi.h | 21 ++++ include/linux/mm_types.h | 2 +- 4 files changed, 275 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 35421356584b..bdb2f70d4f85 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -44,6 +44,12 @@ struct asi { atomic64_t *tlb_gen; atomic64_t __tlb_gen; int64_t asi_ref_count; + rwlock_t user_map_lock; +}; + +struct asi_pgtbl_pool { + struct page *pgtbl_list; + uint count; }; DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); @@ -74,6 +80,19 @@ void asi_do_lazy_map(struct asi *asi, size_t addr); void asi_clear_user_pgd(struct mm_struct *mm, size_t addr); void asi_clear_user_p4d(struct mm_struct *mm, size_t addr); +int asi_map_user(struct asi *asi, void *addr, size_t len, + struct asi_pgtbl_pool *pool, + size_t allowed_start, size_t allowed_end); +void asi_unmap_user(struct asi *asi, void *va, size_t len); +int asi_fill_pgtbl_pool(struct asi_pgtbl_pool *pool, uint count, gfp_t flags); +void asi_clear_pgtbl_pool(struct asi_pgtbl_pool *pool); + +static inline void asi_init_pgtbl_pool(struct asi_pgtbl_pool *pool) +{ + pool->pgtbl_list = NULL; + pool->count = 0; +} + static inline void asi_init_thread_state(struct thread_struct *thread) { thread->intr_nest_depth = 0; diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 29c74b6d4262..9b1bd005f343 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -86,6 +86,55 @@ void asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); +static ulong get_pgtbl_from_pool(struct asi_pgtbl_pool *pool) +{ + struct page *pgtbl; + + if (pool->count == 0) + return 0; + + pgtbl = pool->pgtbl_list; + pool->pgtbl_list = pgtbl->asi_pgtbl_pool_next; + pgtbl->asi_pgtbl_pool_next = NULL; + pool->count--; + + return (ulong)page_address(pgtbl); +} + +static void return_pgtbl_to_pool(struct asi_pgtbl_pool *pool, ulong virt) +{ + struct page *pgtbl = virt_to_page(virt); + + pgtbl->asi_pgtbl_pool_next = pool->pgtbl_list; + pool->pgtbl_list = pgtbl; + pool->count++; +} + +int asi_fill_pgtbl_pool(struct asi_pgtbl_pool *pool, uint count, gfp_t flags) +{ + if (!static_cpu_has(X86_FEATURE_ASI)) + return 0; + + while (pool->count < count) { + ulong pgtbl = get_zeroed_page(flags); + + if (!pgtbl) + return -ENOMEM; + + return_pgtbl_to_pool(pool, pgtbl); + } + + return 0; +} +EXPORT_SYMBOL_GPL(asi_fill_pgtbl_pool); + +void asi_clear_pgtbl_pool(struct asi_pgtbl_pool *pool) +{ + while (pool->count > 0) + free_page(get_pgtbl_from_pool(pool)); +} +EXPORT_SYMBOL_GPL(asi_clear_pgtbl_pool); + static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr) { pgd_t *src = pgd_offset_pgd(src_table, addr); @@ -110,10 +159,12 @@ static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr) #define DEFINE_ASI_PGTBL_ALLOC(base, level) \ static level##_t * asi_##level##_alloc(struct asi *asi, \ base##_t *base, ulong addr, \ - gfp_t flags) \ + gfp_t flags, \ + struct asi_pgtbl_pool *pool) \ { \ if (unlikely(base##_none(*base))) { \ - ulong pgtbl = get_zeroed_page(flags); \ + ulong pgtbl = pool ? 
get_pgtbl_from_pool(pool) \ + : get_zeroed_page(flags); \ phys_addr_t pgtbl_pa; \ \ if (pgtbl == 0) \ @@ -127,7 +178,10 @@ static level##_t * asi_##level##_alloc(struct asi *asi, \ mm_inc_nr_##level##s(asi->mm); \ } else { \ paravirt_release_##level(PHYS_PFN(pgtbl_pa)); \ - free_page(pgtbl); \ + if (pool) \ + return_pgtbl_to_pool(pool, pgtbl); \ + else \ + free_page(pgtbl); \ } \ \ /* NOP on native. PV call on Xen. */ \ @@ -336,6 +390,7 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) asi->class = &asi_class[asi_index]; asi->mm = mm; asi->pcid_index = asi_index; + rwlock_init(&asi->user_map_lock); if (asi->class->flags & ASI_MAP_STANDARD_NONSENSITIVE) { uint i; @@ -650,11 +705,6 @@ static bool follow_physaddr(struct mm_struct *mm, size_t virt, /* * Map the given range into the ASI page tables. The source of the mapping * is the regular unrestricted page tables. - * Can be used to map any kernel memory. - * - * The caller MUST ensure that the source mapping will not change during this - * function. For dynamic kernel memory, this is generally ensured by mapping - * the memory within the allocator. * * If the source mapping is a large page and the range being mapped spans the * entire large page, then it will be mapped as a large page in the ASI page @@ -664,19 +714,17 @@ static bool follow_physaddr(struct mm_struct *mm, size_t virt, * destination page, but that should be ok for now, as usually in such cases, * the range would consist of a small-ish number of pages. */ -int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) +int __asi_map(struct asi *asi, size_t start, size_t end, gfp_t gfp_flags, + struct asi_pgtbl_pool *pool, + size_t allowed_start, size_t allowed_end) { size_t virt; - size_t start = (size_t)addr; - size_t end = start + len; size_t page_size; - if (!static_cpu_has(X86_FEATURE_ASI) || !asi) - return 0; - VM_BUG_ON(start & ~PAGE_MASK); - VM_BUG_ON(len & ~PAGE_MASK); - VM_BUG_ON(start < TASK_SIZE_MAX); + VM_BUG_ON(end & ~PAGE_MASK); + VM_BUG_ON(end > allowed_end); + VM_BUG_ON(start < allowed_start); gfp_flags &= GFP_RECLAIM_MASK; @@ -702,14 +750,15 @@ int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) continue; \ } \ \ - level = asi_##level##_alloc(asi, base, virt, gfp_flags);\ + level = asi_##level##_alloc(asi, base, virt, \ + gfp_flags, pool); \ if (!level) \ return -ENOMEM; \ \ if (page_size >= LEVEL##_SIZE && \ (level##_none(*level) || level##_leaf(*level)) && \ is_page_within_range(virt, LEVEL##_SIZE, \ - start, end)) { \ + allowed_start, allowed_end)) {\ page_size = LEVEL##_SIZE; \ phys &= LEVEL##_MASK; \ \ @@ -737,6 +786,26 @@ int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) return 0; } +/* + * Maps the given kernel address range into the ASI page tables. + * + * The caller MUST ensure that the source mapping will not change during this + * function. For dynamic kernel memory, this is generally ensured by mapping + * the memory within the allocator. 
+ */ +int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) +{ + size_t start = (size_t)addr; + size_t end = start + len; + + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return 0; + + VM_BUG_ON(start < TASK_SIZE_MAX); + + return __asi_map(asi, start, end, gfp_flags, NULL, start, end); +} + int asi_map(struct asi *asi, void *addr, size_t len) { return asi_map_gfp(asi, addr, len, GFP_KERNEL); @@ -935,3 +1004,150 @@ void asi_clear_user_p4d(struct mm_struct *mm, size_t addr) if (!pgtable_l5_enabled()) __asi_clear_user_pgd(mm, addr); } + +/* + * Maps the given userspace address range into the ASI page tables. + * + * The caller MUST ensure that the source mapping will not change during this + * function e.g. by synchronizing via MMU notifiers or acquiring the + * appropriate locks. + */ +int asi_map_user(struct asi *asi, void *addr, size_t len, + struct asi_pgtbl_pool *pool, + size_t allowed_start, size_t allowed_end) +{ + int err; + size_t start = (size_t)addr; + size_t end = start + len; + + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return 0; + + VM_BUG_ON(end > TASK_SIZE_MAX); + + read_lock(&asi->user_map_lock); + err = __asi_map(asi, start, end, GFP_NOWAIT, pool, + allowed_start, allowed_end); + read_unlock(&asi->user_map_lock); + + return err; +} +EXPORT_SYMBOL_GPL(asi_map_user); + +static bool +asi_unmap_free_pte_range(struct asi_pgtbl_pool *pgtbls_to_free, + pte_t *pte, size_t addr, size_t end) +{ + do { + pte_clear(NULL, addr, pte); + } while (pte++, addr += PAGE_SIZE, addr != end); + + return true; +} + +#define DEFINE_ASI_UNMAP_FREE_RANGE(level, LEVEL, next_level, NEXT_LVL_SIZE) \ +static bool \ +asi_unmap_free_##level##_range(struct asi_pgtbl_pool *pgtbls_to_free, \ + level##_t *level, size_t addr, size_t end) \ +{ \ + bool unmapped = false; \ + size_t next; \ + \ + do { \ + next = level##_addr_end(addr, end); \ + if (level##_none(*level)) \ + continue; \ + \ + if (IS_ALIGNED(addr, LEVEL##_SIZE) && \ + IS_ALIGNED(next, LEVEL##_SIZE)) { \ + if (!level##_large(*level)) { \ + ulong pgtbl = level##_page_vaddr(*level); \ + struct page *page = virt_to_page(pgtbl); \ + \ + page->private = PG_LEVEL_##NEXT_LVL_SIZE; \ + return_pgtbl_to_pool(pgtbls_to_free, pgtbl); \ + } \ + level##_clear(level); \ + unmapped = true; \ + } else { \ + /* \ + * At this time, we don't have a case where we need to \ + * unmap a subset of a huge page. But that could arise \ + * in the future. In that case, we'll need to split \ + * the huge mapping here. 
\ + */ \ + if (WARN_ON(level##_large(*level))) \ + continue; \ + \ + unmapped |= asi_unmap_free_##next_level##_range( \ + pgtbls_to_free, \ + next_level##_offset(level, addr), \ + addr, next); \ + } \ + } while (level++, addr = next, addr != end); \ + \ + return unmapped; \ +} + +DEFINE_ASI_UNMAP_FREE_RANGE(pmd, PMD, pte, 4K) +DEFINE_ASI_UNMAP_FREE_RANGE(pud, PUD, pmd, 2M) +DEFINE_ASI_UNMAP_FREE_RANGE(p4d, P4D, pud, 1G) +DEFINE_ASI_UNMAP_FREE_RANGE(pgd, PGDIR, p4d, 512G) + +static bool asi_unmap_and_free_range(struct asi_pgtbl_pool *pgtbls_to_free, + struct asi *asi, size_t addr, size_t end) +{ + size_t next; + bool unmapped = false; + pgd_t *pgd = pgd_offset_pgd(asi->pgd, addr); + + BUILD_BUG_ON((void *)&((struct page *)NULL)->private == + (void *)&((struct page *)NULL)->asi_pgtbl_pool_next); + + if (pgtable_l5_enabled()) + return asi_unmap_free_pgd_range(pgtbls_to_free, pgd, addr, end); + + do { + next = pgd_addr_end(addr, end); + unmapped |= asi_unmap_free_p4d_range(pgtbls_to_free, + p4d_offset(pgd, addr), + addr, next); + } while (pgd++, addr = next, addr != end); + + return unmapped; +} + +void asi_unmap_user(struct asi *asi, void *addr, size_t len) +{ + static void (*const free_pgtbl_at_level[])(struct asi *, size_t) = { + NULL, + asi_free_pte, + asi_free_pmd, + asi_free_pud, + asi_free_p4d + }; + + struct asi_pgtbl_pool pgtbls_to_free = { 0 }; + size_t start = (size_t)addr; + size_t end = start + len; + bool unmapped; + + if (!static_cpu_has(X86_FEATURE_ASI) || !asi) + return; + + write_lock(&asi->user_map_lock); + unmapped = asi_unmap_and_free_range(&pgtbls_to_free, asi, start, end); + write_unlock(&asi->user_map_lock); + + if (unmapped) + asi_flush_tlb_range(asi, addr, len); + + while (pgtbls_to_free.count > 0) { + size_t pgtbl = get_pgtbl_from_pool(&pgtbls_to_free); + struct page *page = virt_to_page(pgtbl); + + VM_BUG_ON(page->private >= PG_LEVEL_NUM); + free_pgtbl_at_level[page->private](asi, pgtbl); + } +} +EXPORT_SYMBOL_GPL(asi_unmap_user); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 8513d0d7865a..fffb323d2a00 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -26,6 +26,7 @@ struct asi_hooks {}; struct asi {}; +struct asi_pgtbl_pool {}; static inline int asi_register_class(const char *name, uint flags, @@ -92,6 +93,26 @@ void asi_clear_user_pgd(struct mm_struct *mm, size_t addr) { } static inline void asi_clear_user_p4d(struct mm_struct *mm, size_t addr) { } +static inline +int asi_map_user(struct asi *asi, void *addr, size_t len, + struct asi_pgtbl_pool *pool, + size_t allowed_start, size_t allowed_end) +{ + return 0; +} + +static inline void asi_unmap_user(struct asi *asi, void *va, size_t len) { } + +static inline +int asi_fill_pgtbl_pool(struct asi_pgtbl_pool *pool, uint count, gfp_t flags) +{ + return 0; +} + +static inline void asi_clear_pgtbl_pool(struct asi_pgtbl_pool *pool) { } + +static inline void asi_init_pgtbl_pool(struct asi_pgtbl_pool *pool) { } + static inline void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 7d38229ca85c..c3f209720a84 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -198,7 +198,7 @@ struct page { /* Links the pages_to_free_async list */ struct llist_node async_free_node; - unsigned long _asi_pad_1; + struct page *asi_pgtbl_pool_next; u64 asi_tlb_gen; union { From patchwork Wed Feb 23 05:22:07 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Junaid Shahid
X-Patchwork-Id: 12756391
Date: Tue, 22 Feb 2022 21:22:07 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-32-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 31/47] mm: asi: Support for non-sensitive SLUB caches
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc:
kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 83787C0003 X-Stat-Signature: xxcuj5sjw3buthiywgaiamxhqxoj8gij Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=IuC+day3; spf=pass (imf22.hostedemail.com: domain of 3JsUVYgcKCB8EPI5D8NBJJBG9.7JHGDIPS-HHFQ57F.JMB@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3JsUVYgcKCB8EPI5D8NBJJBG9.7JHGDIPS-HHFQ57F.JMB@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-HE-Tag: 1645593895-832208 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This adds support for allocating global and local non-sensitive objects using the SLUB allocator. Similar to SLAB, per-process child caches are created for locally non-sensitive allocations. This mechanism is based on a modified form of the earlier implementation of per-memcg caches. Signed-off-by: Junaid Shahid --- include/linux/slub_def.h | 6 ++ mm/slab.h | 5 ++ mm/slab_common.c | 33 +++++++-- mm/slub.c | 140 ++++++++++++++++++++++++++++++++++++++- security/Kconfig | 3 +- 5 files changed, 179 insertions(+), 8 deletions(-) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index 0fa751b946fa..6e185b61582c 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -137,6 +137,12 @@ struct kmem_cache { struct kasan_cache kasan_info; #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct kmem_local_cache_info local_cache_info; + /* For propagation, maximum size of a stored attr */ + unsigned int max_attr_size; +#endif + unsigned int useroffset; /* Usercopy region offset */ unsigned int usersize; /* Usercopy region size */ diff --git a/mm/slab.h b/mm/slab.h index b9e11038be27..8799bcdd2fff 100644 --- a/mm/slab.h +++ b/mm/slab.h @@ -216,6 +216,7 @@ int __kmem_cache_shutdown(struct kmem_cache *); void __kmem_cache_release(struct kmem_cache *); int __kmem_cache_shrink(struct kmem_cache *); void slab_kmem_cache_release(struct kmem_cache *); +void kmem_cache_shrink_all(struct kmem_cache *s); struct seq_file; struct file; @@ -344,6 +345,7 @@ void restore_page_nonsensitive_metadata(struct page *page, } void set_nonsensitive_cache_params(struct kmem_cache *s); +void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root); #else /* CONFIG_ADDRESS_SPACE_ISOLATION */ @@ -380,6 +382,9 @@ static inline void restore_page_nonsensitive_metadata(struct page *page, static inline void set_nonsensitive_cache_params(struct kmem_cache *s) { } +static inline +void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) { } + #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ #ifdef CONFIG_MEMCG_KMEM diff --git a/mm/slab_common.c b/mm/slab_common.c index b486b72d6344..efa61b97902a 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -142,7 +142,7 @@ int __kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t nr, LIST_HEAD(slab_root_caches); -static void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) +void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) { if (root) { s->local_cache_info.root_cache = root; @@ -194,9 +194,6 @@ void 
set_nonsensitive_cache_params(struct kmem_cache *s) #else -static inline -void init_local_cache_info(struct kmem_cache *s, struct kmem_cache *root) { } - static inline void cleanup_local_cache_info(struct kmem_cache *s) { } #endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ @@ -644,6 +641,34 @@ int kmem_cache_shrink(struct kmem_cache *cachep) } EXPORT_SYMBOL(kmem_cache_shrink); +/** + * kmem_cache_shrink_all - shrink a cache and all child caches for root cache + * @s: The cache pointer + */ +void kmem_cache_shrink_all(struct kmem_cache *s) +{ + struct kmem_cache *c; + + if (!static_asi_enabled() || !is_root_cache(s)) { + kmem_cache_shrink(s); + return; + } + + kasan_cache_shrink(s); + __kmem_cache_shrink(s); + + /* + * We have to take the slab_mutex to protect from the child cache list + * modification. + */ + mutex_lock(&slab_mutex); + for_each_child_cache(c, s) { + kasan_cache_shrink(c); + __kmem_cache_shrink(c); + } + mutex_unlock(&slab_mutex); +} + bool slab_is_available(void) { return slab_state >= UP; diff --git a/mm/slub.c b/mm/slub.c index abe7db581d68..df0191f8b0e2 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -289,6 +289,21 @@ static void debugfs_slab_add(struct kmem_cache *); static inline void debugfs_slab_add(struct kmem_cache *s) { } #endif +#if defined(CONFIG_SYSFS) && defined(CONFIG_ADDRESS_SPACE_ISOLATION) +static void propagate_slab_attrs_from_parent(struct kmem_cache *s); +static void propagate_slab_attr_to_children(struct kmem_cache *s, + struct attribute *attr, + const char *buf, size_t len); +#else +static inline void propagate_slab_attrs_from_parent(struct kmem_cache *s) { } + +static inline +void propagate_slab_attr_to_children(struct kmem_cache *s, + struct attribute *attr, + const char *buf, size_t len) +{ } +#endif + static inline void stat(const struct kmem_cache *s, enum stat_item si) { #ifdef CONFIG_SLUB_STATS @@ -2015,6 +2030,7 @@ static void __free_slab(struct kmem_cache *s, struct page *page) if (current->reclaim_state) current->reclaim_state->reclaimed_slab += pages; unaccount_slab_page(page, order, s); + restore_page_nonsensitive_metadata(page, s); __free_pages(page, order); } @@ -4204,6 +4220,8 @@ static int kmem_cache_open(struct kmem_cache *s, slab_flags_t flags) } } + set_nonsensitive_cache_params(s); + #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \ defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE) if (system_has_cmpxchg_double() && (s->flags & SLAB_NO_CMPXCHG) == 0) @@ -4797,6 +4815,10 @@ static struct kmem_cache * __init bootstrap(struct kmem_cache *static_cache) #endif } list_add(&s->list, &slab_caches); + init_local_cache_info(s, NULL); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + list_del(&static_cache->root_caches_node); +#endif return s; } @@ -4863,7 +4885,7 @@ struct kmem_cache * __kmem_cache_alias(const char *name, unsigned int size, unsigned int align, slab_flags_t flags, void (*ctor)(void *)) { - struct kmem_cache *s; + struct kmem_cache *s, *c; s = find_mergeable(size, align, flags, name, ctor); if (s) { @@ -4876,6 +4898,11 @@ __kmem_cache_alias(const char *name, unsigned int size, unsigned int align, s->object_size = max(s->object_size, size); s->inuse = max(s->inuse, ALIGN(size, sizeof(void *))); + for_each_child_cache(c, s) { + c->object_size = s->object_size; + c->inuse = max(c->inuse, ALIGN(size, sizeof(void *))); + } + if (sysfs_slab_alias(s, name)) { s->refcount--; s = NULL; @@ -4889,6 +4916,9 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags) { int err; + if (!static_asi_enabled()) + flags &= ~SLAB_NONSENSITIVE; + err = kmem_cache_open(s, 
flags); if (err) return err; @@ -4897,6 +4927,8 @@ int __kmem_cache_create(struct kmem_cache *s, slab_flags_t flags) if (slab_state <= UP) return 0; + propagate_slab_attrs_from_parent(s); + err = sysfs_slab_add(s); if (err) { __kmem_cache_release(s); @@ -5619,7 +5651,7 @@ static ssize_t shrink_store(struct kmem_cache *s, const char *buf, size_t length) { if (buf[0] == '1') - kmem_cache_shrink(s); + kmem_cache_shrink_all(s); else return -EINVAL; return length; @@ -5829,6 +5861,87 @@ static ssize_t slab_attr_show(struct kobject *kobj, return err; } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + +static void propagate_slab_attrs_from_parent(struct kmem_cache *s) +{ + int i; + char *buffer = NULL; + struct kmem_cache *root_cache; + + if (is_root_cache(s)) + return; + + root_cache = s->local_cache_info.root_cache; + + /* + * This mean this cache had no attribute written. Therefore, no point + * in copying default values around + */ + if (!root_cache->max_attr_size) + return; + + for (i = 0; i < ARRAY_SIZE(slab_attrs); i++) { + char mbuf[64]; + char *buf; + struct slab_attribute *attr = to_slab_attr(slab_attrs[i]); + ssize_t len; + + if (!attr || !attr->store || !attr->show) + continue; + + /* + * It is really bad that we have to allocate here, so we will + * do it only as a fallback. If we actually allocate, though, + * we can just use the allocated buffer until the end. + * + * Most of the slub attributes will tend to be very small in + * size, but sysfs allows buffers up to a page, so they can + * theoretically happen. + */ + if (buffer) { + buf = buffer; + } else if (root_cache->max_attr_size < ARRAY_SIZE(mbuf) && + !IS_ENABLED(CONFIG_SLUB_STATS)) { + buf = mbuf; + } else { + buffer = (char *)get_zeroed_page(GFP_KERNEL); + if (WARN_ON(!buffer)) + continue; + buf = buffer; + } + + len = attr->show(root_cache, buf); + if (len > 0) + attr->store(s, buf, len); + } + + if (buffer) + free_page((unsigned long)buffer); +} + +static void propagate_slab_attr_to_children(struct kmem_cache *s, + struct attribute *attr, + const char *buf, size_t len) +{ + struct kmem_cache *c; + struct slab_attribute *attribute = to_slab_attr(attr); + + if (static_asi_enabled()) { + mutex_lock(&slab_mutex); + + if (s->max_attr_size < len) + s->max_attr_size = len; + + for_each_child_cache(c, s) + attribute->store(c, buf, len); + + mutex_unlock(&slab_mutex); + } +} + +#endif + static ssize_t slab_attr_store(struct kobject *kobj, struct attribute *attr, const char *buf, size_t len) @@ -5844,6 +5957,27 @@ static ssize_t slab_attr_store(struct kobject *kobj, return -EIO; err = attribute->store(s, buf, len); + + /* + * This is a best effort propagation, so this function's return + * value will be determined by the parent cache only. This is + * basically because not all attributes will have a well + * defined semantics for rollbacks - most of the actions will + * have permanent effects. + * + * Returning the error value of any of the children that fail + * is not 100 % defined, in the sense that users seeing the + * error code won't be able to know anything about the state of + * the cache. + * + * Only returning the error code for the parent cache at least + * has well defined semantics. The cache being written to + * directly either failed or succeeded, in which case we loop + * through the descendants with best-effort propagation. 
+ */ + if (slab_state >= FULL && err >= 0 && is_root_cache(s)) + propagate_slab_attr_to_children(s, attr, buf, len); + return err; } @@ -5866,7 +6000,7 @@ static struct kset *slab_kset; static inline struct kset *cache_kset(struct kmem_cache *s) { - return slab_kset; + return is_root_cache(s) ? slab_kset : NULL; } #define ID_STR_LENGTH 64 diff --git a/security/Kconfig b/security/Kconfig index 070a948b5266..a5cfb09352b0 100644 --- a/security/Kconfig +++ b/security/Kconfig @@ -68,7 +68,8 @@ config PAGE_TABLE_ISOLATION config ADDRESS_SPACE_ISOLATION bool "Allow code to run with a reduced kernel address space" default n - depends on X86_64 && !UML && SLAB && !NEED_PER_CPU_KM + depends on X86_64 && !UML && !NEED_PER_CPU_KM + depends on SLAB || SLUB depends on !PARAVIRT depends on !MEMORY_HOTPLUG help From patchwork Wed Feb 23 05:22:08 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756392 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8AA8FC433EF for ; Wed, 23 Feb 2022 05:24:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B53BC8D0006; Wed, 23 Feb 2022 00:24:58 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B03058D0001; Wed, 23 Feb 2022 00:24:58 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9561C8D0006; Wed, 23 Feb 2022 00:24:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0120.hostedemail.com [216.40.44.120]) by kanga.kvack.org (Postfix) with ESMTP id 871F28D0001 for ; Wed, 23 Feb 2022 00:24:58 -0500 (EST) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 50873181BC43F for ; Wed, 23 Feb 2022 05:24:58 +0000 (UTC) X-FDA: 79172905476.16.86A9839 Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf04.hostedemail.com (Postfix) with ESMTP id 7222C40004 for ; Wed, 23 Feb 2022 05:24:57 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id l3-20020a25ad43000000b0062462e2af34so11457911ybe.17 for ; Tue, 22 Feb 2022 21:24:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=8mE7Rl7hvX81PshmRchAKXUw+5mI1RoqZfuGnjfnK/Y=; b=nQn+PHDiNT2rMQDvqUgnDcQMr757vUw1xTfeWksvd/uBs+Y7jycIPiRL7S70urPZ7R 8fkcrVfE/buziQatz1VzPrbZiKkVL88UDR47kWH9EmA5cOthED7jMA11HUYJo4xm65uX LbfLzZbh1vmkmQVGPzCZHhrvQkG2yMMs6jlVlPGs+f1MYI9eWUOQGRHF/yHWkKAQuG9+ 90/ZWnlFIBlEl9DgOrfwf+XgiBD2V3yGyekMz3rvqGQAKX6yRkMkTlYi4dCG11i6F1bh Phi+HrSt7E6OdAajjE7T2WGE+PvAmrOCFwLD3vGl5nSuUdeg1Ly1y8XU57nLBdCyfJFb Mrkw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=8mE7Rl7hvX81PshmRchAKXUw+5mI1RoqZfuGnjfnK/Y=; b=ND4M0U7Iz/HjsBjjNnbcWtvsxcivEED2jiC91SMuicuJ8LZgT8wY7b5tQWbH2zKO5d MsQ0qC4MiDLaqh0cNNTW2ba4DGh4Sm4SCFP/fYkJFwFZzOZDFaMPd5oUJ0GQ9Q03drB9 BRMTgvSszRcxqAXXu+l/nnZlcB5rTxVF6OUwuHDpq76gqzutMh7yqwcTYJz3iVdXrXg5 DHrgIS0uYOBfMrridh63RAlsKhtyjJVPAI/+C9KGwNZyFWY4566Nzi1SBTwtXJr85/Qx fMD6oUFTx5my5dAlLr2ndVx+gmOiR9F4L6V2T8pbdBPBw8fVocIvTjdJeVg6x05RUvPI 
Date: Tue, 22 Feb 2022 21:22:08 -0800
In-Reply-To: <20220223052223.1202152-1-junaids@google.com>
Message-Id: <20220223052223.1202152-33-junaids@google.com>
Mime-Version: 1.0
References: <20220223052223.1202152-1-junaids@google.com>
X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 32/47] x86: asi: Allocate FPU state separately when ASI is enabled.
From: Junaid Shahid
To: linux-kernel@vger.kernel.org
Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org

We are going to be mapping the task_struct in the restricted ASI address space. However, the task_struct also contains the FPU register state embedded inside it, which can contain sensitive information. So when ASI is enabled, always allocate the FPU state from a separate slab cache to keep it out of task_struct.
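The net effect on where the register image lives can be summarized with this condensed sketch (editorial addition, illustrative only, distilled from the fpstate_reset() and fpu_clone() hunks below; "tsk" stands for the task whose FPU state is being set up):

	struct fpu *fpu = &tsk->thread.fpu;

	if (!cpu_feature_enabled(X86_FEATURE_ASI)) {
		/* As before: the register state stays embedded in task_struct. */
		fpu->fpstate = &fpu->__fpstate;
	} else {
		/*
		 * With ASI: the register image comes from the dedicated
		 * "fpstate" cache created in fpstate_cache_init(), allocated
		 * on the same node as the task_struct.
		 */
		fpu->fpstate = kmem_cache_alloc_node(fpstate_cachep, GFP_KERNEL,
						     page_to_nid(virt_to_page(tsk)));
	}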
Signed-off-by: Junaid Shahid --- arch/x86/include/asm/fpu/api.h | 1 + arch/x86/kernel/fpu/core.c | 45 ++++++++++++++++++++++++++++++++-- arch/x86/kernel/fpu/init.c | 7 ++++-- arch/x86/kernel/fpu/internal.h | 1 + arch/x86/kernel/fpu/xstate.c | 21 +++++++++++++--- arch/x86/kernel/process.c | 7 +++++- 6 files changed, 74 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index c2767a6a387e..6f5ca3c2ef4a 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -112,6 +112,7 @@ extern void fpu__init_cpu(void); extern void fpu__init_system(struct cpuinfo_x86 *c); extern void fpu__init_check_bugs(void); extern void fpu__resume_cpu(void); +extern void fpstate_cache_init(void); #ifdef CONFIG_MATH_EMULATION extern void fpstate_init_soft(struct swregs_state *soft); diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index 8ea306b1bf8e..d7859573973d 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -59,6 +59,8 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu); */ DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx); +struct kmem_cache *fpstate_cachep; + static bool kernel_fpu_disabled(void) { return this_cpu_read(in_kernel_fpu); @@ -443,7 +445,9 @@ static void __fpstate_reset(struct fpstate *fpstate) void fpstate_reset(struct fpu *fpu) { /* Set the fpstate pointer to the default fpstate */ - fpu->fpstate = &fpu->__fpstate; + if (!cpu_feature_enabled(X86_FEATURE_ASI)) + fpu->fpstate = &fpu->__fpstate; + __fpstate_reset(fpu->fpstate); /* Initialize the permission related info in fpu */ @@ -464,6 +468,26 @@ static inline void fpu_inherit_perms(struct fpu *dst_fpu) } } +void fpstate_cache_init(void) +{ + if (cpu_feature_enabled(X86_FEATURE_ASI)) { + size_t fpstate_size; + + /* TODO: Is the ALIGN-64 really needed? */ + fpstate_size = fpu_kernel_cfg.default_size + + ALIGN(offsetof(struct fpstate, regs), 64); + + fpstate_cachep = kmem_cache_create_usercopy( + "fpstate", + fpstate_size, + __alignof__(struct fpstate), + SLAB_PANIC | SLAB_ACCOUNT, + offsetof(struct fpstate, regs), + fpu_kernel_cfg.default_size, + NULL); + } +} + /* Clone current's FPU state on fork */ int fpu_clone(struct task_struct *dst, unsigned long clone_flags) { @@ -473,6 +497,22 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags) /* The new task's FPU state cannot be valid in the hardware. */ dst_fpu->last_cpu = -1; + if (cpu_feature_enabled(X86_FEATURE_ASI)) { + dst_fpu->fpstate = kmem_cache_alloc_node( + fpstate_cachep, GFP_KERNEL, + page_to_nid(virt_to_page(dst))); + if (!dst_fpu->fpstate) + return -ENOMEM; + + /* + * TODO: We may be able to skip the copy since the registers are + * restored below anyway. + */ + memcpy(dst_fpu->fpstate, src_fpu->fpstate, + fpu_kernel_cfg.default_size + + offsetof(struct fpstate, regs)); + } + fpstate_reset(dst_fpu); if (!cpu_feature_enabled(X86_FEATURE_FPU)) @@ -531,7 +571,8 @@ int fpu_clone(struct task_struct *dst, unsigned long clone_flags) void fpu_thread_struct_whitelist(unsigned long *offset, unsigned long *size) { *offset = offsetof(struct thread_struct, fpu.__fpstate.regs); - *size = fpu_kernel_cfg.default_size; + *size = cpu_feature_enabled(X86_FEATURE_ASI) + ? 
0 : fpu_kernel_cfg.default_size; } /* diff --git a/arch/x86/kernel/fpu/init.c b/arch/x86/kernel/fpu/init.c index 621f4b6cac4a..8b722bf98135 100644 --- a/arch/x86/kernel/fpu/init.c +++ b/arch/x86/kernel/fpu/init.c @@ -161,9 +161,11 @@ static void __init fpu__init_task_struct_size(void) /* * Add back the dynamically-calculated register state - * size. + * size, except when ASI is enabled, since in that case + * the FPU state is always allocated dynamically. */ - task_size += fpu_kernel_cfg.default_size; + if (!cpu_feature_enabled(X86_FEATURE_ASI)) + task_size += fpu_kernel_cfg.default_size; /* * We dynamically size 'struct fpu', so we require that @@ -223,6 +225,7 @@ static void __init fpu__init_init_fpstate(void) */ void __init fpu__init_system(struct cpuinfo_x86 *c) { + current->thread.fpu.fpstate = ¤t->thread.fpu.__fpstate; fpstate_reset(¤t->thread.fpu); fpu__init_system_early_generic(c); diff --git a/arch/x86/kernel/fpu/internal.h b/arch/x86/kernel/fpu/internal.h index dbdb31f55fc7..30acc7d0cb1a 100644 --- a/arch/x86/kernel/fpu/internal.h +++ b/arch/x86/kernel/fpu/internal.h @@ -3,6 +3,7 @@ #define __X86_KERNEL_FPU_INTERNAL_H extern struct fpstate init_fpstate; +extern struct kmem_cache *fpstate_cachep; /* CPU feature check wrappers */ static __always_inline __pure bool use_xsave(void) diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c index d28829403ed0..96d12f351f19 100644 --- a/arch/x86/kernel/fpu/xstate.c +++ b/arch/x86/kernel/fpu/xstate.c @@ -13,6 +13,7 @@ #include #include #include +#include #include #include @@ -1495,8 +1496,15 @@ arch_initcall(xfd_update_static_branch) void fpstate_free(struct fpu *fpu) { - if (fpu->fpstate && fpu->fpstate != &fpu->__fpstate) - vfree(fpu->fpstate); + WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_ASI) && + fpu->fpstate == &fpu->__fpstate); + + if (fpu->fpstate && fpu->fpstate != &fpu->__fpstate) { + if (fpu->fpstate->is_valloc) + vfree(fpu->fpstate); + else + kmem_cache_free(fpstate_cachep, fpu->fpstate); + } } /** @@ -1574,7 +1582,14 @@ static int fpstate_realloc(u64 xfeatures, unsigned int ksize, fpregs_unlock(); - vfree(curfps); + WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_ASI) && !curfps); + if (curfps) { + if (curfps->is_valloc) + vfree(curfps); + else + kmem_cache_free(fpstate_cachep, curfps); + } + return 0; } diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index c8d4a00a4de7..f9bd1c3415d4 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -80,6 +80,11 @@ EXPORT_PER_CPU_SYMBOL(cpu_tss_rw); DEFINE_PER_CPU(bool, __tss_limit_invalid); EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid); +void __init arch_task_cache_init(void) +{ + fpstate_cache_init(); +} + /* * this gets called so that we can store lazy state into memory and copy the * current task into the new thread. 
@@ -101,7 +106,7 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) #ifdef CONFIG_X86_64 void arch_release_task_struct(struct task_struct *tsk) { - if (fpu_state_size_dynamic()) + if (fpu_state_size_dynamic() || cpu_feature_enabled(X86_FEATURE_ASI)) fpstate_free(&tsk->thread.fpu); } #endif From patchwork Wed Feb 23 05:22:09 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756393 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 859ACC433F5 for ; Wed, 23 Feb 2022 05:25:01 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C408E8D0007; Wed, 23 Feb 2022 00:25:00 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BEDD68D0001; Wed, 23 Feb 2022 00:25:00 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A1C668D0007; Wed, 23 Feb 2022 00:25:00 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0207.hostedemail.com [216.40.44.207]) by kanga.kvack.org (Postfix) with ESMTP id 8C0BE8D0001 for ; Wed, 23 Feb 2022 00:25:00 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 4E1F598C2F for ; Wed, 23 Feb 2022 05:25:00 +0000 (UTC) X-FDA: 79172905560.11.7CF1F1F Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf26.hostedemail.com (Postfix) with ESMTP id BCAB8140002 for ; Wed, 23 Feb 2022 05:24:59 +0000 (UTC) Received: by mail-yw1-f202.google.com with SMTP id 00721157ae682-2d6baed6aafso144148147b3.3 for ; Tue, 22 Feb 2022 21:24:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=vfUNGUzwKPwc6P8NLNU+G63bl3V3MlYghQJcmtOSBSE=; b=C/GDnUFZL4LltMsvS5FMOZ/XDx/Be18sR4fVDAdJWkBYRmDcKo9NUb1We9xS3LmtRX yqMgXlP8N7xZF8Ux9+U55HvDcm/rY0cp41ABtNZNYkR8wwwnlp5Tc/Z+WHg/tqCClya8 drGn7Okq0DGr0wllM266iYWMJ6XzagvRVWp/SuheBkEQDwBl5n0u5tK2RWK5glZ9ZfVv eEvwT9dopjeCYjYsqzOL8esps1cNs0pAqvlGBd0CeJIX0vdenIhUxI0GrChVSsd4iQJA j46UlOr9zWrPPZy9HcKzVHRGC9BZ83U6lLLZDxYZVj1AeRVrqyWEuBaLgII5jGj/PpWl RKzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=vfUNGUzwKPwc6P8NLNU+G63bl3V3MlYghQJcmtOSBSE=; b=V1EFr6lcMz5pXR+F70VkUuoMz7QBZDZH/5AAQfYecfFbhe792SP0igFl1nart+IgmK xuO98o2B0izkX1luuzfQ5KJjEC9gQ5ELmnF9Q2NEAO80bipPJlgnVv0qnnmxw/wludus mnT1U7e/0YpsdouOIIY2mn6vOYafM3Ee2iaX0QB2/NJILUbOe0kms99B2mDbF0r+uT0V T5mW1qdzPtUeJBmWpqtxRpZmRHZ8ahD63CCpIT959hRjLD6cale4RdEbQx0uja3vZpCa 634l7x1clubIzYRR2ahoeyPeO7f48SlPHVAzavEwABSsHlYJfShR7ypWRAghQvOxC8zz tOoQ== X-Gm-Message-State: AOAM533sQTiHDQylJxhOIBaVvFPNllMHdL9brsX5IuS5AeZ1qRc98mVx ks/Zdi9wsdb3ORwffWrY4DrMv9m2brpm X-Google-Smtp-Source: ABdhPJzDIIPPMetTAHkgCVvjLqBOhwyQ9fOAQZEm98thqfQ3nuARy106JiYU5crLZ1jrdqAZ9ScyawbA4X0V X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:945:0:b0:2ca:287c:6cf3 with SMTP id 66-20020a810945000000b002ca287c6cf3mr26007076ywj.408.1645593899054; Tue, 22 Feb 2022 21:24:59 -0800 (PST) Date: 
Tue, 22 Feb 2022 21:22:09 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-34-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 33/47] kvm: asi: Map guest memory into restricted ASI address space From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: BCAB8140002 X-Rspam-User: Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="C/GDnUFZ"; spf=pass (imf26.hostedemail.com: domain of 3K8UVYgcKCCQJUNAIDSGOOGLE.COMLINUX-MMKVACK.ORG@flex--junaids.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3K8UVYgcKCCQJUNAIDSGOOGLE.COMLINUX-MMKVACK.ORG@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: yuamojy7p6rpzcm9bkwx989pwzz1rzs4 X-HE-Tag: 1645593899-715126 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: A module parameter treat_all_userspace_as_nonsensitive is added, which if set, maps the entire userspace of the process running the VM into the ASI restricted address space. If the flag is not set (the default), then just the userspace memory mapped into the VM's address space is mapped into the ASI restricted address space. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/kvm_host.h | 2 ++ arch/x86/kvm/mmu.h | 6 ++++ arch/x86/kvm/mmu/mmu.c | 54 +++++++++++++++++++++++++++++++++ arch/x86/kvm/mmu/paging_tmpl.h | 14 +++++++++ arch/x86/kvm/x86.c | 19 +++++++++++- include/linux/kvm_host.h | 3 ++ virt/kvm/kvm_main.c | 7 +++++ 7 files changed, 104 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 98cbd6447e3e..e63a2f244d7b 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -681,6 +681,8 @@ struct kvm_vcpu_arch { struct kvm_mmu_memory_cache mmu_gfn_array_cache; struct kvm_mmu_memory_cache mmu_page_header_cache; + struct asi_pgtbl_pool asi_pgtbl_pool; + /* * QEMU userspace and the guest each have their own FPU state. * In vcpu_run, we switch between the user and guest FPU contexts. 
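A short usage note (editorial addition, not part of the patch): because treat_all_userspace_as_nonsensitive is registered with module_param(..., 0444) in KVM's MMU code, it is read-only at runtime and has to be chosen when the kvm module is loaded, e.g. "modprobe kvm treat_all_userspace_as_nonsensitive=1", or via "kvm.treat_all_userspace_as_nonsensitive=1" on the kernel command line when KVM is built in.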
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 9ae6168d381e..60b84331007d 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -49,6 +49,12 @@ #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP) +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +extern bool treat_all_userspace_as_nonsensitive; +#else +#define treat_all_userspace_as_nonsensitive true +#endif + static __always_inline u64 rsvd_bits(int s, int e) { BUILD_BUG_ON(__builtin_constant_p(e) && __builtin_constant_p(s) && e < s); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index fcdf3f8bb59a..485c0ba3ce8b 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -91,6 +91,11 @@ __MODULE_PARM_TYPE(nx_huge_pages_recovery_period_ms, "uint"); static bool __read_mostly force_flush_and_sync_on_reuse; module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +bool __ro_after_init treat_all_userspace_as_nonsensitive; +module_param(treat_all_userspace_as_nonsensitive, bool, 0444); +#endif + /* * When setting this variable to true it enables Two-Dimensional-Paging * where the hardware walks 2 page tables: @@ -2757,6 +2762,21 @@ static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot, return ret; } +static void asi_map_gfn_range(struct kvm_vcpu *vcpu, + struct kvm_memory_slot *slot, + gfn_t gfn, size_t npages) +{ + int err; + size_t hva = __gfn_to_hva_memslot(slot, gfn); + + err = asi_map_user(vcpu->kvm->asi, (void *)hva, PAGE_SIZE * npages, + &vcpu->arch.asi_pgtbl_pool, slot->userspace_addr, + slot->userspace_addr + slot->npages * PAGE_SIZE); + if (err) + kvm_err("asi_map_user for %lx-%lx failed with code %d", hva, + hva + PAGE_SIZE * npages, err); +} + static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, u64 *start, u64 *end) @@ -2776,6 +2796,9 @@ static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu, if (ret <= 0) return -1; + if (!treat_all_userspace_as_nonsensitive) + asi_map_gfn_range(vcpu, slot, gfn, ret); + for (i = 0; i < ret; i++, gfn++, start++) { mmu_set_spte(vcpu, slot, start, access, gfn, page_to_pfn(pages[i]), NULL); @@ -3980,6 +4003,15 @@ static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, return true; } +static void vcpu_fill_asi_pgtbl_pool(struct kvm_vcpu *vcpu) +{ + int err = asi_fill_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool, + CONFIG_PGTABLE_LEVELS - 1, GFP_KERNEL); + + if (err) + kvm_err("asi_fill_pgtbl_pool failed with code %d", err); +} + /* * Returns true if the page fault is stale and needs to be retried, i.e. if the * root was invalidated by a memslot update or a relevant mmu_notifier fired. 
@@ -4013,6 +4045,7 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu); unsigned long mmu_seq; + bool try_asi_map; int r; fault->gfn = fault->addr >> PAGE_SHIFT; @@ -4038,6 +4071,12 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (handle_abnormal_pfn(vcpu, fault, ACC_ALL, &r)) return r; + try_asi_map = !treat_all_userspace_as_nonsensitive && + !is_noslot_pfn(fault->pfn); + + if (try_asi_map) + vcpu_fill_asi_pgtbl_pool(vcpu); + r = RET_PF_RETRY; if (is_tdp_mmu_fault) @@ -4052,6 +4091,9 @@ static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (r) goto out_unlock; + if (try_asi_map) + asi_map_gfn_range(vcpu, fault->slot, fault->gfn, 1); + if (is_tdp_mmu_fault) r = kvm_tdp_mmu_map(vcpu, fault); else @@ -5584,6 +5626,8 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa; + asi_init_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool); + ret = __kvm_mmu_create(vcpu, &vcpu->arch.guest_mmu); if (ret) return ret; @@ -5713,6 +5757,15 @@ static void kvm_mmu_invalidate_zap_pages_in_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, struct kvm_page_track_notifier_node *node) { + /* + * Currently, we just zap the entire address range, instead of only the + * memslot. So we also just asi_unmap the entire userspace. But in the + * future, if we zap only the range belonging to the memslot, then we + * should also asi_unmap only that range. + */ + if (!treat_all_userspace_as_nonsensitive) + asi_unmap_user(kvm->asi, 0, TASK_SIZE_MAX); + kvm_mmu_zap_all_fast(kvm); } @@ -6194,6 +6247,7 @@ void kvm_mmu_destroy(struct kvm_vcpu *vcpu) free_mmu_pages(&vcpu->arch.root_mmu); free_mmu_pages(&vcpu->arch.guest_mmu); mmu_free_memory_caches(vcpu); + asi_clear_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool); } void kvm_mmu_module_exit(void) diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 708a5d297fe1..193317ad60a4 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -584,6 +584,9 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, if (is_error_pfn(pfn)) return false; + if (!treat_all_userspace_as_nonsensitive) + asi_map_gfn_range(vcpu, slot, gfn, 1); + mmu_set_spte(vcpu, slot, spte, pte_access, gfn, pfn, NULL); kvm_release_pfn_clean(pfn); return true; @@ -836,6 +839,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault int r; unsigned long mmu_seq; bool is_self_change_mapping; + bool try_asi_map; pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code); WARN_ON_ONCE(fault->is_tdp); @@ -890,6 +894,12 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r)) return r; + try_asi_map = !treat_all_userspace_as_nonsensitive && + !is_noslot_pfn(fault->pfn); + + if (try_asi_map) + vcpu_fill_asi_pgtbl_pool(vcpu); + /* * Do not change pte_access if the pfn is a mmio page, otherwise * we will cache the incorrect access into mmio spte. 
@@ -919,6 +929,10 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault r = make_mmu_pages_available(vcpu); if (r) goto out_unlock; + + if (try_asi_map) + asi_map_gfn_range(vcpu, fault->slot, walker.gfn, 1); + r = FNAME(fetch)(vcpu, fault, &walker); kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dd07f677d084..d0df14deae80 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8722,7 +8722,10 @@ int kvm_arch_init(void *opaque) goto out_free_percpu; if (ops->runtime_ops->flush_sensitive_cpu_state) { - r = asi_register_class("KVM", ASI_MAP_STANDARD_NONSENSITIVE, + r = asi_register_class("KVM", + ASI_MAP_STANDARD_NONSENSITIVE | + (treat_all_userspace_as_nonsensitive ? + ASI_MAP_ALL_USERSPACE : 0), &kvm_asi_hooks); if (r < 0) goto out_mmu_exit; @@ -9675,6 +9678,17 @@ void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, apic_address = gfn_to_hva(kvm, APIC_DEFAULT_PHYS_BASE >> PAGE_SHIFT); if (start <= apic_address && apic_address < end) kvm_make_all_cpus_request(kvm, KVM_REQ_APIC_PAGE_RELOAD); + + if (!treat_all_userspace_as_nonsensitive) + asi_unmap_user(kvm->asi, (void *)start, end - start); +} + +void kvm_arch_mmu_notifier_invalidate_range_start(struct kvm *kvm, + unsigned long start, + unsigned long end) +{ + if (!treat_all_userspace_as_nonsensitive) + asi_unmap_user(kvm->asi, (void *)start, end - start); } void kvm_vcpu_reload_apic_access_page(struct kvm_vcpu *vcpu) @@ -11874,6 +11888,9 @@ void kvm_arch_commit_memory_region(struct kvm *kvm, void kvm_arch_flush_shadow_all(struct kvm *kvm) { + if (!treat_all_userspace_as_nonsensitive) + asi_unmap_user(kvm->asi, 0, TASK_SIZE_MAX); + kvm_mmu_zap_all(kvm); } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 9dd63ed21f75..f31f7442eced 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1819,6 +1819,9 @@ static inline long kvm_arch_vcpu_async_ioctl(struct file *filp, void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, unsigned long start, unsigned long end); +void kvm_arch_mmu_notifier_invalidate_range_start(struct kvm *kvm, + unsigned long start, + unsigned long end); #ifdef CONFIG_HAVE_KVM_VCPU_RUN_PID_CHANGE int kvm_arch_vcpu_run_pid_change(struct kvm_vcpu *vcpu); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 72c4e6b39389..e8e9c8588908 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -162,6 +162,12 @@ __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm, { } +__weak void kvm_arch_mmu_notifier_invalidate_range_start(struct kvm *kvm, + unsigned long start, + unsigned long end) +{ +} + bool kvm_is_zone_device_pfn(kvm_pfn_t pfn) { /* @@ -685,6 +691,7 @@ static int kvm_mmu_notifier_invalidate_range_start(struct mmu_notifier *mn, spin_unlock(&kvm->mn_invalidate_lock); __kvm_handle_hva_range(kvm, &hva_range); + kvm_arch_mmu_notifier_invalidate_range_start(kvm, range->start, range->end); return 0; } From patchwork Wed Feb 23 05:22:10 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756394 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9645CC433EF for ; Wed, 23 Feb 2022 05:25:03 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 2C9358D0020; Wed, 23 Feb 2022 00:25:03 -0500 (EST) 
Received: by kanga.kvack.org (Postfix, from userid 40) id 279798D0001; Wed, 23 Feb 2022 00:25:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A9468D0020; Wed, 23 Feb 2022 00:25:02 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0099.hostedemail.com [216.40.44.99]) by kanga.kvack.org (Postfix) with ESMTP id E003B8D0001 for ; Wed, 23 Feb 2022 00:25:02 -0500 (EST) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 9F6CC9D677 for ; Wed, 23 Feb 2022 05:25:02 +0000 (UTC) X-FDA: 79172905644.19.663096A Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf28.hostedemail.com (Postfix) with ESMTP id 3384EC0004 for ; Wed, 23 Feb 2022 05:25:02 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id x1-20020a25a001000000b0061c64ee0196so26708697ybh.9 for ; Tue, 22 Feb 2022 21:25:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=ADygLqIPpXiMqBEcZG+FIIc+2sjjDL7sC0SphzjVQM8=; b=XTSLmsGvhvcUg1TWPppXBzPUQLRac82+QtTaWAJbfENnf+LFsbfGfYAYKSS2y7fP86 xw06QgEtf7pep5PqE6Q1aM+NyxahQ7SmgpjVkx80n1ORA4q/bqa1CVOsjpTMMWmZloKQ uucN4SETymYCsva4W1OHKwzmwTJIk9TZAOYPJ1y+ZPVUmyQQ1PbH7n1l3q9WUPnCWwcM Cb3CxmDXmSaVEz0TvEtPSskpBLRm0AocmFGmy3Znq/tmFUH/ahr88pzMeI+oucZvgSN3 VZ4LlqahzSCUtRhfh8h6InSjlirWGXuPDeznUvG9vYUdMzAnIxg/DQM0QaE6Bs8x3s0G p5Tg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ADygLqIPpXiMqBEcZG+FIIc+2sjjDL7sC0SphzjVQM8=; b=H6N9kLSvpfhsonFP5EkW8Bvn/Bm5OsaXUlmaRhfp7DBPCw2aTt2KYngrOqPoIYu8Vs 2to0VS+WYNqMq2JzLalS8DlUVqfnxmaCSmSaKapha9CoT//76n7k+pTw0brB6MPvtH0L G4q622UODTDFwvFUSXjBtNb4F7eHUJ94/WmQm0tymfq0dmXM6U5zdc0eXF3p5GuxIVhf NVH4CS84UMl7dZuIIQOBK4DRC6+ZkLvzeGJa+LMs0r6+smsigvvfS0ubypMX2Z8/ozi+ fJXJra6eF5Ffu2TpSlodjTu5z7uh02UqZ/SF/SjkrSqRKkkZISBouk/DmroiG9sWsE60 yS7w== X-Gm-Message-State: AOAM532pPLB8m571fuUlG+dcW/w/tz3wED3HsH96PGftHCNhPKBoOgC8 pqv6ZKjoOvhtUdAePbQhpOG21IXLdcLI X-Google-Smtp-Source: ABdhPJxnEaZg6RVeib/qxGX0ultucEuvlDuM1/3KZuHHbAgoAGzQNWHho88GWzU0EQgaqkKLhw8qsZUiJNi/ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:354f:0:b0:2d0:e91f:c26 with SMTP id c76-20020a81354f000000b002d0e91f0c26mr27033178ywa.318.1645593901360; Tue, 22 Feb 2022 21:25:01 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:10 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-35-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 34/47] kvm: asi: Unmap guest memory from ASI address space when using nested virt From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, oweisse@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=XTSLmsGv; dmarc=pass (policy=reject) header.from=google.com; spf=pass 
(imf28.hostedemail.com: domain of 3LcUVYgcKCCYLWPCKFUIQQING.EQONKPWZ-OOMXCEM.QTI@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3LcUVYgcKCCYLWPCKFUIQQING.EQONKPWZ-OOMXCEM.QTI@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 3384EC0004 X-Stat-Signature: sbdgxb5ffy8z5quurff1d1idn4rwgwm3 X-HE-Tag: 1645593902-782031 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: L1 guest memory as a whole cannot be considered non-sensitive when an L2 is running. Even if L1 is using its own mitigations, L2 VM Exits could, in theory, bring into the cache some sensitive L1 memory without L1 getting a chance to flush it. For simplicity, we just unmap the entire L1 memory from the ASI restricted address space when nested virtualization is turned on. Though this is overridden if the treat_all_userspace_as_nonsensitive flag is enabled. In the future, we could potentially map some portions of L1 memory which are known to contain non-sensitive memory, which would reduce ASI overhead during nested virtualization. Note that unmapping the guest memory still leaves a slight hole because L2 could also potentially access copies of L1 VCPU registers stored in L0 kernel structures. In the future, this could be mitigated by having a separate ASI address space for each VCPU and treating the associated structures as locally non-sensitive only within that VCPU's ASI address space. Signed-off-by: Junaid Shahid --- arch/x86/include/asm/kvm_host.h | 6 ++++++ arch/x86/kvm/mmu/mmu.c | 10 ++++++++++ arch/x86/kvm/vmx/nested.c | 22 ++++++++++++++++++++++ 3 files changed, 38 insertions(+) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index e63a2f244d7b..8ba88bbcf895 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1200,6 +1200,12 @@ struct kvm_arch { */ struct list_head tdp_mmu_pages; + /* + * Number of VCPUs that have enabled nested virtualization. + * Currently only maintained when ASI is enabled. + */ + int nested_virt_enabled_count; + /* * Protects accesses to the following fields when the MMU lock * is held in read mode: diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 485c0ba3ce8b..5785a0d02558 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -94,6 +94,7 @@ module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644); #ifdef CONFIG_ADDRESS_SPACE_ISOLATION bool __ro_after_init treat_all_userspace_as_nonsensitive; module_param(treat_all_userspace_as_nonsensitive, bool, 0444); +EXPORT_SYMBOL_GPL(treat_all_userspace_as_nonsensitive); #endif /* @@ -2769,6 +2770,15 @@ static void asi_map_gfn_range(struct kvm_vcpu *vcpu, int err; size_t hva = __gfn_to_hva_memslot(slot, gfn); + /* + * For now, we just don't map any guest memory when using nested + * virtualization. In the future, we could potentially map some + * portions of guest memory which are known to contain only memory + * which would be considered non-sensitive. 
+ */ + if (vcpu->kvm->arch.nested_virt_enabled_count) + return; + err = asi_map_user(vcpu->kvm->asi, (void *)hva, PAGE_SIZE * npages, &vcpu->arch.asi_pgtbl_pool, slot->userspace_addr, slot->userspace_addr + slot->npages * PAGE_SIZE); diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 9c941535f78c..0a0092e4102d 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -318,6 +318,14 @@ static void free_nested(struct kvm_vcpu *vcpu) nested_release_evmcs(vcpu); free_loaded_vmcs(&vmx->nested.vmcs02); + + if (cpu_feature_enabled(X86_FEATURE_ASI) && + !treat_all_userspace_as_nonsensitive) { + write_lock(&vcpu->kvm->mmu_lock); + WARN_ON(vcpu->kvm->arch.nested_virt_enabled_count <= 0); + vcpu->kvm->arch.nested_virt_enabled_count--; + write_unlock(&vcpu->kvm->mmu_lock); + } } /* @@ -4876,6 +4884,20 @@ static int enter_vmx_operation(struct kvm_vcpu *vcpu) pt_update_intercept_for_msr(vcpu); } + if (cpu_feature_enabled(X86_FEATURE_ASI) && + !treat_all_userspace_as_nonsensitive) { + /* + * We do the increment under the MMU lock in order to prevent + * it from happening concurrently with asi_map_gfn_range(). + */ + write_lock(&vcpu->kvm->mmu_lock); + WARN_ON(vcpu->kvm->arch.nested_virt_enabled_count < 0); + vcpu->kvm->arch.nested_virt_enabled_count++; + write_unlock(&vcpu->kvm->mmu_lock); + + asi_unmap_user(vcpu->kvm->asi, 0, TASK_SIZE_MAX); + } + return 0; out_shadow_vmcs: From patchwork Wed Feb 23 05:22:11 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756395 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id B005CC433F5 for ; Wed, 23 Feb 2022 05:25:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4C4178D0008; Wed, 23 Feb 2022 00:25:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 44CC18D0001; Wed, 23 Feb 2022 00:25:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 314068D0008; Wed, 23 Feb 2022 00:25:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0036.hostedemail.com [216.40.44.36]) by kanga.kvack.org (Postfix) with ESMTP id 223A08D0001 for ; Wed, 23 Feb 2022 00:25:05 -0500 (EST) Received: from smtpin15.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id C85DD98C2F for ; Wed, 23 Feb 2022 05:25:04 +0000 (UTC) X-FDA: 79172905728.15.D33C22F Received: from mail-yb1-f202.google.com (mail-yb1-f202.google.com [209.85.219.202]) by imf11.hostedemail.com (Postfix) with ESMTP id 586D040003 for ; Wed, 23 Feb 2022 05:25:04 +0000 (UTC) Received: by mail-yb1-f202.google.com with SMTP id b12-20020a056902030c00b0061d720e274aso26587805ybs.20 for ; Tue, 22 Feb 2022 21:25:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=z2A7zhGCAXM89Mm8qf+swCLr5Dn/epWs5SXVvIpUsek=; b=tRhCbGDitng8H5bHVni9LeLhw+OutL5exoDLV2q/On1vnMgXPuEEjcg+LbKI1cVqrD Vyo7V47FL7/0MDpSopFI7nReIbg7le3SY0oaOVkKdiMGKdDx7pdDSJXZaaKKRg410xu2 t4RPD/m0Rz3825891oGpbAo2zhoskX0LmYLk/s6S6UydDxkiw0PDjI+nunjVplgqDwrA +5Jz1StT9q9LpFCspfDpRycEYKD4vtMQDfcSS06Robo3QGT5pJP4suYnnRPaHrjD0ty7 
l5qUqZO6zEun+3vP4Gykt/1OOPF+J93aOR22CevdnVkqI4Do06KPWQJ6jdbyHleq2VP2 sP7g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=z2A7zhGCAXM89Mm8qf+swCLr5Dn/epWs5SXVvIpUsek=; b=o9Y8ZdPyfS8FT9QoHshlvGrYwD3GVQ12y4WQwgJu/ORUESKiVJGspfZt4f0v79vUGo IxJ3mgiPjNfv7Sh7N6+HxSI4UABAwWXK4SPhinsWBQhjXN4C2MaH0Gt4xPZ4DOYI1/mb Bj0Ka9xdUpu3oRWIXtxpeUUUz31tCby5ZgHjv45mDu8O4UgP6aGNX369S6ERbyphgrn6 NNxaGre/H3yUV2XOoBP3LPHBYiaiGe5hBpPVxw/1SYHgbFczjpRJetAWPC14Bl7Hyc42 2K1MbHYhmtj5lKbRimgAzJ8rhchGFT7zZNu8U+cIARdCBioRoPltaxAs3d3/h0mg4ZZI ZvsQ== X-Gm-Message-State: AOAM531i2gecPDb9e9SmxzoVW167GW5WyFGCCOnam2Z4ptp5xnEfpqco imtHTSZy9rwGIDZ8AgtSaJoBrm1K4DrH X-Google-Smtp-Source: ABdhPJxQh2hzOmUCmk3wlhtqctWGzojDMn80W5x6yZGJyfVoEE9kdvLHRV4rxoztBsJ+fy5+R2LvtwTwKQ0F X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a0d:eb09:0:b0:2d1:e0df:5104 with SMTP id u9-20020a0deb09000000b002d1e0df5104mr27669944ywe.250.1645593903681; Tue, 22 Feb 2022 21:25:03 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:11 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-36-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 35/47] mm: asi: asi_exit() on PF, skip handling if address is accessible From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 586D040003 X-Rspam-User: Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=tRhCbGDi; spf=pass (imf11.hostedemail.com: domain of 3L8UVYgcKCCgNYREMHWKSSKPI.GSQPMRYb-QQOZEGO.SVK@flex--junaids.bounces.google.com designates 209.85.219.202 as permitted sender) smtp.mailfrom=3L8UVYgcKCCgNYREMHWKSSKPI.GSQPMRYb-QQOZEGO.SVK@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: uhmsq7uwng8wzh3pfuqwgmf7stzj71rp X-HE-Tag: 1645593904-920069 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse On a page-fault - do asi_exit(). Then check if now after the exit the address is accessible. We do this by refactoring spurious_kernel_fault() into two parts: 1. Verify that the error code value is something that could arise from a lazy TLB update. 2. Walk the page table and verify permissions, which is now called is_address_accessible_now(). We also define PTE_PRESENT() and PMD_PRESENT() which are suitable for checking userspace pages. For the sake of spurious faualts, pte_present() and pmd_present() are only good for kernelspace pages. This is because these macros might return true even if the present bit is 0 (only relevant for userspace). 
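[ Not part of the commit message: condensed, the fault-handler change below
  amounts to the following, using only names introduced by this patch:

	struct asi *asi = asi_get_current();

	/* Leave the restricted address space before anything else. */
	if (asi)
		asi_exit();

	/*
	 * If the address is accessible in the full address space, the fault
	 * was purely an ASI artifact and there is nothing left to handle.
	 */
	if (current && mm_asi_enabled(current->mm)) {
		pgd_t *pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address);

		if (is_address_accessible_now(error_code, address, pgd))
			return;
	}

  PTE_PRESENT()/PMD_PRESENT() check only the hardware _PAGE_PRESENT bit,
  because pte_present()/pmd_present() also report _PAGE_PROTNONE (and, for
  PMDs, a huge page in the middle of a split) as present. That behavior is
  fine for kernel addresses but wrong when asking whether the hardware can
  access a user address right now. ]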
Signed-off-by: Ofir Weisse --- arch/x86/mm/fault.c | 60 ++++++++++++++++++++++++++++++++++------ include/linux/mm_types.h | 3 ++ 2 files changed, 55 insertions(+), 8 deletions(-) diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 8692eb50f4a5..d08021ba380b 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -982,6 +982,8 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) return 1; } +static int is_address_accessible_now(unsigned long error_code, unsigned long address, + pgd_t *pgd); /* * Handle a spurious fault caused by a stale TLB entry. * @@ -1003,15 +1005,13 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) * See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3 * (Optional Invalidation). */ +/* A spurious fault is also possible when Address Space Isolation (ASI) is in + * use. Specifically, code running withing an ASI domain touched memory outside + * the domain. This access causes a page-fault --> asi_exit() */ static noinline int spurious_kernel_fault(unsigned long error_code, unsigned long address) { pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - pmd_t *pmd; - pte_t *pte; - int ret; /* * Only writes to RO or instruction fetches from NX may cause @@ -1027,6 +1027,37 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return 0; pgd = init_mm.pgd + pgd_index(address); + return is_address_accessible_now(error_code, address, pgd); +} +NOKPROBE_SYMBOL(spurious_kernel_fault); + + +/* Check if an address (kernel or userspace) would cause a page fault if + * accessed now. + * + * For kernel addresses, pte_present and pmd_present are sufficioent. For + * userspace, we must use PTE_PRESENT and PMD_PRESENT, which will only check the + * present bits. + * The existing pmd_present() in arch/x86/include/asm/pgtable.h is misleading. + * The PMD page might be in the middle of split_huge_page with present bit + * clear, but pmd_present will still return true. We are inteerested in knowing + * if the page is accessible to hardware - that is - the present bit is 1. */ +#define PMD_PRESENT(pmd) (pmd_flags(pmd) & _PAGE_PRESENT) + +/* pte_present will return true is _PAGE_PROTNONE is 1. We care if the hardware + * can actually access the page right now. */ +#define PTE_PRESENT(pte) (pte_flags(pte) & _PAGE_PRESENT) + +static noinline int +is_address_accessible_now(unsigned long error_code, unsigned long address, + pgd_t *pgd) +{ + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + int ret; + if (!pgd_present(*pgd)) return 0; @@ -1045,14 +1076,14 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return spurious_kernel_fault_check(error_code, (pte_t *) pud); pmd = pmd_offset(pud, address); - if (!pmd_present(*pmd)) + if (!PMD_PRESENT(*pmd)) return 0; if (pmd_large(*pmd)) return spurious_kernel_fault_check(error_code, (pte_t *) pmd); pte = pte_offset_kernel(pmd, address); - if (!pte_present(*pte)) + if (!PTE_PRESENT(*pte)) return 0; ret = spurious_kernel_fault_check(error_code, pte); @@ -1068,7 +1099,6 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return ret; } -NOKPROBE_SYMBOL(spurious_kernel_fault); int show_unhandled_signals = 1; @@ -1504,6 +1534,20 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) * the fixup on the next page fault. */ struct asi *asi = asi_get_current(); + if (asi) + asi_exit(); + + /* handle_page_fault() might call BUG() if we run it for a kernel + * address. This might be the case if we got here due to an ASI fault. 
+ * We avoid this case by checking whether the address is now, after a + * potential asi_exit(), accessible by hardware. If it is - there's + * nothing to do. + */ + if (current && mm_asi_enabled(current->mm)) { + pgd_t *pgd = (pgd_t*)__va(read_cr3_pa()) + pgd_index(address); + if (is_address_accessible_now(error_code, address, pgd)) + return; + } prefetchw(¤t->mm->mmap_lock); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index c3f209720a84..560909e80841 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -707,6 +707,9 @@ extern struct mm_struct init_mm; #ifdef CONFIG_ADDRESS_SPACE_ISOLATION static inline bool mm_asi_enabled(struct mm_struct *mm) { + if (!mm) + return false; + return mm->asi_enabled; } #else From patchwork Wed Feb 23 05:22:12 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756396 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2D26C433FE for ; Wed, 23 Feb 2022 05:25:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 7FA488D0009; Wed, 23 Feb 2022 00:25:07 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 782258D0001; Wed, 23 Feb 2022 00:25:07 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5FC298D0009; Wed, 23 Feb 2022 00:25:07 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.a.hostedemail.com [64.99.140.24]) by kanga.kvack.org (Postfix) with ESMTP id 3B4138D0001 for ; Wed, 23 Feb 2022 00:25:07 -0500 (EST) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 0752723033 for ; Wed, 23 Feb 2022 05:25:07 +0000 (UTC) X-FDA: 79172905854.05.9E36E0B Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf01.hostedemail.com (Postfix) with ESMTP id 9AF5740002 for ; Wed, 23 Feb 2022 05:25:06 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d0a5824ec6so162115767b3.0 for ; Tue, 22 Feb 2022 21:25:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Bg8ARgAvpxqrVZujUdMBpF3O5GREYgeeic1zpLXlEAs=; b=ZkYcibzizevz7miAILMO787yJLGRxl/cgzlUmAaY/wAepMJq6EHrvI2zYY3SBnMhsX y7RWUVRNbxiaiAE1t5uCyYG9PImJ+/9dR0KGIg93jTvqY0XFQDy403oNMPINt668O0wX lN1x9DKfkMN5phGLIKP2guH1oM8X+oF56FSnIjn/s1bc0sFaDAvUiQYbfWLnDYkAyeoI kOK471SB7zkEDfAJTXNkMVhdXnFBGV5vBVLgIoK40R3LZcAduulDDg2qjj+85oLE8w4w JUmk2whW1jd4H91VlFkZkcbARwnTRCiIuS1DGvK4lzbHyv57ehOmB1Yejn5DI2lnROWR Cvjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Bg8ARgAvpxqrVZujUdMBpF3O5GREYgeeic1zpLXlEAs=; b=JCgISFmuak6g9dy88/N6IG462r4vFpOHM18OsSNZMs1cSVbk5R5sc+gRgNwIeK7wZ6 MVeYtJQLxaAYu+XZDt+Auwqx6ykJpDet/ETN4mN41xYPuhvp2Mxw4IBxOo/9/csEbvGv ZLvK+sPsqywdSEyK+9okqKG5L2OPdP+xyor8nd3SLy9ITknVHwPXgHbVatvxCnc+Cab/ kiTxVrTfNnEXoMfWh+UklRC4gU/17wGpC9RTiVcvHvPyx8hvlcSaEZBm/TZ/7aSq/tLH 3WRRY/6HK15NWo56d2ApeMIo8hQnUPEPG0fGR1u6KoaSanj/IFXa2tmgBXnVU71Sohl6 TTJg== X-Gm-Message-State: 
AOAM533FMbJRBLncKaw2QeKEiFDOdKh7VdWtzEACj2k51cfM5PNNBNlG 5rK7XCvc6L/orn6xbwfDtCLnxflG48wz X-Google-Smtp-Source: ABdhPJxyXiC19n49tKMSkgiaHfTQpZqUfpkjxmpDblETVmyKjxfScd3DY0k63b7nH5huwuEYpTcB2HPTnvtH X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:6993:0:b0:624:55af:336c with SMTP id e141-20020a256993000000b0062455af336cmr19351739ybc.412.1645593905875; Tue, 22 Feb 2022 21:25:05 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:12 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-37-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 36/47] mm: asi: Adding support for dynamic percpu ASI allocations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=ZkYcibzi; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf01.hostedemail.com: domain of 3McUVYgcKCCoPaTGOJYMUUMRK.IUSROTad-SSQbGIQ.UXM@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3McUVYgcKCCoPaTGOJYMUUMRK.IUSROTad-SSQbGIQ.UXM@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 9AF5740002 X-Stat-Signature: xftygpadn1ser6qzewac5brfhjr59wi8 X-HE-Tag: 1645593906-883272 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse Adding infrastructure to support pcpu_alloc with gfp flag of __GFP_GLOBAL_NONSENSITIVE. We use a similar mechanism as the earlier infrastructure for memcg percpu allocations and add pcpu type PCPU_CHUNK_ASI_NONSENSITIVE. pcpu_chunk_list(PCPU_CHUNK_ASI_NONSENSITIVE) will return a list of ASI nonsensitive percpu chunks, allowing most of the code to be unchanged. Signed-off-by: Ofir Weisse --- mm/percpu-internal.h | 23 ++++++- mm/percpu-km.c | 5 +- mm/percpu-vm.c | 6 +- mm/percpu.c | 139 ++++++++++++++++++++++++++++++++++--------- 4 files changed, 141 insertions(+), 32 deletions(-) diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h index 639662c20c82..2fac01114edc 100644 --- a/mm/percpu-internal.h +++ b/mm/percpu-internal.h @@ -5,6 +5,15 @@ #include #include +enum pcpu_chunk_type { + PCPU_CHUNK_ROOT, +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + PCPU_CHUNK_ASI_NONSENSITIVE, +#endif + PCPU_NR_CHUNK_TYPES, + PCPU_FAIL_ALLOC = PCPU_NR_CHUNK_TYPES +}; + /* * pcpu_block_md is the metadata block struct. * Each chunk's bitmap is split into a number of full blocks. 
@@ -59,6 +68,9 @@ struct pcpu_chunk { #ifdef CONFIG_MEMCG_KMEM struct obj_cgroup **obj_cgroups; /* vector of object cgroups */ #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + bool is_asi_nonsensitive; /* ASI nonsensitive chunk */ +#endif int nr_pages; /* # of pages served by this chunk */ int nr_populated; /* # of populated pages */ @@ -68,7 +80,7 @@ struct pcpu_chunk { extern spinlock_t pcpu_lock; -extern struct list_head *pcpu_chunk_lists; +extern struct list_head *pcpu_chunk_lists[PCPU_NR_CHUNK_TYPES]; extern int pcpu_nr_slots; extern int pcpu_sidelined_slot; extern int pcpu_to_depopulate_slot; @@ -113,6 +125,15 @@ static inline int pcpu_chunk_map_bits(struct pcpu_chunk *chunk) return pcpu_nr_pages_to_map_bits(chunk->nr_pages); } +static inline enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk) +{ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (chunk->is_asi_nonsensitive) + return PCPU_CHUNK_ASI_NONSENSITIVE; +#endif + return PCPU_CHUNK_ROOT; +} + #ifdef CONFIG_PERCPU_STATS #include diff --git a/mm/percpu-km.c b/mm/percpu-km.c index fe31aa19db81..01e31bd55860 100644 --- a/mm/percpu-km.c +++ b/mm/percpu-km.c @@ -50,7 +50,8 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, /* nada */ } -static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp) { const int nr_pages = pcpu_group_sizes[0] >> PAGE_SHIFT; struct pcpu_chunk *chunk; @@ -58,7 +59,7 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) unsigned long flags; int i; - chunk = pcpu_alloc_chunk(gfp); + chunk = pcpu_alloc_chunk(type, gfp); if (!chunk) return NULL; diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c index 5579a96ad782..59f3b55abdd1 100644 --- a/mm/percpu-vm.c +++ b/mm/percpu-vm.c @@ -357,7 +357,8 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, pcpu_free_pages(chunk, pages, page_start, page_end); } -static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp) { struct pcpu_chunk *chunk; struct vm_struct **vms; @@ -368,7 +369,8 @@ static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp) gfp &= ~__GFP_GLOBAL_NONSENSITIVE; - chunk = pcpu_alloc_chunk(gfp); + chunk = pcpu_alloc_chunk(type, gfp); + if (!chunk) return NULL; diff --git a/mm/percpu.c b/mm/percpu.c index f5b2c2ea5a54..beaca5adf9d4 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -172,7 +172,7 @@ struct pcpu_chunk *pcpu_reserved_chunk __ro_after_init; DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */ static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map ext */ -struct list_head *pcpu_chunk_lists __ro_after_init; /* chunk list slots */ +struct list_head *pcpu_chunk_lists[PCPU_NR_CHUNK_TYPES] __ro_after_init; /* chunk list slots */ /* chunks which need their map areas extended, protected by pcpu_lock */ static LIST_HEAD(pcpu_map_extend_chunks); @@ -531,10 +531,12 @@ static void __pcpu_chunk_move(struct pcpu_chunk *chunk, int slot, bool move_front) { if (chunk != pcpu_reserved_chunk) { + struct list_head *pcpu_type_lists = + pcpu_chunk_lists[pcpu_chunk_type(chunk)]; if (move_front) - list_move(&chunk->list, &pcpu_chunk_lists[slot]); + list_move(&chunk->list, &pcpu_type_lists[slot]); else - list_move_tail(&chunk->list, &pcpu_chunk_lists[slot]); + list_move_tail(&chunk->list, &pcpu_type_lists[slot]); } } @@ -570,13 +572,16 @@ static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot) static void pcpu_isolate_chunk(struct pcpu_chunk *chunk) 
{ + struct list_head *pcpu_type_lists = + pcpu_chunk_lists[pcpu_chunk_type(chunk)]; + lockdep_assert_held(&pcpu_lock); if (!chunk->isolated) { chunk->isolated = true; pcpu_nr_empty_pop_pages -= chunk->nr_empty_pop_pages; } - list_move(&chunk->list, &pcpu_chunk_lists[pcpu_to_depopulate_slot]); + list_move(&chunk->list, &pcpu_type_lists[pcpu_to_depopulate_slot]); } static void pcpu_reintegrate_chunk(struct pcpu_chunk *chunk) @@ -1438,7 +1443,8 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr, return chunk; } -static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp) +static struct pcpu_chunk *pcpu_alloc_chunk(enum pcpu_chunk_type type, + gfp_t gfp) { struct pcpu_chunk *chunk; int region_bits; @@ -1475,6 +1481,13 @@ static struct pcpu_chunk *pcpu_alloc_chunk(gfp_t gfp) goto objcg_fail; } #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* TODO: (oweisse) do asi_map for nonsensitive chunks */ + if (type == PCPU_CHUNK_ASI_NONSENSITIVE) + chunk->is_asi_nonsensitive = true; + else + chunk->is_asi_nonsensitive = false; +#endif pcpu_init_md_blocks(chunk); @@ -1580,7 +1593,8 @@ static void pcpu_depopulate_chunk(struct pcpu_chunk *chunk, int page_start, int page_end); static void pcpu_post_unmap_tlb_flush(struct pcpu_chunk *chunk, int page_start, int page_end); -static struct pcpu_chunk *pcpu_create_chunk(gfp_t gfp); +static struct pcpu_chunk *pcpu_create_chunk(enum pcpu_chunk_type type, + gfp_t gfp); static void pcpu_destroy_chunk(struct pcpu_chunk *chunk); static struct page *pcpu_addr_to_page(void *addr); static int __init pcpu_verify_alloc_info(const struct pcpu_alloc_info *ai); @@ -1733,6 +1747,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, unsigned long flags; void __percpu *ptr; size_t bits, bit_align; + enum pcpu_chunk_type type; + struct list_head *pcpu_type_lists; gfp = current_gfp_context(gfp); /* whitelisted flags that can be passed to the backing allocators */ @@ -1763,6 +1779,16 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, if (unlikely(!pcpu_memcg_pre_alloc_hook(size, gfp, &objcg))) return NULL; + type = PCPU_CHUNK_ROOT; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (static_asi_enabled() && (gfp & __GFP_GLOBAL_NONSENSITIVE)) { + type = PCPU_CHUNK_ASI_NONSENSITIVE; + pcpu_gfp |= __GFP_GLOBAL_NONSENSITIVE; + } +#endif + pcpu_type_lists = pcpu_chunk_lists[type]; + BUG_ON(!pcpu_type_lists); + if (!is_atomic) { /* * pcpu_balance_workfn() allocates memory under this mutex, @@ -1800,7 +1826,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, restart: /* search through normal chunks */ for (slot = pcpu_size_to_slot(size); slot <= pcpu_free_slot; slot++) { - list_for_each_entry_safe(chunk, next, &pcpu_chunk_lists[slot], + list_for_each_entry_safe(chunk, next, &pcpu_type_lists[slot], list) { off = pcpu_find_block_fit(chunk, bits, bit_align, is_atomic); @@ -1830,8 +1856,8 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, goto fail; } - if (list_empty(&pcpu_chunk_lists[pcpu_free_slot])) { - chunk = pcpu_create_chunk(pcpu_gfp); + if (list_empty(&pcpu_type_lists[pcpu_free_slot])) { + chunk = pcpu_create_chunk(type, pcpu_gfp); if (!chunk) { err = "failed to allocate new chunk"; goto fail; @@ -1983,12 +2009,19 @@ void __percpu *__alloc_reserved_percpu(size_t size, size_t align) * CONTEXT: * pcpu_lock (can be dropped temporarily) */ -static void pcpu_balance_free(bool empty_only) + +static void __pcpu_balance_free(bool empty_only, + enum pcpu_chunk_type 
type) { LIST_HEAD(to_free); - struct list_head *free_head = &pcpu_chunk_lists[pcpu_free_slot]; + struct list_head *pcpu_type_lists = pcpu_chunk_lists[type]; + struct list_head *free_head; struct pcpu_chunk *chunk, *next; + if (!pcpu_type_lists) + return; + free_head = &pcpu_type_lists[pcpu_free_slot]; + lockdep_assert_held(&pcpu_lock); /* @@ -2026,6 +2059,14 @@ static void pcpu_balance_free(bool empty_only) spin_lock_irq(&pcpu_lock); } +static void pcpu_balance_free(bool empty_only) +{ + enum pcpu_chunk_type type; + for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) { + __pcpu_balance_free(empty_only, type); + } +} + /** * pcpu_balance_populated - manage the amount of populated pages * @@ -2038,12 +2079,21 @@ static void pcpu_balance_free(bool empty_only) * CONTEXT: * pcpu_lock (can be dropped temporarily) */ -static void pcpu_balance_populated(void) +static void __pcpu_balance_populated(enum pcpu_chunk_type type) { /* gfp flags passed to underlying allocators */ - const gfp_t gfp = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN; + const gfp_t gfp = GFP_KERNEL | __GFP_NORETRY | __GFP_NOWARN +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + | (type == PCPU_CHUNK_ASI_NONSENSITIVE ? + __GFP_GLOBAL_NONSENSITIVE : 0) +#endif + ; struct pcpu_chunk *chunk; int slot, nr_to_pop, ret; + struct list_head *pcpu_type_lists = pcpu_chunk_lists[type]; + + if (!pcpu_type_lists) + return; lockdep_assert_held(&pcpu_lock); @@ -2074,7 +2124,7 @@ static void pcpu_balance_populated(void) if (!nr_to_pop) break; - list_for_each_entry(chunk, &pcpu_chunk_lists[slot], list) { + list_for_each_entry(chunk, &pcpu_type_lists[slot], list) { nr_unpop = chunk->nr_pages - chunk->nr_populated; if (nr_unpop) break; @@ -2107,7 +2157,7 @@ static void pcpu_balance_populated(void) if (nr_to_pop) { /* ran out of chunks to populate, create a new one and retry */ spin_unlock_irq(&pcpu_lock); - chunk = pcpu_create_chunk(gfp); + chunk = pcpu_create_chunk(type, gfp); cond_resched(); spin_lock_irq(&pcpu_lock); if (chunk) { @@ -2117,6 +2167,14 @@ static void pcpu_balance_populated(void) } } +static void pcpu_balance_populated() +{ + enum pcpu_chunk_type type; + + for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) + __pcpu_balance_populated(type); +} + /** * pcpu_reclaim_populated - scan over to_depopulate chunks and free empty pages * @@ -2132,13 +2190,19 @@ static void pcpu_balance_populated(void) * pcpu_lock (can be dropped temporarily) * */ -static void pcpu_reclaim_populated(void) + + +static void __pcpu_reclaim_populated(enum pcpu_chunk_type type) { struct pcpu_chunk *chunk; struct pcpu_block_md *block; int freed_page_start, freed_page_end; int i, end; bool reintegrate; + struct list_head *pcpu_type_lists = pcpu_chunk_lists[type]; + + if (!pcpu_type_lists) + return; lockdep_assert_held(&pcpu_lock); @@ -2148,8 +2212,8 @@ static void pcpu_reclaim_populated(void) * other accessor is the free path which only returns area back to the * allocator not touching the populated bitmap. 
*/ - while (!list_empty(&pcpu_chunk_lists[pcpu_to_depopulate_slot])) { - chunk = list_first_entry(&pcpu_chunk_lists[pcpu_to_depopulate_slot], + while (!list_empty(&pcpu_type_lists[pcpu_to_depopulate_slot])) { + chunk = list_first_entry(&pcpu_type_lists[pcpu_to_depopulate_slot], struct pcpu_chunk, list); WARN_ON(chunk->immutable); @@ -2219,10 +2283,18 @@ static void pcpu_reclaim_populated(void) pcpu_reintegrate_chunk(chunk); else list_move_tail(&chunk->list, - &pcpu_chunk_lists[pcpu_sidelined_slot]); + &pcpu_type_lists[pcpu_sidelined_slot]); } } +static void pcpu_reclaim_populated(void) +{ + enum pcpu_chunk_type type; + for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) { + __pcpu_reclaim_populated(type); + } +} + /** * pcpu_balance_workfn - manage the amount of free chunks and populated pages * @work: unused @@ -2268,6 +2340,7 @@ void free_percpu(void __percpu *ptr) unsigned long flags; int size, off; bool need_balance = false; + struct list_head *pcpu_type_lists = NULL; if (!ptr) return; @@ -2280,6 +2353,8 @@ void free_percpu(void __percpu *ptr) chunk = pcpu_chunk_addr_search(addr); off = addr - chunk->base_addr; + pcpu_type_lists = pcpu_chunk_lists[pcpu_chunk_type(chunk)]; + BUG_ON(!pcpu_type_lists); size = pcpu_free_area(chunk, off); @@ -2293,7 +2368,7 @@ void free_percpu(void __percpu *ptr) if (!chunk->isolated && chunk->free_bytes == pcpu_unit_size) { struct pcpu_chunk *pos; - list_for_each_entry(pos, &pcpu_chunk_lists[pcpu_free_slot], list) + list_for_each_entry(pos, &pcpu_type_lists[pcpu_free_slot], list) if (pos != chunk) { need_balance = true; break; @@ -2601,6 +2676,7 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai, int map_size; unsigned long tmp_addr; size_t alloc_size; + enum pcpu_chunk_type type; #define PCPU_SETUP_BUG_ON(cond) do { \ if (unlikely(cond)) { \ @@ -2723,15 +2799,24 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai, pcpu_free_slot = pcpu_sidelined_slot + 1; pcpu_to_depopulate_slot = pcpu_free_slot + 1; pcpu_nr_slots = pcpu_to_depopulate_slot + 1; - pcpu_chunk_lists = memblock_alloc(pcpu_nr_slots * - sizeof(pcpu_chunk_lists[0]), + for (type = 0; type < PCPU_NR_CHUNK_TYPES; type++) { +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (type == PCPU_CHUNK_ASI_NONSENSITIVE && + !static_asi_enabled()) { + pcpu_chunk_lists[type] = NULL; + continue; + } +#endif + pcpu_chunk_lists[type] = memblock_alloc(pcpu_nr_slots * + sizeof(pcpu_chunk_lists[0][0]), SMP_CACHE_BYTES); - if (!pcpu_chunk_lists) - panic("%s: Failed to allocate %zu bytes\n", __func__, - pcpu_nr_slots * sizeof(pcpu_chunk_lists[0])); + if (!pcpu_chunk_lists[type]) + panic("%s: Failed to allocate %zu bytes\n", __func__, + pcpu_nr_slots * sizeof(pcpu_chunk_lists[0][0])); - for (i = 0; i < pcpu_nr_slots; i++) - INIT_LIST_HEAD(&pcpu_chunk_lists[i]); + for (i = 0; i < pcpu_nr_slots; i++) + INIT_LIST_HEAD(&pcpu_chunk_lists[type][i]); + } /* * The end of the static region needs to be aligned with the From patchwork Wed Feb 23 05:22:13 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756397 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 70216C433F5 for ; Wed, 23 Feb 2022 05:25:10 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C41228D000A; Wed, 23 Feb 2022 00:25:09 -0500 (EST) Received: by 
kanga.kvack.org (Postfix, from userid 40) id B51EF8D0001; Wed, 23 Feb 2022 00:25:09 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9F5358D000A; Wed, 23 Feb 2022 00:25:09 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0051.hostedemail.com [216.40.44.51]) by kanga.kvack.org (Postfix) with ESMTP id 9139B8D0001 for ; Wed, 23 Feb 2022 00:25:09 -0500 (EST) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 4F084181CA34A for ; Wed, 23 Feb 2022 05:25:09 +0000 (UTC) X-FDA: 79172905938.31.F4823F4 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf16.hostedemail.com (Postfix) with ESMTP id CF1BE180004 for ; Wed, 23 Feb 2022 05:25:08 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id i6-20020a255406000000b006241e97e420so18865132ybb.5 for ; Tue, 22 Feb 2022 21:25:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=MBcUE/GjgKAOqkE7k3I16nNGFc2xdrG3rcZqdKaqU1M=; b=huyqkiXgbsdUUaeO2YcSpDguTaIsD2ZiHBttX5Znp2Wu7SotB1quckmXNNZhq3hhVZ CvfghBA5TRiyV7nNwREBXze4Ut4RZl9/dXE69gre6OrHccOhGsM1R39ve/3Fg95dFOoY BvDeP6788/BG9zG4uemrkiHvTMwkxVsftUJqZuGB4ivr+/aWtu4qk7dIqmSXEwo1GhxB UQrfqBGsjla2ZXhbWoUWiWOKjoE4zvV4OfIuCxJHeuNkl1zKwODoNYWYiaOMSwj9wOKz BS0K9yywZ4zV80Is1LL2lLof7xLiZKUcwJTWCzQDyX5SOghoozr2KNat8Mt1uT82lph4 NHIQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=MBcUE/GjgKAOqkE7k3I16nNGFc2xdrG3rcZqdKaqU1M=; b=wtrYg4hlCKcKhN0Ke0JznZznj+T52B0bljoFzYqSHOr1jNoo3oHdII96a/0Q2T063S jwH8VdzV4I7VUhLMNSlnRR3FcRf2pib8ee9apxDTk81wRbyqzEV73K/vaxtNwK5GvnlE bMJEhdSULeKLMduaYLy+SSdHwomcrYysZ2N1+NKRBUfcF8rUJNJT3sXQrjpcRdVDNz/e 4Jx1SOB8qdWFK8ONwh1sv4y0AnI7yEdl7DWDGWy61AqgwXYzB2T1eOTaQRC27vTv9AEt zIsGyjgj+Pita+qoWkka9TTA5U70akb42jeih+gUrlNv4JN8IfwZ+hJdvqyrWOx/84Oz /gKg== X-Gm-Message-State: AOAM533CO8ggGU2ernavM0rJMy6BKCJvGdJFD7Y6Jg6SnFg6xNgxlw5D ODZEfnCpQ8GanVl5y2Gg5muucY90ubUG X-Google-Smtp-Source: ABdhPJzeh7uokPy0ce1B9Y+L+/uH+xpWq2sOY5OlnwbNrO2jfXStVDw+VQrszqUGfmnR9pf7y3TYhwx7yyIZ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:7d56:0:b0:2d6:90d9:770c with SMTP id y83-20020a817d56000000b002d690d9770cmr26589608ywc.277.1645593908116; Tue, 22 Feb 2022 21:25:08 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:13 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-38-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 37/47] mm: asi: ASI annotation support for static variables. 
From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: CF1BE180004 X-Rspam-User: Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=huyqkiXg; spf=pass (imf16.hostedemail.com: domain of 3NMUVYgcKCC0SdWJRMbPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3NMUVYgcKCC0SdWJRMbPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: 8rqs96xweyx1g3psxydmt5kxprw3djfz X-HE-Tag: 1645593908-291455 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse Added the following annotations: __asi_not_sensitive: for static variables which are considered not sensitive. __asi_not_sensitive_readmostly: similar to __read_mostly, for non-sensitive static variables. Signed-off-by: Ofir Weisse --- arch/x86/include/asm/asi.h | 12 ++++++++++++ include/asm-generic/asi.h | 6 ++++++ include/asm-generic/vmlinux.lds.h | 18 +++++++++++++++++- 3 files changed, 35 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index bdb2f70d4f85..6dd9c7c8a2b8 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -177,6 +177,18 @@ static inline pgd_t *asi_pgd(struct asi *asi) return asi->pgd; } +/* IMPORTANT: Any modification to the name here should also be applied to + * include/asm-generic/vmlinux.lds.h */ +#define ASI_NON_SENSITIVE_SECTION_NAME ".data..asi_non_sensitive" +#define ASI_NON_SENSITIVE_READ_MOSTLY_SECTION_NAME \ + ".data..asi_non_sensitive_readmostly" + +#define __asi_not_sensitive \ + __section(ASI_NON_SENSITIVE_SECTION_NAME) + +#define __asi_not_sensitive_readmostly \ + __section(ASI_NON_SENSITIVE_READ_MOSTLY_SECTION_NAME) + #else /* CONFIG_ADDRESS_SPACE_ISOLATION */ static inline void asi_intr_enter(void) { } diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index fffb323d2a00..d9082267a5dd 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -121,6 +121,12 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } #define static_asi_enabled() false +/* IMPORTANT: Any modification to the name here should also be applied to + * include/asm-generic/vmlinux.lds.h */ + +#define __asi_not_sensitive +#define __asi_not_sensitive_readmostly + #endif /* !_ASSEMBLY_ */ #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 42f3866bca69..c769d939c15f 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -374,10 +374,26 @@ . = ALIGN(PAGE_SIZE); \ __nosave_end = .; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define ASI_NOT_SENSITIVE_DATA(page_align) \ + . = ALIGN(page_align); \ + __start_asi_nonsensitive = .; \ + *(.data..asi_non_sensitive) \ + . = ALIGN(page_align); \ + __end_asi_nonsensitive = .; \ + __start_asi_nonsensitive_readmostly = .; \ + *(.data..asi_non_sensitive_readmostly) \ + . 
= ALIGN(page_align); \ + __end_asi_nonsensitive_readmostly = .; +#else +#define ASI_NOT_SENSITIVE_DATA +#endif + #define PAGE_ALIGNED_DATA(page_align) \ . = ALIGN(page_align); \ *(.data..page_aligned) \ - . = ALIGN(page_align); + . = ALIGN(page_align); \ + ASI_NOT_SENSITIVE_DATA(page_align) #define READ_MOSTLY_DATA(align) \ . = ALIGN(align); \ From patchwork Wed Feb 23 05:22:14 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756398 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09281C433F5 for ; Wed, 23 Feb 2022 05:25:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E60648D000C; Wed, 23 Feb 2022 00:25:11 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id E105F8D0001; Wed, 23 Feb 2022 00:25:11 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C63848D000C; Wed, 23 Feb 2022 00:25:11 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0145.hostedemail.com [216.40.44.145]) by kanga.kvack.org (Postfix) with ESMTP id B89A48D0001 for ; Wed, 23 Feb 2022 00:25:11 -0500 (EST) Received: from smtpin20.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 6740F181CAC7C for ; Wed, 23 Feb 2022 05:25:11 +0000 (UTC) X-FDA: 79172906022.20.74A269E Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf08.hostedemail.com (Postfix) with ESMTP id DDE63160005 for ; Wed, 23 Feb 2022 05:25:10 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id x1-20020a25a001000000b0061c64ee0196so26708917ybh.9 for ; Tue, 22 Feb 2022 21:25:10 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=KXrCYGgvo83JEgYfSU2xPmsuGvCpoOVuzDhVbsTzD0I=; b=peKSlY/Xr61DvdFpVmXc37FMKYnl+vkSCRtwe1UF1UGbHZ0v3cMEdZbiKKvXTZ5GbO laOcMzhj8sBkgNnt1mc4a0NZR0hCQ3gPAdN/2qIaNw43JOyvktEB0Y0rn3QcamgO9gZ8 7DbNXkvTjq6TGSgTmAj628OIdjZHlZYdFsABsewAF21y3xPSWYsHQAWzziaNW39gFRL5 yWxZv0skPs5ax5wbHVY8sW6jeIPlZLUx/HQpjPftWbdPu83nt4wFjKis+ce33NEw7HrY 03O9DgYBb/75g6K+v8UjyCDA3xzr7E/F2srmJjBiOtEMdOtKjPJnG/mlJ872cZ/zgYG/ 2wwQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=KXrCYGgvo83JEgYfSU2xPmsuGvCpoOVuzDhVbsTzD0I=; b=hZbc1kZZqrJKWeaOYWlvJ9RceXvLBgKvLN2Ycpi4ufYyBDrzzeTORg3EN1nBj6hZjz cpoMMmgjnxjOY8miF3oiWrFr55wwVg3Ip36oJIhfM80ILBBGvk0P4M/wJNqjmY6ZTABl UC3n0bYFLiIIQ7QkNQvtDLPX1UQ6AVjxP5YwHrbK9mHA1HNJz4tN8iipS+l1ilLfnn9F 3BvmgQkHrbCfB3BokgBmewAGlNLouLtkqC9EuSntRjKHnN3YyKGiaO6dEAhQ+2z+Xp4B hCgK0o89oXHZhI1R2gk2whASx95oSQXt1d1EpGbprPRvIaDUIqlLOto5dykJt7zFzuoU bEIw== X-Gm-Message-State: AOAM531whLxCPpxnPZE8hNBmjP2N87Zjw/s2lkWqZGHXRwnxb2HOZdWR PNEPvvdkQK1CB8eI93IcoHVJwJ9QfNqL X-Google-Smtp-Source: ABdhPJyZfLfl11T/5CrHmveifdlj9QKZBHoh8PI3AMF6sP4DdsodfKmQ8jHyUSfS7xsGTotzIsjTTydYzCuX X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:c607:0:b0:2cb:a34a:355c with SMTP id l7-20020a81c607000000b002cba34a355cmr27125747ywi.487.1645593910227; Tue, 
22 Feb 2022 21:25:10 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:14 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-39-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 38/47] mm: asi: ASI annotation support for dynamic modules. From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: DDE63160005 X-Rspam-User: Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="peKSlY/X"; spf=pass (imf08.hostedemail.com: domain of 3NsUVYgcKCC8UfYLTOdRZZRWP.NZXWTYfi-XXVgLNV.ZcR@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3NsUVYgcKCC8UfYLTOdRZZRWP.NZXWTYfi-XXVgLNV.ZcR@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: s6a4iggxxua8t11u7w9p6yfd3fzdbjmg X-HE-Tag: 1645593910-809773 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse Adding support for use of ASI static variable annotations in dynamic modules: - __asi_not_sensitive and - __asi_not_sensitive_readmostly Per module, we now have the following offsets: 1. asi_section_offset/size - which should be mapped into asi global pool 2. asi_readmostly_section/size - same as above, for read mostly data; 3. once_section_offset/size - is considered asi non-sensitive Signed-off-by: Ofir Weisse --- arch/x86/include/asm/asi.h | 3 ++ arch/x86/mm/asi.c | 66 ++++++++++++++++++++++++++++++++++++++ include/asm-generic/asi.h | 3 ++ include/linux/module.h | 9 ++++++ kernel/module.c | 58 +++++++++++++++++++++++++++++++++ 5 files changed, 139 insertions(+) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 6dd9c7c8a2b8..d43f6aadffee 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -98,6 +98,9 @@ static inline void asi_init_thread_state(struct thread_struct *thread) thread->intr_nest_depth = 0; } +int asi_load_module(struct module* module); +void asi_unload_module(struct module* module); + static inline void asi_set_target_unrestricted(void) { if (static_cpu_has(X86_FEATURE_ASI)) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 9b1bd005f343..6c14aa1fc4aa 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -5,6 +5,7 @@ #include #include #include +#include #include #include @@ -308,6 +309,71 @@ static int __init set_asi_param(char *str) } early_param("asi", set_asi_param); +/* asi_load_module() is called from layout_and_allocate() in kernel/module.c + * We map the module and its data in init_mm.asi_pgd[0]. 
+*/ +int asi_load_module(struct module* module) +{ + int err = 0; + + /* Map the code/text */ + err = asi_map(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base, + module->core_layout.ro_after_init_size ); + if (err) + return err; + + /* Map global variables annotated as non-sensitive for ASI */ + err = asi_map(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.asi_section_offset, + module->core_layout.asi_section_size ); + if (err) + return err; + + /* Map read-mostly global variables annotated as non-sensitive for ASI */ + err = asi_map(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.asi_readmostly_section_offset, + module->core_layout.asi_readmostly_section_size); + if (err) + return err; + + /* Map .data.once section as well */ + err = asi_map(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.once_section_offset, + module->core_layout.once_section_size ); + if (err) + return err; + + return 0; +} +EXPORT_SYMBOL_GPL(asi_load_module); + +void asi_unload_module(struct module* module) +{ + asi_unmap(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base, + module->core_layout.ro_after_init_size, true); + + asi_unmap(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.asi_section_offset, + module->core_layout.asi_section_size, true); + + asi_unmap(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.asi_readmostly_section_offset, + module->core_layout.asi_readmostly_section_size, true); + + asi_unmap(ASI_GLOBAL_NONSENSITIVE, + module->core_layout.base + + module->core_layout.once_section_offset, + module->core_layout.once_section_size, true); + +} + static int __init asi_global_init(void) { uint i, n; diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index d9082267a5dd..2763cb1a974c 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -120,6 +120,7 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } #define static_asi_enabled() false +static inline int asi_load_module(struct module* module) {return 0;} /* IMPORTANT: Any modification to the name here should also be applied to * include/asm-generic/vmlinux.lds.h */ @@ -127,6 +128,8 @@ void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } #define __asi_not_sensitive #define __asi_not_sensitive_readmostly +static inline void asi_unload_module(struct module* module) { } + #endif /* !_ASSEMBLY_ */ #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */ diff --git a/include/linux/module.h b/include/linux/module.h index c9f1200b2312..82267a95f936 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -336,6 +336,15 @@ struct module_layout { #ifdef CONFIG_MODULES_TREE_LOOKUP struct mod_tree_node mtn; #endif + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + unsigned int asi_section_offset; + unsigned int asi_section_size; + unsigned int asi_readmostly_section_offset; + unsigned int asi_readmostly_section_size; + unsigned int once_section_offset; + unsigned int once_section_size; +#endif }; #ifdef CONFIG_MODULES_TREE_LOOKUP diff --git a/kernel/module.c b/kernel/module.c index 84a9141a5e15..d363b8a0ee24 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -2159,6 +2159,8 @@ static void free_module(struct module *mod) { trace_module_free(mod); + asi_unload_module(mod); + mod_sysfs_teardown(mod); /* @@ -2416,6 +2418,31 @@ static bool module_init_layout_section(const char *sname) return module_init_section(sname); } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static void
asi_record_sections_layout(struct module *mod, + const char *sname, + Elf_Shdr *s) +{ + if (strstarts(sname, ASI_NON_SENSITIVE_READ_MOSTLY_SECTION_NAME)) { + mod->core_layout.asi_readmostly_section_offset = s->sh_entsize; + mod->core_layout.asi_readmostly_section_size = s->sh_size; + } + else if (strstarts(sname, ASI_NON_SENSITIVE_SECTION_NAME)) { + mod->core_layout.asi_section_offset = s->sh_entsize; + mod->core_layout.asi_section_size = s->sh_size; + } + if (strstarts(sname, ".data.once")) { + mod->core_layout.once_section_offset = s->sh_entsize; + mod->core_layout.once_section_size = s->sh_size; + } +} +#else +static void asi_record_sections_layout(struct module *mod, + const char *sname, + Elf_Shdr *s) +{} +#endif + /* * Lay out the SHF_ALLOC sections in a way not dissimilar to how ld * might -- code, read-only data, read-write data, small data. Tally @@ -2453,6 +2480,7 @@ static void layout_sections(struct module *mod, struct load_info *info) || module_init_layout_section(sname)) continue; s->sh_entsize = get_offset(mod, &mod->core_layout.size, s, i); + asi_record_sections_layout(mod, sname, s); pr_debug("\t%s\n", sname); } switch (m) { @@ -3558,6 +3586,25 @@ static bool blacklisted(const char *module_name) } core_param(module_blacklist, module_blacklist, charp, 0400); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static void asi_fix_section_size_and_alignment(struct load_info *info, + char *section_to_fix) +{ + unsigned int ndx = find_sec(info, section_to_fix ); + if (!ndx) + return; + + info->sechdrs[ndx].sh_addralign = PAGE_SIZE; + info->sechdrs[ndx].sh_size = + ALIGN( info->sechdrs[ndx].sh_size, PAGE_SIZE ); +} +#else +static inline void asi_fix_section_size_and_alignment(struct load_info *info, + char *section_to_fix) +{} +#endif + + static struct module *layout_and_allocate(struct load_info *info, int flags) { struct module *mod; @@ -3600,6 +3647,15 @@ static struct module *layout_and_allocate(struct load_info *info, int flags) if (ndx) info->sechdrs[ndx].sh_flags |= SHF_RO_AFTER_INIT; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* These are sections we will want to map into an ASI page-table. We + * therefore need these sections to be aligned to a PAGE_SIZE */ + asi_fix_section_size_and_alignment(info, ASI_NON_SENSITIVE_SECTION_NAME); + asi_fix_section_size_and_alignment(info, + ASI_NON_SENSITIVE_READ_MOSTLY_SECTION_NAME); + asi_fix_section_size_and_alignment(info, ".data.once"); +#endif + /* * Determine total sizes, and put offsets in sh_entsize. For now * this is done generically; there doesn't appear to be any @@ -4127,6 +4183,8 @@ static int load_module(struct load_info *info, const char __user *uargs, /* Get rid of temporary copy. */ free_copy(info); + asi_load_module(mod); + /* Done! 
*/ trace_module_load(mod); From patchwork Wed Feb 23 05:22:15 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756399 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C1DEFC433F5 for ; Wed, 23 Feb 2022 05:25:15 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E129C8D000F; Wed, 23 Feb 2022 00:25:13 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id DC3258D0001; Wed, 23 Feb 2022 00:25:13 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id C63A18D000F; Wed, 23 Feb 2022 00:25:13 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0144.hostedemail.com [216.40.44.144]) by kanga.kvack.org (Postfix) with ESMTP id B68B98D0001 for ; Wed, 23 Feb 2022 00:25:13 -0500 (EST) Received: from smtpin16.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 7AD04181CA34A for ; Wed, 23 Feb 2022 05:25:13 +0000 (UTC) X-FDA: 79172906106.16.E4C9F6A Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf09.hostedemail.com (Postfix) with ESMTP id 0FD64140002 for ; Wed, 23 Feb 2022 05:25:12 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d07ae11467so163393527b3.12 for ; Tue, 22 Feb 2022 21:25:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=Kqez4sz2gXA92s/yKdI0Sadgd/VKyd58nC75f61sH3c=; b=kY6HPVZ5+V/4ADSj5gTm7IQPleGSUbrDBXkArXIMxRp8wv39Fiygf+bF+YqLodOTpp AZsLSr13DzroNgqyc8IDfXzcQUL9DyFnAH4zuxHj+73S9iVBPAI2bhK64r7+IZ9iPw2j l6U45p4hnLVXIe3HFpHe4qJp9mEpOsCgXDx6xEMFZlJGN4bxSQMLkshnkvHzcWhTYszS 3RalZYOgTpQ3xR1RbA6oUKLNH9FzKB/Oq6geEXx6xglDHFlrVYGZZaZlqsNdHMvuOFaS qOuzTGhwZjX65TLhLW2uhuxlSD1GB251cNcp9nHh4Mb+0b/OHXDC3ll2brbPttKVPhId WuIg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=Kqez4sz2gXA92s/yKdI0Sadgd/VKyd58nC75f61sH3c=; b=rxy/23w1RaJ1Og8OPhJvYa8pzfAUrNvPSnZRxlpXdUAUiUZ9w/FX+Og3oeX60u9Las J2/142yw83ymk7znlfese1uzHFLSYPi/Z8E04AFUeI58dnCTPpa/zgy/5mukd0VhGe48 JRzfC74wnZ0tEM2OmZEl0n5BdOxDqEM7ml7Myi4/DkUul8bIv+ExYxi0N7yrRF6yWQBX yqbjyOoiQLBhirBg9JvWgZ5AnnP1hQGhCt0uBs7iYHvtngEt5mUozPoCLs5eEu1XWLWd 11Bt+odL2EucE+91VMnt+RsOijymjB6jkvf42Rt4XXuMpMubNeHMOPscu4gzzBowoEz0 OPtw== X-Gm-Message-State: AOAM533R76bVzaKzhGoadQeW9N4lz5qnnHs7OMfFBYGd4L1cBwe8aHyy COU7ua02VcsHzcXX15fvvJwGMgndRqO5 X-Google-Smtp-Source: ABdhPJwRpDGu9uCaO7SUVy/qucopgnc8bRss4/HLiRU4Az2YJmA6Oz65Xuo4RY/0idM9yle+QHSTEPNKFl9M X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:34c9:0:b0:623:fc5f:b98 with SMTP id b192-20020a2534c9000000b00623fc5f0b98mr27190113yba.195.1645593912355; Tue, 22 Feb 2022 21:25:12 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:15 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-40-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: 
[RFC PATCH 39/47] mm: asi: Skip conventional L1TF/MDS mitigations From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=kY6HPVZ5; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf09.hostedemail.com: domain of 3OMUVYgcKCDEWhaNVQfTbbTYR.PbZYVahk-ZZXiNPX.beT@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3OMUVYgcKCDEWhaNVQfTbbTYR.PbZYVahk-ZZXiNPX.beT@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: 0FD64140002 X-Stat-Signature: sp7q68onb45hnfbh4kaqabt41optha5f X-HE-Tag: 1645593912-397160 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse If ASI is enabled for an mm, then the L1D flushes and MDS mitigations are taken care of by ASI. We check whether ASI is enabled via current->mm->asi_enabled. To use ASI, a cgroup flag must be set before the VM process is forked, which causes mm->asi_enabled to be set. Signed-off-by: Ofir Weisse --- arch/x86/kvm/vmx/vmx.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e0178b57be75..6549fef39f2b 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -6609,7 +6609,11 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, kvm_guest_enter_irqoff(); - vmx_flush_sensitive_cpu_state(vcpu); + /* If Address Space Isolation is enabled, it will take care of L1D + * flushes and will also mitigate MDS. In other words, if ASI is not + * enabled, flush the sensitive CPU state here.
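+ * Only when ASI is compiled out or this mm has not opted in (i.e. + * mm->asi_enabled was not set via the cgroup flag before fork) does + * vmx_flush_sensitive_cpu_state() below run.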
*/ + if (!static_asi_enabled() || !mm_asi_enabled(current->mm)) + vmx_flush_sensitive_cpu_state(vcpu); asi_enter(vcpu->kvm->asi); From patchwork Wed Feb 23 05:22:16 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756400 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 809FFC4332F for ; Wed, 23 Feb 2022 05:25:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5FFDA8D0011; Wed, 23 Feb 2022 00:25:16 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 5ABFE8D0001; Wed, 23 Feb 2022 00:25:16 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 426BC8D0011; Wed, 23 Feb 2022 00:25:16 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0055.hostedemail.com [216.40.44.55]) by kanga.kvack.org (Postfix) with ESMTP id 251138D0001 for ; Wed, 23 Feb 2022 00:25:16 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id DD2529D677 for ; Wed, 23 Feb 2022 05:25:15 +0000 (UTC) X-FDA: 79172906190.21.502B0FE Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf24.hostedemail.com (Postfix) with ESMTP id 597C3180002 for ; Wed, 23 Feb 2022 05:25:15 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d07ae11462so163405647b3.8 for ; Tue, 22 Feb 2022 21:25:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=AJ9oF4MxpZqgNWeQmqV576xo41knbB79umnOaNmXddY=; b=gDjBd/x5dYNgFpVj+6AXYinaFyQ0y0nY9xCgZVbh0gj9MDv8hxw/demmgSf5+q0lIl f5u9TMyC094RJt+uGxErIIL6jEyzHxyof4Xb7WeyC6uYfiR8WIq5qp+TlkXog/JDvZk7 2mtgyW1bRYWB4o+N58ElEO1CyDNRbfSatixMLbQR1FnYuk0Gfj5u+Ct8tTOprcSTTI3D O9Kk/stLoRds2nPZbzZ/5aJHis4MCrOVxHj7FYqsiO1KWp6tTJxhsj5RQHjcB5DUXElF yIJbJ9KqvnvVAuRlFRGpGXt2ubrzQb0wSafap3WMiTiEiFUzbC7xRedGoDqwKzVo3u9S eXvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=AJ9oF4MxpZqgNWeQmqV576xo41knbB79umnOaNmXddY=; b=FcU691Qvmo+vVsFJjyx9nFy2Kyf5ISTK4lhVC4wRwhHWzc6EkaN8ogJj54K+YF9D2d A5JUCLV4rpnoMIWrlK6foZJDt8srTQeN8RwImVuFgkzG9hk/jX4dFZyaLxQZ+7yDu9Cw PL+EHpfl8xrq6xlvU3E86etqooJzlVh9JujOYCfk/x71VanMHEzaPXbR/jhqvx9FUA6z bezzK13dEnM7nxOoRvCmAtjmQtygrieQD4JhQzoDon0oPvz0bnh8eXafqItC65Qneq3w uAPI5Rpttg/g3mpk3/clycOXFoH56Qiq0UBDtkoGhWbrS/yCuhKLwqe03fX1sO5PWyXQ 1rzg== X-Gm-Message-State: AOAM533NdTR4ppEGXCYcv54ZfvaHt5ANXtsj6tCDPu2l+q89iIsq/uuP Rj38EhBsg0I3HRhKlP/E95lRlzeuFZlu X-Google-Smtp-Source: ABdhPJxXcU94ggDYk55LTym0qr5/ST37jyZ7dcNnih2o73ZZ8KtzSOHKd++I3ohJEg/v0wMLNTTLtdFeGJBD X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:3dcb:0:b0:2ca:72dd:904c with SMTP id k194-20020a813dcb000000b002ca72dd904cmr28454275ywa.290.1645593914646; Tue, 22 Feb 2022 21:25:14 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:16 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-41-junaids@google.com> Mime-Version: 1.0 References: 
<20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 40/47] mm: asi: support for static percpu DEFINE_PER_CPU*_ASI From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b="gDjBd/x5"; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf24.hostedemail.com: domain of 3OsUVYgcKCDMYjcPXShVddVaT.RdbaXcjm-bbZkPRZ.dgV@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3OsUVYgcKCDMYjcPXShVddVaT.RdbaXcjm-bbZkPRZ.dgV@flex--junaids.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 597C3180002 X-Stat-Signature: oya4w5qd74mwcjezgsxbgnwphdwj5x69 X-HE-Tag: 1645593915-690627 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse Implemented the following PERCPU static declarations: - DECLARE/DEFINE_PER_CPU_ASI_NOT_SENSITIVE - DECLARE/DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE - DECLARE/DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE - DECLARE/DEFINE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE These definitions are also supported in dynamic modules. To support percpu variables in dynamic modules, we're creating an ASI pcpu reserved chunk. The reserved size PERCPU_MODULE_RESERVE is now split between the normal reserved chunk and the ASI one. Signed-off-by: Ofir Weisse --- arch/x86/mm/asi.c | 39 +++++++- include/asm-generic/percpu.h | 6 ++ include/asm-generic/vmlinux.lds.h | 5 + include/linux/module.h | 6 ++ include/linux/percpu-defs.h | 39 ++++++++ include/linux/percpu.h | 8 +- kernel/module-internal.h | 1 + kernel/module.c | 154 ++++++++++++++++++++++++++---- mm/percpu.c | 134 ++++++++++++++++++++++---- 9 files changed, 356 insertions(+), 36 deletions(-) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 6c14aa1fc4aa..ba373b461855 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -309,6 +309,32 @@ static int __init set_asi_param(char *str) } early_param("asi", set_asi_param); +static int asi_map_percpu(struct asi *asi, void *percpu_addr, size_t len) +{ + int cpu, err; + void *ptr; + + for_each_possible_cpu(cpu) { + ptr = per_cpu_ptr(percpu_addr, cpu); + err = asi_map(asi, ptr, len); + if (err) + return err; + } + + return 0; +} + +static void asi_unmap_percpu(struct asi *asi, void *percpu_addr, size_t len) +{ + int cpu; + void *ptr; + + for_each_possible_cpu(cpu) { + ptr = per_cpu_ptr(percpu_addr, cpu); + asi_unmap(asi, ptr, len, true); + } +} + /* asi_load_module() is called from layout_and_allocate() in kernel/module.c * We map the module and its data in init_mm.asi_pgd[0]. 
*/ @@ -347,7 +373,13 @@ int asi_load_module(struct module* module) if (err) return err; - return 0; + err = asi_map_percpu(ASI_GLOBAL_NONSENSITIVE, + module->percpu_asi, + module->percpu_asi_size ); + if (err) + return err; + + return 0; } EXPORT_SYMBOL_GPL(asi_load_module); @@ -372,6 +404,9 @@ void asi_unload_module(struct module* module) module->core_layout.once_section_offset, module->core_layout.once_section_size, true); + asi_unmap_percpu(ASI_GLOBAL_NONSENSITIVE, module->percpu_asi, + module->percpu_asi_size); + } static int __init asi_global_init(void) @@ -399,6 +434,8 @@ static int __init asi_global_init(void) static_branch_enable(&asi_local_map_initialized); + pcpu_map_asi_reserved_chunk(); + return 0; } subsys_initcall(asi_global_init) diff --git a/include/asm-generic/percpu.h b/include/asm-generic/percpu.h index 6432a7fade91..40001b74114f 100644 --- a/include/asm-generic/percpu.h +++ b/include/asm-generic/percpu.h @@ -50,6 +50,12 @@ extern void setup_per_cpu_areas(void); #endif /* SMP */ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +void __init pcpu_map_asi_reserved_chunk(void); +#else +static inline void pcpu_map_asi_reserved_chunk(void) {} +#endif + #ifndef PER_CPU_BASE_SECTION #ifdef CONFIG_SMP #define PER_CPU_BASE_SECTION ".data..percpu" diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index c769d939c15f..0a931aedc285 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -1080,6 +1080,11 @@ . = ALIGN(cacheline); \ *(.data..percpu) \ *(.data..percpu..shared_aligned) \ + . = ALIGN(PAGE_SIZE); \ + __per_cpu_asi_start = .; \ + *(.data..percpu..asi_non_sensitive) \ + . = ALIGN(PAGE_SIZE); \ + __per_cpu_asi_end = .; \ PERCPU_DECRYPTED_SECTION \ __per_cpu_end = .; diff --git a/include/linux/module.h b/include/linux/module.h index 82267a95f936..d4d020bae171 100644 --- a/include/linux/module.h +++ b/include/linux/module.h @@ -463,6 +463,12 @@ struct module { /* Per-cpu data. 
*/ void __percpu *percpu; unsigned int percpu_size; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Per-cpu data for ASI */ + void __percpu *percpu_asi; + unsigned int percpu_asi_size; +#endif /* CONFIG_ADDRESS_SPACE_ISOLATION */ + #endif void *noinstr_text_start; unsigned int noinstr_text_size; diff --git a/include/linux/percpu-defs.h b/include/linux/percpu-defs.h index af1071535de8..5d9fdc93e0fa 100644 --- a/include/linux/percpu-defs.h +++ b/include/linux/percpu-defs.h @@ -170,6 +170,45 @@ #define DEFINE_PER_CPU_READ_MOSTLY(type, name) \ DEFINE_PER_CPU_SECTION(type, name, "..read_mostly") +/* + * Declaration/definition used for per-CPU variables which for the sake for + * address space isolation (ASI) are deemed not sensitive + */ +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +#define ASI_PERCPU_SECTION "..asi_non_sensitive" +#else +#define ASI_PERCPU_SECTION "" +#endif + +#define DECLARE_PER_CPU_ASI_NOT_SENSITIVE(type, name) \ + DECLARE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) + +#define DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DECLARE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + ____cacheline_aligned_in_smp + +#define DECLARE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DECLARE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + ____cacheline_aligned + +#define DECLARE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DECLARE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + __aligned(PAGE_SIZE) + +#define DEFINE_PER_CPU_ASI_NOT_SENSITIVE(type, name) \ + DEFINE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) + +#define DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DEFINE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + ____cacheline_aligned_in_smp + +#define DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DEFINE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + ____cacheline_aligned + +#define DEFINE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE(type, name) \ + DEFINE_PER_CPU_SECTION(type, name, ASI_PERCPU_SECTION) \ + __aligned(PAGE_SIZE) /* * Declaration/definition used for per-CPU variables that should be accessed diff --git a/include/linux/percpu.h b/include/linux/percpu.h index ae4004e7957e..a2cc4c32cabd 100644 --- a/include/linux/percpu.h +++ b/include/linux/percpu.h @@ -13,7 +13,8 @@ /* enough to cover all DEFINE_PER_CPUs in modules */ #ifdef CONFIG_MODULES -#define PERCPU_MODULE_RESERVE (8 << 10) +/* #define PERCPU_MODULE_RESERVE (8 << 10) */ +#define PERCPU_MODULE_RESERVE (16 << 10) #else #define PERCPU_MODULE_RESERVE 0 #endif @@ -123,6 +124,11 @@ extern int __init pcpu_page_first_chunk(size_t reserved_size, #endif extern void __percpu *__alloc_reserved_percpu(size_t size, size_t align) __alloc_size(1); + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +extern void __percpu *__alloc_reserved_percpu_asi(size_t size, size_t align); +#endif + extern bool __is_kernel_percpu_address(unsigned long addr, unsigned long *can_addr); extern bool is_kernel_percpu_address(unsigned long addr); diff --git a/kernel/module-internal.h b/kernel/module-internal.h index 33783abc377b..44c05ae06b2c 100644 --- a/kernel/module-internal.h +++ b/kernel/module-internal.h @@ -25,6 +25,7 @@ struct load_info { #endif struct { unsigned int sym, str, mod, vers, info, pcpu; + unsigned int pcpu_asi; } index; }; diff --git a/kernel/module.c b/kernel/module.c index d363b8a0ee24..0048b7843903 100644 --- a/kernel/module.c +++ b/kernel/module.c @@ -587,6 +587,13 @@ static inline void __percpu *mod_percpu(struct module *mod) return mod->percpu; } +#ifdef 
CONFIG_ADDRESS_SPACE_ISOLATION +static inline void __percpu *mod_percpu_asi(struct module *mod) +{ + return mod->percpu_asi; +} +#endif + static int percpu_modalloc(struct module *mod, struct load_info *info) { Elf_Shdr *pcpusec = &info->sechdrs[info->index.pcpu]; @@ -611,9 +618,34 @@ static int percpu_modalloc(struct module *mod, struct load_info *info) return 0; } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static int percpu_asi_modalloc(struct module *mod, struct load_info *info) +{ + Elf_Shdr *pcpusec = &info->sechdrs[info->index.pcpu_asi]; + unsigned long align = pcpusec->sh_addralign; + + if ( !pcpusec->sh_size) + return 0; + + mod->percpu_asi = __alloc_reserved_percpu_asi(pcpusec->sh_size, align); + if (!mod->percpu_asi) { + pr_warn("%s: Could not allocate %lu bytes percpu data\n", + mod->name, (unsigned long)pcpusec->sh_size); + return -ENOMEM; + } + mod->percpu_asi_size = pcpusec->sh_size; + + return 0; +} +#endif + static void percpu_modfree(struct module *mod) { free_percpu(mod->percpu); + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + free_percpu(mod->percpu_asi); +#endif } static unsigned int find_pcpusec(struct load_info *info) @@ -621,6 +653,13 @@ static unsigned int find_pcpusec(struct load_info *info) return find_sec(info, ".data..percpu"); } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static unsigned int find_pcpusec_asi(struct load_info *info) +{ + return find_sec(info, ".data..percpu" ASI_PERCPU_SECTION ); +} +#endif + static void percpu_modcopy(struct module *mod, const void *from, unsigned long size) { @@ -630,6 +669,39 @@ static void percpu_modcopy(struct module *mod, memcpy(per_cpu_ptr(mod->percpu, cpu), from, size); } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +static void percpu_asi_modcopy(struct module *mod, + const void *from, unsigned long size) +{ + int cpu; + + for_each_possible_cpu(cpu) + memcpy(per_cpu_ptr(mod->percpu_asi, cpu), from, size); +} +#endif + +bool __is_module_percpu_address_helper(unsigned long addr, + unsigned long *can_addr, + unsigned int cpu, + void* percpu_start, + unsigned int percpu_size) +{ + void *start = per_cpu_ptr(percpu_start, cpu); + void *va = (void *)addr; + + if (va >= start && va < start + percpu_size) { + if (can_addr) { + *can_addr = (unsigned long) (va - start); + *can_addr += (unsigned long) + per_cpu_ptr(percpu_start, + get_boot_cpu_id()); + } + return true; + } + + return false; +} + bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr) { struct module *mod; @@ -640,22 +712,34 @@ bool __is_module_percpu_address(unsigned long addr, unsigned long *can_addr) list_for_each_entry_rcu(mod, &modules, list) { if (mod->state == MODULE_STATE_UNFORMED) continue; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (!mod->percpu_size && !mod->percpu_asi_size) + continue; +#else if (!mod->percpu_size) continue; +#endif for_each_possible_cpu(cpu) { - void *start = per_cpu_ptr(mod->percpu, cpu); - void *va = (void *)addr; - - if (va >= start && va < start + mod->percpu_size) { - if (can_addr) { - *can_addr = (unsigned long) (va - start); - *can_addr += (unsigned long) - per_cpu_ptr(mod->percpu, - get_boot_cpu_id()); - } + if (__is_module_percpu_address_helper(addr, + can_addr, + cpu, + mod->percpu, + mod->percpu_size)) { preempt_enable(); return true; - } + } + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (__is_module_percpu_address_helper( + addr, + can_addr, + cpu, + mod->percpu_asi, + mod->percpu_asi_size)) { + preempt_enable(); + return true; + } +#endif } } @@ -2344,6 +2428,10 @@ static int simplify_symbols(struct module *mod, const 
struct load_info *info) /* Divert to percpu allocation if a percpu var. */ if (sym[i].st_shndx == info->index.pcpu) secbase = (unsigned long)mod_percpu(mod); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + else if (sym[i].st_shndx == info->index.pcpu_asi) + secbase = (unsigned long)mod_percpu_asi(mod); +#endif else secbase = info->sechdrs[sym[i].st_shndx].sh_addr; sym[i].st_value += secbase; @@ -2664,6 +2752,10 @@ static char elf_type(const Elf_Sym *sym, const struct load_info *info) return 'U'; if (sym->st_shndx == SHN_ABS || sym->st_shndx == info->index.pcpu) return 'a'; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (sym->st_shndx == info->index.pcpu_asi) + return 'a'; +#endif if (sym->st_shndx >= SHN_LORESERVE) return '?'; if (sechdrs[sym->st_shndx].sh_flags & SHF_EXECINSTR) @@ -2691,7 +2783,8 @@ static char elf_type(const Elf_Sym *sym, const struct load_info *info) } static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs, - unsigned int shnum, unsigned int pcpundx) + unsigned int shnum, unsigned int pcpundx, + unsigned pcpu_asi_ndx) { const Elf_Shdr *sec; @@ -2701,7 +2794,7 @@ static bool is_core_symbol(const Elf_Sym *src, const Elf_Shdr *sechdrs, return false; #ifdef CONFIG_KALLSYMS_ALL - if (src->st_shndx == pcpundx) + if (src->st_shndx == pcpundx || src->st_shndx == pcpu_asi_ndx ) return true; #endif @@ -2743,7 +2836,7 @@ static void layout_symtab(struct module *mod, struct load_info *info) for (ndst = i = 0; i < nsrc; i++) { if (i == 0 || is_livepatch_module(mod) || is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum, - info->index.pcpu)) { + info->index.pcpu, info->index.pcpu_asi)) { strtab_size += strlen(&info->strtab[src[i].st_name])+1; ndst++; } @@ -2807,7 +2900,7 @@ static void add_kallsyms(struct module *mod, const struct load_info *info) mod->kallsyms->typetab[i] = elf_type(src + i, info); if (i == 0 || is_livepatch_module(mod) || is_core_symbol(src+i, info->sechdrs, info->hdr->e_shnum, - info->index.pcpu)) { + info->index.pcpu, info->index.pcpu_asi)) { mod->core_kallsyms.typetab[ndst] = mod->kallsyms->typetab[i]; dst[ndst] = src[i]; @@ -3289,6 +3382,12 @@ static int setup_load_info(struct load_info *info, int flags) info->index.pcpu = find_pcpusec(info); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + info->index.pcpu_asi = find_pcpusec_asi(info); +#else + info->index.pcpu_asi = 0; +#endif + return 0; } @@ -3629,6 +3728,12 @@ static struct module *layout_and_allocate(struct load_info *info, int flags) /* We will do a special allocation for per-cpu sections later. */ info->sechdrs[info->index.pcpu].sh_flags &= ~(unsigned long)SHF_ALLOC; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (info->index.pcpu_asi) + info->sechdrs[info->index.pcpu_asi].sh_flags &= + ~(unsigned long)SHF_ALLOC; +#endif + /* * Mark ro_after_init section with SHF_RO_AFTER_INIT so that * layout_sections() can put it in the right place. @@ -3700,6 +3805,14 @@ static int post_relocation(struct module *mod, const struct load_info *info) percpu_modcopy(mod, (void *)info->sechdrs[info->index.pcpu].sh_addr, info->sechdrs[info->index.pcpu].sh_size); +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* Copy relocated percpu ASI area over. */ + percpu_asi_modcopy( + mod, + (void *)info->sechdrs[info->index.pcpu_asi].sh_addr, + info->sechdrs[info->index.pcpu_asi].sh_size); +#endif + /* Setup kallsyms-specific fields. 
*/ add_kallsyms(mod, info); @@ -4094,6 +4207,11 @@ static int load_module(struct load_info *info, const char __user *uargs, err = percpu_modalloc(mod, info); if (err) goto unlink_mod; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + err = percpu_asi_modalloc(mod, info); + if (err) + goto unlink_mod; +#endif /* Now module is in final location, initialize linked lists, etc. */ err = module_unload_init(mod); @@ -4183,7 +4301,11 @@ static int load_module(struct load_info *info, const char __user *uargs, /* Get rid of temporary copy. */ free_copy(info); - asi_load_module(mod); + err = asi_load_module(mod); + /* If the ASI loading failed, it doesn't necessarily mean that the + * module loading failed. We print an error and move on. */ + if (err) + pr_err("ASI: failed loading module %s", mod->name); /* Done! */ trace_module_load(mod); diff --git a/mm/percpu.c b/mm/percpu.c index beaca5adf9d4..3665a5ea71ec 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -169,6 +169,10 @@ struct pcpu_chunk *pcpu_first_chunk __ro_after_init; */ struct pcpu_chunk *pcpu_reserved_chunk __ro_after_init; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +struct pcpu_chunk *pcpu_reserved_nonsensitive_chunk __ro_after_init; +#endif + DEFINE_SPINLOCK(pcpu_lock); /* all internal data structures */ static DEFINE_MUTEX(pcpu_alloc_mutex); /* chunk create/destroy, [de]pop, map ext */ @@ -1621,6 +1625,11 @@ static struct pcpu_chunk *pcpu_chunk_addr_search(void *addr) if (pcpu_addr_in_chunk(pcpu_first_chunk, addr)) return pcpu_first_chunk; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* is it in the reserved ASI region? */ + if (pcpu_addr_in_chunk(pcpu_reserved_nonsensitive_chunk, addr)) + return pcpu_reserved_nonsensitive_chunk; +#endif /* is it in the reserved region? */ if (pcpu_addr_in_chunk(pcpu_reserved_chunk, addr)) return pcpu_reserved_chunk; @@ -1805,23 +1814,37 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved, spin_lock_irqsave(&pcpu_lock, flags); +#define TRY_ALLOC_FROM_CHUNK(source_chunk, chunk_name) \ +do { \ + if (!source_chunk) { \ + err = chunk_name " chunk not allocated"; \ + goto fail_unlock; \ + } \ + chunk = source_chunk; \ + \ + off = pcpu_find_block_fit(chunk, bits, bit_align, is_atomic); \ + if (off < 0) { \ + err = "alloc from " chunk_name " chunk failed"; \ + goto fail_unlock; \ + } \ + \ + off = pcpu_alloc_area(chunk, bits, bit_align, off); \ + if (off >= 0) \ + goto area_found; \ + \ + err = "alloc from " chunk_name " chunk failed"; \ + goto fail_unlock; \ +} while(0) + /* serve reserved allocations from the reserved chunk if available */ - if (reserved && pcpu_reserved_chunk) { - chunk = pcpu_reserved_chunk; - - off = pcpu_find_block_fit(chunk, bits, bit_align, is_atomic); - if (off < 0) { - err = "alloc from reserved chunk failed"; - goto fail_unlock; - } - - off = pcpu_alloc_area(chunk, bits, bit_align, off); - if (off >= 0) - goto area_found; - - err = "alloc from reserved chunk failed"; - goto fail_unlock; - } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (reserved && (gfp & __GFP_GLOBAL_NONSENSITIVE)) + TRY_ALLOC_FROM_CHUNK(pcpu_reserved_nonsensitive_chunk, + "reserverved ASI"); + else +#endif + if (reserved && pcpu_reserved_chunk) + TRY_ALLOC_FROM_CHUNK(pcpu_reserved_chunk, "reserved"); restart: /* search through normal chunks */ @@ -1998,6 +2021,14 @@ void __percpu *__alloc_reserved_percpu(size_t size, size_t align) return pcpu_alloc(size, align, true, GFP_KERNEL); } +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +void __percpu *__alloc_reserved_percpu_asi(size_t size, size_t align) +{ + return 
pcpu_alloc(size, align, true, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); +} +#endif + /** * pcpu_balance_free - manage the amount of free chunks * @empty_only: free chunks only if there are no populated pages @@ -2838,15 +2869,46 @@ void __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai, * the dynamic region. */ tmp_addr = (unsigned long)base_addr + static_size; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* If ASI is used, split the reserved size between the nonsensitive + * chunk and the normal chunk evenly. */ + map_size = (ai->reserved_size / 2) ?: dyn_size; +#else map_size = ai->reserved_size ?: dyn_size; +#endif chunk = pcpu_alloc_first_chunk(tmp_addr, map_size); /* init dynamic chunk if necessary */ if (ai->reserved_size) { - pcpu_reserved_chunk = chunk; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + /* TODO: check if ASI was enabled via boot param or static branch */ + /* We allocated pcpu_reserved_nonsensitive_chunk only if + * pcpu_reserved_chunk is used as well. */ + pcpu_reserved_nonsensitive_chunk = chunk; + pcpu_reserved_nonsensitive_chunk->is_asi_nonsensitive = true; + /* We used the previous chunk as pcpu_reserved_nonsensitive_chunk. Now + * allocate pcpu_reserved_chunk */ + tmp_addr = (unsigned long)base_addr + static_size + + (ai->reserved_size / 2); + map_size = ai->reserved_size / 2; + chunk = pcpu_alloc_first_chunk(tmp_addr, map_size); +#endif + /* Whether ASI is enabled or disabled, the end result is the + * same: + * If ASI is enabled, tmp_addr, used for pcpu_first_chunk should + * be after + * 1. pcpu_reserved_nonsensitive_chunk AND + * 2. pcpu_reserved_chunk + * Since we split the reserve size in half, we skip in total the + * whole ai->reserved_size. + * If ASI is disabled, tmp_addr, used for pcpu_first_chunk is + * just after pcpu_reserved_chunk */ tmp_addr = (unsigned long)base_addr + static_size + ai->reserved_size; + + pcpu_reserved_chunk = chunk; + map_size = dyn_size; chunk = pcpu_alloc_first_chunk(tmp_addr, map_size); } @@ -3129,7 +3191,6 @@ int __init pcpu_embed_first_chunk(size_t reserved_size, size_t dyn_size, cpu_distance_fn); if (IS_ERR(ai)) return PTR_ERR(ai); - size_sum = ai->static_size + ai->reserved_size + ai->dyn_size; areas_size = PFN_ALIGN(ai->nr_groups * sizeof(void *)); @@ -3460,3 +3521,40 @@ static int __init percpu_enable_async(void) return 0; } subsys_initcall(percpu_enable_async); + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +void __init pcpu_map_asi_reserved_chunk(void) +{ + void *start_addr, *end_addr; + unsigned long map_start_addr, map_end_addr; + struct pcpu_chunk *chunk = pcpu_reserved_nonsensitive_chunk; + int err = 0; + + if (!chunk) + return; + + start_addr = chunk->base_addr + chunk->start_offset; + end_addr = chunk->base_addr + chunk->nr_pages * PAGE_SIZE - + chunk->end_offset; + + + /* No need in asi_map_percpu, since these addresses are "real". The + * chunk has full pages allocated, so we're not worried about leakage of + * data caused by start_addr-->end_addr not being page aligned. asi_map, + * however, will fail/crash if the addresses are not aligned. */ + map_start_addr = (unsigned long)start_addr & PAGE_MASK; + map_end_addr = PAGE_ALIGN((unsigned long)end_addr); + + pr_err("%s:%d mapping 0x%lx --> 0x%lx", + __FUNCTION__, __LINE__, map_start_addr, map_end_addr); + err = asi_map(ASI_GLOBAL_NONSENSITIVE, + (void*)map_start_addr, map_end_addr - map_start_addr); + + WARN(err, "Failed mapping percpu reserved chunk into ASI"); + + /* If we couldn't map the chuknk into ASI, it is useless. 
Set the chunk + * to NULL, so allocations from it will fail. */ + if (err) + pcpu_reserved_nonsensitive_chunk = NULL; +} +#endif From patchwork Wed Feb 23 05:22:17 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756401 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id BFC82C433EF for ; Wed, 23 Feb 2022 05:25:21 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C16A58D0012; Wed, 23 Feb 2022 00:25:18 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id BC7898D0001; Wed, 23 Feb 2022 00:25:18 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9A4A98D0012; Wed, 23 Feb 2022 00:25:18 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0056.hostedemail.com [216.40.44.56]) by kanga.kvack.org (Postfix) with ESMTP id 83FA18D0001 for ; Wed, 23 Feb 2022 00:25:18 -0500 (EST) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 421F5181CAC7C for ; Wed, 23 Feb 2022 05:25:18 +0000 (UTC) X-FDA: 79172906316.22.60C9D33 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf17.hostedemail.com (Postfix) with ESMTP id 8EB3A40004 for ; Wed, 23 Feb 2022 05:25:17 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id b18-20020a25fa12000000b0062412a8200eso20210106ybe.22 for ; Tue, 22 Feb 2022 21:25:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=fjoegs+Ro+S0LiOfkKDHuccTriIwowqx+uIdOr3iex0=; b=EbHx41T8pBJ5A2BSG8qjfuD65qMmT14wI+rGz7MtVYSyZZTtXAQ9SSi490wWxvTCX3 lF1yqHJ25Ogc1hfXuKo9v+hg5YjvOasU7cMgAwTxB6T3rHmXhsMhhA64NTfNq1j1V9t4 TXGaFy+GztCl/Zph97U1y4UYZaT827j5iNNm3e+F0mQoJhUP0sknSp9qrgRx9gx8cRMf Kb0JwVGCNa+GNMz65PcJow71XH07p87xccnzELq309vUDOD82kYnpx/YluPwYOrivr7t AT7gGVWsZa9CQMSKMpbyZuKYicDY2/ciGXtsoA+y078oLzomCpgRmU1bPsX32QTmpazD udew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=fjoegs+Ro+S0LiOfkKDHuccTriIwowqx+uIdOr3iex0=; b=xtMuKpW1ldBzs0qsQO90nMvqciSk6CvOMMHc+shaHtRFWOXXbX5Im6HUcmR7FDkA2O Uck7MUgh5q/z+x6KBq8mQgxLZ3yO+SEk2pG0V7Pgo0UuYWjuo8WDaM04LmQP5wOrBiPh jmcN0Yrdppv106DZstOTdKMOaTz/NG1SvbGuO4ITVp8n/zWvGKh2dhsenhaGjfo3sFSC 2IgESPkpgPNoRgJi2tH2TwzzLEnpQelTxQFaDzBFE11DxhdIfeDYIrOGH+mzxGpyfqpb FD3vEkyoCrC8t1TVQh79243PaJHFRoBZMsH3UmITcK5dCiFzSE56YxU1pdfSBOJw9lMm 41wQ== X-Gm-Message-State: AOAM533YKrMiDoN/sYDZx21n7rhggOZCZVngzbt8P1l9MFUb/7Bqxw7x 1YXhbpxndqnP6pHSp60SsAE4b4VST7fv X-Google-Smtp-Source: ABdhPJyEpvnRe67yGfdP/oVogGTIG0cZjG1Fj1fU4LBOman77N8SUGlY7TMto2pQXc1VwXiknsbYEeXFLlaJ X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a5b:cc8:0:b0:622:e87:2087 with SMTP id e8-20020a5b0cc8000000b006220e872087mr26256339ybr.106.1645593916871; Tue, 22 Feb 2022 21:25:16 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:17 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-42-junaids@google.com> Mime-Version: 1.0 References: 
<20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 41/47] mm: asi: Annotation of static variables to be nonsensitive From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 8EB3A40004 X-Stat-Signature: wwyzo3fzfibszxcngdmposgh6db9xwhj Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=EbHx41T8; spf=pass (imf17.hostedemail.com: domain of 3PMUVYgcKCDUaleRZUjXffXcV.TfdcZelo-ddbmRTb.fiX@flex--junaids.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3PMUVYgcKCDUaleRZUjXffXcV.TfdcZelo-ddbmRTb.fiX@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-HE-Tag: 1645593917-80515 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse The heart of ASI is to differentiate between sensitive and non-sensitive data accesses. This commit marks certain static variables as not sensitive. Some static variables are accessed frequently and therefore would cause many ASI exits. The frequency of these accesses is monitored by tracing asi_exits and analyzing the accessed addresses. Many of these variables don't contain sensitive information and can therefore be mapped into the global ASI region. This commit applies the __asi_not_sensitive* attributes to these frequently-accessed yet non-sensitive variables. The end result is a very significant reduction in ASI exits on real benchmarks.
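For illustration only (this snippet is not part of the diff below, and the variable names are made up), the annotation pattern is just a swap of the placement attribute, which moves the variable into the ASI non-sensitive data section that the linker script and module loader map into the restricted address space:

    /* Hypothetical example: hot counters and ops tables that hold no secrets. */
    -static u64 foo_event_count __read_mostly;
    +static u64 foo_event_count __asi_not_sensitive_readmostly;
    -static struct foo_ops foo_ops;
    +static struct foo_ops foo_ops __asi_not_sensitive;

Variables left unannotated remain unmapped in the restricted address space and keep causing ASI exits whenever they are touched while running isolated.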
Signed-off-by: Ofir Weisse --- arch/x86/events/core.c | 4 ++-- arch/x86/events/intel/core.c | 2 +- arch/x86/events/msr.c | 2 +- arch/x86/events/perf_event.h | 2 +- arch/x86/include/asm/kvm_host.h | 4 ++-- arch/x86/kernel/alternative.c | 2 +- arch/x86/kernel/cpu/bugs.c | 2 +- arch/x86/kernel/setup.c | 4 ++-- arch/x86/kernel/smp.c | 2 +- arch/x86/kernel/tsc.c | 8 +++---- arch/x86/kvm/lapic.c | 2 +- arch/x86/kvm/mmu/spte.c | 2 +- arch/x86/kvm/mmu/spte.h | 2 +- arch/x86/kvm/mtrr.c | 2 +- arch/x86/kvm/vmx/capabilities.h | 14 ++++++------ arch/x86/kvm/vmx/vmx.c | 37 ++++++++++++++++--------------- arch/x86/kvm/x86.c | 35 +++++++++++++++-------------- arch/x86/mm/asi.c | 4 ++-- include/linux/debug_locks.h | 4 ++-- include/linux/jiffies.h | 4 ++-- include/linux/notifier.h | 2 +- include/linux/profile.h | 2 +- include/linux/rcupdate.h | 4 +++- include/linux/rcutree.h | 2 +- include/linux/sched/sysctl.h | 1 + init/main.c | 2 +- kernel/cgroup/cgroup.c | 5 +++-- kernel/cpu.c | 14 ++++++------ kernel/events/core.c | 4 ++-- kernel/freezer.c | 2 +- kernel/locking/lockdep.c | 14 ++++++------ kernel/panic.c | 2 +- kernel/printk/printk.c | 4 ++-- kernel/profile.c | 4 ++-- kernel/rcu/tree.c | 10 ++++----- kernel/rcu/update.c | 4 ++-- kernel/sched/clock.c | 2 +- kernel/sched/core.c | 6 ++--- kernel/sched/cpuacct.c | 2 +- kernel/sched/cputime.c | 2 +- kernel/sched/fair.c | 4 ++-- kernel/sched/loadavg.c | 2 +- kernel/sched/rt.c | 2 +- kernel/sched/sched.h | 4 ++-- kernel/smp.c | 2 +- kernel/softirq.c | 3 ++- kernel/time/hrtimer.c | 2 +- kernel/time/jiffies.c | 8 ++++++- kernel/time/ntp.c | 30 ++++++++++++------------- kernel/time/tick-common.c | 4 ++-- kernel/time/tick-internal.h | 2 +- kernel/time/tick-sched.c | 2 +- kernel/time/timekeeping.c | 10 ++++----- kernel/time/timekeeping.h | 2 +- kernel/time/timer.c | 2 +- kernel/trace/trace.c | 2 +- kernel/trace/trace_sched_switch.c | 4 ++-- lib/debug_locks.c | 5 +++-- mm/memory.c | 2 +- mm/page_alloc.c | 2 +- mm/sparse.c | 4 ++-- virt/kvm/kvm_main.c | 2 +- 62 files changed, 170 insertions(+), 156 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 38b2c779146f..db825bf053fd 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -44,7 +44,7 @@ #include "perf_event.h" -struct x86_pmu x86_pmu __read_mostly; +struct x86_pmu x86_pmu __asi_not_sensitive_readmostly; static struct pmu pmu; DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = { @@ -2685,7 +2685,7 @@ static int x86_pmu_filter_match(struct perf_event *event) return 1; } -static struct pmu pmu = { +static struct pmu pmu __asi_not_sensitive = { .pmu_enable = x86_pmu_enable, .pmu_disable = x86_pmu_disable, diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c index ec6444f2c9dc..5b2b7473b2f2 100644 --- a/arch/x86/events/intel/core.c +++ b/arch/x86/events/intel/core.c @@ -189,7 +189,7 @@ static struct event_constraint intel_slm_event_constraints[] __read_mostly = EVENT_CONSTRAINT_END }; -static struct event_constraint intel_skl_event_constraints[] = { +static struct event_constraint intel_skl_event_constraints[] __asi_not_sensitive = { FIXED_EVENT_CONSTRAINT(0x00c0, 0), /* INST_RETIRED.ANY */ FIXED_EVENT_CONSTRAINT(0x003c, 1), /* CPU_CLK_UNHALTED.CORE */ FIXED_EVENT_CONSTRAINT(0x0300, 2), /* CPU_CLK_UNHALTED.REF */ diff --git a/arch/x86/events/msr.c b/arch/x86/events/msr.c index 96c775abe31f..db7bca37c726 100644 --- a/arch/x86/events/msr.c +++ b/arch/x86/events/msr.c @@ -280,7 +280,7 @@ static int msr_event_add(struct perf_event *event, int flags) 
return 0; } -static struct pmu pmu_msr = { +static struct pmu pmu_msr __asi_not_sensitive = { .task_ctx_nr = perf_sw_context, .attr_groups = attr_groups, .event_init = msr_event_init, diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index 5480db242083..27cca7fd6f17 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -1020,7 +1020,7 @@ static struct perf_pmu_format_hybrid_attr format_attr_hybrid_##_name = {\ } struct pmu *x86_get_pmu(unsigned int cpu); -extern struct x86_pmu x86_pmu __read_mostly; +extern struct x86_pmu x86_pmu __asi_not_sensitive_readmostly; static __always_inline struct x86_perf_task_context_opt *task_context_opt(void *ctx) { diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 8ba88bbcf895..b7292c4fece7 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1542,8 +1542,8 @@ struct kvm_arch_async_pf { extern u32 __read_mostly kvm_nr_uret_msrs; extern u64 __read_mostly host_efer; -extern bool __read_mostly allow_smaller_maxphyaddr; -extern bool __read_mostly enable_apicv; +extern bool __asi_not_sensitive_readmostly allow_smaller_maxphyaddr; +extern bool __asi_not_sensitive_readmostly enable_apicv; extern struct kvm_x86_ops kvm_x86_ops; #define KVM_X86_OP(func) \ diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c index 23fb4d51a5da..9836ebe953ed 100644 --- a/arch/x86/kernel/alternative.c +++ b/arch/x86/kernel/alternative.c @@ -31,7 +31,7 @@ #include #include -int __read_mostly alternatives_patched; +int __asi_not_sensitive alternatives_patched; EXPORT_SYMBOL_GPL(alternatives_patched); diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 1c1f218a701d..6b5e6574e391 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -46,7 +46,7 @@ static void __init srbds_select_mitigation(void); static void __init l1d_flush_select_mitigation(void); /* The base value of the SPEC_CTRL MSR that always has to be preserved. 
*/ -u64 x86_spec_ctrl_base; +u64 x86_spec_ctrl_base __asi_not_sensitive; EXPORT_SYMBOL_GPL(x86_spec_ctrl_base); static DEFINE_MUTEX(spec_ctrl_mutex); diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c index e04f5e6eb33f..d8461ac88b36 100644 --- a/arch/x86/kernel/setup.c +++ b/arch/x86/kernel/setup.c @@ -116,7 +116,7 @@ static struct resource bss_resource = { struct cpuinfo_x86 new_cpu_data; /* Common CPU data for all CPUs */ -struct cpuinfo_x86 boot_cpu_data __read_mostly; +struct cpuinfo_x86 boot_cpu_data __asi_not_sensitive_readmostly; EXPORT_SYMBOL(boot_cpu_data); unsigned int def_to_bigsmp; @@ -133,7 +133,7 @@ struct ist_info ist_info; #endif #else -struct cpuinfo_x86 boot_cpu_data __read_mostly; +struct cpuinfo_x86 boot_cpu_data __asi_not_sensitive_readmostly; EXPORT_SYMBOL(boot_cpu_data); #endif diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index 06db901fabe8..e9e10ffc2ec2 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -257,7 +257,7 @@ static int __init nonmi_ipi_setup(char *str) __setup("nonmi_ipi", nonmi_ipi_setup); -struct smp_ops smp_ops = { +struct smp_ops smp_ops __asi_not_sensitive = { .smp_prepare_boot_cpu = native_smp_prepare_boot_cpu, .smp_prepare_cpus = native_smp_prepare_cpus, .smp_cpus_done = native_smp_cpus_done, diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index a698196377be..d7169da99b01 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -30,10 +30,10 @@ #include #include -unsigned int __read_mostly cpu_khz; /* TSC clocks / usec, not used here */ +unsigned int __asi_not_sensitive_readmostly cpu_khz; /* TSC clocks / usec, not used here */ EXPORT_SYMBOL(cpu_khz); -unsigned int __read_mostly tsc_khz; +unsigned int __asi_not_sensitive_readmostly tsc_khz; EXPORT_SYMBOL(tsc_khz); #define KHZ 1000 @@ -41,7 +41,7 @@ EXPORT_SYMBOL(tsc_khz); /* * TSC can be unstable due to cpufreq or due to unsynced TSCs */ -static int __read_mostly tsc_unstable; +static int __asi_not_sensitive_readmostly tsc_unstable; static unsigned int __initdata tsc_early_khz; static DEFINE_STATIC_KEY_FALSE(__use_tsc); @@ -1146,7 +1146,7 @@ static struct clocksource clocksource_tsc_early = { * this one will immediately take over. We will only register if TSC has * been found good. 
*/ -static struct clocksource clocksource_tsc = { +static struct clocksource clocksource_tsc __asi_not_sensitive = { .name = "tsc", .rating = 300, .read = read_tsc, diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index f206fc35deff..213bbdfab49e 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -60,7 +60,7 @@ #define MAX_APIC_VECTOR 256 #define APIC_VECTORS_PER_REG 32 -static bool lapic_timer_advance_dynamic __read_mostly; +static bool lapic_timer_advance_dynamic __asi_not_sensitive_readmostly; #define LAPIC_TIMER_ADVANCE_ADJUST_MIN 100 /* clock cycles */ #define LAPIC_TIMER_ADVANCE_ADJUST_MAX 10000 /* clock cycles */ #define LAPIC_TIMER_ADVANCE_NS_INIT 1000 diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 0c76c45fdb68..13038fae5088 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -33,7 +33,7 @@ u64 __read_mostly shadow_mmio_mask; u64 __read_mostly shadow_mmio_access_mask; u64 __read_mostly shadow_present_mask; u64 __read_mostly shadow_me_mask; -u64 __read_mostly shadow_acc_track_mask; +u64 __asi_not_sensitive_readmostly shadow_acc_track_mask; u64 __read_mostly shadow_nonpresent_or_rsvd_mask; u64 __read_mostly shadow_nonpresent_or_rsvd_lower_gfn_mask; diff --git a/arch/x86/kvm/mmu/spte.h b/arch/x86/kvm/mmu/spte.h index cc432f9a966b..d1af03f63009 100644 --- a/arch/x86/kvm/mmu/spte.h +++ b/arch/x86/kvm/mmu/spte.h @@ -151,7 +151,7 @@ extern u64 __read_mostly shadow_me_mask; * shadow_acc_track_mask is the set of bits to be cleared in non-accessed * pages. */ -extern u64 __read_mostly shadow_acc_track_mask; +extern u64 __asi_not_sensitive_readmostly shadow_acc_track_mask; /* * This mask must be set on all non-zero Non-Present or Reserved SPTEs in order diff --git a/arch/x86/kvm/mtrr.c b/arch/x86/kvm/mtrr.c index a8502e02f479..66228abfa9fa 100644 --- a/arch/x86/kvm/mtrr.c +++ b/arch/x86/kvm/mtrr.c @@ -138,7 +138,7 @@ struct fixed_mtrr_segment { int range_start; }; -static struct fixed_mtrr_segment fixed_seg_table[] = { +static struct fixed_mtrr_segment fixed_seg_table[] __asi_not_sensitive = { /* MSR_MTRRfix64K_00000, 1 unit. 64K fixed mtrr. 
*/ { .start = 0x0, diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h index 4705ad55abb5..0ab03ec7d6d0 100644 --- a/arch/x86/kvm/vmx/capabilities.h +++ b/arch/x86/kvm/vmx/capabilities.h @@ -6,13 +6,13 @@ #include "lapic.h" -extern bool __read_mostly enable_vpid; -extern bool __read_mostly flexpriority_enabled; -extern bool __read_mostly enable_ept; -extern bool __read_mostly enable_unrestricted_guest; -extern bool __read_mostly enable_ept_ad_bits; -extern bool __read_mostly enable_pml; -extern int __read_mostly pt_mode; +extern bool __asi_not_sensitive_readmostly enable_vpid; +extern bool __asi_not_sensitive_readmostly flexpriority_enabled; +extern bool __asi_not_sensitive_readmostly enable_ept; +extern bool __asi_not_sensitive_readmostly enable_unrestricted_guest; +extern bool __asi_not_sensitive_readmostly enable_ept_ad_bits; +extern bool __asi_not_sensitive_readmostly enable_pml; +extern int __asi_not_sensitive_readmostly pt_mode; #define PT_MODE_SYSTEM 0 #define PT_MODE_HOST_GUEST 1 diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6549fef39f2b..e1ad82c25a78 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -78,29 +78,29 @@ static const struct x86_cpu_id vmx_cpu_id[] = { MODULE_DEVICE_TABLE(x86cpu, vmx_cpu_id); #endif -bool __read_mostly enable_vpid = 1; +bool __asi_not_sensitive_readmostly enable_vpid = 1; module_param_named(vpid, enable_vpid, bool, 0444); -static bool __read_mostly enable_vnmi = 1; +static bool __asi_not_sensitive_readmostly enable_vnmi = 1; module_param_named(vnmi, enable_vnmi, bool, S_IRUGO); -bool __read_mostly flexpriority_enabled = 1; +bool __asi_not_sensitive_readmostly flexpriority_enabled = 1; module_param_named(flexpriority, flexpriority_enabled, bool, S_IRUGO); -bool __read_mostly enable_ept = 1; +bool __asi_not_sensitive_readmostly enable_ept = 1; module_param_named(ept, enable_ept, bool, S_IRUGO); -bool __read_mostly enable_unrestricted_guest = 1; +bool __asi_not_sensitive_readmostly enable_unrestricted_guest = 1; module_param_named(unrestricted_guest, enable_unrestricted_guest, bool, S_IRUGO); -bool __read_mostly enable_ept_ad_bits = 1; +bool __asi_not_sensitive_readmostly enable_ept_ad_bits = 1; module_param_named(eptad, enable_ept_ad_bits, bool, S_IRUGO); -static bool __read_mostly emulate_invalid_guest_state = true; +static bool __asi_not_sensitive_readmostly emulate_invalid_guest_state = true; module_param(emulate_invalid_guest_state, bool, S_IRUGO); -static bool __read_mostly fasteoi = 1; +static bool __asi_not_sensitive_readmostly fasteoi = 1; module_param(fasteoi, bool, S_IRUGO); module_param(enable_apicv, bool, S_IRUGO); @@ -110,13 +110,13 @@ module_param(enable_apicv, bool, S_IRUGO); * VMX and be a hypervisor for its own guests. If nested=0, guests may not * use VMX instructions. */ -static bool __read_mostly nested = 1; +static bool __asi_not_sensitive_readmostly nested = 1; module_param(nested, bool, S_IRUGO); -bool __read_mostly enable_pml = 1; +bool __asi_not_sensitive_readmostly enable_pml = 1; module_param_named(pml, enable_pml, bool, S_IRUGO); -static bool __read_mostly dump_invalid_vmcs = 0; +static bool __asi_not_sensitive_readmostly dump_invalid_vmcs = 0; module_param(dump_invalid_vmcs, bool, 0644); #define MSR_BITMAP_MODE_X2APIC 1 @@ -125,13 +125,13 @@ module_param(dump_invalid_vmcs, bool, 0644); #define KVM_VMX_TSC_MULTIPLIER_MAX 0xffffffffffffffffULL /* Guest_tsc -> host_tsc conversion requires 64-bit division. 
*/ -static int __read_mostly cpu_preemption_timer_multi; -static bool __read_mostly enable_preemption_timer = 1; +static int __asi_not_sensitive_readmostly cpu_preemption_timer_multi; +static bool __asi_not_sensitive_readmostly enable_preemption_timer = 1; #ifdef CONFIG_X86_64 module_param_named(preemption_timer, enable_preemption_timer, bool, S_IRUGO); #endif -extern bool __read_mostly allow_smaller_maxphyaddr; +extern bool __asi_not_sensitive_readmostly allow_smaller_maxphyaddr; module_param(allow_smaller_maxphyaddr, bool, S_IRUGO); #define KVM_VM_CR0_ALWAYS_OFF (X86_CR0_NW | X86_CR0_CD) @@ -202,7 +202,7 @@ static unsigned int ple_window_max = KVM_VMX_DEFAULT_PLE_WINDOW_MAX; module_param(ple_window_max, uint, 0444); /* Default is SYSTEM mode, 1 for host-guest mode */ -int __read_mostly pt_mode = PT_MODE_SYSTEM; +int __asi_not_sensitive_readmostly pt_mode = PT_MODE_SYSTEM; module_param(pt_mode, int, S_IRUGO); static DEFINE_STATIC_KEY_FALSE(vmx_l1d_should_flush); @@ -421,7 +421,7 @@ static DEFINE_PER_CPU(struct list_head, loaded_vmcss_on_cpu); static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS); static DEFINE_SPINLOCK(vmx_vpid_lock); -struct vmcs_config vmcs_config; +struct vmcs_config vmcs_config __asi_not_sensitive; struct vmx_capability vmx_capability; #define VMX_SEGMENT_FIELD(seg) \ @@ -453,7 +453,7 @@ static inline void vmx_segment_cache_clear(struct vcpu_vmx *vmx) vmx->segment_cache.bitmask = 0; } -static unsigned long host_idt_base; +static unsigned long host_idt_base __asi_not_sensitive; #if IS_ENABLED(CONFIG_HYPERV) static bool __read_mostly enlightened_vmcs = true; @@ -5549,7 +5549,8 @@ static int handle_bus_lock_vmexit(struct kvm_vcpu *vcpu) * may resume. Otherwise they set the kvm_run parameter to indicate what needs * to be done to userspace and return 0. 
*/ -static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = { +static int (*kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) __asi_not_sensitive += { [EXIT_REASON_EXCEPTION_NMI] = handle_exception_nmi, [EXIT_REASON_EXTERNAL_INTERRUPT] = handle_external_interrupt, [EXIT_REASON_TRIPLE_FAULT] = handle_triple_fault, diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index d0df14deae80..0df88eadab60 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -123,7 +123,7 @@ static int sync_regs(struct kvm_vcpu *vcpu); static int __set_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2); static void __get_sregs2(struct kvm_vcpu *vcpu, struct kvm_sregs2 *sregs2); -struct kvm_x86_ops kvm_x86_ops __read_mostly; +struct kvm_x86_ops kvm_x86_ops __asi_not_sensitive_readmostly; EXPORT_SYMBOL_GPL(kvm_x86_ops); #define KVM_X86_OP(func) \ @@ -148,17 +148,17 @@ module_param(min_timer_period_us, uint, S_IRUGO | S_IWUSR); static bool __read_mostly kvmclock_periodic_sync = true; module_param(kvmclock_periodic_sync, bool, S_IRUGO); -bool __read_mostly kvm_has_tsc_control; +bool __asi_not_sensitive_readmostly kvm_has_tsc_control; EXPORT_SYMBOL_GPL(kvm_has_tsc_control); -u32 __read_mostly kvm_max_guest_tsc_khz; +u32 __asi_not_sensitive_readmostly kvm_max_guest_tsc_khz; EXPORT_SYMBOL_GPL(kvm_max_guest_tsc_khz); -u8 __read_mostly kvm_tsc_scaling_ratio_frac_bits; +u8 __asi_not_sensitive_readmostly kvm_tsc_scaling_ratio_frac_bits; EXPORT_SYMBOL_GPL(kvm_tsc_scaling_ratio_frac_bits); -u64 __read_mostly kvm_max_tsc_scaling_ratio; +u64 __asi_not_sensitive_readmostly kvm_max_tsc_scaling_ratio; EXPORT_SYMBOL_GPL(kvm_max_tsc_scaling_ratio); -u64 __read_mostly kvm_default_tsc_scaling_ratio; +u64 __asi_not_sensitive_readmostly kvm_default_tsc_scaling_ratio; EXPORT_SYMBOL_GPL(kvm_default_tsc_scaling_ratio); -bool __read_mostly kvm_has_bus_lock_exit; +bool __asi_not_sensitive_readmostly kvm_has_bus_lock_exit; EXPORT_SYMBOL_GPL(kvm_has_bus_lock_exit); /* tsc tolerance in parts per million - default to 1/2 of the NTP threshold */ @@ -171,20 +171,20 @@ module_param(tsc_tolerance_ppm, uint, S_IRUGO | S_IWUSR); * advancement entirely. Any other value is used as-is and disables adaptive * tuning, i.e. allows privileged userspace to set an exact advancement time. 
*/ -static int __read_mostly lapic_timer_advance_ns = -1; +static int __asi_not_sensitive_readmostly lapic_timer_advance_ns = -1; module_param(lapic_timer_advance_ns, int, S_IRUGO | S_IWUSR); -static bool __read_mostly vector_hashing = true; +static bool __asi_not_sensitive_readmostly vector_hashing = true; module_param(vector_hashing, bool, S_IRUGO); -bool __read_mostly enable_vmware_backdoor = false; +bool __asi_not_sensitive_readmostly enable_vmware_backdoor = false; module_param(enable_vmware_backdoor, bool, S_IRUGO); EXPORT_SYMBOL_GPL(enable_vmware_backdoor); -static bool __read_mostly force_emulation_prefix = false; +static bool __asi_not_sensitive_readmostly force_emulation_prefix = false; module_param(force_emulation_prefix, bool, S_IRUGO); -int __read_mostly pi_inject_timer = -1; +int __asi_not_sensitive_readmostly pi_inject_timer = -1; module_param(pi_inject_timer, bint, S_IRUGO | S_IWUSR); /* @@ -216,13 +216,14 @@ static struct kvm_user_return_msrs __percpu *user_return_msrs; u64 __read_mostly host_efer; EXPORT_SYMBOL_GPL(host_efer); -bool __read_mostly allow_smaller_maxphyaddr = 0; +bool __asi_not_sensitive_readmostly allow_smaller_maxphyaddr = 0; EXPORT_SYMBOL_GPL(allow_smaller_maxphyaddr); -bool __read_mostly enable_apicv = true; +bool __asi_not_sensitive_readmostly enable_apicv = true; EXPORT_SYMBOL_GPL(enable_apicv); -u64 __read_mostly host_xss; +/* TODO(oweisse): how dangerous is this variable, from a security standpoint? */ +u64 __asi_not_sensitive_readmostly host_xss; EXPORT_SYMBOL_GPL(host_xss); u64 __read_mostly supported_xss; EXPORT_SYMBOL_GPL(supported_xss); @@ -292,7 +293,7 @@ const struct kvm_stats_header kvm_vcpu_stats_header = { sizeof(kvm_vcpu_stats_desc), }; -u64 __read_mostly host_xcr0; +u64 __asi_not_sensitive_readmostly host_xcr0; u64 __read_mostly supported_xcr0; EXPORT_SYMBOL_GPL(supported_xcr0); @@ -2077,7 +2078,7 @@ struct pvclock_gtod_data { u64 wall_time_sec; }; -static struct pvclock_gtod_data pvclock_gtod_data; +static struct pvclock_gtod_data pvclock_gtod_data __asi_not_sensitive; static void update_pvclock_gtod(struct timekeeper *tk) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index ba373b461855..fdc117929fc7 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -17,8 +17,8 @@ #undef pr_fmt #define pr_fmt(fmt) "ASI: " fmt -static struct asi_class asi_class[ASI_MAX_NUM]; -static DEFINE_SPINLOCK(asi_class_lock); +static struct asi_class asi_class[ASI_MAX_NUM] __asi_not_sensitive; +static DEFINE_SPINLOCK(asi_class_lock __asi_not_sensitive); DEFINE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); diff --git a/include/linux/debug_locks.h b/include/linux/debug_locks.h index dbb409d77d4f..7bd0c3dd6d47 100644 --- a/include/linux/debug_locks.h +++ b/include/linux/debug_locks.h @@ -7,8 +7,8 @@ struct task_struct; -extern int debug_locks __read_mostly; -extern int debug_locks_silent __read_mostly; +extern int debug_locks; +extern int debug_locks_silent; static __always_inline int __debug_locks_off(void) diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h index 5e13f801c902..deccab0dcb4a 100644 --- a/include/linux/jiffies.h +++ b/include/linux/jiffies.h @@ -76,8 +76,8 @@ extern int register_refined_jiffies(long clock_tick_rate); * without sampling the sequence number in jiffies_lock. * get_jiffies_64() will do this for you as appropriate. 
*/ -extern u64 __cacheline_aligned_in_smp jiffies_64; -extern unsigned long volatile __cacheline_aligned_in_smp __jiffy_arch_data jiffies; +extern u64 jiffies_64; +extern unsigned long volatile __jiffy_arch_data jiffies; #if (BITS_PER_LONG < 64) u64 get_jiffies_64(void); diff --git a/include/linux/notifier.h b/include/linux/notifier.h index 87069b8459af..a27b193b8e60 100644 --- a/include/linux/notifier.h +++ b/include/linux/notifier.h @@ -117,7 +117,7 @@ extern void srcu_init_notifier_head(struct srcu_notifier_head *nh); struct blocking_notifier_head name = \ BLOCKING_NOTIFIER_INIT(name) #define RAW_NOTIFIER_HEAD(name) \ - struct raw_notifier_head name = \ + struct raw_notifier_head name __asi_not_sensitive = \ RAW_NOTIFIER_INIT(name) #ifdef CONFIG_TREE_SRCU diff --git a/include/linux/profile.h b/include/linux/profile.h index fd18ca96f557..4988b6d05d4c 100644 --- a/include/linux/profile.h +++ b/include/linux/profile.h @@ -38,7 +38,7 @@ enum profile_type { #ifdef CONFIG_PROFILING -extern int prof_on __read_mostly; +extern int prof_on; /* init basic kernel profiler */ int profile_init(void); diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h index 5e0beb5c5659..34f5073c88a2 100644 --- a/include/linux/rcupdate.h +++ b/include/linux/rcupdate.h @@ -84,7 +84,7 @@ static inline int rcu_preempt_depth(void) /* Internal to kernel */ void rcu_init(void); -extern int rcu_scheduler_active __read_mostly; +extern int rcu_scheduler_active; void rcu_sched_clock_irq(int user); void rcu_report_dead(unsigned int cpu); void rcutree_migrate_callbacks(int cpu); @@ -308,6 +308,8 @@ static inline int rcu_read_lock_any_held(void) #ifdef CONFIG_PROVE_RCU +/* TODO: ASI - (oweisse) we might want to switch ".data.unlikely" to some other + * section that will be mapped to ASI. */ /** * RCU_LOCKDEP_WARN - emit lockdep splat if specified condition is met * @c: condition to check diff --git a/include/linux/rcutree.h b/include/linux/rcutree.h index 53209d669400..76665db179fa 100644 --- a/include/linux/rcutree.h +++ b/include/linux/rcutree.h @@ -62,7 +62,7 @@ static inline void rcu_irq_exit_check_preempt(void) { } void exit_rcu(void); void rcu_scheduler_starting(void); -extern int rcu_scheduler_active __read_mostly; +extern int rcu_scheduler_active; void rcu_end_inkernel_boot(void); bool rcu_inkernel_boot_has_ended(void); bool rcu_is_watching(void); diff --git a/include/linux/sched/sysctl.h b/include/linux/sched/sysctl.h index 304f431178fd..1529e3835939 100644 --- a/include/linux/sched/sysctl.h +++ b/include/linux/sched/sysctl.h @@ -3,6 +3,7 @@ #define _LINUX_SCHED_SYSCTL_H #include +#include struct ctl_table; diff --git a/init/main.c b/init/main.c index bb984ed79de0..ce87fac83aed 100644 --- a/init/main.c +++ b/init/main.c @@ -123,7 +123,7 @@ extern void radix_tree_init(void); * operations which are not allowed with IRQ disabled are allowed while the * flag is set. 
*/ -bool early_boot_irqs_disabled __read_mostly; +bool early_boot_irqs_disabled __asi_not_sensitive; enum system_states system_state __read_mostly; EXPORT_SYMBOL(system_state); diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index cafb8c114a21..729495e17363 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -162,7 +162,8 @@ static struct static_key_true *cgroup_subsys_on_dfl_key[] = { static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu); /* the default hierarchy */ -struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu }; +struct cgroup_root cgrp_dfl_root __asi_not_sensitive = + { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu }; EXPORT_SYMBOL_GPL(cgrp_dfl_root); /* @@ -755,7 +756,7 @@ EXPORT_SYMBOL_GPL(of_css); * reference-counted, to improve performance when child cgroups * haven't been created. */ -struct css_set init_css_set = { +struct css_set init_css_set __asi_not_sensitive = { .refcount = REFCOUNT_INIT(1), .dom_cset = &init_css_set, .tasks = LIST_HEAD_INIT(init_css_set.tasks), diff --git a/kernel/cpu.c b/kernel/cpu.c index 407a2568f35e..59530bd5da39 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2581,26 +2581,26 @@ const DECLARE_BITMAP(cpu_all_bits, NR_CPUS) = CPU_BITS_ALL; EXPORT_SYMBOL(cpu_all_bits); #ifdef CONFIG_INIT_ALL_POSSIBLE -struct cpumask __cpu_possible_mask __read_mostly +struct cpumask __cpu_possible_mask __asi_not_sensitive_readmostly = {CPU_BITS_ALL}; #else -struct cpumask __cpu_possible_mask __read_mostly; +struct cpumask __cpu_possible_mask __asi_not_sensitive_readmostly; #endif EXPORT_SYMBOL(__cpu_possible_mask); -struct cpumask __cpu_online_mask __read_mostly; +struct cpumask __cpu_online_mask __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__cpu_online_mask); -struct cpumask __cpu_present_mask __read_mostly; +struct cpumask __cpu_present_mask __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__cpu_present_mask); -struct cpumask __cpu_active_mask __read_mostly; +struct cpumask __cpu_active_mask __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__cpu_active_mask); -struct cpumask __cpu_dying_mask __read_mostly; +struct cpumask __cpu_dying_mask __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__cpu_dying_mask); -atomic_t __num_online_cpus __read_mostly; +atomic_t __num_online_cpus __asi_not_sensitive_readmostly; EXPORT_SYMBOL(__num_online_cpus); void init_cpu_present(const struct cpumask *src) diff --git a/kernel/events/core.c b/kernel/events/core.c index 30d94f68c5bd..6ea559b6e0f4 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9651,7 +9651,7 @@ static int perf_swevent_init(struct perf_event *event) return 0; } -static struct pmu perf_swevent = { +static struct pmu perf_swevent __asi_not_sensitive = { .task_ctx_nr = perf_sw_context, .capabilities = PERF_PMU_CAP_NO_NMI, @@ -9800,7 +9800,7 @@ static int perf_tp_event_init(struct perf_event *event) return 0; } -static struct pmu perf_tracepoint = { +static struct pmu perf_tracepoint __asi_not_sensitive = { .task_ctx_nr = perf_sw_context, .event_init = perf_tp_event_init, diff --git a/kernel/freezer.c b/kernel/freezer.c index 45ab36ffd0e7..6ca163e4880b 100644 --- a/kernel/freezer.c +++ b/kernel/freezer.c @@ -13,7 +13,7 @@ #include /* total number of freezing conditions in effect */ -atomic_t system_freezing_cnt = ATOMIC_INIT(0); +atomic_t __asi_not_sensitive system_freezing_cnt = ATOMIC_INIT(0); EXPORT_SYMBOL(system_freezing_cnt); /* indicate whether PM freezing is in effect, protected by diff --git a/kernel/locking/lockdep.c 
b/kernel/locking/lockdep.c index 2270ec68f10a..1b8f51a37883 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -64,7 +64,7 @@ #include #ifdef CONFIG_PROVE_LOCKING -int prove_locking = 1; +int prove_locking __asi_not_sensitive = 1; module_param(prove_locking, int, 0644); #else #define prove_locking 0 @@ -186,8 +186,8 @@ unsigned long nr_zapped_classes; #ifndef CONFIG_DEBUG_LOCKDEP static #endif -struct lock_class lock_classes[MAX_LOCKDEP_KEYS]; -static DECLARE_BITMAP(lock_classes_in_use, MAX_LOCKDEP_KEYS); +struct lock_class lock_classes[MAX_LOCKDEP_KEYS] __asi_not_sensitive; +static DECLARE_BITMAP(lock_classes_in_use, MAX_LOCKDEP_KEYS) __asi_not_sensitive; static inline struct lock_class *hlock_class(struct held_lock *hlock) { @@ -389,7 +389,7 @@ static struct hlist_head classhash_table[CLASSHASH_SIZE]; #define __chainhashfn(chain) hash_long(chain, CHAINHASH_BITS) #define chainhashentry(chain) (chainhash_table + __chainhashfn((chain))) -static struct hlist_head chainhash_table[CHAINHASH_SIZE]; +static struct hlist_head chainhash_table[CHAINHASH_SIZE] __asi_not_sensitive; /* * the id of held_lock @@ -599,7 +599,7 @@ u64 lockdep_stack_hash_count(void) unsigned int nr_hardirq_chains; unsigned int nr_softirq_chains; unsigned int nr_process_chains; -unsigned int max_lockdep_depth; +unsigned int max_lockdep_depth __asi_not_sensitive; #ifdef CONFIG_DEBUG_LOCKDEP /* @@ -3225,8 +3225,8 @@ check_prevs_add(struct task_struct *curr, struct held_lock *next) return 0; } -struct lock_chain lock_chains[MAX_LOCKDEP_CHAINS]; -static DECLARE_BITMAP(lock_chains_in_use, MAX_LOCKDEP_CHAINS); +struct lock_chain lock_chains[MAX_LOCKDEP_CHAINS] __asi_not_sensitive; +static DECLARE_BITMAP(lock_chains_in_use, MAX_LOCKDEP_CHAINS) __asi_not_sensitive; static u16 chain_hlocks[MAX_LOCKDEP_CHAIN_HLOCKS]; unsigned long nr_zapped_lock_chains; unsigned int nr_free_chain_hlocks; /* Free chain_hlocks in buckets */ diff --git a/kernel/panic.c b/kernel/panic.c index cefd7d82366f..6d0ee3ddd58b 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -56,7 +56,7 @@ int panic_on_warn __read_mostly; unsigned long panic_on_taint; bool panic_on_taint_nousertaint = false; -int panic_timeout = CONFIG_PANIC_TIMEOUT; +int panic_timeout __asi_not_sensitive = CONFIG_PANIC_TIMEOUT; EXPORT_SYMBOL_GPL(panic_timeout); #define PANIC_PRINT_TASK_INFO 0x00000001 diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index 57b132b658e1..3425fb1554d3 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -75,7 +75,7 @@ EXPORT_SYMBOL(ignore_console_lock_warning); * Low level drivers may need that to know if they can schedule in * their unblank() callback or not. So let's export it. 
*/ -int oops_in_progress; +int oops_in_progress __asi_not_sensitive; EXPORT_SYMBOL(oops_in_progress); /* @@ -2001,7 +2001,7 @@ static u8 *__printk_recursion_counter(void) local_irq_restore(flags); \ } while (0) -int printk_delay_msec __read_mostly; +int printk_delay_msec __asi_not_sensitive_readmostly; static inline void printk_delay(void) { diff --git a/kernel/profile.c b/kernel/profile.c index eb9c7f0f5ac5..c5beb9b0b0a8 100644 --- a/kernel/profile.c +++ b/kernel/profile.c @@ -44,10 +44,10 @@ static atomic_t *prof_buffer; static unsigned long prof_len; static unsigned short int prof_shift; -int prof_on __read_mostly; +int prof_on __asi_not_sensitive_readmostly; EXPORT_SYMBOL_GPL(prof_on); -static cpumask_var_t prof_cpu_mask; +static cpumask_var_t prof_cpu_mask __asi_not_sensitive; #if defined(CONFIG_SMP) && defined(CONFIG_PROC_FS) static DEFINE_PER_CPU(struct profile_hit *[2], cpu_profile_hits); static DEFINE_PER_CPU(int, cpu_profile_flip); diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index ef8d36f580fc..284d2722cf0c 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -82,7 +82,7 @@ static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = { .cblist.flags = SEGCBLIST_SOFTIRQ_ONLY, #endif }; -static struct rcu_state rcu_state = { +static struct rcu_state rcu_state __asi_not_sensitive = { .level = { &rcu_state.node[0] }, .gp_state = RCU_GP_IDLE, .gp_seq = (0UL - 300UL) << RCU_SEQ_CTR_SHIFT, @@ -98,7 +98,7 @@ static struct rcu_state rcu_state = { static bool dump_tree; module_param(dump_tree, bool, 0444); /* By default, use RCU_SOFTIRQ instead of rcuc kthreads. */ -static bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT); +static __asi_not_sensitive bool use_softirq = !IS_ENABLED(CONFIG_PREEMPT_RT); #ifndef CONFIG_PREEMPT_RT module_param(use_softirq, bool, 0444); #endif @@ -125,7 +125,7 @@ int rcu_num_nodes __read_mostly = NUM_RCU_NODES; /* Total # rcu_nodes in use. */ * transitions from RCU_SCHEDULER_INIT to RCU_SCHEDULER_RUNNING after RCU * is fully initialized, including all of its kthreads having been spawned. */ -int rcu_scheduler_active __read_mostly; +int rcu_scheduler_active __asi_not_sensitive; EXPORT_SYMBOL_GPL(rcu_scheduler_active); /* @@ -140,7 +140,7 @@ EXPORT_SYMBOL_GPL(rcu_scheduler_active); * early boot to take responsibility for these callbacks, but one step at * a time. */ -static int rcu_scheduler_fully_active __read_mostly; +static int rcu_scheduler_fully_active __asi_not_sensitive; static void rcu_report_qs_rnp(unsigned long mask, struct rcu_node *rnp, unsigned long gps, unsigned long flags); @@ -470,7 +470,7 @@ module_param(qovld, long, 0444); static ulong jiffies_till_first_fqs = IS_ENABLED(CONFIG_RCU_STRICT_GRACE_PERIOD) ? 
0 : ULONG_MAX; static ulong jiffies_till_next_fqs = ULONG_MAX; -static bool rcu_kick_kthreads; +static bool rcu_kick_kthreads __asi_not_sensitive; static int rcu_divisor = 7; module_param(rcu_divisor, int, 0644); diff --git a/kernel/rcu/update.c b/kernel/rcu/update.c index 156892c22bb5..b61a3854e62d 100644 --- a/kernel/rcu/update.c +++ b/kernel/rcu/update.c @@ -243,7 +243,7 @@ core_initcall(rcu_set_runtime_mode); #ifdef CONFIG_DEBUG_LOCK_ALLOC static struct lock_class_key rcu_lock_key; -struct lockdep_map rcu_lock_map = { +struct lockdep_map rcu_lock_map __asi_not_sensitive = { .name = "rcu_read_lock", .key = &rcu_lock_key, .wait_type_outer = LD_WAIT_FREE, @@ -494,7 +494,7 @@ EXPORT_SYMBOL_GPL(rcutorture_sched_setaffinity); #ifdef CONFIG_RCU_STALL_COMMON int rcu_cpu_stall_ftrace_dump __read_mostly; module_param(rcu_cpu_stall_ftrace_dump, int, 0644); -int rcu_cpu_stall_suppress __read_mostly; // !0 = suppress stall warnings. +int rcu_cpu_stall_suppress __asi_not_sensitive_readmostly; // !0 = suppress stall warnings. EXPORT_SYMBOL_GPL(rcu_cpu_stall_suppress); module_param(rcu_cpu_stall_suppress, int, 0644); int rcu_cpu_stall_timeout __read_mostly = CONFIG_RCU_CPU_STALL_TIMEOUT; diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c index c2b2859ddd82..6c3585053f05 100644 --- a/kernel/sched/clock.c +++ b/kernel/sched/clock.c @@ -84,7 +84,7 @@ static int __sched_clock_stable_early = 1; /* * We want: ktime_get_ns() + __gtod_offset == sched_clock() + __sched_clock_offset */ -__read_mostly u64 __sched_clock_offset; +__asi_not_sensitive u64 __sched_clock_offset; static __read_mostly u64 __gtod_offset; struct sched_clock_data { diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 44ea197c16ea..e1c08ff4130e 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -76,9 +76,9 @@ __read_mostly int sysctl_resched_latency_warn_once = 1; * Limited because this is done with IRQs disabled. */ #ifdef CONFIG_PREEMPT_RT -const_debug unsigned int sysctl_sched_nr_migrate = 8; +unsigned int sysctl_sched_nr_migrate __asi_not_sensitive_readmostly = 8; #else -const_debug unsigned int sysctl_sched_nr_migrate = 32; +unsigned int sysctl_sched_nr_migrate __asi_not_sensitive_readmostly = 32; #endif /* @@ -9254,7 +9254,7 @@ int in_sched_functions(unsigned long addr) * Default task group. * Every task in system belongs to this group at bootup. 
*/ -struct task_group root_task_group; +struct task_group root_task_group __asi_not_sensitive; LIST_HEAD(task_groups); /* Cacheline aligned slab cache for task_group */ diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 893eece65bfd..6e3da149125c 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -50,7 +50,7 @@ static inline struct cpuacct *parent_ca(struct cpuacct *ca) } static DEFINE_PER_CPU(struct cpuacct_usage, root_cpuacct_cpuusage); -static struct cpuacct root_cpuacct = { +static struct cpuacct root_cpuacct __asi_not_sensitive = { .cpustat = &kernel_cpustat, .cpuusage = &root_cpuacct_cpuusage, }; diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 9392aea1804e..623b5feb142a 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -19,7 +19,7 @@ */ DEFINE_PER_CPU(struct irqtime, cpu_irqtime); -static int sched_clock_irqtime; +static int __asi_not_sensitive sched_clock_irqtime; void enable_sched_clock_irqtime(void) { diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 6e476f6d9435..dc9b6133b059 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -35,7 +35,7 @@ * * (default: 6ms * (1 + ilog(ncpus)), units: nanoseconds) */ -unsigned int sysctl_sched_latency = 6000000ULL; +__asi_not_sensitive unsigned int sysctl_sched_latency = 6000000ULL; static unsigned int normalized_sysctl_sched_latency = 6000000ULL; /* @@ -90,7 +90,7 @@ unsigned int sysctl_sched_child_runs_first __read_mostly; unsigned int sysctl_sched_wakeup_granularity = 1000000UL; static unsigned int normalized_sysctl_sched_wakeup_granularity = 1000000UL; -const_debug unsigned int sysctl_sched_migration_cost = 500000UL; +unsigned int sysctl_sched_migration_cost __asi_not_sensitive_readmostly = 500000UL; int sched_thermal_decay_shift; static int __init setup_sched_thermal_decay_shift(char *str) diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c index 954b229868d9..af71cde93e98 100644 --- a/kernel/sched/loadavg.c +++ b/kernel/sched/loadavg.c @@ -57,7 +57,7 @@ /* Variables and functions for calc_load */ atomic_long_t calc_load_tasks; -unsigned long calc_load_update; +unsigned long calc_load_update __asi_not_sensitive; unsigned long avenrun[3]; EXPORT_SYMBOL(avenrun); /* should be removed */ diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index b48baaba2fc2..9d5fbe66d355 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -14,7 +14,7 @@ static const u64 max_rt_runtime = MAX_BW; static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun); -struct rt_bandwidth def_rt_bandwidth; +struct rt_bandwidth def_rt_bandwidth __asi_not_sensitive; static enum hrtimer_restart sched_rt_period_timer(struct hrtimer *timer) { diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 0e66749486e7..517c70a29a57 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2379,8 +2379,8 @@ extern void deactivate_task(struct rq *rq, struct task_struct *p, int flags); extern void check_preempt_curr(struct rq *rq, struct task_struct *p, int flags); -extern const_debug unsigned int sysctl_sched_nr_migrate; -extern const_debug unsigned int sysctl_sched_migration_cost; +extern unsigned int sysctl_sched_nr_migrate; +extern unsigned int sysctl_sched_migration_cost; #ifdef CONFIG_SCHED_DEBUG extern unsigned int sysctl_sched_latency; diff --git a/kernel/smp.c b/kernel/smp.c index 01a7c1706a58..c51fd981a4a9 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -1070,7 +1070,7 @@ static int __init maxcpus(char *str) early_param("maxcpus", maxcpus); /* 
Setup number of possible processor ids */ -unsigned int nr_cpu_ids __read_mostly = NR_CPUS; +unsigned int nr_cpu_ids __asi_not_sensitive = NR_CPUS; EXPORT_SYMBOL(nr_cpu_ids); /* An arch may set nr_cpu_ids earlier if needed, so this would be redundant */ diff --git a/kernel/softirq.c b/kernel/softirq.c index 41f470929e99..c462b7fab4d3 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -56,7 +56,8 @@ DEFINE_PER_CPU_ALIGNED(irq_cpustat_t, irq_stat); EXPORT_PER_CPU_SYMBOL(irq_stat); #endif -static struct softirq_action softirq_vec[NR_SOFTIRQS] __cacheline_aligned_in_smp; +static struct softirq_action softirq_vec[NR_SOFTIRQS] +__asi_not_sensitive ____cacheline_aligned; DEFINE_PER_CPU(struct task_struct *, ksoftirqd); diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 0ea8702eb516..8b176f5c01f2 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -706,7 +706,7 @@ hrtimer_force_reprogram(struct hrtimer_cpu_base *cpu_base, int skip_equal) * High resolution timer enabled ? */ static bool hrtimer_hres_enabled __read_mostly = true; -unsigned int hrtimer_resolution __read_mostly = LOW_RES_NSEC; +unsigned int hrtimer_resolution __asi_not_sensitive = LOW_RES_NSEC; EXPORT_SYMBOL_GPL(hrtimer_resolution); /* diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c index bc4db9e5ab70..c60f8da1cfb5 100644 --- a/kernel/time/jiffies.c +++ b/kernel/time/jiffies.c @@ -40,7 +40,13 @@ static struct clocksource clocksource_jiffies = { .max_cycles = 10, }; -__cacheline_aligned_in_smp DEFINE_RAW_SPINLOCK(jiffies_lock); +/* TODO(oweisse): __cacheline_aligned_in_smp is expanded to + __section__(".data..cacheline_aligned"))) which is at odds with + __asi_not_sensitive. We should consider instead using + __attribute__ ((__aligned__(XXX))) where XXX is a def for cacheline or + something*/ +/* __cacheline_aligned_in_smp */ +__asi_not_sensitive DEFINE_RAW_SPINLOCK(jiffies_lock); __cacheline_aligned_in_smp seqcount_raw_spinlock_t jiffies_seq = SEQCNT_RAW_SPINLOCK_ZERO(jiffies_seq, &jiffies_lock); diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index 406dccb79c2b..23711fb94323 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -31,13 +31,13 @@ /* USER_HZ period (usecs): */ -unsigned long tick_usec = USER_TICK_USEC; +unsigned long tick_usec __asi_not_sensitive = USER_TICK_USEC; /* SHIFTED_HZ period (nsecs): */ -unsigned long tick_nsec; +unsigned long tick_nsec __asi_not_sensitive; -static u64 tick_length; -static u64 tick_length_base; +static u64 tick_length __asi_not_sensitive; +static u64 tick_length_base __asi_not_sensitive; #define SECS_PER_DAY 86400 #define MAX_TICKADJ 500LL /* usecs */ @@ -54,36 +54,36 @@ static u64 tick_length_base; * * (TIME_ERROR prevents overwriting the CMOS clock) */ -static int time_state = TIME_OK; +static int time_state __asi_not_sensitive = TIME_OK; /* clock status bits: */ -static int time_status = STA_UNSYNC; +static int time_status __asi_not_sensitive = STA_UNSYNC; /* time adjustment (nsecs): */ -static s64 time_offset; +static s64 time_offset __asi_not_sensitive; /* pll time constant: */ -static long time_constant = 2; +static long time_constant __asi_not_sensitive = 2; /* maximum error (usecs): */ -static long time_maxerror = NTP_PHASE_LIMIT; +static long time_maxerror __asi_not_sensitive = NTP_PHASE_LIMIT; /* estimated error (usecs): */ -static long time_esterror = NTP_PHASE_LIMIT; +static long time_esterror __asi_not_sensitive = NTP_PHASE_LIMIT; /* frequency offset (scaled nsecs/secs): */ -static s64 time_freq; +static s64 time_freq 
__asi_not_sensitive; /* time at last adjustment (secs): */ -static time64_t time_reftime; +static time64_t time_reftime __asi_not_sensitive; -static long time_adjust; +static long time_adjust __asi_not_sensitive; /* constant (boot-param configurable) NTP tick adjustment (upscaled) */ -static s64 ntp_tick_adj; +static s64 ntp_tick_adj __asi_not_sensitive; /* second value of the next pending leapsecond, or TIME64_MAX if no leap */ -static time64_t ntp_next_leap_sec = TIME64_MAX; +static time64_t ntp_next_leap_sec __asi_not_sensitive = TIME64_MAX; #ifdef CONFIG_NTP_PPS diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index 46789356f856..cbe75661ca74 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -31,7 +31,7 @@ DEFINE_PER_CPU(struct tick_device, tick_cpu_device); * CPU which handles the tick and protected by jiffies_lock. There is * no requirement to write hold the jiffies seqcount for it. */ -ktime_t tick_next_period; +ktime_t tick_next_period __asi_not_sensitive; /* * tick_do_timer_cpu is a timer core internal variable which holds the CPU NR @@ -47,7 +47,7 @@ ktime_t tick_next_period; * at it will take over and keep the time keeping alive. The handover * procedure also covers cpu hotplug. */ -int tick_do_timer_cpu __read_mostly = TICK_DO_TIMER_BOOT; +int tick_do_timer_cpu __asi_not_sensitive_readmostly = TICK_DO_TIMER_BOOT; #ifdef CONFIG_NO_HZ_FULL /* * tick_do_timer_boot_cpu indicates the boot CPU temporarily owns diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index 649f2b48e8f0..ed7e2a18060a 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -15,7 +15,7 @@ DECLARE_PER_CPU(struct tick_device, tick_cpu_device); extern ktime_t tick_next_period; -extern int tick_do_timer_cpu __read_mostly; +extern int tick_do_timer_cpu; extern void tick_setup_periodic(struct clock_event_device *dev, int broadcast); extern void tick_handle_periodic(struct clock_event_device *dev); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index 17a283ce2b20..c23fecbb68c2 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -49,7 +49,7 @@ struct tick_sched *tick_get_tick_sched(int cpu) * jiffies_lock and jiffies_seq. tick_nohz_next_event() needs to get a * consistent view of jiffies and last_jiffies_update. */ -static ktime_t last_jiffies_update; +static ktime_t last_jiffies_update __asi_not_sensitive; /* * Must be called with interrupts disabled ! 
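(Side note on the TODO in the kernel/time/jiffies.c hunk above; this is an illustrative sketch, not part of the patch, and the ASI section name used below is an assumption.) On SMP, __cacheline_aligned_in_smp expands to both an alignment and a section placement, while an __asi_not_sensitive-style marker is itself a section placement; a variable can only live in one section, so the two annotations cannot simply be combined. Pairing the ASI marker with ____cacheline_aligned (alignment only, no section), as this patch does for softirq_vec above and jiffies_64 further down, keeps the cacheline alignment while letting ASI choose the section:

/* Sketch of the expansions involved; demo_* names are hypothetical. */
#include <linux/cache.h>
#include <linux/types.h>

/* Roughly what __cacheline_aligned_in_smp gives on SMP: alignment + a section. */
#define demo_cacheline_aligned_in_smp \
	__attribute__((__aligned__(SMP_CACHE_BYTES), \
		       __section__(".data..cacheline_aligned")))

/* Roughly what an ASI "not sensitive" marker gives: a (different) section. */
#define demo_asi_not_sensitive \
	__attribute__((__section__(".data..asi_non_sensitive")))

/* What does compose: the ASI section plus plain alignment. */
static u64 demo_jiffies demo_asi_not_sensitive ____cacheline_aligned;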
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index dcdcb85121e4..120395965e45 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -39,7 +39,7 @@ enum timekeeping_adv_mode { TK_ADV_FREQ }; -DEFINE_RAW_SPINLOCK(timekeeper_lock); +__asi_not_sensitive DEFINE_RAW_SPINLOCK(timekeeper_lock); /* * The most important data for readout fits into a single 64 byte @@ -48,14 +48,14 @@ DEFINE_RAW_SPINLOCK(timekeeper_lock); static struct { seqcount_raw_spinlock_t seq; struct timekeeper timekeeper; -} tk_core ____cacheline_aligned = { +} tk_core ____cacheline_aligned __asi_not_sensitive = { .seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock), }; -static struct timekeeper shadow_timekeeper; +static struct timekeeper shadow_timekeeper __asi_not_sensitive; /* flag for if timekeeping is suspended */ -int __read_mostly timekeeping_suspended; +int __asi_not_sensitive_readmostly timekeeping_suspended; /** * struct tk_fast - NMI safe timekeeper @@ -72,7 +72,7 @@ struct tk_fast { }; /* Suspend-time cycles value for halted fast timekeeper. */ -static u64 cycles_at_suspend; +static u64 cycles_at_suspend __asi_not_sensitive; static u64 dummy_clock_read(struct clocksource *cs) { diff --git a/kernel/time/timekeeping.h b/kernel/time/timekeeping.h index 543beba096c7..b32ee75808fe 100644 --- a/kernel/time/timekeeping.h +++ b/kernel/time/timekeeping.h @@ -26,7 +26,7 @@ extern void update_process_times(int user); extern void do_timer(unsigned long ticks); extern void update_wall_time(void); -extern raw_spinlock_t jiffies_lock; +extern __asi_not_sensitive raw_spinlock_t jiffies_lock; extern seqcount_raw_spinlock_t jiffies_seq; #define CS_NAME_LEN 32 diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 85f1021ad459..0b09c99b568c 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -56,7 +56,7 @@ #define CREATE_TRACE_POINTS #include -__visible u64 jiffies_64 __cacheline_aligned_in_smp = INITIAL_JIFFIES; +u64 jiffies_64 __asi_not_sensitive ____cacheline_aligned = INITIAL_JIFFIES; EXPORT_SYMBOL(jiffies_64); diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 78ea542ce3bc..eaec3814c5a4 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -432,7 +432,7 @@ EXPORT_SYMBOL_GPL(unregister_ftrace_export); * The global_trace is the descriptor that holds the top-level tracing * buffers for the live tracing. */ -static struct trace_array global_trace = { +static struct trace_array global_trace __asi_not_sensitive = { .trace_flags = TRACE_DEFAULT_FLAGS, }; diff --git a/kernel/trace/trace_sched_switch.c b/kernel/trace/trace_sched_switch.c index e304196d7c28..d49db8e2430a 100644 --- a/kernel/trace/trace_sched_switch.c +++ b/kernel/trace/trace_sched_switch.c @@ -16,8 +16,8 @@ #define RECORD_CMDLINE 1 #define RECORD_TGID 2 -static int sched_cmdline_ref; -static int sched_tgid_ref; +static int sched_cmdline_ref __asi_not_sensitive; +static int sched_tgid_ref __asi_not_sensitive; static DEFINE_MUTEX(sched_register_mutex); static void diff --git a/lib/debug_locks.c b/lib/debug_locks.c index a75ee30b77cb..f2d217859be6 100644 --- a/lib/debug_locks.c +++ b/lib/debug_locks.c @@ -14,6 +14,7 @@ #include #include #include +#include /* * We want to turn all lock-debugging facilities on/off at once, @@ -22,7 +23,7 @@ * that would just muddy the log. So we report the first one and * shut up after that. 
*/ -int debug_locks __read_mostly = 1; +int debug_locks __asi_not_sensitive_readmostly = 1; EXPORT_SYMBOL_GPL(debug_locks); /* @@ -30,7 +31,7 @@ EXPORT_SYMBOL_GPL(debug_locks); * 'silent failure': nothing is printed to the console when * a locking bug is detected. */ -int debug_locks_silent __read_mostly; +int debug_locks_silent __asi_not_sensitive_readmostly; EXPORT_SYMBOL_GPL(debug_locks_silent); /* diff --git a/mm/memory.c b/mm/memory.c index 667ece86e051..5aa39d0aba2b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -152,7 +152,7 @@ static int __init disable_randmaps(char *s) } __setup("norandmaps", disable_randmaps); -unsigned long zero_pfn __read_mostly; +unsigned long zero_pfn __asi_not_sensitive; EXPORT_SYMBOL(zero_pfn); unsigned long highest_memmap_pfn __read_mostly; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 998ff6a56732..9c850b8bd1fc 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -183,7 +183,7 @@ unsigned long totalreserve_pages __read_mostly; unsigned long totalcma_pages __read_mostly; int percpu_pagelist_high_fraction; -gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK; +gfp_t gfp_allowed_mask __asi_not_sensitive_readmostly = GFP_BOOT_MASK; DEFINE_STATIC_KEY_MAYBE(CONFIG_INIT_ON_ALLOC_DEFAULT_ON, init_on_alloc); EXPORT_SYMBOL(init_on_alloc); diff --git a/mm/sparse.c b/mm/sparse.c index e5c84b0cf0c9..64dcf7fceaed 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -24,10 +24,10 @@ * 1) mem_section - memory sections, mem_map's for valid memory */ #ifdef CONFIG_SPARSEMEM_EXTREME -struct mem_section **mem_section; +struct mem_section **mem_section __asi_not_sensitive; #else struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT] - ____cacheline_internodealigned_in_smp; + ____cacheline_internodealigned_in_smp __asi_not_sensitive; #endif EXPORT_SYMBOL(mem_section); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index e8e9c8588908..0af973b950c2 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3497,7 +3497,7 @@ static int kvm_vcpu_release(struct inode *inode, struct file *filp) return 0; } -static struct file_operations kvm_vcpu_fops = { +static struct file_operations kvm_vcpu_fops __asi_not_sensitive = { .release = kvm_vcpu_release, .unlocked_ioctl = kvm_vcpu_ioctl, .mmap = kvm_vcpu_mmap, From patchwork Wed Feb 23 05:22:18 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756402 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A5D5C4332F for ; Wed, 23 Feb 2022 05:25:25 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0808D8D0014; Wed, 23 Feb 2022 00:25:21 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 02DF18D0001; Wed, 23 Feb 2022 00:25:20 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DC4838D0014; Wed, 23 Feb 2022 00:25:20 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0039.hostedemail.com [216.40.44.39]) by kanga.kvack.org (Postfix) with ESMTP id C745B8D0001 for ; Wed, 23 Feb 2022 00:25:20 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 833FC9F5DD for ; Wed, 23 Feb 2022 05:25:20 +0000 (UTC) X-FDA: 79172906400.21.B6F64A2 
Date: Tue, 22 Feb 2022 21:22:18 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-43-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 42/47] mm: asi: Annotation of PERCPU variables to be nonsensitive From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
From: Ofir Weisse The heart of ASI is to differentiate between sensitive and non-sensitive data access. This commit marks certain static PERCPU variables as not sensitive.
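(Illustrative sketch, not part of the patch: the section name ".data..asi_non_sensitive", the CONFIG_ADDRESS_SPACE_ISOLATION guard and the DEMO_* names below are assumptions, used only to show the annotation pattern this series relies on. Non-sensitive data is placed in a section that stays mapped in the restricted ASI page tables, so touching it does not force an ASI exit.)

/* Sketch only; the real definitions live elsewhere in this series. */
#include <linux/percpu-defs.h>

#ifdef CONFIG_ADDRESS_SPACE_ISOLATION
/* Place an ordinary static in an ASI-mapped data section. */
#define __demo_asi_not_sensitive \
	__attribute__((__section__(".data..asi_non_sensitive")))
/* Per-CPU variant: per-CPU variables live in their own percpu sections,
 * so the marker is a section suffix, the same mechanism that
 * DEFINE_PER_CPU_READ_MOSTLY uses for "..read_mostly". */
#define DEMO_DEFINE_PER_CPU_ASI_NOT_SENSITIVE(type, name) \
	DEFINE_PER_CPU_SECTION(type, name, "..asi_non_sensitive")
#else
#define __demo_asi_not_sensitive
#define DEMO_DEFINE_PER_CPU_ASI_NOT_SENSITIVE(type, name) \
	DEFINE_PER_CPU(type, name)
#endif

/* Usage mirroring the conversions in this patch: */
static unsigned long demo_counter __demo_asi_not_sensitive;
static DEMO_DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, demo_depth);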
Some static variables are accessed frequently and would therefore cause many ASI exits. The frequency of these accesses is monitored by tracing asi_exits and analyzing the accessed addresses. Many of these variables don't contain sensitive information and can therefore be mapped into the global ASI region. This commit changes DEFINE_PER_CPU --> DEFINE_PER_CPU_ASI_NOT_SENSITIVE for variables which are frequently accessed yet not sensitive. The end result is a very significant reduction in ASI exits on real benchmarks. Signed-off-by: Ofir Weisse --- arch/x86/events/core.c | 2 +- arch/x86/events/intel/bts.c | 2 +- arch/x86/events/perf_event.h | 2 +- arch/x86/include/asm/asi.h | 2 +- arch/x86/include/asm/current.h | 2 +- arch/x86/include/asm/debugreg.h | 2 +- arch/x86/include/asm/desc.h | 2 +- arch/x86/include/asm/fpu/api.h | 2 +- arch/x86/include/asm/hardirq.h | 2 +- arch/x86/include/asm/hw_irq.h | 2 +- arch/x86/include/asm/percpu.h | 2 +- arch/x86/include/asm/preempt.h | 2 +- arch/x86/include/asm/processor.h | 12 ++++++------ arch/x86/include/asm/smp.h | 2 +- arch/x86/include/asm/tlbflush.h | 4 ++-- arch/x86/include/asm/topology.h | 2 +- arch/x86/kernel/apic/apic.c | 2 +- arch/x86/kernel/apic/x2apic_cluster.c | 6 +++--- arch/x86/kernel/cpu/common.c | 12 ++++++------ arch/x86/kernel/fpu/core.c | 2 +- arch/x86/kernel/hw_breakpoint.c | 2 +- arch/x86/kernel/irq.c | 2 +- arch/x86/kernel/irqinit.c | 2 +- arch/x86/kernel/nmi.c | 6 +++--- arch/x86/kernel/process.c | 4 ++-- arch/x86/kernel/setup_percpu.c | 4 ++-- arch/x86/kernel/smpboot.c | 3 ++- arch/x86/kernel/tsc.c | 2 +- arch/x86/kvm/x86.c | 2 +- arch/x86/kvm/x86.h | 2 +- arch/x86/mm/asi.c | 2 +- arch/x86/mm/init.c | 2 +- arch/x86/mm/tlb.c | 2 +- include/asm-generic/irq_regs.h | 2 +- include/linux/arch_topology.h | 2 +- include/linux/hrtimer.h | 2 +- include/linux/interrupt.h | 2 +- include/linux/kernel_stat.h | 4 ++-- include/linux/prandom.h | 2 +- kernel/events/core.c | 6 +++--- kernel/irq_work.c | 6 +++--- kernel/rcu/tree.c | 2 +- kernel/sched/core.c | 6 +++--- kernel/sched/cpufreq.c | 3 ++- kernel/sched/cputime.c | 2 +- kernel/sched/sched.h | 21 +++++++++++---------- kernel/sched/topology.c | 14 +++++++------- kernel/smp.c | 7 ++++--- kernel/softirq.c | 2 +- kernel/time/hrtimer.c | 2 +- kernel/time/tick-common.c | 2 +- kernel/time/tick-internal.h | 4 ++-- kernel/time/tick-sched.c | 2 +- kernel/time/timer.c | 2 +- kernel/trace/trace.c | 2 +- kernel/trace/trace_preemptirq.c | 2 +- kernel/watchdog.c | 12 ++++++------ lib/irq_regs.c | 2 +- lib/random32.c | 3 ++- virt/kvm/kvm_main.c | 2 +- 60 files changed, 112 insertions(+), 107 deletions(-) diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index db825bf053fd..2d9829d774d7 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -47,7 +47,7 @@ struct x86_pmu x86_pmu __asi_not_sensitive_readmostly; static struct pmu pmu; -DEFINE_PER_CPU(struct cpu_hw_events, cpu_hw_events) = { +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct cpu_hw_events, cpu_hw_events) = { .enabled = 1, .pmu = &pmu, }; diff --git a/arch/x86/events/intel/bts.c b/arch/x86/events/intel/bts.c index 974e917e65b2..06d9de514b0d 100644 --- a/arch/x86/events/intel/bts.c +++ b/arch/x86/events/intel/bts.c @@ -36,7 +36,7 @@ enum { BTS_STATE_ACTIVE, }; -static DEFINE_PER_CPU(struct bts_ctx, bts_ctx); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct bts_ctx, bts_ctx); #define BTS_RECORD_SIZE 24 #define BTS_SAFETY_MARGIN 4080 diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h index
27cca7fd6f17..9a4855e6ffa6 100644 --- a/arch/x86/events/perf_event.h +++ b/arch/x86/events/perf_event.h @@ -1036,7 +1036,7 @@ static inline bool x86_pmu_has_lbr_callstack(void) x86_pmu.lbr_sel_map[PERF_SAMPLE_BRANCH_CALL_STACK_SHIFT] > 0; } -DECLARE_PER_CPU(struct cpu_hw_events, cpu_hw_events); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct cpu_hw_events, cpu_hw_events); int x86_perf_event_set_period(struct perf_event *event); diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index d43f6aadffee..6148e65fb0c2 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -52,7 +52,7 @@ struct asi_pgtbl_pool { uint count; }; -DECLARE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); +DECLARE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(struct asi_state, asi_cpu_state); extern pgd_t asi_global_nonsensitive_pgd[]; diff --git a/arch/x86/include/asm/current.h b/arch/x86/include/asm/current.h index 3e204e6140b5..a4bcf1f305bf 100644 --- a/arch/x86/include/asm/current.h +++ b/arch/x86/include/asm/current.h @@ -8,7 +8,7 @@ #ifndef __ASSEMBLY__ struct task_struct; -DECLARE_PER_CPU(struct task_struct *, current_task); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, current_task); static __always_inline struct task_struct *get_current(void) { diff --git a/arch/x86/include/asm/debugreg.h b/arch/x86/include/asm/debugreg.h index cfdf307ddc01..fa67db27b098 100644 --- a/arch/x86/include/asm/debugreg.h +++ b/arch/x86/include/asm/debugreg.h @@ -6,7 +6,7 @@ #include #include -DECLARE_PER_CPU(unsigned long, cpu_dr7); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, cpu_dr7); #ifndef CONFIG_PARAVIRT_XXL /* diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h index ab97b22ac04a..7d9fff8c9543 100644 --- a/arch/x86/include/asm/desc.h +++ b/arch/x86/include/asm/desc.h @@ -298,7 +298,7 @@ static inline void native_load_tls(struct thread_struct *t, unsigned int cpu) gdt[GDT_ENTRY_TLS_MIN + i] = t->tls_array[i]; } -DECLARE_PER_CPU(bool, __tss_limit_invalid); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(bool, __tss_limit_invalid); static inline void force_reload_TR(void) { diff --git a/arch/x86/include/asm/fpu/api.h b/arch/x86/include/asm/fpu/api.h index 6f5ca3c2ef4a..15abb1b05fbc 100644 --- a/arch/x86/include/asm/fpu/api.h +++ b/arch/x86/include/asm/fpu/api.h @@ -121,7 +121,7 @@ static inline void fpstate_init_soft(struct swregs_state *soft) {} #endif /* State tracking */ -DECLARE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct fpu *, fpu_fpregs_owner_ctx); /* Process cleanup */ #ifdef CONFIG_X86_64 diff --git a/arch/x86/include/asm/hardirq.h b/arch/x86/include/asm/hardirq.h index 275e7fd20310..2f70deca4a20 100644 --- a/arch/x86/include/asm/hardirq.h +++ b/arch/x86/include/asm/hardirq.h @@ -46,7 +46,7 @@ typedef struct { #endif } ____cacheline_aligned irq_cpustat_t; -DECLARE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); +DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(irq_cpustat_t, irq_stat); #define __ARCH_IRQ_STAT diff --git a/arch/x86/include/asm/hw_irq.h b/arch/x86/include/asm/hw_irq.h index d465ece58151..e561abfce735 100644 --- a/arch/x86/include/asm/hw_irq.h +++ b/arch/x86/include/asm/hw_irq.h @@ -128,7 +128,7 @@ extern char spurious_entries_start[]; #define VECTOR_RETRIGGERED ((void *)-2L) typedef struct irq_desc* vector_irq_t[NR_VECTORS]; -DECLARE_PER_CPU(vector_irq_t, vector_irq); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(vector_irq_t, vector_irq); #endif /* !ASSEMBLY_ */ diff --git a/arch/x86/include/asm/percpu.h 
b/arch/x86/include/asm/percpu.h index a3c33b79fb86..f9486bbe8a76 100644 --- a/arch/x86/include/asm/percpu.h +++ b/arch/x86/include/asm/percpu.h @@ -390,7 +390,7 @@ static inline bool x86_this_cpu_variable_test_bit(int nr, #include /* We can use this directly for local CPU (faster). */ -DECLARE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, this_cpu_off); #endif /* !__ASSEMBLY__ */ diff --git a/arch/x86/include/asm/preempt.h b/arch/x86/include/asm/preempt.h index fe5efbcba824..204a8532b870 100644 --- a/arch/x86/include/asm/preempt.h +++ b/arch/x86/include/asm/preempt.h @@ -7,7 +7,7 @@ #include #include -DECLARE_PER_CPU(int, __preempt_count); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(int, __preempt_count); /* We use the MSB mostly because its available */ #define PREEMPT_NEED_RESCHED 0x80000000 diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index 20116efd2756..63831f9a503b 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -417,14 +417,14 @@ struct tss_struct { struct x86_io_bitmap io_bitmap; } __aligned(PAGE_SIZE); -DECLARE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw); +DECLARE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE(struct tss_struct, cpu_tss_rw); /* Per CPU interrupt stacks */ struct irq_stack { char stack[IRQ_STACK_SIZE]; } __aligned(IRQ_STACK_SIZE); -DECLARE_PER_CPU(unsigned long, cpu_current_top_of_stack); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, cpu_current_top_of_stack); #ifdef CONFIG_X86_64 struct fixed_percpu_data { @@ -448,8 +448,8 @@ static inline unsigned long cpu_kernelmode_gs_base(int cpu) return (unsigned long)per_cpu(fixed_percpu_data.gs_base, cpu); } -DECLARE_PER_CPU(void *, hardirq_stack_ptr); -DECLARE_PER_CPU(bool, hardirq_stack_inuse); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(void *, hardirq_stack_ptr); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(bool, hardirq_stack_inuse); extern asmlinkage void ignore_sysret(void); /* Save actual FS/GS selectors and bases to current->thread */ @@ -458,8 +458,8 @@ void current_save_fsgs(void); #ifdef CONFIG_STACKPROTECTOR DECLARE_PER_CPU(unsigned long, __stack_chk_guard); #endif -DECLARE_PER_CPU(struct irq_stack *, hardirq_stack_ptr); -DECLARE_PER_CPU(struct irq_stack *, softirq_stack_ptr); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct irq_stack *, hardirq_stack_ptr); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct irq_stack *, softirq_stack_ptr); #endif /* !X86_64 */ struct perf_event; diff --git a/arch/x86/include/asm/smp.h b/arch/x86/include/asm/smp.h index 81a0211a372d..8d85a918532e 100644 --- a/arch/x86/include/asm/smp.h +++ b/arch/x86/include/asm/smp.h @@ -19,7 +19,7 @@ DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_llc_shared_map); DECLARE_PER_CPU_READ_MOSTLY(cpumask_var_t, cpu_l2c_shared_map); DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_llc_id); DECLARE_PER_CPU_READ_MOSTLY(u16, cpu_l2c_id); -DECLARE_PER_CPU_READ_MOSTLY(int, cpu_number); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(int, cpu_number); static inline struct cpumask *cpu_llc_shared_mask(int cpu) { diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index 7d04aa2a5f86..adcdeb58d817 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -151,7 +151,7 @@ struct tlb_state { */ struct tlb_context ctxs[TLB_NR_DYN_ASIDS]; }; -DECLARE_PER_CPU_ALIGNED(struct tlb_state, cpu_tlbstate); +DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct tlb_state, cpu_tlbstate); struct tlb_state_shared { /* @@ -171,7 +171,7 @@ struct 
tlb_state_shared { */ bool is_lazy; }; -DECLARE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared); +DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct tlb_state_shared, cpu_tlbstate_shared); bool nmi_uaccess_okay(void); #define nmi_uaccess_okay nmi_uaccess_okay diff --git a/arch/x86/include/asm/topology.h b/arch/x86/include/asm/topology.h index cc164777e661..bff1a9123469 100644 --- a/arch/x86/include/asm/topology.h +++ b/arch/x86/include/asm/topology.h @@ -203,7 +203,7 @@ DECLARE_STATIC_KEY_FALSE(arch_scale_freq_key); #define arch_scale_freq_invariant() static_branch_likely(&arch_scale_freq_key) -DECLARE_PER_CPU(unsigned long, arch_freq_scale); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, arch_freq_scale); static inline long arch_scale_freq_capacity(int cpu) { diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c index b70344bf6600..5fa0ce0ecfb3 100644 --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -548,7 +548,7 @@ static struct clock_event_device lapic_clockevent = { .rating = 100, .irq = -1, }; -static DEFINE_PER_CPU(struct clock_event_device, lapic_events); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct clock_event_device, lapic_events); static const struct x86_cpu_id deadline_match[] __initconst = { X86_MATCH_INTEL_FAM6_MODEL_STEPPINGS(HASWELL_X, X86_STEPPINGS(0x2, 0x2), 0x3a), /* EP */ diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c index e696e22d0531..655fe820a240 100644 --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -20,10 +20,10 @@ struct cluster_mask { * x86_cpu_to_logical_apicid for all online cpus in a sequential way. * Using per cpu variable would cost one cache line per cpu. */ -static u32 *x86_cpu_to_logical_apicid __read_mostly; +static u32 *x86_cpu_to_logical_apicid __asi_not_sensitive_readmostly; -static DEFINE_PER_CPU(cpumask_var_t, ipi_mask); -static DEFINE_PER_CPU_READ_MOSTLY(struct cluster_mask *, cluster_masks); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(cpumask_var_t, ipi_mask); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct cluster_mask *, cluster_masks); static struct cluster_mask *cluster_hotplug_mask; static int x2apic_acpi_madt_oem_check(char *oem_id, char *oem_table_id) diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index 0083464de5e3..471b3a42db64 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1775,17 +1775,17 @@ EXPORT_PER_CPU_SYMBOL_GPL(fixed_percpu_data); * The following percpu variables are hot. Align current_task to * cacheline size such that they fall in the same cacheline. 
*/ -DEFINE_PER_CPU(struct task_struct *, current_task) ____cacheline_aligned = +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, current_task) ____cacheline_aligned = &init_task; EXPORT_PER_CPU_SYMBOL(current_task); -DEFINE_PER_CPU(void *, hardirq_stack_ptr); -DEFINE_PER_CPU(bool, hardirq_stack_inuse); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(void *, hardirq_stack_ptr); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(bool, hardirq_stack_inuse); -DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, __preempt_count) = INIT_PREEMPT_COUNT; EXPORT_PER_CPU_SYMBOL(__preempt_count); -DEFINE_PER_CPU(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, cpu_current_top_of_stack) = TOP_OF_INIT_STACK; /* May not be marked __init: used by software suspend */ void syscall_init(void) @@ -1826,7 +1826,7 @@ void syscall_init(void) #else /* CONFIG_X86_64 */ -DEFINE_PER_CPU(struct task_struct *, current_task) = &init_task; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, current_task) = &init_task; EXPORT_PER_CPU_SYMBOL(current_task); DEFINE_PER_CPU(int, __preempt_count) = INIT_PREEMPT_COUNT; EXPORT_PER_CPU_SYMBOL(__preempt_count); diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c index d7859573973d..b59317c5721f 100644 --- a/arch/x86/kernel/fpu/core.c +++ b/arch/x86/kernel/fpu/core.c @@ -57,7 +57,7 @@ static DEFINE_PER_CPU(bool, in_kernel_fpu); /* * Track which context is using the FPU on the CPU: */ -DEFINE_PER_CPU(struct fpu *, fpu_fpregs_owner_ctx); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct fpu *, fpu_fpregs_owner_ctx); struct kmem_cache *fpstate_cachep; diff --git a/arch/x86/kernel/hw_breakpoint.c b/arch/x86/kernel/hw_breakpoint.c index 668a4a6533d9..c2ceea8f6801 100644 --- a/arch/x86/kernel/hw_breakpoint.c +++ b/arch/x86/kernel/hw_breakpoint.c @@ -36,7 +36,7 @@ #include /* Per cpu debug control register value */ -DEFINE_PER_CPU(unsigned long, cpu_dr7); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, cpu_dr7); EXPORT_PER_CPU_SYMBOL(cpu_dr7); /* Per cpu debug address registers values */ diff --git a/arch/x86/kernel/irq.c b/arch/x86/kernel/irq.c index 766ffe3ba313..5c5aa75050a5 100644 --- a/arch/x86/kernel/irq.c +++ b/arch/x86/kernel/irq.c @@ -26,7 +26,7 @@ #define CREATE_TRACE_POINTS #include -DEFINE_PER_CPU_SHARED_ALIGNED(irq_cpustat_t, irq_stat); +DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(irq_cpustat_t, irq_stat); EXPORT_PER_CPU_SYMBOL(irq_stat); atomic_t irq_err_count; diff --git a/arch/x86/kernel/irqinit.c b/arch/x86/kernel/irqinit.c index beb1bada1b0a..d7893e040695 100644 --- a/arch/x86/kernel/irqinit.c +++ b/arch/x86/kernel/irqinit.c @@ -46,7 +46,7 @@ * (these are usually mapped into the 0x30-0xff vector range) */ -DEFINE_PER_CPU(vector_irq_t, vector_irq) = { +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(vector_irq_t, vector_irq) = { [0 ... 
NR_VECTORS - 1] = VECTOR_UNUSED, }; diff --git a/arch/x86/kernel/nmi.c b/arch/x86/kernel/nmi.c index 4bce802d25fb..ef95071228ca 100644 --- a/arch/x86/kernel/nmi.c +++ b/arch/x86/kernel/nmi.c @@ -469,9 +469,9 @@ enum nmi_states { NMI_EXECUTING, NMI_LATCHED, }; -static DEFINE_PER_CPU(enum nmi_states, nmi_state); -static DEFINE_PER_CPU(unsigned long, nmi_cr2); -static DEFINE_PER_CPU(unsigned long, nmi_dr7); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(enum nmi_states, nmi_state); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, nmi_cr2); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, nmi_dr7); DEFINE_IDTENTRY_RAW(exc_nmi) { diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index f9bd1c3415d4..e4a32490dda0 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -56,7 +56,7 @@ * section. Since TSS's are completely CPU-local, we want them * on exact cacheline boundaries, to eliminate cacheline ping-pong. */ -__visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = { +__visible DEFINE_PER_CPU_PAGE_ALIGNED_ASI_NOT_SENSITIVE(struct tss_struct, cpu_tss_rw) = { .x86_tss = { /* * .sp0 is only used when entering ring 0 from a lower @@ -77,7 +77,7 @@ __visible DEFINE_PER_CPU_PAGE_ALIGNED(struct tss_struct, cpu_tss_rw) = { }; EXPORT_PER_CPU_SYMBOL(cpu_tss_rw); -DEFINE_PER_CPU(bool, __tss_limit_invalid); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(bool, __tss_limit_invalid); EXPORT_PER_CPU_SYMBOL_GPL(__tss_limit_invalid); void __init arch_task_cache_init(void) diff --git a/arch/x86/kernel/setup_percpu.c b/arch/x86/kernel/setup_percpu.c index 7b65275544b2..13c94a512b7e 100644 --- a/arch/x86/kernel/setup_percpu.c +++ b/arch/x86/kernel/setup_percpu.c @@ -23,7 +23,7 @@ #include #include -DEFINE_PER_CPU_READ_MOSTLY(int, cpu_number); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, cpu_number); EXPORT_PER_CPU_SYMBOL(cpu_number); #ifdef CONFIG_X86_64 @@ -32,7 +32,7 @@ EXPORT_PER_CPU_SYMBOL(cpu_number); #define BOOT_PERCPU_OFFSET 0 #endif -DEFINE_PER_CPU_READ_MOSTLY(unsigned long, this_cpu_off) = BOOT_PERCPU_OFFSET; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, this_cpu_off) = BOOT_PERCPU_OFFSET; EXPORT_PER_CPU_SYMBOL(this_cpu_off); unsigned long __per_cpu_offset[NR_CPUS] __ro_after_init = { diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 617012f4619f..0cfc4fdc2476 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -2224,7 +2224,8 @@ static void disable_freq_invariance_workfn(struct work_struct *work) static DECLARE_WORK(disable_freq_invariance_work, disable_freq_invariance_workfn); -DEFINE_PER_CPU(unsigned long, arch_freq_scale) = SCHED_CAPACITY_SCALE; +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, arch_freq_scale) = + SCHED_CAPACITY_SCALE; void arch_scale_freq_tick(void) { diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index d7169da99b01..39c441409dec 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -59,7 +59,7 @@ struct cyc2ns { }; /* fits one cacheline */ -static DEFINE_PER_CPU_ALIGNED(struct cyc2ns, cyc2ns); +static DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(struct cyc2ns, cyc2ns); static int __init tsc_early_khz_setup(char *buf) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0df88eadab60..451872d178e5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -8523,7 +8523,7 @@ static void kvm_timer_init(void) kvmclock_cpu_online, kvmclock_cpu_down_prep); } -DEFINE_PER_CPU(struct kvm_vcpu *, current_vcpu); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kvm_vcpu *, current_vcpu); 
EXPORT_PER_CPU_SYMBOL_GPL(current_vcpu); int kvm_is_in_guest(void) diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 4abcd8d9836d..3d5da4daaf53 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -392,7 +392,7 @@ static inline bool kvm_cstate_in_guest(struct kvm *kvm) return kvm->arch.cstate_in_guest; } -DECLARE_PER_CPU(struct kvm_vcpu *, current_vcpu); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct kvm_vcpu *, current_vcpu); static inline void kvm_before_interrupt(struct kvm_vcpu *vcpu) { diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index fdc117929fc7..04628949e89d 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -20,7 +20,7 @@ static struct asi_class asi_class[ASI_MAX_NUM] __asi_not_sensitive; static DEFINE_SPINLOCK(asi_class_lock __asi_not_sensitive); -DEFINE_PER_CPU_ALIGNED(struct asi_state, asi_cpu_state); +DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(struct asi_state, asi_cpu_state); EXPORT_PER_CPU_SYMBOL_GPL(asi_cpu_state); __aligned(PAGE_SIZE) pgd_t asi_global_nonsensitive_pgd[PTRS_PER_PGD]; diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index dfff17363365..012631d03c4f 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -1025,7 +1025,7 @@ void __init zone_sizes_init(void) free_area_init(max_zone_pfns); } -__visible DEFINE_PER_CPU_ALIGNED(struct tlb_state, cpu_tlbstate) = { +__visible DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct tlb_state, cpu_tlbstate) = { .loaded_mm = &init_mm, .next_asid = 1, .cr4 = ~0UL, /* fail hard if we screw up cr4 shadow initialization */ diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index fcd2c8e92f83..36d41356ed04 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -972,7 +972,7 @@ static bool tlb_is_not_lazy(int cpu) static DEFINE_PER_CPU(cpumask_t, flush_tlb_mask); -DEFINE_PER_CPU_SHARED_ALIGNED(struct tlb_state_shared, cpu_tlbstate_shared); +DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct tlb_state_shared, cpu_tlbstate_shared); EXPORT_PER_CPU_SYMBOL(cpu_tlbstate_shared); STATIC_NOPV void native_flush_tlb_multi(const struct cpumask *cpumask, diff --git a/include/asm-generic/irq_regs.h b/include/asm-generic/irq_regs.h index 2e7c6e89d42e..3225bdb2aefa 100644 --- a/include/asm-generic/irq_regs.h +++ b/include/asm-generic/irq_regs.h @@ -14,7 +14,7 @@ * Per-cpu current frame pointer - the location of the last exception frame on * the stack */ -DECLARE_PER_CPU(struct pt_regs *, __irq_regs); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct pt_regs *, __irq_regs); static inline struct pt_regs *get_irq_regs(void) { diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h index b97cea83b25e..35fdf256777a 100644 --- a/include/linux/arch_topology.h +++ b/include/linux/arch_topology.h @@ -23,7 +23,7 @@ static inline unsigned long topology_get_cpu_scale(int cpu) void topology_set_cpu_scale(unsigned int cpu, unsigned long capacity); -DECLARE_PER_CPU(unsigned long, arch_freq_scale); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, arch_freq_scale); static inline unsigned long topology_get_freq_scale(int cpu) { diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h index 0ee140176f10..68b2f10aaa46 100644 --- a/include/linux/hrtimer.h +++ b/include/linux/hrtimer.h @@ -355,7 +355,7 @@ static inline void timerfd_clock_was_set(void) { } static inline void timerfd_resume(void) { } #endif -DECLARE_PER_CPU(struct tick_device, tick_cpu_device); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct tick_device, tick_cpu_device); #ifdef CONFIG_PREEMPT_RT void hrtimer_cancel_wait_running(const struct hrtimer 
*timer); diff --git a/include/linux/interrupt.h b/include/linux/interrupt.h index 1f22a30c0963..6ae485d2ebb3 100644 --- a/include/linux/interrupt.h +++ b/include/linux/interrupt.h @@ -554,7 +554,7 @@ extern void __raise_softirq_irqoff(unsigned int nr); extern void raise_softirq_irqoff(unsigned int nr); extern void raise_softirq(unsigned int nr); -DECLARE_PER_CPU(struct task_struct *, ksoftirqd); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, ksoftirqd); static inline struct task_struct *this_cpu_ksoftirqd(void) { diff --git a/include/linux/kernel_stat.h b/include/linux/kernel_stat.h index 69ae6b278464..89609dc5d30f 100644 --- a/include/linux/kernel_stat.h +++ b/include/linux/kernel_stat.h @@ -40,8 +40,8 @@ struct kernel_stat { unsigned int softirqs[NR_SOFTIRQS]; }; -DECLARE_PER_CPU(struct kernel_stat, kstat); -DECLARE_PER_CPU(struct kernel_cpustat, kernel_cpustat); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct kernel_stat, kstat); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct kernel_cpustat, kernel_cpustat); /* Must have preemption disabled for this to be meaningful. */ #define kstat_this_cpu this_cpu_ptr(&kstat) diff --git a/include/linux/prandom.h b/include/linux/prandom.h index 056d31317e49..f02392ca6dc2 100644 --- a/include/linux/prandom.h +++ b/include/linux/prandom.h @@ -16,7 +16,7 @@ void prandom_bytes(void *buf, size_t nbytes); void prandom_seed(u32 seed); void prandom_reseed_late(void); -DECLARE_PER_CPU(unsigned long, net_rand_noise); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, net_rand_noise); #define PRANDOM_ADD_NOISE(a, b, c, d) \ prandom_u32_add_noise((unsigned long)(a), (unsigned long)(b), \ diff --git a/kernel/events/core.c b/kernel/events/core.c index 6ea559b6e0f4..1914cc538cab 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -1207,7 +1207,7 @@ void perf_pmu_enable(struct pmu *pmu) pmu->pmu_enable(pmu); } -static DEFINE_PER_CPU(struct list_head, active_ctx_list); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct list_head, active_ctx_list); /* * perf_event_ctx_activate(), perf_event_ctx_deactivate(), and @@ -4007,8 +4007,8 @@ do { \ return div64_u64(dividend, divisor); } -static DEFINE_PER_CPU(int, perf_throttled_count); -static DEFINE_PER_CPU(u64, perf_throttled_seq); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, perf_throttled_count); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(u64, perf_throttled_seq); static void perf_adjust_period(struct perf_event *event, u64 nsec, u64 count, bool disable) { diff --git a/kernel/irq_work.c b/kernel/irq_work.c index f7df715ec28e..10df3577c733 100644 --- a/kernel/irq_work.c +++ b/kernel/irq_work.c @@ -22,9 +22,9 @@ #include #include -static DEFINE_PER_CPU(struct llist_head, raised_list); -static DEFINE_PER_CPU(struct llist_head, lazy_list); -static DEFINE_PER_CPU(struct task_struct *, irq_workd); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct llist_head, raised_list); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct llist_head, lazy_list); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, irq_workd); static void wake_irq_workd(void) { diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 284d2722cf0c..aee2b6994bc2 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -74,7 +74,7 @@ /* Data structures. 
*/ -static DEFINE_PER_CPU_SHARED_ALIGNED(struct rcu_data, rcu_data) = { +static DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct rcu_data, rcu_data) = { .dynticks_nesting = 1, .dynticks_nmi_nesting = DYNTICK_IRQ_NONIDLE, .dynticks = ATOMIC_INIT(1), diff --git a/kernel/sched/core.c b/kernel/sched/core.c index e1c08ff4130e..7c96f0001c7f 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -43,7 +43,7 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_cfs_tp); EXPORT_TRACEPOINT_SYMBOL_GPL(sched_util_est_se_tp); EXPORT_TRACEPOINT_SYMBOL_GPL(sched_update_nr_running_tp); -DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); +DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct rq, runqueues); #ifdef CONFIG_SCHED_DEBUG /* @@ -5104,8 +5104,8 @@ void sched_exec(void) #endif -DEFINE_PER_CPU(struct kernel_stat, kstat); -DEFINE_PER_CPU(struct kernel_cpustat, kernel_cpustat); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kernel_stat, kstat); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kernel_cpustat, kernel_cpustat); EXPORT_PER_CPU_SYMBOL(kstat); EXPORT_PER_CPU_SYMBOL(kernel_cpustat); diff --git a/kernel/sched/cpufreq.c b/kernel/sched/cpufreq.c index 7c2fe50fd76d..c55a47f8e963 100644 --- a/kernel/sched/cpufreq.c +++ b/kernel/sched/cpufreq.c @@ -9,7 +9,8 @@ #include "sched.h" -DEFINE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct update_util_data __rcu *, + cpufreq_update_util_data); /** * cpufreq_add_update_util_hook - Populate the CPU's update_util_data pointer. diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c index 623b5feb142a..d3ad13308889 100644 --- a/kernel/sched/cputime.c +++ b/kernel/sched/cputime.c @@ -17,7 +17,7 @@ * task when irq is in progress while we read rq->clock. That is a worthy * compromise in place of having locks on each irq in account_system_time. 
*/ -DEFINE_PER_CPU(struct irqtime, cpu_irqtime); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct irqtime, cpu_irqtime); static int __asi_not_sensitive sched_clock_irqtime; diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 517c70a29a57..4188c1a570db 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -1360,7 +1360,7 @@ static inline void update_idle_core(struct rq *rq) static inline void update_idle_core(struct rq *rq) { } #endif -DECLARE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues); +DECLARE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct rq, runqueues); #define cpu_rq(cpu) (&per_cpu(runqueues, (cpu))) #define this_rq() this_cpu_ptr(&runqueues) @@ -1760,13 +1760,13 @@ static inline struct sched_domain *lowest_flag_domain(int cpu, int flag) return sd; } -DECLARE_PER_CPU(struct sched_domain __rcu *, sd_llc); -DECLARE_PER_CPU(int, sd_llc_size); -DECLARE_PER_CPU(int, sd_llc_id); -DECLARE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared); -DECLARE_PER_CPU(struct sched_domain __rcu *, sd_numa); -DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); -DECLARE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_llc); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(int, sd_llc_size); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(int, sd_llc_id); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain_shared __rcu *, sd_llc_shared); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_numa); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_asym_packing); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_asym_cpucapacity); extern struct static_key_false sched_asym_cpucapacity; struct sched_group_capacity { @@ -2753,7 +2753,7 @@ struct irqtime { struct u64_stats_sync sync; }; -DECLARE_PER_CPU(struct irqtime, cpu_irqtime); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct irqtime, cpu_irqtime); /* * Returns the irqtime minus the softirq time computed by ksoftirqd. @@ -2776,7 +2776,8 @@ static inline u64 irq_time_read(int cpu) #endif /* CONFIG_IRQ_TIME_ACCOUNTING */ #ifdef CONFIG_CPU_FREQ -DECLARE_PER_CPU(struct update_util_data __rcu *, cpufreq_update_util_data); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct update_util_data __rcu *, + cpufreq_update_util_data); /** * cpufreq_update_util - Take a note about CPU utilization changes. diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index d201a7052a29..1dcea6a6133e 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -641,13 +641,13 @@ static void destroy_sched_domains(struct sched_domain *sd) * the cpumask of the domain), this allows us to quickly tell if * two CPUs are in the same cache domain, see cpus_share_cache(). 
*/ -DEFINE_PER_CPU(struct sched_domain __rcu *, sd_llc); -DEFINE_PER_CPU(int, sd_llc_size); -DEFINE_PER_CPU(int, sd_llc_id); -DEFINE_PER_CPU(struct sched_domain_shared __rcu *, sd_llc_shared); -DEFINE_PER_CPU(struct sched_domain __rcu *, sd_numa); -DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_packing); -DEFINE_PER_CPU(struct sched_domain __rcu *, sd_asym_cpucapacity); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_llc); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, sd_llc_size); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, sd_llc_id); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain_shared __rcu *, sd_llc_shared); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_numa); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_asym_packing); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct sched_domain __rcu *, sd_asym_cpucapacity); DEFINE_STATIC_KEY_FALSE(sched_asym_cpucapacity); static void update_top_cache_domain(int cpu) diff --git a/kernel/smp.c b/kernel/smp.c index c51fd981a4a9..3c1b328f0a09 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -92,9 +92,10 @@ struct call_function_data { cpumask_var_t cpumask_ipi; }; -static DEFINE_PER_CPU_ALIGNED(struct call_function_data, cfd_data); +static DEFINE_PER_CPU_ALIGNED_ASI_NOT_SENSITIVE(struct call_function_data, cfd_data); -static DEFINE_PER_CPU_SHARED_ALIGNED(struct llist_head, call_single_queue); +static DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(struct llist_head, + call_single_queue); static void flush_smp_call_function_queue(bool warn_cpu_offline); @@ -464,7 +465,7 @@ static __always_inline void csd_unlock(struct __call_single_data *csd) smp_store_release(&csd->node.u_flags, 0); } -static DEFINE_PER_CPU_SHARED_ALIGNED(call_single_data_t, csd_data); +static DEFINE_PER_CPU_SHARED_ALIGNED_ASI_NOT_SENSITIVE(call_single_data_t, csd_data); void __smp_call_single_queue(int cpu, struct llist_node *node) { diff --git a/kernel/softirq.c b/kernel/softirq.c index c462b7fab4d3..d2660a59feab 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -59,7 +59,7 @@ EXPORT_PER_CPU_SYMBOL(irq_stat); static struct softirq_action softirq_vec[NR_SOFTIRQS] __asi_not_sensitive ____cacheline_aligned; -DEFINE_PER_CPU(struct task_struct *, ksoftirqd); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct task_struct *, ksoftirqd); const char * const softirq_to_name[NR_SOFTIRQS] = { "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK", "IRQ_POLL", diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c index 8b176f5c01f2..74cfc89a17c4 100644 --- a/kernel/time/hrtimer.c +++ b/kernel/time/hrtimer.c @@ -65,7 +65,7 @@ * to reach a base using a clockid, hrtimer_clockid_to_base() * is used to convert from clockid to the proper hrtimer_base_type. */ -DEFINE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases) = +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct hrtimer_cpu_base, hrtimer_bases) = { .lock = __RAW_SPIN_LOCK_UNLOCKED(hrtimer_bases.lock), .clock_base = diff --git a/kernel/time/tick-common.c b/kernel/time/tick-common.c index cbe75661ca74..67180cb44394 100644 --- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -25,7 +25,7 @@ /* * Tick devices */ -DEFINE_PER_CPU(struct tick_device, tick_cpu_device); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct tick_device, tick_cpu_device); /* * Tick next event: keeps track of the tick time. It's updated by the * CPU which handles the tick and protected by jiffies_lock. 
There is diff --git a/kernel/time/tick-internal.h b/kernel/time/tick-internal.h index ed7e2a18060a..6961318d41b7 100644 --- a/kernel/time/tick-internal.h +++ b/kernel/time/tick-internal.h @@ -13,7 +13,7 @@ # define TICK_DO_TIMER_NONE -1 # define TICK_DO_TIMER_BOOT -2 -DECLARE_PER_CPU(struct tick_device, tick_cpu_device); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct tick_device, tick_cpu_device); extern ktime_t tick_next_period; extern int tick_do_timer_cpu; @@ -161,7 +161,7 @@ static inline void timers_update_nohz(void) { } #define tick_nohz_active (0) #endif -DECLARE_PER_CPU(struct hrtimer_cpu_base, hrtimer_bases); +DECLARE_PER_CPU_ASI_NOT_SENSITIVE(struct hrtimer_cpu_base, hrtimer_bases); extern u64 get_next_timer_interrupt(unsigned long basej, u64 basem); void timer_clear_idle(void); diff --git a/kernel/time/tick-sched.c b/kernel/time/tick-sched.c index c23fecbb68c2..afd393b85577 100644 --- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -36,7 +36,7 @@ /* * Per-CPU nohz control structure */ -static DEFINE_PER_CPU(struct tick_sched, tick_cpu_sched); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct tick_sched, tick_cpu_sched); struct tick_sched *tick_get_tick_sched(int cpu) { diff --git a/kernel/time/timer.c b/kernel/time/timer.c index 0b09c99b568c..9567df187420 100644 --- a/kernel/time/timer.c +++ b/kernel/time/timer.c @@ -212,7 +212,7 @@ struct timer_base { struct hlist_head vectors[WHEEL_SIZE]; } ____cacheline_aligned; -static DEFINE_PER_CPU(struct timer_base, timer_bases[NR_BASES]); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct timer_base, timer_bases[NR_BASES]); #ifdef CONFIG_NO_HZ_COMMON diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index eaec3814c5a4..b82f478caf4e 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -106,7 +106,7 @@ dummy_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set) * tracing is active, only save the comm when a trace event * occurred. */ -static DEFINE_PER_CPU(bool, trace_taskinfo_save); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(bool, trace_taskinfo_save); /* * Kill all tracing for good (never come back). diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c index f4938040c228..177de3501677 100644 --- a/kernel/trace/trace_preemptirq.c +++ b/kernel/trace/trace_preemptirq.c @@ -17,7 +17,7 @@ #ifdef CONFIG_TRACE_IRQFLAGS /* Per-cpu variable to prevent redundant calls when IRQs already off */ -static DEFINE_PER_CPU(int, tracing_irq_cpu); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(int, tracing_irq_cpu); /* * Like trace_hardirqs_on() but without the lockdep invocation. This is diff --git a/kernel/watchdog.c b/kernel/watchdog.c index ad912511a0c0..c2bf55024202 100644 --- a/kernel/watchdog.c +++ b/kernel/watchdog.c @@ -174,13 +174,13 @@ static bool softlockup_initialized __read_mostly; static u64 __read_mostly sample_period; /* Timestamp taken after the last successful reschedule. */ -static DEFINE_PER_CPU(unsigned long, watchdog_touch_ts); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, watchdog_touch_ts); /* Timestamp of the last softlockup report. 
*/ -static DEFINE_PER_CPU(unsigned long, watchdog_report_ts); -static DEFINE_PER_CPU(struct hrtimer, watchdog_hrtimer); -static DEFINE_PER_CPU(bool, softlockup_touch_sync); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts); -static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, watchdog_report_ts); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct hrtimer, watchdog_hrtimer); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(bool, softlockup_touch_sync); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, hrtimer_interrupts); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, hrtimer_interrupts_saved); static unsigned long soft_lockup_nmi_warn; static int __init nowatchdog_setup(char *str) diff --git a/lib/irq_regs.c b/lib/irq_regs.c index 0d545a93070e..8b3c6be06a7a 100644 --- a/lib/irq_regs.c +++ b/lib/irq_regs.c @@ -9,6 +9,6 @@ #include #ifndef ARCH_HAS_OWN_IRQ_REGS -DEFINE_PER_CPU(struct pt_regs *, __irq_regs); +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct pt_regs *, __irq_regs); EXPORT_PER_CPU_SYMBOL(__irq_regs); #endif diff --git a/lib/random32.c b/lib/random32.c index a57a0e18819d..e4c1cb1a70b4 100644 --- a/lib/random32.c +++ b/lib/random32.c @@ -339,7 +339,8 @@ struct siprand_state { }; static DEFINE_PER_CPU(struct siprand_state, net_rand_state) __latent_entropy; -DEFINE_PER_CPU(unsigned long, net_rand_noise); +/* TODO(oweisse): Is this entropy sensitive?? */ +DEFINE_PER_CPU_ASI_NOT_SENSITIVE(unsigned long, net_rand_noise); EXPORT_PER_CPU_SYMBOL(net_rand_noise); /* diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 0af973b950c2..8d2d76de5bd0 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -110,7 +110,7 @@ static atomic_t hardware_enable_failed; static struct kmem_cache *kvm_vcpu_cache; static __read_mostly struct preempt_ops kvm_preempt_ops; -static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_running_vcpu); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kvm_vcpu *, kvm_running_vcpu); struct dentry *kvm_debugfs_dir; EXPORT_SYMBOL_GPL(kvm_debugfs_dir); From patchwork Wed Feb 23 05:22:19 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756403 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id F104EC433F5 for ; Wed, 23 Feb 2022 05:25:28 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 5D5B98D0017; Wed, 23 Feb 2022 00:25:23 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 55DD58D0001; Wed, 23 Feb 2022 00:25:23 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 389A38D0017; Wed, 23 Feb 2022 00:25:23 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0107.hostedemail.com [216.40.44.107]) by kanga.kvack.org (Postfix) with ESMTP id 23C708D0001 for ; Wed, 23 Feb 2022 00:25:23 -0500 (EST) Received: from smtpin30.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id CD0339CD53 for ; Wed, 23 Feb 2022 05:25:22 +0000 (UTC) X-FDA: 79172906484.30.BE2AB84 Received: from mail-yw1-f202.google.com (mail-yw1-f202.google.com [209.85.128.202]) by imf14.hostedemail.com (Postfix) with ESMTP id 4998B100002 for ; Wed, 23 Feb 2022 05:25:22 +0000 (UTC) Received: 
by mail-yw1-f202.google.com with SMTP id 00721157ae682-2d726bd83a2so91093717b3.20 for ; Tue, 22 Feb 2022 21:25:22 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=1Ak925Pk776XAKxLwtqmq+2ujMHKkPdWGXyWe6mB81I=; b=qruJowqTdMVz7fZkxeG4xVb20iJ53LrE4dT1CKYoVmqpJaSFmZqssZFPvLDlocajG+ 0ZNfohsXrp0SvmtgoKo+3GpbCl6Ek66LSSQDb5QhwDTmpTY3IZR68hTWLTakDK649FWd NcBzCkwDsR687gBw9hUAZO42EBgmkkjWUtK9c7iThbB5OJvWqulI+ZjA4At9vc3jDdQv V3RzFrXEtxS3BFQakXKK5JYagS7g37QYbi+N7FduEDd6mJWAl6cR1xbe48z+M34xGgJb +aIJO9WdvPWGVowL+cjeTIFBzTQnJ9oyaKZdWhvwquAx5l+fbfWtWxe31CFQ765daVDd WYtQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=1Ak925Pk776XAKxLwtqmq+2ujMHKkPdWGXyWe6mB81I=; b=qa3MquwCS0BwqUxJsus+roGU4SMmYM2VEpG90ULijJyo0qLiBRmIlghGAQ/CvHQq5G QuSoebAXhvjtUGI1QUiJKgsjDQDNz0ilYBSq2dsY52/MgYFtXu4i5zGFkIX+wcLfp7TX J443fZdZQgwqJ2v2pOQl2zqVGXi4L1wVtonbOEHnvqS8ARTrQ1JLpP4rZWXtnEXZavJ9 koBVz1H0DZEtSL7uw8GH6GXYJjnyzSmE2mrDfqvqG7dves/9Aqw7rQT4LG/75X8ImOG/ LDNB4jwYLRYS8RXQvdgkQKgq6y8qvU2rNgGc40nq/JJaEYeAqGM/5b84bYpb7chT3tnb j5RQ== X-Gm-Message-State: AOAM532cgy6UipHe/XbVOMOcWkxKLL6Jr3SvYBmLjW60rRSe8Uz6XZQq cwHmoYeGmpKLRArUCO0rGgZ02bHrP0z8 X-Google-Smtp-Source: ABdhPJz+8yu7dLdPv2zaC0fb4RR9so3iE3HoEaEWayOmFqlvYUtwhO3rWaAyrQMIf9/n3UzRyvpRD4FVMxcA X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a25:bad2:0:b0:620:fe28:ff53 with SMTP id a18-20020a25bad2000000b00620fe28ff53mr26733639ybk.340.1645593921610; Tue, 22 Feb 2022 21:25:21 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:19 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-44-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 43/47] mm: asi: Annotation of dynamic variables to be nonsensitive From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspamd-Queue-Id: 4998B100002 X-Stat-Signature: jdyb3n4p1dxhqbjimq9yihq965r4h8df X-Rspam-User: Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=qruJowqT; spf=pass (imf14.hostedemail.com: domain of 3QcUVYgcKCDofqjWeZockkcha.Ykihejqt-iigrWYg.knc@flex--junaids.bounces.google.com designates 209.85.128.202 as permitted sender) smtp.mailfrom=3QcUVYgcKCDofqjWeZockkcha.Ykihejqt-iigrWYg.knc@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam05 X-HE-Tag: 1645593922-529596 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse The heart of ASI is to diffrentiate between sensitive and non-sensitive data access. This commit marks certain dynamic allocations as not sensitive. Some dynamic variables are accessed frequently and therefore would cause many ASI exits. The frequency of these accesses is monitored by tracing asi_exits and analyzing the accessed addresses. 
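As an illustration of the resulting annotation pattern, the sketch below shows the kind of change applied throughout the diffs in this patch. It is not a hunk from the patch itself: the struct and helper names are invented for the example, and only the __GFP_GLOBAL_NONSENSITIVE / __GFP_LOCAL_NONSENSITIVE gfp bits and the SLAB_GLOBAL_NONSENSITIVE slab flag are taken from this series.

#include <linux/gfp.h>
#include <linux/slab.h>

/* Illustrative only, not a real kernel structure. */
struct example_ctx {
	unsigned long hits;
};

/* Frequently touched data that is not secret for any process: allocate it
 * with __GFP_GLOBAL_NONSENSITIVE so it can stay mapped in the restricted
 * (ASI) address space and accessing it does not force an ASI exit. */
static struct example_ctx *example_alloc_global(void)
{
	return kzalloc(sizeof(struct example_ctx),
		       GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE);
}

/* Data that is non-sensitive only for the process that owns it: the local
 * variant is intended to map it only into that mm's ASI page tables. */
static struct example_ctx *example_alloc_local(void)
{
	return kzalloc(sizeof(struct example_ctx),
		       GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE);
}

/* Whole slab caches can be marked the same way, so every object allocated
 * from the cache is treated as globally non-sensitive. */
static struct kmem_cache *example_cachep;

static int example_cache_init(void)
{
	example_cachep = kmem_cache_create("example_ctx",
					   sizeof(struct example_ctx), 0,
					   SLAB_ACCOUNT | SLAB_GLOBAL_NONSENSITIVE,
					   NULL);
	return example_cachep ? 0 : -ENOMEM;
}
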
Many of these variables don't contain sensitive information and can therefore be mapped into the global ASI region. This commit adds GFP_LOCAL/GLOBAL_NONSENSITIVE attributes to these frequenmtly-accessed yet not sensitive variables. The end result is a very significant reduction in ASI exits on real benchmarks. Signed-off-by: Ofir Weisse --- arch/x86/include/asm/kvm_host.h | 3 ++- arch/x86/kernel/apic/x2apic_cluster.c | 2 +- arch/x86/kvm/cpuid.c | 4 ++- arch/x86/kvm/lapic.c | 9 ++++--- arch/x86/kvm/mmu/mmu.c | 7 ++++++ arch/x86/kvm/vmx/vmx.c | 6 +++-- arch/x86/kvm/x86.c | 8 +++--- fs/binfmt_elf.c | 2 +- fs/eventfd.c | 2 +- fs/eventpoll.c | 10 +++++--- fs/exec.c | 2 ++ fs/file.c | 3 ++- fs/timerfd.c | 2 +- include/linux/kvm_host.h | 2 +- include/linux/kvm_types.h | 3 +++ kernel/cgroup/cgroup.c | 4 +-- kernel/events/core.c | 15 +++++++---- kernel/exit.c | 2 ++ kernel/fork.c | 36 +++++++++++++++++++++------ kernel/rcu/srcutree.c | 3 ++- kernel/sched/core.c | 6 +++-- kernel/sched/cpuacct.c | 8 +++--- kernel/sched/fair.c | 3 ++- kernel/sched/topology.c | 14 +++++++---- kernel/smp.c | 17 +++++++------ kernel/trace/ring_buffer.c | 5 ++-- kernel/tracepoint.c | 2 +- lib/radix-tree.c | 6 ++--- mm/memcontrol.c | 7 +++--- mm/util.c | 3 ++- mm/vmalloc.c | 3 ++- net/core/skbuff.c | 2 +- net/core/sock.c | 2 +- virt/kvm/coalesced_mmio.c | 2 +- virt/kvm/eventfd.c | 5 ++-- virt/kvm/kvm_main.c | 12 ++++++--- 36 files changed, 148 insertions(+), 74 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index b7292c4fece7..34a05add5e77 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -1562,7 +1562,8 @@ static inline void kvm_ops_static_call_update(void) #define __KVM_HAVE_ARCH_VM_ALLOC static inline struct kvm *kvm_arch_alloc_vm(void) { - return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO); + return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO | + __GFP_GLOBAL_NONSENSITIVE); } #define __KVM_HAVE_ARCH_VM_FREE diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c index 655fe820a240..a1f6eb51ecb7 100644 --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -144,7 +144,7 @@ static int alloc_clustermask(unsigned int cpu, int node) } cluster_hotplug_mask = kzalloc_node(sizeof(*cluster_hotplug_mask), - GFP_KERNEL, node); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, node); if (!cluster_hotplug_mask) return -ENOMEM; cluster_hotplug_mask->node = node; diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index 07e9215e911d..dedabfdd292e 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -310,7 +310,9 @@ int kvm_vcpu_ioctl_set_cpuid(struct kvm_vcpu *vcpu, if (IS_ERR(e)) return PTR_ERR(e); - e2 = kvmalloc_array(cpuid->nent, sizeof(*e2), GFP_KERNEL_ACCOUNT); + e2 = kvmalloc_array(cpuid->nent, sizeof(*e2), + GFP_KERNEL_ACCOUNT | + __GFP_LOCAL_NONSENSITIVE); if (!e2) { r = -ENOMEM; goto out_free_cpuid; diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 213bbdfab49e..3a550299f015 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -213,7 +213,7 @@ void kvm_recalculate_apic_map(struct kvm *kvm) new = kvzalloc(sizeof(struct kvm_apic_map) + sizeof(struct kvm_lapic *) * ((u64)max_id + 1), - GFP_KERNEL_ACCOUNT); + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (!new) goto out; @@ -993,7 +993,7 @@ bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, *r = -1; if (irq->shorthand == 
APIC_DEST_SELF) { - *r = kvm_apic_set_irq(src->vcpu, irq, dest_map); + *r = kvm_apic_set_irq(src->vcpu, irq, dest_map); return true; } @@ -2455,13 +2455,14 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns) ASSERT(vcpu != NULL); - apic = kzalloc(sizeof(*apic), GFP_KERNEL_ACCOUNT); + apic = kzalloc(sizeof(*apic), GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (!apic) goto nomem; vcpu->arch.apic = apic; - apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT); + apic->regs = (void *)get_zeroed_page(GFP_KERNEL_ACCOUNT + | __GFP_LOCAL_NONSENSITIVE); if (!apic->regs) { printk(KERN_ERR "malloc apic regs error for vcpu %x\n", vcpu->vcpu_id); diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index 5785a0d02558..a2ada1104c2d 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -5630,6 +5630,13 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_page_header_cache.gfp_zero = __GFP_ZERO; vcpu->arch.mmu_shadow_page_cache.gfp_zero = __GFP_ZERO; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + if (static_cpu_has(X86_FEATURE_ASI) && mm_asi_enabled(current->mm)) + vcpu->arch.mmu_shadow_page_cache.gfp_asi = + __GFP_LOCAL_NONSENSITIVE; + else + vcpu->arch.mmu_shadow_page_cache.gfp_asi = 0; +#endif vcpu->arch.mmu = &vcpu->arch.root_mmu; vcpu->arch.walk_mmu = &vcpu->arch.root_mmu; diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index e1ad82c25a78..6e1bb017b696 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2629,7 +2629,7 @@ void free_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) free_vmcs(loaded_vmcs->vmcs); loaded_vmcs->vmcs = NULL; if (loaded_vmcs->msr_bitmap) - free_page((unsigned long)loaded_vmcs->msr_bitmap); + kfree(loaded_vmcs->msr_bitmap); WARN_ON(loaded_vmcs->shadow_vmcs != NULL); } @@ -2648,7 +2648,9 @@ int alloc_loaded_vmcs(struct loaded_vmcs *loaded_vmcs) if (cpu_has_vmx_msr_bitmap()) { loaded_vmcs->msr_bitmap = (unsigned long *) - __get_free_page(GFP_KERNEL_ACCOUNT); + kzalloc(PAGE_SIZE, + GFP_KERNEL_ACCOUNT | + __GFP_LOCAL_NONSENSITIVE ); if (!loaded_vmcs->msr_bitmap) goto out_vmcs; memset(loaded_vmcs->msr_bitmap, 0xff, PAGE_SIZE); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 451872d178e5..dd862edc1b5a 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -329,7 +329,8 @@ static struct kmem_cache *kvm_alloc_emulator_cache(void) return kmem_cache_create_usercopy("x86_emulator", size, __alignof__(struct x86_emulate_ctxt), - SLAB_ACCOUNT, useroffset, + SLAB_ACCOUNT|SLAB_LOCAL_NONSENSITIVE, + useroffset, size - useroffset, NULL); } @@ -10969,7 +10970,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) r = -ENOMEM; - page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_LOCAL_NONSENSITIVE); if (!page) goto fail_free_lapic; vcpu->arch.pio_data = page_address(page); @@ -11718,7 +11719,8 @@ static int kvm_alloc_memslot_metadata(struct kvm *kvm, lpages = __kvm_mmu_slot_lpages(slot, npages, level); - linfo = kvcalloc(lpages, sizeof(*linfo), GFP_KERNEL_ACCOUNT); + linfo = kvcalloc(lpages, sizeof(*linfo), + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (!linfo) goto out_free; diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index f8c7f26f1fbb..b0550951da59 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -477,7 +477,7 @@ static struct elf_phdr *load_elf_phdrs(const struct elfhdr *elf_ex, if (size == 0 || size > 65536 || size > ELF_MIN_ALIGN) goto out; - elf_phdata = kmalloc(size, GFP_KERNEL); + elf_phdata = kmalloc(size, GFP_KERNEL | 
__GFP_GLOBAL_NONSENSITIVE); if (!elf_phdata) goto out; diff --git a/fs/eventfd.c b/fs/eventfd.c index 3627dd7d25db..c748433e52af 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -415,7 +415,7 @@ static int do_eventfd(unsigned int count, int flags) if (flags & ~EFD_FLAGS_SET) return -EINVAL; - ctx = kmalloc(sizeof(*ctx), GFP_KERNEL); + ctx = kmalloc(sizeof(*ctx), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ctx) return -ENOMEM; diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 06f4c5ae1451..b28826c9f079 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -1239,7 +1239,7 @@ static void ep_ptable_queue_proc(struct file *file, wait_queue_head_t *whead, if (unlikely(!epi)) // an earlier allocation has failed return; - pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL); + pwq = kmem_cache_alloc(pwq_cache, GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (unlikely(!pwq)) { epq->epi = NULL; return; @@ -1453,7 +1453,8 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event, return -ENOSPC; percpu_counter_inc(&ep->user->epoll_watches); - if (!(epi = kmem_cache_zalloc(epi_cache, GFP_KERNEL))) { + if (!(epi = kmem_cache_zalloc(epi_cache, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE))) { percpu_counter_dec(&ep->user->epoll_watches); return -ENOMEM; } @@ -2373,11 +2374,12 @@ static int __init eventpoll_init(void) /* Allocates slab cache used to allocate "struct epitem" items */ epi_cache = kmem_cache_create("eventpoll_epi", sizeof(struct epitem), - 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, NULL); + 0, SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT|SLAB_GLOBAL_NONSENSITIVE, NULL); /* Allocates slab cache used to allocate "struct eppoll_entry" */ pwq_cache = kmem_cache_create("eventpoll_pwq", - sizeof(struct eppoll_entry), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL); + sizeof(struct eppoll_entry), 0, + SLAB_PANIC|SLAB_ACCOUNT|SLAB_GLOBAL_NONSENSITIVE, NULL); ephead_cache = kmem_cache_create("ep_head", sizeof(struct epitems_head), 0, SLAB_PANIC|SLAB_ACCOUNT, NULL); diff --git a/fs/exec.c b/fs/exec.c index 537d92c41105..76f3b433e80d 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -1238,6 +1238,8 @@ int begin_new_exec(struct linux_binprm * bprm) struct task_struct *me = current; int retval; + /* TODO: (oweisse) unmap the stack from ASI */ + /* Once we are committed compute the creds */ retval = bprm_creds_from_file(bprm); if (retval) diff --git a/fs/file.c b/fs/file.c index 97d212a9b814..85bfa5d70323 100644 --- a/fs/file.c +++ b/fs/file.c @@ -117,7 +117,8 @@ static struct fdtable * alloc_fdtable(unsigned int nr) if (!fdt) goto out; fdt->max_fds = nr; - data = kvmalloc_array(nr, sizeof(struct file *), GFP_KERNEL_ACCOUNT); + data = kvmalloc_array(nr, sizeof(struct file *), + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (!data) goto out_fdt; fdt->fd = data; diff --git a/fs/timerfd.c b/fs/timerfd.c index e9c96a0c79f1..385fbb29837d 100644 --- a/fs/timerfd.c +++ b/fs/timerfd.c @@ -425,7 +425,7 @@ SYSCALL_DEFINE2(timerfd_create, int, clockid, int, flags) !capable(CAP_WAKE_ALARM)) return -EPERM; - ctx = kzalloc(sizeof(*ctx), GFP_KERNEL); + ctx = kzalloc(sizeof(*ctx), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ctx) return -ENOMEM; diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index f31f7442eced..dfbb26d7a185 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -1085,7 +1085,7 @@ int kvm_arch_create_vm_debugfs(struct kvm *kvm); */ static inline struct kvm *kvm_arch_alloc_vm(void) { - return kzalloc(sizeof(struct kvm), GFP_KERNEL); + return kzalloc(sizeof(struct kvm), GFP_KERNEL 
| __GFP_LOCAL_NONSENSITIVE); } #endif diff --git a/include/linux/kvm_types.h b/include/linux/kvm_types.h index 234eab059839..a5a810db85ca 100644 --- a/include/linux/kvm_types.h +++ b/include/linux/kvm_types.h @@ -64,6 +64,9 @@ struct gfn_to_hva_cache { struct kvm_mmu_memory_cache { int nobjs; gfp_t gfp_zero; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + gfp_t gfp_asi; +#endif struct kmem_cache *kmem_cache; void *objects[KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE]; }; diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c index 729495e17363..79692dafd2be 100644 --- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -1221,7 +1221,7 @@ static struct css_set *find_css_set(struct css_set *old_cset, if (cset) return cset; - cset = kzalloc(sizeof(*cset), GFP_KERNEL); + cset = kzalloc(sizeof(*cset), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!cset) return NULL; @@ -5348,7 +5348,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name, /* allocate the cgroup and its ID, 0 is reserved for the root */ cgrp = kzalloc(struct_size(cgrp, ancestor_ids, (level + 1)), - GFP_KERNEL); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!cgrp) return ERR_PTR(-ENOMEM); diff --git a/kernel/events/core.c b/kernel/events/core.c index 1914cc538cab..64eeb2c67d92 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -4586,7 +4586,8 @@ alloc_perf_context(struct pmu *pmu, struct task_struct *task) { struct perf_event_context *ctx; - ctx = kzalloc(sizeof(struct perf_event_context), GFP_KERNEL); + ctx = kzalloc(sizeof(struct perf_event_context), + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ctx) return NULL; @@ -11062,7 +11063,8 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type) mutex_lock(&pmus_lock); ret = -ENOMEM; - pmu->pmu_disable_count = alloc_percpu(int); + pmu->pmu_disable_count = alloc_percpu_gfp(int, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!pmu->pmu_disable_count) goto unlock; @@ -11112,7 +11114,8 @@ int perf_pmu_register(struct pmu *pmu, const char *name, int type) goto got_cpu_context; ret = -ENOMEM; - pmu->pmu_cpu_context = alloc_percpu(struct perf_cpu_context); + pmu->pmu_cpu_context = alloc_percpu_gfp(struct perf_cpu_context, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!pmu->pmu_cpu_context) goto free_dev; @@ -11493,7 +11496,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu, } node = (cpu >= 0) ? 
cpu_to_node(cpu) : -1; - event = kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO, + event = kmem_cache_alloc_node(perf_event_cache, + GFP_KERNEL | __GFP_ZERO | __GFP_GLOBAL_NONSENSITIVE, node); if (!event) return ERR_PTR(-ENOMEM); @@ -13378,7 +13382,8 @@ void __init perf_event_init(void) ret = init_hw_breakpoint(); WARN(ret, "hw_breakpoint initialization failed with: %d", ret); - perf_event_cache = KMEM_CACHE(perf_event, SLAB_PANIC); + perf_event_cache = KMEM_CACHE(perf_event, + SLAB_PANIC | SLAB_GLOBAL_NONSENSITIVE); /* * Build time assertion that we keep the data_head at the intended diff --git a/kernel/exit.c b/kernel/exit.c index f702a6a63686..ab2749cf6887 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -768,6 +768,8 @@ void __noreturn do_exit(long code) profile_task_exit(tsk); kcov_task_exit(tsk); + /* TODO: (oweisse) unmap the stack from ASI */ + coredump_task_exit(tsk); ptrace_event(PTRACE_EVENT_EXIT, code); diff --git a/kernel/fork.c b/kernel/fork.c index d7f55de00947..cb147a72372d 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -168,6 +168,8 @@ static struct kmem_cache *task_struct_cachep; static inline struct task_struct *alloc_task_struct_node(int node) { + /* TODO: Figure how to allocate this propperly to ASI process map. This + * should be mapped in a __GFP_LOCAL_NONSENSITIVE slab. */ return kmem_cache_alloc_node(task_struct_cachep, GFP_KERNEL, node); } @@ -214,6 +216,7 @@ static int free_vm_stack_cache(unsigned int cpu) static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) { + /* TODO: (oweisse) Add annotation to map the stack into ASI */ #ifdef CONFIG_VMAP_STACK void *stack; int i; @@ -242,9 +245,13 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) * so memcg accounting is performed manually on assigning/releasing * stacks to tasks. Drop __GFP_ACCOUNT. */ + /* ASI: We intentionally don't pass VM_LOCAL_NONSENSITIVE nor + * __GFP_LOCAL_NONSENSITIVE since we don't have an mm yet. Later on we'll + * map the stack into the mm asi map. That being said, we do care about + * the stack weing allocaed below VMALLOC_LOCAL_NONSENSITIVE_END */ stack = __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN, - VMALLOC_START, VMALLOC_END, - THREADINFO_GFP & ~__GFP_ACCOUNT, + VMALLOC_START, VMALLOC_LOCAL_NONSENSITIVE_END, + (THREADINFO_GFP & (~__GFP_ACCOUNT)), PAGE_KERNEL, 0, node, __builtin_return_address(0)); @@ -346,7 +353,8 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm) { struct vm_area_struct *vma; - vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); + vma = kmem_cache_alloc(vm_area_cachep, + GFP_KERNEL); if (vma) vma_init(vma, mm); return vma; @@ -683,6 +691,8 @@ static void check_mm(struct mm_struct *mm) #endif } +/* TODO: (oweisse) ASI: we need to allocate mm such that it will only be visible + * within itself. */ #define allocate_mm() (kmem_cache_alloc(mm_cachep, GFP_KERNEL)) #define free_mm(mm) (kmem_cache_free(mm_cachep, (mm))) @@ -823,9 +833,12 @@ void __init fork_init(void) /* create a slab on which task_structs can be allocated */ task_struct_whitelist(&useroffset, &usersize); + /* TODO: (oweisse) for the time being this cache is shared among all tasks. We + * mark it SLAB_NONSENSITIVE so task_struct can be accessed withing ASI. 
+ * A final secure solution should have this memory LOCAL, not GLOBAL.*/ task_struct_cachep = kmem_cache_create_usercopy("task_struct", arch_task_struct_size, align, - SLAB_PANIC|SLAB_ACCOUNT, + SLAB_PANIC|SLAB_ACCOUNT|SLAB_GLOBAL_NONSENSITIVE, useroffset, usersize, NULL); #endif @@ -1601,6 +1614,7 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk) refcount_inc(¤t->sighand->count); return 0; } + /* TODO: (oweisse) ASI replace with proper ASI allcation. */ sig = kmem_cache_alloc(sighand_cachep, GFP_KERNEL); RCU_INIT_POINTER(tsk->sighand, sig); if (!sig) @@ -1649,6 +1663,8 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk) if (clone_flags & CLONE_THREAD) return 0; + /* TODO: (oweisse) figure out how to properly allocate this in ASI for local + * process */ sig = kmem_cache_zalloc(signal_cachep, GFP_KERNEL); tsk->signal = sig; if (!sig) @@ -2923,7 +2939,8 @@ void __init proc_caches_init(void) SLAB_ACCOUNT, sighand_ctor); signal_cachep = kmem_cache_create("signal_cache", sizeof(struct signal_struct), 0, - SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, + SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT| + SLAB_GLOBAL_NONSENSITIVE, NULL); files_cachep = kmem_cache_create("files_cache", sizeof(struct files_struct), 0, @@ -2941,13 +2958,18 @@ void __init proc_caches_init(void) */ mm_size = sizeof(struct mm_struct) + cpumask_size(); + /* TODO: (oweisse) ASI replace with proper ASI allcation. */ mm_cachep = kmem_cache_create_usercopy("mm_struct", mm_size, ARCH_MIN_MMSTRUCT_ALIGN, - SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT, + SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT + |SLAB_GLOBAL_NONSENSITIVE, offsetof(struct mm_struct, saved_auxv), sizeof_field(struct mm_struct, saved_auxv), NULL); - vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT); + + /* TODO: (oweisse) ASI replace with proper ASI allcation. 
*/ + vm_area_cachep = KMEM_CACHE(vm_area_struct, + SLAB_PANIC|SLAB_ACCOUNT|SLAB_LOCAL_NONSENSITIVE); mmap_init(); nsproxy_cache_init(); } diff --git a/kernel/rcu/srcutree.c b/kernel/rcu/srcutree.c index 6833d8887181..553221503803 100644 --- a/kernel/rcu/srcutree.c +++ b/kernel/rcu/srcutree.c @@ -171,7 +171,8 @@ static int init_srcu_struct_fields(struct srcu_struct *ssp, bool is_static) atomic_set(&ssp->srcu_barrier_cpu_cnt, 0); INIT_DELAYED_WORK(&ssp->work, process_srcu); if (!is_static) - ssp->sda = alloc_percpu(struct srcu_data); + ssp->sda = alloc_percpu_gfp(struct srcu_data, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ssp->sda) return -ENOMEM; init_srcu_struct_nodes(ssp); diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7c96f0001c7f..7515f0612f5c 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -9329,7 +9329,8 @@ void __init sched_init(void) #endif /* CONFIG_RT_GROUP_SCHED */ #ifdef CONFIG_CGROUP_SCHED - task_group_cache = KMEM_CACHE(task_group, 0); + /* TODO: (oweisse) add SLAB_NONSENSITIVE */ + task_group_cache = KMEM_CACHE(task_group, SLAB_GLOBAL_NONSENSITIVE); list_add(&root_task_group.list, &task_groups); INIT_LIST_HEAD(&root_task_group.children); @@ -9741,7 +9742,8 @@ struct task_group *sched_create_group(struct task_group *parent) { struct task_group *tg; - tg = kmem_cache_alloc(task_group_cache, GFP_KERNEL | __GFP_ZERO); + tg = kmem_cache_alloc(task_group_cache, + GFP_KERNEL | __GFP_ZERO | __GFP_GLOBAL_NONSENSITIVE); if (!tg) return ERR_PTR(-ENOMEM); diff --git a/kernel/sched/cpuacct.c b/kernel/sched/cpuacct.c index 6e3da149125c..e8b0b29b4d37 100644 --- a/kernel/sched/cpuacct.c +++ b/kernel/sched/cpuacct.c @@ -64,15 +64,17 @@ cpuacct_css_alloc(struct cgroup_subsys_state *parent_css) if (!parent_css) return &root_cpuacct.css; - ca = kzalloc(sizeof(*ca), GFP_KERNEL); + ca = kzalloc(sizeof(*ca), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ca) goto out; - ca->cpuusage = alloc_percpu(struct cpuacct_usage); + ca->cpuusage = alloc_percpu_gfp(struct cpuacct_usage, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ca->cpuusage) goto out_free_ca; - ca->cpustat = alloc_percpu(struct kernel_cpustat); + ca->cpustat = alloc_percpu_gfp(struct kernel_cpustat, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!ca->cpustat) goto out_free_cpuusage; diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index dc9b6133b059..97d70f1eb2c5 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -11486,7 +11486,8 @@ int alloc_fair_sched_group(struct task_group *tg, struct task_group *parent) for_each_possible_cpu(i) { cfs_rq = kzalloc_node(sizeof(struct cfs_rq), - GFP_KERNEL, cpu_to_node(i)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(i)); if (!cfs_rq) goto err; diff --git a/kernel/sched/topology.c b/kernel/sched/topology.c index 1dcea6a6133e..2ad96c78306c 100644 --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -569,7 +569,7 @@ static struct root_domain *alloc_rootdomain(void) { struct root_domain *rd; - rd = kzalloc(sizeof(*rd), GFP_KERNEL); + rd = kzalloc(sizeof(*rd), GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!rd) return NULL; @@ -2044,21 +2044,24 @@ static int __sdt_alloc(const struct cpumask *cpu_map) struct sched_group_capacity *sgc; sd = kzalloc_node(sizeof(struct sched_domain) + cpumask_size(), - GFP_KERNEL, cpu_to_node(j)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(j)); if (!sd) return -ENOMEM; *per_cpu_ptr(sdd->sd, j) = sd; sds = kzalloc_node(sizeof(struct sched_domain_shared), - GFP_KERNEL, cpu_to_node(j)); + 
GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(j)); if (!sds) return -ENOMEM; *per_cpu_ptr(sdd->sds, j) = sds; sg = kzalloc_node(sizeof(struct sched_group) + cpumask_size(), - GFP_KERNEL, cpu_to_node(j)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(j)); if (!sg) return -ENOMEM; @@ -2067,7 +2070,8 @@ static int __sdt_alloc(const struct cpumask *cpu_map) *per_cpu_ptr(sdd->sg, j) = sg; sgc = kzalloc_node(sizeof(struct sched_group_capacity) + cpumask_size(), - GFP_KERNEL, cpu_to_node(j)); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, + cpu_to_node(j)); if (!sgc) return -ENOMEM; diff --git a/kernel/smp.c b/kernel/smp.c index 3c1b328f0a09..db9ab5a58e2c 100644 --- a/kernel/smp.c +++ b/kernel/smp.c @@ -103,15 +103,18 @@ int smpcfd_prepare_cpu(unsigned int cpu) { struct call_function_data *cfd = &per_cpu(cfd_data, cpu); - if (!zalloc_cpumask_var_node(&cfd->cpumask, GFP_KERNEL, + if (!zalloc_cpumask_var_node(&cfd->cpumask, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, cpu_to_node(cpu))) return -ENOMEM; - if (!zalloc_cpumask_var_node(&cfd->cpumask_ipi, GFP_KERNEL, + if (!zalloc_cpumask_var_node(&cfd->cpumask_ipi, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE, cpu_to_node(cpu))) { free_cpumask_var(cfd->cpumask); return -ENOMEM; } - cfd->pcpu = alloc_percpu(struct cfd_percpu); + cfd->pcpu = alloc_percpu_gfp(struct cfd_percpu, + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!cfd->pcpu) { free_cpumask_var(cfd->cpumask); free_cpumask_var(cfd->cpumask_ipi); @@ -179,10 +182,10 @@ static int __init csdlock_debug(char *str) } early_param("csdlock_debug", csdlock_debug); -static DEFINE_PER_CPU(call_single_data_t *, cur_csd); -static DEFINE_PER_CPU(smp_call_func_t, cur_csd_func); -static DEFINE_PER_CPU(void *, cur_csd_info); -static DEFINE_PER_CPU(struct cfd_seq_local, cfd_seq_local); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(call_single_data_t *, cur_csd); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(smp_call_func_t, cur_csd_func); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(void *, cur_csd_info); +static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct cfd_seq_local, cfd_seq_local); #define CSD_LOCK_TIMEOUT (5ULL * NSEC_PER_SEC) static atomic_t csd_bug_count = ATOMIC_INIT(0); diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c index 2699e9e562b1..9ad7d4569d4b 100644 --- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -1539,7 +1539,8 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer, * gracefully without invoking oom-killer and the system is not * destabilized. */ - mflags = GFP_KERNEL | __GFP_RETRY_MAYFAIL; + /* TODO(oweisse): this is a hack to enable ASI tracing. */ + mflags = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_GLOBAL_NONSENSITIVE; /* * If a user thread allocates too much, and si_mem_available() @@ -1718,7 +1719,7 @@ struct trace_buffer *__ring_buffer_alloc(unsigned long size, unsigned flags, /* keep it in its own cache line */ buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()), - GFP_KERNEL); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!buffer) return NULL; diff --git a/kernel/tracepoint.c b/kernel/tracepoint.c index 64ea283f2f86..0ae6c38ee121 100644 --- a/kernel/tracepoint.c +++ b/kernel/tracepoint.c @@ -107,7 +107,7 @@ static void tp_stub_func(void) static inline void *allocate_probes(int count) { struct tp_probes *p = kmalloc(struct_size(p, probes, count), - GFP_KERNEL); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); return p == NULL ? 
NULL : p->probes; } diff --git a/lib/radix-tree.c b/lib/radix-tree.c index b3afafe46fff..c7d3342a7b30 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -248,8 +248,7 @@ radix_tree_node_alloc(gfp_t gfp_mask, struct radix_tree_node *parent, * cache first for the new node to get accounted to the memory * cgroup. */ - ret = kmem_cache_alloc(radix_tree_node_cachep, - gfp_mask | __GFP_NOWARN); + ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask | __GFP_NOWARN); if (ret) goto out; @@ -1597,9 +1596,10 @@ void __init radix_tree_init(void) BUILD_BUG_ON(RADIX_TREE_MAX_TAGS + __GFP_BITS_SHIFT > 32); BUILD_BUG_ON(ROOT_IS_IDR & ~GFP_ZONEMASK); BUILD_BUG_ON(XA_CHUNK_SIZE > 255); + /*TODO: (oweisse) ASI add SLAB_NONSENSITIVE */ radix_tree_node_cachep = kmem_cache_create("radix_tree_node", sizeof(struct radix_tree_node), 0, - SLAB_PANIC | SLAB_RECLAIM_ACCOUNT, + SLAB_PANIC | SLAB_RECLAIM_ACCOUNT | SLAB_GLOBAL_NONSENSITIVE, radix_tree_node_ctor); ret = cpuhp_setup_state_nocalls(CPUHP_RADIX_DEAD, "lib/radix:dead", NULL, radix_tree_cpu_dead); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index a66d6b222ecf..fbc42e96b157 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5143,20 +5143,21 @@ static struct mem_cgroup *mem_cgroup_alloc(void) size = sizeof(struct mem_cgroup); size += nr_node_ids * sizeof(struct mem_cgroup_per_node *); - memcg = kzalloc(size, GFP_KERNEL); + memcg = kzalloc(size, GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (!memcg) return ERR_PTR(error); memcg->id.id = idr_alloc(&mem_cgroup_idr, NULL, 1, MEM_CGROUP_ID_MAX, - GFP_KERNEL); + GFP_KERNEL | __GFP_GLOBAL_NONSENSITIVE); if (memcg->id.id < 0) { error = memcg->id.id; goto fail; } memcg->vmstats_percpu = alloc_percpu_gfp(struct memcg_vmstats_percpu, - GFP_KERNEL_ACCOUNT); + GFP_KERNEL_ACCOUNT | + __GFP_GLOBAL_NONSENSITIVE); if (!memcg->vmstats_percpu) goto fail; diff --git a/mm/util.c b/mm/util.c index 741ba32a43ac..0a49e15a0765 100644 --- a/mm/util.c +++ b/mm/util.c @@ -196,7 +196,8 @@ void *vmemdup_user(const void __user *src, size_t len) { void *p; - p = kvmalloc(len, GFP_USER); + /* TODO(oweisse): is this secure? */ + p = kvmalloc(len, GFP_USER | __GFP_LOCAL_NONSENSITIVE); if (!p) return ERR_PTR(-ENOMEM); diff --git a/mm/vmalloc.c b/mm/vmalloc.c index a89866a926f6..659560f286b0 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3309,7 +3309,8 @@ EXPORT_SYMBOL(vzalloc); void *vmalloc_user(unsigned long size) { return __vmalloc_node_range(size, SHMLBA, VMALLOC_START, VMALLOC_END, - GFP_KERNEL | __GFP_ZERO, PAGE_KERNEL, + GFP_KERNEL | __GFP_ZERO + | __GFP_LOCAL_NONSENSITIVE, PAGE_KERNEL, VM_USERMAP, NUMA_NO_NODE, __builtin_return_address(0)); } diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 909db87d7383..ce8c331386fb 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -404,7 +404,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask, ? 
skbuff_fclone_cache : skbuff_head_cache; if (sk_memalloc_socks() && (flags & SKB_ALLOC_RX)) - gfp_mask |= __GFP_MEMALLOC; + gfp_mask |= __GFP_MEMALLOC | __GFP_GLOBAL_NONSENSITIVE; /* Get the HEAD */ if ((flags & (SKB_ALLOC_FCLONE | SKB_ALLOC_NAPI)) == SKB_ALLOC_NAPI && diff --git a/net/core/sock.c b/net/core/sock.c index 41e91d0f7061..6f6e0bd5ebf1 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2704,7 +2704,7 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) /* Avoid direct reclaim but allow kswapd to wake */ pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP | __GFP_NOWARN | - __GFP_NORETRY, + __GFP_NORETRY | __GFP_GLOBAL_NONSENSITIVE, SKB_FRAG_PAGE_ORDER); if (likely(pfrag->page)) { pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER; diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c index 0be80c213f7f..5b87476566c4 100644 --- a/virt/kvm/coalesced_mmio.c +++ b/virt/kvm/coalesced_mmio.c @@ -111,7 +111,7 @@ int kvm_coalesced_mmio_init(struct kvm *kvm) { struct page *page; - page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_LOCAL_NONSENSITIVE); if (!page) return -ENOMEM; diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 2ad013b8bde9..40acb841135c 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -306,7 +306,8 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) if (!kvm_arch_irqfd_allowed(kvm, args)) return -EINVAL; - irqfd = kzalloc(sizeof(*irqfd), GFP_KERNEL_ACCOUNT); + irqfd = kzalloc(sizeof(*irqfd), + GFP_KERNEL_ACCOUNT | __GFP_GLOBAL_NONSENSITIVE); if (!irqfd) return -ENOMEM; @@ -813,7 +814,7 @@ static int kvm_assign_ioeventfd_idx(struct kvm *kvm, if (IS_ERR(eventfd)) return PTR_ERR(eventfd); - p = kzalloc(sizeof(*p), GFP_KERNEL_ACCOUNT); + p = kzalloc(sizeof(*p), GFP_KERNEL_ACCOUNT | __GFP_GLOBAL_NONSENSITIVE); if (!p) { ret = -ENOMEM; goto fail; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 8d2d76de5bd0..587a75428da8 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -370,6 +370,9 @@ static inline void *mmu_memory_cache_alloc_obj(struct kvm_mmu_memory_cache *mc, gfp_t gfp_flags) { gfp_flags |= mc->gfp_zero; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + gfp_flags |= mc->gfp_asi; +#endif if (mc->kmem_cache) return kmem_cache_alloc(mc->kmem_cache, gfp_flags); @@ -863,7 +866,8 @@ static struct kvm_memslots *kvm_alloc_memslots(void) int i; struct kvm_memslots *slots; - slots = kvzalloc(sizeof(struct kvm_memslots), GFP_KERNEL_ACCOUNT); + slots = kvzalloc(sizeof(struct kvm_memslots), + GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (!slots) return NULL; @@ -1529,7 +1533,7 @@ static struct kvm_memslots *kvm_dup_memslots(struct kvm_memslots *old, else new_size = kvm_memslots_size(old->used_slots); - slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT); + slots = kvzalloc(new_size, GFP_KERNEL_ACCOUNT | __GFP_LOCAL_NONSENSITIVE); if (likely(slots)) kvm_copy_memslots(slots, old); @@ -3565,7 +3569,7 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) } BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE); - page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO); + page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_LOCAL_NONSENSITIVE); if (!page) { r = -ENOMEM; goto vcpu_free; @@ -4959,7 +4963,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, return -ENOSPC; new_bus = kmalloc(struct_size(bus, range, bus->dev_count + 1), - GFP_KERNEL_ACCOUNT); + GFP_KERNEL_ACCOUNT | 
__GFP_LOCAL_NONSENSITIVE); if (!new_bus) return -ENOMEM; From patchwork Wed Feb 23 05:22:20 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756405 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 74577C433F5 for ; Wed, 23 Feb 2022 05:25:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 44F898D0021; Wed, 23 Feb 2022 00:25:26 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 364E28D0001; Wed, 23 Feb 2022 00:25:26 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0A3CF8D0021; Wed, 23 Feb 2022 00:25:26 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0167.hostedemail.com [216.40.44.167]) by kanga.kvack.org (Postfix) with ESMTP id DE43E8D0001 for ; Wed, 23 Feb 2022 00:25:25 -0500 (EST) Received: from smtpin31.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 926E79F5DD for ; Wed, 23 Feb 2022 05:25:25 +0000 (UTC) X-FDA: 79172906610.31.23FAAC9 Received: from mail-yw1-f201.google.com (mail-yw1-f201.google.com [209.85.128.201]) by imf29.hostedemail.com (Postfix) with ESMTP id B2CA6120002 for ; Wed, 23 Feb 2022 05:25:24 +0000 (UTC) Received: by mail-yw1-f201.google.com with SMTP id 00721157ae682-2d726bd83a2so91094237b3.20 for ; Tue, 22 Feb 2022 21:25:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=3oY760HlCqyP72oYrU3xHrMJFplDw11mb6Xbtw2c1Ls=; b=D9vuMKstkbgF/aCBQRDDYJkPE6oiAauxsU3dhO3caTQi1Ors7zFJtimuNwMlMum/ya fAcTI4uIzYyzpdE2ZrEf/e06dc5vqx0LmPw1pv8n1QaD5XqmpCV/TIWB6E1NCWUBPvjx kP+mi1IO7e5PuquN9CtvonrWLH58I2l06WKRG94h1m346iPt6qcvJobAmVLWvT/Por/W KXklWi6I38xsiROhwJwrwH5FvLqV78uNDxYi99AIQIOYR6bJ6jvCc3gKNOAsKyQCZ1fM DYgbo1P+vArTAQISb6AwV92QkktvbooeM3ZvW1x8EQlFOcrk6E/f/85+B3yFEDRNVe/6 OS4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=3oY760HlCqyP72oYrU3xHrMJFplDw11mb6Xbtw2c1Ls=; b=qMoO9vWnR9W1FBSSF8Oy4kmZKiqlYhBf626K2WATH2nmS1zbsGxDlhT2GzDJNdhFtx kHP5xE180qaolH6c8HsunWYewzejePhPn2MLaV334ldcGhpIEn+dCBoES4RFCplls4YW M+qrU/zHBoxD6Vxvocj3NGfzD2kstRM9I/o9QO6hMfMnnxVsIyLH+ik3MCMNpGwVQacW 4zGIWOg6mt9PL0MbI2ingEcJLPDo7OBQUkoHZFMqhAlYggVPovT82NwytfFTOWRv9XWM Q1bZ9ws0z4DpifURJB+tXBZ+BbsVVI4SHdEk+LZKRx8IQR7xgBY8jwUjekW9l95ZVbX0 V9xQ== X-Gm-Message-State: AOAM5315x+w0lmmriNNV5ER0A6tqhNpSllWBgdBaPwpkLzmFngqo4P9X 2IRkeTJClIRo9Q3PLEcT3K4yC5D0s9Ia X-Google-Smtp-Source: ABdhPJwRtpaU+ffn34I6BuLI3XRUIkza37lNEpNOPEWvhLHQmrQIvwguXasM2lDIKujjHT1vMgrO3eyensjW X-Received: from js-desktop.svl.corp.google.com ([2620:15c:2cd:202:ccbe:5d15:e2e6:322]) (user=junaids job=sendgmr) by 2002:a81:1186:0:b0:2d6:a30d:fd86 with SMTP id 128-20020a811186000000b002d6a30dfd86mr26352453ywr.160.1645593924065; Tue, 22 Feb 2022 21:25:24 -0800 (PST) Date: Tue, 22 Feb 2022 21:22:20 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-45-junaids@google.com> Mime-Version: 1.0 References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 
2.35.1.473.g83b2b277ed-goog Subject: [RFC PATCH 44/47] kvm: asi: Splitting kvm_vcpu_arch into non/sensitive parts From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org X-Rspam-User: Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=D9vuMKst; spf=pass (imf29.hostedemail.com: domain of 3RMUVYgcKCD0itmZhcrfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--junaids.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=3RMUVYgcKCD0itmZhcrfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--junaids.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: B2CA6120002 X-Stat-Signature: ath59f1qc7ybpj6i5b3z446nuhfq3ra4 X-HE-Tag: 1645593924-119367 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Ofir Weisse The part that was allocated via ASI LOCAL SENSITIVE is in `struct kvm_vcpu_arch_private`. The rest is in `struct kvm_vcpu_arch`. The latter contains a pointer `private` which is allocated to be ASI non-sensitive from a cache. Signed-off-by: Ofir Weisse --- arch/x86/include/asm/kvm_host.h | 109 ++++++++++++---------- arch/x86/kvm/cpuid.c | 14 +-- arch/x86/kvm/kvm_cache_regs.h | 22 ++--- arch/x86/kvm/mmu.h | 10 +- arch/x86/kvm/mmu/mmu.c | 138 +++++++++++++-------------- arch/x86/kvm/mmu/mmu_internal.h | 2 +- arch/x86/kvm/mmu/paging_tmpl.h | 26 +++--- arch/x86/kvm/mmu/spte.c | 4 +- arch/x86/kvm/mmu/tdp_mmu.c | 14 +-- arch/x86/kvm/svm/nested.c | 34 +++---- arch/x86/kvm/svm/sev.c | 70 +++++++------- arch/x86/kvm/svm/svm.c | 52 +++++------ arch/x86/kvm/trace.h | 10 +- arch/x86/kvm/vmx/nested.c | 68 +++++++------- arch/x86/kvm/vmx/vmx.c | 64 ++++++------- arch/x86/kvm/x86.c | 160 ++++++++++++++++---------------- arch/x86/kvm/x86.h | 2 +- virt/kvm/kvm_main.c | 38 ++++++-- 18 files changed, 436 insertions(+), 401 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 34a05add5e77..d7315f86f85c 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -606,14 +606,12 @@ struct kvm_vcpu_xen { u64 runstate_times[4]; }; -struct kvm_vcpu_arch { - /* +struct kvm_vcpu_arch_private { + /* * rip and regs accesses must go through * kvm_{register,rip}_{read,write} functions. */ unsigned long regs[NR_VCPU_REGS]; - u32 regs_avail; - u32 regs_dirty; unsigned long cr0; unsigned long cr0_guest_owned_bits; @@ -623,6 +621,63 @@ struct kvm_vcpu_arch { unsigned long cr4_guest_owned_bits; unsigned long cr4_guest_rsvd_bits; unsigned long cr8; + + /* + * QEMU userspace and the guest each have their own FPU state. + * In vcpu_run, we switch between the user and guest FPU contexts. + * While running a VCPU, the VCPU thread will have the guest FPU + * context. + * + * Note that while the PKRU state lives inside the fpu registers, + * it is switched out separately at VMENTER and VMEXIT time. The + * "guest_fpstate" state here contains the guest FPU context, with the + * host PRKU bits. 
+ */ + struct fpu_guest guest_fpu; + + u64 xcr0; + u64 guest_supported_xcr0; + + /* + * Paging state of the vcpu + * + * If the vcpu runs in guest mode with two level paging this still saves + * the paging mode of the l1 guest. This context is always used to + * handle faults. + */ + struct kvm_mmu *mmu; + + /* Non-nested MMU for L1 */ + struct kvm_mmu root_mmu; + + /* L1 MMU when running nested */ + struct kvm_mmu guest_mmu; + + /* + * Pointer to the mmu context currently used for + * gva_to_gpa translations. + */ + struct kvm_mmu *walk_mmu; + + /* + * Paging state of an L2 guest (used for nested npt) + * + * This context will save all necessary information to walk page tables + * of an L2 guest. This context is only initialized for page table + * walking and not for faulting since we never handle l2 page faults on + * the host. + */ + struct kvm_mmu nested_mmu; + + struct x86_emulate_ctxt *emulate_ctxt; +}; + +struct kvm_vcpu_arch { + struct kvm_vcpu_arch_private *private; + + u32 regs_avail; + u32 regs_dirty; + u32 host_pkru; u32 pkru; u32 hflags; @@ -645,36 +700,6 @@ struct kvm_vcpu_arch { u64 arch_capabilities; u64 perf_capabilities; - /* - * Paging state of the vcpu - * - * If the vcpu runs in guest mode with two level paging this still saves - * the paging mode of the l1 guest. This context is always used to - * handle faults. - */ - struct kvm_mmu *mmu; - - /* Non-nested MMU for L1 */ - struct kvm_mmu root_mmu; - - /* L1 MMU when running nested */ - struct kvm_mmu guest_mmu; - - /* - * Paging state of an L2 guest (used for nested npt) - * - * This context will save all necessary information to walk page tables - * of an L2 guest. This context is only initialized for page table - * walking and not for faulting since we never handle l2 page faults on - * the host. - */ - struct kvm_mmu nested_mmu; - - /* - * Pointer to the mmu context currently used for - * gva_to_gpa translations. - */ - struct kvm_mmu *walk_mmu; struct kvm_mmu_memory_cache mmu_pte_list_desc_cache; struct kvm_mmu_memory_cache mmu_shadow_page_cache; @@ -683,21 +708,6 @@ struct kvm_vcpu_arch { struct asi_pgtbl_pool asi_pgtbl_pool; - /* - * QEMU userspace and the guest each have their own FPU state. - * In vcpu_run, we switch between the user and guest FPU contexts. - * While running a VCPU, the VCPU thread will have the guest FPU - * context. - * - * Note that while the PKRU state lives inside the fpu registers, - * it is switched out separately at VMENTER and VMEXIT time. The - * "guest_fpstate" state here contains the guest FPU context, with the - * host PRKU bits. 
- */ - struct fpu_guest guest_fpu; - - u64 xcr0; - u64 guest_supported_xcr0; struct kvm_pio_request pio; void *pio_data; @@ -734,7 +744,6 @@ struct kvm_vcpu_arch { /* emulate context */ - struct x86_emulate_ctxt *emulate_ctxt; bool emulate_regs_need_sync_to_vcpu; bool emulate_regs_need_sync_from_vcpu; int (*complete_userspace_io)(struct kvm_vcpu *vcpu); diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index dedabfdd292e..7192cbe06ba3 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -169,12 +169,12 @@ void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu) best = kvm_find_cpuid_entry(vcpu, 0xD, 0); if (best) - best->ebx = xstate_required_size(vcpu->arch.xcr0, false); + best->ebx = xstate_required_size(vcpu->arch.private->xcr0, false); best = kvm_find_cpuid_entry(vcpu, 0xD, 1); if (best && (cpuid_entry_has(best, X86_FEATURE_XSAVES) || cpuid_entry_has(best, X86_FEATURE_XSAVEC))) - best->ebx = xstate_required_size(vcpu->arch.xcr0, true); + best->ebx = xstate_required_size(vcpu->arch.private->xcr0, true); best = kvm_find_kvm_cpuid_features(vcpu); if (kvm_hlt_in_guest(vcpu->kvm) && best && @@ -208,9 +208,9 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) best = kvm_find_cpuid_entry(vcpu, 0xD, 0); if (!best) - vcpu->arch.guest_supported_xcr0 = 0; + vcpu->arch.private->guest_supported_xcr0 = 0; else - vcpu->arch.guest_supported_xcr0 = + vcpu->arch.private->guest_supported_xcr0 = (best->eax | ((u64)best->edx << 32)) & supported_xcr0; /* @@ -223,8 +223,8 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) */ best = kvm_find_cpuid_entry(vcpu, 0x12, 0x1); if (best) { - best->ecx &= vcpu->arch.guest_supported_xcr0 & 0xffffffff; - best->edx &= vcpu->arch.guest_supported_xcr0 >> 32; + best->ecx &= vcpu->arch.private->guest_supported_xcr0 & 0xffffffff; + best->edx &= vcpu->arch.private->guest_supported_xcr0 >> 32; best->ecx |= XFEATURE_MASK_FPSSE; } @@ -234,7 +234,7 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu) vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu); kvm_pmu_refresh(vcpu); - vcpu->arch.cr4_guest_rsvd_bits = + vcpu->arch.private->cr4_guest_rsvd_bits = __cr4_reserved_bits(guest_cpuid_has, vcpu); kvm_hv_set_cpuid(vcpu); diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index 90e1ffdc05b7..592780402160 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -12,12 +12,12 @@ #define BUILD_KVM_GPR_ACCESSORS(lname, uname) \ static __always_inline unsigned long kvm_##lname##_read(struct kvm_vcpu *vcpu)\ { \ - return vcpu->arch.regs[VCPU_REGS_##uname]; \ + return vcpu->arch.private->regs[VCPU_REGS_##uname]; \ } \ static __always_inline void kvm_##lname##_write(struct kvm_vcpu *vcpu, \ unsigned long val) \ { \ - vcpu->arch.regs[VCPU_REGS_##uname] = val; \ + vcpu->arch.private->regs[VCPU_REGS_##uname] = val; \ } BUILD_KVM_GPR_ACCESSORS(rax, RAX) BUILD_KVM_GPR_ACCESSORS(rbx, RBX) @@ -82,7 +82,7 @@ static inline unsigned long kvm_register_read_raw(struct kvm_vcpu *vcpu, int reg if (!kvm_register_is_available(vcpu, reg)) static_call(kvm_x86_cache_reg)(vcpu, reg); - return vcpu->arch.regs[reg]; + return vcpu->arch.private->regs[reg]; } static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg, @@ -91,7 +91,7 @@ static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg, if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS)) return; - vcpu->arch.regs[reg] = val; + vcpu->arch.private->regs[reg] = val; kvm_register_mark_dirty(vcpu, reg); } @@ -122,21 
+122,21 @@ static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index) if (!kvm_register_is_available(vcpu, VCPU_EXREG_PDPTR)) static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_PDPTR); - return vcpu->arch.walk_mmu->pdptrs[index]; + return vcpu->arch.private->walk_mmu->pdptrs[index]; } static inline void kvm_pdptr_write(struct kvm_vcpu *vcpu, int index, u64 value) { - vcpu->arch.walk_mmu->pdptrs[index] = value; + vcpu->arch.private->walk_mmu->pdptrs[index] = value; } static inline ulong kvm_read_cr0_bits(struct kvm_vcpu *vcpu, ulong mask) { ulong tmask = mask & KVM_POSSIBLE_CR0_GUEST_BITS; - if ((tmask & vcpu->arch.cr0_guest_owned_bits) && + if ((tmask & vcpu->arch.private->cr0_guest_owned_bits) && !kvm_register_is_available(vcpu, VCPU_EXREG_CR0)) static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR0); - return vcpu->arch.cr0 & mask; + return vcpu->arch.private->cr0 & mask; } static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu) @@ -147,17 +147,17 @@ static inline ulong kvm_read_cr0(struct kvm_vcpu *vcpu) static inline ulong kvm_read_cr4_bits(struct kvm_vcpu *vcpu, ulong mask) { ulong tmask = mask & KVM_POSSIBLE_CR4_GUEST_BITS; - if ((tmask & vcpu->arch.cr4_guest_owned_bits) && + if ((tmask & vcpu->arch.private->cr4_guest_owned_bits) && !kvm_register_is_available(vcpu, VCPU_EXREG_CR4)) static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR4); - return vcpu->arch.cr4 & mask; + return vcpu->arch.private->cr4 & mask; } static inline ulong kvm_read_cr3(struct kvm_vcpu *vcpu) { if (!kvm_register_is_available(vcpu, VCPU_EXREG_CR3)) static_call(kvm_x86_cache_reg)(vcpu, VCPU_EXREG_CR3); - return vcpu->arch.cr3; + return vcpu->arch.private->cr3; } static inline ulong kvm_read_cr4(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h index 60b84331007d..aea21355580d 100644 --- a/arch/x86/kvm/mmu.h +++ b/arch/x86/kvm/mmu.h @@ -89,7 +89,7 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu); static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu) { - if (likely(vcpu->arch.mmu->root_hpa != INVALID_PAGE)) + if (likely(vcpu->arch.private->mmu->root_hpa != INVALID_PAGE)) return 0; return kvm_mmu_load(vcpu); @@ -111,13 +111,13 @@ static inline unsigned long kvm_get_active_pcid(struct kvm_vcpu *vcpu) static inline void kvm_mmu_load_pgd(struct kvm_vcpu *vcpu) { - u64 root_hpa = vcpu->arch.mmu->root_hpa; + u64 root_hpa = vcpu->arch.private->mmu->root_hpa; if (!VALID_PAGE(root_hpa)) return; static_call(kvm_x86_load_mmu_pgd)(vcpu, root_hpa, - vcpu->arch.mmu->shadow_root_level); + vcpu->arch.private->mmu->shadow_root_level); } struct kvm_page_fault { @@ -193,7 +193,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, .rsvd = err & PFERR_RSVD_MASK, .user = err & PFERR_USER_MASK, .prefetch = prefetch, - .is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault), + .is_tdp = likely(vcpu->arch.private->mmu->page_fault == kvm_tdp_page_fault), .nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(), .max_level = KVM_MAX_HUGEPAGE_LEVEL, @@ -204,7 +204,7 @@ static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, if (fault.is_tdp) return kvm_tdp_page_fault(vcpu, &fault); #endif - return vcpu->arch.mmu->page_fault(vcpu, &fault); + return vcpu->arch.private->mmu->page_fault(vcpu, &fault); } /* diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c index a2ada1104c2d..e36171f69b8e 100644 --- a/arch/x86/kvm/mmu/mmu.c +++ b/arch/x86/kvm/mmu/mmu.c @@ -704,7 +704,7 @@ static bool mmu_spte_age(u64 *sptep) static void 
walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu) { - if (is_tdp_mmu(vcpu->arch.mmu)) { + if (is_tdp_mmu(vcpu->arch.private->mmu)) { kvm_tdp_mmu_walk_lockless_begin(); } else { /* @@ -723,7 +723,7 @@ static void walk_shadow_page_lockless_begin(struct kvm_vcpu *vcpu) static void walk_shadow_page_lockless_end(struct kvm_vcpu *vcpu) { - if (is_tdp_mmu(vcpu->arch.mmu)) { + if (is_tdp_mmu(vcpu->arch.private->mmu)) { kvm_tdp_mmu_walk_lockless_end(); } else { /* @@ -1909,7 +1909,7 @@ static void kvm_mmu_commit_zap_page(struct kvm *kvm, static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, struct list_head *invalid_list) { - int ret = vcpu->arch.mmu->sync_page(vcpu, sp); + int ret = vcpu->arch.private->mmu->sync_page(vcpu, sp); if (ret < 0) { kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); @@ -2081,7 +2081,7 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, int direct, unsigned int access) { - bool direct_mmu = vcpu->arch.mmu->direct_map; + bool direct_mmu = vcpu->arch.private->mmu->direct_map; union kvm_mmu_page_role role; struct hlist_head *sp_list; unsigned quadrant; @@ -2089,13 +2089,13 @@ static struct kvm_mmu_page *kvm_mmu_get_page(struct kvm_vcpu *vcpu, int collisions = 0; LIST_HEAD(invalid_list); - role = vcpu->arch.mmu->mmu_role.base; + role = vcpu->arch.private->mmu->mmu_role.base; role.level = level; role.direct = direct; if (role.direct) role.gpte_is_8_bytes = true; role.access = access; - if (!direct_mmu && vcpu->arch.mmu->root_level <= PT32_ROOT_LEVEL) { + if (!direct_mmu && vcpu->arch.private->mmu->root_level <= PT32_ROOT_LEVEL) { quadrant = gaddr >> (PAGE_SHIFT + (PT64_PT_BITS * level)); quadrant &= (1 << ((PT32_PT_BITS - PT64_PT_BITS) * level)) - 1; role.quadrant = quadrant; @@ -2181,11 +2181,11 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato { iterator->addr = addr; iterator->shadow_addr = root; - iterator->level = vcpu->arch.mmu->shadow_root_level; + iterator->level = vcpu->arch.private->mmu->shadow_root_level; if (iterator->level >= PT64_ROOT_4LEVEL && - vcpu->arch.mmu->root_level < PT64_ROOT_4LEVEL && - !vcpu->arch.mmu->direct_map) + vcpu->arch.private->mmu->root_level < PT64_ROOT_4LEVEL && + !vcpu->arch.private->mmu->direct_map) iterator->level = PT32E_ROOT_LEVEL; if (iterator->level == PT32E_ROOT_LEVEL) { @@ -2193,10 +2193,10 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato * prev_root is currently only used for 64-bit hosts. So only * the active root_hpa is valid here. 
*/ - BUG_ON(root != vcpu->arch.mmu->root_hpa); + BUG_ON(root != vcpu->arch.private->mmu->root_hpa); iterator->shadow_addr - = vcpu->arch.mmu->pae_root[(addr >> 30) & 3]; + = vcpu->arch.private->mmu->pae_root[(addr >> 30) & 3]; iterator->shadow_addr &= PT64_BASE_ADDR_MASK; --iterator->level; if (!iterator->shadow_addr) @@ -2207,7 +2207,7 @@ static void shadow_walk_init_using_root(struct kvm_shadow_walk_iterator *iterato static void shadow_walk_init(struct kvm_shadow_walk_iterator *iterator, struct kvm_vcpu *vcpu, u64 addr) { - shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.mmu->root_hpa, + shadow_walk_init_using_root(iterator, vcpu, vcpu->arch.private->mmu->root_hpa, addr); } @@ -2561,7 +2561,7 @@ static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva) gpa_t gpa; int r; - if (vcpu->arch.mmu->direct_map) + if (vcpu->arch.private->mmu->direct_map) return 0; gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL); @@ -3186,7 +3186,7 @@ static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) do { u64 new_spte; - if (is_tdp_mmu(vcpu->arch.mmu)) + if (is_tdp_mmu(vcpu->arch.private->mmu)) sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte); else sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte); @@ -3393,7 +3393,7 @@ static hpa_t mmu_alloc_root(struct kvm_vcpu *vcpu, gfn_t gfn, gva_t gva, static int mmu_alloc_direct_roots(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; u8 shadow_root_level = mmu->shadow_root_level; hpa_t root; unsigned i; @@ -3501,7 +3501,7 @@ static int mmu_first_shadow_root_alloc(struct kvm *kvm) static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; u64 pdptrs[4], pm_mask; gfn_t root_gfn, root_pgd; hpa_t root; @@ -3611,7 +3611,7 @@ static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu) static int mmu_alloc_special_roots(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; bool need_pml5 = mmu->shadow_root_level > PT64_ROOT_4LEVEL; u64 *pml5_root = NULL; u64 *pml4_root = NULL; @@ -3712,16 +3712,16 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) int i; struct kvm_mmu_page *sp; - if (vcpu->arch.mmu->direct_map) + if (vcpu->arch.private->mmu->direct_map) return; - if (!VALID_PAGE(vcpu->arch.mmu->root_hpa)) + if (!VALID_PAGE(vcpu->arch.private->mmu->root_hpa)) return; vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY); - if (vcpu->arch.mmu->root_level >= PT64_ROOT_4LEVEL) { - hpa_t root = vcpu->arch.mmu->root_hpa; + if (vcpu->arch.private->mmu->root_level >= PT64_ROOT_4LEVEL) { + hpa_t root = vcpu->arch.private->mmu->root_hpa; sp = to_shadow_page(root); if (!is_unsync_root(root)) @@ -3741,7 +3741,7 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu) kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC); for (i = 0; i < 4; ++i) { - hpa_t root = vcpu->arch.mmu->pae_root[i]; + hpa_t root = vcpu->arch.private->mmu->pae_root[i]; if (IS_VALID_PAE_ROOT(root)) { root &= PT64_BASE_ADDR_MASK; @@ -3760,11 +3760,11 @@ void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu) int i; for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) - if (is_unsync_root(vcpu->arch.mmu->prev_roots[i].hpa)) + if (is_unsync_root(vcpu->arch.private->mmu->prev_roots[i].hpa)) roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i); /* sync prev_roots by simply freeing them */ - kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free); + kvm_mmu_free_roots(vcpu, vcpu->arch.private->mmu, 
roots_to_free); } static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gpa_t vaddr, @@ -3781,7 +3781,7 @@ static gpa_t nonpaging_gva_to_gpa_nested(struct kvm_vcpu *vcpu, gpa_t vaddr, { if (exception) exception->error_code = 0; - return vcpu->arch.nested_mmu.translate_gpa(vcpu, vaddr, access, exception); + return vcpu->arch.private->nested_mmu.translate_gpa(vcpu, vaddr, access, exception); } static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct) @@ -3834,7 +3834,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep) walk_shadow_page_lockless_begin(vcpu); - if (is_tdp_mmu(vcpu->arch.mmu)) + if (is_tdp_mmu(vcpu->arch.private->mmu)) leaf = kvm_tdp_mmu_get_walk(vcpu, addr, sptes, &root); else leaf = get_walk(vcpu, addr, sptes, &root); @@ -3857,7 +3857,7 @@ static bool get_mmio_spte(struct kvm_vcpu *vcpu, u64 addr, u64 *sptep) if (!is_shadow_present_pte(sptes[leaf])) leaf++; - rsvd_check = &vcpu->arch.mmu->shadow_zero_check; + rsvd_check = &vcpu->arch.private->mmu->shadow_zero_check; for (level = root; level >= leaf; level--) reserved |= is_rsvd_spte(rsvd_check, sptes[level], level); @@ -3945,8 +3945,8 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, arch.token = (vcpu->arch.apf.id++ << 12) | vcpu->vcpu_id; arch.gfn = gfn; - arch.direct_map = vcpu->arch.mmu->direct_map; - arch.cr3 = vcpu->arch.mmu->get_guest_pgd(vcpu); + arch.direct_map = vcpu->arch.private->mmu->direct_map; + arch.cr3 = vcpu->arch.private->mmu->get_guest_pgd(vcpu); return kvm_setup_async_pf(vcpu, cr2_or_gpa, kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch); @@ -4029,7 +4029,7 @@ static void vcpu_fill_asi_pgtbl_pool(struct kvm_vcpu *vcpu) static bool is_page_fault_stale(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int mmu_seq) { - struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root_hpa); + struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.private->mmu->root_hpa); /* Special roots, e.g. pae_root, are not backed by shadow pages. 
*/ if (sp && is_obsolete_sp(vcpu->kvm, sp)) @@ -4052,7 +4052,7 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu, static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { - bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu); + bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.private->mmu); unsigned long mmu_seq; bool try_asi_map; @@ -4206,7 +4206,7 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd, { uint i; struct kvm_mmu_root_info root; - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; root.pgd = mmu->root_pgd; root.hpa = mmu->root_hpa; @@ -4230,7 +4230,7 @@ static bool cached_root_available(struct kvm_vcpu *vcpu, gpa_t new_pgd, static bool fast_pgd_switch(struct kvm_vcpu *vcpu, gpa_t new_pgd, union kvm_mmu_page_role new_role) { - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; /* * For now, limit the fast switch to 64-bit hosts+VMs in order to avoid @@ -4248,7 +4248,7 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, union kvm_mmu_page_role new_role) { if (!fast_pgd_switch(vcpu, new_pgd, new_role)) { - kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, KVM_MMU_ROOT_CURRENT); + kvm_mmu_free_roots(vcpu, vcpu->arch.private->mmu, KVM_MMU_ROOT_CURRENT); return; } @@ -4279,7 +4279,7 @@ static void __kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd, */ if (!new_role.direct) __clear_sp_write_flooding_count( - to_shadow_page(vcpu->arch.mmu->root_hpa)); + to_shadow_page(vcpu->arch.private->mmu->root_hpa)); } void kvm_mmu_new_pgd(struct kvm_vcpu *vcpu, gpa_t new_pgd) @@ -4826,7 +4826,7 @@ kvm_calc_tdp_mmu_root_page_role(struct kvm_vcpu *vcpu, static void init_kvm_tdp_mmu(struct kvm_vcpu *vcpu) { - struct kvm_mmu *context = &vcpu->arch.root_mmu; + struct kvm_mmu *context = &vcpu->arch.private->root_mmu; struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu); union kvm_mmu_role new_role = kvm_calc_tdp_mmu_root_page_role(vcpu, ®s, false); @@ -4914,7 +4914,7 @@ static void shadow_mmu_init_context(struct kvm_vcpu *vcpu, struct kvm_mmu *conte static void kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu_role_regs *regs) { - struct kvm_mmu *context = &vcpu->arch.root_mmu; + struct kvm_mmu *context = &vcpu->arch.private->root_mmu; union kvm_mmu_role new_role = kvm_calc_shadow_mmu_root_page_role(vcpu, regs, false); @@ -4937,7 +4937,7 @@ kvm_calc_shadow_npt_root_page_role(struct kvm_vcpu *vcpu, void kvm_init_shadow_npt_mmu(struct kvm_vcpu *vcpu, unsigned long cr0, unsigned long cr4, u64 efer, gpa_t nested_cr3) { - struct kvm_mmu *context = &vcpu->arch.guest_mmu; + struct kvm_mmu *context = &vcpu->arch.private->guest_mmu; struct kvm_mmu_role_regs regs = { .cr0 = cr0, .cr4 = cr4 & ~X86_CR4_PKE, @@ -4960,7 +4960,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty, union kvm_mmu_role role = {0}; /* SMM flag is inherited from root_mmu */ - role.base.smm = vcpu->arch.root_mmu.mmu_role.base.smm; + role.base.smm = vcpu->arch.private->root_mmu.mmu_role.base.smm; role.base.level = level; role.base.gpte_is_8_bytes = true; @@ -4980,7 +4980,7 @@ kvm_calc_shadow_ept_root_page_role(struct kvm_vcpu *vcpu, bool accessed_dirty, void kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, bool execonly, bool accessed_dirty, gpa_t new_eptp) { - struct kvm_mmu *context = &vcpu->arch.guest_mmu; + struct kvm_mmu *context = &vcpu->arch.private->guest_mmu; u8 level = vmx_eptp_page_walk_level(new_eptp); union kvm_mmu_role new_role = 
kvm_calc_shadow_ept_root_page_role(vcpu, accessed_dirty, @@ -5012,7 +5012,7 @@ EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu); static void init_kvm_softmmu(struct kvm_vcpu *vcpu) { - struct kvm_mmu *context = &vcpu->arch.root_mmu; + struct kvm_mmu *context = &vcpu->arch.private->root_mmu; struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu); kvm_init_shadow_mmu(vcpu, ®s); @@ -5043,7 +5043,7 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu) { struct kvm_mmu_role_regs regs = vcpu_to_role_regs(vcpu); union kvm_mmu_role new_role = kvm_calc_nested_mmu_role(vcpu, ®s); - struct kvm_mmu *g_context = &vcpu->arch.nested_mmu; + struct kvm_mmu *g_context = &vcpu->arch.private->nested_mmu; if (new_role.as_u64 == g_context->mmu_role.as_u64) return; @@ -5061,9 +5061,9 @@ static void init_kvm_nested_mmu(struct kvm_vcpu *vcpu) g_context->invlpg = NULL; /* - * Note that arch.mmu->gva_to_gpa translates l2_gpa to l1_gpa using + * Note that arch.private->mmu->gva_to_gpa translates l2_gpa to l1_gpa using * L1's nested page tables (e.g. EPT12). The nested translation - * of l2_gva to l1_gpa is done by arch.nested_mmu.gva_to_gpa using + * of l2_gva to l1_gpa is done by arch.private->nested_mmu.gva_to_gpa using * L2's page tables as the first level of translation and L1's * nested page tables as the second level of translation. Basically * the gva_to_gpa functions between mmu and nested_mmu are swapped. @@ -5119,9 +5119,9 @@ void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu) * problem is swept under the rug; KVM's CPUID API is horrific and * it's all but impossible to solve it without introducing a new API. */ - vcpu->arch.root_mmu.mmu_role.ext.valid = 0; - vcpu->arch.guest_mmu.mmu_role.ext.valid = 0; - vcpu->arch.nested_mmu.mmu_role.ext.valid = 0; + vcpu->arch.private->root_mmu.mmu_role.ext.valid = 0; + vcpu->arch.private->guest_mmu.mmu_role.ext.valid = 0; + vcpu->arch.private->nested_mmu.mmu_role.ext.valid = 0; kvm_mmu_reset_context(vcpu); /* @@ -5142,13 +5142,13 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) { int r; - r = mmu_topup_memory_caches(vcpu, !vcpu->arch.mmu->direct_map); + r = mmu_topup_memory_caches(vcpu, !vcpu->arch.private->mmu->direct_map); if (r) goto out; r = mmu_alloc_special_roots(vcpu); if (r) goto out; - if (vcpu->arch.mmu->direct_map) + if (vcpu->arch.private->mmu->direct_map) r = mmu_alloc_direct_roots(vcpu); else r = mmu_alloc_shadow_roots(vcpu); @@ -5165,10 +5165,10 @@ int kvm_mmu_load(struct kvm_vcpu *vcpu) void kvm_mmu_unload(struct kvm_vcpu *vcpu) { - kvm_mmu_free_roots(vcpu, &vcpu->arch.root_mmu, KVM_MMU_ROOTS_ALL); - WARN_ON(VALID_PAGE(vcpu->arch.root_mmu.root_hpa)); - kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); - WARN_ON(VALID_PAGE(vcpu->arch.guest_mmu.root_hpa)); + kvm_mmu_free_roots(vcpu, &vcpu->arch.private->root_mmu, KVM_MMU_ROOTS_ALL); + WARN_ON(VALID_PAGE(vcpu->arch.private->root_mmu.root_hpa)); + kvm_mmu_free_roots(vcpu, &vcpu->arch.private->guest_mmu, KVM_MMU_ROOTS_ALL); + WARN_ON(VALID_PAGE(vcpu->arch.private->guest_mmu.root_hpa)); } static bool need_remote_flush(u64 old, u64 new) @@ -5351,9 +5351,9 @@ int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, void *insn, int insn_len) { int r, emulation_type = EMULTYPE_PF; - bool direct = vcpu->arch.mmu->direct_map; + bool direct = vcpu->arch.private->mmu->direct_map; - if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) + if (WARN_ON(!VALID_PAGE(vcpu->arch.private->mmu->root_hpa))) return RET_PF_RETRY; r = RET_PF_INVALID; @@ -5382,14 +5382,14 @@ int 
kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code, * paging in both guests. If true, we simply unprotect the page * and resume the guest. */ - if (vcpu->arch.mmu->direct_map && + if (vcpu->arch.private->mmu->direct_map && (error_code & PFERR_NESTED_GUEST_PAGE) == PFERR_NESTED_GUEST_PAGE) { kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(cr2_or_gpa)); return 1; } /* - * vcpu->arch.mmu.page_fault returned RET_PF_EMULATE, but we can still + * vcpu->arch.private->mmu.page_fault returned RET_PF_EMULATE, but we can still * optimistically try to just unprotect the page and let the processor * re-execute the instruction that caused the page fault. Do not allow * retrying MMIO emulation, as it's not only pointless but could also @@ -5412,8 +5412,8 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, { int i; - /* It's actually a GPA for vcpu->arch.guest_mmu. */ - if (mmu != &vcpu->arch.guest_mmu) { + /* It's actually a GPA for vcpu->arch.private->guest_mmu. */ + if (mmu != &vcpu->arch.private->guest_mmu) { /* INVLPG on a non-canonical address is a NOP according to the SDM. */ if (is_noncanonical_address(gva, vcpu)) return; @@ -5448,7 +5448,7 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva) { - kvm_mmu_invalidate_gva(vcpu, vcpu->arch.walk_mmu, gva, INVALID_PAGE); + kvm_mmu_invalidate_gva(vcpu, vcpu->arch.private->walk_mmu, gva, INVALID_PAGE); ++vcpu->stat.invlpg; } EXPORT_SYMBOL_GPL(kvm_mmu_invlpg); @@ -5456,7 +5456,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_invlpg); void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid) { - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; bool tlb_flush = false; uint i; @@ -5638,24 +5638,24 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu) vcpu->arch.mmu_shadow_page_cache.gfp_asi = 0; #endif - vcpu->arch.mmu = &vcpu->arch.root_mmu; - vcpu->arch.walk_mmu = &vcpu->arch.root_mmu; + vcpu->arch.private->mmu = &vcpu->arch.private->root_mmu; + vcpu->arch.private->walk_mmu = &vcpu->arch.private->root_mmu; - vcpu->arch.nested_mmu.translate_gpa = translate_nested_gpa; + vcpu->arch.private->nested_mmu.translate_gpa = translate_nested_gpa; asi_init_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool); - ret = __kvm_mmu_create(vcpu, &vcpu->arch.guest_mmu); + ret = __kvm_mmu_create(vcpu, &vcpu->arch.private->guest_mmu); if (ret) return ret; - ret = __kvm_mmu_create(vcpu, &vcpu->arch.root_mmu); + ret = __kvm_mmu_create(vcpu, &vcpu->arch.private->root_mmu); if (ret) goto fail_allocate_root; return ret; fail_allocate_root: - free_mmu_pages(&vcpu->arch.guest_mmu); + free_mmu_pages(&vcpu->arch.private->guest_mmu); return ret; } @@ -6261,8 +6261,8 @@ unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm) void kvm_mmu_destroy(struct kvm_vcpu *vcpu) { kvm_mmu_unload(vcpu); - free_mmu_pages(&vcpu->arch.root_mmu); - free_mmu_pages(&vcpu->arch.guest_mmu); + free_mmu_pages(&vcpu->arch.private->root_mmu); + free_mmu_pages(&vcpu->arch.private->guest_mmu); mmu_free_memory_caches(vcpu); asi_clear_pgtbl_pool(&vcpu->arch.asi_pgtbl_pool); } diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h index 52c6527b1a06..57ec9dd147da 100644 --- a/arch/x86/kvm/mmu/mmu_internal.h +++ b/arch/x86/kvm/mmu/mmu_internal.h @@ -114,7 +114,7 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu) * being enabled is mandatory as the bits used to denote WP-only SPTEs * are reserved for NPT w/ PAE (32-bit 
KVM). */ - return vcpu->arch.mmu == &vcpu->arch.guest_mmu && + return vcpu->arch.private->mmu == &vcpu->arch.private->guest_mmu && kvm_x86_ops.cpu_dirty_log_size; } diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h index 193317ad60a4..c39a1a870a2b 100644 --- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -194,11 +194,11 @@ static bool FNAME(prefetch_invalid_gpte)(struct kvm_vcpu *vcpu, goto no_present; /* if accessed bit is not supported prefetch non accessed gpte */ - if (PT_HAVE_ACCESSED_DIRTY(vcpu->arch.mmu) && + if (PT_HAVE_ACCESSED_DIRTY(vcpu->arch.private->mmu) && !(gpte & PT_GUEST_ACCESSED_MASK)) goto no_present; - if (FNAME(is_rsvd_bits_set)(vcpu->arch.mmu, gpte, PG_LEVEL_4K)) + if (FNAME(is_rsvd_bits_set)(vcpu->arch.private->mmu, gpte, PG_LEVEL_4K)) goto no_present; return false; @@ -533,7 +533,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, } #endif walker->fault.address = addr; - walker->fault.nested_page_fault = mmu != vcpu->arch.walk_mmu; + walker->fault.nested_page_fault = mmu != vcpu->arch.private->walk_mmu; walker->fault.async_page_fault = false; trace_kvm_mmu_walker_error(walker->fault.error_code); @@ -543,7 +543,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker, static int FNAME(walk_addr)(struct guest_walker *walker, struct kvm_vcpu *vcpu, gpa_t addr, u32 access) { - return FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.mmu, addr, + return FNAME(walk_addr_generic)(walker, vcpu, vcpu->arch.private->mmu, addr, access); } @@ -552,7 +552,7 @@ static int FNAME(walk_addr_nested)(struct guest_walker *walker, struct kvm_vcpu *vcpu, gva_t addr, u32 access) { - return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.nested_mmu, + return FNAME(walk_addr_generic)(walker, vcpu, &vcpu->arch.private->nested_mmu, addr, access); } #endif @@ -573,7 +573,7 @@ FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, gfn = gpte_to_gfn(gpte); pte_access = sp->role.access & FNAME(gpte_access)(gpte); - FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte); + FNAME(protect_clean_gpte)(vcpu->arch.private->mmu, &pte_access, gpte); slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log && (pte_access & ACC_WRITE_MASK)); @@ -670,7 +670,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, WARN_ON_ONCE(gw->gfn != base_gfn); direct_access = gw->pte_access; - top_level = vcpu->arch.mmu->root_level; + top_level = vcpu->arch.private->mmu->root_level; if (top_level == PT32E_ROOT_LEVEL) top_level = PT32_ROOT_LEVEL; /* @@ -682,7 +682,7 @@ static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, if (FNAME(gpte_changed)(vcpu, gw, top_level)) goto out_gpte_changed; - if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa))) + if (WARN_ON(!VALID_PAGE(vcpu->arch.private->mmu->root_hpa))) goto out_gpte_changed; for (shadow_walk_init(&it, vcpu, fault->addr); @@ -806,7 +806,7 @@ FNAME(is_self_change_mapping)(struct kvm_vcpu *vcpu, bool self_changed = false; if (!(walker->pte_access & ACC_WRITE_MASK || - (!is_cr0_wp(vcpu->arch.mmu) && !user_fault))) + (!is_cr0_wp(vcpu->arch.private->mmu) && !user_fault))) return false; for (level = walker->level; level <= walker->max_level; level++) { @@ -905,7 +905,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault * we will cache the incorrect access into mmio spte. 
*/ if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) && - !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) { + !is_cr0_wp(vcpu->arch.private->mmu) && !fault->user && fault->slot) { walker.pte_access |= ACC_WRITE_MASK; walker.pte_access &= ~ACC_USER_MASK; @@ -915,7 +915,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault * then we should prevent the kernel from executing it * if SMEP is enabled. */ - if (is_cr4_smep(vcpu->arch.mmu)) + if (is_cr4_smep(vcpu->arch.private->mmu)) walker.pte_access &= ~ACC_EXEC_MASK; } @@ -1071,7 +1071,7 @@ static gpa_t FNAME(gva_to_gpa_nested)(struct kvm_vcpu *vcpu, gpa_t vaddr, */ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) { - union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base; + union kvm_mmu_page_role mmu_role = vcpu->arch.private->mmu->mmu_role.base; int i; bool host_writable; gpa_t first_pte_gpa; @@ -1129,7 +1129,7 @@ static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) gfn = gpte_to_gfn(gpte); pte_access = sp->role.access; pte_access &= FNAME(gpte_access)(gpte); - FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte); + FNAME(protect_clean_gpte)(vcpu->arch.private->mmu, &pte_access, gpte); if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access)) continue; diff --git a/arch/x86/kvm/mmu/spte.c b/arch/x86/kvm/mmu/spte.c index 13038fae5088..df14b6639b35 100644 --- a/arch/x86/kvm/mmu/spte.c +++ b/arch/x86/kvm/mmu/spte.c @@ -177,9 +177,9 @@ bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, if (prefetch) spte = mark_spte_for_access_track(spte); - WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level), + WARN_ONCE(is_rsvd_spte(&vcpu->arch.private->mmu->shadow_zero_check, spte, level), "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level, - get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level)); + get_rsvd_bits(&vcpu->arch.private->mmu->shadow_zero_check, spte, level)); if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) { /* Enforced by kvm_mmu_hugepage_adjust. */ diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c index 1beb4ca90560..c3634ac01869 100644 --- a/arch/x86/kvm/mmu/tdp_mmu.c +++ b/arch/x86/kvm/mmu/tdp_mmu.c @@ -162,7 +162,7 @@ static union kvm_mmu_page_role page_role_for_level(struct kvm_vcpu *vcpu, { union kvm_mmu_page_role role; - role = vcpu->arch.mmu->mmu_role.base; + role = vcpu->arch.private->mmu->mmu_role.base; role.level = level; role.direct = true; role.gpte_is_8_bytes = true; @@ -198,7 +198,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) lockdep_assert_held_write(&kvm->mmu_lock); - role = page_role_for_level(vcpu, vcpu->arch.mmu->shadow_root_level); + role = page_role_for_level(vcpu, vcpu->arch.private->mmu->shadow_root_level); /* Check for an existing root before allocating a new one. 
*/ for_each_tdp_mmu_root(kvm, root, kvm_mmu_role_as_id(role)) { @@ -207,7 +207,7 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu) goto out; } - root = alloc_tdp_mmu_page(vcpu, 0, vcpu->arch.mmu->shadow_root_level); + root = alloc_tdp_mmu_page(vcpu, 0, vcpu->arch.private->mmu->shadow_root_level); refcount_set(&root->tdp_mmu_root_count, 1); spin_lock(&kvm->arch.tdp_mmu_pages_lock); @@ -952,7 +952,7 @@ static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, */ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault) { - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; struct tdp_iter iter; struct kvm_mmu_page *sp; u64 *child_pt; @@ -1486,11 +1486,11 @@ int kvm_tdp_mmu_get_walk(struct kvm_vcpu *vcpu, u64 addr, u64 *sptes, int *root_level) { struct tdp_iter iter; - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; gfn_t gfn = addr >> PAGE_SHIFT; int leaf = -1; - *root_level = vcpu->arch.mmu->shadow_root_level; + *root_level = vcpu->arch.private->mmu->shadow_root_level; tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) { leaf = iter.level; @@ -1515,7 +1515,7 @@ u64 *kvm_tdp_mmu_fast_pf_get_last_sptep(struct kvm_vcpu *vcpu, u64 addr, u64 *spte) { struct tdp_iter iter; - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; gfn_t gfn = addr >> PAGE_SHIFT; tdp_ptep_t sptep = NULL; diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index f8b7bc04b3e7..c90ef5bf26cf 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -97,7 +97,7 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu) WARN_ON(mmu_is_nested(vcpu)); - vcpu->arch.mmu = &vcpu->arch.guest_mmu; + vcpu->arch.private->mmu = &vcpu->arch.private->guest_mmu; /* * The NPT format depends on L1's CR4 and EFER, which is in vmcb01. Note, @@ -107,16 +107,16 @@ static void nested_svm_init_mmu_context(struct kvm_vcpu *vcpu) kvm_init_shadow_npt_mmu(vcpu, X86_CR0_PG, svm->vmcb01.ptr->save.cr4, svm->vmcb01.ptr->save.efer, svm->nested.ctl.nested_cr3); - vcpu->arch.mmu->get_guest_pgd = nested_svm_get_tdp_cr3; - vcpu->arch.mmu->get_pdptr = nested_svm_get_tdp_pdptr; - vcpu->arch.mmu->inject_page_fault = nested_svm_inject_npf_exit; - vcpu->arch.walk_mmu = &vcpu->arch.nested_mmu; + vcpu->arch.private->mmu->get_guest_pgd = nested_svm_get_tdp_cr3; + vcpu->arch.private->mmu->get_pdptr = nested_svm_get_tdp_pdptr; + vcpu->arch.private->mmu->inject_page_fault = nested_svm_inject_npf_exit; + vcpu->arch.private->walk_mmu = &vcpu->arch.private->nested_mmu; } static void nested_svm_uninit_mmu_context(struct kvm_vcpu *vcpu) { - vcpu->arch.mmu = &vcpu->arch.root_mmu; - vcpu->arch.walk_mmu = &vcpu->arch.root_mmu; + vcpu->arch.private->mmu = &vcpu->arch.private->root_mmu; + vcpu->arch.private->walk_mmu = &vcpu->arch.private->root_mmu; } void recalc_intercepts(struct vcpu_svm *svm) @@ -437,13 +437,13 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, return -EINVAL; if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) && - CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))) + CC(!load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, cr3))) return -EINVAL; if (!nested_npt) kvm_mmu_new_pgd(vcpu, cr3); - vcpu->arch.cr3 = cr3; + vcpu->arch.private->cr3 = cr3; kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); /* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. 
*/ @@ -500,7 +500,7 @@ static void nested_vmcb02_prepare_save(struct vcpu_svm *svm, struct vmcb *vmcb12 svm_set_cr0(&svm->vcpu, vmcb12->save.cr0); svm_set_cr4(&svm->vcpu, vmcb12->save.cr4); - svm->vcpu.arch.cr2 = vmcb12->save.cr2; + svm->vcpu.arch.private->cr2 = vmcb12->save.cr2; kvm_rax_write(&svm->vcpu, vmcb12->save.rax); kvm_rsp_write(&svm->vcpu, vmcb12->save.rsp); @@ -634,7 +634,7 @@ int enter_svm_guest_mode(struct kvm_vcpu *vcpu, u64 vmcb12_gpa, return ret; if (!npt_enabled) - vcpu->arch.mmu->inject_page_fault = svm_inject_page_fault_nested; + vcpu->arch.private->mmu->inject_page_fault = svm_inject_page_fault_nested; if (!from_vmrun) kvm_make_request(KVM_REQ_GET_NESTED_STATE_PAGES, vcpu); @@ -695,7 +695,7 @@ int nested_svm_vmrun(struct kvm_vcpu *vcpu) */ svm->vmcb01.ptr->save.efer = vcpu->arch.efer; svm->vmcb01.ptr->save.cr0 = kvm_read_cr0(vcpu); - svm->vmcb01.ptr->save.cr4 = vcpu->arch.cr4; + svm->vmcb01.ptr->save.cr4 = vcpu->arch.private->cr4; svm->vmcb01.ptr->save.rflags = kvm_get_rflags(vcpu); svm->vmcb01.ptr->save.rip = kvm_rip_read(vcpu); @@ -805,7 +805,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm) vmcb12->save.cr0 = kvm_read_cr0(vcpu); vmcb12->save.cr3 = kvm_read_cr3(vcpu); vmcb12->save.cr2 = vmcb->save.cr2; - vmcb12->save.cr4 = svm->vcpu.arch.cr4; + vmcb12->save.cr4 = svm->vcpu.arch.private->cr4; vmcb12->save.rflags = kvm_get_rflags(vcpu); vmcb12->save.rip = kvm_rip_read(vcpu); vmcb12->save.rsp = kvm_rsp_read(vcpu); @@ -991,7 +991,7 @@ static int nested_svm_exit_handled_msr(struct vcpu_svm *svm) if (!(vmcb_is_intercept(&svm->nested.ctl, INTERCEPT_MSR_PROT))) return NESTED_EXIT_HOST; - msr = svm->vcpu.arch.regs[VCPU_REGS_RCX]; + msr = svm->vcpu.arch.private->regs[VCPU_REGS_RCX]; offset = svm_msrpm_offset(msr); write = svm->vmcb->control.exit_info_1 & 1; mask = 1 << ((2 * (msr & 0xf)) + write); @@ -1131,7 +1131,7 @@ static void nested_svm_inject_exception_vmexit(struct vcpu_svm *svm) else if (svm->vcpu.arch.exception.has_payload) svm->vmcb->control.exit_info_2 = svm->vcpu.arch.exception.payload; else - svm->vmcb->control.exit_info_2 = svm->vcpu.arch.cr2; + svm->vmcb->control.exit_info_2 = svm->vcpu.arch.private->cr2; } else if (nr == DB_VECTOR) { /* See inject_pending_event. */ kvm_deliver_exception_payload(&svm->vcpu); @@ -1396,7 +1396,7 @@ static int svm_set_nested_state(struct kvm_vcpu *vcpu, * Set it again to fix this. */ - ret = nested_svm_load_cr3(&svm->vcpu, vcpu->arch.cr3, + ret = nested_svm_load_cr3(&svm->vcpu, vcpu->arch.private->cr3, nested_npt_enabled(svm), false); if (WARN_ON_ONCE(ret)) goto out_free; @@ -1449,7 +1449,7 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu) * the guest CR3 might be restored prior to setting the nested * state which can lead to a load of wrong PDPTRs. 
*/ - if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3))) + if (CC(!load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, vcpu->arch.private->cr3))) return false; if (!nested_svm_vmrun_msrpm(svm)) { diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index be2883141220..9c62566ddde8 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -565,28 +565,28 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm) return -EINVAL; /* Sync registgers */ - save->rax = svm->vcpu.arch.regs[VCPU_REGS_RAX]; - save->rbx = svm->vcpu.arch.regs[VCPU_REGS_RBX]; - save->rcx = svm->vcpu.arch.regs[VCPU_REGS_RCX]; - save->rdx = svm->vcpu.arch.regs[VCPU_REGS_RDX]; - save->rsp = svm->vcpu.arch.regs[VCPU_REGS_RSP]; - save->rbp = svm->vcpu.arch.regs[VCPU_REGS_RBP]; - save->rsi = svm->vcpu.arch.regs[VCPU_REGS_RSI]; - save->rdi = svm->vcpu.arch.regs[VCPU_REGS_RDI]; + save->rax = svm->vcpu.arch.private->regs[VCPU_REGS_RAX]; + save->rbx = svm->vcpu.arch.private->regs[VCPU_REGS_RBX]; + save->rcx = svm->vcpu.arch.private->regs[VCPU_REGS_RCX]; + save->rdx = svm->vcpu.arch.private->regs[VCPU_REGS_RDX]; + save->rsp = svm->vcpu.arch.private->regs[VCPU_REGS_RSP]; + save->rbp = svm->vcpu.arch.private->regs[VCPU_REGS_RBP]; + save->rsi = svm->vcpu.arch.private->regs[VCPU_REGS_RSI]; + save->rdi = svm->vcpu.arch.private->regs[VCPU_REGS_RDI]; #ifdef CONFIG_X86_64 - save->r8 = svm->vcpu.arch.regs[VCPU_REGS_R8]; - save->r9 = svm->vcpu.arch.regs[VCPU_REGS_R9]; - save->r10 = svm->vcpu.arch.regs[VCPU_REGS_R10]; - save->r11 = svm->vcpu.arch.regs[VCPU_REGS_R11]; - save->r12 = svm->vcpu.arch.regs[VCPU_REGS_R12]; - save->r13 = svm->vcpu.arch.regs[VCPU_REGS_R13]; - save->r14 = svm->vcpu.arch.regs[VCPU_REGS_R14]; - save->r15 = svm->vcpu.arch.regs[VCPU_REGS_R15]; + save->r8 = svm->vcpu.arch.private->regs[VCPU_REGS_R8]; + save->r9 = svm->vcpu.arch.private->regs[VCPU_REGS_R9]; + save->r10 = svm->vcpu.arch.private->regs[VCPU_REGS_R10]; + save->r11 = svm->vcpu.arch.private->regs[VCPU_REGS_R11]; + save->r12 = svm->vcpu.arch.private->regs[VCPU_REGS_R12]; + save->r13 = svm->vcpu.arch.private->regs[VCPU_REGS_R13]; + save->r14 = svm->vcpu.arch.private->regs[VCPU_REGS_R14]; + save->r15 = svm->vcpu.arch.private->regs[VCPU_REGS_R15]; #endif - save->rip = svm->vcpu.arch.regs[VCPU_REGS_RIP]; + save->rip = svm->vcpu.arch.private->regs[VCPU_REGS_RIP]; /* Sync some non-GPR registers before encrypting */ - save->xcr0 = svm->vcpu.arch.xcr0; + save->xcr0 = svm->vcpu.arch.private->xcr0; save->pkru = svm->vcpu.arch.pkru; save->xss = svm->vcpu.arch.ia32_xss; save->dr6 = svm->vcpu.arch.dr6; @@ -2301,10 +2301,10 @@ static void sev_es_sync_to_ghcb(struct vcpu_svm *svm) * Copy their values, even if they may not have been written during the * VM-Exit. It's the guest's responsibility to not consume random data. */ - ghcb_set_rax(ghcb, vcpu->arch.regs[VCPU_REGS_RAX]); - ghcb_set_rbx(ghcb, vcpu->arch.regs[VCPU_REGS_RBX]); - ghcb_set_rcx(ghcb, vcpu->arch.regs[VCPU_REGS_RCX]); - ghcb_set_rdx(ghcb, vcpu->arch.regs[VCPU_REGS_RDX]); + ghcb_set_rax(ghcb, vcpu->arch.private->regs[VCPU_REGS_RAX]); + ghcb_set_rbx(ghcb, vcpu->arch.private->regs[VCPU_REGS_RBX]); + ghcb_set_rcx(ghcb, vcpu->arch.private->regs[VCPU_REGS_RCX]); + ghcb_set_rdx(ghcb, vcpu->arch.private->regs[VCPU_REGS_RDX]); } static void sev_es_sync_from_ghcb(struct vcpu_svm *svm) @@ -2326,18 +2326,18 @@ static void sev_es_sync_from_ghcb(struct vcpu_svm *svm) * * Copy their values to the appropriate location if supplied. 
*/ - memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs)); + memset(vcpu->arch.private->regs, 0, sizeof(vcpu->arch.private->regs)); - vcpu->arch.regs[VCPU_REGS_RAX] = ghcb_get_rax_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RBX] = ghcb_get_rbx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RCX] = ghcb_get_rcx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RDX] = ghcb_get_rdx_if_valid(ghcb); - vcpu->arch.regs[VCPU_REGS_RSI] = ghcb_get_rsi_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RAX] = ghcb_get_rax_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RBX] = ghcb_get_rbx_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RCX] = ghcb_get_rcx_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RDX] = ghcb_get_rdx_if_valid(ghcb); + vcpu->arch.private->regs[VCPU_REGS_RSI] = ghcb_get_rsi_if_valid(ghcb); svm->vmcb->save.cpl = ghcb_get_cpl_if_valid(ghcb); if (ghcb_xcr0_is_valid(ghcb)) { - vcpu->arch.xcr0 = ghcb_get_xcr0(ghcb); + vcpu->arch.private->xcr0 = ghcb_get_xcr0(ghcb); kvm_update_cpuid_runtime(vcpu); } @@ -2667,8 +2667,8 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm) GHCB_MSR_CPUID_FUNC_POS); /* Initialize the registers needed by the CPUID intercept */ - vcpu->arch.regs[VCPU_REGS_RAX] = cpuid_fn; - vcpu->arch.regs[VCPU_REGS_RCX] = 0; + vcpu->arch.private->regs[VCPU_REGS_RAX] = cpuid_fn; + vcpu->arch.private->regs[VCPU_REGS_RCX] = 0; ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_CPUID); if (!ret) { @@ -2680,13 +2680,13 @@ static int sev_handle_vmgexit_msr_protocol(struct vcpu_svm *svm) GHCB_MSR_CPUID_REG_MASK, GHCB_MSR_CPUID_REG_POS); if (cpuid_reg == 0) - cpuid_value = vcpu->arch.regs[VCPU_REGS_RAX]; + cpuid_value = vcpu->arch.private->regs[VCPU_REGS_RAX]; else if (cpuid_reg == 1) - cpuid_value = vcpu->arch.regs[VCPU_REGS_RBX]; + cpuid_value = vcpu->arch.private->regs[VCPU_REGS_RBX]; else if (cpuid_reg == 2) - cpuid_value = vcpu->arch.regs[VCPU_REGS_RCX]; + cpuid_value = vcpu->arch.private->regs[VCPU_REGS_RCX]; else - cpuid_value = vcpu->arch.regs[VCPU_REGS_RDX]; + cpuid_value = vcpu->arch.private->regs[VCPU_REGS_RDX]; set_ghcb_msr_bits(svm, cpuid_value, GHCB_MSR_CPUID_VALUE_MASK, diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 5151efa424ac..516af87e7ab1 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -1425,10 +1425,10 @@ static int svm_create_vcpu(struct kvm_vcpu *vcpu) /* * SEV-ES guests maintain an encrypted version of their FPU * state which is restored and saved on VMRUN and VMEXIT. - * Mark vcpu->arch.guest_fpu->fpstate as scratch so it won't + * Mark vcpu->arch.private->guest_fpu->fpstate as scratch so it won't * do xsave/xrstor on it. 
*/ - fpstate_set_confidential(&vcpu->arch.guest_fpu); + fpstate_set_confidential(&vcpu->arch.private->guest_fpu); } err = avic_init_vcpu(svm); @@ -1599,7 +1599,7 @@ static void svm_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) switch (reg) { case VCPU_EXREG_PDPTR: BUG_ON(!npt_enabled); - load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)); + load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, kvm_read_cr3(vcpu)); break; default: KVM_BUG_ON(1, vcpu->kvm); @@ -1804,7 +1804,7 @@ void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) } } #endif - vcpu->arch.cr0 = cr0; + vcpu->arch.private->cr0 = cr0; if (!npt_enabled) hcr0 |= X86_CR0_PG | X86_CR0_WP; @@ -1845,12 +1845,12 @@ static bool svm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) void svm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { unsigned long host_cr4_mce = cr4_read_shadow() & X86_CR4_MCE; - unsigned long old_cr4 = vcpu->arch.cr4; + unsigned long old_cr4 = vcpu->arch.private->cr4; if (npt_enabled && ((old_cr4 ^ cr4) & X86_CR4_PGE)) svm_flush_tlb(vcpu); - vcpu->arch.cr4 = cr4; + vcpu->arch.private->cr4 = cr4; if (!npt_enabled) cr4 |= X86_CR4_PAE; cr4 |= host_cr4_mce; @@ -2239,7 +2239,7 @@ enum { /* Return NONE_SVM_INSTR if not SVM instrs, otherwise return decode result */ static int svm_instr_opcode(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; if (ctxt->b != 0x1 || ctxt->opcode_len != 2) return NONE_SVM_INSTR; @@ -2513,7 +2513,7 @@ static bool check_selective_cr0_intercepted(struct kvm_vcpu *vcpu, unsigned long val) { struct vcpu_svm *svm = to_svm(vcpu); - unsigned long cr0 = vcpu->arch.cr0; + unsigned long cr0 = vcpu->arch.private->cr0; bool ret = false; if (!is_guest_mode(vcpu) || @@ -2585,7 +2585,7 @@ static int cr_interception(struct kvm_vcpu *vcpu) val = kvm_read_cr0(vcpu); break; case 2: - val = vcpu->arch.cr2; + val = vcpu->arch.private->cr2; break; case 3: val = kvm_read_cr3(vcpu); @@ -3396,9 +3396,9 @@ static int handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath) /* SEV-ES guests must use the CR write traps to track CR registers. */ if (!sev_es_guest(vcpu->kvm)) { if (!svm_is_intercept(svm, INTERCEPT_CR0_WRITE)) - vcpu->arch.cr0 = svm->vmcb->save.cr0; + vcpu->arch.private->cr0 = svm->vmcb->save.cr0; if (npt_enabled) - vcpu->arch.cr3 = svm->vmcb->save.cr3; + vcpu->arch.private->cr3 = svm->vmcb->save.cr3; } if (is_guest_mode(vcpu)) { @@ -3828,7 +3828,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu) * vmcb02 when switching vmcbs for nested virtualization. */ vmload(svm->vmcb01.pa); - __svm_vcpu_run(vmcb_pa, (unsigned long *)&vcpu->arch.regs); + __svm_vcpu_run(vmcb_pa, (unsigned long *)&vcpu->arch.private->regs); vmsave(svm->vmcb01.pa); vmload(__sme_page_pa(sd->save_area)); @@ -3843,9 +3843,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu) trace_kvm_entry(vcpu); - svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX]; - svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP]; - svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP]; + svm->vmcb->save.rax = vcpu->arch.private->regs[VCPU_REGS_RAX]; + svm->vmcb->save.rsp = vcpu->arch.private->regs[VCPU_REGS_RSP]; + svm->vmcb->save.rip = vcpu->arch.private->regs[VCPU_REGS_RIP]; /* * Disable singlestep if we're injecting an interrupt/exception. 
@@ -3871,7 +3871,7 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu) svm->vmcb->control.asid = svm->asid; vmcb_mark_dirty(svm->vmcb, VMCB_ASID); } - svm->vmcb->save.cr2 = vcpu->arch.cr2; + svm->vmcb->save.cr2 = vcpu->arch.private->cr2; svm_hv_update_vp_id(svm->vmcb, vcpu); @@ -3926,10 +3926,10 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu) x86_spec_ctrl_restore_host(svm->spec_ctrl, svm->virt_spec_ctrl); if (!sev_es_guest(vcpu->kvm)) { - vcpu->arch.cr2 = svm->vmcb->save.cr2; - vcpu->arch.regs[VCPU_REGS_RAX] = svm->vmcb->save.rax; - vcpu->arch.regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp; - vcpu->arch.regs[VCPU_REGS_RIP] = svm->vmcb->save.rip; + vcpu->arch.private->cr2 = svm->vmcb->save.cr2; + vcpu->arch.private->regs[VCPU_REGS_RAX] = svm->vmcb->save.rax; + vcpu->arch.private->regs[VCPU_REGS_RSP] = svm->vmcb->save.rsp; + vcpu->arch.private->regs[VCPU_REGS_RIP] = svm->vmcb->save.rip; } if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI)) @@ -3999,8 +3999,8 @@ static void svm_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, /* Loading L2's CR3 is handled by enter_svm_guest_mode. */ if (!test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail)) return; - cr3 = vcpu->arch.cr3; - } else if (vcpu->arch.mmu->shadow_root_level >= PT64_ROOT_4LEVEL) { + cr3 = vcpu->arch.private->cr3; + } else if (vcpu->arch.private->mmu->shadow_root_level >= PT64_ROOT_4LEVEL) { cr3 = __sme_set(root_hpa) | kvm_get_active_pcid(vcpu); } else { /* PCID in the guest should be impossible with a 32-bit MMU. */ @@ -4221,7 +4221,7 @@ static int svm_check_intercept(struct kvm_vcpu *vcpu, INTERCEPT_SELECTIVE_CR0))) break; - cr0 = vcpu->arch.cr0 & ~SVM_CR0_SELECTIVE_MASK; + cr0 = vcpu->arch.private->cr0 & ~SVM_CR0_SELECTIVE_MASK; val = info->src_val & ~SVM_CR0_SELECTIVE_MASK; if (info->intercept == x86_intercept_lmsw) { @@ -4358,9 +4358,9 @@ static int svm_enter_smm(struct kvm_vcpu *vcpu, char *smstate) /* FEE0h - SVM Guest VMCB Physical Address */ put_smstate(u64, smstate, 0x7ee0, svm->nested.vmcb12_gpa); - svm->vmcb->save.rax = vcpu->arch.regs[VCPU_REGS_RAX]; - svm->vmcb->save.rsp = vcpu->arch.regs[VCPU_REGS_RSP]; - svm->vmcb->save.rip = vcpu->arch.regs[VCPU_REGS_RIP]; + svm->vmcb->save.rax = vcpu->arch.private->regs[VCPU_REGS_RAX]; + svm->vmcb->save.rsp = vcpu->arch.private->regs[VCPU_REGS_RSP]; + svm->vmcb->save.rip = vcpu->arch.private->regs[VCPU_REGS_RIP]; ret = nested_svm_vmexit(svm); if (ret) diff --git a/arch/x86/kvm/trace.h b/arch/x86/kvm/trace.h index 953b0fcb21ee..2dc906dc9c13 100644 --- a/arch/x86/kvm/trace.h +++ b/arch/x86/kvm/trace.h @@ -791,13 +791,13 @@ TRACE_EVENT(kvm_emulate_insn, TP_fast_assign( __entry->csbase = static_call(kvm_x86_get_segment_base)(vcpu, VCPU_SREG_CS); - __entry->len = vcpu->arch.emulate_ctxt->fetch.ptr - - vcpu->arch.emulate_ctxt->fetch.data; - __entry->rip = vcpu->arch.emulate_ctxt->_eip - __entry->len; + __entry->len = vcpu->arch.private->emulate_ctxt->fetch.ptr + - vcpu->arch.private->emulate_ctxt->fetch.data; + __entry->rip = vcpu->arch.private->emulate_ctxt->_eip - __entry->len; memcpy(__entry->insn, - vcpu->arch.emulate_ctxt->fetch.data, + vcpu->arch.private->emulate_ctxt->fetch.data, 15); - __entry->flags = kei_decode_mode(vcpu->arch.emulate_ctxt->mode); + __entry->flags = kei_decode_mode(vcpu->arch.private->emulate_ctxt->mode); __entry->failed = failed; ), diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c index 0a0092e4102d..34b7621adf99 100644 --- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -313,7 
+313,7 @@ static void free_nested(struct kvm_vcpu *vcpu) kvm_vcpu_unmap(vcpu, &vmx->nested.pi_desc_map, true); vmx->nested.pi_desc = NULL; - kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); + kvm_mmu_free_roots(vcpu, &vcpu->arch.private->guest_mmu, KVM_MMU_ROOTS_ALL); nested_release_evmcs(vcpu); @@ -356,11 +356,11 @@ static void nested_ept_invalidate_addr(struct kvm_vcpu *vcpu, gpa_t eptp, WARN_ON_ONCE(!mmu_is_nested(vcpu)); for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) { - cached_root = &vcpu->arch.mmu->prev_roots[i]; + cached_root = &vcpu->arch.private->mmu->prev_roots[i]; if (nested_ept_root_matches(cached_root->hpa, cached_root->pgd, eptp)) - vcpu->arch.mmu->invlpg(vcpu, addr, cached_root->hpa); + vcpu->arch.private->mmu->invlpg(vcpu, addr, cached_root->hpa); } } @@ -410,19 +410,19 @@ static void nested_ept_init_mmu_context(struct kvm_vcpu *vcpu) { WARN_ON(mmu_is_nested(vcpu)); - vcpu->arch.mmu = &vcpu->arch.guest_mmu; + vcpu->arch.private->mmu = &vcpu->arch.private->guest_mmu; nested_ept_new_eptp(vcpu); - vcpu->arch.mmu->get_guest_pgd = nested_ept_get_eptp; - vcpu->arch.mmu->inject_page_fault = nested_ept_inject_page_fault; - vcpu->arch.mmu->get_pdptr = kvm_pdptr_read; + vcpu->arch.private->mmu->get_guest_pgd = nested_ept_get_eptp; + vcpu->arch.private->mmu->inject_page_fault = nested_ept_inject_page_fault; + vcpu->arch.private->mmu->get_pdptr = kvm_pdptr_read; - vcpu->arch.walk_mmu = &vcpu->arch.nested_mmu; + vcpu->arch.private->walk_mmu = &vcpu->arch.private->nested_mmu; } static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu) { - vcpu->arch.mmu = &vcpu->arch.root_mmu; - vcpu->arch.walk_mmu = &vcpu->arch.root_mmu; + vcpu->arch.private->mmu = &vcpu->arch.private->root_mmu; + vcpu->arch.private->walk_mmu = &vcpu->arch.private->root_mmu; } static bool nested_vmx_is_page_fault_vmexit(struct vmcs12 *vmcs12, @@ -456,7 +456,7 @@ static int nested_vmx_check_exception(struct kvm_vcpu *vcpu, unsigned long *exit } if (nested_vmx_is_page_fault_vmexit(vmcs12, vcpu->arch.exception.error_code)) { - *exit_qual = has_payload ? payload : vcpu->arch.cr2; + *exit_qual = has_payload ? payload : vcpu->arch.private->cr2; return 1; } } else if (vmcs12->exception_bitmap & (1u << nr)) { @@ -1103,7 +1103,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, * must not be dereferenced. */ if (reload_pdptrs && !nested_ept && is_pae_paging(vcpu) && - CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))) { + CC(!load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, cr3))) { *entry_failure_code = ENTRY_FAIL_PDPTE; return -EINVAL; } @@ -1111,7 +1111,7 @@ static int nested_vmx_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, if (!nested_ept) kvm_mmu_new_pgd(vcpu, cr3); - vcpu->arch.cr3 = cr3; + vcpu->arch.private->cr3 = cr3; kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); /* Re-initialize the MMU, e.g. to pick up CR4 MMU role changes. */ @@ -2508,8 +2508,8 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, * trap. Note that CR0.TS also needs updating - we do this later. 
*/ vmx_update_exception_bitmap(vcpu); - vcpu->arch.cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask; - vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.cr0_guest_owned_bits); + vcpu->arch.private->cr0_guest_owned_bits &= ~vmcs12->cr0_guest_host_mask; + vmcs_writel(CR0_GUEST_HOST_MASK, ~vcpu->arch.private->cr0_guest_owned_bits); if (vmx->nested.nested_run_pending && (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PAT)) { @@ -2595,7 +2595,7 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, } if (!enable_ept) - vcpu->arch.walk_mmu->inject_page_fault = vmx_inject_page_fault_nested; + vcpu->arch.private->walk_mmu->inject_page_fault = vmx_inject_page_fault_nested; if ((vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL) && WARN_ON_ONCE(kvm_set_msr(vcpu, MSR_CORE_PERF_GLOBAL_CTRL, @@ -3070,7 +3070,7 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu) vmx->loaded_vmcs->host_state.cr4 = cr4; } - vm_fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs, + vm_fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.private->regs, vmx->loaded_vmcs->launched); if (vmx->msr_autoload.host.nr) @@ -3153,7 +3153,7 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu) * the guest CR3 might be restored prior to setting the nested * state which can lead to a load of wrong PDPTRs. */ - if (CC(!load_pdptrs(vcpu, vcpu->arch.walk_mmu, vcpu->arch.cr3))) + if (CC(!load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, vcpu->arch.private->cr3))) return false; } @@ -3370,18 +3370,18 @@ enum nvmx_vmentry_status nested_vmx_enter_non_root_mode(struct kvm_vcpu *vcpu, * i.e. a VM-Fail detected by hardware but not KVM, KVM must unwind its * software model to the pre-VMEntry host state. When EPT is disabled, * GUEST_CR3 holds KVM's shadow CR3, not L1's "real" CR3, which causes - * nested_vmx_restore_host_state() to corrupt vcpu->arch.cr3. Stuffing - * vmcs01.GUEST_CR3 results in the unwind naturally setting arch.cr3 to + * nested_vmx_restore_host_state() to corrupt vcpu->arch.private->cr3. Stuffing + * vmcs01.GUEST_CR3 results in the unwind naturally setting arch.private->cr3 to * the correct value. Smashing vmcs01.GUEST_CR3 is safe because nested * VM-Exits, and the unwind, reset KVM's MMU, i.e. vmcs01.GUEST_CR3 is * guaranteed to be overwritten with a shadow CR3 prior to re-entering * L1. Don't stuff vmcs01.GUEST_CR3 when using nested early checks as - * KVM modifies vcpu->arch.cr3 if and only if the early hardware checks + * KVM modifies vcpu->arch.private->cr3 if and only if the early hardware checks * pass, and early VM-Fails do not reset KVM's MMU, i.e. the VM-Fail * path would need to manually save/restore vmcs01.GUEST_CR3. 
*/ if (!enable_ept && !nested_early_check) - vmcs_writel(GUEST_CR3, vcpu->arch.cr3); + vmcs_writel(GUEST_CR3, vcpu->arch.private->cr3); vmx_switch_vmcs(vcpu, &vmx->nested.vmcs02); @@ -3655,20 +3655,20 @@ static inline unsigned long vmcs12_guest_cr0(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { return - /*1*/ (vmcs_readl(GUEST_CR0) & vcpu->arch.cr0_guest_owned_bits) | + /*1*/ (vmcs_readl(GUEST_CR0) & vcpu->arch.private->cr0_guest_owned_bits) | /*2*/ (vmcs12->guest_cr0 & vmcs12->cr0_guest_host_mask) | /*3*/ (vmcs_readl(CR0_READ_SHADOW) & ~(vmcs12->cr0_guest_host_mask | - vcpu->arch.cr0_guest_owned_bits)); + vcpu->arch.private->cr0_guest_owned_bits)); } static inline unsigned long vmcs12_guest_cr4(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) { return - /*1*/ (vmcs_readl(GUEST_CR4) & vcpu->arch.cr4_guest_owned_bits) | + /*1*/ (vmcs_readl(GUEST_CR4) & vcpu->arch.private->cr4_guest_owned_bits) | /*2*/ (vmcs12->guest_cr4 & vmcs12->cr4_guest_host_mask) | /*3*/ (vmcs_readl(CR4_READ_SHADOW) & ~(vmcs12->cr4_guest_host_mask | - vcpu->arch.cr4_guest_owned_bits)); + vcpu->arch.private->cr4_guest_owned_bits)); } static void vmcs12_save_pending_event(struct kvm_vcpu *vcpu, @@ -4255,11 +4255,11 @@ static void load_vmcs12_host_state(struct kvm_vcpu *vcpu, * CR0_GUEST_HOST_MASK is already set in the original vmcs01 * (KVM doesn't change it); */ - vcpu->arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS; + vcpu->arch.private->cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS; vmx_set_cr0(vcpu, vmcs12->host_cr0); /* Same as above - no reason to call set_cr4_guest_host_mask(). */ - vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK); + vcpu->arch.private->cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK); vmx_set_cr4(vcpu, vmcs12->host_cr4); nested_ept_uninit_mmu_context(vcpu); @@ -4405,14 +4405,14 @@ static void nested_vmx_restore_host_state(struct kvm_vcpu *vcpu) */ vmx_set_efer(vcpu, nested_vmx_get_vmcs01_guest_efer(vmx)); - vcpu->arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS; + vcpu->arch.private->cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS; vmx_set_cr0(vcpu, vmcs_readl(CR0_READ_SHADOW)); - vcpu->arch.cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK); + vcpu->arch.private->cr4_guest_owned_bits = ~vmcs_readl(CR4_GUEST_HOST_MASK); vmx_set_cr4(vcpu, vmcs_readl(CR4_READ_SHADOW)); nested_ept_uninit_mmu_context(vcpu); - vcpu->arch.cr3 = vmcs_readl(GUEST_CR3); + vcpu->arch.private->cr3 = vmcs_readl(GUEST_CR3); kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); /* @@ -5000,7 +5000,7 @@ static inline void nested_release_vmcs12(struct kvm_vcpu *vcpu) vmx->nested.current_vmptr >> PAGE_SHIFT, vmx->nested.cached_vmcs12, 0, VMCS12_SIZE); - kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); + kvm_mmu_free_roots(vcpu, &vcpu->arch.private->guest_mmu, KVM_MMU_ROOTS_ALL); vmx->nested.current_vmptr = INVALID_GPA; } @@ -5427,7 +5427,7 @@ static int handle_invept(struct kvm_vcpu *vcpu) * Nested EPT roots are always held through guest_mmu, * not root_mmu. */ - mmu = &vcpu->arch.guest_mmu; + mmu = &vcpu->arch.private->guest_mmu; switch (type) { case VMX_EPT_EXTENT_CONTEXT: @@ -5545,7 +5545,7 @@ static int handle_invvpid(struct kvm_vcpu *vcpu) * TODO: sync only the affected SPTEs for INVDIVIDUAL_ADDR. 
*/ if (!enable_ept) - kvm_mmu_free_guest_mode_roots(vcpu, &vcpu->arch.root_mmu); + kvm_mmu_free_guest_mode_roots(vcpu, &vcpu->arch.private->root_mmu); return nested_vmx_succeed(vcpu); } diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c index 6e1bb017b696..beba656116d7 100644 --- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2242,20 +2242,20 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) switch (reg) { case VCPU_REGS_RSP: - vcpu->arch.regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP); + vcpu->arch.private->regs[VCPU_REGS_RSP] = vmcs_readl(GUEST_RSP); break; case VCPU_REGS_RIP: - vcpu->arch.regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP); + vcpu->arch.private->regs[VCPU_REGS_RIP] = vmcs_readl(GUEST_RIP); break; case VCPU_EXREG_PDPTR: if (enable_ept) ept_save_pdptrs(vcpu); break; case VCPU_EXREG_CR0: - guest_owned_bits = vcpu->arch.cr0_guest_owned_bits; + guest_owned_bits = vcpu->arch.private->cr0_guest_owned_bits; - vcpu->arch.cr0 &= ~guest_owned_bits; - vcpu->arch.cr0 |= vmcs_readl(GUEST_CR0) & guest_owned_bits; + vcpu->arch.private->cr0 &= ~guest_owned_bits; + vcpu->arch.private->cr0 |= vmcs_readl(GUEST_CR0) & guest_owned_bits; break; case VCPU_EXREG_CR3: /* @@ -2263,13 +2263,13 @@ static void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) * CR3 is loaded into hardware, not the guest's CR3. */ if (!(exec_controls_get(to_vmx(vcpu)) & CPU_BASED_CR3_LOAD_EXITING)) - vcpu->arch.cr3 = vmcs_readl(GUEST_CR3); + vcpu->arch.private->cr3 = vmcs_readl(GUEST_CR3); break; case VCPU_EXREG_CR4: - guest_owned_bits = vcpu->arch.cr4_guest_owned_bits; + guest_owned_bits = vcpu->arch.private->cr4_guest_owned_bits; - vcpu->arch.cr4 &= ~guest_owned_bits; - vcpu->arch.cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits; + vcpu->arch.private->cr4 &= ~guest_owned_bits; + vcpu->arch.private->cr4 |= vmcs_readl(GUEST_CR4) & guest_owned_bits; break; default: KVM_BUG_ON(1, vcpu->kvm); @@ -2926,7 +2926,7 @@ static inline int vmx_get_current_vpid(struct kvm_vcpu *vcpu) static void vmx_flush_tlb_current(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; u64 root_hpa = mmu->root_hpa; /* No flush required if the current context is invalid. */ @@ -2963,7 +2963,7 @@ static void vmx_flush_tlb_guest(struct kvm_vcpu *vcpu) void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu = vcpu->arch.walk_mmu; + struct kvm_mmu *mmu = vcpu->arch.private->walk_mmu; if (!kvm_register_is_dirty(vcpu, VCPU_EXREG_PDPTR)) return; @@ -2978,7 +2978,7 @@ void vmx_ept_load_pdptrs(struct kvm_vcpu *vcpu) void ept_save_pdptrs(struct kvm_vcpu *vcpu) { - struct kvm_mmu *mmu = vcpu->arch.walk_mmu; + struct kvm_mmu *mmu = vcpu->arch.private->walk_mmu; if (WARN_ON_ONCE(!is_pae_paging(vcpu))) return; @@ -3019,7 +3019,7 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) vmcs_writel(CR0_READ_SHADOW, cr0); vmcs_writel(GUEST_CR0, hw_cr0); - vcpu->arch.cr0 = cr0; + vcpu->arch.private->cr0 = cr0; kvm_register_mark_available(vcpu, VCPU_EXREG_CR0); #ifdef CONFIG_X86_64 @@ -3067,12 +3067,12 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) exec_controls_set(vmx, tmp); } - /* Note, vmx_set_cr4() consumes the new vcpu->arch.cr0. */ + /* Note, vmx_set_cr4() consumes the new vcpu->arch.private->cr0. 
*/ if ((old_cr0_pg ^ cr0) & X86_CR0_PG) vmx_set_cr4(vcpu, kvm_read_cr4(vcpu)); } - /* depends on vcpu->arch.cr0 to be set to a new value */ + /* depends on vcpu->arch.private->cr0 to be set to a new value */ vmx->emulation_required = vmx_emulation_required(vcpu); } @@ -3114,7 +3114,7 @@ static void vmx_load_mmu_pgd(struct kvm_vcpu *vcpu, hpa_t root_hpa, if (!enable_unrestricted_guest && !is_paging(vcpu)) guest_cr3 = to_kvm_vmx(kvm)->ept_identity_map_addr; else if (test_bit(VCPU_EXREG_CR3, (ulong *)&vcpu->arch.regs_avail)) - guest_cr3 = vcpu->arch.cr3; + guest_cr3 = vcpu->arch.private->cr3; else /* vmcs01.GUEST_CR3 is already up-to-date. */ update_guest_cr3 = false; vmx_ept_load_pdptrs(vcpu); @@ -3144,7 +3144,7 @@ static bool vmx_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) { - unsigned long old_cr4 = vcpu->arch.cr4; + unsigned long old_cr4 = vcpu->arch.private->cr4; struct vcpu_vmx *vmx = to_vmx(vcpu); /* * Pass through host's Machine Check Enable value to hw_cr4, which @@ -3171,7 +3171,7 @@ void vmx_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) } } - vcpu->arch.cr4 = cr4; + vcpu->arch.private->cr4 = cr4; kvm_register_mark_available(vcpu, VCPU_EXREG_CR4); if (!is_unrestricted_guest(vcpu)) { @@ -4040,14 +4040,14 @@ void set_cr4_guest_host_mask(struct vcpu_vmx *vmx) { struct kvm_vcpu *vcpu = &vmx->vcpu; - vcpu->arch.cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS & - ~vcpu->arch.cr4_guest_rsvd_bits; + vcpu->arch.private->cr4_guest_owned_bits = KVM_POSSIBLE_CR4_GUEST_BITS & + ~vcpu->arch.private->cr4_guest_rsvd_bits; if (!enable_ept) - vcpu->arch.cr4_guest_owned_bits &= ~X86_CR4_PGE; + vcpu->arch.private->cr4_guest_owned_bits &= ~X86_CR4_PGE; if (is_guest_mode(&vmx->vcpu)) - vcpu->arch.cr4_guest_owned_bits &= + vcpu->arch.private->cr4_guest_owned_bits &= ~get_vmcs12(vcpu)->cr4_guest_host_mask; - vmcs_writel(CR4_GUEST_HOST_MASK, ~vcpu->arch.cr4_guest_owned_bits); + vmcs_writel(CR4_GUEST_HOST_MASK, ~vcpu->arch.private->cr4_guest_owned_bits); } static u32 vmx_pin_based_exec_ctrl(struct vcpu_vmx *vmx) @@ -4345,8 +4345,8 @@ static void init_vmcs(struct vcpu_vmx *vmx) /* 22.2.1, 20.8.1 */ vm_entry_controls_set(vmx, vmx_vmentry_ctrl()); - vmx->vcpu.arch.cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS; - vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.cr0_guest_owned_bits); + vmx->vcpu.arch.private->cr0_guest_owned_bits = KVM_POSSIBLE_CR0_GUEST_BITS; + vmcs_writel(CR0_GUEST_HOST_MASK, ~vmx->vcpu.arch.private->cr0_guest_owned_bits); set_cr4_guest_host_mask(vmx); @@ -4956,7 +4956,7 @@ static int handle_set_cr4(struct kvm_vcpu *vcpu, unsigned long val) static int handle_desc(struct kvm_vcpu *vcpu) { - WARN_ON(!(vcpu->arch.cr4 & X86_CR4_UMIP)); + WARN_ON(!(vcpu->arch.private->cr4 & X86_CR4_UMIP)); return kvm_emulate_instruction(vcpu, 0); } @@ -6626,13 +6626,13 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu, vmx->loaded_vmcs->host_state.cr3 = cr3; } - if (vcpu->arch.cr2 != native_read_cr2()) - native_write_cr2(vcpu->arch.cr2); + if (vcpu->arch.private->cr2 != native_read_cr2()) + native_write_cr2(vcpu->arch.private->cr2); - vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.regs, + vmx->fail = __vmx_vcpu_run(vmx, (unsigned long *)&vcpu->arch.private->regs, vmx->loaded_vmcs->launched); - vcpu->arch.cr2 = native_read_cr2(); + vcpu->arch.private->cr2 = native_read_cr2(); VM_WARN_ON_ONCE(vcpu->kvm->asi && !is_asi_active()); asi_set_target_unrestricted(); @@ -6681,9 +6681,9 @@ static fastpath_t 
vmx_vcpu_run(struct kvm_vcpu *vcpu) WARN_ON_ONCE(vmx->nested.need_vmcs12_to_shadow_sync); if (kvm_register_is_dirty(vcpu, VCPU_REGS_RSP)) - vmcs_writel(GUEST_RSP, vcpu->arch.regs[VCPU_REGS_RSP]); + vmcs_writel(GUEST_RSP, vcpu->arch.private->regs[VCPU_REGS_RSP]); if (kvm_register_is_dirty(vcpu, VCPU_REGS_RIP)) - vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]); + vmcs_writel(GUEST_RIP, vcpu->arch.private->regs[VCPU_REGS_RIP]); cr4 = cr4_read_shadow(); if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) { diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dd862edc1b5a..680725089a18 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -595,7 +595,7 @@ void kvm_deliver_exception_payload(struct kvm_vcpu *vcpu) vcpu->arch.dr6 &= ~BIT(12); break; case PF_VECTOR: - vcpu->arch.cr2 = payload; + vcpu->arch.private->cr2 = payload; break; } @@ -736,8 +736,8 @@ bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu, struct kvm_mmu *fault_mmu; WARN_ON_ONCE(fault->vector != PF_VECTOR); - fault_mmu = fault->nested_page_fault ? vcpu->arch.mmu : - vcpu->arch.walk_mmu; + fault_mmu = fault->nested_page_fault ? vcpu->arch.private->mmu : + vcpu->arch.private->walk_mmu; /* * Invalidate the TLB entry for the faulting address, if it exists, @@ -892,7 +892,7 @@ int kvm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) #endif if (!(vcpu->arch.efer & EFER_LME) && (cr0 & X86_CR0_PG) && is_pae(vcpu) && ((cr0 ^ old_cr0) & pdptr_bits) && - !load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu))) + !load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, kvm_read_cr3(vcpu))) return 1; if (!(cr0 & X86_CR0_PG) && @@ -920,8 +920,8 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) { - if (vcpu->arch.xcr0 != host_xcr0) - xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0); + if (vcpu->arch.private->xcr0 != host_xcr0) + xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.private->xcr0); if (vcpu->arch.xsaves_enabled && vcpu->arch.ia32_xss != host_xss) @@ -930,7 +930,7 @@ void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu) if (static_cpu_has(X86_FEATURE_PKU) && (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU)) && + (vcpu->arch.private->xcr0 & XFEATURE_MASK_PKRU)) && vcpu->arch.pkru != vcpu->arch.host_pkru) write_pkru(vcpu->arch.pkru); } @@ -943,7 +943,7 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu) if (static_cpu_has(X86_FEATURE_PKU) && (kvm_read_cr4_bits(vcpu, X86_CR4_PKE) || - (vcpu->arch.xcr0 & XFEATURE_MASK_PKRU))) { + (vcpu->arch.private->xcr0 & XFEATURE_MASK_PKRU))) { vcpu->arch.pkru = rdpkru(); if (vcpu->arch.pkru != vcpu->arch.host_pkru) write_pkru(vcpu->arch.host_pkru); @@ -951,7 +951,7 @@ void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu) if (kvm_read_cr4_bits(vcpu, X86_CR4_OSXSAVE)) { - if (vcpu->arch.xcr0 != host_xcr0) + if (vcpu->arch.private->xcr0 != host_xcr0) xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0); if (vcpu->arch.xsaves_enabled && @@ -965,7 +965,7 @@ EXPORT_SYMBOL_GPL(kvm_load_host_xsave_state); static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) { u64 xcr0 = xcr; - u64 old_xcr0 = vcpu->arch.xcr0; + u64 old_xcr0 = vcpu->arch.private->xcr0; u64 valid_bits; /* Only support XCR_XFEATURE_ENABLED_MASK(xcr0) now */ @@ -981,7 +981,7 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) * saving. However, xcr0 bit 0 is always set, even if the * emulated CPU does not support XSAVE (see kvm_vcpu_reset()). 
*/ - valid_bits = vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FP; + valid_bits = vcpu->arch.private->guest_supported_xcr0 | XFEATURE_MASK_FP; if (xcr0 & ~valid_bits) return 1; @@ -995,7 +995,7 @@ static int __kvm_set_xcr(struct kvm_vcpu *vcpu, u32 index, u64 xcr) if ((xcr0 & XFEATURE_MASK_AVX512) != XFEATURE_MASK_AVX512) return 1; } - vcpu->arch.xcr0 = xcr0; + vcpu->arch.private->xcr0 = xcr0; if ((xcr0 ^ old_xcr0) & XFEATURE_MASK_EXTEND) kvm_update_cpuid_runtime(vcpu); @@ -1019,7 +1019,7 @@ bool kvm_is_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) if (cr4 & cr4_reserved_bits) return false; - if (cr4 & vcpu->arch.cr4_guest_rsvd_bits) + if (cr4 & vcpu->arch.private->cr4_guest_rsvd_bits) return false; return static_call(kvm_x86_is_valid_cr4)(vcpu, cr4); @@ -1069,7 +1069,7 @@ int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) return 1; } else if (is_paging(vcpu) && (cr4 & X86_CR4_PAE) && ((cr4 ^ old_cr4) & pdptr_bits) - && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, + && !load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, kvm_read_cr3(vcpu))) return 1; @@ -1092,7 +1092,7 @@ EXPORT_SYMBOL_GPL(kvm_set_cr4); static void kvm_invalidate_pcid(struct kvm_vcpu *vcpu, unsigned long pcid) { - struct kvm_mmu *mmu = vcpu->arch.mmu; + struct kvm_mmu *mmu = vcpu->arch.private->mmu; unsigned long roots_to_free = 0; int i; @@ -1159,13 +1159,13 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3) if (kvm_vcpu_is_illegal_gpa(vcpu, cr3)) return 1; - if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3)) + if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, cr3)) return 1; if (cr3 != kvm_read_cr3(vcpu)) kvm_mmu_new_pgd(vcpu, cr3); - vcpu->arch.cr3 = cr3; + vcpu->arch.private->cr3 = cr3; kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); handle_tlb_flush: @@ -1190,7 +1190,7 @@ int kvm_set_cr8(struct kvm_vcpu *vcpu, unsigned long cr8) if (lapic_in_kernel(vcpu)) kvm_lapic_set_tpr(vcpu, cr8); else - vcpu->arch.cr8 = cr8; + vcpu->arch.private->cr8 = cr8; return 0; } EXPORT_SYMBOL_GPL(kvm_set_cr8); @@ -1200,7 +1200,7 @@ unsigned long kvm_get_cr8(struct kvm_vcpu *vcpu) if (lapic_in_kernel(vcpu)) return kvm_lapic_get_cr8(vcpu); else - return vcpu->arch.cr8; + return vcpu->arch.private->cr8; } EXPORT_SYMBOL_GPL(kvm_get_cr8); @@ -4849,10 +4849,10 @@ static int kvm_vcpu_ioctl_x86_set_debugregs(struct kvm_vcpu *vcpu, static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave) { - if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + if (fpstate_is_confidential(&vcpu->arch.private->guest_fpu)) return; - fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.guest_fpu, + fpu_copy_guest_fpstate_to_uabi(&vcpu->arch.private->guest_fpu, guest_xsave->region, sizeof(guest_xsave->region), vcpu->arch.pkru); @@ -4861,10 +4861,10 @@ static void kvm_vcpu_ioctl_x86_get_xsave(struct kvm_vcpu *vcpu, static int kvm_vcpu_ioctl_x86_set_xsave(struct kvm_vcpu *vcpu, struct kvm_xsave *guest_xsave) { - if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + if (fpstate_is_confidential(&vcpu->arch.private->guest_fpu)) return 0; - return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.guest_fpu, + return fpu_copy_uabi_to_guest_fpstate(&vcpu->arch.private->guest_fpu, guest_xsave->region, supported_xcr0, &vcpu->arch.pkru); } @@ -4880,7 +4880,7 @@ static void kvm_vcpu_ioctl_x86_get_xcrs(struct kvm_vcpu *vcpu, guest_xcrs->nr_xcrs = 1; guest_xcrs->flags = 0; guest_xcrs->xcrs[0].xcr = XCR_XFEATURE_ENABLED_MASK; - guest_xcrs->xcrs[0].value = vcpu->arch.xcr0; + 
guest_xcrs->xcrs[0].value = vcpu->arch.private->xcr0; } static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, @@ -6516,7 +6516,7 @@ gpa_t translate_nested_gpa(struct kvm_vcpu *vcpu, gpa_t gpa, u32 access, /* NPT walks are always user-walks */ access |= PFERR_USER_MASK; - t_gpa = vcpu->arch.mmu->gva_to_gpa(vcpu, gpa, access, exception); + t_gpa = vcpu->arch.private->mmu->gva_to_gpa(vcpu, gpa, access, exception); return t_gpa; } @@ -6525,7 +6525,7 @@ gpa_t kvm_mmu_gva_to_gpa_read(struct kvm_vcpu *vcpu, gva_t gva, struct x86_exception *exception) { u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0; - return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception); + return vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, exception); } EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read); @@ -6534,7 +6534,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_read); { u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0; access |= PFERR_FETCH_MASK; - return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception); + return vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, exception); } gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, @@ -6542,7 +6542,7 @@ gpa_t kvm_mmu_gva_to_gpa_write(struct kvm_vcpu *vcpu, gva_t gva, { u32 access = (static_call(kvm_x86_get_cpl)(vcpu) == 3) ? PFERR_USER_MASK : 0; access |= PFERR_WRITE_MASK; - return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception); + return vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, exception); } EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write); @@ -6550,7 +6550,7 @@ EXPORT_SYMBOL_GPL(kvm_mmu_gva_to_gpa_write); gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva, struct x86_exception *exception) { - return vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, 0, exception); + return vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, 0, exception); } static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, @@ -6561,7 +6561,7 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, int r = X86EMUL_CONTINUE; while (bytes) { - gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access, + gpa_t gpa = vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, addr, access, exception); unsigned offset = addr & (PAGE_SIZE-1); unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset); @@ -6595,7 +6595,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt, int ret; /* Inline kvm_read_guest_virt_helper for speed. */ - gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FETCH_MASK, + gpa_t gpa = vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, addr, access|PFERR_FETCH_MASK, exception); if (unlikely(gpa == UNMAPPED_GVA)) return X86EMUL_PROPAGATE_FAULT; @@ -6659,7 +6659,7 @@ static int kvm_write_guest_virt_helper(gva_t addr, void *val, unsigned int bytes int r = X86EMUL_CONTINUE; while (bytes) { - gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, + gpa_t gpa = vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, addr, access, exception); unsigned offset = addr & (PAGE_SIZE-1); @@ -6757,7 +6757,7 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva, * shadow page table for L2 guest. 
*/ if (vcpu_match_mmio_gva(vcpu, gva) && (!is_paging(vcpu) || - !permission_fault(vcpu, vcpu->arch.walk_mmu, + !permission_fault(vcpu, vcpu->arch.private->walk_mmu, vcpu->arch.mmio_access, 0, access))) { *gpa = vcpu->arch.mmio_gfn << PAGE_SHIFT | (gva & (PAGE_SIZE - 1)); @@ -6765,7 +6765,7 @@ static int vcpu_mmio_gva_to_gpa(struct kvm_vcpu *vcpu, unsigned long gva, return 1; } - *gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, exception); + *gpa = vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, exception); if (*gpa == UNMAPPED_GVA) return -1; @@ -6867,7 +6867,7 @@ static int emulator_read_write_onepage(unsigned long addr, void *val, int handled, ret; bool write = ops->write; struct kvm_mmio_fragment *frag; - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; /* * If the exit was due to a NPF we may already have a GPA. @@ -7246,7 +7246,7 @@ static unsigned long emulator_get_cr(struct x86_emulate_ctxt *ctxt, int cr) value = kvm_read_cr0(vcpu); break; case 2: - value = vcpu->arch.cr2; + value = vcpu->arch.private->cr2; break; case 3: value = kvm_read_cr3(vcpu); @@ -7275,7 +7275,7 @@ static int emulator_set_cr(struct x86_emulate_ctxt *ctxt, int cr, ulong val) res = kvm_set_cr0(vcpu, mk_cr_64(kvm_read_cr0(vcpu), val)); break; case 2: - vcpu->arch.cr2 = val; + vcpu->arch.private->cr2 = val; break; case 3: res = kvm_set_cr3(vcpu, val); @@ -7597,7 +7597,7 @@ static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask) static bool inject_emulated_exception(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; if (ctxt->exception.vector == PF_VECTOR) return kvm_inject_emulated_page_fault(vcpu, &ctxt->exception); @@ -7621,14 +7621,14 @@ static struct x86_emulate_ctxt *alloc_emulate_ctxt(struct kvm_vcpu *vcpu) ctxt->vcpu = vcpu; ctxt->ops = &emulate_ops; - vcpu->arch.emulate_ctxt = ctxt; + vcpu->arch.private->emulate_ctxt = ctxt; return ctxt; } static void init_emulate_ctxt(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; int cs_db, cs_l; static_call(kvm_x86_get_cs_db_l_bits)(vcpu, &cs_db, &cs_l); @@ -7658,7 +7658,7 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu) void kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip) { - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; int ret; init_emulate_ctxt(vcpu); @@ -7731,7 +7731,7 @@ static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data, static void prepare_emulation_ctxt_failure_exit(struct kvm_vcpu *vcpu) { - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; prepare_emulation_failure_exit(vcpu, NULL, 0, ctxt->fetch.data, ctxt->fetch.end - ctxt->fetch.data); @@ -7792,7 +7792,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, WARN_ON_ONCE(!(emulation_type & EMULTYPE_PF))) return false; - if (!vcpu->arch.mmu->direct_map) { + if (!vcpu->arch.private->mmu->direct_map) { /* * Write permission should be allowed since only * write access need to be emulated. @@ -7825,7 +7825,7 @@ static bool reexecute_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, kvm_release_pfn_clean(pfn); /* The instructions are well-emulated on direct mmu. 
*/ - if (vcpu->arch.mmu->direct_map) { + if (vcpu->arch.private->mmu->direct_map) { unsigned int indirect_shadow_pages; write_lock(&vcpu->kvm->mmu_lock); @@ -7893,7 +7893,7 @@ static bool retry_instruction(struct x86_emulate_ctxt *ctxt, vcpu->arch.last_retry_eip = ctxt->eip; vcpu->arch.last_retry_addr = cr2_or_gpa; - if (!vcpu->arch.mmu->direct_map) + if (!vcpu->arch.private->mmu->direct_map) gpa = kvm_mmu_gva_to_gpa_write(vcpu, cr2_or_gpa, NULL); kvm_mmu_unprotect_page(vcpu->kvm, gpa_to_gfn(gpa)); @@ -8055,7 +8055,7 @@ int x86_decode_emulated_instruction(struct kvm_vcpu *vcpu, int emulation_type, void *insn, int insn_len) { int r = EMULATION_OK; - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; init_emulate_ctxt(vcpu); @@ -8081,7 +8081,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, int emulation_type, void *insn, int insn_len) { int r; - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; bool writeback = true; bool write_fault_to_spt; @@ -8160,7 +8160,7 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, ctxt->exception.address = cr2_or_gpa; /* With shadow page tables, cr2 contains a GVA or nGPA. */ - if (vcpu->arch.mmu->direct_map) { + if (vcpu->arch.private->mmu->direct_map) { ctxt->gpa_available = true; ctxt->gpa_val = cr2_or_gpa; } @@ -9484,9 +9484,9 @@ static void enter_smm(struct kvm_vcpu *vcpu) kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); kvm_rip_write(vcpu, 0x8000); - cr0 = vcpu->arch.cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0_PG); + cr0 = vcpu->arch.private->cr0 & ~(X86_CR0_PE | X86_CR0_EM | X86_CR0_TS | X86_CR0_PG); static_call(kvm_x86_set_cr0)(vcpu, cr0); - vcpu->arch.cr0 = cr0; + vcpu->arch.private->cr0 = cr0; static_call(kvm_x86_set_cr4)(vcpu, 0); @@ -10245,14 +10245,14 @@ static void kvm_load_guest_fpu(struct kvm_vcpu *vcpu) * Exclude PKRU from restore as restored separately in * kvm_x86_ops.run(). */ - fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, true); + fpu_swap_kvm_fpstate(&vcpu->arch.private->guest_fpu, true); trace_kvm_fpu(1); } /* When vcpu_run ends, restore user space FPU context. 
*/ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu) { - fpu_swap_kvm_fpstate(&vcpu->arch.guest_fpu, false); + fpu_swap_kvm_fpstate(&vcpu->arch.private->guest_fpu, false); ++vcpu->stat.fpu_reload; trace_kvm_fpu(0); } @@ -10342,7 +10342,7 @@ static void __get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs) * that usually, but some bad designed PV devices (vmware * backdoor interface) need this to work */ - emulator_writeback_register_cache(vcpu->arch.emulate_ctxt); + emulator_writeback_register_cache(vcpu->arch.private->emulate_ctxt); vcpu->arch.emulate_regs_need_sync_to_vcpu = false; } regs->rax = kvm_rax_read(vcpu); @@ -10450,7 +10450,7 @@ static void __get_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs) sregs->gdt.limit = dt.size; sregs->gdt.base = dt.address; - sregs->cr2 = vcpu->arch.cr2; + sregs->cr2 = vcpu->arch.private->cr2; sregs->cr3 = kvm_read_cr3(vcpu); skip_protected_regs: @@ -10563,7 +10563,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, int kvm_task_switch(struct kvm_vcpu *vcpu, u16 tss_selector, int idt_index, int reason, bool has_error_code, u32 error_code) { - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; + struct x86_emulate_ctxt *ctxt = vcpu->arch.private->emulate_ctxt; int ret; init_emulate_ctxt(vcpu); @@ -10632,9 +10632,9 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs, dt.address = sregs->gdt.base; static_call(kvm_x86_set_gdt)(vcpu, &dt); - vcpu->arch.cr2 = sregs->cr2; + vcpu->arch.private->cr2 = sregs->cr2; *mmu_reset_needed |= kvm_read_cr3(vcpu) != sregs->cr3; - vcpu->arch.cr3 = sregs->cr3; + vcpu->arch.private->cr3 = sregs->cr3; kvm_register_mark_available(vcpu, VCPU_EXREG_CR3); kvm_set_cr8(vcpu, sregs->cr8); @@ -10644,7 +10644,7 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs, *mmu_reset_needed |= kvm_read_cr0(vcpu) != sregs->cr0; static_call(kvm_x86_set_cr0)(vcpu, sregs->cr0); - vcpu->arch.cr0 = sregs->cr0; + vcpu->arch.private->cr0 = sregs->cr0; *mmu_reset_needed |= kvm_read_cr4(vcpu) != sregs->cr4; static_call(kvm_x86_set_cr4)(vcpu, sregs->cr4); @@ -10652,7 +10652,7 @@ static int __set_sregs_common(struct kvm_vcpu *vcpu, struct kvm_sregs *sregs, if (update_pdptrs) { idx = srcu_read_lock(&vcpu->kvm->srcu); if (is_pae_paging(vcpu)) { - load_pdptrs(vcpu, vcpu->arch.walk_mmu, kvm_read_cr3(vcpu)); + load_pdptrs(vcpu, vcpu->arch.private->walk_mmu, kvm_read_cr3(vcpu)); *mmu_reset_needed = 1; } srcu_read_unlock(&vcpu->kvm->srcu, idx); @@ -10853,12 +10853,12 @@ int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) { struct fxregs_state *fxsave; - if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + if (fpstate_is_confidential(&vcpu->arch.private->guest_fpu)) return 0; vcpu_load(vcpu); - fxsave = &vcpu->arch.guest_fpu.fpstate->regs.fxsave; + fxsave = &vcpu->arch.private->guest_fpu.fpstate->regs.fxsave; memcpy(fpu->fpr, fxsave->st_space, 128); fpu->fcw = fxsave->cwd; fpu->fsw = fxsave->swd; @@ -10876,12 +10876,12 @@ int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu) { struct fxregs_state *fxsave; - if (fpstate_is_confidential(&vcpu->arch.guest_fpu)) + if (fpstate_is_confidential(&vcpu->arch.private->guest_fpu)) return 0; vcpu_load(vcpu); - fxsave = &vcpu->arch.guest_fpu.fpstate->regs.fxsave; + fxsave = &vcpu->arch.private->guest_fpu.fpstate->regs.fxsave; memcpy(fxsave->st_space, fpu->fpr, 128); fxsave->cwd = fpu->fcw; @@ -10988,7 +10988,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) if 
(!alloc_emulate_ctxt(vcpu)) goto free_wbinvd_dirty_mask; - if (!fpu_alloc_guest_fpstate(&vcpu->arch.guest_fpu)) { + if (!fpu_alloc_guest_fpstate(&vcpu->arch.private->guest_fpu)) { pr_err("kvm: failed to allocate vcpu's fpu\n"); goto free_emulate_ctxt; } @@ -11023,9 +11023,9 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu) return 0; free_guest_fpu: - fpu_free_guest_fpstate(&vcpu->arch.guest_fpu); + fpu_free_guest_fpstate(&vcpu->arch.private->guest_fpu); free_emulate_ctxt: - kmem_cache_free(x86_emulator_cache, vcpu->arch.emulate_ctxt); + kmem_cache_free(x86_emulator_cache, vcpu->arch.private->emulate_ctxt); free_wbinvd_dirty_mask: free_cpumask_var(vcpu->arch.wbinvd_dirty_mask); fail_free_mce_banks: @@ -11067,9 +11067,9 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) static_call(kvm_x86_vcpu_free)(vcpu); - kmem_cache_free(x86_emulator_cache, vcpu->arch.emulate_ctxt); + kmem_cache_free(x86_emulator_cache, vcpu->arch.private->emulate_ctxt); free_cpumask_var(vcpu->arch.wbinvd_dirty_mask); - fpu_free_guest_fpstate(&vcpu->arch.guest_fpu); + fpu_free_guest_fpstate(&vcpu->arch.private->guest_fpu); kvm_hv_vcpu_uninit(vcpu); kvm_pmu_destroy(vcpu); @@ -11118,7 +11118,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vcpu->arch.dr7 = DR7_FIXED_1; kvm_update_dr7(vcpu); - vcpu->arch.cr2 = 0; + vcpu->arch.private->cr2 = 0; kvm_make_request(KVM_REQ_EVENT, vcpu); vcpu->arch.apf.msr_en_val = 0; @@ -11131,8 +11131,8 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) kvm_async_pf_hash_reset(vcpu); vcpu->arch.apf.halted = false; - if (vcpu->arch.guest_fpu.fpstate && kvm_mpx_supported()) { - struct fpstate *fpstate = vcpu->arch.guest_fpu.fpstate; + if (vcpu->arch.private->guest_fpu.fpstate && kvm_mpx_supported()) { + struct fpstate *fpstate = vcpu->arch.private->guest_fpu.fpstate; /* * To avoid have the INIT path from kvm_apic_has_events() that be @@ -11154,11 +11154,11 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) vcpu->arch.msr_misc_features_enables = 0; - vcpu->arch.xcr0 = XFEATURE_MASK_FP; + vcpu->arch.private->xcr0 = XFEATURE_MASK_FP; } /* All GPRs except RDX (handled below) are zeroed on RESET/INIT. 
*/ - memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs)); + memset(vcpu->arch.private->regs, 0, sizeof(vcpu->arch.private->regs)); kvm_register_mark_dirty(vcpu, VCPU_REGS_RSP); /* @@ -11178,7 +11178,7 @@ void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) kvm_set_rflags(vcpu, X86_EFLAGS_FIXED); kvm_rip_write(vcpu, 0xfff0); - vcpu->arch.cr3 = 0; + vcpu->arch.private->cr3 = 0; kvm_register_mark_dirty(vcpu, VCPU_EXREG_CR3); /* @@ -12043,7 +12043,7 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) { int r; - if ((vcpu->arch.mmu->direct_map != work->arch.direct_map) || + if ((vcpu->arch.private->mmu->direct_map != work->arch.direct_map) || work->wakeup_all) return; @@ -12051,8 +12051,8 @@ void kvm_arch_async_page_ready(struct kvm_vcpu *vcpu, struct kvm_async_pf *work) if (unlikely(r)) return; - if (!vcpu->arch.mmu->direct_map && - work->arch.cr3 != vcpu->arch.mmu->get_guest_pgd(vcpu)) + if (!vcpu->arch.private->mmu->direct_map && + work->arch.cr3 != vcpu->arch.private->mmu->get_guest_pgd(vcpu)) return; kvm_mmu_do_page_fault(vcpu, work->cr2_or_gpa, 0, true); @@ -12398,9 +12398,9 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c (PFERR_WRITE_MASK | PFERR_FETCH_MASK | PFERR_USER_MASK); if (!(error_code & PFERR_PRESENT_MASK) || - vcpu->arch.walk_mmu->gva_to_gpa(vcpu, gva, access, &fault) != UNMAPPED_GVA) { + vcpu->arch.private->walk_mmu->gva_to_gpa(vcpu, gva, access, &fault) != UNMAPPED_GVA) { /* - * If vcpu->arch.walk_mmu->gva_to_gpa succeeded, the page + * If vcpu->arch.private->walk_mmu->gva_to_gpa succeeded, the page * tables probably do not match the TLB. Just proceed * with the error code that the processor gave. */ @@ -12410,7 +12410,7 @@ void kvm_fixup_and_inject_pf_error(struct kvm_vcpu *vcpu, gva_t gva, u16 error_c fault.nested_page_fault = false; fault.address = gva; } - vcpu->arch.walk_mmu->inject_page_fault(vcpu, &fault); + vcpu->arch.private->walk_mmu->inject_page_fault(vcpu, &fault); } EXPORT_SYMBOL_GPL(kvm_fixup_and_inject_pf_error); diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h index 3d5da4daaf53..dbcb6551d111 100644 --- a/arch/x86/kvm/x86.h +++ b/arch/x86/kvm/x86.h @@ -183,7 +183,7 @@ static inline bool x86_exception_has_error_code(unsigned int vector) static inline bool mmu_is_nested(struct kvm_vcpu *vcpu) { - return vcpu->arch.walk_mmu == &vcpu->arch.nested_mmu; + return vcpu->arch.private->walk_mmu == &vcpu->arch.private->nested_mmu; } static inline int is_pae(struct kvm_vcpu *vcpu) diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 587a75428da8..3c4e27c5aea9 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -109,6 +109,8 @@ static atomic_t hardware_enable_failed; static struct kmem_cache *kvm_vcpu_cache; +static struct kmem_cache *kvm_vcpu_private_cache; + static __read_mostly struct preempt_ops kvm_preempt_ops; static DEFINE_PER_CPU_ASI_NOT_SENSITIVE(struct kvm_vcpu *, kvm_running_vcpu); @@ -457,6 +459,7 @@ void kvm_vcpu_destroy(struct kvm_vcpu *vcpu) put_pid(rcu_dereference_protected(vcpu->pid, 1)); free_page((unsigned long)vcpu->run); + kmem_cache_free(kvm_vcpu_private_cache, vcpu->arch.private); kmem_cache_free(kvm_vcpu_cache, vcpu); } EXPORT_SYMBOL_GPL(kvm_vcpu_destroy); @@ -2392,7 +2395,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma, * tail pages of non-compound higher order allocations, which * would then underflow the refcount when the caller does the * required put_page. Don't allow those pages here. 
- */ + */ if (!kvm_try_get_pfn(pfn)) r = -EFAULT; @@ -3562,17 +3565,25 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) if (r) goto vcpu_decrement; - vcpu = kmem_cache_zalloc(kvm_vcpu_cache, GFP_KERNEL_ACCOUNT); + vcpu = kmem_cache_zalloc(kvm_vcpu_cache, + GFP_KERNEL_ACCOUNT | __GFP_GLOBAL_NONSENSITIVE); if (!vcpu) { r = -ENOMEM; goto vcpu_decrement; } + vcpu->arch.private = kmem_cache_zalloc(kvm_vcpu_private_cache, + GFP_KERNEL | __GFP_LOCAL_NONSENSITIVE); + if (!vcpu->arch.private) { + r = -ENOMEM; + goto vcpu_free; + } + BUILD_BUG_ON(sizeof(struct kvm_run) > PAGE_SIZE); page = alloc_page(GFP_KERNEL_ACCOUNT | __GFP_ZERO | __GFP_LOCAL_NONSENSITIVE); if (!page) { r = -ENOMEM; - goto vcpu_free; + goto vcpu_private_free; } vcpu->run = page_address(page); @@ -3631,6 +3642,8 @@ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) kvm_arch_vcpu_destroy(vcpu); vcpu_free_run_page: free_page((unsigned long)vcpu->run); +vcpu_private_free: + kmem_cache_free(kvm_vcpu_private_cache, vcpu->arch.private); vcpu_free: kmem_cache_free(kvm_vcpu_cache, vcpu); vcpu_decrement: @@ -5492,7 +5505,7 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, vcpu_align = __alignof__(struct kvm_vcpu); kvm_vcpu_cache = kmem_cache_create_usercopy("kvm_vcpu", vcpu_size, vcpu_align, - SLAB_ACCOUNT, + SLAB_ACCOUNT|SLAB_GLOBAL_NONSENSITIVE, offsetof(struct kvm_vcpu, arch), offsetofend(struct kvm_vcpu, stats_id) - offsetof(struct kvm_vcpu, arch), @@ -5501,12 +5514,22 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, r = -ENOMEM; goto out_free_3; } - + kvm_vcpu_private_cache = kmem_cache_create_usercopy("kvm_vcpu_private", + sizeof(struct kvm_vcpu_arch_private), + __alignof__(struct kvm_vcpu_arch_private), + SLAB_ACCOUNT | SLAB_LOCAL_NONSENSITIVE, + 0, + sizeof(struct kvm_vcpu_arch_private), + NULL); + if (!kvm_vcpu_private_cache) { + r = -ENOMEM; + goto out_free_4; + } for_each_possible_cpu(cpu) { if (!alloc_cpumask_var_node(&per_cpu(cpu_kick_mask, cpu), GFP_KERNEL, cpu_to_node(cpu))) { r = -ENOMEM; - goto out_free_4; + goto out_free_vcpu_private_cache; } } @@ -5541,6 +5564,8 @@ int kvm_init(void *opaque, unsigned vcpu_size, unsigned vcpu_align, out_free_5: for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); +out_free_vcpu_private_cache: + kmem_cache_destroy(kvm_vcpu_private_cache); out_free_4: kmem_cache_destroy(kvm_vcpu_cache); out_free_3: @@ -5567,6 +5592,7 @@ void kvm_exit(void) misc_deregister(&kvm_dev); for_each_possible_cpu(cpu) free_cpumask_var(per_cpu(cpu_kick_mask, cpu)); + kmem_cache_destroy(kvm_vcpu_private_cache); kmem_cache_destroy(kvm_vcpu_cache); kvm_async_pf_deinit(); unregister_syscore_ops(&kvm_syscore_ops);
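For readability, the vCPU allocation ordering introduced by the kvm_main.c hunks above can be sketched in plain C. This is a summary only, not part of the patch: error unwinding is abbreviated, and the flags and the kvm_vcpu_private_cache are exactly the ones added by this series. The point is that the outer struct kvm_vcpu lives in a globally non-sensitive slab, while the new arch.private blob comes from a per-VM ("local") non-sensitive cache, so it is only mapped into that VM's restricted address space.

	/* Sketch: condensed from kvm_vm_ioctl_create_vcpu() / kvm_vcpu_destroy() above. */
	vcpu = kmem_cache_zalloc(kvm_vcpu_cache,
				 GFP_KERNEL_ACCOUNT | __GFP_GLOBAL_NONSENSITIVE);
	if (!vcpu)
		return -ENOMEM;

	vcpu->arch.private = kmem_cache_zalloc(kvm_vcpu_private_cache,
					       GFP_KERNEL | __GFP_LOCAL_NONSENSITIVE);
	if (!vcpu->arch.private) {
		kmem_cache_free(kvm_vcpu_cache, vcpu);	/* abbreviated unwind */
		return -ENOMEM;
	}

	/* ... kvm_run page, stats page and arch init proceed as before ... */

	/* Teardown mirrors the allocation order (kvm_vcpu_destroy()): */
	kmem_cache_free(kvm_vcpu_private_cache, vcpu->arch.private);
	kmem_cache_free(kvm_vcpu_cache, vcpu);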
From patchwork Wed Feb 23 05:22:21 2022 Date: Tue, 22 Feb 2022 21:22:21 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-46-junaids@google.com> References: <20220223052223.1202152-1-junaids@google.com> Subject: [RFC PATCH 45/47] mm: asi: Mapping global nonsensitive areas in asi_global_init From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org
From: Ofir Weisse

There are several areas in memory which we consider non-sensitive. These areas should be mapped in every ASI domain. We map these areas in asi_global_init(). We modified some of the linker scripts to ensure these areas start and end on page boundaries.

The areas:
- _stext --> _etext
- __init_begin --> __init_end
- __start_rodata --> __end_rodata
- __start_once --> __end_once
- __start___ex_table --> __stop___ex_table
- __start_asi_nonsensitive --> __end_asi_nonsensitive
- __start_asi_nonsensitive_readmostly --> __end_asi_nonsensitive_readmostly
- __vvar_page --> + PAGE_SIZE
- APIC_BASE --> + PAGE_SIZE
- phys_base --> + PAGE_SIZE
- __start___tracepoints_ptrs --> __stop___tracepoints_ptrs
- __start___tracepoint_str --> __stop___tracepoint_str
- __per_cpu_asi_start --> __per_cpu_asi_end (percpu)
- irq_stack_backing_store --> + sizeof(irq_stack_backing_store) (percpu)

The PGDs of the following addresses are cloned, modeled after KPTI:
- CPU_ENTRY_AREA_BASE
- ESPFIX_BASE_ADDR

Signed-off-by: Ofir Weisse
---
arch/x86/kernel/head_64.S | 12 +++++
arch/x86/kernel/vmlinux.lds.S | 2 +-
arch/x86/mm/asi.c | 82 +++++++++++++++++++++++++++++++
include/asm-generic/vmlinux.lds.h | 13 +++--
4 files changed, 105 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index d8b3ebd2bb85..3d3874661895 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -574,9 +574,21 @@ SYM_DATA_LOCAL(early_gdt_descr_base, .quad INIT_PER_CPU_VAR(gdt_page)) .align 16 /* This must match the first entry in level2_kernel_pgt */ + +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION +/* TODO: Find a way to mark .section for phys_base */ +/* Ideally, we want to map phys_base in .data..asi_non_sensitive. That doesn't + * seem to work properly. For now, we just make sure phys_base is in its own + * page. */ + .align PAGE_SIZE +#endif SYM_DATA(phys_base, .quad 0x0) EXPORT_SYMBOL(phys_base) +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + .align PAGE_SIZE +#endif + #include "../../x86/xen/xen-head.S" __PAGE_ALIGNED_BSS diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S index 3d6dc12d198f..2b3668291785 100644 --- a/arch/x86/kernel/vmlinux.lds.S +++ b/arch/x86/kernel/vmlinux.lds.S @@ -148,8 +148,8 @@ SECTIONS } :text =0xcccc /* End of text section, which should occupy whole number of pages */ - _etext = .; . 
= ALIGN(PAGE_SIZE); + _etext = .; X86_ALIGN_RODATA_BEGIN RO_DATA(PAGE_SIZE) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 04628949e89d..7f2aa1823736 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -9,6 +9,7 @@ #include #include +#include /* struct irq_stack */ #include #include "mm_internal.h" @@ -17,6 +18,24 @@ #undef pr_fmt #define pr_fmt(fmt) "ASI: " fmt +#include +#include + +extern struct exception_table_entry __start___ex_table[]; +extern struct exception_table_entry __stop___ex_table[]; + +extern const char __start_asi_nonsensitive[], __end_asi_nonsensitive[]; +extern const char __start_asi_nonsensitive_readmostly[], + __end_asi_nonsensitive_readmostly[]; +extern const char __per_cpu_asi_start[], __per_cpu_asi_end[]; +extern const char *__start___tracepoint_str[]; +extern const char *__stop___tracepoint_str[]; +extern const char *__start___tracepoints_ptrs[]; +extern const char *__stop___tracepoints_ptrs[]; +extern const char __vvar_page[]; + +DECLARE_PER_CPU_PAGE_ALIGNED(struct irq_stack, irq_stack_backing_store); + static struct asi_class asi_class[ASI_MAX_NUM] __asi_not_sensitive; static DEFINE_SPINLOCK(asi_class_lock __asi_not_sensitive); @@ -412,6 +431,7 @@ void asi_unload_module(struct module* module) static int __init asi_global_init(void) { uint i, n; + int err = 0; if (!boot_cpu_has(X86_FEATURE_ASI)) return 0; @@ -436,6 +456,68 @@ static int __init asi_global_init(void) pcpu_map_asi_reserved_chunk(); + + /* + * TODO: We need to ensure that all the sections mapped below are + * actually page-aligned by the linker. For now, we temporarily just + * align the start/end addresses here, but that is incorrect as the + * rest of the page could potentially contain sensitive data. + */ +#define MAP_SECTION(start, end) \ + pr_err("%s:%d mapping 0x%lx --> 0x%lx", \ + __FUNCTION__, __LINE__, start, end); \ + err = asi_map(ASI_GLOBAL_NONSENSITIVE, \ + (void*)((unsigned long)(start) & PAGE_MASK),\ + PAGE_ALIGN((unsigned long)(end)) - \ + ((unsigned long)(start) & PAGE_MASK)); \ + BUG_ON(err); + +#define MAP_SECTION_PERCPU(start, size) \ + pr_err("%s:%d mapping PERCPU 0x%lx --> 0x%lx", \ + __FUNCTION__, __LINE__, start, (unsigned long)start+size); \ + err = asi_map_percpu(ASI_GLOBAL_NONSENSITIVE, \ + (void*)((unsigned long)(start) & PAGE_MASK), \ + PAGE_ALIGN((unsigned long)(size))); \ + BUG_ON(err); + + MAP_SECTION(_stext, _etext); + MAP_SECTION(__init_begin, __init_end); + MAP_SECTION(__start_rodata, __end_rodata); + MAP_SECTION(__start_once, __end_once); + MAP_SECTION(__start___ex_table, __stop___ex_table); + MAP_SECTION(__start_asi_nonsensitive, __end_asi_nonsensitive); + MAP_SECTION(__start_asi_nonsensitive_readmostly, + __end_asi_nonsensitive_readmostly); + MAP_SECTION(__vvar_page, __vvar_page + PAGE_SIZE); + MAP_SECTION(APIC_BASE, APIC_BASE + PAGE_SIZE); + MAP_SECTION(&phys_base, &phys_base + PAGE_SIZE); + + /* TODO: add a build flag to enable disable mapping only when + * instrumentation is used */ + MAP_SECTION(__start___tracepoints_ptrs, __stop___tracepoints_ptrs); + MAP_SECTION(__start___tracepoint_str, __stop___tracepoint_str); + + MAP_SECTION_PERCPU((void*)__per_cpu_asi_start, + __per_cpu_asi_end - __per_cpu_asi_start); + + MAP_SECTION_PERCPU(&irq_stack_backing_store, + sizeof(irq_stack_backing_store)); + + /* We have to map the stack canary into ASI. This is far from ideal, as + * attackers can use L1TF to steal the canary value, and then perhaps + * mount some other attack including a buffer overflow. This is a price + * we must pay to use ASI. 
+ */ + MAP_SECTION_PERCPU(&fixed_percpu_data, PAGE_SIZE); + +#define CLONE_INIT_PGD(addr) \ + asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd, addr); + + CLONE_INIT_PGD(CPU_ENTRY_AREA_BASE); +#ifdef CONFIG_X86_ESPFIX64 + CLONE_INIT_PGD(ESPFIX_BASE_ADDR); +#endif + return 0; } subsys_initcall(asi_global_init) diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 0a931aedc285..7152ce3613f5 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -235,8 +235,10 @@ #define TRACE_PRINTKS() __start___trace_bprintk_fmt = .; \ KEEP(*(__trace_printk_fmt)) /* Trace_printk fmt' pointer */ \ __stop___trace_bprintk_fmt = .; -#define TRACEPOINT_STR() __start___tracepoint_str = .; \ +#define TRACEPOINT_STR() . = ALIGN(PAGE_SIZE); \ + __start___tracepoint_str = .; \ KEEP(*(__tracepoint_str)) /* Trace_printk fmt' pointer */ \ + . = ALIGN(PAGE_SIZE); \ __stop___tracepoint_str = .; #else #define TRACE_PRINTKS() @@ -348,8 +350,10 @@ MEM_KEEP(init.data*) \ MEM_KEEP(exit.data*) \ *(.data.unlikely) \ + . = ALIGN(PAGE_SIZE); \ __start_once = .; \ *(.data.once) \ + . = ALIGN(PAGE_SIZE); \ __end_once = .; \ STRUCT_ALIGN(); \ *(__tracepoints) \ @@ -453,9 +457,10 @@ *(.rodata) *(.rodata.*) \ SCHED_DATA \ RO_AFTER_INIT_DATA /* Read only after init */ \ - . = ALIGN(8); \ + . = ALIGN(PAGE_SIZE); \ __start___tracepoints_ptrs = .; \ KEEP(*(__tracepoints_ptrs)) /* Tracepoints: pointer array */ \ + . = ALIGN(PAGE_SIZE); \ __stop___tracepoints_ptrs = .; \ *(__tracepoints_strings)/* Tracepoints: strings */ \ } \ @@ -671,11 +676,13 @@ */ #define EXCEPTION_TABLE(align) \ . = ALIGN(align); \ + . = ALIGN(PAGE_SIZE); \ __ex_table : AT(ADDR(__ex_table) - LOAD_OFFSET) { \ __start___ex_table = .; \ KEEP(*(__ex_table)) \ + . 
= ALIGN(PAGE_SIZE); \ __stop___ex_table = .; \ - } + } \ /* * .BTF From patchwork Wed Feb 23 05:22:22 2022 Date: Tue, 22 Feb 2022 21:22:22 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-47-junaids@google.com> References: <20220223052223.1202152-1-junaids@google.com>
From patchwork Wed Feb 23 05:22:22 2022
X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756406
Date: Tue, 22 Feb 2022 21:22:22 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-47-junaids@google.com> References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 46/47] kvm: asi: Do asi_exit() in vcpu_run loop before returning to userspace
From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org

From: Ofir Weisse

For the time being, we switch to the full kernel address space before returning back to userspace. Once KPTI is also implemented using ASI, we could potentially also switch to the KPTI address space directly.

Signed-off-by: Ofir Weisse ---
arch/x86/kvm/x86.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 680725089a18..294f73e9e71e 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10148,13 +10148,17 @@ static int vcpu_run(struct kvm_vcpu *vcpu) srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); r = xfer_to_guest_mode_handle_work(vcpu); if (r) - return r; + goto exit; vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); } } srcu_read_unlock(&kvm->srcu, vcpu->srcu_idx); +exit: + /* TODO(oweisse): trace this exit if we're still within an ASI. */ + asi_exit(); + return r; }
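The TODO in the hunk above could be addressed along the following lines. This is only a sketch: trace_kvm_asi_exit_to_user() is a hypothetical tracepoint, while asi_get_current() is part of the core API from earlier in the series and presumably returns NULL when no restricted address space is active.

static void kvm_note_asi_exit_to_user(void)
{
	/* Were we still inside a restricted address space on the way out? */
	if (asi_get_current())
		trace_kvm_asi_exit_to_user();	/* hypothetical tracepoint */

	/* Unconditional, exactly as in the patch above. */
	asi_exit();
}

The exit label in vcpu_run() would then call this helper instead of a bare asi_exit().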
From patchwork Wed Feb 23 05:22:23 2022
X-Patchwork-Submitter: Junaid Shahid X-Patchwork-Id: 12756407
Date: Tue, 22 Feb 2022 21:22:23 -0800 In-Reply-To: <20220223052223.1202152-1-junaids@google.com> Message-Id: <20220223052223.1202152-48-junaids@google.com> References: <20220223052223.1202152-1-junaids@google.com> X-Mailer: git-send-email 2.35.1.473.g83b2b277ed-goog
Subject: [RFC PATCH 47/47] mm: asi: Properly un/mapping task stack from ASI + tlb flush
From: Junaid Shahid To: linux-kernel@vger.kernel.org Cc: Ofir Weisse , kvm@vger.kernel.org, pbonzini@redhat.com, jmattson@google.com, pjt@google.com, alexandre.chartre@oracle.com, rppt@linux.ibm.com, dave.hansen@linux.intel.com, peterz@infradead.org, tglx@linutronix.de, luto@kernel.org, linux-mm@kvack.org

From: Ofir Weisse

The task stack needs to be mapped into and unmapped from ASI in several places, especially since a task_struct might be reused, potentially with a different mm:

1. Map in vcpu_run() @ arch/x86/kvm/x86.c
2. Unmap in release_task_stack() @ kernel/fork.c
3. Unmap in do_exit() @ kernel/exit.c
4. Unmap in begin_new_exec() @ fs/exec.c

Signed-off-by: Ofir Weisse ---
arch/x86/include/asm/asi.h | 6 ++++ arch/x86/kvm/x86.c | 6 ++++ arch/x86/mm/asi.c | 59 ++++++++++++++++++++++++++++++++++++++ fs/exec.c | 7 ++++- include/asm-generic/asi.h | 16 +++++++++-- include/linux/sched.h | 5 ++++ kernel/exit.c | 2 +- kernel/fork.c | 22 +++++++++++++- 8 files changed, 118 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 6148e65fb0c2..9d8f43981678 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -87,6 +87,12 @@ void asi_unmap_user(struct asi *asi, void *va, size_t len); int asi_fill_pgtbl_pool(struct asi_pgtbl_pool *pool, uint count, gfp_t flags); void asi_clear_pgtbl_pool(struct asi_pgtbl_pool *pool); +int asi_map_task_stack(struct task_struct *tsk, struct asi *asi); +void asi_unmap_task_stack(struct task_struct *tsk); +void asi_mark_pages_local_nonsensitive(struct page *pages, uint order, + struct mm_struct *mm); +void asi_clear_pages_local_nonsensitive(struct page *pages, uint order); + static inline void asi_init_pgtbl_pool(struct asi_pgtbl_pool *pool) { pool->pgtbl_list = NULL;
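To make the contract of the two new entry points concrete, here is a sketch of how a hypothetical in-kernel user other than KVM might pair them. my_dev_asi is made up; the semantics follow from the arch/x86/mm/asi.c hunk below: asi_map_task_stack() is idempotent and requires tsk->mm, and asi_unmap_task_stack() must run before the stack can be recycled for a task with a different mm.

static int my_driver_enter_restricted(struct asi *my_dev_asi)
{
	int err;

	/* Maps current->stack only once; later calls return 0 immediately. */
	err = asi_map_task_stack(current, my_dev_asi);
	if (err)
		return err;

	/* ... run the sensitive section under the restricted mapping ... */
	return 0;
}

static void my_driver_task_teardown(struct task_struct *tsk)
{
	/* Mirrors do_exit()/begin_new_exec()/release_task_stack(). */
	asi_unmap_task_stack(tsk);
}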
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 294f73e9e71e..718104eefaed 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -10122,6 +10122,12 @@ static int vcpu_run(struct kvm_vcpu *vcpu) vcpu->srcu_idx = srcu_read_lock(&kvm->srcu); vcpu->arch.l1tf_flush_l1d = true; + /* We must have current->stack mapped into asi. This function can be + * safely called many times, as it will only do the actual mapping once. */ + r = asi_map_task_stack(current, vcpu->kvm->asi); + if (r != 0) + return r; + for (;;) { if (kvm_vcpu_running(vcpu)) { r = vcpu_enter_guest(vcpu);

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 7f2aa1823736..a86ac6644a57 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -1029,6 +1029,45 @@ void asi_unmap(struct asi *asi, void *addr, size_t len, bool flush_tlb) asi_flush_tlb_range(asi, addr, len); } +int asi_map_task_stack(struct task_struct *tsk, struct asi *asi) +{ + int ret; + + /* If the stack is already mapped into ASI, we don't need to map it again. */ + if (tsk->asi_stack_mapped) + return 0; + + if (!tsk->mm) + return -EINVAL; + + /* If the stack was allocated via the page allocator, we assume the + * stack pages were marked with PageNonSensitive, therefore tsk->stack + * address is properly aliased. */ + ret = asi_map(ASI_LOCAL_NONSENSITIVE, tsk->stack, THREAD_SIZE); + if (!ret) { + tsk->asi_stack_mapped = asi; + asi_sync_mapping(asi, tsk->stack, THREAD_SIZE); + } + + return ret; +} + +void asi_unmap_task_stack(struct task_struct *tsk) +{ + /* No need to unmap if the stack was not mapped to begin with. */ + if (!tsk->asi_stack_mapped) + return; + + if (!tsk->mm) + return; + + asi_unmap(ASI_LOCAL_NONSENSITIVE, tsk->stack, THREAD_SIZE, + /* flush_tlb = */ true); + + tsk->asi_stack_mapped = NULL; +} + + void *asi_va(unsigned long pa) { struct page *page = pfn_to_page(PHYS_PFN(pa)); @@ -1336,3 +1375,23 @@ void asi_unmap_user(struct asi *asi, void *addr, size_t len) } } EXPORT_SYMBOL_GPL(asi_unmap_user); + +void asi_mark_pages_local_nonsensitive(struct page *pages, uint order, + struct mm_struct *mm) +{ + uint i; + for (i = 0; i < (1 << order); i++) { + __SetPageLocalNonSensitive(pages + i); + pages[i].asi_mm = mm; + } +} + +void asi_clear_pages_local_nonsensitive(struct page *pages, uint order) +{ + uint i; + for (i = 0; i < (1 << order); i++) { + __ClearPageLocalNonSensitive(pages + i); + pages[i].asi_mm = NULL; + } +} +
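The ordering rules that the fs/exec.c and kernel/fork.c hunks below enforce for the stack can be summarized as a small lifecycle sketch. Nothing in the series defines these two wrappers; they only restate, using the helpers added above with the signatures shown, the order that fork.c follows: mark the pages before handing out the aliased address, and clear the mark (after any asi_unmap()) before free_pages(), since the free path otherwise looks at the page's mm pointer to unmap the pages from ASI.

static void *alloc_local_nonsensitive_pages(struct mm_struct *mm, uint order)
{
	struct page *pages = alloc_pages(GFP_KERNEL, order);

	if (!pages)
		return NULL;

	asi_mark_pages_local_nonsensitive(pages, order, mm);
	/* After marking, page_address() yields the ASI_LOCAL_MAP alias. */
	return page_address(pages);
}

static void free_local_nonsensitive_pages(void *addr, uint order)
{
	struct page *pages = virt_to_page(addr);

	/* Callers must have asi_unmap()ed the range beforehand. */
	asi_clear_pages_local_nonsensitive(pages, order);
	__free_pages(pages, order);
}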
diff --git a/fs/exec.c b/fs/exec.c index 76f3b433e80d..fb9182cf3f33 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -69,6 +69,7 @@ #include #include #include +#include #include #include "internal.h" @@ -1238,7 +1239,11 @@ int begin_new_exec(struct linux_binprm * bprm) struct task_struct *me = current; int retval; - /* TODO: (oweisse) unmap the stack from ASI */ + /* The old mm is about to be released later on in exec_mmap. We are + * reusing the task, including its stack which was mapped to + * mm->asi_pgd[0]. We need to asi_unmap the stack, so the destructor of + * the mm won't complain about "lingering" asi mappings. */ + asi_unmap_task_stack(current); /* Once we are committed compute the creds */ retval = bprm_creds_from_file(bprm);

diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 2763cb1a974c..6e9a261a2b9d 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -66,8 +66,13 @@ static inline struct asi *asi_get_target(void) { return NULL; } static inline struct asi *asi_get_current(void) { return NULL; } -static inline -int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags) +static inline int asi_map_task_stack(struct task_struct *tsk, struct asi *asi) +{ return 0; } + +static inline void asi_unmap_task_stack(struct task_struct *tsk) { } + +static inline int asi_map_gfp(struct asi *asi, void *addr, size_t len, + gfp_t gfp_flags) { return 0; } @@ -130,6 +135,13 @@ static inline int asi_load_module(struct module* module) {return 0;} static inline void asi_unload_module(struct module* module) { } +static inline +void asi_mark_pages_local_nonsensitive(struct page *pages, uint order, + struct mm_struct *mm) { } + +static inline +void asi_clear_pages_local_nonsensitive(struct page *pages, uint order) { } + #endif /* !_ASSEMBLY_ */ #endif /* !CONFIG_ADDRESS_SPACE_ISOLATION */

diff --git a/include/linux/sched.h b/include/linux/sched.h index 78c351e35fec..87ad45e52b19 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -67,6 +67,7 @@ struct sighand_struct; struct signal_struct; struct task_delay_info; struct task_group; +struct asi; /* * Task state bitmask. NOTE! These bits are also @@ -1470,6 +1471,10 @@ struct task_struct { int mce_count; #endif +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + struct asi *asi_stack_mapped; +#endif + #ifdef CONFIG_KRETPROBES struct llist_head kretprobe_instances; #endif

diff --git a/kernel/exit.c b/kernel/exit.c index ab2749cf6887..f21cc21814d1 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -768,7 +768,7 @@ void __noreturn do_exit(long code) profile_task_exit(tsk); kcov_task_exit(tsk); - /* TODO: (oweisse) unmap the stack from ASI */ + asi_unmap_task_stack(tsk); coredump_task_exit(tsk); ptrace_event(PTRACE_EVENT_EXIT, code);

diff --git a/kernel/fork.c b/kernel/fork.c index cb147a72372d..876fefc477cb 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -216,7 +216,6 @@ static int free_vm_stack_cache(unsigned int cpu) static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) { - /* TODO: (oweisse) Add annotation to map the stack into ASI */ #ifdef CONFIG_VMAP_STACK void *stack; int i; @@ -269,7 +268,16 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) struct page *page = alloc_pages_node(node, THREADINFO_GFP, THREAD_SIZE_ORDER); + /* When marking pages as PageLocalNonSensitive we set page->mm to + * NULL. We must make sure the flag is cleared from the stack pages + * before free_pages is called. Otherwise, page->mm will be accessed, + * which will result in a NULL dereference. page_address() below will + * yield an aliased address after ASI_LOCAL_MAP, thanks to the + * PageLocalNonSensitive flag. */ if (likely(page)) { + asi_mark_pages_local_nonsensitive(page, + THREAD_SIZE_ORDER, + NULL); tsk->stack = kasan_reset_tag(page_address(page)); return tsk->stack; } @@ -301,6 +309,14 @@ static inline void free_thread_stack(struct task_struct *tsk) } #endif + /* We must clear the PageNonSensitive flag before calling free_pages(). + * Otherwise page->mm (which is NULL) will be accessed, in order to + * unmap the pages from ASI.
Specifically for the stack, we assume the + * pages were already unmapped from ASI before we got here, via + * asi_unmap_task_stack(). */ + asi_clear_pages_local_nonsensitive(virt_to_page(tsk->stack), + THREAD_SIZE_ORDER); + __free_pages(virt_to_page(tsk->stack), THREAD_SIZE_ORDER); } # else @@ -436,6 +452,7 @@ static void release_task_stack(struct task_struct *tsk) if (WARN_ON(READ_ONCE(tsk->__state) != TASK_DEAD)) return; /* Better to leak the stack than to free prematurely */ + asi_unmap_task_stack(tsk); account_kernel_stack(tsk, -1); free_thread_stack(tsk); tsk->stack = NULL; @@ -916,6 +933,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) * functions again. */ tsk->stack = stack; +#ifdef CONFIG_ADDRESS_SPACE_ISOLATION + tsk->asi_stack_mapped = NULL; +#endif #ifdef CONFIG_VMAP_STACK tsk->stack_vm_area = stack_vm_area; #endif