From patchwork Fri Jul 12 17:00:19 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732011
Date: Fri, 12 Jul 2024 17:00:19 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-1-144b319a40d8@google.com>
Subject: [PATCH 01/26] mm: asi: Make some utility functions noinstr compatible
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Sean Christopherson,
 Paolo Bonzini, Alexandre Chartre, Liran Alon, Jan Setje-Eilers,
 Catalin Marinas, Will Deacon, Mark Rutland, Andrew Morton, Mel Gorman,
 Lorenzo Stoakes, David Hildenbrand, Vlastimil Babka, Michal Hocko,
 Khalid Aziz, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Valentin Schneider, Paul Turner, Reiji Watanabe,
 Junaid Shahid, Ofir Weisse, Yosry Ahmed, Patrick Bellasi, KP Singh,
 Alexandra Sandulescu, Matteo Rizzo, Jann Horn
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, Brendan Jackman

From: Junaid Shahid

Some existing utility functions will need to be called from a noinstr
context in later patches, so mark them as either noinstr or
__always_inline.

Signed-off-by: Junaid Shahid
Signed-off-by: Brendan Jackman
---
 arch/x86/include/asm/processor.h     | 2 +-
 arch/x86/include/asm/special_insns.h | 8 ++++----
 arch/x86/mm/tlb.c                    | 8 ++++----
 include/linux/compiler_types.h       | 8 ++++++++
 4 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 78e51b0d6433d..dc45d622eae4e 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -206,7 +206,7 @@ void print_cpu_msr(struct cpuinfo_x86 *);
 /*
  * Friendlier CR3 helpers.
  */
-static inline unsigned long read_cr3_pa(void)
+static __always_inline unsigned long read_cr3_pa(void)
 {
 	return __read_cr3() & CR3_ADDR_MASK;
 }
diff --git a/arch/x86/include/asm/special_insns.h b/arch/x86/include/asm/special_insns.h
index 2e9fc5c400cdc..c63433dc04d34 100644
--- a/arch/x86/include/asm/special_insns.h
+++ b/arch/x86/include/asm/special_insns.h
@@ -42,14 +42,14 @@ static __always_inline void native_write_cr2(unsigned long val)
 	asm volatile("mov %0,%%cr2": : "r" (val) : "memory");
 }
 
-static inline unsigned long __native_read_cr3(void)
+static __always_inline unsigned long __native_read_cr3(void)
 {
 	unsigned long val;
 	asm volatile("mov %%cr3,%0\n\t" : "=r" (val) : __FORCE_ORDER);
 	return val;
 }
 
-static inline void native_write_cr3(unsigned long val)
+static __always_inline void native_write_cr3(unsigned long val)
 {
 	asm volatile("mov %0,%%cr3": : "r" (val) : "memory");
 }
@@ -153,12 +153,12 @@ static __always_inline void write_cr2(unsigned long x)
  * Careful! CR3 contains more than just an address. You probably want
  * read_cr3_pa() instead.
  */
-static inline unsigned long __read_cr3(void)
+static __always_inline unsigned long __read_cr3(void)
 {
 	return __native_read_cr3();
 }
 
-static inline void write_cr3(unsigned long x)
+static __always_inline void write_cr3(unsigned long x)
 {
 	native_write_cr3(x);
 }
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 44ac64f3a047c..6ca18ac9058b6 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -110,7 +110,7 @@
 /*
  * Given @asid, compute kPCID
  */
-static inline u16 kern_pcid(u16 asid)
+static inline_or_noinstr u16 kern_pcid(u16 asid)
 {
 	VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
@@ -155,9 +155,9 @@ static inline u16 user_pcid(u16 asid)
 	return ret;
 }
 
-static inline unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
+static inline_or_noinstr unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
 {
-	unsigned long cr3 = __sme_pa(pgd) | lam;
+	unsigned long cr3 = __sme_pa_nodebug(pgd) | lam;
 
 	if (static_cpu_has(X86_FEATURE_PCID)) {
 		VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE);
@@ -1087,7 +1087,7 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end)
  * It's intended to be used for code like KVM that sneakily changes CR3
  * and needs to restore it. It needs to be used very carefully.
  */
-unsigned long __get_current_cr3_fast(void)
+inline_or_noinstr unsigned long __get_current_cr3_fast(void)
 {
 	unsigned long cr3 =
 		build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
diff --git a/include/linux/compiler_types.h b/include/linux/compiler_types.h
index 8f8236317d5b1..955497335832c 100644
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -320,6 +320,14 @@ struct ftrace_likely_data {
  */
 #define __cpuidle __noinstr_section(".cpuidle.text")
 
+/*
+ * Can be used for functions which themselves are not strictly noinstr, but
+ * may be called from noinstr code.
+ */
+#define inline_or_noinstr \
+	inline notrace __attribute((__section__(".noinstr.text"))) \
+	__no_kcsan __no_sanitize_address __no_sanitize_coverage
+
 #endif /* __KERNEL__ */
 
 #endif /* __ASSEMBLY__ */
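As an illustration of how the new annotation is meant to compose with noinstr
code, here is a minimal hypothetical sketch; the helper and caller names are
illustrative, not part of the patch:

/*
 * Illustrative sketch only (not from the patch). A helper shared with
 * ordinary kernel code that must also be callable on a noinstr path:
 * inline_or_noinstr places any out-of-line copy in .noinstr.text and
 * opts it out of tracing, KCSAN, KASAN and coverage instrumentation,
 * so the noinstr caller below stays valid under objtool's checking.
 */
static inline_or_noinstr unsigned long example_cr3_pa(void)
{
	return __read_cr3() & CR3_ADDR_MASK;
}

/* A hypothetical noinstr user, e.g. on the kernel entry/exit path. */
static noinstr unsigned long example_entry_helper(void)
{
	return example_cr3_pa();
}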
From patchwork Fri Jul 12 17:00:20 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732012
Date: Fri, 12 Jul 2024 17:00:20 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-2-144b319a40d8@google.com>
Subject: [PATCH 02/26] x86: Create CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, Brendan Jackman

Currently a nop config. Keeping it as a separate commit allows easy
review of the boring bits; later commits will use and enable this new
config.

This config is only added for non-UML x86_64, as other architectures do
not yet have pending implementations. It also has somewhat artificial
dependencies on !PARAVIRT and !KASAN, which are explained in the
Kconfig file.
Co-developed-by: Junaid Shahid
Signed-off-by: Brendan Jackman
---
 arch/alpha/include/asm/Kbuild      |  1 +
 arch/arc/include/asm/Kbuild        |  1 +
 arch/arm/include/asm/Kbuild        |  1 +
 arch/arm64/include/asm/Kbuild      |  1 +
 arch/csky/include/asm/Kbuild       |  1 +
 arch/hexagon/include/asm/Kbuild    |  1 +
 arch/loongarch/include/asm/Kbuild  |  1 +
 arch/m68k/include/asm/Kbuild       |  1 +
 arch/microblaze/include/asm/Kbuild |  1 +
 arch/mips/include/asm/Kbuild       |  1 +
 arch/nios2/include/asm/Kbuild      |  1 +
 arch/openrisc/include/asm/Kbuild   |  1 +
 arch/parisc/include/asm/Kbuild     |  1 +
 arch/powerpc/include/asm/Kbuild    |  1 +
 arch/riscv/include/asm/Kbuild      |  1 +
 arch/s390/include/asm/Kbuild       |  1 +
 arch/sh/include/asm/Kbuild         |  1 +
 arch/sparc/include/asm/Kbuild      |  1 +
 arch/um/include/asm/Kbuild         |  1 +
 arch/x86/Kconfig                   | 19 +++++++++++++++++++
 arch/xtensa/include/asm/Kbuild     |  1 +
 include/asm-generic/asi.h          |  5 +++++
 22 files changed, 44 insertions(+)

diff --git a/arch/alpha/include/asm/Kbuild b/arch/alpha/include/asm/Kbuild
index 396caece6d6d9..ca72ce3baca13 100644
--- a/arch/alpha/include/asm/Kbuild
+++ b/arch/alpha/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += agp.h
 generic-y += asm-offsets.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += asi.h
diff --git a/arch/arc/include/asm/Kbuild b/arch/arc/include/asm/Kbuild
index 3c1afa524b9c2..60bdeffa7c31e 100644
--- a/arch/arc/include/asm/Kbuild
+++ b/arch/arc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 03657ff8fbe3d..1e2c3d8dbbd99 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -6,3 +6,4 @@ generic-y += parport.h
 
 generated-y += mach-types.h
 generated-y += unistd-nr.h
+generic-y += asi.h
diff --git a/arch/arm64/include/asm/Kbuild b/arch/arm64/include/asm/Kbuild
index 4b6d2d52053e4..f95699a559309 100644
--- a/arch/arm64/include/asm/Kbuild
+++ b/arch/arm64/include/asm/Kbuild
@@ -5,6 +5,7 @@ generic-y += qrwlock.h
 generic-y += qspinlock.h
 generic-y += parport.h
 generic-y += user.h
+generic-y += asi.h
 
 generated-y += cpucap-defs.h
 generated-y += sysreg-defs.h
diff --git a/arch/csky/include/asm/Kbuild b/arch/csky/include/asm/Kbuild
index 1117c28cb7e8a..5e49ccb571644 100644
--- a/arch/csky/include/asm/Kbuild
+++ b/arch/csky/include/asm/Kbuild
@@ -10,3 +10,4 @@ generic-y += qspinlock.h
 generic-y += parport.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
+generic-y += asi.h
\ No newline at end of file
diff --git a/arch/hexagon/include/asm/Kbuild b/arch/hexagon/include/asm/Kbuild
index 3ece3c93fe086..744ffbeeb7ae4 100644
--- a/arch/hexagon/include/asm/Kbuild
+++ b/arch/hexagon/include/asm/Kbuild
@@ -3,3 +3,4 @@ generic-y += extable.h
 generic-y += iomap.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += asi.h
diff --git a/arch/loongarch/include/asm/Kbuild b/arch/loongarch/include/asm/Kbuild
index 2dbec7853ae86..66fcd325d6083 100644
--- a/arch/loongarch/include/asm/Kbuild
+++ b/arch/loongarch/include/asm/Kbuild
@@ -27,3 +27,4 @@ generic-y += param.h
 generic-y += posix_types.h
 generic-y += resource.h
 generic-y += kvm_para.h
+generic-y += asi.h
diff --git a/arch/m68k/include/asm/Kbuild b/arch/m68k/include/asm/Kbuild
index 0dbf9c5c6faeb..faf0f135df4ab 100644
--- a/arch/m68k/include/asm/Kbuild
+++ b/arch/m68k/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += extable.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += spinlock.h
+generic-y += asi.h
diff --git
 a/arch/microblaze/include/asm/Kbuild b/arch/microblaze/include/asm/Kbuild
index a055f5dbe00a3..012e4bf83c134 100644
--- a/arch/microblaze/include/asm/Kbuild
+++ b/arch/microblaze/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += parport.h
 generic-y += syscalls.h
 generic-y += tlb.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/mips/include/asm/Kbuild b/arch/mips/include/asm/Kbuild
index 7ba67a0d6c97b..3191699298d80 100644
--- a/arch/mips/include/asm/Kbuild
+++ b/arch/mips/include/asm/Kbuild
@@ -13,3 +13,4 @@ generic-y += parport.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/nios2/include/asm/Kbuild b/arch/nios2/include/asm/Kbuild
index 7fe7437555fb4..bfdc4026c5b16 100644
--- a/arch/nios2/include/asm/Kbuild
+++ b/arch/nios2/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += spinlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/openrisc/include/asm/Kbuild b/arch/openrisc/include/asm/Kbuild
index c8c99b554ca4c..d137c4e08e369 100644
--- a/arch/openrisc/include/asm/Kbuild
+++ b/arch/openrisc/include/asm/Kbuild
@@ -7,3 +7,4 @@ generic-y += spinlock.h
 generic-y += qrwlock_types.h
 generic-y += qrwlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/parisc/include/asm/Kbuild b/arch/parisc/include/asm/Kbuild
index 4fb596d94c893..3cbb4eb14712c 100644
--- a/arch/parisc/include/asm/Kbuild
+++ b/arch/parisc/include/asm/Kbuild
@@ -5,3 +5,4 @@ generic-y += agp.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 61a8d5555cd7e..103c7e2f66987 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += mcs_spinlock.h
 generic-y += qrwlock.h
 generic-y += vtime.h
 generic-y += early_ioremap.h
+generic-y += asi.h
diff --git a/arch/riscv/include/asm/Kbuild b/arch/riscv/include/asm/Kbuild
index 504f8b7e72d41..08c199a56731e 100644
--- a/arch/riscv/include/asm/Kbuild
+++ b/arch/riscv/include/asm/Kbuild
@@ -9,3 +9,4 @@ generic-y += qrwlock.h
 generic-y += qrwlock_types.h
 generic-y += user.h
 generic-y += vmlinux.lds.h
+generic-y += asi.h
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 4b904110d27cb..b5caf77e8d955 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -7,3 +7,4 @@ generated-y += unistd_nr.h
 generic-y += asm-offsets.h
 generic-y += kvm_types.h
 generic-y += mcs_spinlock.h
+generic-y += asi.h
diff --git a/arch/sh/include/asm/Kbuild b/arch/sh/include/asm/Kbuild
index fc44d9c88b419..ea19e45158285 100644
--- a/arch/sh/include/asm/Kbuild
+++ b/arch/sh/include/asm/Kbuild
@@ -3,3 +3,4 @@ generated-y += syscall_table.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
 generic-y += parport.h
+generic-y += asi.h
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 43b0ae4c2c211..cb9062c9be17f 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -4,3 +4,4 @@ generated-y += syscall_table_64.h
 generic-y += agp.h
 generic-y += kvm_para.h
 generic-y += mcs_spinlock.h
+generic-y += asi.h
diff --git a/arch/um/include/asm/Kbuild b/arch/um/include/asm/Kbuild
index b2d834a29f3a9..1bcb16b09dc49 100644
--- a/arch/um/include/asm/Kbuild
+++ b/arch/um/include/asm/Kbuild
@@ -28,3 +28,4 @@ generic-y += trace_clock.h
 generic-y += kprobes.h
 generic-y += mm_hooks.h
 generic-y += vga.h
+generic-y += asi.h
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 928820e61cb50..ff74aa53842ea 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2516,6 +2516,25 @@ config MITIGATION_PAGE_TABLE_ISOLATION
 
 	  See Documentation/arch/x86/pti.rst for more details.
 
+config MITIGATION_ADDRESS_SPACE_ISOLATION
+	bool "Allow code to run with a reduced kernel address space"
+	default n
+	depends on X86_64 && !PARAVIRT && !KASAN && !UML
+	help
+	  This feature provides the ability to run some kernel code
+	  with a reduced kernel address space. This can be used to
+	  mitigate some speculative execution attacks.
+
+	  The !PARAVIRT dependency is only because of lack of testing; in theory
+	  the code is written to work under paravirtualization. In practice
+	  there are likely to be unhandled cases, in particular concerning TLB
+	  flushes.
+
+	  The !KASAN dependency is mainly because ASI creates a secondary
+	  direct-map region in order to implement local-nonsensitive memory.
+	  This dependency will later be removed with extensions to the KASAN
+	  implementation.
+
 config MITIGATION_RETPOLINE
 	bool "Avoid speculative indirect branches in kernel"
 	select OBJTOOL if HAVE_OBJTOOL
diff --git a/arch/xtensa/include/asm/Kbuild b/arch/xtensa/include/asm/Kbuild
index fa07c686cbcc2..07cea6902f980 100644
--- a/arch/xtensa/include/asm/Kbuild
+++ b/arch/xtensa/include/asm/Kbuild
@@ -8,3 +8,4 @@ generic-y += parport.h
 generic-y += qrwlock.h
 generic-y += qspinlock.h
 generic-y += user.h
+generic-y += asi.h
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
new file mode 100644
index 0000000000000..c4d9a5ff860a9
--- /dev/null
+++ b/include/asm-generic/asi.h
@@ -0,0 +1,5 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef __ASM_GENERIC_ASI_H
+#define __ASM_GENERIC_ASI_H
+
+#endif
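A note on the mechanics: the generic-y += asi.h entries make an
#include <asm/asi.h> resolve to the (for now empty) asm-generic header on
every architecture without its own copy, so common code can include it
unconditionally. A hedged sketch of the intended pattern, with a hypothetical
caller:

#include <asm/asi.h>	/* asm-generic fallback everywhere but x86 */

/*
 * Hypothetical common-code user: IS_ENABLED() compiles everywhere even
 * though the Kconfig symbol is only defined (and can only be set) for
 * x86_64, so no #ifdef is needed at the call site.
 */
static inline bool example_asi_configured(void)
{
	return IS_ENABLED(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION);
}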
From patchwork Fri Jul 12 17:00:21 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732013
Date: Fri, 12 Jul 2024 17:00:21 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-3-144b319a40d8@google.com>
Subject: [PATCH 03/26] mm: asi: Introduce ASI core API
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, Brendan Jackman

From: Junaid Shahid

Introduce core API for Address Space Isolation (ASI).
Kernel address space isolation provides the ability to run some kernel
code with a restricted kernel address space. There can be multiple
classes of such restricted kernel address spaces (e.g. KPTI, KVM-PTI,
etc.). Each ASI class is identified by an index. The ASI class can
register some hooks to be called when entering/exiting the restricted
address space.

Currently there is a fixed maximum number of ASI classes supported. In
addition, each process can have at most one restricted address space
from each ASI class. Neither of these is an inherent limitation; they
are merely simplifying assumptions for the time being.

(The high-level ASI API was derived from the original ASI RFC by
Alexandre Chartre [0].)

[0]: https://lore.kernel.org/kvm/1562855138-19507-1-git-send-email-alexandre.chartre@oracle.com

Signed-off-by: Ofir Weisse
Signed-off-by: Junaid Shahid
Signed-off-by: Brendan Jackman
---
 arch/x86/include/asm/asi.h       | 175 +++++++++++++++++++++++++++++
 arch/x86/include/asm/processor.h |   8 ++
 arch/x86/include/asm/tlbflush.h  |   2 +
 arch/x86/mm/Makefile             |   1 +
 arch/x86/mm/asi.c                | 234 +++++++++++++++++++++++++++++++++++++++
 arch/x86/mm/init.c               |   3 +-
 arch/x86/mm/tlb.c                |   2 +-
 include/asm-generic/asi.h        |  50 +++++++++
 include/linux/mm_types.h         |   7 ++
 kernel/fork.c                    |   3 +
 mm/init-mm.c                     |   4 +
 11 files changed, 487 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h
new file mode 100644
index 0000000000000..a052e561b2b70
--- /dev/null
+++ b/arch/x86/include/asm/asi.h
@@ -0,0 +1,175 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_X86_ASI_H
+#define _ASM_X86_ASI_H
+
+#include
+
+#include
+#include
+#include
+#include
+
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+
+/*
+ * Overview of API usage by ASI clients:
+ *
+ * Setup: First call asi_init() to create a domain. At present only one domain
+ * can be created per mm per class, but it's safe to asi_init() this domain
+ * multiple times. For each asi_init() call you must call asi_destroy() AFTER
+ * you are certain all CPUs have exited the restricted address space (by
+ * calling asi_exit()).
+ *
+ * Runtime usage:
+ *
+ * 1. Call asi_enter() to switch to the restricted address space. This can't be
+ *    from an interrupt or exception handler and preemption must be disabled.
+ *
+ * 2. Execute untrusted code.
+ *
+ * 3. Call asi_relax() to inform the ASI subsystem that untrusted code execution
+ *    is finished. This doesn't cause any address space change.
+ *
+ * 4. Either:
+ *
+ *    a. Go back to 1.
+ *
+ *    b. Call asi_exit() before returning to userspace. This immediately
+ *       switches to the unrestricted address space.
+ *
+ * The region between 1 and 3 is called the "ASI critical section". During the
+ * critical section, it is a bug to access any sensitive data, and you mustn't
+ * sleep.
+ *
+ * The restriction on sleeping is not really a fundamental property of ASI.
+ * However for performance reasons it's important that the critical section is
+ * absolutely as short as possible. So the ability to do sleepy things like
+ * taking mutexes oughtn't to confer any convenience on API users.
+ *
+ * Similarly to the issue of sleeping, the need to asi_exit in case 4b is not a
+ * fundamental property of the system but a limitation of the current
+ * implementation. With further work it is possible to context switch
+ * from and/or to the restricted address space, and to return to userspace
+ * directly from the restricted address space, or _in_ it.
+ *
+ * Note that the critical section only refers to the direct execution path from
+ * asi_enter to asi_relax: it's fine to access sensitive data from exceptions
+ * and interrupt handlers that occur during that time. ASI will re-enter the
+ * restricted address space before returning from the outermost
+ * exception/interrupt.
+ *
+ * Note: ASI does not modify KPTI behaviour; when ASI and KPTI run together
+ * there are 2+N address spaces per task: the unrestricted kernel address space,
+ * the user address space, and one restricted (kernel) address space for each of
+ * the N ASI classes.
+ */
+
+#define ASI_MAX_NUM_ORDER	2
+#define ASI_MAX_NUM		(1 << ASI_MAX_NUM_ORDER)
+
+struct asi_hooks {
+	/*
+	 * Both of these functions MUST be idempotent and re-entrant. They will
+	 * be called in no particular order and with no particular symmetry wrt.
+	 * the number of calls. They are part of the ASI critical section, so
+	 * they must not sleep and must not access sensitive data.
+	 */
+	void (*post_asi_enter)(void);
+	void (*pre_asi_exit)(void);
+};
+
+/*
+ * An ASI class is a type of isolation that can be applied to a process. A
+ * process may have a domain for each class.
+ */
+struct asi_class {
+	struct asi_hooks ops;
+	const char *name;
+};
+
+/*
+ * An ASI domain (struct asi) represents a restricted address space. The
+ * unrestricted address space (and user address space under PTI) are not
+ * represented as a domain.
+ */
+struct asi {
+	pgd_t *pgd;
+	struct asi_class *class;
+	struct mm_struct *mm;
+	int64_t ref_count;
+};
+
+DECLARE_PER_CPU_ALIGNED(struct asi *, curr_asi);
+
+void asi_init_mm_state(struct mm_struct *mm);
+
+int asi_register_class(const char *name, const struct asi_hooks *ops);
+void asi_unregister_class(int index);
+
+int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi);
+void asi_destroy(struct asi *asi);
+
+/* Enter an ASI domain (restricted address space) and begin the critical section. */
+void asi_enter(struct asi *asi);
+
+/*
+ * Leave the "tense" state if we are in it, i.e. end the critical section. We
+ * will stay relaxed until the next asi_enter.
+ */
+void asi_relax(void);
+
+/* Immediately exit the restricted address space if in it */
+void asi_exit(void);
+
+/* The target is the domain we'll enter when returning to process context. */
+static __always_inline struct asi *asi_get_target(struct task_struct *p)
+{
+	return p->thread.asi_state.target;
+}
+
+static __always_inline void asi_set_target(struct task_struct *p,
+					   struct asi *target)
+{
+	p->thread.asi_state.target = target;
+}
+
+static __always_inline struct asi *asi_get_current(void)
+{
+	return this_cpu_read(curr_asi);
+}
+
+/* Are we currently in a restricted address space? */
+static __always_inline bool asi_is_restricted(void)
+{
+	return (bool)asi_get_current();
+}
+
+/* If we exit/have exited, can we stay that way until the next asi_enter? */
+static __always_inline bool asi_is_relaxed(void)
+{
+	return !asi_get_target(current);
+}
+
+/*
+ * Is the current task in the critical section?
+ *
+ * This is just the inverse of asi_is_relaxed(). We have both functions in
+ * order to help write intuitive client code. In particular, asi_is_tense
+ * returns false when ASI is disabled, which is judged to make user code more
+ * obvious.
+ */
+static __always_inline bool asi_is_tense(void)
+{
+	return !asi_is_relaxed();
+}
+
+static __always_inline pgd_t *asi_pgd(struct asi *asi)
+{
+	return asi ? asi->pgd : NULL;
+}
+
+#define INIT_MM_ASI(init_mm) \
+	.asi_init_lock = __MUTEX_INITIALIZER(init_mm.asi_init_lock),
+
+#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
+#endif
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index dc45d622eae4e..a42f03ff3edca 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -5,6 +5,7 @@
 #include
 
 /* Forward declaration, a strange C thing */
+struct asi;
 struct task_struct;
 struct mm_struct;
 struct io_bitmap;
@@ -489,6 +490,13 @@ struct thread_struct {
 	struct thread_shstk	shstk;
 #endif
 
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+	struct {
+		/* Domain to enter when returning to process context. */
+		struct asi *target;
+	} asi_state;
+#endif
+
 	/* Floating point and extended processor state */
 	struct fpu		fpu;
 	/*
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 25726893c6f4d..ed847567b25de 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -391,6 +391,8 @@ static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd)
 }
 #define huge_pmd_needs_flush huge_pmd_needs_flush
 
+unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam);
+
 #ifdef CONFIG_ADDRESS_MASKING
 static inline u64 tlbstate_lam_cr3_mask(void)
 {
diff --git a/arch/x86/mm/Makefile b/arch/x86/mm/Makefile
index 428048e73bd2e..499233f001dc2 100644
--- a/arch/x86/mm/Makefile
+++ b/arch/x86/mm/Makefile
@@ -62,6 +62,7 @@ obj-$(CONFIG_NUMA_EMU)		+= numa_emulation.o
 obj-$(CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS)	+= pkeys.o
 obj-$(CONFIG_RANDOMIZE_MEMORY)			+= kaslr.o
 obj-$(CONFIG_MITIGATION_PAGE_TABLE_ISOLATION)	+= pti.o
+obj-$(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION) += asi.o
 obj-$(CONFIG_X86_MEM_ENCRYPT)	+= mem_encrypt.o
 obj-$(CONFIG_AMD_MEM_ENCRYPT)	+= mem_encrypt_amd.o
diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
new file mode 100644
index 0000000000000..c5979d78fdbbd
--- /dev/null
+++ b/arch/x86/mm/asi.c
@@ -0,0 +1,234 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+
+#include
+#include
+#include
+
+static struct asi_class asi_class[ASI_MAX_NUM];
+static DEFINE_SPINLOCK(asi_class_lock);
+
+DEFINE_PER_CPU_ALIGNED(struct asi *, curr_asi);
+EXPORT_SYMBOL(curr_asi);
+
+static inline bool asi_class_registered(int index)
+{
+	return asi_class[index].name != NULL;
+}
+
+static inline bool asi_index_valid(int index)
+{
+	return index >= 0 && index < ARRAY_SIZE(asi_class);
+}
+
+int asi_register_class(const char *name, const struct asi_hooks *ops)
+{
+	int i;
+
+	VM_BUG_ON(name == NULL);
+
+	spin_lock(&asi_class_lock);
+
+	for (i = 0; i < ARRAY_SIZE(asi_class); i++) {
+		if (!asi_class_registered(i)) {
+			asi_class[i].name = name;
+			if (ops != NULL)
+				asi_class[i].ops = *ops;
+			break;
+		}
+	}
+
+	spin_unlock(&asi_class_lock);
+
+	if (i == ARRAY_SIZE(asi_class))
+		i = -ENOSPC;
+
+	return i;
+}
+EXPORT_SYMBOL_GPL(asi_register_class);
+
+void asi_unregister_class(int index)
+{
+	BUG_ON(!asi_index_valid(index));
+
+	spin_lock(&asi_class_lock);
+
+	WARN_ON(asi_class[index].name == NULL);
+	memset(&asi_class[index], 0, sizeof(struct asi_class));
+
+	spin_unlock(&asi_class_lock);
+}
+EXPORT_SYMBOL_GPL(asi_unregister_class);
+
+static void __asi_destroy(struct asi *asi)
+{
+	lockdep_assert_held(&asi->mm->asi_init_lock);
+}
+
+int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi)
+{
+	struct asi *asi;
+	int err = 0;
+
+	*out_asi = NULL;
+
+	BUG_ON(!asi_index_valid(asi_index));
+
+	asi = &mm->asi[asi_index];
+
+	BUG_ON(!asi_class_registered(asi_index));
+
+	mutex_lock(&mm->asi_init_lock);
+
+	if (asi->ref_count++ > 0)
+		goto exit_unlock; /* err is 0 */
+
+	BUG_ON(asi->pgd != NULL);
+
+	/*
+	 * For now, we allocate 2 pages to avoid any potential problems with
+	 * KPTI code. This won't be needed once KPTI is folded into the ASI
+	 * framework.
+	 */
+	asi->pgd = (pgd_t *)__get_free_pages(
+		GFP_KERNEL_ACCOUNT | __GFP_ZERO, PGD_ALLOCATION_ORDER);
+	if (!asi->pgd) {
+		err = -ENOMEM;
+		goto exit_unlock;
+	}
+
+	asi->class = &asi_class[asi_index];
+	asi->mm = mm;
+
+exit_unlock:
+	if (err)
+		__asi_destroy(asi);
+	else
+		*out_asi = asi;
+
+	mutex_unlock(&mm->asi_init_lock);
+
+	return err;
+}
+EXPORT_SYMBOL_GPL(asi_init);
+
+void asi_destroy(struct asi *asi)
+{
+	struct mm_struct *mm;
+
+	if (!asi)
+		return;
+
+	mm = asi->mm;
+	/*
+	 * We would need this mutex even if the refcount was atomic as we need
+	 * to block concurrent asi_init calls.
+	 */
+	mutex_lock(&mm->asi_init_lock);
+	WARN_ON_ONCE(asi->ref_count <= 0);
+	if (--(asi->ref_count) == 0) {
+		free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER);
+		memset(asi, 0, sizeof(struct asi));
+	}
+	mutex_unlock(&mm->asi_init_lock);
+}
+EXPORT_SYMBOL_GPL(asi_destroy);
+
+static noinstr void __asi_enter(void)
+{
+	u64 asi_cr3;
+	struct asi *target = asi_get_target(current);
+
+	/*
+	 * This is actually a false restriction; it should be fine to be
+	 * preemptible during the critical section, but we haven't tested it.
+	 * We will also need to disable preemption during this function itself
+	 * and perhaps elsewhere. This false restriction shouldn't create any
+	 * additional burden for ASI clients anyway: the critical section has
+	 * to be as short as possible to avoid unnecessary ASI transitions so
+	 * disabling preemption should be fine.
+	 */
+	VM_BUG_ON(preemptible());
+
+	if (!target || target == this_cpu_read(curr_asi))
+		return;
+
+	VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) ==
+		  LOADED_MM_SWITCHING);
+
+	/*
+	 * Must update curr_asi before writing CR3 to ensure an interrupting
+	 * asi_exit sees that it may need to switch address spaces.
+	 */
+	this_cpu_write(curr_asi, target);
+
+	asi_cr3 = build_cr3(target->pgd,
+			    this_cpu_read(cpu_tlbstate.loaded_mm_asid),
+			    tlbstate_lam_cr3_mask());
+	write_cr3(asi_cr3);
+
+	if (target->class->ops.post_asi_enter)
+		target->class->ops.post_asi_enter();
+}
+
+noinstr void asi_enter(struct asi *asi)
+{
+	VM_WARN_ON_ONCE(!asi);
+
+	asi_set_target(current, asi);
+	barrier();
+
+	__asi_enter();
+}
+EXPORT_SYMBOL_GPL(asi_enter);
+
+inline_or_noinstr void asi_relax(void)
+{
+	barrier();
+	asi_set_target(current, NULL);
+}
+EXPORT_SYMBOL_GPL(asi_relax);
+
+noinstr void asi_exit(void)
+{
+	u64 unrestricted_cr3;
+	struct asi *asi;
+
+	preempt_disable_notrace();
+
+	VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) ==
+		  LOADED_MM_SWITCHING);
+
+	asi = this_cpu_read(curr_asi);
+	if (asi) {
+		if (asi->class->ops.pre_asi_exit)
+			asi->class->ops.pre_asi_exit();
+
+		unrestricted_cr3 =
+			build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd,
+				  this_cpu_read(cpu_tlbstate.loaded_mm_asid),
+				  tlbstate_lam_cr3_mask());
+
+		write_cr3(unrestricted_cr3);
+		/*
+		 * Must not update curr_asi until after CR3 write, otherwise a
+		 * re-entrant call might not enter this branch. (This means we
+		 * might do unnecessary CR3 writes).
+		 */
+		this_cpu_write(curr_asi, NULL);
+	}
+
+	preempt_enable_notrace();
+}
+EXPORT_SYMBOL_GPL(asi_exit);
+
+void asi_init_mm_state(struct mm_struct *mm)
+{
+	memset(mm->asi, 0, sizeof(mm->asi));
+	mutex_init(&mm->asi_init_lock);
+}
diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 679893ea5e687..5b06d30dee672 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -249,7 +249,8 @@ static void __init probe_page_size_mask(void)
 	/* By the default is everything supported: */
 	__default_kernel_pte_mask = __supported_pte_mask;
 	/* Except when with PTI where the kernel is mostly non-Global: */
-	if (cpu_feature_enabled(X86_FEATURE_PTI))
+	if (cpu_feature_enabled(X86_FEATURE_PTI) ||
+	    IS_ENABLED(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION))
 		__default_kernel_pte_mask &= ~_PAGE_GLOBAL;
 
 	/* Enable 1 GB linear kernel mappings if available: */
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 6ca18ac9058b6..9a5afeac96547 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -155,7 +155,7 @@ static inline u16 user_pcid(u16 asid)
 	return ret;
 }
 
-static inline_or_noinstr unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
+inline_or_noinstr unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam)
 {
 	unsigned long cr3 = __sme_pa_nodebug(pgd) | lam;
diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h
index c4d9a5ff860a9..3660fc1defe87 100644
--- a/include/asm-generic/asi.h
+++ b/include/asm-generic/asi.h
@@ -2,4 +2,54 @@
 #ifndef __ASM_GENERIC_ASI_H
 #define __ASM_GENERIC_ASI_H
 
+#ifndef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+
+#define ASI_MAX_NUM_ORDER	0
+#define ASI_MAX_NUM		0
+
+#ifndef _ASSEMBLY_
+
+struct asi_hooks {};
+struct asi {};
+
+static inline
+int asi_register_class(const char *name, const struct asi_hooks *ops)
+{
+	return 0;
+}
+
+static inline void asi_unregister_class(int asi_index) { }
+
+static inline void asi_init_mm_state(struct mm_struct *mm) { }
+
+static inline int asi_init(struct mm_struct *mm, int asi_index,
+			   struct asi **asi_out)
+{
+	return 0;
+}
+
+static inline void asi_destroy(struct asi *asi) { }
+
+static inline void asi_enter(struct asi *asi) { }
+
+static inline void asi_relax(void) { }
+
+static inline bool asi_is_relaxed(void) { return true; }
+
+static inline bool asi_is_tense(void) { return false; }
+
+static inline void asi_exit(void) { }
+
+static inline bool asi_is_restricted(void) { return false; }
+
+static inline struct asi *asi_get_current(void) { return NULL; }
+
+static inline struct asi *asi_get_target(struct task_struct *p) { return NULL; }
+
+static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; }
+
+#endif /* !_ASSEMBLY_ */
+
+#endif /* !CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
 #endif
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5240bd7bca338..226a586ebbdca 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -19,8 +19,10 @@
 #include
 #include
 #include
+#include
 
 #include
+#include
 
 #ifndef AT_VECTOR_SIZE_ARCH
 #define AT_VECTOR_SIZE_ARCH 0
@@ -802,6 +804,11 @@ struct mm_struct {
 		atomic_t membarrier_state;
 #endif
 
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+		struct asi asi[ASI_MAX_NUM];
+		struct mutex asi_init_lock;
+#endif
+
 		/**
 		 * @mm_users: The number of users including userspace.
 		 *
diff --git a/kernel/fork.c b/kernel/fork.c
index aebb3e6c96dc6..a6251d11106a6 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -109,6 +109,7 @@
 #include
 #include
 #include
+#include
 
 #include
@@ -1292,6 +1293,8 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 		mm->def_flags = 0;
 	}
 
+	asi_init_mm_state(mm);
+
 	if (mm_alloc_pgd(mm))
 		goto fail_nopgd;
diff --git a/mm/init-mm.c b/mm/init-mm.c
index 24c8093792745..e820e1c6edd48 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 
 #ifndef INIT_MM_CONTEXT
 #define INIT_MM_CONTEXT(name)
@@ -44,6 +45,9 @@ struct mm_struct init_mm = {
 #endif
 	.user_ns	= &init_user_ns,
 	.cpu_bitmap	= CPU_BITS_NONE,
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+	INIT_MM_ASI(init_mm)
+#endif
 	INIT_MM_CONTEXT(init_mm)
 };
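Putting the pieces of this patch together, here is a hedged end-to-end sketch
of a hypothetical ASI client driving the new API (class registration, domain
setup, and the enter/relax/exit critical section). The class name, hook
bodies and run_untrusted() are illustrative, not from the series:

#include <asm/asi.h>

/*
 * Hooks run inside the ASI critical section: they must be idempotent,
 * re-entrant, must not sleep and must not touch sensitive data.
 */
static void example_post_enter(void) { /* e.g. flush predictor state */ }
static void example_pre_exit(void) { /* e.g. scrub sensitive state */ }

static const struct asi_hooks example_hooks = {
	.post_asi_enter = example_post_enter,
	.pre_asi_exit = example_pre_exit,
};

static int example_class_index;
static struct asi *example_asi;

/* Hypothetical untrusted work, e.g. entering a KVM guest. */
static void run_untrusted(void) { }

static int example_setup(struct mm_struct *mm)
{
	example_class_index = asi_register_class("example", &example_hooks);
	if (example_class_index < 0)
		return example_class_index;

	/* Refcounted per (mm, class); pairs with a later asi_destroy(). */
	return asi_init(mm, example_class_index, &example_asi);
}

static void example_run(void)
{
	preempt_disable();
	asi_enter(example_asi);	/* switch CR3 to the restricted pgd */
	run_untrusted();	/* the ASI critical section */
	asi_relax();		/* end critical section; no CR3 change */
	preempt_enable();

	/* Limitation of the current implementation (case 4b above): */
	asi_exit();		/* back to the unrestricted pgd */
}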
From patchwork Fri Jul 12 17:00:22 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732014
Date: Fri, 12 Jul 2024 17:00:22 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-4-144b319a40d8@google.com>
Subject: [PATCH 04/26] objtool: let some noinstr functions make indirect calls
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, Brendan Jackman

As described in the comment, some noinstr functions really need to make
indirect calls. Those functions could be rewritten to use static calls,
but that just shifts "assume it's instrumented" to "assume the indirect
call is fine", which seems like just moving the problem around. Instead,
here's a way to selectively mark functions that are known to be in the
danger zone; we'll just have to be careful with them.

Signed-off-by: Brendan Jackman
---
 tools/objtool/check.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 0a33d9195b7a9..a760a858d8aa3 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -3425,6 +3425,17 @@ static bool pv_call_dest(struct objtool_file *file, struct instruction *insn)
 	return file->pv_ops[idx].clean;
 }
 
+static inline bool allow_noinstr_indirect_call(struct symbol *func)
+{
+	/*
+	 * These functions are noinstr but make indirect calls. The programmer
+	 * solemnly promises that the target functions are noinstr too, but
+	 * they might be in modules so we can't prove it here.
+	 */
+	return (!strcmp(func->name, "asi_exit") ||
+		!strcmp(func->name, "__asi_enter"));
+}
+
 static inline bool noinstr_call_dest(struct objtool_file *file,
 				     struct instruction *insn,
 				     struct symbol *func)
@@ -3437,6 +3448,9 @@ static inline bool noinstr_call_dest(struct objtool_file *file,
 	if (file->pv_ops)
 		return pv_call_dest(file, insn);
 
+	if (allow_noinstr_indirect_call(insn->sym))
+		return true;
+
 	return false;
 }
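For contrast, the static-call rewrite that the commit message declines would
look roughly like the following hypothetical sketch; as the message argues,
it only swaps one assumption for another, since static_call_update() could
still install a target that is not itself noinstr:

#include <linux/static_call.h>

static void example_default_pre_exit(void) { }

/* A direct, patchable call site instead of an indirect call. */
DEFINE_STATIC_CALL(example_asi_pre_exit, example_default_pre_exit);

/* Registration time: patch in the real hook. */
static void example_set_pre_exit_hook(void (*fn)(void))
{
	static_call_update(example_asi_pre_exit, fn);
}

/* The noinstr code now emits a direct call to the patched site... */
static noinstr void example_asi_exit_path(void)
{
	static_call(example_asi_pre_exit)();
}
/* ...but nothing proves the installed target is noinstr either. */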
From patchwork Fri Jul 12 17:00:23 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732015
Date: Fri, 12 Jul 2024 17:00:23 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-5-144b319a40d8@google.com>
Subject: [PATCH 05/26] mm: asi: Add infrastructure for boot-time enablement
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, Brendan Jackman

Add a boot-time parameter to control the newly added X86_FEATURE_ASI.
"asi=on" or "asi=off" can be used on the kernel command line to enable
or disable ASI at boot time. If neither is specified, ASI enablement
depends on CONFIG_ADDRESS_SPACE_ISOLATION_DEFAULT_ON, which is off by
default. asi_check_boottime_disable() is modeled after
pti_check_boottime_disable().

The boot parameter is currently ignored until ASI is fully functional.
Once we have a set of ASI features checked in that we have actually
tested, we will stop ignoring the flag. But for now let's just add the
infrastructure so we can implement the usage code.
Co-developed-by: Junaid Shahid Co-developed-by: Yosry Ahmed Signed-off-by: Brendan Jackman --- arch/x86/Kconfig | 8 +++++ arch/x86/include/asm/asi.h | 20 +++++++++-- arch/x86/include/asm/cpufeatures.h | 1 + arch/x86/include/asm/disabled-features.h | 8 ++++- arch/x86/mm/asi.c | 61 +++++++++++++++++++++++++++----- arch/x86/mm/init.c | 4 ++- include/asm-generic/asi.h | 4 +++ 7 files changed, 92 insertions(+), 14 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index ff74aa53842e..7f21de55d6ac 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2535,6 +2535,14 @@ config MITIGATION_ADDRESS_SPACE_ISOLATION These dependencies will later be removed with extensions to the KASAN implementation. +config MITIGATION_ADDRESS_SPACE_ISOLATION_DEFAULT_ON + bool "Enable address space isolation by default" + default n + depends on MITIGATION_ADDRESS_SPACE_ISOLATION + help + If selected, ASI is enabled by default at boot when neither asi=on + nor asi=off is specified on the kernel command line. + config MITIGATION_RETPOLINE bool "Avoid speculative indirect branches in kernel" select OBJTOOL if HAVE_OBJTOOL diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index a052e561b2b7..04ba2ec7fd28 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -6,6 +6,7 @@ #include #include +#include #include #include @@ -64,6 +65,9 @@ * the N ASI classes. */ +/* Try to avoid this outside of hot code (see comment on _static_cpu_has). */ +#define static_asi_enabled() cpu_feature_enabled(X86_FEATURE_ASI) + #define ASI_MAX_NUM_ORDER 2 #define ASI_MAX_NUM (1 << ASI_MAX_NUM_ORDER) @@ -101,6 +105,8 @@ struct asi { DECLARE_PER_CPU_ALIGNED(struct asi *, curr_asi); +void asi_check_boottime_disable(void); + void asi_init_mm_state(struct mm_struct *mm); int asi_register_class(const char *name, const struct asi_hooks *ops); @@ -124,7 +130,9 @@ void asi_exit(void); /* The target is the domain we'll enter when returning to process context. */ static __always_inline struct asi *asi_get_target(struct task_struct *p) { - return p->thread.asi_state.target; + return static_asi_enabled() + ? p->thread.asi_state.target + : NULL; } static __always_inline void asi_set_target(struct task_struct *p, @@ -135,7 +143,9 @@ static __always_inline void asi_set_target(struct task_struct *p, static __always_inline struct asi *asi_get_current(void) { - return this_cpu_read(curr_asi); + return static_asi_enabled() + ? this_cpu_read(curr_asi) + : NULL; } /* Are we currently in a restricted address space? */ @@ -144,7 +154,11 @@ static __always_inline bool asi_is_restricted(void) return (bool)asi_get_current(); } -/* If we exit/have exited, can we stay that way until the next asi_enter? */ +/* + * If we exit/have exited, can we stay that way until the next asi_enter? + * + * When ASI is disabled, this returns true.
+ */ static __always_inline bool asi_is_relaxed(void) { return !asi_get_target(current); diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 3c7434329661..a6b213c7df44 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -470,6 +470,7 @@ #define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */ #define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */ #define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */ +#define X86_FEATURE_ASI (21*32+5) /* Kernel Address Space Isolation */ /* * BUG word(s) diff --git a/arch/x86/include/asm/disabled-features.h b/arch/x86/include/asm/disabled-features.h index c492bdc97b05..c7964ed4fef8 100644 --- a/arch/x86/include/asm/disabled-features.h +++ b/arch/x86/include/asm/disabled-features.h @@ -50,6 +50,12 @@ # define DISABLE_PTI (1 << (X86_FEATURE_PTI & 31)) #endif +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION +# define DISABLE_ASI 0 +#else +# define DISABLE_ASI (1 << (X86_FEATURE_ASI & 31)) +#endif + #ifdef CONFIG_MITIGATION_RETPOLINE # define DISABLE_RETPOLINE 0 #else @@ -154,7 +160,7 @@ #define DISABLED_MASK17 0 #define DISABLED_MASK18 (DISABLE_IBT) #define DISABLED_MASK19 (DISABLE_SEV_SNP) -#define DISABLED_MASK20 0 +#define DISABLED_MASK20 (DISABLE_ASI) #define DISABLED_MASK21 0 #define DISABLED_MASK_CHECK BUILD_BUG_ON_ZERO(NCAPINTS != 22) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index c5979d78fdbb..21207a3e8b17 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -4,7 +4,9 @@ #include #include +#include #include +#include #include #include @@ -28,6 +30,9 @@ int asi_register_class(const char *name, const struct asi_hooks *ops) { int i; + if (!boot_cpu_has(X86_FEATURE_ASI)) + return 0; + VM_BUG_ON(name == NULL); spin_lock(&asi_class_lock); @@ -52,6 +57,9 @@ EXPORT_SYMBOL_GPL(asi_register_class); void asi_unregister_class(int index) { + if (!boot_cpu_has(X86_FEATURE_ASI)) + return; + BUG_ON(!asi_index_valid(index)); spin_lock(&asi_class_lock); @@ -63,11 +71,36 @@ void asi_unregister_class(int index) } EXPORT_SYMBOL_GPL(asi_unregister_class); +void __init asi_check_boottime_disable(void) +{ + bool enabled = IS_ENABLED(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION_DEFAULT_ON); + char arg[4]; + int ret; + + ret = cmdline_find_option(boot_command_line, "asi", arg, sizeof(arg)); + if (ret == 3 && !strncmp(arg, "off", 3)) { + enabled = false; + pr_info("ASI disabled through kernel command line.\n"); + } else if (ret == 2 && !strncmp(arg, "on", 2)) { + enabled = true; + pr_info("Ignoring asi=on param while ASI implementation is incomplete.\n"); + } else { + pr_info("ASI %s by default.\n", + enabled ? 
"enabled" : "disabled"); + } + + if (enabled) + pr_info("ASI enablement ignored due to incomplete implementation.\n"); +} static void __asi_destroy(struct asi *asi) { - lockdep_assert_held(&asi->mm->asi_init_lock); + WARN_ON_ONCE(asi->ref_count <= 0); + if (--(asi->ref_count) > 0) + return; + free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); + memset(asi, 0, sizeof(struct asi)); } int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) @@ -77,6 +110,9 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) *out_asi = NULL; + if (!boot_cpu_has(X86_FEATURE_ASI)) + return 0; + BUG_ON(!asi_index_valid(asi_index)); asi = &mm->asi[asi_index]; @@ -121,7 +157,7 @@ void asi_destroy(struct asi *asi) { struct mm_struct *mm; - if (!asi) + if (!boot_cpu_has(X86_FEATURE_ASI) || !asi) return; mm = asi->mm; @@ -130,11 +166,7 @@ void asi_destroy(struct asi *asi) * to block concurrent asi_init calls. */ mutex_lock(&mm->asi_init_lock); - WARN_ON_ONCE(asi->ref_count <= 0); - if (--(asi->ref_count) == 0) { - free_pages((ulong)asi->pgd, PGD_ALLOCATION_ORDER); - memset(asi, 0, sizeof(struct asi)); - } + __asi_destroy(asi); mutex_unlock(&mm->asi_init_lock); } EXPORT_SYMBOL_GPL(asi_destroy); @@ -178,6 +210,9 @@ static noinstr void __asi_enter(void) noinstr void asi_enter(struct asi *asi) { + if (!static_asi_enabled()) + return; + VM_WARN_ON_ONCE(!asi); asi_set_target(current, asi); @@ -189,8 +224,10 @@ EXPORT_SYMBOL_GPL(asi_enter); inline_or_noinstr void asi_relax(void) { - barrier(); - asi_set_target(current, NULL); + if (static_asi_enabled()) { + barrier(); + asi_set_target(current, NULL); + } } EXPORT_SYMBOL_GPL(asi_relax); @@ -199,6 +236,9 @@ noinstr void asi_exit(void) u64 unrestricted_cr3; struct asi *asi; + if (!static_asi_enabled()) + return; + preempt_disable_notrace(); VM_BUG_ON(this_cpu_read(cpu_tlbstate.loaded_mm) == @@ -229,6 +269,9 @@ EXPORT_SYMBOL_GPL(asi_exit); void asi_init_mm_state(struct mm_struct *mm) { + if (!boot_cpu_has(X86_FEATURE_ASI)) + return; + memset(mm->asi, 0, sizeof(mm->asi)); mutex_init(&mm->asi_init_lock); } diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c index 5b06d30dee67..e2a29f6779d9 100644 --- a/arch/x86/mm/init.c +++ b/arch/x86/mm/init.c @@ -27,6 +27,7 @@ #include #include #include +#include /* * We need to define the tracepoints somewhere, and tlb.c @@ -250,7 +251,7 @@ static void __init probe_page_size_mask(void) __default_kernel_pte_mask = __supported_pte_mask; /* Except when with PTI where the kernel is mostly non-Global: */ if (cpu_feature_enabled(X86_FEATURE_PTI) || - IS_ENABLED(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION)) + cpu_feature_enabled(X86_FEATURE_ASI)) __default_kernel_pte_mask &= ~_PAGE_GLOBAL; /* Enable 1 GB linear kernel mappings if available: */ @@ -757,6 +758,7 @@ void __init init_mem_mapping(void) unsigned long end; pti_check_boottime_disable(); + asi_check_boottime_disable(); probe_page_size_mask(); setup_pcid(); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 3660fc1defe8..d0a451f9d0b7 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -48,6 +48,10 @@ static inline struct asi *asi_get_target(struct task_struct *p) { return NULL; } static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; } +#define static_asi_enabled() false + +static inline void asi_check_boottime_disable(void) { } + #endif /* !_ASSEMBLY_ */ #endif /* !CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */ From patchwork Fri Jul 12 17:00:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
From patchwork Fri Jul 12 17:00:24 2024
Date: Fri, 12 Jul 2024 17:00:24 +0000 In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com> Message-ID: <20240712-asi-rfc-24-v1-6-144b319a40d8@google.com> Subject: [PATCH 06/26] mm: asi: ASI support in interrupts/exceptions From: Brendan Jackman Add support for potentially switching address spaces from within interrupts/exceptions/NMIs etc. An interrupt does not automatically switch to the unrestricted address space. It can switch, if needed, to access memory not available in the restricted address space, using the normal asi_exit() call. On return from the outermost interrupt, if the target address space was the restricted address space (e.g. we were in the critical code path between ASI Enter and VM Enter), the restricted address space will be automatically restored. Otherwise, execution will continue in the unrestricted address space until the next explicit ASI Enter. To keep track of when to restore the restricted address space, an interrupt/exception nesting depth counter is maintained per task. An alternative implementation without this counter is possible, but the counter unlocks an additional nice-to-have benefit: it allows detecting whether we are currently executing inside an exception context, which will be useful in a later patch. Note that for KVM on SVM, this is not actually necessary as NMIs are in fact maskable via CLGI. It's not clear to me if VMX has something equivalent, but we will need this infrastructure in place for userspace support anyway.
Signed-off-by: Junaid Shahid Signed-off-by: Brendan Jackman --- arch/x86/include/asm/asi.h | 68 ++++++++++++++++++++++++++++++++++++++-- arch/x86/include/asm/idtentry.h | 50 ++++++++++++++++++++++++----- arch/x86/include/asm/processor.h | 5 +++ arch/x86/kernel/process.c | 2 ++ arch/x86/kernel/traps.c | 22 +++++++++++++ arch/x86/mm/asi.c | 5 ++- include/asm-generic/asi.h | 10 ++++++ 7 files changed, 151 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 04ba2ec7fd28..df34a8c0560b 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -127,6 +127,11 @@ void asi_relax(void); /* Immediately exit the restricted address space if in it */ void asi_exit(void); +static inline void asi_init_thread_state(struct thread_struct *thread) +{ + thread->asi_state.intr_nest_depth = 0; +} + /* The target is the domain we'll enter when returning to process context. */ static __always_inline struct asi *asi_get_target(struct task_struct *p) { @@ -167,9 +172,10 @@ static __always_inline bool asi_is_relaxed(void) /* * Is the current task in the critical section? * - * This is just the inverse of !asi_is_relaxed(). We have both functions in order to - * help write intuitive client code. In particular, asi_is_tense returns false - * when ASI is disabled, which is judged to make user code more obvious. + * This is just the inverse of asi_is_relaxed(). We have both functions in + * order to help write intuitive client code. In particular, asi_is_tense + * returns false when ASI is disabled, which is judged to make user code more + * obvious. */ static __always_inline bool asi_is_tense(void) { @@ -181,6 +187,62 @@ static __always_inline pgd_t *asi_pgd(struct asi *asi) { return asi ? asi->pgd : NULL; } +static __always_inline void asi_intr_enter(void) +{ + if (static_asi_enabled() && asi_is_tense()) { + current->thread.asi_state.intr_nest_depth++; + barrier(); + } +} + +void __asi_enter(void); + +static __always_inline void asi_intr_exit(void) +{ + if (static_asi_enabled() && asi_is_tense()) { + /* + * If an access to sensitive memory got reordered after the + * decrement, the #PF handler for that access would see a value + * of 0 for the counter and re-__asi_enter before returning to + * the faulting access, triggering an infinite PF loop. + */ + barrier(); + + if (--current->thread.asi_state.intr_nest_depth == 0) { + /* + * If the decrement got reordered after __asi_enter, an + * interrupt that came between __asi_enter and the + * decrement would always see a nonzero value for the + * counter so it wouldn't call __asi_enter again and we + * would return to process context in the wrong address + * space. + */ + barrier(); + __asi_enter(); + } + } +} + +/* + * Returns the nesting depth of interrupts/exceptions that have interrupted the + * ongoing critical section. If the current task is not in a critical section + * this is 0. + */ +static __always_inline int asi_intr_nest_depth(void) +{ + return current->thread.asi_state.intr_nest_depth; +} + +/* + * Remember that interrupts/exceptions don't count as the critical section. If + * you want to know whether the current task is in the critical section, use + * asi_is_tense().
+ */ +static __always_inline bool asi_in_critical_section(void) +{ + return asi_is_tense() && !asi_intr_nest_depth(); +} + #define INIT_MM_ASI(init_mm) \ .asi_init_lock = __MUTEX_INITIALIZER(init_mm.asi_init_lock), diff --git a/arch/x86/include/asm/idtentry.h b/arch/x86/include/asm/idtentry.h index 749c7411d2f1..446aed5ebe18 100644 --- a/arch/x86/include/asm/idtentry.h +++ b/arch/x86/include/asm/idtentry.h @@ -12,6 +12,7 @@ #include #include +#include typedef void (*idtentry_t)(struct pt_regs *regs); @@ -55,12 +56,15 @@ static __always_inline void __##func(struct pt_regs *regs); \ \ __visible noinstr void func(struct pt_regs *regs) \ { \ - irqentry_state_t state = irqentry_enter(regs); \ + irqentry_state_t state; \ \ + asi_intr_enter(); \ + state = irqentry_enter(regs); \ instrumentation_begin(); \ __##func (regs); \ instrumentation_end(); \ irqentry_exit(regs, state); \ + asi_intr_exit(); \ } \ \ static __always_inline void __##func(struct pt_regs *regs) @@ -102,12 +106,15 @@ static __always_inline void __##func(struct pt_regs *regs, \ __visible noinstr void func(struct pt_regs *regs, \ unsigned long error_code) \ { \ - irqentry_state_t state = irqentry_enter(regs); \ + irqentry_state_t state; \ \ + asi_intr_enter(); \ + state = irqentry_enter(regs); \ instrumentation_begin(); \ __##func (regs, error_code); \ instrumentation_end(); \ irqentry_exit(regs, state); \ + asi_intr_exit(); \ } \ \ static __always_inline void __##func(struct pt_regs *regs, \ @@ -139,7 +146,16 @@ static __always_inline void __##func(struct pt_regs *regs, \ * is required before the enter/exit() helpers are invoked. */ #define DEFINE_IDTENTRY_RAW(func) \ -__visible noinstr void func(struct pt_regs *regs) +static __always_inline void __##func(struct pt_regs *regs); \ + \ +__visible noinstr void func(struct pt_regs *regs) \ +{ \ + asi_intr_enter(); \ + __##func (regs); \ + asi_intr_exit(); \ +} \ + \ +static __always_inline void __##func(struct pt_regs *regs) /** * DEFINE_FREDENTRY_RAW - Emit code for raw FRED entry points @@ -178,7 +194,18 @@ noinstr void fred_##func(struct pt_regs *regs) * is required before the enter/exit() helpers are invoked. 
*/ #define DEFINE_IDTENTRY_RAW_ERRORCODE(func) \ -__visible noinstr void func(struct pt_regs *regs, unsigned long error_code) +static __always_inline void __##func(struct pt_regs *regs, \ + unsigned long error_code); \ + \ +__visible noinstr void func(struct pt_regs *regs, unsigned long error_code)\ +{ \ + asi_intr_enter(); \ + __##func (regs, error_code); \ + asi_intr_exit(); \ +} \ + \ +static __always_inline void __##func(struct pt_regs *regs, \ + unsigned long error_code) /** * DECLARE_IDTENTRY_IRQ - Declare functions for device interrupt IDT entry @@ -209,14 +236,17 @@ static void __##func(struct pt_regs *regs, u32 vector); \ __visible noinstr void func(struct pt_regs *regs, \ unsigned long error_code) \ { \ - irqentry_state_t state = irqentry_enter(regs); \ + irqentry_state_t state; \ u32 vector = (u32)(u8)error_code; \ \ + asi_intr_enter(); \ + state = irqentry_enter(regs); \ instrumentation_begin(); \ kvm_set_cpu_l1tf_flush_l1d(); \ run_irq_on_irqstack_cond(__##func, regs, vector); \ instrumentation_end(); \ irqentry_exit(regs, state); \ + asi_intr_exit(); \ } \ \ static noinline void __##func(struct pt_regs *regs, u32 vector) @@ -256,12 +286,15 @@ static __always_inline void instr_##func(struct pt_regs *regs) \ \ __visible noinstr void func(struct pt_regs *regs) \ { \ - irqentry_state_t state = irqentry_enter(regs); \ + irqentry_state_t state; \ \ + asi_intr_enter(); \ + state = irqentry_enter(regs); \ instrumentation_begin(); \ instr_##func (regs); \ instrumentation_end(); \ irqentry_exit(regs, state); \ + asi_intr_exit(); \ } \ \ void fred_##func(struct pt_regs *regs) \ @@ -295,12 +328,15 @@ static __always_inline void instr_##func(struct pt_regs *regs) \ \ __visible noinstr void func(struct pt_regs *regs) \ { \ - irqentry_state_t state = irqentry_enter(regs); \ + irqentry_state_t state; \ \ + asi_intr_enter(); \ + state = irqentry_enter(regs); \ instrumentation_begin(); \ instr_##func (regs); \ instrumentation_end(); \ irqentry_exit(regs, state); \ + asi_intr_exit(); \ } \ \ void fred_##func(struct pt_regs *regs) \ diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h index a42f03ff3edc..5b10b3c09b6a 100644 --- a/arch/x86/include/asm/processor.h +++ b/arch/x86/include/asm/processor.h @@ -494,6 +494,11 @@ struct thread_struct { struct { /* Domain to enter when returning to process context. */ struct asi *target; + /* + * The depth of interrupt/exceptions interrupting an ASI + * critical section + */ + int intr_nest_depth; } asi_state; #endif diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index b8441147eb5e..ca2391079e59 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -96,6 +96,8 @@ int arch_dup_task_struct(struct task_struct *dst, struct task_struct *src) #ifdef CONFIG_VM86 dst->thread.vm86 = NULL; #endif + asi_init_thread_state(&dst->thread); + /* Drop the copied pointer to current's fpstate */ dst->thread.fpu.fpstate = NULL; diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index 4fa0b17e5043..ca0d0b9fe955 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -64,6 +64,7 @@ #include #include #include +#include #include #include #include @@ -414,6 +415,27 @@ DEFINE_IDTENTRY_DF(exc_double_fault) } #endif + /* + * Do an asi_exit() only here because a #DF usually indicates + * the system is in a really bad state, and we don't want to + * cause any additional issue that would prevent us from + * printing a correct stack trace. 
+ * + * The additional issues are not related to a possible triple + * fault, which can only occur if a fault is encountered while + * invoking this handler, but here we are already executing it. + * Instead, an ASI-induced #PF here could potentially end up + * getting another #DF, for example if there was some issue in + * invoking the #PF handler. The handler for the second #DF + * could then again cause an ASI-induced #PF leading back to the + * same recursion. + * + * This is not needed in the espfix64 case above, since that + * code is about turning a #DF into a #GP which is okay to + * handle in the restricted domain. That's also why we don't + * asi_exit() in the #GP handler. + */ + asi_exit(); irqentry_nmi_enter(regs); instrumentation_begin(); notify_die(DIE_TRAP, str, regs, error_code, X86_TRAP_DF, SIGSEGV); diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 21207a3e8b17..2cd8e93a4415 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -171,7 +171,7 @@ void asi_destroy(struct asi *asi) } EXPORT_SYMBOL_GPL(asi_destroy); -static noinstr void __asi_enter(void) +noinstr void __asi_enter(void) { u64 asi_cr3; struct asi *target = asi_get_target(current); @@ -186,6 +186,7 @@ static noinstr void __asi_enter(void) * disabling preemption should be fine. */ VM_BUG_ON(preemptible()); + VM_BUG_ON(current->thread.asi_state.intr_nest_depth != 0); if (!target || target == this_cpu_read(curr_asi)) return; @@ -246,6 +247,8 @@ noinstr void asi_exit(void) asi = this_cpu_read(curr_asi); if (asi) { + WARN_ON_ONCE(asi_in_critical_section()); + if (asi->class->ops.pre_asi_exit) asi->class->ops.pre_asi_exit(); diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index d0a451f9d0b7..fa0bbf899a09 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -38,6 +38,8 @@ static inline bool asi_is_relaxed(void) { return true; } static inline bool asi_is_tense(void) { return false; } +static inline bool asi_in_critical_section(void) { return false; } + static inline void asi_exit(void) { } static inline bool asi_is_restricted(void) { return false; } @@ -48,6 +50,14 @@ static inline struct asi *asi_get_target(struct task_struct *p) { return NULL; } static inline pgd_t *asi_pgd(struct asi *asi) { return NULL; } +static inline void asi_init_thread_state(struct thread_struct *thread) { } + +static inline void asi_intr_enter(void) { } + +static inline int asi_intr_nest_depth(void) { return 0; } + +static inline void asi_intr_exit(void) { } + #define static_asi_enabled() false static inline void asi_check_boottime_disable(void) { }
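To make the counter's behavior concrete, a hypothetical trace of nested interrupts landing inside a critical section (derived from asi_intr_enter()/asi_intr_exit() above):

	/*
	 * Task is tense: asi_enter() has run, asi_relax() has not.
	 *
	 *   IRQ arrives      asi_intr_enter(): depth 0 -> 1
	 *     NMI arrives    asi_intr_enter(): depth 1 -> 2
	 *       handler calls asi_exit() to touch protected memory
	 *     NMI returns    asi_intr_exit(): depth 2 -> 1, stay unrestricted
	 *   IRQ returns      asi_intr_exit(): depth 1 -> 0, __asi_enter()
	 *                    restores the restricted address space
	 */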
From patchwork Fri Jul 12 17:00:25 2024 Date: Fri, 12 Jul 2024 17:00:25 +0000 In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com> Message-ID: <20240712-asi-rfc-24-v1-7-144b319a40d8@google.com> Subject: [PATCH 07/26] mm: asi: Switch to unrestricted address space before a context switch From: Brendan Jackman
From: Junaid Shahid To keep things simpler for the time being, we disallow context switches within the restricted address space. In the future, this limitation could be relaxed for context switches to different threads within the same process (or to the idle thread and back). Signed-off-by: Junaid Shahid Signed-off-by: Brendan Jackman --- kernel/sched/core.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 7019a40457a6..e65ac22e5a28 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -77,6 +77,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -5353,6 +5354,8 @@ static __always_inline struct rq * context_switch(struct rq *rq, struct task_struct *prev, struct task_struct *next, struct rq_flags *rf) { + asi_exit(); + prepare_task_switch(rq, prev, next); /*
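The resulting ordering, as a hypothetical trace (context_switch() and asi_exit() are from the patch; the scenario is illustrative):

	/*
	 *   task A: asi_relax()         critical section has ended
	 *   task A: schedule()
	 *             context_switch()
	 *               asi_exit()      leave the restricted space (no-op
	 *                               if we had already exited)
	 *   task B resumes in the unrestricted address space
	 */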
From patchwork Fri Jul 12 17:00:26 2024 Date: Fri, 12 Jul 2024 17:00:26 +0000 In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com> Message-ID: <20240712-asi-rfc-24-v1-8-144b319a40d8@google.com> Subject: [PATCH 08/26] mm: asi: Use separate PCIDs for restricted address spaces From: Brendan Jackman From: Junaid Shahid Each restricted address space is assigned a separate PCID. Since currently only one ASI instance per class exists for a given process, the PCID is just derived from the class index. This commit only sets the appropriate PCID when switching CR3, but does not actually use the NOFLUSH bit.
That will be done by later patches. Signed-off-by: Junaid Shahid Signed-off-by: Brendan Jackman --- arch/x86/include/asm/asi.h | 10 +++++++++- arch/x86/include/asm/tlbflush.h | 3 +++ arch/x86/mm/asi.c | 7 ++++--- arch/x86/mm/tlb.c | 44 +++++++++++++++++++++++++++++++++++++---- 4 files changed, 56 insertions(+), 8 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index df34a8c0560b..1a19a925300c 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -69,7 +69,14 @@ #define static_asi_enabled() cpu_feature_enabled(X86_FEATURE_ASI) #define ASI_MAX_NUM_ORDER 2 -#define ASI_MAX_NUM (1 << ASI_MAX_NUM_ORDER) +/* + * We include an ASI identifier in the higher bits of PCID to use + * different PCID for restricted ASIs from non-restricted ASIs (see asi_pcid). + * The ASI identifier we use for this is asi_index + 1, as asi_index + * starts from 0. The -1 below for ASI_MAX_NUM comes from this PCID + * space availability. + */ +#define ASI_MAX_NUM ((1 << ASI_MAX_NUM_ORDER) - 1) struct asi_hooks { /* @@ -101,6 +108,7 @@ struct asi { struct asi_class *class; struct mm_struct *mm; int64_t ref_count; + u16 index; }; DECLARE_PER_CPU_ALIGNED(struct asi *, curr_asi); diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h index ed847567b25d..3605f6b99da7 100644 --- a/arch/x86/include/asm/tlbflush.h +++ b/arch/x86/include/asm/tlbflush.h @@ -392,6 +392,9 @@ static inline bool huge_pmd_needs_flush(pmd_t oldpmd, pmd_t newpmd) #define huge_pmd_needs_flush huge_pmd_needs_flush unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam); +unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, unsigned long lam, bool noflush); + +u16 asi_pcid(struct asi *asi, u16 asid); #ifdef CONFIG_ADDRESS_MASKING static inline u64 tlbstate_lam_cr3_mask(void) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 2cd8e93a4415..0ba156f879d3 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -140,6 +140,7 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) asi->class = &asi_class[asi_index]; asi->mm = mm; + asi->index = asi_index; exit_unlock: if (err) @@ -174,6 +175,7 @@ EXPORT_SYMBOL_GPL(asi_destroy); noinstr void __asi_enter(void) { u64 asi_cr3; + u16 pcid; struct asi *target = asi_get_target(current); /* @@ -200,9 +202,8 @@ noinstr void __asi_enter(void) */ this_cpu_write(curr_asi, target); - asi_cr3 = build_cr3(target->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid), - tlbstate_lam_cr3_mask()); + pcid = asi_pcid(target, this_cpu_read(cpu_tlbstate.loaded_mm_asid)); + asi_cr3 = build_cr3_pcid(target->pgd, pcid, tlbstate_lam_cr3_mask(), false); write_cr3(asi_cr3); if (target->class->ops.post_asi_enter) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 9a5afeac9654..34d61b56d33f 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -98,7 +98,12 @@ # define PTI_CONSUMED_PCID_BITS 0 #endif -#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS) +#define ASI_CONSUMED_PCID_BITS ASI_MAX_NUM_ORDER +#define ASI_PCID_BITS_SHIFT CR3_AVAIL_PCID_BITS +#define CR3_AVAIL_PCID_BITS (X86_CR3_PCID_BITS - PTI_CONSUMED_PCID_BITS - \ + ASI_CONSUMED_PCID_BITS) + +static_assert(BIT(CR3_AVAIL_PCID_BITS) > TLB_NR_DYN_ASIDS); /* * ASIDs are zero-based: 0->MAX_AVAIL_ASID are valid. 
-1 below to account @@ -155,18 +160,23 @@ static inline u16 user_pcid(u16 asid) return ret; } +static inline unsigned long __build_cr3(pgd_t *pgd, u16 pcid, unsigned long lam) +{ + return __sme_pa_nodebug(pgd) | pcid | lam; +} + inline_or_noinstr unsigned long build_cr3(pgd_t *pgd, u16 asid, unsigned long lam) { - unsigned long cr3 = __sme_pa_nodebug(pgd) | lam; + u16 pcid = 0; if (static_cpu_has(X86_FEATURE_PCID)) { VM_WARN_ON_ONCE(asid > MAX_ASID_AVAILABLE); - cr3 |= kern_pcid(asid); + pcid = kern_pcid(asid); } else { VM_WARN_ON_ONCE(asid != 0); } - return cr3; + return __build_cr3(pgd, pcid, lam); } static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid, @@ -181,6 +191,19 @@ static inline unsigned long build_cr3_noflush(pgd_t *pgd, u16 asid, return build_cr3(pgd, asid, lam) | CR3_NOFLUSH; } +inline_or_noinstr unsigned long build_cr3_pcid(pgd_t *pgd, u16 pcid, + unsigned long lam, bool noflush) +{ + u64 noflush_bit = 0; + + if (!static_cpu_has(X86_FEATURE_PCID)) + pcid = 0; + else if (noflush) + noflush_bit = CR3_NOFLUSH; + + return __build_cr3(pgd, pcid, lam) | noflush_bit; +} + /* * We get here when we do something requiring a TLB invalidation * but could not go invalidate all of the contexts. We do the @@ -995,6 +1018,19 @@ static void put_flush_tlb_info(void) #endif } +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION + +inline_or_noinstr u16 asi_pcid(struct asi *asi, u16 asid) +{ + return kern_pcid(asid) | ((asi->index + 1) << ASI_PCID_BITS_SHIFT); +} + +#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */ + +u16 asi_pcid(struct asi *asi, u16 asid) { return kern_pcid(asid); } + +#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */ + void flush_tlb_mm_range(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned int stride_shift, bool freed_tables)
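A worked example of the resulting PCID layout (assumed values: X86_CR3_PCID_BITS is 12 and PTI is disabled, so PTI_CONSUMED_PCID_BITS is 0):

	/*
	 * CR3_AVAIL_PCID_BITS = 12 - 0 - ASI_CONSUMED_PCID_BITS(2) = 10
	 * ASI_PCID_BITS_SHIFT = 10
	 *
	 * For a restricted ASI with asi->index == 0 and kernel asid 3:
	 *
	 *   asi_pcid(asi, 3) = kern_pcid(3) | ((0 + 1) << 10)
	 *
	 * i.e. the ASI identifier (index + 1, never 0) occupies PCID bits
	 * 10-11, while the low 10 bits still carry the regular kernel PCID,
	 * so restricted and unrestricted spaces get distinct PCIDs.
	 */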
From patchwork Fri Jul 12 17:00:27 2024 Date: Fri, 12 Jul 2024 17:00:27 +0000 In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com> Message-ID: <20240712-asi-rfc-24-v1-9-144b319a40d8@google.com> Subject: [PATCH 09/26] mm: asi: Make __get_current_cr3_fast() ASI-aware From: Brendan Jackman
Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Paolo Bonzini , Alexandre Chartre , Liran Alon , Jan Setje-Eilers , Catalin Marinas , Will Deacon , Mark Rutland , Andrew Morton , Mel Gorman , Lorenzo Stoakes , David Hildenbrand , Vlastimil Babka , Michal Hocko , Khalid Aziz , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Valentin Schneider , Paul Turner , Reiji Watanabe , Junaid Shahid , Ofir Weisse , Yosry Ahmed , Patrick Bellasi , KP Singh , Alexandra Sandulescu , Matteo Rizzo , Jann Horn Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman From: Junaid Shahid When ASI is active, __get_current_cr3_fast() adjusts the returned CR3 value accordingly to reflect the actual ASI CR3. Signed-off-by: Junaid Shahid Signed-off-by: Brendan Jackman --- arch/x86/mm/tlb.c | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 34d61b56d33f..02f73a71d4ea 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include "mm_internal.h" @@ -1125,14 +1126,32 @@ void flush_tlb_kernel_range(unsigned long start, unsigned long end) */ inline_or_noinstr unsigned long __get_current_cr3_fast(void) { - unsigned long cr3 = - build_cr3(this_cpu_read(cpu_tlbstate.loaded_mm)->pgd, - this_cpu_read(cpu_tlbstate.loaded_mm_asid), - tlbstate_lam_cr3_mask()); + unsigned long cr3; + pgd_t *pgd; + u16 asid = this_cpu_read(cpu_tlbstate.loaded_mm_asid); + struct asi *asi = asi_get_current(); + u16 pcid; + + if (asi) { + pgd = asi_pgd(asi); + pcid = asi_pcid(asi, asid); + } else { + pgd = this_cpu_read(cpu_tlbstate.loaded_mm)->pgd; + pcid = kern_pcid(asid); + } + + cr3 = build_cr3_pcid(pgd, pcid, tlbstate_lam_cr3_mask(), false); /* For now, be very restrictive about when this can be called. */ VM_WARN_ON(in_nmi() || preemptible()); + /* + * Outside of the ASI critical section, an ASI-restricted CR3 is + * unstable because an interrupt (including an inner interrupt, if we're + * already in one) could cause a persistent asi_exit. 
+ */ + VM_WARN_ON_ONCE(asi && (asi_is_relaxed() || asi_intr_nest_depth())); + VM_BUG_ON(cr3 != __read_cr3()); return cr3; }
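A hedged sketch of the calling contract enforced by the new warning (the caller below is hypothetical and assumes preemption is already disabled, as on a VM-entry path):

	static void example_read_cr3(struct asi *asi)
	{
		unsigned long cr3;

		asi_enter(asi);		/* now tense, intr_nest_depth == 0 */
		/*
		 * Inside the critical section the restricted CR3 is stable,
		 * so the snapshot below is meaningful.
		 */
		cr3 = __get_current_cr3_fast();
		/* ... e.g. stash cr3 in host state before VM entry ... */
		asi_relax();
	}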
From patchwork Fri Jul 12 17:00:28 2024 Date: Fri, 12 Jul 2024 17:00:28 +0000 In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com> Message-ID: <20240712-asi-rfc-24-v1-10-144b319a40d8@google.com> Subject: [PATCH 10/26] mm: asi: Avoid warning from NMI userspace accesses in ASI context From: Brendan Jackman nmi_uaccess_okay() emits a warning if current CR3 != mm->pgd. Limit the warning to only when ASI is not active. Co-developed-by: Junaid Shahid Signed-off-by: Brendan Jackman --- arch/x86/mm/tlb.c | 28 +++++++++++++++++++++++----- 1 file changed, 23 insertions(+), 5 deletions(-) diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index 02f73a71d4ea..e80cd67a5239 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1326,6 +1326,24 @@ void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch) put_cpu(); } +static inline bool cr3_matches_current_mm(void) +{ + struct asi *asi = asi_get_current(); + pgd_t *cr3_pgd; + + /* + * Prevent read_cr3_pa -> [NMI, asi_exit] -> asi_get_current, + * otherwise we might find CR3 pointing to the ASI PGD but not + * find a current ASI domain. + */ + barrier(); + cr3_pgd = __va(read_cr3_pa()); + + if (cr3_pgd == current->mm->pgd) + return true; + return asi && (cr3_pgd == asi_pgd(asi)); +} + /* * Blindly accessing user memory from NMI context can be dangerous * if we're in the middle of switching the current user task or @@ -1341,10 +1359,10 @@ bool nmi_uaccess_okay(void) VM_WARN_ON_ONCE(!loaded_mm); /* - * The condition we want to check is - * current_mm->pgd == __va(read_cr3_pa()). This may be slow, though, - * if we're running in a VM with shadow paging, and nmi_uaccess_okay() - * is supposed to be reasonably fast. + * The condition we want to check is that CR3 points to either + * current_mm->pgd or an appropriate ASI PGD. Reading CR3 may be slow, + * though, if we're running in a VM with shadow paging, and + * nmi_uaccess_okay() is supposed to be reasonably fast.
* * Instead, we check the almost equivalent but somewhat conservative * condition below, and we rely on the fact that switch_mm_irqs_off() @@ -1353,7 +1371,7 @@ bool nmi_uaccess_okay(void) if (loaded_mm != current_mm) return false; - VM_WARN_ON_ONCE(current_mm->pgd != __va(read_cr3_pa())); + VM_WARN_ON_ONCE(!cr3_matches_current_mm()); return true; } From patchwork Fri Jul 12 17:00:29 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brendan Jackman X-Patchwork-Id: 13732021 Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 2966817B420 for ; Fri, 12 Jul 2024 17:01:25 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720803686; cv=none; b=VX68bq4Iaxh5NgVY3YlaMFXIhVlmOoeyauGmHN03TwY5UP6a7f1UvebClw4RiCFE4yvuXGslwberJtGmDrxuxmakNfLp82BLrJjk9tqE1TshRNO9or3grgCp4ILls9tC0wR1ifwhz1VVneGNztq15WzYFGylOLvN185++DORanI= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720803686; c=relaxed/simple; bh=tmQea9Di7/WAZR/rcHWCtTHIP3bmxSVvEz9ipvVMY+I=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=aUUoI3Nvs7f4nbTGH1ozeEcTVD7Rk4vVufKE/azvDP6na4udGvKzWmqnWlo7kd3v3lTAyYCGw63CcTIbJqnVpXieXcifHA9CsneQ1yYkSUe+5AiqD3b46MM3vpvoNjBr8k2UIo7zE2KXcqyfLkqLkTdfbK7wnfAkVZcDsZ16mEA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=n5/MBrQ8; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="n5/MBrQ8" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-426620721c2so15369225e9.2 for ; Fri, 12 Jul 2024 10:01:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720803684; x=1721408484; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=tzyFc6toxGAQrtxbLGqZIE4RB+JOOXOhclq2H8+llZE=; b=n5/MBrQ8NNpA/RaOk0E8Oj95aR2bIk/2lc6ePuHi4uBrmx9SsgkDSLIq8MvPJ4b0sH +Fp1RwJBKnHWVT2mV9je+DuWL9TG3iLs6FWkK+3pg4Rln5ornjYd+6qF6XDfAnLzWXH+ +zw1ppScMVdlk2c595Rw/41aLLLsz5J29I8jGYzRoQoKpWaF22rZYC97dsF/nBMlP0zp VZYd4wF/4/muZhGrl16FbkT8yExRvhyaHRfxbbxjkfaZ0SseLb+VRzXwSThg0MxXLKkI J2bBMeuCo/vJ0NdP2k6E94LGA9EKNUJTTwzJVtTMMMhnuEg1Ir+aGZG0QdPJXmb82583 AkEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720803684; x=1721408484; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=tzyFc6toxGAQrtxbLGqZIE4RB+JOOXOhclq2H8+llZE=; b=dsdqXSuE1g2tUL6LDvFFoh0Yv1KddtwP+1LZeIBy5FWc3pQxaWL8WJNXD+zTq0jxqG wNHAWi/g/uLGxBoUH+EOtFc1DPz8XjsD6ZXNEfuV/YnQGJximfI0yen5jXZgbXxRonp1 
From patchwork Fri Jul 12 17:00:29 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732021
Date: Fri, 12 Jul 2024 17:00:29 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-11-144b319a40d8@google.com>
Subject: [PATCH 11/26] mm: asi: ASI page table allocation functions
From: Brendan Jackman

From: Junaid Shahid

This adds custom allocation and free functions for ASI page tables. The alloc functions support allocating memory using different GFP reclaim flags, in order to be able to support non-sensitive allocations from both standard and atomic contexts. They also install the page tables locklessly, which makes it slightly simpler to handle non-sensitive allocations from interrupts/exceptions.
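Roughly, the lockless installation works as in the following sketch (a hypothetical standalone function, not this patch's code; the patch instead generates one such function per page-table level with the DEFINE_ASI_PGTBL_ALLOC macro below):

static pud_t *asi_pud_alloc_sketch(struct asi *asi, p4d_t *p4d,
				   unsigned long addr, gfp_t flags)
{
	if (p4d_none(*p4d)) {
		unsigned long pgtbl = get_zeroed_page(flags);

		if (!pgtbl)
			return NULL;
		/*
		 * Install the new table with an atomic cmpxchg() instead of
		 * taking a spinlock. If we lose the race, another context
		 * already installed a table here: free ours and use theirs.
		 */
		if (cmpxchg((unsigned long *)p4d, 0,
			    __pa(pgtbl) | _PAGE_TABLE) != 0)
			free_page(pgtbl);
		else
			mm_inc_nr_puds(asi->mm);
	}
	return pud_offset(p4d, addr);
}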
Signed-off-by: Junaid Shahid
Signed-off-by: Brendan Jackman
---
arch/x86/mm/asi.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 59 insertions(+)

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 0ba156f879d3..8798aab66748 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -71,6 +71,65 @@ } EXPORT_SYMBOL_GPL(asi_unregister_class); +#ifndef mm_inc_nr_p4ds +#define mm_inc_nr_p4ds(mm) do {} while (false) +#endif + +#ifndef mm_dec_nr_p4ds +#define mm_dec_nr_p4ds(mm) do {} while (false) +#endif + +#define pte_offset pte_offset_kernel + /* + * asi_p4d_alloc, asi_pud_alloc, asi_pmd_alloc, asi_pte_alloc. + * + * These are like the normal xxx_alloc functions, but: + * + * - They use atomic operations instead of taking a spinlock; this allows them + * to be used from interrupts. This is necessary because we use the page + * allocator from interrupts and the page allocator ultimately calls this + * code. + * - They support customizing the allocation flags. + * + * On the other hand, they do not use the normal page allocation infrastructure, + * which means that PTE pages do not have the PageTable type nor the PagePgtable + * flag and we don't increment the meminfo stat (NR_PAGETABLE) as they do. + */ +static_assert(!IS_ENABLED(CONFIG_PARAVIRT)); +#define DEFINE_ASI_PGTBL_ALLOC(base, level) \ +__maybe_unused \ +static level##_t * asi_##level##_alloc(struct asi *asi, \ + base##_t *base, ulong addr, \ + gfp_t flags) \ +{ \ + if (unlikely(base##_none(*base))) { \ + ulong pgtbl = get_zeroed_page(flags); \ + phys_addr_t pgtbl_pa; \ + \ + if (!pgtbl) \ + return NULL; \ + \ + pgtbl_pa = __pa(pgtbl); \ + \ + if (cmpxchg((ulong *)base, 0, \ + pgtbl_pa | _PAGE_TABLE) != 0) { \ + free_page(pgtbl); \ + goto out; \ + } \ + \ + mm_inc_nr_##level##s(asi->mm); \ + } \ +out: \ + VM_BUG_ON(base##_leaf(*base)); \ + return level##_offset(base, addr); \ +} + +DEFINE_ASI_PGTBL_ALLOC(pgd, p4d) +DEFINE_ASI_PGTBL_ALLOC(p4d, pud) +DEFINE_ASI_PGTBL_ALLOC(pud, pmd) +DEFINE_ASI_PGTBL_ALLOC(pmd, pte) + void __init asi_check_boottime_disable(void) { bool enabled = IS_ENABLED(CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION_DEFAULT_ON);

From patchwork Fri Jul 12 17:00:30 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732022
Date: Fri, 12 Jul 2024 17:00:30 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-12-144b319a40d8@google.com>
Subject: [PATCH 12/26] mm: asi: asi_exit() on PF, skip handling if address is accessible
From: Brendan Jackman

From: Ofir Weisse

On a page fault, do asi_exit(), then check whether the address is accessible now that we have exited. We do this by refactoring spurious_kernel_fault() into two parts:

1. Verify that the error code value is something that could arise from a lazy TLB update.
2. Walk the page table and verify permissions; this part is now called kernel_access_ok().

We also define PTE_PRESENT() and PMD_PRESENT(), which are suitable for checking userspace pages. For the sake of spurious faults, pte_present() and pmd_present() are only good for kernelspace pages. This is because these macros might return true even if the present bit is 0 (only relevant for userspace).
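For context on that last point, compare the x86 definition of pte_present() (quoted from arch/x86 as of this series; simplified) with the hardware-only check the patch introduces below:

/* x86's pte_present() deliberately also accepts PROTNONE entries, which
 * the hardware cannot actually access: _PAGE_PRESENT may be clear. */
static inline int pte_present(pte_t a)
{
	return pte_flags(a) & (_PAGE_PRESENT | _PAGE_PROTNONE);
}

/* The PTE_PRESENT() added below only accepts what the MMU can see: */
#define PTE_PRESENT(pte) (pte_flags(pte) & _PAGE_PRESENT)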
Signed-off-by: Ofir Weisse
Signed-off-by: Brendan Jackman
---
arch/x86/mm/fault.c | 119 +++++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 104 insertions(+), 15 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index bba4e020dd64..e0bc5006c371 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -942,7 +942,7 @@ do_sigbus(struct pt_regs *regs, unsigned long error_code, unsigned long address, force_sig_fault(SIGBUS, BUS_ADRERR, (void __user *)address); } -static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) +static __always_inline int kernel_protection_ok(unsigned long error_code, pte_t *pte) { if ((error_code & X86_PF_WRITE) && !pte_write(*pte)) return 0; @@ -953,6 +953,9 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) return 1; } +static inline_or_noinstr int kernel_access_ok( + unsigned long error_code, unsigned long address, pgd_t *pgd); + /* * Handle a spurious fault caused by a stale TLB entry. * @@ -978,11 +981,6 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) { pgd_t *pgd; - p4d_t *p4d; - pud_t *pud; - pmd_t *pmd; - pte_t *pte; - int ret; /* * Only writes to RO or instruction fetches from NX may cause @@ -998,6 +996,50 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return 0; pgd = init_mm.pgd + pgd_index(address); + return kernel_access_ok(error_code, address, pgd); +} +NOKPROBE_SYMBOL(spurious_kernel_fault); + +/* + * For kernel addresses, pte_present and pmd_present are sufficient for + * kernel_access_ok(). For user addresses these functions will return true + * even though the pte is not actually accessible by hardware (i.e. _PAGE_PRESENT + * is not set). This happens in cases where the pages are physically present in + * memory, but they are not made accessible to hardware as they need software + * handling first: + * + * - ptes/pmds with _PAGE_PROTNONE need autonuma balancing (see pte_protnone(), + * change_prot_numa(), and do_numa_page()). + * + * - pmds with _PAGE_PSE & !_PAGE_PRESENT are undergoing splitting (see + * split_huge_page()). + * + * Here, we care about whether the hardware can actually access the page right + * now. + * + * These issues aren't currently present for PUD but we also have a custom + * PUD_PRESENT for a layer of future-proofing. + */ +#define PUD_PRESENT(pud) (pud_flags(pud) & _PAGE_PRESENT) +#define PMD_PRESENT(pmd) (pmd_flags(pmd) & _PAGE_PRESENT) +#define PTE_PRESENT(pte) (pte_flags(pte) & _PAGE_PRESENT) + +/* + * Check if an access by the kernel would cause a page fault. The access is + * described by a page fault error code (whether it was a write/instruction + * fetch) and address. This doesn't check for types of faults that are not + * expected to affect the kernel, e.g. PKU. The address can be user or kernel + * space; if user, we assume the access would happen via the uaccess API.
+ */ +static inline_or_noinstr int +kernel_access_ok(unsigned long error_code, unsigned long address, pgd_t *pgd) +{ + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + int ret; + if (!pgd_present(*pgd)) return 0; @@ -1006,27 +1048,27 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return 0; if (p4d_leaf(*p4d)) - return spurious_kernel_fault_check(error_code, (pte_t *) p4d); + return kernel_protection_ok(error_code, (pte_t *) p4d); pud = pud_offset(p4d, address); - if (!pud_present(*pud)) + if (!PUD_PRESENT(*pud)) return 0; if (pud_leaf(*pud)) - return spurious_kernel_fault_check(error_code, (pte_t *) pud); + return kernel_protection_ok(error_code, (pte_t *) pud); pmd = pmd_offset(pud, address); - if (!pmd_present(*pmd)) + if (!PMD_PRESENT(*pmd)) return 0; if (pmd_leaf(*pmd)) - return spurious_kernel_fault_check(error_code, (pte_t *) pmd); + return kernel_protection_ok(error_code, (pte_t *) pmd); pte = pte_offset_kernel(pmd, address); - if (!pte_present(*pte)) + if (!PTE_PRESENT(*pte)) return 0; - ret = spurious_kernel_fault_check(error_code, pte); + ret = kernel_protection_ok(error_code, pte); if (!ret) return 0; @@ -1034,12 +1076,11 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) * Make sure we have permissions in PMD. * If not, then there's a bug in the page tables: */ - ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd); + ret = kernel_protection_ok(error_code, (pte_t *) pmd); WARN_ONCE(!ret, "PMD has incorrect permission bits\n"); return ret; } -NOKPROBE_SYMBOL(spurious_kernel_fault); int show_unhandled_signals = 1; @@ -1483,6 +1524,29 @@ handle_page_fault(struct pt_regs *regs, unsigned long error_code, } } +static __always_inline void warn_if_bad_asi_pf( + unsigned long error_code, unsigned long address) +{ +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION + struct asi *target; + + /* + * It's a bug to access sensitive data from the "critical section", i.e. + * on the path between asi_enter and asi_relax, where untrusted code + * gets run. #PF in this state sees asi_intr_nest_depth() as 1 because + * #PF increments it. We can't think of a better way to determine if + * this has happened than to check the ASI pagetables, hence we can't + * really have this check in non-debug builds unfortunately. + */ + VM_WARN_ONCE( + (target = asi_get_target(current)) != NULL && + asi_intr_nest_depth() == 1 && + !kernel_access_ok(error_code, address, asi_pgd(target)), + "ASI-sensitive data access from critical section, addr=%px error_code=%lx class=%s", + (void *) address, error_code, target->class->name); +#endif +} + DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) { irqentry_state_t state; @@ -1490,6 +1554,31 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) address = cpu_feature_enabled(X86_FEATURE_FRED) ? fred_event_data(regs) : read_cr2(); + if (static_asi_enabled() && !user_mode(regs)) { + pgd_t *pgd; + + /* Can be a NOP even for ASI faults, because of NMIs */ + asi_exit(); + + /* + * handle_page_fault() might oops if we run it for a kernel + * address. This might be the case if we got here due to an ASI + * fault. We avoid this case by checking whether the address is + * now, after asi_exit(), accessible by hardware. If it is - + * there's nothing to do. Note that this is a bit of a shotgun; + * we can also bail early from user-address faults here that + * weren't actually caused by ASI. So we might wanna move this + * logic later in the handler. In particular, we might be losing + * some stats here. 
However for now this keeps ASI page faults + nice and fast. */ + pgd = (pgd_t *)__va(read_cr3_pa()) + pgd_index(address); + if (kernel_access_ok(error_code, address, pgd)) { + warn_if_bad_asi_pf(error_code, address); + return; + } + } + prefetchw(&current->mm->mmap_lock); /*

From patchwork Fri Jul 12 17:00:31 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732023
Date: Fri, 12 Jul 2024 17:00:31 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-13-144b319a40d8@google.com>
Subject: [PATCH 13/26] mm: asi: Functions to map/unmap a memory range into ASI page tables
From: Brendan Jackman

From: Junaid Shahid

Two functions, asi_map() and asi_map_gfp(), are added to allow mapping memory into ASI page tables. The mapping will be identical to the one for the same virtual address in the unrestricted page tables. This is necessary to allow switching between the page tables at any arbitrary point in the kernel.

Another function, asi_unmap(), is added to allow unmapping memory mapped via asi_map*.
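As a rough usage sketch (a hypothetical caller, not part of this patch; assumes a sleepable process context, per asi_unmap()'s requirements documented in the code below):

static int example_share_buffer(struct asi *asi)
{
	void *buf = (void *)__get_free_pages(GFP_KERNEL, 0);
	int err;

	if (!buf)
		return -ENOMEM;

	err = asi_map(asi, buf, PAGE_SIZE);
	if (err) {
		/* asi_map() can leave a partial mapping behind on failure;
		 * it is the caller's job to asi_unmap() it. */
		asi_unmap(asi, buf, PAGE_SIZE);
		free_pages((unsigned long)buf, 0);
		return err;
	}

	/* ... the buffer is now accessible from the restricted space ... */

	asi_unmap(asi, buf, PAGE_SIZE);	/* may sleep: flushes the TLB */
	free_pages((unsigned long)buf, 0);
	return 0;
}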
Signed-off-by: Junaid Shahid
Signed-off-by: Brendan Jackman
---
arch/x86/include/asm/asi.h | 5 + arch/x86/mm/asi.c | 238 ++++++++++++++++++++++++++++++++++++++++- arch/x86/mm/tlb.c | 5 + include/asm-generic/asi.h | 13 +++ include/linux/pgtable.h | 3 + mm/internal.h | 2 + mm/vmalloc.c | 32 +++--- 7 files changed, 284 insertions(+), 14 deletions(-)

diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 1a19a925300c9..9aad843eb6dfa 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -135,6 +135,11 @@ void asi_relax(void); /* Immediately exit the restricted address space if in it */ void asi_exit(void); +int asi_map_gfp(struct asi *asi, void *addr, size_t len, gfp_t gfp_flags); +int asi_map(struct asi *asi, void *addr, size_t len); +void asi_unmap(struct asi *asi, void *addr, size_t len); +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len); + static inline void asi_init_thread_state(struct thread_struct *thread) { thread->asi_state.intr_nest_depth = 0; diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 8798aab667489..e43b206450ad9 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -9,6 +9,9 @@ #include #include #include +#include + +#include "../../../mm/internal.h" static struct asi_class asi_class[ASI_MAX_NUM]; static DEFINE_SPINLOCK(asi_class_lock); @@ -98,7 +101,6 @@ */ static_assert(!IS_ENABLED(CONFIG_PARAVIRT)); #define DEFINE_ASI_PGTBL_ALLOC(base, level) \ -__maybe_unused \ static level##_t * asi_##level##_alloc(struct asi *asi, \ base##_t *base, ulong addr, \ gfp_t flags) \ @@ -338,3 +340,237 @@ void asi_init_mm_state(struct mm_struct *mm) memset(mm->asi, 0, sizeof(mm->asi)); mutex_init(&mm->asi_init_lock); } + +static bool is_page_within_range(unsigned long addr, unsigned long page_size, + unsigned long range_start, unsigned long range_end) +{ + unsigned long page_start = ALIGN_DOWN(addr, page_size); + unsigned long page_end = page_start + page_size; + + return page_start >= range_start && page_end <= range_end; +} + +static bool follow_physaddr( + pgd_t *pgd_table, unsigned long virt, + phys_addr_t *phys, unsigned long *page_size, ulong *flags) +{ + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + + /* This may be written using lookup_address_in_*, see kcl/675039.
*/ + + *page_size = PGDIR_SIZE; + pgd = pgd_offset_pgd(pgd_table, virt); + if (!pgd_present(*pgd)) + return false; + if (pgd_leaf(*pgd)) { + *phys = PFN_PHYS(pgd_pfn(*pgd)) | (virt & ~PGDIR_MASK); + *flags = pgd_flags(*pgd); + return true; + } + + *page_size = P4D_SIZE; + p4d = p4d_offset(pgd, virt); + if (!p4d_present(*p4d)) + return false; + if (p4d_leaf(*p4d)) { + *phys = PFN_PHYS(p4d_pfn(*p4d)) | (virt & ~P4D_MASK); + *flags = p4d_flags(*p4d); + return true; + } + + *page_size = PUD_SIZE; + pud = pud_offset(p4d, virt); + if (!pud_present(*pud)) + return false; + if (pud_leaf(*pud)) { + *phys = PFN_PHYS(pud_pfn(*pud)) | (virt & ~PUD_MASK); + *flags = pud_flags(*pud); + return true; + } + + *page_size = PMD_SIZE; + pmd = pmd_offset(pud, virt); + if (!pmd_present(*pmd)) + return false; + if (pmd_leaf(*pmd)) { + *phys = PFN_PHYS(pmd_pfn(*pmd)) | (virt & ~PMD_MASK); + *flags = pmd_flags(*pmd); + return true; + } + + *page_size = PAGE_SIZE; + pte = pte_offset_map(pmd, virt); + if (!pte) + return false; + + if (!pte_present(*pte)) { + pte_unmap(pte); + return false; + } + + *phys = PFN_PHYS(pte_pfn(*pte)) | (virt & ~PAGE_MASK); + *flags = pte_flags(*pte); + + pte_unmap(pte); + return true; +} + +/* + * Map the given range into the ASI page tables. The source of the mapping is + * the regular unrestricted page tables. Can be used to map any kernel memory. + * + * The caller MUST ensure that the source mapping will not change during this + * function. For dynamic kernel memory, this is generally ensured by mapping the + * memory within the allocator. + * + * If this fails, it may leave partial mappings behind. You must asi_unmap them, + * bearing in mind asi_unmap's requirements on the calling context. Part of the + * reason for this is that we don't want to unexpectedly undo mappings that + * weren't created by the present caller. + * + * If the source mapping is a large page and the range being mapped spans the + * entire large page, then it will be mapped as a large page in the ASI page + * tables too. If the range does not span the entire huge page, then it will be + * mapped as smaller pages. In that case, the implementation is slightly + * inefficient, as it will walk the source page tables again for each small + * destination page, but that should be ok for now, as usually in such cases, + * the range would consist of a small-ish number of pages. + * + * Note that upstream + * (https://lore.kernel.org/all/20210317155843.c15e71f966f1e4da508dea04@linux-foundation.org/) + * vmap_p4d_range supports huge mappings. It is probably possible to use that + * logic instead of custom mapping duplication logic in later versions of ASI. 
+ */ +int __must_check asi_map_gfp(struct asi *asi, void *addr, unsigned long len, gfp_t gfp_flags) +{ + unsigned long virt; + unsigned long start = (size_t)addr; + unsigned long end = start + len; + unsigned long page_size; + + if (!static_asi_enabled()) + return 0; + + VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE)); + VM_BUG_ON(!IS_ALIGNED(len, PAGE_SIZE)); + VM_BUG_ON(!fault_in_kernel_space(start)); /* Misnamed, ignore "fault_" */ + + gfp_flags &= GFP_RECLAIM_MASK; + + if (asi->mm != &init_mm) + gfp_flags |= __GFP_ACCOUNT; + + for (virt = start; virt < end; virt = ALIGN(virt + 1, page_size)) { + pgd_t *pgd; + p4d_t *p4d; + pud_t *pud; + pmd_t *pmd; + pte_t *pte; + phys_addr_t phys; + ulong flags; + + if (!follow_physaddr(asi->mm->pgd, virt, &phys, &page_size, &flags)) + continue; + +#define MAP_AT_LEVEL(base, BASE, level, LEVEL) { \ + if (base##_leaf(*base)) { \ + if (WARN_ON_ONCE(PHYS_PFN(phys & BASE##_MASK) !=\ + base##_pfn(*base))) \ + return -EBUSY; \ + continue; \ + } \ + \ + level = asi_##level##_alloc(asi, base, virt, gfp_flags);\ + if (!level) \ + return -ENOMEM; \ + \ + if (page_size >= LEVEL##_SIZE && \ + (level##_none(*level) || level##_leaf(*level)) && \ + is_page_within_range(virt, LEVEL##_SIZE, \ + start, end)) { \ + page_size = LEVEL##_SIZE; \ + phys &= LEVEL##_MASK; \ + \ + if (!level##_none(*level)) { \ + if (WARN_ON_ONCE(level##_pfn(*level) != \ + PHYS_PFN(phys))) { \ + return -EBUSY; \ + } \ + } else { \ + set_##level(level, \ + __##level(phys | flags)); \ + } \ + continue; \ + } \ + } + + pgd = pgd_offset_pgd(asi->pgd, virt); + + MAP_AT_LEVEL(pgd, PGDIR, p4d, P4D); + MAP_AT_LEVEL(p4d, P4D, pud, PUD); + MAP_AT_LEVEL(pud, PUD, pmd, PMD); + /* + * If a large page is going to be partially mapped + * in 4k pages, convert the PSE/PAT bits. + */ + if (page_size >= PMD_SIZE) + flags = protval_large_2_4k(flags); + MAP_AT_LEVEL(pmd, PMD, pte, PAGE); + + VM_BUG_ON(true); /* Should never reach here. */ + } + + return 0; +#undef MAP_AT_LEVEL +} + +int __must_check asi_map(struct asi *asi, void *addr, unsigned long len) +{ + return asi_map_gfp(asi, addr, len, GFP_KERNEL); +} + +/* + * Unmap a kernel address range previously mapped into the ASI page tables. + * + * The area being unmapped must be a whole previously mapped region (or regions) + * Unmapping a partial subset of a previously mapped region is not supported. + * That will work, but may end up unmapping more than what was asked for, if + * the mapping contained huge pages. A later patch will remove this limitation + * by splitting the huge mapping in the ASI page table in such a case. For now, + * vunmap_pgd_range() will just emit a warning if this situation is detected. + * + * This might sleep, and cannot be called with interrupts disabled. 
+ */ +void asi_unmap(struct asi *asi, void *addr, size_t len) +{ + size_t start = (size_t)addr; + size_t end = start + len; + pgtbl_mod_mask mask = 0; + + if (!static_asi_enabled() || !len) + return; + + VM_BUG_ON(start & ~PAGE_MASK); + VM_BUG_ON(len & ~PAGE_MASK); + VM_BUG_ON(!fault_in_kernel_space(start)); /* Misnamed, ignore "fault_" */ + + vunmap_pgd_range(asi->pgd, start, end, &mask); + + /* We don't support partial unmappings - b/270310049 */ + if (mask & PGTBL_P4D_MODIFIED) { + VM_WARN_ON(!IS_ALIGNED((ulong)addr, P4D_SIZE)); + VM_WARN_ON(!IS_ALIGNED((ulong)len, P4D_SIZE)); + } else if (mask & PGTBL_PUD_MODIFIED) { + VM_WARN_ON(!IS_ALIGNED((ulong)addr, PUD_SIZE)); + VM_WARN_ON(!IS_ALIGNED((ulong)len, PUD_SIZE)); + } else if (mask & PGTBL_PMD_MODIFIED) { + VM_WARN_ON(!IS_ALIGNED((ulong)addr, PMD_SIZE)); + VM_WARN_ON(!IS_ALIGNED((ulong)len, PMD_SIZE)); + } + + asi_flush_tlb_range(asi, addr, len); +} diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c index e80cd67a5239e..36087d6238e6f 100644 --- a/arch/x86/mm/tlb.c +++ b/arch/x86/mm/tlb.c @@ -1026,6 +1026,11 @@ inline_or_noinstr u16 asi_pcid(struct asi *asi, u16 asid) return kern_pcid(asid) | ((asi->index + 1) << ASI_PCID_BITS_SHIFT); } +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) +{ + flush_tlb_kernel_range((ulong)addr, (ulong)addr + len); +} + #else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */ u16 asi_pcid(struct asi *asi, u16 asid) { return kern_pcid(asid); } diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index fa0bbf899a094..3956f995fe6a1 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -2,6 +2,8 @@ #ifndef __ASM_GENERIC_ASI_H #define __ASM_GENERIC_ASI_H +#include + #ifndef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION #define ASI_MAX_NUM_ORDER 0 @@ -58,6 +60,17 @@ static inline int asi_intr_nest_depth(void) { return 0; } static inline void asi_intr_exit(void) { } +static inline int asi_map(struct asi *asi, void *addr, size_t len) +{ + return 0; +} + +static inline +void asi_unmap(struct asi *asi, void *addr, size_t len) { } + +static inline +void asi_flush_tlb_range(struct asi *asi, void *addr, size_t len) { } + #define static_asi_enabled() false static inline void asi_check_boottime_disable(void) { } diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h index 85fc7554cd52b..4884dfc6e699b 100644 --- a/include/linux/pgtable.h +++ b/include/linux/pgtable.h @@ -1788,6 +1788,9 @@ typedef unsigned int pgtbl_mod_mask; #ifndef pmd_leaf #define pmd_leaf(x) false #endif +#ifndef pte_leaf +#define pte_leaf(x) 1 +#endif #ifndef pgd_leaf_size #define pgd_leaf_size(x) (1ULL << PGDIR_SHIFT) diff --git a/mm/internal.h b/mm/internal.h index 07ad2675a88b4..8a8f98e119dfa 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -217,6 +217,8 @@ void unmap_page_range(struct mmu_gather *tlb, void page_cache_ra_order(struct readahead_control *, struct file_ra_state *, unsigned int order); void force_page_cache_ra(struct readahead_control *, unsigned long nr); +void vunmap_pgd_range(pgd_t *pgd_table, unsigned long addr, unsigned long end, + pgtbl_mod_mask *mask); static inline void force_page_cache_readahead(struct address_space *mapping, struct file *file, pgoff_t index, unsigned long nr_to_read) { diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 125427cbdb87b..7a8daf5afb7cc 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -419,6 +419,24 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, } while (p4d++, addr = next, addr != end); } +void 
vunmap_pgd_range(pgd_t *pgd_table, unsigned long addr, unsigned long end, + pgtbl_mod_mask *mask) +{ + unsigned long next; + pgd_t *pgd = pgd_offset_pgd(pgd_table, addr); + + BUG_ON(addr >= end); + + do { + next = pgd_addr_end(addr, end); + if (pgd_bad(*pgd)) + *mask |= PGTBL_PGD_MODIFIED; + if (pgd_none_or_clear_bad(pgd)) + continue; + vunmap_p4d_range(pgd, addr, next, mask); + } while (pgd++, addr = next, addr != end); +} + /* * vunmap_range_noflush is similar to vunmap_range, but does not * flush caches or TLBs. @@ -433,21 +451,9 @@ static void vunmap_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end, */ void __vunmap_range_noflush(unsigned long start, unsigned long end) { - unsigned long next; - pgd_t *pgd; - unsigned long addr = start; pgtbl_mod_mask mask = 0; - BUG_ON(addr >= end); - pgd = pgd_offset_k(addr); - do { - next = pgd_addr_end(addr, end); - if (pgd_bad(*pgd)) - mask |= PGTBL_PGD_MODIFIED; - if (pgd_none_or_clear_bad(pgd)) - continue; - vunmap_p4d_range(pgd, addr, next, &mask); - } while (pgd++, addr = next, addr != end); + vunmap_pgd_range(init_mm.pgd, start, end, &mask); if (mask & ARCH_PAGE_TABLE_SYNC_MASK) arch_sync_kernel_mappings(start, end);

From patchwork Fri Jul 12 17:00:32 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732024
Date: Fri, 12 Jul 2024 17:00:32 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-14-144b319a40d8@google.com>
Subject: [PATCH 14/26] mm: asi: Add basic infrastructure for global non-sensitive mappings
From: Brendan Jackman

From: Junaid Shahid

A pseudo-PGD is added to store global non-sensitive ASI mappings. Actual ASI PGDs copy entries from this pseudo-PGD during asi_init(). Memory can be mapped as globally non-sensitive by calling asi_map() with ASI_GLOBAL_NONSENSITIVE. Page tables allocated for global non-sensitive mappings are never freed.

While a previous version used init_mm.asi[0] as the special global nonsensitive domain, here we have tried to avoid special-casing index 0. So now we have a special global variable for that.
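For example (a hypothetical call site; asi_map() is the function added in the previous patch), sharing memory with every restricted address space is then a single call, with no per-domain propagation needed, since the shared lower-level tables are reachable from every domain's PGD:

	/* Hypothetical: mark an allocation as globally non-sensitive, i.e.
	 * mapped into the restricted address space of every ASI domain. */
	int err = asi_map(ASI_GLOBAL_NONSENSITIVE, buf, PAGE_SIZE);

	if (err)
		/* A failed asi_map() can leave partial mappings to clean up. */
		asi_unmap(ASI_GLOBAL_NONSENSITIVE, buf, PAGE_SIZE);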
For this to work we need to make sure that nobody assumes that asi is a member of asi->mm->asi (also that nobody assumes a struct asi is embedded in a struct mm - but that seems like a weird assumption to make anyway, when you already have the .mm pointer). I currently believe that this is worth it for the reduced level of magic in the code. Signed-off-by: Junaid Shahid Signed-off-by: Brendan Jackman --- arch/x86/include/asm/asi.h | 3 +++ arch/x86/mm/asi.c | 37 +++++++++++++++++++++++++++++++++++++ arch/x86/mm/init_64.c | 25 ++++++++++++++++--------- arch/x86/mm/mm_internal.h | 3 +++ include/asm-generic/asi.h | 2 ++ 5 files changed, 61 insertions(+), 9 deletions(-) diff --git a/arch/x86/include/asm/asi.h b/arch/x86/include/asm/asi.h index 9aad843eb6df..2d86a5c17f2b 100644 --- a/arch/x86/include/asm/asi.h +++ b/arch/x86/include/asm/asi.h @@ -78,6 +78,9 @@ */ #define ASI_MAX_NUM ((1 << ASI_MAX_NUM_ORDER) - 1) +extern struct asi __asi_global_nonsensitive; +#define ASI_GLOBAL_NONSENSITIVE (&__asi_global_nonsensitive) + struct asi_hooks { /* * Both of these functions MUST be idempotent and re-entrant. They will diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index e43b206450ad..807d51497f43 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -11,6 +11,7 @@ #include #include +#include "mm_internal.h" #include "../../../mm/internal.h" static struct asi_class asi_class[ASI_MAX_NUM]; @@ -19,6 +20,13 @@ static DEFINE_SPINLOCK(asi_class_lock); DEFINE_PER_CPU_ALIGNED(struct asi *, curr_asi); EXPORT_SYMBOL(curr_asi); +static __aligned(PAGE_SIZE) pgd_t asi_global_nonsensitive_pgd[PTRS_PER_PGD]; + +struct asi __asi_global_nonsensitive = { + .pgd = asi_global_nonsensitive_pgd, + .mm = &init_mm, +}; + static inline bool asi_class_registered(int index) { return asi_class[index].name != NULL; @@ -154,6 +162,31 @@ void __init asi_check_boottime_disable(void) pr_info("ASI enablement ignored due to incomplete implementation.\n"); } +static int __init asi_global_init(void) +{ + if (!boot_cpu_has(X86_FEATURE_ASI)) + return 0; + + /* + * Lower-level pagetables for global nonsensitive mappings are shared, + * but the PGD has to be copied into each domain during asi_init. To + * avoid needing to synchronize new mappings into pre-existing domains + * we just pre-allocate all of the relevant level N-1 entries so that + * the global nonsensitive PGD already has pointers that can be copied + * when new domains get asi_init()ed. 
+ */ + preallocate_sub_pgd_pages(asi_global_nonsensitive_pgd, + PAGE_OFFSET, + PAGE_OFFSET + PFN_PHYS(max_pfn) - 1, + "ASI Global Non-sensitive direct map"); + preallocate_sub_pgd_pages(asi_global_nonsensitive_pgd, + VMALLOC_START, VMALLOC_END, + "ASI Global Non-sensitive vmalloc"); + + return 0; +} +subsys_initcall(asi_global_init); + static void __asi_destroy(struct asi *asi) { WARN_ON_ONCE(asi->ref_count <= 0); @@ -168,6 +201,7 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) { struct asi *asi; int err = 0; + uint i; *out_asi = NULL; @@ -203,6 +237,9 @@ int asi_init(struct mm_struct *mm, int asi_index, struct asi **out_asi) asi->mm = mm; asi->index = asi_index; + for (i = KERNEL_PGD_BOUNDARY; i < PTRS_PER_PGD; i++) + set_pgd(asi->pgd + i, asi_global_nonsensitive_pgd[i]); + exit_unlock: if (err) __asi_destroy(asi); diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 7e177856ee4f..f67f4637357c 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -1278,18 +1278,15 @@ static void __init register_page_bootmem_info(void) #endif } -/* - * Pre-allocates page-table pages for the vmalloc area in the kernel page-table. - * Only the level which needs to be synchronized between all page-tables is - * allocated because the synchronization can be expensive. - */ -static void __init preallocate_vmalloc_pages(void) +/* Initialize empty pagetables at the level below PGD. */ +void __init preallocate_sub_pgd_pages(pgd_t *pgd_table, ulong start, + ulong end, const char *name) { unsigned long addr; const char *lvl; - for (addr = VMALLOC_START; addr <= VMEMORY_END; addr = ALIGN(addr + 1, PGDIR_SIZE)) { - pgd_t *pgd = pgd_offset_k(addr); + for (addr = start; addr <= end; addr = ALIGN(addr + 1, PGDIR_SIZE)) { + pgd_t *pgd = pgd_offset_pgd(pgd_table, addr); p4d_t *p4d; pud_t *pud; @@ -1325,7 +1322,17 @@ static void __init preallocate_vmalloc_pages(void) * The pages have to be there now or they will be missing in * process page-tables later. */ - panic("Failed to pre-allocate %s pages for vmalloc area\n", lvl); + panic("Failed to pre-allocate %s pages for %s area\n", lvl, name); +} + /* * Pre-allocates page-table pages for the vmalloc area in the kernel page-table. * Only the level which needs to be synchronized between all page-tables is * allocated because the synchronization can be expensive.
+ */ +static void __init preallocate_vmalloc_pages(void) +{ + preallocate_sub_pgd_pages(init_mm.pgd, VMALLOC_START, VMEMORY_END, "vmalloc"); +} void __init mem_init(void) diff --git a/arch/x86/mm/mm_internal.h b/arch/x86/mm/mm_internal.h index 3f37b5c80bb3..1203a977edcd 100644 --- a/arch/x86/mm/mm_internal.h +++ b/arch/x86/mm/mm_internal.h @@ -25,4 +25,7 @@ void update_cache_mode_entry(unsigned entry, enum page_cache_mode cache); extern unsigned long tlb_single_page_flush_ceiling; +extern void preallocate_sub_pgd_pages(pgd_t *pgd_table, ulong start, + ulong end, const char *name); + #endif /* __X86_MM_INTERNAL_H */ diff --git a/include/asm-generic/asi.h b/include/asm-generic/asi.h index 3956f995fe6a..fd5a302e0e09 100644 --- a/include/asm-generic/asi.h +++ b/include/asm-generic/asi.h @@ -9,6 +9,8 @@ #define ASI_MAX_NUM_ORDER 0 #define ASI_MAX_NUM 0 +#define ASI_GLOBAL_NONSENSITIVE NULL + #ifndef _ASSEMBLY_ struct asi_hooks {};

From patchwork Fri Jul 12 17:00:33 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732025
Date: Fri, 12 Jul 2024 17:00:33 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-15-144b319a40d8@google.com>
Subject: [PATCH 15/26] mm: Add __PAGEFLAG_FALSE
From: Brendan Jackman

__PAGEFLAG_FALSE is a non-atomic equivalent of PAGEFLAG_FALSE.
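A hypothetical usage sketch (the flag name here is illustrative, not from this series): like PAGEFLAG_FALSE, it lets a config-dependent flag compile away, but for the non-atomic __SetPage*/__ClearPage* accessors:

#ifdef CONFIG_MY_FEATURE
__PAGEFLAG(MyFlag, my_flag, PF_ANY)
#else
__PAGEFLAG_FALSE(MyFlag, my_flag)
#endif

/* Callers can then use the non-atomic setter unconditionally: */
__SetPageMyFlag(page);	/* becomes a no-op when CONFIG_MY_FEATURE=n */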
Signed-off-by: Brendan Jackman --- include/linux/page-flags.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 4bf1c25fd1dc5..57fa58899a661 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -488,6 +488,10 @@ static inline int Page##uname(const struct page *page) { return 0; } FOLIO_SET_FLAG_NOOP(lname) \ static inline void SetPage##uname(struct page *page) { } +#define __SETPAGEFLAG_NOOP(uname, lname) \ +static inline void __folio_set_##lname(struct folio *folio) { } \ +static inline void __SetPage##uname(struct page *page) { } + #define CLEARPAGEFLAG_NOOP(uname, lname) \ FOLIO_CLEAR_FLAG_NOOP(lname) \ static inline void ClearPage##uname(struct page *page) { } @@ -510,6 +514,9 @@ static inline int TestClearPage##uname(struct page *page) { return 0; } #define TESTSCFLAG_FALSE(uname, lname) \ TESTSETFLAG_FALSE(uname, lname) TESTCLEARFLAG_FALSE(uname, lname) +#define __PAGEFLAG_FALSE(uname, lname) TESTPAGEFLAG_FALSE(uname, lname) \ + __SETPAGEFLAG_NOOP(uname, lname) __CLEARPAGEFLAG_NOOP(uname, lname) + __PAGEFLAG(Locked, locked, PF_NO_TAIL) FOLIO_FLAG(waiters, FOLIO_HEAD_PAGE) PAGEFLAG(Error, error, PF_NO_TAIL) TESTCLEARFLAG(Error, error, PF_NO_TAIL) From patchwork Fri Jul 12 17:00:34 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brendan Jackman X-Patchwork-Id: 13732026 Received: from mail-wr1-f73.google.com (mail-wr1-f73.google.com [209.85.221.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 28DBC17C7B6 for ; Fri, 12 Jul 2024 17:01:40 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.221.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720803702; cv=none; b=ELJjKZhMu3Ht5jEj5iZXbl0AmDxjUiNk+Q4c2VCInYOPbFVClFvJGpyhSg3EXZ+thiwtTRA02mHnzAAeMh6lBZJ0vxQkosGAEkytIF6GcmXjNuv1GlG1j0xqTDsxqLj6Vy7u+dm45B+9PBqx/008ick3cbKLXBKgcXtXX+3Kd4Y= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720803702; c=relaxed/simple; bh=pTXcmnTga+Jc6ez8CkNFqLYlbuHN6bbyvm/QWL3+3+8=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=tCpkeaJ6FdjeSEqjpNP1YRqyKRtz076pRSx+Bsq2K+mCv7VqqhwjHSY7G3BByIWzKmrCncplXWm9vxz1hL0OMAZSnGLmmpADkYuQz8EKL/7hWO6cxj8WvsA2eDm8nwIZ7c5xIhQNMG6mAserZztuLs1wgf9sMEUv3C6w8xVh0qo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=CuGiG1m4; arc=none smtp.client-ip=209.85.221.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="CuGiG1m4" Received: by mail-wr1-f73.google.com with SMTP id ffacd0b85a97d-36793373454so1384018f8f.1 for ; Fri, 12 Jul 2024 10:01:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720803699; x=1721408499; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
Date: Fri, 12 Jul 2024 17:00:34 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-16-144b319a40d8@google.com>
Subject: [PATCH 16/26] mm: asi: Map non-user buddy allocations as nonsensitive
From: Brendan Jackman

This is just the simplest possible page_alloc patch I could come up with
to demonstrate ASI working in a "denylist" mode: we map the direct map
into the restricted address space, except pages allocated with GFP_USER.

Pages must be asi_unmap()'d before they can be re-allocated. This
requires a TLB flush, which can't generally be done from the free path
(it requires IRQs on), so pages that need unmapping are freed via a
workqueue.

This solution is silly for at least the following reasons:

- If the async queue gets long, we'll run out of allocatable memory.
- We don't batch the TLB flushing or worker wakeups at all.

- We drop FPI flags and skip the pcplists.

Internally at Google we've found so far that, with plenty of extra
complexity, we're able to make this principle work for the workloads
we've tested, but it seems likely we'll hit a wall where tuning gets
impossible. So instead, for the [PATCH] version I hope to come up with
an implementation that just makes the allocator more deeply aware of
sensitivity; most likely this will look a bit like an extra "dimension",
like movability etc. This was discussed at LSF/MM/BPF [1] but I haven't
made time to experiment on it yet. With this smarter approach, it should
also be possible to remove the pageflag, as other contextual information
will let us know if a page is mapped in the restricted address space
(the page tables also reflect this status...).

[1] https://youtu.be/WD9-ey8LeiI

The main thing in here that is "real" and may warrant discussion is
__GFP_SENSITIVE (or at least, some sort of allocator switch to determine
sensitivity; in an "allowlist" model we would probably have the
opposite, and in future iterations we might want additional options for
different "types" of sensitivity). I think we need this as an extension
to the allocation API; the main alternative would be to infer from the
context of the allocation whether the data should be treated as
sensitive. However, I think we will have contexts where both sensitive
and nonsensitive data needs to be allocatable.

If there are concerns about __GFP flags specifically, rather than just
the general problem of expanding the allocator API, we could always
provide an API like __alloc_pages_sensitive or something, implemented
with ALLOC_ flags internally.

Signed-off-by: Brendan Jackman
---
 arch/x86/mm/asi.c              |  33 +++++++++-
 include/linux/gfp_types.h      |  15 ++++-
 include/linux/page-flags.h     |   9 +++
 include/trace/events/mmflags.h |  12 +++-
 mm/page_alloc.c                | 143 ++++++++++++++++++++++++++++++++++++++++-
 tools/perf/builtin-kmem.c      |   1 +
 6 files changed, 208 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 807d51497f43a..6e106f25abbb9 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -5,6 +5,8 @@
 #include
 #include
+#include
+
 #include
 #include
 #include
@@ -102,10 +104,17 @@ EXPORT_SYMBOL_GPL(asi_unregister_class);
  *   allocator from interrupts and the page allocator ultimately calls this
  *   code.
  * - They support customizing the allocation flags.
+ * - They avoid infinite recursion when the page allocator calls back to
+ *   asi_map
  *
  * On the other hand, they do not use the normal page allocation infrastructure,
  * that means that PTE pages do not have the PageTable type nor the PagePgtable
  * flag and we don't increment the meminfo stat (NR_PAGETABLE) as they do.
+ *
+ * As an optimisation we attempt to map the pagetables in
+ * ASI_GLOBAL_NONSENSITIVE, but this can fail, and for simplicity we don't do
+ * anything about that. This means it's invalid to access ASI pagetables from a
+ * critical section.
  */
 static_assert(!IS_ENABLED(CONFIG_PARAVIRT));
 #define DEFINE_ASI_PGTBL_ALLOC(base, level) \
@@ -114,8 +123,11 @@ static level##_t * asi_##level##_alloc(struct asi *asi, \
 					gfp_t flags) \
 { \
 	if (unlikely(base##_none(*base))) { \
-		ulong pgtbl = get_zeroed_page(flags); \
+		/* Stop asi_map calls causing recursive allocation */ \
+		gfp_t pgtbl_gfp = flags | __GFP_SENSITIVE; \
+		ulong pgtbl = get_zeroed_page(pgtbl_gfp); \
 		phys_addr_t pgtbl_pa; \
+		int err; \
 		\
 		if (!pgtbl) \
 			return NULL; \
@@ -129,6 +141,16 @@ static level##_t * asi_##level##_alloc(struct asi *asi, \
 	} \
 	\
 	mm_inc_nr_##level##s(asi->mm); \
+	\
+	err = asi_map_gfp(ASI_GLOBAL_NONSENSITIVE, \
+			  (void *)pgtbl, PAGE_SIZE, flags); \
+	if (err) \
+		/* Should be rare. Spooky. */ \
+		pr_warn_ratelimited("Created sensitive ASI %s (%pK, maps %luK).\n", \
+				    #level, (void *)pgtbl, addr); \
+	else \
+		__SetPageGlobalNonSensitive(virt_to_page(pgtbl)); \
+	\
 	} \
 out: \
 	VM_BUG_ON(base##_leaf(*base)); \
@@ -469,6 +491,9 @@ static bool follow_physaddr(
  * reason for this is that we don't want to unexpectedly undo mappings that
  * weren't created by the present caller.
  *
+ * This must not be called from the critical section, as ASI's pagetables are
+ * not guaranteed to be mapped in the restricted address space.
+ *
  * If the source mapping is a large page and the range being mapped spans the
  * entire large page, then it will be mapped as a large page in the ASI page
  * tables too. If the range does not span the entire huge page, then it will be
@@ -492,6 +517,9 @@ int __must_check asi_map_gfp(struct asi *asi, void *addr, unsigned long len, gfp
 	if (!static_asi_enabled())
 		return 0;

+	/* ASI pagetables might be sensitive. */
+	WARN_ON_ONCE(asi_in_critical_section());
+
 	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
 	VM_BUG_ON(!IS_ALIGNED(len, PAGE_SIZE));
 	VM_BUG_ON(!fault_in_kernel_space(start)); /* Misnamed, ignore "fault_" */
@@ -591,6 +619,9 @@ void asi_unmap(struct asi *asi, void *addr, size_t len)
 	if (!static_asi_enabled() || !len)
 		return;

+	/* ASI pagetables might be sensitive. */
+	WARN_ON_ONCE(asi_in_critical_section());
+
 	VM_BUG_ON(start & ~PAGE_MASK);
 	VM_BUG_ON(len & ~PAGE_MASK);
 	VM_BUG_ON(!fault_in_kernel_space(start)); /* Misnamed, ignore "fault_" */
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 13becafe41df0..d33953a1c9b28 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -55,6 +55,7 @@ enum {
 #ifdef CONFIG_LOCKDEP
 	___GFP_NOLOCKDEP_BIT,
 #endif
+	___GFP_SENSITIVE_BIT,
 	___GFP_LAST_BIT
 };

@@ -95,6 +96,11 @@ enum {
 #else
 #define ___GFP_NOLOCKDEP 0
 #endif
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define ___GFP_SENSITIVE BIT(___GFP_SENSITIVE_BIT)
+#else
+#define ___GFP_SENSITIVE 0
+#endif

 /*
  * Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -284,6 +290,12 @@ enum {
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)

+/*
+ * Allocate sensitive memory, i.e. do not map it into ASI's restricted address
+ * space.
+ */
+#define __GFP_SENSITIVE ((__force gfp_t)___GFP_SENSITIVE)
+
 /* Room for N __GFP_FOO bits */
 #define __GFP_BITS_SHIFT ___GFP_LAST_BIT
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -365,7 +377,8 @@ enum {
 #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
 #define GFP_NOIO	(__GFP_RECLAIM)
 #define GFP_NOFS	(__GFP_RECLAIM | __GFP_IO)
-#define GFP_USER	(__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
+#define GFP_USER	(__GFP_RECLAIM | __GFP_IO | __GFP_FS | \
+			 __GFP_HARDWALL | __GFP_SENSITIVE)
 #define GFP_DMA		__GFP_DMA
 #define GFP_DMA32	__GFP_DMA32
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 57fa58899a661..d4842cd1fb59a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -135,6 +135,9 @@ enum pageflags {
 #ifdef CONFIG_ARCH_USES_PG_ARCH_X
 	PG_arch_2,
 	PG_arch_3,
+#endif
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+	PG_global_nonsensitive,
 #endif
 	__NR_PAGEFLAGS,

@@ -642,6 +645,12 @@ FOLIO_TEST_CLEAR_FLAG(young, FOLIO_HEAD_PAGE)
 FOLIO_FLAG(idle, FOLIO_HEAD_PAGE)
 #endif

+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+__PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY);
+#else
+__PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive);
+#endif
+
 /*
  * PageReported() is used to track reported free pages within the Buddy
  * allocator. We can use the non-atomic version of the test and set
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index d55e53ac91bd2..416a79fe1a66d 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -50,7 +50,8 @@
 	gfpflag_string(__GFP_RECLAIM), \
 	gfpflag_string(__GFP_DIRECT_RECLAIM), \
 	gfpflag_string(__GFP_KSWAPD_RECLAIM), \
-	gfpflag_string(__GFP_ZEROTAGS)
+	gfpflag_string(__GFP_ZEROTAGS), \
+	gfpflag_string(__GFP_SENSITIVE)

 #ifdef CONFIG_KASAN_HW_TAGS
 #define __def_gfpflag_names_kasan , \
@@ -95,6 +96,12 @@
 #define IF_HAVE_PG_ARCH_X(_name)
 #endif

+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define IF_HAVE_ASI(_name) ,{1UL << PG_##_name, __stringify(_name)}
+#else
+#define IF_HAVE_ASI(_name)
+#endif
+
 #define DEF_PAGEFLAG_NAME(_name) { 1UL << PG_##_name, __stringify(_name) }

 #define __def_pageflag_names \
@@ -125,7 +132,8 @@
 IF_HAVE_PG_HWPOISON(hwpoison) \
 IF_HAVE_PG_IDLE(idle) \
 IF_HAVE_PG_IDLE(young) \
 IF_HAVE_PG_ARCH_X(arch_2) \
-IF_HAVE_PG_ARCH_X(arch_3)
+IF_HAVE_PG_ARCH_X(arch_3) \
+IF_HAVE_ASI(global_nonsensitive)

 #define show_page_flags(flags) \
 	(flags) ? __print_flags(flags, "|", \
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d3367..1e71ee9ae178c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1081,6 +1081,8 @@ static void kernel_init_pages(struct page *page, int numpages)
 	kasan_enable_current();
 }

+static bool asi_async_free_enqueue(struct page *page, unsigned int order);
+
 __always_inline bool free_pages_prepare(struct page *page,
 			unsigned int order)
 {
@@ -1177,7 +1179,7 @@ __always_inline bool free_pages_prepare(struct page *page,

 	debug_pagealloc_unmap_pages(page, 1 << order);

-	return true;
+	return !asi_async_free_enqueue(page, order);
 }

 /*
@@ -4364,6 +4366,136 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 	return true;
 }

+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+
+struct asi_async_free_cpu_state {
+	struct work_struct work;
+	struct list_head to_free;
+};
+static DEFINE_PER_CPU(struct asi_async_free_cpu_state, asi_async_free_cpu_state);
+
+static bool async_free_work_initialized;
+
+static void asi_async_free_work_fn(struct work_struct *work)
+{
+	struct asi_async_free_cpu_state *cpu_state =
+		container_of(work, struct asi_async_free_cpu_state, work);
+	struct page *page, *tmp;
+	struct list_head to_free = LIST_HEAD_INIT(to_free);
+
+	local_irq_disable();
+	list_splice_init(&cpu_state->to_free, &to_free);
+	local_irq_enable(); /* IRQs must be on for asi_unmap. */
+
+	/* Use _safe because __free_the_page uses .lru */
+	list_for_each_entry_safe(page, tmp, &to_free, lru) {
+		unsigned long order = page_private(page);
+
+		asi_unmap(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page),
+			  PAGE_SIZE << order);
+		for (int i = 0; i < (1 << order); i++)
+			__ClearPageGlobalNonSensitive(page + i);
+
+		/*
+		 * Note weird loop-de-loop here, we might already have called
+		 * __free_pages_ok for this page, but now we've cleared
+		 * PageGlobalNonSensitive so it won't end up back on the queue
+		 * again.
+		 */
+		__free_pages_ok(page, order, FPI_NONE);
+		cond_resched();
+	}
+}
+
+/* Returns true if the page was queued for asynchronous freeing. */
+static bool asi_async_free_enqueue(struct page *page, unsigned int order)
+{
+	struct asi_async_free_cpu_state *cpu_state;
+	unsigned long flags;
+
+	if (!PageGlobalNonSensitive(page))
+		return false;
+
+	local_irq_save(flags);
+	cpu_state = this_cpu_ptr(&asi_async_free_cpu_state);
+	set_page_private(page, order);
+	list_add(&page->lru, &cpu_state->to_free);
+	local_irq_restore(flags);
+
+	return true;
+}
+
+static int __init asi_page_alloc_init(void)
+{
+	int cpu;
+
+	if (!static_asi_enabled())
+		return 0;
+
+	for_each_possible_cpu(cpu) {
+		struct asi_async_free_cpu_state *cpu_state
+			= &per_cpu(asi_async_free_cpu_state, cpu);
+
+		INIT_WORK(&cpu_state->work, asi_async_free_work_fn);
+		INIT_LIST_HEAD(&cpu_state->to_free);
+	}
+
+	/*
+	 * This function is called before SMP is initialized, so we can assume
+	 * that this is the only running CPU at this point.
+	 */
+
+	barrier();
+	async_free_work_initialized = true;
+	barrier();
+
+	return 0;
+}
+early_initcall(asi_page_alloc_init);
+
+static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_mask)
+{
+
+	if (!static_asi_enabled())
+		return 0;
+
+	if (!(gfp_mask & __GFP_SENSITIVE)) {
+		int err = asi_map_gfp(
+			ASI_GLOBAL_NONSENSITIVE, page_to_virt(page),
+			PAGE_SIZE * (1 << order), gfp_mask);
+		uint i;
+
+		if (err)
+			return err;
+
+		for (i = 0; i < (1 << order); i++)
+			__SetPageGlobalNonSensitive(page + i);
+	}
+
+	return 0;
+}
+
+#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
+static inline
+int asi_map_alloced_pages(struct page *pages, uint order, gfp_t gfp_mask)
+{
+	return 0;
+}
+
+static inline
+bool asi_unmap_freed_pages(struct page *page, unsigned int order)
+{
+	return true;
+}
+
+static bool asi_async_free_enqueue(struct page *page, unsigned int order)
+{
+	return false;
+}
+
+#endif
+
 /*
  * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
  * @gfp: GFP flags for the allocation
@@ -4551,6 +4683,10 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
 		return NULL;

+	/* Clear out old (maybe sensitive) data before reallocating as nonsensitive. */
+	if (!static_asi_enabled() && !(gfp & __GFP_SENSITIVE))
+		gfp |= __GFP_ZERO;
+
 	gfp &= gfp_allowed_mask;
 	/*
 	 * Apply scoped allocation constraints. This is mainly about GFP_NOFS
@@ -4597,6 +4733,11 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
 	kmsan_alloc_page(page, order, alloc_gfp);

+	if (page && unlikely(asi_map_alloced_pages(page, order, gfp))) {
+		__free_pages(page, order);
+		page = NULL;
+	}
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages);
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 9714327fd0ead..912497b7b1c3f 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -682,6 +682,7 @@ static const struct {
 	{ "__GFP_RECLAIM",		"R" },
 	{ "__GFP_DIRECT_RECLAIM",	"DR" },
 	{ "__GFP_KSWAPD_RECLAIM",	"KR" },
+	{ "__GFP_SENSITIVE",		"S" },
 };
 static size_t max_gfp_len;

From patchwork Fri Jul 12 17:00:35 2024
Date: Fri, 12 Jul 2024 17:00:35 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-17-144b319a40d8@google.com>
Subject: [PATCH 17/26] mm: asi: Map kernel text and static data as nonsensitive
From: Brendan Jackman
Peter Anvin" , Andy Lutomirski , Peter Zijlstra , Sean Christopherson , Paolo Bonzini , Alexandre Chartre , Liran Alon , Jan Setje-Eilers , Catalin Marinas , Will Deacon , Mark Rutland , Andrew Morton , Mel Gorman , Lorenzo Stoakes , David Hildenbrand , Vlastimil Babka , Michal Hocko , Khalid Aziz , Juri Lelli , Vincent Guittot , Dietmar Eggemann , Steven Rostedt , Valentin Schneider , Paul Turner , Reiji Watanabe , Junaid Shahid , Ofir Weisse , Yosry Ahmed , Patrick Bellasi , KP Singh , Alexandra Sandulescu , Matteo Rizzo , Jann Horn Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman Basically we need to map the kernel code and all its static variables. Per-CPU variables need to be treated specially as described in the comments. The cpu_entry_area is similar - this needs to be nonsensitive so that the CPU can access the GDT etc when handling a page fault. Under 5-level paging, most of the kernel memory comes under a single PGD entry (see Documentation/x86/x86_64/mm.rst. Basically, the mapping is for this big region is the same as under 4-level, just wrapped in an outer PGD entry). For that region, the "clone" logic is moved down one step of the paging hierarchy. Note that the p4d_alloc in asi_clone_p4d won't actually be used in practice; the relevant PGD entry will always have been populated by prior asi_map calls so this code would "work" if we just wrote p4d_offset (but asi_clone_p4d would be broken if viewed in isolation). The vmemmap area is not under this single PGD, it has its own 2-PGD area, so we still use asi_clone_pgd for that one. Signed-off-by: Brendan Jackman --- arch/x86/mm/asi.c | 106 +++++++++++++++++++++++++++++++++++++- include/asm-generic/vmlinux.lds.h | 11 ++++ 2 files changed, 116 insertions(+), 1 deletion(-) diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c index 6e106f25abbb..891b8d351df8 100644 --- a/arch/x86/mm/asi.c +++ b/arch/x86/mm/asi.c @@ -7,8 +7,8 @@ #include #include -#include #include +#include #include #include #include @@ -184,8 +184,68 @@ void __init asi_check_boottime_disable(void) pr_info("ASI enablement ignored due to incomplete implementation.\n"); } +/* + * Map data by sharing sub-PGD pagetables with the unrestricted mapping. This is + * more efficient than asi_map, but only works when you know the whole top-level + * page needs to be mapped in the restricted tables. Note that the size of the + * mappings this creates differs between 4 and 5-level paging. + */ +static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr) +{ + pgd_t *src = pgd_offset_pgd(src_table, addr); + pgd_t *dst = pgd_offset_pgd(dst_table, addr); + + if (!pgd_val(*dst)) + set_pgd(dst, *src); + else + WARN_ON_ONCE(pgd_val(*dst) != pgd_val(*src)); +} + +/* + * For 4-level paging this is exactly the same as asi_clone_pgd. For 5-level + * paging it clones one level lower. So this always creates a mapping of the + * same size. + */ +static void asi_clone_p4d(pgd_t *dst_table, pgd_t *src_table, size_t addr) +{ + pgd_t *src_pgd = pgd_offset_pgd(src_table, addr); + pgd_t *dst_pgd = pgd_offset_pgd(dst_table, addr); + p4d_t *src_p4d = p4d_alloc(&init_mm, src_pgd, addr); + p4d_t *dst_p4d = p4d_alloc(&init_mm, dst_pgd, addr); + + if (!p4d_val(*dst_p4d)) + set_p4d(dst_p4d, *src_p4d); + else + WARN_ON_ONCE(p4d_val(*dst_p4d) != p4d_val(*src_p4d)); +} + +/* + * percpu_addr is where the linker put the percpu variable. asi_map_percpu finds + * the place where the percpu allocator copied the data during boot. 
+ * + * This is necessary even when the page allocator defaults to + * global-nonsensitive, because the percpu allocator uses the memblock allocator + * for early allocations. + */ +static int asi_map_percpu(struct asi *asi, void *percpu_addr, size_t len) +{ + int cpu, err; + void *ptr; + + for_each_possible_cpu(cpu) { + ptr = per_cpu_ptr(percpu_addr, cpu); + err = asi_map(asi, ptr, len); + if (err) + return err; + } + + return 0; +} + static int __init asi_global_init(void) { + int err; + if (!boot_cpu_has(X86_FEATURE_ASI)) return 0; @@ -205,6 +265,46 @@ static int __init asi_global_init(void) VMALLOC_START, VMALLOC_END, "ASI Global Non-sensitive vmalloc"); + /* Map all kernel text and static data */ + err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)__START_KERNEL, + (size_t)_end - __START_KERNEL); + if (WARN_ON(err)) + return err; + err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)FIXADDR_START, + FIXADDR_SIZE); + if (WARN_ON(err)) + return err; + /* Map all static percpu data */ + err = asi_map_percpu( + ASI_GLOBAL_NONSENSITIVE, + __per_cpu_start, __per_cpu_end - __per_cpu_start); + if (WARN_ON(err)) + return err; + + /* + * The next areas are mapped using shared sub-P4D paging structures + * (asi_clone_p4d instead of asi_map), since we know the whole P4D will + * be mapped. + */ + asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd, + CPU_ENTRY_AREA_BASE); +#ifdef CONFIG_X86_ESPFIX64 + asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd, + ESPFIX_BASE_ADDR); +#endif + /* + * The vmemmap area actually _must_ be cloned via shared paging + * structures, since mappings can potentially change dynamically when + * hugetlbfs pages are created or broken down. + * + * We always clone 2 PGDs, this is a corrolary of the sizes of struct + * page, a page, and the physical address space. + */ + WARN_ON(sizeof(struct page) * MAXMEM / PAGE_SIZE != 2 * (1UL << PGDIR_SHIFT)); + asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd, VMEMMAP_START); + asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd, + VMEMMAP_START + (1UL << PGDIR_SHIFT)); + return 0; } subsys_initcall(asi_global_init) @@ -482,6 +582,10 @@ static bool follow_physaddr( * Map the given range into the ASI page tables. The source of the mapping is * the regular unrestricted page tables. Can be used to map any kernel memory. * + * In contrast to some internal ASI logic (asi_clone_pgd and asi_clone_p4d) this + * never shares pagetables between restricted and unrestricted address spaces, + * instead it creates wholly new equivalent mappings. + * * The caller MUST ensure that the source mapping will not change during this * function. For dynamic kernel memory, this is generally ensured by mapping the * memory within the allocator. diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index f7749d0f2562..4eca33d62950 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -1021,6 +1021,16 @@ COMMON_DISCARDS \ } +/* + * ASI maps certain sections with certain sensitivity levels, so they need to + * have a page-aligned size. + */ +#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION +#define ASI_ALIGN() ALIGN(PAGE_SIZE) +#else +#define ASI_ALIGN() . +#endif + /** * PERCPU_INPUT - the percpu input sections * @cacheline: cacheline size @@ -1042,6 +1052,7 @@ *(.data..percpu) \ *(.data..percpu..shared_aligned) \ PERCPU_DECRYPTED_SECTION \ + . 
= ASI_ALIGN(); \ __per_cpu_end = .; /** From patchwork Fri Jul 12 17:00:36 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Brendan Jackman X-Patchwork-Id: 13732028 Received: from mail-wm1-f73.google.com (mail-wm1-f73.google.com [209.85.128.73]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5253F17C9F8 for ; Fri, 12 Jul 2024 17:01:46 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.128.73 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720803708; cv=none; b=Dh/jWmDiapZR30uPe1ze3XiHXFh3qd5MKRzaifpkubDnlBM64miLwtUpktdvKDb67OnN950//ButGbUoy27jy+EXR4/nGdRUlA3M4pIQ9A/K827Bem9Ng6npZ/iGnSa1NEguddi7kCYy3a8/AUGrvm/KNxy26KzQ3zVLvo5fJT4= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1720803708; c=relaxed/simple; bh=ZFN3pqUHygYP3u85utcm51QcO4qoDXiBKRc/+5sufdI=; h=Date:In-Reply-To:Mime-Version:References:Message-ID:Subject:From: To:Cc:Content-Type; b=bxETCgCsNncPH67qoPJpkgXbphs8gZVYWyqDdT/mB4ii0saBqoZe72kvbWFhASu+UGeJNdm85fRP/KUXhfXhqvbZFMZTmeIlmjl+Iq57FsHZn+s1c8lZHBd/z6+tG9S4YJEPx2PNbYL9u0t2q/Th9z3C3Ha0GnsoQsuQR0G2Qc4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b=ikU/AH5i; arc=none smtp.client-ip=209.85.128.73 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=reject dis=none) header.from=google.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=flex--jackmanb.bounces.google.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="ikU/AH5i" Received: by mail-wm1-f73.google.com with SMTP id 5b1f17b1804b1-42725d3ae3eso15363205e9.3 for ; Fri, 12 Jul 2024 10:01:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1720803705; x=1721408505; darn=vger.kernel.org; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=9uYrhf+8D7jqi5tPdexW+5CSq/ZRPpRaUw7HFMG3mLc=; b=ikU/AH5iybV0aMgJSTtuDJydkFM41C6xEuKrPpd/g2lfNLSUz+gjiUthUYDbADOmlS Ni5Icu8CBDSYtGnVZpAlbaa+5q1PuMnipqob5nwbcnBLDC8nLLpYLaijLHjw4dUZ042z mUlrawsNcDSh9y4GrAZJxH8X/bVTNxPwREnYOjF5fyMfsp9ts554oWc17QSNomCu3iIL HsVB7ZyooBn/+ztu+tSrcfqgh5hzEDsx+hx6u5SxsIFptFwPVM9IVG7vvzgKi18eTe/B S1SWdSoXNtjd5djZ+H2EtWx2gnNbdV+8MNT9PkpFtHE2jjrFvy6BP8zuEkI4pFzHYkHo +evg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720803705; x=1721408505; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=9uYrhf+8D7jqi5tPdexW+5CSq/ZRPpRaUw7HFMG3mLc=; b=EOvBFy8hnV9ABADomlfdTLaF8x13ms25zR9LCXCqk8d4qRITSu3GXS5E/3jhU10KYA CeAFsUT0Msg5HZ5uAHkSFpWMh4AYJKtgjkPuwTY4ULPBDXF7WfJSHXQIT5tCiNLjBoGC lk6vhoQxBA5yRoJkNN4smfKJggudJsnB4faBo2pAHYw8/by8O81a6vL+A9ghY5K+S1ma HmMMJDWGFQdecPQAnULZTXtjSRjG4dhAIvmXZuXHK/bhq1GxZVS20hpfk7kwLuc0izwH pMglTNqjA8MEVr6uNegrlgN2FTuN394cGIzlMKJ8vPJLdnXxfhI1Ql/iBUgrHbDsmIW6 5aaw== X-Forwarded-Encrypted: i=1; AJvYcCXnSrgSWRp3PYZihWKFmpdzNWVzSDhJ8D6rvvTfotXR/oSHSQJ+AeunsNpvKlof13sWRvPwKWJxRymnncxFiTF+PgF3 X-Gm-Message-State: 
Date: Fri, 12 Jul 2024 17:00:36 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-18-144b319a40d8@google.com>
Subject: [PATCH 18/26] mm: asi: Map vmalloc/vmap data as nonsensitive
From: Brendan Jackman

We add new VM flags for sensitive and global-nonsensitive, parallel to
the corresponding GFP flags.

__get_vm_area_node and friends will default to creating
global-nonsensitive VM areas, and vmap then calls asi_map as necessary.

__vmalloc_node_range has additional logic to check and set defaults for
the sensitivity of the underlying page allocation. It does this via an
initial __set_asi_flags call - note that it then calls
__get_vm_area_node which also calls __set_asi_flags. This second call
is a NOP.

By default, we mark the underlying page allocation as sensitive, even
if the VM area is global-nonsensitive. This is just an optimization to
avoid unnecessary asi_map etc, since presumably most code has no reason
to access vmalloc'd data through the direct map.

There are some details of the GFP-flag/VM-flag interaction that are not
really obvious, for example: what should happen when callers of
__vmalloc explicitly set GFP sensitivity flags? (That function has no
VM flags argument). For the moment let's just not block on that and
focus on adding the infrastructure, though.

At the moment, the high-level vmalloc APIs don't actually provide a way
to configure sensitivity; this commit just adds the infrastructure.
We'll have to decide how to expose this to allocation sites as we
implement more denylist logic. vmap does already allow configuring vm
flags.
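To make the resulting lifecycle concrete, here is a minimal sketch
(based on the diff below, assuming callers use the stock vmap/vunmap
API; this is not new API surface):

/*
 * vmap() now also maps the area into the ASI restricted address space;
 * the paired asi_unmap() runs in vunmap()/vfree(). The backing pages
 * themselves are allocated __GFP_SENSITIVE (see __vmalloc_node_range
 * below), so only the vmalloc alias is nonsensitive, not the
 * direct-map alias.
 */
void *p = vmap(pages, count, VM_MAP, PAGE_KERNEL);
if (p) {
	/* p is accessible from within the restricted address space. */
	vunmap(p);	/* asi_unmap() happens in here */
}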
Signed-off-by: Brendan Jackman
---
 mm/vmalloc.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 7a8daf5afb7c..d14e2f692e42 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -3189,6 +3189,7 @@ struct vm_struct *remove_vm_area(const void *addr)
 {
 	struct vmap_area *va;
 	struct vm_struct *vm;
+	unsigned long vm_addr;

 	might_sleep();

@@ -3200,6 +3201,7 @@ struct vm_struct *remove_vm_area(const void *addr)
 	if (!va || !va->vm)
 		return NULL;
 	vm = va->vm;
+	vm_addr = (unsigned long) READ_ONCE(vm->addr);

 	debug_check_no_locks_freed(vm->addr, get_vm_area_size(vm));
 	debug_check_no_obj_freed(vm->addr, get_vm_area_size(vm));
@@ -3331,6 +3333,7 @@ void vfree(const void *addr)
 		     addr);
 		return;
 	}
+	asi_unmap(ASI_GLOBAL_NONSENSITIVE, vm->addr, get_vm_area_size(vm));

 	if (unlikely(vm->flags & VM_FLUSH_RESET_PERMS))
 		vm_reset_perms(vm);
@@ -3370,12 +3373,14 @@ void vunmap(const void *addr)

 	if (!addr)
 		return;
+
 	vm = remove_vm_area(addr);
 	if (unlikely(!vm)) {
 		WARN(1, KERN_ERR "Trying to vunmap() nonexistent vm area (%p)\n",
 				addr);
 		return;
 	}
+	asi_unmap(ASI_GLOBAL_NONSENSITIVE, vm->addr, get_vm_area_size(vm));
 	kfree(vm);
 }
 EXPORT_SYMBOL(vunmap);
@@ -3424,16 +3429,21 @@ void *vmap(struct page **pages, unsigned int count,

 	addr = (unsigned long)area->addr;
 	if (vmap_pages_range(addr, addr + size, pgprot_nx(prot),
-				pages, PAGE_SHIFT) < 0) {
-		vunmap(area->addr);
-		return NULL;
-	}
+				pages, PAGE_SHIFT) < 0)
+		goto err;
+
+	if (asi_map(ASI_GLOBAL_NONSENSITIVE, area->addr,
+		    get_vm_area_size(area)))
+		goto err; /* The necessary asi_unmap() is in vunmap. */

 	if (flags & VM_MAP_PUT_PAGES) {
 		area->pages = pages;
 		area->nr_pages = count;
 	}
 	return area->addr;
+err:
+	vunmap(area->addr);
+	return NULL;
 }
 EXPORT_SYMBOL(vmap);

@@ -3701,6 +3711,10 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 		goto fail;
 	}

+	if (asi_map(ASI_GLOBAL_NONSENSITIVE, area->addr,
+		    get_vm_area_size(area)))
+		goto fail; /* The necessary asi_unmap() is in vfree. */
+
 	return area->addr;

 fail:
@@ -3780,6 +3794,13 @@ void *__vmalloc_node_range(unsigned long size, unsigned long align,
 		size = ALIGN(real_size, 1UL << shift);
 	}

+	/*
+	 * Assume nobody is interested in accessing these pages via the direct
+	 * map, so there's no point in having them in ASI's global-nonsensitive
+	 * physmap, which would just cost us a TLB flush later on.
+	 */
+	gfp_mask |= __GFP_SENSITIVE;
+
 again:
 	area = __get_vm_area_node(real_size, align, shift, VM_ALLOC |
 				  VM_UNINITIALIZED | vm_flags, start, end, node,

From patchwork Fri Jul 12 17:00:37 2024
Date: Fri, 12 Jul 2024 17:00:37 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-19-144b319a40d8@google.com>
Subject: [PATCH 19/26] percpu: clean up all mappings when pcpu_map_pages() fails
From: Brendan Jackman

From: Yosry Ahmed

In pcpu_map_pages(), if __pcpu_map_pages() fails on a CPU, we call
__pcpu_unmap_pages() to clean up mappings on all CPUs where mappings
were created, but not on the CPU where __pcpu_map_pages() fails.

__pcpu_map_pages() and __pcpu_unmap_pages() are wrappers around
vmap_pages_range_noflush() and vunmap_range_noflush(). All other
callers of vmap_pages_range_noflush() call vunmap_range_noflush() when
mapping fails, except pcpu_map_pages(). The reason could be that
partial mappings may be left behind from a failed mapping attempt.

Call __pcpu_unmap_pages() for the failed CPU as well in
pcpu_map_pages().

This was found by code inspection, no failures or bugs were observed.
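Since the one-line semantic change is easy to miss in the interleaved
diff below, here is the cleanup loop before and after (paraphrased from
the patch):

/* Before: the loop breaks before unmapping the CPU that failed, so any
 * partial mappings from the failed attempt are leaked. */
for_each_possible_cpu(tcpu) {
	if (tcpu == cpu)
		break;
	__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
			   page_end - page_start);
}

/* After: the failing CPU is unmapped too; the break moves below the
 * unmap, so the loop still stops once that CPU has been handled. */
for_each_possible_cpu(tcpu) {
	__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
			   page_end - page_start);
	if (tcpu == cpu)
		break;
}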
Signed-off-by: Yosry Ahmed
Acked-by: Dennis Zhou
(am from https://lore.kernel.org/lkml/20240311194346.2291333-1-yosryahmed@google.com/)
---
 mm/percpu-vm.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 2054c9213c43..cd69caf6aa8d 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -231,10 +231,10 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
 	return 0;
 err:
 	for_each_possible_cpu(tcpu) {
-		if (tcpu == cpu)
-			break;
 		__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
 				   page_end - page_start);
+		if (tcpu == cpu)
+			break;
 	}
 	pcpu_post_unmap_tlb_flush(chunk, page_start, page_end);
 	return err;

From patchwork Fri Jul 12 17:00:38 2024
Date: Fri, 12 Jul 2024 17:00:38 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-20-144b319a40d8@google.com>
Subject: [PATCH 20/26] mm: asi: Map dynamic percpu memory as nonsensitive
From: Brendan Jackman

From: Reiji Watanabe

Currently, all dynamic percpu memory is implicitly (and
unintentionally) treated as sensitive memory.

Unconditionally map pages for dynamically allocated percpu memory as
global nonsensitive memory, other than pages that are allocated for
pcpu_{first,reserved}_chunk during early boot via the memblock
allocator (these will be taken care of by the following patch).

We don't support sensitive percpu memory allocation yet.
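As a rough summary of the helper layering the diff below introduces
(function names are from the patch; the bodies here are simplified
sketches, not the final code):

/* Raw mapping only: usable during early percpu init, before the page
 * allocator (and hence asi_map()) is available. Pages mapped this way
 * remain sensitive until asi_map() is called on them later. */
static int ___pcpu_map_pages(unsigned long addr, struct page **pages,
			     int nr_pages)
{
	return vmap_pages_range_noflush(addr, addr + (nr_pages << PAGE_SHIFT),
					PAGE_KERNEL, pages, PAGE_SHIFT);
}

/* Runtime wrapper: raw mapping plus an asi_map() into the
 * global-nonsensitive address space. */
static int __pcpu_map_pages(unsigned long addr, struct page **pages,
			    int nr_pages, unsigned long vm_flags)
{
	int err = ___pcpu_map_pages(addr, pages, nr_pages);

	if (err)
		return err;
	return asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)addr,
		       nr_pages << PAGE_SHIFT);
}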
Co-developed-by: Junaid Shahid
Signed-off-by: Junaid Shahid
Signed-off-by: Reiji Watanabe
Signed-off-by: Brendan Jackman

WIP: Drop VM_SENSITIVE checks from percpu code
---
 mm/percpu-vm.c | 50 ++++++++++++++++++++++++++++++++++++++++++------
 mm/percpu.c    |  4 ++--
 2 files changed, 46 insertions(+), 8 deletions(-)

diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index cd69caf6aa8d8..2935d7fbac415 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -132,11 +132,20 @@ static void pcpu_pre_unmap_flush(struct pcpu_chunk *chunk,
 				 pcpu_chunk_addr(chunk, pcpu_high_unit_cpu, page_end));
 }

-static void __pcpu_unmap_pages(unsigned long addr, int nr_pages)
+static void ___pcpu_unmap_pages(unsigned long addr, int nr_pages)
 {
 	vunmap_range_noflush(addr, addr + (nr_pages << PAGE_SHIFT));
 }

+static void __pcpu_unmap_pages(unsigned long addr, int nr_pages,
+			       unsigned long vm_flags)
+{
+	unsigned long size = nr_pages << PAGE_SHIFT;
+
+	asi_unmap(ASI_GLOBAL_NONSENSITIVE, (void *)addr, size);
+	___pcpu_unmap_pages(addr, nr_pages);
+}
+
 /**
  * pcpu_unmap_pages - unmap pages out of a pcpu_chunk
  * @chunk: chunk of interest
@@ -153,6 +162,8 @@ static void __pcpu_unmap_pages(unsigned long addr, int nr_pages)
 static void pcpu_unmap_pages(struct pcpu_chunk *chunk,
 			     struct page **pages, int page_start, int page_end)
 {
+	struct vm_struct **vms = (struct vm_struct **)chunk->data;
+	unsigned long vm_flags = vms ? vms[0]->flags : VM_ALLOC;
 	unsigned int cpu;
 	int i;

@@ -165,7 +176,7 @@ static void pcpu_unmap_pages(struct pcpu_chunk *chunk,
 			pages[pcpu_page_idx(cpu, i)] = page;
 		}
 		__pcpu_unmap_pages(pcpu_chunk_addr(chunk, cpu, page_start),
-				   page_end - page_start);
+				   page_end - page_start, vm_flags);
 	}
 }

@@ -190,13 +201,38 @@ static void pcpu_post_unmap_tlb_flush(struct pcpu_chunk *chunk,
 				      pcpu_chunk_addr(chunk, pcpu_high_unit_cpu, page_end));
 }

-static int __pcpu_map_pages(unsigned long addr, struct page **pages,
-			    int nr_pages)
+/*
+ * __pcpu_map_pages() should not be called during the percpu initialization,
+ * as asi_map() depends on the page allocator (which isn't available yet
+ * during percpu initialization). Instead, ___pcpu_map_pages() can be used
+ * during the percpu initialization. But, any pages that are mapped with
+ * ___pcpu_map_pages() will be treated as sensitive memory, unless
+ * they are explicitly mapped with asi_map() later.
+ */
+static int ___pcpu_map_pages(unsigned long addr, struct page **pages,
+			     int nr_pages)
 {
 	return vmap_pages_range_noflush(addr, addr + (nr_pages << PAGE_SHIFT),
 					PAGE_KERNEL, pages, PAGE_SHIFT);
 }

+static int __pcpu_map_pages(unsigned long addr, struct page **pages,
+			    int nr_pages, unsigned long vm_flags)
+{
+	unsigned long size = nr_pages << PAGE_SHIFT;
+	int err;
+
+	err = ___pcpu_map_pages(addr, pages, nr_pages);
+	if (err)
+		return err;
+
+	/*
+	 * If this fails, pcpu_map_pages()->__pcpu_unmap_pages() will call
+	 * asi_unmap() and clean up any partial mappings.
+	 */
+	return asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)addr, size);
+}
+
 /**
  * pcpu_map_pages - map pages into a pcpu_chunk
  * @chunk: chunk of interest
@@ -214,13 +250,15 @@ static int __pcpu_map_pages(unsigned long addr, struct page **pages,
 static int pcpu_map_pages(struct pcpu_chunk *chunk,
 			  struct page **pages, int page_start, int page_end)
 {
+	struct vm_struct **vms = (struct vm_struct **)chunk->data;
+	unsigned long vm_flags = vms ? vms[0]->flags : VM_ALLOC;
 	unsigned int cpu, tcpu;
 	int i, err;

 	for_each_possible_cpu(cpu) {
 		err = __pcpu_map_pages(pcpu_chunk_addr(chunk, cpu, page_start),
 				       &pages[pcpu_page_idx(cpu, page_start)],
-				       page_end - page_start);
+				       page_end - page_start, vm_flags);
 		if (err < 0)
 			goto err;

@@ -232,7 +270,7 @@ static int pcpu_map_pages(struct pcpu_chunk *chunk,
 err:
 	for_each_possible_cpu(tcpu) {
 		__pcpu_unmap_pages(pcpu_chunk_addr(chunk, tcpu, page_start),
-				   page_end - page_start);
+				   page_end - page_start, vm_flags);
 		if (tcpu == cpu)
 			break;
 	}
diff --git a/mm/percpu.c b/mm/percpu.c
index 4e11fc1e6deff..d8309f2ea4e44 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -3328,8 +3328,8 @@ int __init pcpu_page_first_chunk(size_t reserved_size, pcpu_fc_cpu_to_node_fn_t
 			pcpu_populate_pte(unit_addr + (i << PAGE_SHIFT));

 		/* pte already populated, the following shouldn't fail */
-		rc = __pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
-				      unit_pages);
+		rc = ___pcpu_map_pages(unit_addr, &pages[unit * unit_pages],
+				       unit_pages);
 		if (rc < 0)
 			panic("failed to map percpu area, err=%d\n", rc);

From patchwork Fri Jul 12 17:00:39 2024
From patchwork Fri Jul 12 17:00:39 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732031
Date: Fri, 12 Jul 2024 17:00:39 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-21-144b319a40d8@google.com>
Subject: [PATCH 21/26] KVM: x86: asi: Restricted address space for VM execution
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman

An ASI restricted address space is added for KVM. It is currently only
enabled for Intel CPUs.

This change incorporates an extra asi_exit at the end of vcpu_run. We
expect later iterations of ASI to drop that call as we gain the ability
to context switch within the ASI domain.
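For orientation, the sketch below condenses how the ASI API from earlier
in the series is wired into KVM by this patch. It is an illustrative
reconstruction based only on the calls visible in the diff; the exact
signatures are assumptions, not code from the patch.

/*
 * Illustrative sketch, not part of the patch: the ASI lifecycle as this
 * patch maps it onto KVM. Names are taken from the diff below.
 */
static int kvm_asi_lifecycle_sketch(struct kvm *kvm, struct kvm_vcpu *vcpu)
{
	int err;

	/* kvm_arch_init_vm(): create the VM's restricted address space. */
	err = asi_init(kvm->mm, kvm_asi_index, &kvm->arch.asi);
	if (err)
		return err;

	/*
	 * Guest entry (noinstr): open an ASI critical section around the
	 * actual VM-enter, so CR3 stays pinned to the restricted space.
	 */
	asi_enter(vcpu->kvm->arch.asi);
	/* ... __vmx_vcpu_run() or __svm_vcpu_run() ... */
	asi_relax();

	/*
	 * End of vcpu_run(): we cannot yet go straight from the restricted
	 * space to the user address space, so exit unconditionally.
	 */
	asi_exit();

	/* kvm_arch_destroy_vm(): tear the address space down again. */
	asi_destroy(kvm->arch.asi);
	return 0;
}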
Signed-off-by: Brendan Jackman

---
 arch/x86/include/asm/kvm_host.h |  3 +++
 arch/x86/kvm/svm/svm.c          |  2 ++
 arch/x86/kvm/vmx/vmx.c          | 36 ++++++++++++++++++++++--------------
 arch/x86/kvm/x86.c              | 29 +++++++++++++++++++++++++++--
 4 files changed, 54 insertions(+), 16 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6efd1497b0263..6c3326cb8273c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -36,6 +36,7 @@
 #include
 #include
 #include
+#include

 #define __KVM_HAVE_ARCH_VCPU_DEBUGFS
@@ -1514,6 +1515,8 @@ struct kvm_arch {
 	 */
 #define SPLIT_DESC_CACHE_MIN_NR_OBJECTS (SPTE_ENT_PER_PAGE + 1)
 	struct kvm_mmu_memory_cache split_desc_cache;
+
+	struct asi *asi;
 };

 struct kvm_vm_stat {

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9aaf83c8d57df..6f9a279c12dc7 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4108,6 +4108,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
 	guest_state_enter_irqoff();

 	amd_clear_divider();
+	asi_enter(vcpu->kvm->arch.asi);

 	if (sev_es_guest(vcpu->kvm))
 		__svm_sev_es_vcpu_run(svm, spec_ctrl_intercepted,
@@ -4115,6 +4116,7 @@ static noinstr void svm_vcpu_enter_exit(struct kvm_vcpu *vcpu, bool spec_ctrl_in
 	else
 		__svm_vcpu_run(svm, spec_ctrl_intercepted);

+	asi_relax();
 	guest_state_exit_irqoff();
 }

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 22411f4aff530..1105d666a8ade 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -49,6 +49,7 @@
 #include
 #include
 #include
+#include
 #include

@@ -7255,14 +7256,32 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					unsigned int flags)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	unsigned long cr3;

 	guest_state_enter_irqoff();
+	asi_enter(vcpu->kvm->arch.asi);
+
+	/*
+	 * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
+	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
+	 * it switches back to the current->mm, which can occur in KVM context
+	 * when switching to a temporary mm to patch kernel code, e.g. if KVM
+	 * toggles a static key while handling a VM-Exit.
+	 * Also, this must be done after asi_enter(), as it changes CR3
+	 * when switching address spaces.
+	 */
+	cr3 = __get_current_cr3_fast();
+	if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
+		vmcs_writel(HOST_CR3, cr3);
+		vmx->loaded_vmcs->host_state.cr3 = cr3;
+	}

 	/*
 	 * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
 	 * mitigation for MDS is done late in VMentry and is still
 	 * executed in spite of L1D Flush. This is because an extra VERW
 	 * should not matter much after the big hammer L1D Flush.
+	 * This is only after asi_enter() for performance reasons.
 	 */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
 		vmx_l1d_flush(vcpu);
@@ -7283,6 +7302,8 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	vmx->idt_vectoring_info = 0;

+	asi_relax();
+
 	vmx_enable_fb_clear(vmx);

 	if (unlikely(vmx->fail)) {
@@ -7311,7 +7332,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
-	unsigned long cr3, cr4;
+	unsigned long cr4;

 	/* Record the guest's net vcpu time for enforced NMI injections. */
 	if (unlikely(!enable_vnmi &&
@@ -7354,19 +7375,6 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 		vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);
 	vcpu->arch.regs_dirty = 0;

-	/*
-	 * Refresh vmcs.HOST_CR3 if necessary. This must be done immediately
-	 * prior to VM-Enter, as the kernel may load a new ASID (PCID) any time
-	 * it switches back to the current->mm, which can occur in KVM context
-	 * when switching to a temporary mm to patch kernel code, e.g. if KVM
-	 * toggles a static key while handling a VM-Exit.
-	 */
-	cr3 = __get_current_cr3_fast();
-	if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
-		vmcs_writel(HOST_CR3, cr3);
-		vmx->loaded_vmcs->host_state.cr3 = cr3;
-	}
-
 	cr4 = cr4_read_shadow();
 	if (unlikely(cr4 != vmx->loaded_vmcs->host_state.cr4)) {
 		vmcs_writel(HOST_CR4, cr4);

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 91478b769af08..b9947e88d4ac6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -85,6 +85,7 @@
 #include
 #include
 #include
+#include

 #define CREATE_TRACE_POINTS
 #include "trace.h"
@@ -318,6 +319,8 @@ u64 __read_mostly host_xcr0;

 static struct kmem_cache *x86_emulator_cache;

+static int __read_mostly kvm_asi_index = -1;
+
 /*
  * When called, it means the previous get/set msr reached an invalid msr.
  * Return true if we want to ignore/silent this failed msr access.
@@ -9750,6 +9753,11 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (r)
 		goto out_free_percpu;

+	r = asi_register_class("KVM", NULL);
+	if (r < 0)
+		goto out_mmu_exit;
+	kvm_asi_index = r;
+
 	if (boot_cpu_has(X86_FEATURE_XSAVE)) {
 		host_xcr0 = xgetbv(XCR_XFEATURE_ENABLED_MASK);
 		kvm_caps.supported_xcr0 = host_xcr0 & KVM_SUPPORTED_XCR0;
@@ -9767,7 +9775,7 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)

 	r = ops->hardware_setup();
 	if (r != 0)
-		goto out_mmu_exit;
+		goto out_asi_unregister;

 	kvm_ops_update(ops);

@@ -9820,6 +9828,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 out_unwind_ops:
 	kvm_x86_ops.hardware_enable = NULL;
 	static_call(kvm_x86_hardware_unsetup)();
+out_asi_unregister:
+	asi_unregister_class(kvm_asi_index);
 out_mmu_exit:
 	kvm_mmu_vendor_module_exit();
 out_free_percpu:
@@ -9851,6 +9861,7 @@ void kvm_x86_vendor_exit(void)
 	cancel_work_sync(&pvclock_gtod_work);
 #endif
 	static_call(kvm_x86_hardware_unsetup)();
+	asi_unregister_class(kvm_asi_index);
 	kvm_mmu_vendor_module_exit();
 	free_percpu(user_return_msrs);
 	kmem_cache_destroy(x86_emulator_cache);
@@ -11436,6 +11447,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)

 	r = vcpu_run(vcpu);

+	/*
+	 * At present ASI doesn't have the capability to transition directly
+	 * from the restricted address space to the user address space. So we
+	 * just return to the unrestricted address space in between.
+	 */
+	asi_exit();
+
 out:
 	kvm_put_guest_fpu(vcpu);
 	if (kvm_run->kvm_valid_regs)
@@ -12539,10 +12557,14 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)

 	kvm_mmu_init_vm(kvm);

-	ret = static_call(kvm_x86_vm_init)(kvm);
+	ret = asi_init(kvm->mm, kvm_asi_index, &kvm->arch.asi);
 	if (ret)
 		goto out_uninit_mmu;

+	ret = static_call(kvm_x86_vm_init)(kvm);
+	if (ret)
+		goto out_asi_destroy;
+
 	INIT_HLIST_HEAD(&kvm->arch.mask_notifier_list);
 	atomic_set(&kvm->arch.noncoherent_dma_count, 0);

@@ -12579,6 +12601,8 @@ int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)

 	return 0;

+out_asi_destroy:
+	asi_destroy(kvm->arch.asi);
 out_uninit_mmu:
 	kvm_mmu_uninit_vm(kvm);
 	kvm_page_track_cleanup(kvm);
@@ -12720,6 +12744,7 @@ void kvm_arch_destroy_vm(struct kvm *kvm)
 	kvm_destroy_vcpus(kvm);
 	kvfree(rcu_dereference_check(kvm->arch.apic_map, 1));
 	kfree(srcu_dereference_check(kvm->arch.pmu_event_filter, &kvm->srcu, 1));
+	asi_destroy(kvm->arch.asi);
 	kvm_mmu_uninit_vm(kvm);
 	kvm_page_track_cleanup(kvm);
 	kvm_xen_destroy_vm(kvm);

From patchwork Fri Jul 12 17:00:40 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732032
Date: Fri, 12 Jul 2024 17:00:40 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-22-144b319a40d8@google.com>
Subject: [PATCH 22/26] KVM: x86: asi: Stabilize CR3 when potentially accessing with ASI
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman

nested_vmx_check_vmentry_hw() does a VM Enter as a "dry run" to check
the VMCS. It's important that we VM Exit back into the correct CR3 in
order to avoid going out of sync with ASI state.

Under ASI, CR3 is unstable even when interrupts are disabled, except
a) during the ASI critical section and b) when the address space is
unrestricted. We can take advantage of case b) here to make sure the
VM Enter is safe.
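Stated as code, the invariant looks roughly like this. This is an
illustrative sketch, not part of the patch; asi_in_critical_section()
is the helper a later patch in this series uses for the same check.

/*
 * Illustrative sketch of the CR3-stability rule described above; not
 * code from this patch.
 */
static unsigned long read_stable_cr3_sketch(void)
{
	/*
	 * Outside an ASI critical section, any interrupt may asi_exit()
	 * and change CR3 under us. Forcing the unrestricted address
	 * space first (case b above) makes the value safe to read and
	 * reuse.
	 */
	if (!asi_in_critical_section())
		asi_exit();

	return __get_current_cr3_fast();
}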
Signed-off-by: Brendan Jackman

---
 arch/x86/kvm/vmx/nested.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d05ddf751491..ffca468f8197 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3147,6 +3147,14 @@ static int nested_vmx_check_vmentry_hw(struct kvm_vcpu *vcpu)
 	 */
 	vmcs_writel(GUEST_RFLAGS, 0);

+	/*
+	 * Stabilize CR3 to ensure the VM Exit returns to the correct address
+	 * space. This is costly; at the expense of complexity it could be
+	 * optimized away by instead doing an asi_enter() to create an ASI
+	 * critical section, in the case that we are currently restricted.
+	 */
+	asi_exit();
+
 	cr3 = __get_current_cr3_fast();
 	if (unlikely(cr3 != vmx->loaded_vmcs->host_state.cr3)) {
 		vmcs_writel(HOST_CR3, cr3);

From patchwork Fri Jul 12 17:00:41 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732033
Date: Fri, 12 Jul 2024 17:00:41 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-23-144b319a40d8@google.com>
Subject: [PATCH 23/26] mm: asi: Stabilize CR3 in switch_mm_irqs_off()
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman

An ASI-restricted CR3 is unstable as interrupts can cause ASI-exits.
Although we already unconditionally ASI-exit during context-switch, and
before returning from the VM-run path, it's still possible to reach
switch_mm_irqs_off() in a restricted context, because KVM code updates
static keys, which requires using a temporary mm.
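For concreteness, one such path looks roughly like the following. This
is a reconstructed illustration; the intermediate helpers are named
from the text-poking code generally, not taken from this patch.

/*
 * Reconstructed illustration of how a restricted context can reach
 * switch_mm_irqs_off(); not code from the patch.
 *
 * vmx_vcpu_enter_exit()           restricted CR3 after asi_enter()
 *  -> VM-Exit handling toggles a static key
 *    -> text_poke_bp()            patching kernel text
 *      -> use_temporary_mm()      switches to the poking mm
 *        -> switch_mm_irqs_off()  reads/writes CR3, which must be
 *                                 stabilized first via asi_exit()
 */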
Signed-off-by: Brendan Jackman

---
 arch/x86/mm/tlb.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 36087d6238e6..a9804274049e 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -534,6 +534,9 @@ void switch_mm_irqs_off(struct mm_struct *unused, struct mm_struct *next,
 	bool need_flush;
 	u16 new_asid;

+	/* Stabilize CR3, before reading or writing CR3 */
+	asi_exit();
+
 	/* We don't want flush_tlb_func() to run concurrently with us. */
 	if (IS_ENABLED(CONFIG_PROVE_LOCKING))
 		WARN_ON_ONCE(!irqs_disabled());

From patchwork Fri Jul 12 17:00:42 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732034
Date: Fri, 12 Jul 2024 17:00:42 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-24-144b319a40d8@google.com>
Subject: [PATCH 24/26] mm: asi: Make TLB flushing correct under ASI
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman

This is the absolute minimum change for TLB flushing to be correct
under ASI. There are two arguably orthogonal changes in here but they
feel small enough for a single commit.

.:: CR3 stabilization

As noted in the comment, ASI can destabilize CR3, but we can stabilize
it again by calling asi_exit(); this makes it safe to read CR3 and
write it back.

This is enough to be correct - we don't have to worry about
invalidating the other ASI address space (i.e. we don't need to
invalidate the restricted address space if we are currently
unrestricted / vice versa) because we currently never set the noflush
bit in CR3 for ASI transitions.

Even without using CR3's noflush bit there are trivial optimizations
still on the table here: where invpcid_flush_single_context is
available (i.e. with the INVPCID_SINGLE feature) we can use that in
lieu of the CR3 read/write, and avoid the extremely costly asi_exit.

.:: Invalidating kernel mappings

Before ASI, with KPTI off we always either disable PCID or use global
mappings for kernel memory. However ASI disables global kernel
mappings regardless of those factors, so we need to invalidate other
address spaces to trigger a flush when we switch into them.

Note that there is currently a pointless write of
cpu_tlbstate.invalidate_other in the case of KPTI and !PCID. We've
added another case of that (ASI, !KPTI and !PCID). I think that's
preferable to expanding the conditional in flush_tlb_one_kernel.
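The kernel-mapping rule can be condensed into a small sketch. This is
illustrative only; it restates the conditions the hunks below
implement, with invalidate_other_asid() standing in for the existing
"propagate to the other address spaces" step in tlb.c.

/*
 * Illustrative condensation of the rule below; not code from the patch.
 * Kernel PTEs are only global when neither PTI nor ASI is enabled, so
 * a kernel-address flush must otherwise be propagated to the other
 * address spaces.
 */
static void flush_kernel_addr_sketch(unsigned long addr)
{
	/* Flush the current address space. */
	flush_tlb_one_user(addr);

	/*
	 * With PTI or ASI on, kernel mappings are non-global: mark the
	 * other address spaces for invalidation on the next switch.
	 */
	if (static_cpu_has(X86_FEATURE_PTI) ||
	    static_cpu_has(X86_FEATURE_ASI))
		invalidate_other_asid();
}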
Signed-off-by: Brendan Jackman

---
 arch/x86/mm/tlb.c | 23 +++++++++++++++++------
 1 file changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index a9804274049e..1d9a300fe788 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -219,7 +219,7 @@ static void clear_asid_other(void)
 	 * This is only expected to be set if we have disabled
 	 * kernel _PAGE_GLOBAL pages.
 	 */
-	if (!static_cpu_has(X86_FEATURE_PTI)) {
+	if (!static_cpu_has(X86_FEATURE_PTI) && !static_cpu_has(X86_FEATURE_ASI)) {
 		WARN_ON_ONCE(1);
 		return;
 	}
@@ -1178,15 +1178,19 @@ void flush_tlb_one_kernel(unsigned long addr)
 	 * use PCID if we also use global PTEs for the kernel mapping, and
 	 * INVLPG flushes global translations across all address spaces.
 	 *
-	 * If PTI is on, then the kernel is mapped with non-global PTEs, and
-	 * __flush_tlb_one_user() will flush the given address for the current
-	 * kernel address space and for its usermode counterpart, but it does
-	 * not flush it for other address spaces.
+	 * If PTI or ASI is on, then the kernel is mapped with non-global PTEs,
+	 * and __flush_tlb_one_user() will flush the given address for the
+	 * current kernel address space and, if PTI is on, for its usermode
+	 * counterpart, but it does not flush it for other address spaces.
 	 */
 	flush_tlb_one_user(addr);

-	if (!static_cpu_has(X86_FEATURE_PTI))
+	/* Nothing more to do if PTI and ASI are completely off. */
+	if (!static_cpu_has(X86_FEATURE_PTI) && !static_cpu_has(X86_FEATURE_ASI)) {
+		VM_WARN_ON_ONCE(static_cpu_has(X86_FEATURE_PCID) &&
+				!(__default_kernel_pte_mask & _PAGE_GLOBAL));
 		return;
+	}

 	/*
 	 * See above. We need to propagate the flush to all other address
@@ -1275,6 +1279,13 @@ STATIC_NOPV void native_flush_tlb_local(void)

 	invalidate_user_asid(this_cpu_read(cpu_tlbstate.loaded_mm_asid));

+	/*
+	 * A restricted ASI CR3 is unstable outside of an ASI critical
+	 * section, so we can't flush via a CR3 read/write in that state.
+	 */
+	if (!asi_in_critical_section())
+		asi_exit();
+
 	/* If current->mm == NULL then the read_cr3() "borrows" an mm */
 	native_write_cr3(__native_read_cr3());
 }

From patchwork Fri Jul 12 17:00:43 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732035
Date: Fri, 12 Jul 2024 17:00:43 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-25-144b319a40d8@google.com>
Subject: [PATCH 25/26] mm: asi: Stop ignoring asi=on cmdline flag
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman

At this point the minimum requirements are in place for the kernel to
operate correctly with ASI enabled.
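Concretely, the boot-time switch now behaves as sketched below. This is
an illustrative reconstruction around the hunk in the diff that
follows; the cmdline_find_option() plumbing, the `enabled` default, and
the surrounding variable setup are assumptions modeled on similar x86
boot code (e.g. pti= handling), not text from the patch.

/*
 * Illustrative sketch of asi_check_boottime_disable() after this patch.
 * Only the pr_info()/setup_force_cpu_cap() lines are taken from the
 * diff below; the rest is assumed scaffolding.
 */
void __init asi_check_boottime_disable_sketch(void)
{
	bool enabled = false;	/* assumed default */
	char arg[4];
	int ret;

	ret = cmdline_find_option(boot_command_line, "asi", arg, sizeof(arg));
	if (ret == 3 && !strncmp(arg, "off", 3)) {
		enabled = false;
		pr_info("ASI disabled through kernel command line.\n");
	} else if (ret == 2 && !strncmp(arg, "on", 2)) {
		enabled = true;
		pr_info("ASI enabled through kernel command line.\n");
	} else {
		pr_info("ASI %s by default.\n",
			enabled ? "enabled" : "disabled");
	}

	/* The actual behavioral change: asi=on is now honored. */
	if (enabled)
		setup_force_cpu_cap(X86_FEATURE_ASI);
}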
Signed-off-by: Brendan Jackman

---
 arch/x86/mm/asi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 891b8d351df8..6cf0af1bfa07 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -174,14 +174,14 @@ void __init asi_check_boottime_disable(void)
 		pr_info("ASI disabled through kernel command line.\n");
 	} else if (ret == 2 && !strncmp(arg, "on", 2)) {
 		enabled = true;
-		pr_info("Ignoring asi=on param while ASI implementation is incomplete.\n");
+		pr_info("ASI enabled through kernel command line.\n");
 	} else {
 		pr_info("ASI %s by default.\n",
 			enabled ? "enabled" : "disabled");
 	}

 	if (enabled)
-		pr_info("ASI enablement ignored due to incomplete implementation.\n");
+		setup_force_cpu_cap(X86_FEATURE_ASI);
 }

 /*

From patchwork Fri Jul 12 17:00:44 2024
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732036
Date: Fri, 12 Jul 2024 17:00:44 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
Message-ID: <20240712-asi-rfc-24-v1-26-144b319a40d8@google.com>
Subject: [PATCH 26/26] KVM: x86: asi: Add some mitigations on address space transitions
From: Brendan Jackman
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, kvm@vger.kernel.org, Brendan Jackman

Here we start actually turning ASI into a real exploit mitigation. On
all CPUs we attempt to obliterate any indirect branch predictor
training before mapping in any secrets. We can also flush side channels
on the inverse transition. In this iteration we flush L1D, but only on
CPUs affected by L1TF. The rationale for this is: L1TF seems to have
been a relative outlier in terms of its impact, and the mitigation is
obviously rather devastating. On the other hand, Spectre-type attacks
are continuously being found, and it's quite reasonable to assume that
existing systems are vulnerable to variations that are not currently
mitigated by bespoke techniques like Safe RET.

This is clearly an incomplete policy. For example, it probably makes
sense to perform MDS mitigations in post_asi_enter, and there is
clearly a wide range of alternative postures with regard to
per-platform vs blanket mitigation configurations. This also ought to
be integrated more intelligently with bugs.c - this will probably
require a fair bit of discussion so it might warrant a patchset all to
itself.
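As a compact restatement of that posture (illustrative only; the real
hooks, pre_asi_exit() and vmx_post_asi_enter(), appear in the diff
below, and the logic here mirrors them rather than adding anything new):

/* Unrestricted mappings are about to come back in: assume the guest
 * trained the branch predictors, and clear that training first. */
static void pre_asi_exit_sketch(void)
{
	/* Indirect branch predictors, unless IBPB already ran on VM Exit. */
	if (cpu_feature_enabled(X86_FEATURE_IBPB) &&
	    !cpu_feature_enabled(X86_FEATURE_IBPB_ON_VMEXIT))
		__wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);

	/* The RAS/RSB, unless the VM Exit path already stuffed it. */
	if (!IS_ENABLED(CONFIG_RETPOLINE) ||
	    !cpu_feature_enabled(X86_FEATURE_RSB_VMEXIT))
		fill_return_buffer();
}

/* Secrets are now unmapped: flush side channels the guest could probe.
 * Currently that means L1D, and only on parts vulnerable to L1TF. */
static void post_asi_enter_sketch(void)
{
	if (boot_cpu_has_bug(X86_BUG_L1TF))
		vmx_l1d_flush(kvm_get_running_vcpu());
}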
For now though, this ought to provide an example of the kind of thing
we might do with ASI.

The changes to the inline asm for L1D flushes are to avoid duplicate
jump labels breaking the build in the case that vmx_l1d_flush() gets
inlined at multiple locations (as it seems to do in my builds).

Signed-off-by: Brendan Jackman

---
 arch/x86/include/asm/kvm_host.h      |  2 +
 arch/x86/include/asm/nospec-branch.h |  2 +
 arch/x86/kvm/vmx/vmx.c               | 88 ++++++++++++++++++++++++------------
 arch/x86/kvm/x86.c                   | 33 +++++++++++++-
 arch/x86/lib/retpoline.S             |  7 +++
 5 files changed, 101 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 6c3326cb8273c..8b7226dd2e027 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1840,6 +1840,8 @@ struct kvm_x86_init_ops {

 	struct kvm_x86_ops *runtime_ops;
 	struct kvm_pmu_ops *pmu_ops;
+
+	void (*post_asi_enter)(void);
 };

 struct kvm_arch_async_pf {

diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index ff5f1ecc7d1e6..9502bdafc1edd 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -605,6 +605,8 @@ static __always_inline void mds_idle_clear_cpu_buffers(void)
 		mds_clear_cpu_buffers();
 }

+extern void fill_return_buffer(void);
+
 #endif /* __ASSEMBLY__ */

 #endif /* _ASM_X86_NOSPEC_BRANCH_H_ */

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 1105d666a8ade..6efcbddf6ce27 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6629,37 +6629,18 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
  * is not exactly LRU. This could be sized at runtime via topology
  * information but as all relevant affected CPUs have 32KiB L1D cache size
  * there is no point in doing so.
+ *
+ * Must be reentrant, for use by vmx_post_asi_enter.
  */
-static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static inline_or_noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
 	int size = PAGE_SIZE << L1D_CACHE_ORDER;

 	/*
-	 * This code is only executed when the flush mode is 'cond' or
-	 * 'always'
+	 * In theory we lose some of these increments to reentrancy under ASI.
+	 * We just tolerate imprecise stats rather than deal with synchronizing.
+	 * Anyway in practice on 64 bit it's gonna be a single instruction.
 	 */
-	if (static_branch_likely(&vmx_l1d_flush_cond)) {
-		bool flush_l1d;
-
-		/*
-		 * Clear the per-vcpu flush bit, it gets set again
-		 * either from vcpu_run() or from one of the unsafe
-		 * VMEXIT handlers.
-		 */
-		flush_l1d = vcpu->arch.l1tf_flush_l1d;
-		vcpu->arch.l1tf_flush_l1d = false;
-
-		/*
-		 * Clear the per-cpu flush bit, it gets set again from
-		 * the interrupt handlers.
-		 */
-		flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
-		kvm_clear_cpu_l1tf_flush_l1d();
-
-		if (!flush_l1d)
-			return;
-	}
-
 	vcpu->stat.l1d_flush++;

 	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
@@ -6670,26 +6651,57 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 	asm volatile(
 		/* First ensure the pages are in the TLB */
 		"xorl %%eax, %%eax\n"
-		".Lpopulate_tlb:\n\t"
+		".Lpopulate_tlb_%=:\n\t"
 		"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
 		"addl $4096, %%eax\n\t"
 		"cmpl %%eax, %[size]\n\t"
-		"jne .Lpopulate_tlb\n\t"
+		"jne .Lpopulate_tlb_%=\n\t"
 		"xorl %%eax, %%eax\n\t"
 		"cpuid\n\t"
 		/* Now fill the cache */
 		"xorl %%eax, %%eax\n"
-		".Lfill_cache:\n"
+		".Lfill_cache_%=:\n"
 		"movzbl (%[flush_pages], %%" _ASM_AX "), %%ecx\n\t"
 		"addl $64, %%eax\n\t"
 		"cmpl %%eax, %[size]\n\t"
-		"jne .Lfill_cache\n\t"
+		"jne .Lfill_cache_%=\n\t"
 		"lfence\n"
 		:: [flush_pages] "r" (vmx_l1d_flush_pages),
 		    [size] "r" (size)
 		: "eax", "ebx", "ecx", "edx");
 }

+static noinstr void vmx_maybe_l1d_flush(struct kvm_vcpu *vcpu)
+{
+	/*
+	 * This code is only executed when the flush mode is 'cond' or
+	 * 'always'
+	 */
+	if (static_branch_likely(&vmx_l1d_flush_cond)) {
+		bool flush_l1d;
+
+		/*
+		 * Clear the per-vcpu flush bit, it gets set again
+		 * either from vcpu_run() or from one of the unsafe
+		 * VMEXIT handlers.
+		 */
+		flush_l1d = vcpu->arch.l1tf_flush_l1d;
+		vcpu->arch.l1tf_flush_l1d = false;
+
+		/*
+		 * Clear the per-cpu flush bit, it gets set again from
+		 * the interrupt handlers.
+		 */
+		flush_l1d |= kvm_get_cpu_l1tf_flush_l1d();
+		kvm_clear_cpu_l1tf_flush_l1d();
+
+		if (!flush_l1d)
+			return;
+	}
+
+	vmx_l1d_flush(vcpu);
+}
+
 static void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
 {
 	struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -7284,7 +7296,7 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 	 * This is only after asi_enter() for performance reasons.
 	 */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
-		vmx_l1d_flush(vcpu);
+		vmx_maybe_l1d_flush(vcpu);
 	else if (static_branch_unlikely(&mmio_stale_data_clear) &&
 		 kvm_arch_has_assigned_device(vcpu->kvm))
 		mds_clear_cpu_buffers();
@@ -8321,6 +8333,14 @@ gva_t vmx_get_untagged_addr(struct kvm_vcpu *vcpu, gva_t gva, unsigned int flags
 	return (sign_extend64(gva, lam_bit) & ~BIT_ULL(63)) | (gva & BIT_ULL(63));
 }

+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+static noinstr void vmx_post_asi_enter(void)
+{
+	if (boot_cpu_has_bug(X86_BUG_L1TF))
+		vmx_l1d_flush(kvm_get_running_vcpu());
+}
+#endif
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.name = KBUILD_MODNAME,

@@ -8727,6 +8747,14 @@ static struct kvm_x86_init_ops vmx_init_ops __initdata = {

 	.runtime_ops = &vmx_x86_ops,
 	.pmu_ops = &intel_pmu_ops,
+
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+	/*
+	 * Only Intel CPUs currently do anything in post-enter, so this is a
+	 * vendor hook for now.
+	 */
+	.post_asi_enter = vmx_post_asi_enter,
+#endif
 };

 static void vmx_cleanup_l1d_flush(void)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index b9947e88d4ac6..b5e4df2aa1636 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9695,6 +9695,36 @@ static void kvm_x86_check_cpu_compat(void *ret)
 	*(int *)ret = kvm_x86_check_processor_compatibility();
 }

+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+
+static noinstr void pre_asi_exit(void)
+{
+	/*
+	 * Flush out prediction trainings by the guest before we go to access
+	 * secrets.
+	 */
+
+	/* Clear normal indirect branch predictions, if we haven't */
+	if (cpu_feature_enabled(X86_FEATURE_IBPB) &&
+	    !cpu_feature_enabled(X86_FEATURE_IBPB_ON_VMEXIT))
+		__wrmsr(MSR_IA32_PRED_CMD, PRED_CMD_IBPB, 0);
+
+	/* Flush the RAS/RSB if we haven't already. */
+	if (!IS_ENABLED(CONFIG_RETPOLINE) ||
+	    !cpu_feature_enabled(X86_FEATURE_RSB_VMEXIT))
+		fill_return_buffer();
+}
+
+struct asi_hooks asi_hooks = {
+	.pre_asi_exit = pre_asi_exit,
+	/* post_asi_enter populated later. */
+};
+
+#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+struct asi_hooks asi_hooks = {};
+#endif /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
+
 int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 {
 	u64 host_pat;
@@ -9753,7 +9783,8 @@ int kvm_x86_vendor_init(struct kvm_x86_init_ops *ops)
 	if (r)
 		goto out_free_percpu;

-	r = asi_register_class("KVM", NULL);
+	asi_hooks.post_asi_enter = ops->post_asi_enter;
+	r = asi_register_class("KVM", &asi_hooks);
 	if (r < 0)
 		goto out_mmu_exit;
 	kvm_asi_index = r;

diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 391059b2c6fbc..db5b8ee01efeb 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -396,3 +396,10 @@ SYM_CODE_END(__x86_return_thunk)
 EXPORT_SYMBOL(__x86_return_thunk)

 #endif /* CONFIG_MITIGATION_RETHUNK */
+
+.pushsection .noinstr.text, "ax"
+SYM_CODE_START(fill_return_buffer)
+	__FILL_RETURN_BUFFER(%_ASM_AX,RSB_CLEAR_LOOPS)
+	RET
+SYM_CODE_END(fill_return_buffer)
+.popsection