Date: Fri, 10 Jan 2025 18:40:49 +0000
In-Reply-To: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
References: <20250110-asi-rfc-v2-v2-0-8419288bc805@google.com>
Message-ID: <20250110-asi-rfc-v2-v2-23-8419288bc805@google.com>
Subject: [PATCH RFC v2 23/29] mm: asi: exit ASI before suspend-like operations
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
 "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Richard Henderson,
 Matt Turner, Vineet Gupta, Russell King, Catalin Marinas, Will Deacon,
 Guo Ren, Brian Cain, Huacai Chen, WANG Xuerui, Geert Uytterhoeven,
 Michal Simek, Thomas Bogendoerfer, Dinh Nguyen, Jonas Bonn,
 Stefan Kristiansson, Stafford Horne, "James E.J. Bottomley", Helge Deller,
 Michael Ellerman, Nicholas Piggin, Christophe Leroy, Naveen N Rao,
 Madhavan Srinivasan, Paul Walmsley, Palmer Dabbelt, Albert Ou,
 Heiko Carstens, Vasily Gorbik, Alexander Gordeev, Christian Borntraeger,
 Sven Schnelle, Yoshinori Sato, Rich Felker, John Paul Adrian Glaubitz,
 "David S. Miller", Andreas Larsson, Richard Weinberger, Anton Ivanov,
 Johannes Berg, Chris Zankel, Max Filippov, Arnd Bergmann, Andrew Morton,
 Juri Lelli, Vincent Guittot, Dietmar Eggemann, Steven Rostedt, Ben Segall,
 Mel Gorman, Valentin Schneider, Uladzislau Rezki, Christoph Hellwig,
 Masami Hiramatsu, Mathieu Desnoyers, Mike Rapoport,
 Arnaldo Carvalho de Melo, Namhyung Kim, Mark Rutland, Alexander Shishkin,
 Jiri Olsa, Ian Rogers, Adrian Hunter, Dennis Zhou, Tejun Heo,
 Christoph Lameter, Sean Christopherson, Paolo Bonzini, Ard Biesheuvel,
 Josh Poimboeuf, Pawan Gupta
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-alpha@vger.kernel.org,
 linux-snps-arc@lists.infradead.org, linux-arm-kernel@lists.infradead.org,
 linux-csky@vger.kernel.org, linux-hexagon@vger.kernel.org,
 loongarch@lists.linux.dev, linux-m68k@lists.linux-m68k.org,
 linux-mips@vger.kernel.org, linux-openrisc@vger.kernel.org,
 linux-parisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
 linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
 linux-um@lists.infradead.org, linux-arch@vger.kernel.org,
 linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
 linux-perf-users@vger.kernel.org, kvm@vger.kernel.org,
 linux-efi@vger.kernel.org, Brendan Jackman, Yosry Ahmed

From: Yosry Ahmed

During suspend-like operations (suspend, hibernate, and kexec with
preserve_context), the processor state, including CR3, is usually saved
and restored later. In the kexec case, this only happens when
KEXEC_PRESERVE_CONTEXT is used to jump back to the original kernel. In
relocate_kernel(), some registers including CR3 are stored in
VA_CONTROL_PAGE. If preserve_context is set (it is passed into
relocate_kernel() in RCX), then after the new kernel has run, the code
under 'virtual_mapped' restores these registers. This is similar to what
happens in suspend and hibernate.

Note that even when KEXEC_PRESERVE_CONTEXT is not set, relocate_kernel()
still accesses CR3, mainly reading and writing it to flush the TLB. This
could be problematic and cause improper ASI enters (see below), but it is
assumed to be safe because the kernel effectively reboots in this case
anyway.

Saving and restoring CR3 in this fashion can cause a problem if the
suspend/hibernate/kexec is performed within an ASI domain. A restricted
CR3 will be saved and later restored, possibly after ASI has already been
exited (e.g. by an NMI that arrives after CR3 is stored). This causes an
_improper_ ASI enter, where code starts executing in a restricted address
space, yet the ASI metadata (especially curr_asi) says otherwise.

Exit ASI early in all these paths by registering a syscore_ops suspend
callback. syscore_suspend() is called in all the above paths (for kexec,
only with KEXEC_PRESERVE_CONTEXT) after IRQs are finally disabled before
the operation. Exiting at this point is not currently strictly required,
but it is convenient: once ASI gains the ability to persist across
context switches, there will be additional synchronization requirements,
and exiting here simplifies them.

Note: If the CR3 accesses in relocate_kernel() when
KEXEC_PRESERVE_CONTEXT is not set are concerning, they could be handled
by registering a syscore_ops shutdown callback to exit ASI.
syscore_shutdown() is called in the kexec path where
KEXEC_PRESERVE_CONTEXT is not set, starting with commit 7bb943806ff6
("kexec: do syscore_shutdown() in kernel_kexec").

Signed-off-by: Yosry Ahmed
Signed-off-by: Brendan Jackman
---
 arch/x86/mm/asi.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index a9f9bfbf85eb47d16ef8d0bfbc7713f07052d3ed..c5073af1a82ded1c6fc467cd7a5d29a39d676bb4 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -6,6 +6,7 @@
 #include
 #include
+#include
 #include
 #include
@@ -243,6 +244,32 @@ static int asi_map_percpu(struct asi *asi, void *percpu_addr, size_t len)
 	return 0;
 }
 
+#ifdef CONFIG_PM_SLEEP
+static int asi_suspend(void)
+{
+	/*
+	 * Must be called after IRQs are disabled and rescheduling is no longer
+	 * possible (so that we cannot re-enter ASI before suspending).
+	 */
+	lockdep_assert_irqs_disabled();
+
+	/*
+	 * Suspend operations sometimes save CR3 as part of the saved state,
+	 * which is restored later (e.g. do_suspend_lowlevel() in the suspend
+	 * path, swsusp_arch_suspend() in the hibernate path, relocate_kernel()
+	 * in the kexec path). Saving a restricted CR3 and restoring it later
+	 * could lead to improperly entering ASI. Exit ASI before such
+	 * operations.
+	 */
+	asi_exit();
+	return 0;
+}
+
+static struct syscore_ops asi_syscore_ops = {
+	.suspend = asi_suspend,
+};
+#endif /* CONFIG_PM_SLEEP */
+
 static int __init asi_global_init(void)
 {
 	int err;
@@ -306,6 +333,10 @@ static int __init asi_global_init(void)
 	asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd,
 		      VMEMMAP_START + (1UL << PGDIR_SHIFT));
 
+#ifdef CONFIG_PM_SLEEP
+	register_syscore_ops(&asi_syscore_ops);
+#endif
+
 	return 0;
 }
 subsys_initcall(asi_global_init)