From patchwork Fri Jul 12 17:00:34 2024
From: Brendan Jackman
Date: Fri, 12 Jul 2024 17:00:34 +0000
Subject: [PATCH 16/26] mm: asi: Map non-user buddy allocations as nonsensitive
Message-ID: <20240712-asi-rfc-24-v1-16-144b319a40d8@google.com>
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen,
    "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra, Sean Christopherson,
    Paolo Bonzini, Alexandre Chartre, Liran Alon, Jan Setje-Eilers,
    Catalin Marinas, Will Deacon, Mark Rutland, Andrew Morton, Mel Gorman,
    Lorenzo Stoakes, David Hildenbrand, Vlastimil Babka, Michal Hocko,
    Khalid Aziz, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
    Steven Rostedt, Valentin Schneider, Paul Turner, Reiji Watanabe,
    Junaid Shahid, Ofir Weisse, Yosry Ahmed, Patrick Bellasi, KP Singh,
    Alexandra Sandulescu, Matteo Rizzo, Jann Horn
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
    kvm@vger.kernel.org, Brendan Jackman
This is just the simplest possible page_alloc patch I could come up with to
demonstrate ASI working in a "denylist" mode: we map the direct map into the
restricted address space, except for pages allocated with GFP_USER.

Pages must be asi_unmap()'d before they can be re-allocated. This requires a
TLB flush, which can't generally be done from the free path (it requires IRQs
on), so pages that need unmapping are freed via a workqueue.

This solution is silly for at least the following reasons:

- If the async queue gets long, we'll run out of allocatable memory.
- We don't batch the TLB flushing or worker wakeups at all.
- We drop FPI flags and skip the pcplists.

Internally at Google we've found so far that, with plenty of extra complexity,
we can make this principle work for the workloads we've tested, but it seems
likely we'll hit a wall where tuning becomes impossible. So for the [PATCH]
version I instead hope to come up with an implementation that makes the
allocator more deeply aware of sensitivity; most likely this will look a bit
like an extra "dimension", like movability etc.
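To make the deferred-free scheme above concrete, here is a rough userspace C model of it (names mirror the patch, but `struct page`, the per-CPU list, and `asi_unmap()` are all stubbed out; this is a sketch of the control flow, not kernel code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for struct page: just the state this scheme needs. */
struct page {
	bool global_nonsensitive; /* models PageGlobalNonSensitive */
	struct page *next;        /* models the page->lru linkage */
	bool freed;               /* observable effect of __free_pages_ok() */
};

/* Models one CPU's asi_async_free_cpu_state.to_free list. */
static struct page *to_free_head;

/*
 * Stub for asi_unmap(): in the kernel this needs a TLB flush, which
 * requires IRQs on, so it cannot run from the free path itself.
 */
static void asi_unmap_stub(struct page *page)
{
	page->global_nonsensitive = false;
}

static void free_pages_final(struct page *page)
{
	page->freed = true;
}

/*
 * Models the hook in free_pages_prepare(): nonsensitive pages are
 * queued instead of freed; returns true if freeing was deferred.
 */
static bool asi_async_free_enqueue(struct page *page)
{
	if (!page->global_nonsensitive)
		return false;
	page->next = to_free_head;
	to_free_head = page;
	return true;
}

static void free_page_entry(struct page *page)
{
	if (!asi_async_free_enqueue(page))
		free_pages_final(page);
}

/*
 * Models asi_async_free_work_fn(): drain the queue, unmap each page,
 * then re-enter the free path. The cleared flag is what stops the
 * "loop-de-loop" from re-queueing the page.
 */
static void async_free_worker(void)
{
	while (to_free_head) {
		struct page *page = to_free_head;

		to_free_head = page->next;
		asi_unmap_stub(page);
		free_page_entry(page);
	}
}
```

A sensitive page goes straight to `free_pages_final()`; a nonsensitive one sits on the queue until the worker unmaps it and feeds it back through the free path.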
This was discussed at LSF/MM/BPF [1] but I haven't made time to experiment on
it yet. With this smarter approach, it should also be possible to remove the
pageflag, as other contextual information will let us know if a page is mapped
in the restricted address space (the page tables also reflect this status...).

[1] https://youtu.be/WD9-ey8LeiI

The main thing in here that is "real" and may warrant discussion is
__GFP_SENSITIVE (or at least, some sort of allocator switch to determine
sensitivity; in an "allowlist" model we would probably have the opposite, and
in future iterations we might want additional options for different "types" of
sensitivity). I think we need this as an extension to the allocation API; the
main alternative would be to infer from the context of the allocation whether
the data should be treated as sensitive. However, I think we will have
contexts where both sensitive and nonsensitive data need to be allocatable.

If there are concerns about __GFP flags specifically, rather than just the
general problem of expanding the allocator API, we could always provide an API
like __alloc_pages_sensitive or something, implemented with ALLOC_ flags
internally.

Signed-off-by: Brendan Jackman
---
 arch/x86/mm/asi.c              |  33 +++++++++-
 include/linux/gfp_types.h      |  15 ++++-
 include/linux/page-flags.h     |   9 +++
 include/trace/events/mmflags.h |  12 +++-
 mm/page_alloc.c                | 143 ++++++++++++++++++++++++++++++++++++++++-
 tools/perf/builtin-kmem.c      |   1 +
 6 files changed, 208 insertions(+), 5 deletions(-)

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 807d51497f43a..6e106f25abbb9 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -5,6 +5,8 @@
 #include
 #include
+#include
+
 #include
 #include
 #include
@@ -102,10 +104,17 @@ EXPORT_SYMBOL_GPL(asi_unregister_class);
  * allocator from interrupts and the page allocator ultimately calls this
  * code.
  * - They support customizing the allocation flags.
+ * - They avoid infinite recursion when the page allocator calls back to
+ *   asi_map
  *
  * On the other hand, they do not use the normal page allocation infrastructure,
  * that means that PTE pages do not have the PageTable type nor the PagePgtable
  * flag and we don't increment the meminfo stat (NR_PAGETABLE) as they do.
+ *
+ * As an optimisation we attempt to map the pagetables in
+ * ASI_GLOBAL_NONSENSITIVE, but this can fail, and for simplicity we don't do
+ * anything about that. This means it's invalid to access ASI pagetables from a
+ * critical section.
  */
 static_assert(!IS_ENABLED(CONFIG_PARAVIRT));
 #define DEFINE_ASI_PGTBL_ALLOC(base, level)				\
@@ -114,8 +123,11 @@ static level##_t * asi_##level##_alloc(struct asi *asi,	\
 					gfp_t flags)			\
 {									\
 	if (unlikely(base##_none(*base))) {				\
-		ulong pgtbl = get_zeroed_page(flags);			\
+		/* Stop asi_map calls causing recursive allocation */	\
+		gfp_t pgtbl_gfp = flags | __GFP_SENSITIVE;		\
+		ulong pgtbl = get_zeroed_page(pgtbl_gfp);		\
 		phys_addr_t pgtbl_pa;					\
+		int err;						\
 									\
 		if (!pgtbl)						\
 			return NULL;					\
@@ -129,6 +141,16 @@ static level##_t * asi_##level##_alloc(struct asi *asi,	\
 	}								\
 									\
 	mm_inc_nr_##level##s(asi->mm);					\
+									\
+	err = asi_map_gfp(ASI_GLOBAL_NONSENSITIVE,			\
+			  (void *)pgtbl, PAGE_SIZE, flags);		\
+	if (err)							\
+		/* Should be rare. Spooky. */				\
+		pr_warn_ratelimited("Created sensitive ASI %s (%pK, maps %luK).\n",\
+				    #level, (void *)pgtbl, addr);	\
+	else								\
+		__SetPageGlobalNonSensitive(virt_to_page(pgtbl));	\
+									\
 	}								\
 out:									\
 	VM_BUG_ON(base##_leaf(*base));					\
@@ -469,6 +491,9 @@ static bool follow_physaddr(
  * reason for this is that we don't want to unexpectedly undo mappings that
  * weren't created by the present caller.
  *
+ * This must not be called from the critical section, as ASI's pagetables are
+ * not guaranteed to be mapped in the restricted address space.
+ *
  * If the source mapping is a large page and the range being mapped spans the
  * entire large page, then it will be mapped as a large page in the ASI page
  * tables too. If the range does not span the entire huge page, then it will be
@@ -492,6 +517,9 @@ int __must_check asi_map_gfp(struct asi *asi, void *addr, unsigned long len, gfp
 	if (!static_asi_enabled())
 		return 0;
 
+	/* ASI pagetables might be sensitive. */
+	WARN_ON_ONCE(asi_in_critical_section());
+
 	VM_BUG_ON(!IS_ALIGNED(start, PAGE_SIZE));
 	VM_BUG_ON(!IS_ALIGNED(len, PAGE_SIZE));
 	VM_BUG_ON(!fault_in_kernel_space(start)); /* Misnamed, ignore "fault_" */
@@ -591,6 +619,9 @@ void asi_unmap(struct asi *asi, void *addr, size_t len)
 	if (!static_asi_enabled() || !len)
 		return;
 
+	/* ASI pagetables might be sensitive. */
+	WARN_ON_ONCE(asi_in_critical_section());
+
 	VM_BUG_ON(start & ~PAGE_MASK);
 	VM_BUG_ON(len & ~PAGE_MASK);
 	VM_BUG_ON(!fault_in_kernel_space(start)); /* Misnamed, ignore "fault_" */
diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
index 13becafe41df0..d33953a1c9b28 100644
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -55,6 +55,7 @@ enum {
 #ifdef CONFIG_LOCKDEP
 	___GFP_NOLOCKDEP_BIT,
 #endif
+	___GFP_SENSITIVE_BIT,
 	___GFP_LAST_BIT
 };
 
@@ -95,6 +96,11 @@ enum {
 #else
 #define ___GFP_NOLOCKDEP 0
 #endif
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define ___GFP_SENSITIVE BIT(___GFP_SENSITIVE_BIT)
+#else
+#define ___GFP_SENSITIVE 0
+#endif
 
 /*
  * Physical address zone modifiers (see linux/mmzone.h - low four bits)
@@ -284,6 +290,12 @@ enum {
 /* Disable lockdep for GFP context tracking */
 #define __GFP_NOLOCKDEP ((__force gfp_t)___GFP_NOLOCKDEP)
 
+/*
+ * Allocate sensitive memory, i.e. do not map it into ASI's restricted address
+ * space.
+ */
+#define __GFP_SENSITIVE ((__force gfp_t)___GFP_SENSITIVE)
+
 /* Room for N __GFP_FOO bits */
 #define __GFP_BITS_SHIFT ___GFP_LAST_BIT
 #define __GFP_BITS_MASK ((__force gfp_t)((1 << __GFP_BITS_SHIFT) - 1))
@@ -365,7 +377,8 @@ enum {
 #define GFP_NOWAIT	(__GFP_KSWAPD_RECLAIM | __GFP_NOWARN)
 #define GFP_NOIO	(__GFP_RECLAIM)
 #define GFP_NOFS	(__GFP_RECLAIM | __GFP_IO)
-#define GFP_USER	(__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
+#define GFP_USER	(__GFP_RECLAIM | __GFP_IO | __GFP_FS | \
+			 __GFP_HARDWALL | __GFP_SENSITIVE)
 #define GFP_DMA		__GFP_DMA
 #define GFP_DMA32	__GFP_DMA32
 #define GFP_HIGHUSER	(GFP_USER | __GFP_HIGHMEM)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 57fa58899a661..d4842cd1fb59a 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -135,6 +135,9 @@ enum pageflags {
 #ifdef CONFIG_ARCH_USES_PG_ARCH_X
 	PG_arch_2,
 	PG_arch_3,
+#endif
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+	PG_global_nonsensitive,
 #endif
 	__NR_PAGEFLAGS,
 
@@ -642,6 +645,12 @@ FOLIO_TEST_CLEAR_FLAG(young, FOLIO_HEAD_PAGE)
 FOLIO_FLAG(idle, FOLIO_HEAD_PAGE)
 #endif
 
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+__PAGEFLAG(GlobalNonSensitive, global_nonsensitive, PF_ANY);
+#else
+__PAGEFLAG_FALSE(GlobalNonSensitive, global_nonsensitive);
+#endif
+
 /*
  * PageReported() is used to track reported free pages within the Buddy
  * allocator. We can use the non-atomic version of the test and set
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index d55e53ac91bd2..416a79fe1a66d 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -50,7 +50,8 @@
 	gfpflag_string(__GFP_RECLAIM),		\
 	gfpflag_string(__GFP_DIRECT_RECLAIM),	\
 	gfpflag_string(__GFP_KSWAPD_RECLAIM),	\
-	gfpflag_string(__GFP_ZEROTAGS)
+	gfpflag_string(__GFP_ZEROTAGS),		\
+	gfpflag_string(__GFP_SENSITIVE)
 
 #ifdef CONFIG_KASAN_HW_TAGS
 #define __def_gfpflag_names_kasan , \
@@ -95,6 +96,12 @@
 #define IF_HAVE_PG_ARCH_X(_name)
 #endif
 
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define IF_HAVE_ASI(_name) ,{1UL << PG_##_name, __stringify(_name)}
+#else
+#define IF_HAVE_ASI(_name)
+#endif
+
 #define DEF_PAGEFLAG_NAME(_name) { 1UL << PG_##_name, __stringify(_name) }
 
 #define __def_pageflag_names						\
@@ -125,7 +132,8 @@ IF_HAVE_PG_HWPOISON(hwpoison)		\
 IF_HAVE_PG_IDLE(idle)			\
 IF_HAVE_PG_IDLE(young)			\
 IF_HAVE_PG_ARCH_X(arch_2)		\
-IF_HAVE_PG_ARCH_X(arch_3)
+IF_HAVE_PG_ARCH_X(arch_3)		\
+IF_HAVE_ASI(global_nonsensitive)
 
 #define show_page_flags(flags)						\
 	(flags) ? __print_flags(flags, "|",				\
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 14d39f34d3367..1e71ee9ae178c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1081,6 +1081,8 @@ static void kernel_init_pages(struct page *page, int numpages)
 	kasan_enable_current();
 }
 
+static bool asi_async_free_enqueue(struct page *page, unsigned int order);
+
 __always_inline bool free_pages_prepare(struct page *page,
 			unsigned int order)
 {
@@ -1177,7 +1179,7 @@ __always_inline bool free_pages_prepare(struct page *page,
 
 	debug_pagealloc_unmap_pages(page, 1 << order);
 
-	return true;
+	return !asi_async_free_enqueue(page, order);
 }
 
 /*
@@ -4364,6 +4366,136 @@ static inline bool prepare_alloc_pages(gfp_t gfp_mask, unsigned int order,
 	return true;
 }
 
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+
+struct asi_async_free_cpu_state {
+	struct work_struct work;
+	struct list_head to_free;
+};
+static DEFINE_PER_CPU(struct asi_async_free_cpu_state, asi_async_free_cpu_state);
+
+static bool async_free_work_initialized;
+
+static void asi_async_free_work_fn(struct work_struct *work)
+{
+	struct asi_async_free_cpu_state *cpu_state =
+		container_of(work, struct asi_async_free_cpu_state, work);
+	struct page *page, *tmp;
+	struct list_head to_free = LIST_HEAD_INIT(to_free);
+
+	local_irq_disable();
+	list_splice_init(&cpu_state->to_free, &to_free);
+	local_irq_enable(); /* IRQs must be on for asi_unmap. */
+
+	/* Use _safe because __free_the_page uses .lru */
+	list_for_each_entry_safe(page, tmp, &to_free, lru) {
+		unsigned long order = page_private(page);
+
+		asi_unmap(ASI_GLOBAL_NONSENSITIVE, page_to_virt(page),
+			  PAGE_SIZE << order);
+		for (int i = 0; i < (1 << order); i++)
+			__ClearPageGlobalNonSensitive(page + i);
+
+		/*
+		 * Note weird loop-de-loop here, we might already have called
+		 * __free_pages_ok for this page, but now we've cleared
+		 * PageGlobalNonSensitive so it won't end up back on the queue
+		 * again.
+		 */
+		__free_pages_ok(page, order, FPI_NONE);
+		cond_resched();
+	}
+}
+
+/* Returns true if the page was queued for asynchronous freeing. */
+static bool asi_async_free_enqueue(struct page *page, unsigned int order)
+{
+	struct asi_async_free_cpu_state *cpu_state;
+	unsigned long flags;
+
+	if (!PageGlobalNonSensitive(page))
+		return false;
+
+	local_irq_save(flags);
+	cpu_state = this_cpu_ptr(&asi_async_free_cpu_state);
+	set_page_private(page, order);
+	list_add(&page->lru, &cpu_state->to_free);
+	local_irq_restore(flags);
+
+	return true;
+}
+
+static int __init asi_page_alloc_init(void)
+{
+	int cpu;
+
+	if (!static_asi_enabled())
+		return 0;
+
+	for_each_possible_cpu(cpu) {
+		struct asi_async_free_cpu_state *cpu_state
+			= &per_cpu(asi_async_free_cpu_state, cpu);
+
+		INIT_WORK(&cpu_state->work, asi_async_free_work_fn);
+		INIT_LIST_HEAD(&cpu_state->to_free);
+	}
+
+	/*
+	 * This function is called before SMP is initialized, so we can assume
+	 * that this is the only running CPU at this point.
+	 */
+
+	barrier();
+	async_free_work_initialized = true;
+	barrier();
+
+	return 0;
+}
+early_initcall(asi_page_alloc_init);
+
+static int asi_map_alloced_pages(struct page *page, uint order, gfp_t gfp_mask)
+{
+
+	if (!static_asi_enabled())
+		return 0;
+
+	if (!(gfp_mask & __GFP_SENSITIVE)) {
+		int err = asi_map_gfp(
+			ASI_GLOBAL_NONSENSITIVE, page_to_virt(page),
+			PAGE_SIZE * (1 << order), gfp_mask);
+		uint i;
+
+		if (err)
+			return err;
+
+		for (i = 0; i < (1 << order); i++)
+			__SetPageGlobalNonSensitive(page + i);
+	}
+
+	return 0;
+}
+
+#else /* CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION */
+
+static inline
+int asi_map_alloced_pages(struct page *pages, uint order, gfp_t gfp_mask)
+{
+	return 0;
+}
+
+static inline
+bool asi_unmap_freed_pages(struct page *page, unsigned int order)
+{
+	return true;
+}
+
+static bool asi_async_free_enqueue(struct page *page, unsigned int order)
+{
+	return false;
+}
+
+#endif
+
 /*
  * __alloc_pages_bulk - Allocate a number of order-0 pages to a list or array
  * @gfp: GFP flags for the allocation
@@ -4551,6 +4683,10 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
 		return NULL;
 
+	/* Clear out old (maybe sensitive) data before reallocating as nonsensitive. */
+	if (!static_asi_enabled() && !(gfp & __GFP_SENSITIVE))
+		gfp |= __GFP_ZERO;
+
 	gfp &= gfp_allowed_mask;
 	/*
 	 * Apply scoped allocation constraints. This is mainly about GFP_NOFS
@@ -4597,6 +4733,11 @@ struct page *__alloc_pages(gfp_t gfp, unsigned int order, int preferred_nid,
 	trace_mm_page_alloc(page, order, alloc_gfp, ac.migratetype);
 	kmsan_alloc_page(page, order, alloc_gfp);
 
+	if (page && unlikely(asi_map_alloced_pages(page, order, gfp))) {
+		__free_pages(page, order);
+		page = NULL;
+	}
+
 	return page;
 }
 EXPORT_SYMBOL(__alloc_pages);
diff --git a/tools/perf/builtin-kmem.c b/tools/perf/builtin-kmem.c
index 9714327fd0ead..912497b7b1c3f 100644
--- a/tools/perf/builtin-kmem.c
+++ b/tools/perf/builtin-kmem.c
@@ -682,6 +682,7 @@ static const struct {
 	{ "__GFP_RECLAIM",		"R" },
 	{ "__GFP_DIRECT_RECLAIM",	"DR" },
 	{ "__GFP_KSWAPD_RECLAIM",	"KR" },
+	{ "__GFP_SENSITIVE",		"S" },
 };
 
 static size_t max_gfp_len;
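As a summary of the allocation-side behaviour the patch adds, here is a toy userspace C model (the flag values and struct are made up for illustration; the real ones live in include/linux/gfp_types.h and mm/page_alloc.c): an allocation without the sensitive flag is zeroed, since the old contents may be sensitive, and mapped into the restricted address space; a sensitive allocation is left out of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for this model only, not the kernel's values. */
#define MODEL_GFP_SENSITIVE (1u << 0)
#define MODEL_GFP_ZERO      (1u << 1)

struct page_model {
	bool mapped_restricted; /* models asi_map + PageGlobalNonSensitive */
	bool zeroed;            /* models the forced __GFP_ZERO */
};

/*
 * Models the gfp adjustment in __alloc_pages(): nonsensitive
 * allocations must be zeroed before being exposed as nonsensitive.
 */
static unsigned int adjust_gfp(unsigned int gfp)
{
	if (!(gfp & MODEL_GFP_SENSITIVE))
		gfp |= MODEL_GFP_ZERO;
	return gfp;
}

/* Models the allocation path: map into the restricted address space
 * unless the caller asked for sensitive memory. */
static void alloc_pages_model(struct page_model *page, unsigned int gfp)
{
	gfp = adjust_gfp(gfp);
	page->zeroed = (gfp & MODEL_GFP_ZERO) != 0;
	page->mapped_restricted = !(gfp & MODEL_GFP_SENSITIVE);
}
```

Under GFP_USER (which the patch makes imply __GFP_SENSITIVE), the page stays out of the restricted address space; everything else becomes globally nonsensitive by default, which is what makes this a "denylist" mode.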