From patchwork Fri Jul 12 17:00:35 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Brendan Jackman
X-Patchwork-Id: 13732000
Date: Fri, 12 Jul 2024 17:00:35 +0000
In-Reply-To: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
References: <20240712-asi-rfc-24-v1-0-144b319a40d8@google.com>
X-Mailer: b4 0.14-dev
Message-ID: <20240712-asi-rfc-24-v1-17-144b319a40d8@google.com>
Subject: [PATCH 17/26] mm: asi: Map kernel text and static data as nonsensitive
From: Brendan Jackman
To: Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H.
 Peter Anvin", Andy Lutomirski, Peter Zijlstra, Sean Christopherson,
 Paolo Bonzini, Alexandre Chartre, Liran Alon, Jan Setje-Eilers,
 Catalin Marinas, Will Deacon, Mark Rutland, Andrew Morton, Mel Gorman,
 Lorenzo Stoakes, David Hildenbrand, Vlastimil Babka, Michal Hocko,
 Khalid Aziz, Juri Lelli, Vincent Guittot, Dietmar Eggemann,
 Steven Rostedt, Valentin Schneider, Paul Turner, Reiji Watanabe,
 Junaid Shahid, Ofir Weisse, Yosry Ahmed, Patrick Bellasi, KP Singh,
 Alexandra Sandulescu, Matteo Rizzo, Jann Horn
Cc: x86@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 kvm@vger.kernel.org, Brendan Jackman

Basically we need to map the kernel code and all its static variables.
Per-CPU variables need to be treated specially, as described in the
comments. The cpu_entry_area is similar: it needs to be nonsensitive so
that the CPU can access the GDT etc. when handling a page fault.

Under 5-level paging, most of the kernel memory comes under a single PGD
entry (see Documentation/x86/x86_64/mm.rst; basically, the mapping for
this big region is the same as under 4-level, just wrapped in an outer
PGD entry). For that region, the "clone" logic is moved down one step of
the paging hierarchy.

Note that the p4d_alloc in asi_clone_p4d won't actually allocate
anything in practice: the relevant PGD entry will always have been
populated by prior asi_map calls, so this code would "work" if we just
wrote p4d_offset (but asi_clone_p4d would be broken if viewed in
isolation).

The vmemmap area is not under this single PGD; it has its own 2-PGD
area, so we still use asi_clone_pgd for that one.

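To make the "clone" idea concrete, here is a rough user-space model (not
kernel code). It treats a top-level page table as a plain array and shows
what sharing one entry between the unrestricted and the restricted tables
amounts to. All toy_* names, the 512-entry table and the 39-bit shift are
assumptions chosen for illustration; the real logic is asi_clone_pgd and
asi_clone_p4d in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed, illustrative values: 512 entries per top-level table and
 * 39 address bits below each entry (x86-64 with 4-level paging). */
#define TOY_ENTRIES 512
#define TOY_SHIFT   39

/* Stand-in for pgd_t: 0 means "not present". */
typedef uint64_t toy_entry_t;

/* Index of the top-level entry that covers addr. */
static size_t toy_index(uint64_t addr)
{
	return (size_t)((addr >> TOY_SHIFT) & (TOY_ENTRIES - 1));
}

/*
 * Copy the source entry into the destination table, so that both tables
 * end up pointing at the same lower-level table. Returns false if the
 * destination already holds a different entry; the kernel code in the
 * patch warns in that case instead of returning a value.
 */
static bool toy_clone_entry(toy_entry_t *dst_table,
			    const toy_entry_t *src_table, uint64_t addr)
{
	size_t i;

	assert(dst_table && src_table);
	i = toy_index(addr);

	if (!dst_table[i]) {
		dst_table[i] = src_table[i];
		return true;
	}
	return dst_table[i] == src_table[i];
}
```

Because only the top-level entry is copied, any later change made to the
lower-level tables through one address space is visible through the other
as well. That is the property the vmemmap mapping relies on.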
Signed-off-by: Brendan Jackman
---
 arch/x86/mm/asi.c                 | 106 +++++++++++++++++++++++++++++++++++++-
 include/asm-generic/vmlinux.lds.h |  11 ++++
 2 files changed, 116 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/asi.c b/arch/x86/mm/asi.c
index 6e106f25abbb..891b8d351df8 100644
--- a/arch/x86/mm/asi.c
+++ b/arch/x86/mm/asi.c
@@ -7,8 +7,8 @@
 #include
 #include
-#include
 #include
+#include
 #include
 #include
 #include
@@ -184,8 +184,68 @@ void __init asi_check_boottime_disable(void)
 	pr_info("ASI enablement ignored due to incomplete implementation.\n");
 }
 
+/*
+ * Map data by sharing sub-PGD pagetables with the unrestricted mapping. This is
+ * more efficient than asi_map, but only works when you know the whole top-level
+ * page needs to be mapped in the restricted tables. Note that the size of the
+ * mappings this creates differs between 4 and 5-level paging.
+ */
+static void asi_clone_pgd(pgd_t *dst_table, pgd_t *src_table, size_t addr)
+{
+	pgd_t *src = pgd_offset_pgd(src_table, addr);
+	pgd_t *dst = pgd_offset_pgd(dst_table, addr);
+
+	if (!pgd_val(*dst))
+		set_pgd(dst, *src);
+	else
+		WARN_ON_ONCE(pgd_val(*dst) != pgd_val(*src));
+}
+
+/*
+ * For 4-level paging this is exactly the same as asi_clone_pgd. For 5-level
+ * paging it clones one level lower. So this always creates a mapping of the
+ * same size.
+ */
+static void asi_clone_p4d(pgd_t *dst_table, pgd_t *src_table, size_t addr)
+{
+	pgd_t *src_pgd = pgd_offset_pgd(src_table, addr);
+	pgd_t *dst_pgd = pgd_offset_pgd(dst_table, addr);
+	p4d_t *src_p4d = p4d_alloc(&init_mm, src_pgd, addr);
+	p4d_t *dst_p4d = p4d_alloc(&init_mm, dst_pgd, addr);
+
+	if (!p4d_val(*dst_p4d))
+		set_p4d(dst_p4d, *src_p4d);
+	else
+		WARN_ON_ONCE(p4d_val(*dst_p4d) != p4d_val(*src_p4d));
+}
+
+/*
+ * percpu_addr is where the linker put the percpu variable. asi_map_percpu finds
+ * the place where the percpu allocator copied the data during boot.
+ *
+ * This is necessary even when the page allocator defaults to
+ * global-nonsensitive, because the percpu allocator uses the memblock allocator
+ * for early allocations.
+ */
+static int asi_map_percpu(struct asi *asi, void *percpu_addr, size_t len)
+{
+	int cpu, err;
+	void *ptr;
+
+	for_each_possible_cpu(cpu) {
+		ptr = per_cpu_ptr(percpu_addr, cpu);
+		err = asi_map(asi, ptr, len);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int __init asi_global_init(void)
 {
+	int err;
+
 	if (!boot_cpu_has(X86_FEATURE_ASI))
 		return 0;
 
@@ -205,6 +265,46 @@ static int __init asi_global_init(void)
 		VMALLOC_START, VMALLOC_END,
 		"ASI Global Non-sensitive vmalloc");
 
+	/* Map all kernel text and static data */
+	err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)__START_KERNEL,
+		      (size_t)_end - __START_KERNEL);
+	if (WARN_ON(err))
+		return err;
+	err = asi_map(ASI_GLOBAL_NONSENSITIVE, (void *)FIXADDR_START,
+		      FIXADDR_SIZE);
+	if (WARN_ON(err))
+		return err;
+	/* Map all static percpu data */
+	err = asi_map_percpu(
+		ASI_GLOBAL_NONSENSITIVE,
+		__per_cpu_start, __per_cpu_end - __per_cpu_start);
+	if (WARN_ON(err))
+		return err;
+
+	/*
+	 * The next areas are mapped using shared sub-P4D paging structures
+	 * (asi_clone_p4d instead of asi_map), since we know the whole P4D will
+	 * be mapped.
+	 */
+	asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      CPU_ENTRY_AREA_BASE);
+#ifdef CONFIG_X86_ESPFIX64
+	asi_clone_p4d(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      ESPFIX_BASE_ADDR);
+#endif
+	/*
+	 * The vmemmap area actually _must_ be cloned via shared paging
+	 * structures, since mappings can potentially change dynamically when
+	 * hugetlbfs pages are created or broken down.
+	 *
+	 * We always clone 2 PGDs, this is a corollary of the sizes of struct
+	 * page, a page, and the physical address space.
+	 */
+	WARN_ON(sizeof(struct page) * MAXMEM / PAGE_SIZE != 2 * (1UL << PGDIR_SHIFT));
+	asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd, VMEMMAP_START);
+	asi_clone_pgd(asi_global_nonsensitive_pgd, init_mm.pgd,
+		      VMEMMAP_START + (1UL << PGDIR_SHIFT));
+
 	return 0;
 }
 subsys_initcall(asi_global_init)
@@ -482,6 +582,10 @@ static bool follow_physaddr(
  * Map the given range into the ASI page tables. The source of the mapping is
  * the regular unrestricted page tables. Can be used to map any kernel memory.
  *
+ * In contrast to some internal ASI logic (asi_clone_pgd and asi_clone_p4d) this
+ * never shares pagetables between restricted and unrestricted address spaces,
+ * instead it creates wholly new equivalent mappings.
+ *
  * The caller MUST ensure that the source mapping will not change during this
  * function. For dynamic kernel memory, this is generally ensured by mapping the
  * memory within the allocator.
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index f7749d0f2562..4eca33d62950 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -1021,6 +1021,16 @@
 	COMMON_DISCARDS	\
 	}
 
+/*
+ * ASI maps certain sections with certain sensitivity levels, so they need to
+ * have a page-aligned size.
+ */
+#ifdef CONFIG_MITIGATION_ADDRESS_SPACE_ISOLATION
+#define ASI_ALIGN() ALIGN(PAGE_SIZE)
+#else
+#define ASI_ALIGN() .
+#endif
+
 /**
  * PERCPU_INPUT - the percpu input sections
  * @cacheline: cacheline size
@@ -1042,6 +1052,7 @@
 	*(.data..percpu)	\
 	*(.data..percpu..shared_aligned)	\
 	PERCPU_DECRYPTED_SECTION	\
+	. = ASI_ALIGN();	\
 	__per_cpu_end = .;
 
 /**
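
The "2 PGDs" figure for vmemmap in asi_global_init can be checked by hand.
The sketch below redoes the arithmetic from the WARN_ON in user space,
assuming typical x86-64 4-level values: a 64-byte struct page, 4 KiB pages,
46 physical address bits and a PGDIR_SHIFT of 39. These constants and the
toy_* names are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for x86-64 with 4-level paging (not from the patch). */
#define TOY_PAGE_SHIFT        12  /* 4 KiB pages */
#define TOY_MAX_PHYSMEM_BITS  46  /* MAXMEM taken as 1 << 46 */
#define TOY_PGDIR_SHIFT       39  /* one PGD entry spans 512 GiB */
#define TOY_STRUCT_PAGE_SIZE  64  /* typical sizeof(struct page) */

static_assert(TOY_MAX_PHYSMEM_BITS > TOY_PAGE_SHIFT,
	      "physical address space must be larger than one page");

/* Bytes of vmemmap needed to hold one struct page per physical page. */
static uint64_t toy_vmemmap_bytes(void)
{
	uint64_t nr_pages =
		(UINT64_C(1) << TOY_MAX_PHYSMEM_BITS) >> TOY_PAGE_SHIFT;

	return nr_pages * TOY_STRUCT_PAGE_SIZE;
}

/* Number of top-level (PGD) entries needed to cover that many bytes. */
static uint64_t toy_vmemmap_pgd_entries(void)
{
	uint64_t pgd_span = UINT64_C(1) << TOY_PGDIR_SHIFT;

	return (toy_vmemmap_bytes() + pgd_span - 1) / pgd_span;
}
```

With these assumed values there are 2^34 physical pages, so vmemmap needs
2^34 * 64 bytes = 1 TiB, which is exactly two 512 GiB top-level entries.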