From patchwork Wed Jan 17 14:46:52 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13521802
From: Alexander Graf
CC: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 Steven Rostedt, Andrew Morton, Mark Rutland, Tom Lendacky, Ashish Kalra,
 James Gowans, Stanislav Kinsburskii, Anthony Yznaga, Usama Arif,
 David Woodhouse, Benjamin Herrenschmidt, Rob Herring, Krzysztof Kozlowski
Subject: [PATCH v3 05/17] kexec: Add KHO support to kexec file loads
Date: Wed, 17 Jan 2024 14:46:52 +0000
Message-ID: <20240117144704.602-6-graf@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20240117144704.602-1-graf@amazon.com>
References: <20240117144704.602-1-graf@amazon.com>
Kexec has two modes: a userspace-driven mode and a kernel-driven mode. For the
kernel-driven mode, kernel code determines the physical addresses of all
target buffers that the payload gets copied into.

With KHO, we can only safely copy payloads into the "scratch area". Teach the
kexec file loader about it, so it only allocates from that area. In addition,
enlighten it with support to ask the KHO subsystem for its respective payloads
to copy into target memory. Also teach the KHO subsystem how to fill the
images for file loads.
Signed-off-by: Alexander Graf
---
 include/linux/kexec.h  |   9 ++
 kernel/kexec_file.c    |  41 ++++++++
 kernel/kexec_kho_out.c | 210 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 260 insertions(+)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index eabf9536466a..225ef2222eb9 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -362,6 +362,13 @@ struct kimage {
 	size_t ima_buffer_size;
 #endif
 
+#ifdef CONFIG_KEXEC_KHO
+	struct {
+		struct kexec_buf dt;
+		struct kexec_buf mem_cache;
+	} kho;
+#endif
+
 	/* Core ELF header buffer */
 	void *elf_headers;
 	unsigned long elf_headers_sz;
@@ -550,6 +557,7 @@ static inline bool is_kho_boot(void)
 
 /* egest handover metadata */
 void kho_reserve_scratch(void);
+int kho_fill_kimage(struct kimage *image);
 int register_kho_notifier(struct notifier_block *nb);
 int unregister_kho_notifier(struct notifier_block *nb);
 bool kho_is_active(void);
@@ -567,6 +575,7 @@ static inline bool is_kho_boot(void) { return false; }
 
 /* egest handover metadata */
 static inline void kho_reserve_scratch(void) { }
+static inline int kho_fill_kimage(struct kimage *image) { return 0; }
 static inline int register_kho_notifier(struct notifier_block *nb) { return -EINVAL; }
 static inline int unregister_kho_notifier(struct notifier_block *nb) { return -EINVAL; }
 static inline bool kho_is_active(void) { return false; }
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index bef2f6f2571b..28fa60b51828 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -113,6 +113,13 @@ void kimage_file_post_load_cleanup(struct kimage *image)
 	image->ima_buffer = NULL;
 #endif /* CONFIG_IMA_KEXEC */
 
+#ifdef CONFIG_KEXEC_KHO
+	kvfree(image->kho.mem_cache.buffer);
+	image->kho.mem_cache = (struct kexec_buf) {};
+	kvfree(image->kho.dt.buffer);
+	image->kho.dt = (struct kexec_buf) {};
+#endif
+
 	/* See if architecture has anything to cleanup post load */
 	arch_kimage_file_post_load_cleanup(image);
 
@@ -253,6 +260,11 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
 	/* IMA needs to pass the measurement list to the next kernel. */
 	ima_add_kexec_buffer(image);
 
+	/* If KHO is active, add its images to the list */
+	ret = kho_fill_kimage(image);
+	if (ret)
+		goto out;
+
 	/* Call image load handler */
 	ldata = kexec_image_load_default(image);
 
@@ -526,6 +538,24 @@ static int locate_mem_hole_callback(struct resource *res, void *arg)
 	return locate_mem_hole_bottom_up(start, end, kbuf);
 }
 
+#ifdef CONFIG_KEXEC_KHO
+static int kexec_walk_kho_scratch(struct kexec_buf *kbuf,
+				  int (*func)(struct resource *, void *))
+{
+	int ret = 0;
+
+	struct resource res = {
+		.start = kho_scratch_phys,
+		.end = kho_scratch_phys + kho_scratch_len,
+	};
+
+	/* Try to fit the kimage into our KHO scratch region */
+	ret = func(&res, kbuf);
+
+	return ret;
+}
+#endif
+
 #ifdef CONFIG_ARCH_KEEP_MEMBLOCK
 static int kexec_walk_memblock(struct kexec_buf *kbuf,
 			       int (*func)(struct resource *, void *))
@@ -622,6 +652,17 @@ int kexec_locate_mem_hole(struct kexec_buf *kbuf)
 	if (kbuf->mem != KEXEC_BUF_MEM_UNKNOWN)
 		return 0;
 
+#ifdef CONFIG_KEXEC_KHO
+	/*
+	 * If KHO is active, only use KHO scratch memory. All other memory
+	 * could potentially be handed over.
+	 */
+	if (kho_is_active() && kbuf->image->type != KEXEC_TYPE_CRASH) {
+		ret = kexec_walk_kho_scratch(kbuf, locate_mem_hole_callback);
+		return ret == 1 ? 0 : -EADDRNOTAVAIL;
+	}
+#endif
+
 	if (!IS_ENABLED(CONFIG_ARCH_KEEP_MEMBLOCK))
 		ret = kexec_walk_resources(kbuf, locate_mem_hole_callback);
 	else
diff --git a/kernel/kexec_kho_out.c b/kernel/kexec_kho_out.c
index 765cf6ba7a46..2cf5755f5e4a 100644
--- a/kernel/kexec_kho_out.c
+++ b/kernel/kexec_kho_out.c
@@ -50,6 +50,216 @@ int unregister_kho_notifier(struct notifier_block *nb)
 }
 EXPORT_SYMBOL_GPL(unregister_kho_notifier);
 
+static int kho_mem_cache_add(void *fdt, struct kho_mem *mem_cache, int size,
+			     struct kho_mem *new_mem)
+{
+	int entries = size / sizeof(*mem_cache);
+	u64 new_start = new_mem->addr;
+	u64 new_end = new_mem->addr + new_mem->len;
+	u64 prev_start = 0;
+	u64 prev_end = 0;
+	int i;
+
+	if (WARN_ON((new_start < (kho_scratch_phys + kho_scratch_len)) &&
+		    (new_end > kho_scratch_phys))) {
+		pr_err("KHO memory runs over scratch memory\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * We walk the existing sorted mem cache and find the spot where this
+	 * new entry would start, so we can insert it right there.
+	 */
+	for (i = 0; i < entries; i++) {
+		struct kho_mem *mem = &mem_cache[i];
+		u64 mem_end = (mem->addr + mem->len);
+
+		if (mem_end < new_start) {
+			/* No overlap */
+			prev_start = mem->addr;
+			prev_end = mem->addr + mem->len;
+			continue;
+		} else if ((new_start >= mem->addr) && (new_end <= mem_end)) {
+			/* new_mem fits into mem, skip */
+			return size;
+		} else if ((new_end >= mem->addr) && (new_start <= mem_end)) {
+			/* new_mem and mem overlap, fold them */
+			bool remove = false;
+
+			mem->addr = min(new_start, mem->addr);
+			mem->len = max(mem_end, new_end) - mem->addr;
+			mem_end = (mem->addr + mem->len);
+
+			if (i > 0 && prev_end >= mem->addr) {
+				/* We now overlap with the previous mem, fold */
+				struct kho_mem *prev = &mem_cache[i - 1];
+
+				prev->addr = min(prev->addr, mem->addr);
+				prev->len = max(mem_end, prev_end) - prev->addr;
+				remove = true;
+			} else if (i < (entries - 1) && mem_end >= mem_cache[i + 1].addr) {
+				/* We now overlap with the next mem, fold */
+				struct kho_mem *next = &mem_cache[i + 1];
+				u64 next_end = (next->addr + next->len);
+
+				next->addr = min(next->addr, mem->addr);
+				next->len = max(mem_end, next_end) - next->addr;
+				remove = true;
+			}
+
+			if (remove) {
+				/* We folded this mem into another, remove it */
+				memmove(mem, mem + 1, (entries - i - 1) * sizeof(*mem));
+				size -= sizeof(*new_mem);
+			}
+
+			return size;
+		} else if (mem->addr > new_end) {
+			/*
+			 * The mem cache is sorted. If we find the current
+			 * entry start after our new_mem's end, we shot over,
+			 * which means we need to add new_mem by creating a
+			 * hole right before the current entry.
+			 */
+			memmove(mem + 1, mem, (entries - i) * sizeof(*mem));
+			break;
+		}
+	}
+
+	mem_cache[i] = *new_mem;
+	size += sizeof(*new_mem);
+
+	return size;
+}
+
+/**
+ * kho_alloc_mem_cache - Allocate and initialize the mem cache kexec_buf
+ */
+static int kho_alloc_mem_cache(struct kimage *image, void *fdt)
+{
+	int offset, depth, initial_depth, len;
+	void *mem_cache;
+	int size;
+
+	/* Count the elements inside all "mem" properties in the DT */
+	size = offset = depth = initial_depth = 0;
+	for (offset = 0;
+	     offset >= 0 && depth >= initial_depth;
+	     offset = fdt_next_node(fdt, offset, &depth)) {
+		const struct kho_mem *mems;
+
+		mems = fdt_getprop(fdt, offset, "mem", &len);
+		if (!mems || len & (sizeof(*mems) - 1))
+			continue;
+		size += len;
+	}
+
+	/* Allocate based on the max size we determined */
+	mem_cache = kvmalloc(size, GFP_KERNEL);
+	if (!mem_cache)
+		return -ENOMEM;
+
+	/* And populate the array */
+	size = offset = depth = initial_depth = 0;
+	for (offset = 0;
+	     offset >= 0 && depth >= initial_depth;
+	     offset = fdt_next_node(fdt, offset, &depth)) {
+		const struct kho_mem *mems;
+		int nr_mems, i;
+
+		mems = fdt_getprop(fdt, offset, "mem", &len);
+		if (!mems || len & (sizeof(*mems) - 1))
+			continue;
+
+		for (i = 0, nr_mems = len / sizeof(*mems); i < nr_mems; i++) {
+			const struct kho_mem *mem = &mems[i];
+			ulong mstart = PAGE_ALIGN_DOWN(mem->addr);
+			ulong mend = PAGE_ALIGN(mem->addr + mem->len);
+			struct kho_mem cmem = {
+				.addr = mstart,
+				.len = (mend - mstart),
+			};
+
+			size = kho_mem_cache_add(fdt, mem_cache, size, &cmem);
+			if (size < 0)
+				return size;
+		}
+	}
+
+	image->kho.mem_cache.buffer = mem_cache;
+	image->kho.mem_cache.bufsz = size;
+	image->kho.mem_cache.memsz = size;
+
+	return 0;
+}
+
+int kho_fill_kimage(struct kimage *image)
+{
+	int err = 0;
+	void *dt;
+
+	mutex_lock(&kho.lock);
+
+	if (!kho.active)
+		goto out;
+
+	/* Initialize kexec_buf for mem_cache */
+	image->kho.mem_cache = (struct kexec_buf) {
+		.image = image,
+		.buffer = NULL,
+		.bufsz = 0,
+		.mem = KEXEC_BUF_MEM_UNKNOWN,
+		.memsz = 0,
+		.buf_align = SZ_64K, /* Makes it easier to map */
+		.buf_max = ULONG_MAX,
+		.top_down = true,
+	};
+
+	/*
+	 * We need to make all allocations visible here via the mem_cache so
+	 * that kho_is_destination_range() can identify overlapping regions
+	 * and ensure that no kimage (including the DT one) lands on handed
+	 * over memory.
+	 *
+	 * Since we conveniently already built an array of all allocations,
+	 * let's pass that on to the target kernel so that it can reuse it
+	 * to initialize its memory blocks.
+	 */
+	err = kho_alloc_mem_cache(image, kho.dt);
+	if (err)
+		goto out;
+
+	err = kexec_add_buffer(&image->kho.mem_cache);
+	if (err)
+		goto out;
+
+	/*
+	 * Create a kexec copy of the DT here. We need this because lifetime
+	 * may be different between kho.dt and the kimage.
+	 */
+	dt = kvmemdup(kho.dt, kho.dt_len, GFP_KERNEL);
+	if (!dt) {
+		err = -ENOMEM;
+		goto out;
+	}
+
+	/* Allocate target memory for kho dt */
+	image->kho.dt = (struct kexec_buf) {
+		.image = image,
+		.buffer = dt,
+		.bufsz = kho.dt_len,
+		.mem = KEXEC_BUF_MEM_UNKNOWN,
+		.memsz = kho.dt_len,
+		.buf_align = SZ_64K, /* Makes it easier to map */
+		.buf_max = ULONG_MAX,
+		.top_down = true,
+	};
+	err = kexec_add_buffer(&image->kho.dt);
+
+out:
+	mutex_unlock(&kho.lock);
+	return err;
+}
+
 bool kho_is_active(void)
 {
 	return kho.active;