From patchwork Wed Dec 13 00:04:40 2023
X-Patchwork-Submitter: Alexander Graf
X-Patchwork-Id: 13490139
From: Alexander Graf
CC: Eric Biederman, "H. Peter Anvin", Andy Lutomirski, Peter Zijlstra,
 Rob Herring, Steven Rostedt, Andrew Morton, Mark Rutland, Tom Lendacky,
 Ashish Kalra, James Gowans, Stanislav Kinsburskii, Anthony Yznaga,
 Usama Arif, David Woodhouse, Benjamin Herrenschmidt
Subject: [PATCH 03/15] kexec: Add Kexec HandOver (KHO) generation helpers
Date: Wed, 13 Dec 2023 00:04:40 +0000
Message-ID: <20231213000452.88295-4-graf@amazon.com>
X-Mailer: git-send-email 2.40.1
In-Reply-To: <20231213000452.88295-1-graf@amazon.com>
References: <20231213000452.88295-1-graf@amazon.com>

This patch adds the core infrastructure to generate Kexec HandOver
metadata. Kexec HandOver is a mechanism that allows Linux to preserve
state - arbitrary properties as well as memory locations - across kexec.

It does so using 3 concepts:

  1) Device Tree - Every KHO kexec carries a KHO specific flattened
     device tree blob that describes the state of the system. Device
     drivers can register with KHO to serialize their state before
     kexec.

  2) Mem cache - A memblock-like structure that contains full page
     ranges of reservations. These cannot be part of the architectural
     reservations, because they differ on every kexec.

  3) Scratch Region - A CMA region that we allocate in the first kernel.
     CMA gives us the guarantee that no handover pages land in that
     region, because handover pages must be at a static physical memory
     location. We use this region as the place to load future kexec
     images into, so that they won't collide with any handover data.

Signed-off-by: Alexander Graf
---
 Documentation/ABI/testing/sysfs-kernel-kho    |  53 +++
 .../admin-guide/kernel-parameters.txt         |  10 +
 MAINTAINERS                                   |   1 +
 include/linux/kexec.h                         |  24 ++
 include/uapi/linux/kexec.h                    |   6 +
 kernel/Makefile                               |   1 +
 kernel/kexec_kho_out.c                        | 316 ++++++++++++++++++
 7 files changed, 411 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-kernel-kho
 create mode 100644 kernel/kexec_kho_out.c

diff --git a/Documentation/ABI/testing/sysfs-kernel-kho b/Documentation/ABI/testing/sysfs-kernel-kho
new file mode 100644
index 000000000000..f69e7b81a337
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-kho
@@ -0,0 +1,53 @@
+What:		/sys/kernel/kho/active
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		Kexec HandOver (KHO) allows Linux to transition the state of
+		compatible drivers into the next kexec'ed kernel. To do so,
+		device drivers will serialize their current state into a DT.
+		While the state is serialized, they are unable to perform
+		any modifications to state that was serialized, such as
+		handed over memory allocations.
+
+		When this file contains "1", the system is in the transition
+		state. When it contains "0", it is not. To switch between the
+		two states, echo the respective number into this file.
+
+What:		/sys/kernel/kho/dt_max
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		KHO needs to allocate a buffer for the DT that gets
+		generated before it knows the final size. By default, it
+		will allocate 10 MiB for it. You can write to this file
+		to modify the size of that allocation.
+
+What:		/sys/kernel/kho/scratch_len
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		To support continuous KHO kexecs, we need to reserve a
+		physically contiguous memory region that will always stay
+		available for future kexec allocations. This file describes
+		the length of that memory region. Kexec user space tooling
+		can use this to determine where it should place its payload
+		images.
+
+What:		/sys/kernel/kho/scratch_phys
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		To support continuous KHO kexecs, we need to reserve a
+		physically contiguous memory region that will always stay
+		available for future kexec allocations. This file describes
+		the physical location of that memory region. Kexec user space
+		tooling can use this to determine where it should place its
+		payload images.
+
+What:		/sys/kernel/kho/dt
+Date:		December 2023
+Contact:	Alexander Graf
+Description:
+		When KHO is active, the kernel exposes the generated DT that
+		carries its current KHO state in this file. Kexec user space
+		tooling can use this as the input file for the KHO payload image.
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 51575cd31741..efeef075617e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2504,6 +2504,16 @@
 	kgdbwait	[KGDB] Stop kernel execution and enter the
 			kernel debugger at the earliest opportunity.
 
+	kho_scratch=n[KMG]	[KEXEC] Sets the size of the KHO scratch
+			region. The KHO scratch region is a physically
+			contiguous memory range that can only be used for
+			non-kernel allocations. That way, even when memory
+			is heavily fragmented with handed over memory, kexec
+			will always be able to find contiguous memory to
+			place the next kernel for kexec into.
+
+			The default is 0.
+
 	kmac=		[MIPS] Korina ethernet MAC address.
 			Configure the RouterBoard 532 series on-chip
 			Ethernet adapter MAC address.
diff --git a/MAINTAINERS b/MAINTAINERS
index 788be9ab5b73..4ebf7c5fd424 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11769,6 +11769,7 @@ M:	Eric Biederman
 L:	kexec@lists.infradead.org
 S:	Maintained
 W:	http://kernel.org/pub/linux/utils/kernel/kexec/
+F:	Documentation/ABI/testing/sysfs-kernel-kho
 F:	include/linux/kexec.h
 F:	include/uapi/linux/kexec.h
 F:	kernel/kexec*
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index 8227455192b7..db2597e5550d 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -21,6 +21,8 @@
 
 #include
 #include
+#include
+#include
 
 extern note_buf_t __percpu *crash_notes;
 
@@ -516,6 +518,28 @@ void set_kexec_sig_enforced(void);
 static inline void set_kexec_sig_enforced(void) {}
 #endif
 
+#ifdef CONFIG_KEXEC_KHO
+/* Notifier index */
+enum kho_event {
+	KEXEC_KHO_DUMP = 0,
+	KEXEC_KHO_ABORT = 1,
+};
+
+extern phys_addr_t kho_scratch_phys;
+extern phys_addr_t kho_scratch_len;
+
+/* egest handover metadata */
+void kho_reserve(void);
+int register_kho_notifier(struct notifier_block *nb);
+int unregister_kho_notifier(struct notifier_block *nb);
+bool kho_is_active(void);
+#else
+static inline void kho_reserve(void) { }
+static inline int register_kho_notifier(struct notifier_block *nb) { return -EINVAL; }
+static inline int unregister_kho_notifier(struct notifier_block *nb) { return -EINVAL; }
+static inline bool kho_is_active(void) { return false; }
+#endif
+
 #endif /* !defined(__ASSEBMLY__) */
 
 #endif /* LINUX_KEXEC_H */
diff --git a/include/uapi/linux/kexec.h b/include/uapi/linux/kexec.h
index 01766dd839b0..d02ffd5960d6 100644
--- a/include/uapi/linux/kexec.h
+++ b/include/uapi/linux/kexec.h
@@ -49,6 +49,12 @@
 /* The artificial cap on the number of segments passed to kexec_load. */
 #define KEXEC_SEGMENT_MAX 16
 
+/* KHO passes an array of kho_mem as "mem cache" to the new kernel */
+struct kho_mem {
+	__u64 addr;
+	__u64 len;
+};
+
 #ifndef __KERNEL__
 /*
  * This structure is used to hold the arguments that are used when
diff --git a/kernel/Makefile b/kernel/Makefile
index 3947122d618b..a6bd31e22c09 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -73,6 +73,7 @@ obj-$(CONFIG_KEXEC_CORE) += kexec_core.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
+obj-$(CONFIG_KEXEC_KHO) += kexec_kho_out.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup/
diff --git a/kernel/kexec_kho_out.c b/kernel/kexec_kho_out.c
new file mode 100644
index 000000000000..e6184bde5c10
--- /dev/null
+++ b/kernel/kexec_kho_out.c
@@ -0,0 +1,316 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * kexec_kho_out.c - kexec handover code to egest metadata.
+ * Copyright (C) 2023 Alexander Graf
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include
+#include
+#include
+#include
+#include
+
+struct kho_out {
+	struct kobject *kobj;
+	bool active;
+	struct cma *cma;
+	struct blocking_notifier_head chain_head;
+	void *dt;
+	u64 dt_len;
+	u64 dt_max;
+	struct mutex lock;
+};
+
+static struct kho_out kho = {
+	.dt_max = (1024 * 1024 * 10),
+	.chain_head = BLOCKING_NOTIFIER_INIT(kho.chain_head),
+	.lock = __MUTEX_INITIALIZER(kho.lock),
+};
+
+/*
+ * Size for scratch (non-KHO) memory. With KHO enabled, memory can become
+ * fragmented because KHO regions may be anywhere in physical address
+ * space. The scratch region gives us a safe zone that we will never see
+ * KHO allocations from. This is where we can later safely load our new kexec
+ * images into.
+ */
+static phys_addr_t kho_scratch_size __initdata;
+
+int register_kho_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_register(&kho.chain_head, nb);
+}
+EXPORT_SYMBOL_GPL(register_kho_notifier);
+
+int unregister_kho_notifier(struct notifier_block *nb)
+{
+	return blocking_notifier_chain_unregister(&kho.chain_head, nb);
+}
+EXPORT_SYMBOL_GPL(unregister_kho_notifier);
+
+bool kho_is_active(void)
+{
+	return kho.active;
+}
+EXPORT_SYMBOL_GPL(kho_is_active);
+
+static ssize_t raw_read(struct file *file, struct kobject *kobj,
+			struct bin_attribute *attr, char *buf,
+			loff_t pos, size_t count)
+{
+	mutex_lock(&kho.lock);
+	memcpy(buf, attr->private + pos, count);
+	mutex_unlock(&kho.lock);
+
+	return count;
+}
+
+static BIN_ATTR(dt, 0400, raw_read, NULL, 0);
+
+static int kho_expose_dt(void *fdt)
+{
+	long fdt_len = fdt_totalsize(fdt);
+	int err;
+
+	kho.dt = fdt;
+	kho.dt_len = fdt_len;
+
+	bin_attr_dt.size = fdt_totalsize(fdt);
+	bin_attr_dt.private = fdt;
+	err = sysfs_create_bin_file(kho.kobj, &bin_attr_dt);
+
+	return err;
+}
+
+static void kho_abort(void)
+{
+	if (!kho.active)
+		return;
+
+	sysfs_remove_bin_file(kho.kobj, &bin_attr_dt);
+
+	kvfree(kho.dt);
+	kho.dt = NULL;
+	kho.dt_len = 0;
+
+	blocking_notifier_call_chain(&kho.chain_head, KEXEC_KHO_ABORT, NULL);
+
+	kho.active = false;
+}
+
+static int kho_serialize(void)
+{
+	void *fdt = NULL;
+	int err;
+
+	kho.active = true;
+	err = -ENOMEM;
+
+	fdt = kvmalloc(kho.dt_max, GFP_KERNEL);
+	if (!fdt)
+		goto out;
+
+	if (fdt_create(fdt, kho.dt_max)) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	err = fdt_finish_reservemap(fdt);
+	if (err)
+		goto out;
+
+	err = fdt_begin_node(fdt, "");
+	if (err)
+		goto out;
+
+	err = fdt_property_string(fdt, "compatible", "kho-v1");
+	if (err)
+		goto out;
+
+	/* Loop through all kho dump functions */
+	err = blocking_notifier_call_chain(&kho.chain_head, KEXEC_KHO_DUMP, fdt);
+	err = notifier_to_errno(err);
+	if (err)
+		goto out;
+
+	/* Close / */
+	err = fdt_end_node(fdt);
+	if (err)
+		goto out;
+
+	err = fdt_finish(fdt);
+	if (err)
+		goto out;
+
+	if (WARN_ON(fdt_check_header(fdt))) {
+		err = -EINVAL;
+		goto out;
+	}
+
+	err = kho_expose_dt(fdt);
+
+out:
+	if (err) {
+		pr_err("kho failed to serialize state: %d\n", err);
+		kho_abort();
+	}
+	return err;
+}
+
+/* Handling for /sys/kernel/kho */
+
+#define KHO_ATTR_RO(_name) static struct kobj_attribute _name##_attr = __ATTR_RO_MODE(_name, 0400)
+#define KHO_ATTR_RW(_name) static struct kobj_attribute _name##_attr = __ATTR_RW_MODE(_name, 0600)
+
+static ssize_t active_store(struct kobject *dev, struct kobj_attribute *attr,
+			    const char *buf, size_t size)
+{
+	ssize_t retsize = size;
+	bool val = false;
+	int ret;
+
+	if (kstrtobool(buf, &val) < 0)
+		return -EINVAL;
+
+	if (!kho_scratch_len)
+		return -ENOMEM;
+
+	mutex_lock(&kho.lock);
+	if (val != kho.active) {
+		if (val) {
+			ret = kho_serialize();
+			if (ret) {
+				retsize = -EINVAL;
+				goto out;
+			}
+		} else {
+			kho_abort();
+		}
+	}
+
+out:
+	mutex_unlock(&kho.lock);
+	return retsize;
+}
+
+static ssize_t active_show(struct kobject *dev, struct kobj_attribute *attr,
+			   char *buf)
+{
+	ssize_t ret;
+
+	mutex_lock(&kho.lock);
+	ret = sysfs_emit(buf, "%d\n", kho.active);
+	mutex_unlock(&kho.lock);
+
+	return ret;
+}
+KHO_ATTR_RW(active);
+
+static ssize_t dt_max_store(struct kobject *dev, struct kobj_attribute *attr,
+			    const char *buf, size_t size)
+{
+	u64 val;
+
+	if (kstrtoull(buf, 0, &val))
+		return -EINVAL;
+
+	kho.dt_max = val;
+
+	return size;
+}
+
+static ssize_t dt_max_show(struct kobject *dev, struct kobj_attribute *attr,
+			   char *buf)
+{
+	return sysfs_emit(buf, "0x%llx\n", kho.dt_max);
+}
+KHO_ATTR_RW(dt_max);
+
+static ssize_t scratch_len_show(struct kobject *dev, struct kobj_attribute *attr,
+				char *buf)
+{
+	return sysfs_emit(buf, "0x%llx\n", kho_scratch_len);
+}
+KHO_ATTR_RO(scratch_len);
+
+static ssize_t scratch_phys_show(struct kobject *dev, struct kobj_attribute *attr,
+				 char *buf)
+{
+	return sysfs_emit(buf, "0x%llx\n", kho_scratch_phys);
+}
+KHO_ATTR_RO(scratch_phys);
+
+static __init int kho_out_init(void)
+{
+	int ret = 0;
+
+	kho.kobj = kobject_create_and_add("kho", kernel_kobj);
+	if (!kho.kobj) {
+		ret = -ENOMEM;
+		goto err;
+	}
+
+	ret = sysfs_create_file(kho.kobj, &active_attr.attr);
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_file(kho.kobj, &dt_max_attr.attr);
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_file(kho.kobj, &scratch_phys_attr.attr);
+	if (ret)
+		goto err;
+
+	ret = sysfs_create_file(kho.kobj, &scratch_len_attr.attr);
+	if (ret)
+		goto err;
+
+err:
+	return ret;
+}
+late_initcall(kho_out_init);
+
+static int __init early_kho_scratch(char *p)
+{
+	kho_scratch_size = memparse(p, &p);
+	return 0;
+}
+early_param("kho_scratch", early_kho_scratch);
+
+/**
+ * kho_reserve - Reserve a contiguous chunk of memory for kexec
+ *
+ * With KHO we can preserve arbitrary pages in the system. To ensure we still
+ * have a large contiguous region of memory when we search the physical address
+ * space for target memory, let's make sure we always have a large CMA region
+ * active. This CMA region will only be used for movable pages which are not a
+ * problem for us during KHO because we can just move them somewhere else.
+ */
+__init void kho_reserve(void)
+{
+	int r;
+
+	if (kho_get_fdt()) {
+		/*
+		 * We came from a previous KHO handover, so we already have
+		 * a known good scratch region that we preserve. No need to
+		 * allocate another.
+		 */
+		return;
+	}
+
+	/* Only allocate KHO scratch memory when we're asked to */
+	if (!kho_scratch_size)
+		return;
+
+	r = cma_declare_contiguous_nid(0, kho_scratch_size, 0, PAGE_SIZE, 0,
+				       false, "kho", &kho.cma, NUMA_NO_NODE);
+	if (WARN_ON(r))
+		return;
+
+	kho_scratch_phys = cma_get_base(kho.cma);
+	kho_scratch_len = cma_get_size(kho.cma);
+}
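
Illustration only, not part of the patch: the sysfs ABI text says kexec user space tooling can use scratch_phys and scratch_len to decide where to place its payload images, and the mem cache is an array of struct kho_mem ranges. A minimal user-space sketch of those two checks might look like the following. The helper names kho_parse_hex(), kho_in_scratch() and kho_mem_overlaps() are hypothetical and do not exist in the series, struct kho_mem is re-declared locally with uint64_t in place of __u64, and the code assumes that no range wraps around the 64-bit address space.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Local mirror of the struct kho_mem this patch adds to the uapi header. */
struct kho_mem {
	uint64_t addr;
	uint64_t len;
};

/*
 * Hypothetical helper: parse a value in the "0x%llx\n" format that the
 * scratch_phys and scratch_len attributes emit. Returns 0 on success.
 */
int kho_parse_hex(const char *s, uint64_t *out)
{
	char *end;
	unsigned long long v;

	errno = 0;
	v = strtoull(s, &end, 16);	/* base 16 accepts the "0x" prefix */
	if (errno || end == s || (*end != '\n' && *end != '\0'))
		return -1;

	*out = v;
	return 0;
}

/*
 * Hypothetical helper: true if [addr, addr + len) lies fully inside the
 * scratch region, i.e. a payload placed there cannot hit handover pages.
 */
bool kho_in_scratch(uint64_t scratch_phys, uint64_t scratch_len,
		    uint64_t addr, uint64_t len)
{
	if (len == 0 || len > scratch_len || addr < scratch_phys)
		return false;

	/* Written as a subtraction so that addr + len cannot overflow. */
	return addr - scratch_phys <= scratch_len - len;
}

/*
 * Hypothetical helper: true if [addr, addr + len) overlaps any entry of a
 * mem cache array. Assumption: no entry wraps around the address space.
 */
bool kho_mem_overlaps(const struct kho_mem *cache, size_t n,
		      uint64_t addr, uint64_t len)
{
	assert(cache != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		/* Half-open ranges overlap if each starts before the other ends. */
		if (addr < cache[i].addr + cache[i].len &&
		    cache[i].addr < addr + len)
			return true;
	}

	return false;
}
```

A loader following this sketch would read both sysfs files, parse them with kho_parse_hex(), and reject any segment for which kho_in_scratch() returns false. The overlap check is only a sanity test here, since the scratch region is meant to never contain handover memory in the first place.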