From patchwork Mon Mar 10 12:03:12 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrey Ryabinin
X-Patchwork-Id: 14009704
From: Andrey Ryabinin
To: linux-kernel@vger.kernel.org
Cc: Alexander Graf, James Gowans, Mike Rapoport, Andrew Morton,
 linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, x86@kernel.org, "H . Peter Anvin", Eric Biederman,
 kexec@lists.infradead.org, Pratyush Yadav, Jason Gunthorpe,
 Pasha Tatashin, David Rientjes, Andrey Ryabinin
Subject: [PATCH v2 1/7] kstate: Add kstate - a mechanism to describe and
 migrate kernel state across kexec
Date: Mon, 10 Mar 2025 13:03:12 +0100
Message-ID: <20250310120318.2124-2-arbn@yandex-team.com>
X-Mailer: git-send-email 2.45.3
In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com>
References: <20250310120318.2124-1-arbn@yandex-team.com>

KSTATE (kernel state) is a mechanism to describe internal kernel state,
save it into memory and restore the state after kexec in the new kernel.

The end goal here and the main use case for this is to be able to
update the host kernel under VMs with VFIO pass-through devices running
on that host. The idea behind KSTATE resembles QEMU's migration
framework [1], which solves a quite similar problem - migrating the state
of a VM/emulated devices across different versions of QEMU.

This and the following patches try to establish some basic infrastructure
to describe and migrate in-kernel data structures.

The state of kernel data (usually some struct) is described by a
'struct kstate_description' containing an array of individual field
descriptions - 'struct kstate_field'. Each field has a set of bits in
->flags which instruct how to save/restore a certain field of the struct.
E.g. (see kstate.h for the full list):
  KS_BASE_TYPE flag tells that the field can be just copied by value,
  KS_POINTER means that the struct member is a pointer to the actual
  data, so it needs to be dereferenced before saving/restoring data
  to/from the kstate data stream.

kstate_register() accepts a kstate_description along with an instance
of an object and registers it in the global 'states' list.

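
To make the field-description idea above more concrete, here is a minimal
user-space sketch. It is not part of this patch and does not use the kernel
API: it assumes a reduced field table with only the base-type and pointer
cases, a fixed-size buffer instead of a growable stream, and every demo_*
name is hypothetical. It only mirrors the shape of kstate_field and of the
save/restore walk over the field list.

```c
/* User-space illustration only; all demo_* names are hypothetical. */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { DEMO_BASE_TYPE = 1 << 0, DEMO_POINTER = 1 << 1, DEMO_END = 1 << 30 };

struct demo_field {
	size_t offset;		/* offset of the member inside the object */
	size_t size;		/* number of payload bytes to copy */
	unsigned int flags;	/* how to reach the payload */
};

#define DEMO_BASE(_f, _state) \
	{ offsetof(_state, _f), sizeof(((_state *)0)->_f), DEMO_BASE_TYPE }
#define DEMO_PTR(_f, _state) \
	{ offsetof(_state, _f), sizeof(*((_state *)0)->_f), DEMO_POINTER }
#define DEMO_END_OF_LIST() { 0, 0, DEMO_END }

/* Hypothetical object whose state should survive the "reboot". */
struct demo_dev {
	uint32_t irq;
	uint64_t *counter;	/* payload lives behind the pointer */
};

static const struct demo_field demo_dev_fields[] = {
	DEMO_BASE(irq, struct demo_dev),
	DEMO_PTR(counter, struct demo_dev),
	DEMO_END_OF_LIST(),
};

/* Find where a field's payload lives, dereferencing pointer members. */
static void *demo_field_data(void *obj, const struct demo_field *f)
{
	char *member = (char *)obj + f->offset;
	void *target;

	if (!(f->flags & DEMO_POINTER))
		return member;
	memcpy(&target, member, sizeof(target));	/* read the pointer value */
	return target;
}

/* Copy every described field into buf; returns the bytes written. */
static size_t demo_save(unsigned char *buf, void *obj,
			const struct demo_field *f)
{
	size_t pos = 0;

	for (; f->flags != DEMO_END; f++) {
		memcpy(buf + pos, demo_field_data(obj, f), f->size);
		pos += f->size;
	}
	return pos;
}

/* Walk the same field list and copy the payload back into obj. */
static size_t demo_restore(const unsigned char *buf, void *obj,
			   const struct demo_field *f)
{
	size_t pos = 0;

	for (; f->flags != DEMO_END; f++) {
		memcpy(demo_field_data(obj, f), buf + pos, f->size);
		pos += f->size;
	}
	return pos;
}

/* Save from one object, restore into another; returns 0 on success. */
int demo_roundtrip(void)
{
	uint64_t old_cnt = 42, new_cnt = 0;
	struct demo_dev old_dev = { .irq = 7, .counter = &old_cnt };
	struct demo_dev new_dev = { .irq = 0, .counter = &new_cnt };
	unsigned char stream[64];
	size_t saved, restored;

	saved = demo_save(stream, &old_dev, demo_dev_fields);
	restored = demo_restore(stream, &new_dev, demo_dev_fields);

	if (saved != restored || new_dev.irq != 7 || new_cnt != 42)
		return -1;
	return 0;
}
```

Calling demo_roundtrip() from a test harness returns 0 when the restored
object matches the saved one. Note that the pointer member itself is never
written to the buffer, only the data it points to, which is the behaviour
the KS_POINTER description above implies.
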
During the kexec reboot phase we go through the list of
'kstate_description's, and each instance of kstate_description forms a
'struct kstate_entry' which is saved into the kstate data stream.

The 'kstate_entry' contains information like the ID of the
kstate_description, its version, the size of the migration data and the
data itself. The ->data is formed in accordance with the kstate_fields of
the corresponding kstate_description.

After the reboot, when kstate_register() is called, it parses the
migration stream, finds the appropriate 'kstate_entry' and restores the
contents of the object in accordance with the kstate_description and its
->fields.

[1] https://www.qemu.org/docs/master/devel/migration/main.html#vmstate

Signed-off-by: Andrey Ryabinin
---
 include/linux/kstate.h | 178 ++++++++++++++++++++++++++
 kernel/Kconfig.kexec   |  13 ++
 kernel/Makefile        |   1 +
 kernel/kstate.c        | 282 +++++++++++++++++++++++++++++++++++++++++
 4 files changed, 474 insertions(+)
 create mode 100644 include/linux/kstate.h
 create mode 100644 kernel/kstate.c

diff --git a/include/linux/kstate.h b/include/linux/kstate.h
new file mode 100644
index 000000000000..4fc01e535bc0
--- /dev/null
+++ b/include/linux/kstate.h
@@ -0,0 +1,178 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _KSTATE_H
+#define _KSTATE_H
+
+#include
+#include
+#include
+#include
+
+struct kstate_description;
+struct kstate_stream;
+struct kimage;
+
+enum kstate_flags {
+
+	/*
+	 * The struct member at 'obj + kstate_field.offset' is some basic
+	 * type, just copy it by value. The size is kstate_field->size.
+	 */
+
+	KS_BASE_TYPE = (1 << 0),
+
+	/*
+	 * The struct member at 'obj + kstate_field.offset' is a pointer
+	 * to the actual data (e.g. struct a { int *b; }).
+	 * save_kstate() will dereference the pointer to get the actual data
+	 * and store it to the stream. restore_kstate() will copy the data from
+	 * the stream to wherever the pointer points to.
+	 */
+	KS_POINTER = (1 << 1),
+
+	/*
+	 * The struct member at 'obj + kstate_field.offset' is another struct.
+	 * kstate_field->ksd points to 'kstate_description' of that struct.
+	 */
+	KS_STRUCT = (1 << 2),
+
+	/*
+	 * Some non-trivial field that requires custom kstate_field->save()
+	 * ->restore() callbacks to save/restore data.
+	 */
+	KS_CUSTOM = (1 << 3),
+
+	/*
+	 * The field is an array of kstate_field->count() pointers
+	 * (e.g. struct a { uint8_t *b[]; }). Dereference each array entry
+	 * before store/restore data.
+	 */
+	KS_ARRAY_OF_POINTER = (1 << 4),
+
+	/*
+	 * The field is a pointer to vmemmap or linear memory (determined by
+	 * kstate_field->addr_type). This is used for pointers to persistent
+	 * pages/data. Store offset from the start of the area instead of
+	 * pointer itself, so we could defeat KASLR on restore phase (by adding
+	 * new kernel's corresponding offset).
+	 */
+	KS_ADDRESS = (1 << 5),
+
+	/* Marks the end of fields list */
+	KS_END = (1UL << 31),
+};
+
+enum kstate_addr_type {
+	KS_VMEMMAP_ADDR,
+	KS_LINEAR_ADDR,
+};
+
+struct kstate_stream {
+	void *start;
+	void *pos;
+	size_t size;
+};
+
+struct kstate_field {
+	const char *name;
+	size_t offset;
+	size_t size;
+	enum kstate_flags flags;
+	const struct kstate_description *ksd;
+	enum kstate_addr_type addr_type;
+	int version_id;
+	int (*restore)(struct kstate_stream *stream, void *obj,
+		       const struct kstate_field *field);
+	int (*save)(struct kstate_stream *stream, void *obj,
+		    const struct kstate_field *field);
+	int (*count)(void);
+};
+
+enum kstate_ids {
+	KSTATE_LAST_ID = -1,
+};
+
+struct kstate_description {
+	const char *name;
+	enum kstate_ids id;
+	atomic_t instance_id;
+	int version_id;
+	struct list_head state_list;
+
+	const struct kstate_field *fields;
+};
+
+struct state_entry {
+	u64 id;
+	struct list_head list;
+	struct kstate_description *kstd;
+	void *obj;
+};
+
+extern int kstate_save_data(struct kstate_stream *stream, void *val, size_t size);
+
+static inline bool kstate_get_byte(struct kstate_stream *stream)
+{
+	bool ret = *(u8 *)stream->pos;
+	stream->pos++;
+	return ret;
+}
+
+static inline unsigned long kstate_get_ulong(struct kstate_stream *stream)
+{
+	unsigned long ret = *(unsigned long *)stream->pos;
+	stream->pos += sizeof(unsigned long);
+	return ret;
+}
+
+#ifdef CONFIG_KSTATE
+
+int kstate_save_state(void);
+void free_kstate_stream(void);
+
+int kstate_register(struct kstate_description *state, void *obj);
+
+struct kstate_entry;
+int save_kstate(struct kstate_stream *stream, int id,
+		const struct kstate_description *kstate,
+		void *obj);
+void restore_kstate(struct kstate_stream *stream, int id,
+		const struct kstate_description *kstate, void *obj);
+
+#else
+
+#define kstate_register(state, obj)
+
+static inline int kstate_save_state(void) { return 0; }
+static inline void free_kstate_stream(void) { }
+
+#endif
+
+
+#define KSTATE_BASE_TYPE(_f, _state, _type) {		\
+	.name = (__stringify(_f)),			\
+	.size = sizeof(_type) + BUILD_BUG_ON_ZERO(	\
+		!__same_type(typeof_member(_state, _f), _type)),\
+	.flags = KS_BASE_TYPE,				\
+	.offset = offsetof(_state, _f),			\
+}
+
+#define KSTATE_POINTER(_f, _state) {			\
+		.name = (__stringify(_f)),		\
+		.size = sizeof(*(((_state *)0)->_f)),	\
+		.flags = KS_POINTER,			\
+		.offset = offsetof(_state, _f),		\
+	}
+
+#define KSTATE_ADDRESS(_f, _state, _addr_type) {	\
+		.name = (__stringify(_f)),		\
+		.size = sizeof(*(((_state *)0)->_f)),	\
+		.addr_type = (_addr_type),		\
+		.flags = KS_ADDRESS,			\
+		.offset = offsetof(_state, _f),		\
+	}
+
+#define KSTATE_END_OF_LIST() {	\
+		.flags = KS_END,\
+	}
+
+#endif
diff --git a/kernel/Kconfig.kexec b/kernel/Kconfig.kexec
index 4d111f871951..480dc156b08b 100644
--- a/kernel/Kconfig.kexec
+++ b/kernel/Kconfig.kexec
@@ -151,4 +151,17 @@ config CRASH_MAX_MEMORY_RANGES
 	  the computation behind the value provided through the
 	  /sys/kernel/crash_elfcorehdr_size attribute.

+config ARCH_HAS_KSTATE
+	bool
+
+config KSTATE
+	bool "Migrate internal kernel state across kexec"
+	default n
+	depends on ARCH_HAS_KSTATE
+	depends on KEXEC_FILE
+	help
+	  KSTATE (kernel state) is a mechanism to describe internal kernel
+	  state, save it into the memory and restore the state after kexec
+	  in new kernel.
+
 endmenu
diff --git a/kernel/Makefile b/kernel/Makefile
index 87866b037fbe..6bdf947fc84f 100644
--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -75,6 +75,7 @@ obj-$(CONFIG_CRASH_DUMP) += crash_core.o
 obj-$(CONFIG_KEXEC) += kexec.o
 obj-$(CONFIG_KEXEC_FILE) += kexec_file.o
 obj-$(CONFIG_KEXEC_ELF) += kexec_elf.o
+obj-$(CONFIG_KSTATE) += kstate.o
 obj-$(CONFIG_BACKTRACE_SELF_TEST) += backtracetest.o
 obj-$(CONFIG_COMPAT) += compat.o
 obj-$(CONFIG_CGROUPS) += cgroup/
diff --git a/kernel/kstate.c b/kernel/kstate.c
new file mode 100644
index 000000000000..a73a9a42e55b
--- /dev/null
+++ b/kernel/kstate.c
@@ -0,0 +1,282 @@
+// SPDX-License-Identifier: GPL-2.0-only
+#include
+#include
+#include
+#include
+#include
+#include
+
+static LIST_HEAD(states);
+
+struct kstate_entry {
+	int state_id;
+	int version_id;
+	int instance_id;
+	int size;
+	DECLARE_FLEX_ARRAY(u8, data);
+};
+
+struct kstate_stream kstate_stream;
+
+static unsigned long get_addr_offset(const struct kstate_field *field)
+{
+	switch (field->addr_type) {
+	case KS_VMEMMAP_ADDR:
+		return VMEMMAP_START;
+	case KS_LINEAR_ADDR:
+		return PAGE_OFFSET;
+	default:
+		WARN_ON(1);
+	}
+	return 0;
+}
+
+static int alloc_space(struct kstate_stream *stream, size_t size)
+{
+	void *new_start;
+	size_t new_size;
+	size_t cur_size = stream->pos - stream->start;
+
+	size = size + 4; /* Always alloc extra for KSTATE_LAST_ID */
+	if (cur_size + size < stream->size)
+		return 0;
+
+	new_size = PAGE_ALIGN(cur_size + size);
+
+	new_start = vrealloc(stream->start, new_size, GFP_KERNEL);
+	if (!new_start)
+		return -ENOMEM;
+
+	stream->start = new_start;
+	stream->size = new_size;
+	stream->pos = stream->start + cur_size;
+	return 0;
+}
+
+int kstate_save_data(struct kstate_stream *stream, void *val, size_t size)
+{
+	int ret;
+
+	ret = alloc_space(stream, size);
+	if (ret)
+		return ret;
+	memcpy(stream->pos, val, size);
+	stream->pos += size;
+	return 0;
+}
+
+int save_kstate(struct kstate_stream *stream, int id,
+		const struct kstate_description *kstate,
+		void *obj)
+{
+	const struct kstate_field *field = kstate->fields;
+	struct kstate_entry *ke;
+	unsigned long ke_off;
+	int ret = 0;
+
+	ret = alloc_space(stream, sizeof(*ke));
+	if (ret)
+		goto err;
+
+	ke_off = stream->pos - stream->start;
+	ke = stream->pos;
+	stream->pos += sizeof(*ke);
+
+	ke->state_id = kstate->id;
+	ke->version_id = kstate->version_id;
+	ke->instance_id = id;
+
+	while (field->flags != KS_END) {
+		void *first, *cur;
+		int n_elems = 1;
+		int size, i;
+
+		first = obj + field->offset;
+
+		if (field->flags & KS_POINTER)
+			first = *(void **)(obj + field->offset);
+		if (field->count)
+			n_elems = field->count();
+		size = field->size;
+		for (i = 0; i < n_elems; i++) {
+			cur = first + i * size;
+
+			if (field->flags & KS_ARRAY_OF_POINTER)
+				cur = *(void **)cur;
+
+			if (field->flags & KS_STRUCT) {
+				ret = save_kstate(stream, 0, field->ksd, cur);
+				if (ret)
+					goto err;
+			} else if (field->flags & KS_CUSTOM) {
+				if (field->save) {
+					ret = field->save(stream, cur, field);
+					if (ret)
+						goto err;
+				}
+			} else if (field->flags & (KS_BASE_TYPE|KS_POINTER)) {
+				ret = kstate_save_data(stream, cur, size);
+				if (ret)
+					goto err;
+			} else if (field->flags & KS_ADDRESS) {
+				void *addr_offset = *(void **)cur
+					- get_addr_offset(field);
+				ret = kstate_save_data(stream, &addr_offset,
+						sizeof(addr_offset));
+				if (ret)
+					goto err;
+			} else
+				WARN_ON_ONCE(1);
+		}
+		field++;
+
+	}
+
+	ke = stream->start + ke_off;
+	ke->size = (stream->pos - stream->start) - (ke_off + sizeof(*ke));
+err:
+	if (ret)
+		pr_err("kstate: save of state %s failed\n", kstate->name);
+
+	return ret;
+}
+
+static int alloc_kstate_stream(void)
+{
+	size_t size = PAGE_SIZE;
+	void *buf;
+
+	buf = vzalloc(size);
+	if (!buf)
+		return -ENOMEM;
+
+	kstate_stream.size = size;
+	kstate_stream.start = kstate_stream.pos = buf;
+	return 0;
+}
+
+void free_kstate_stream(void)
+{
+	vfree(kstate_stream.start);
+	kstate_stream.start = NULL;
+	kstate_stream.size = 0;
+}
+
+int kstate_save_state(void)
+{
+	struct state_entry *se;
+	struct kstate_entry *ke;
+	int ret;
+
+	ret = alloc_kstate_stream();
+	if (ret)
+		return ret;
+
+	list_for_each_entry(se, &states, list) {
+		ret = save_kstate(&kstate_stream, se->id, se->kstd, se->obj);
+		if (ret)
+			return ret;
+	}
+	ke = kstate_stream.pos;
+	ke->state_id = KSTATE_LAST_ID;
+	return 0;
+}
+
+void restore_kstate(struct kstate_stream *stream, int id,
+		const struct kstate_description *kstate, void *obj)
+{
+	const struct kstate_field *field = kstate->fields;
+	struct kstate_entry *ke = stream->pos;
+	stream->pos = ke->data;
+
+	WARN_ONCE(ke->version_id != kstate->version_id, "version mismatch %d %d\n",
+		ke->version_id, kstate->version_id);
+
+	WARN_ONCE(ke->instance_id != id, "instance id mismatch %d %d\n",
+		ke->instance_id, id);
+
+	while (field->flags != KS_END) {
+		void *first, *cur;
+		int n_elems = 1;
+		int size, i;
+
+		first = obj + field->offset;
+		if (field->flags & KS_POINTER)
+			first = *(void **)(obj + field->offset);
+		if (field->count)
+			n_elems = field->count();
+		size = field->size;
+		for (i = 0; i < n_elems; i++) {
+			cur = first + i * size;
+
+			if (field->flags & KS_ARRAY_OF_POINTER)
+				cur = *(void **)cur;
+
+			if (field->flags & KS_STRUCT)
+				restore_kstate(stream, 0, field->ksd, cur);
+			else if (field->flags & KS_CUSTOM) {
+				if (field->restore)
+					field->restore(stream, cur, field);
+			} else if (field->flags & (KS_BASE_TYPE | KS_POINTER)) {
+				memcpy(cur, stream->pos, size);
+				stream->pos += size;
+			} else if (field->flags & KS_ADDRESS) {
+				*(void **)cur = (*(void **)stream->pos)
+					+ get_addr_offset(field);
+				stream->pos += sizeof(void *);
+			} else
+				WARN_ON_ONCE(1);
+
+		}
+		field++;
+	}
+}
+
+static void restore_migrate_state(unsigned long kstate_data,
+				struct state_entry *se)
+{
+	struct kstate_stream stream;
+	struct kstate_entry *ke;
+
+	if (kstate_data == -1)
+		return;
+
+	ke = (struct kstate_entry *)phys_to_virt(kstate_data);
+	if (WARN_ON_ONCE(ke->state_id == 0))
+		return;
+
+	stream.start = stream.pos = ke;
+	while (ke->state_id != KSTATE_LAST_ID) {
+		if (ke->state_id != se->kstd->id ||
+		    ke->instance_id != se->id) {
+			ke = (struct kstate_entry *)(ke->data + ke->size);
+			continue;
+		}
+		stream.pos = ke;
+		restore_kstate(&stream, se->id, se->kstd, se->obj);
+		ke = (struct kstate_entry *)(ke->data + ke->size);
+	}
+}
+
+static void __kstate_register(struct kstate_description *state, void *obj,
+			struct state_entry *se)
+{
+	se->kstd = state;
+	se->id = atomic_inc_return(&state->instance_id);
+	se->obj = obj;
+	list_add(&se->list, &states);
+	restore_migrate_state(0 /*migrate_stream_addr*/, se);
+}
+
+int kstate_register(struct kstate_description *state, void *obj)
+{
+	struct state_entry *se;
+
+	se = kmalloc(sizeof(*se), GFP_KERNEL);
+	if (!se)
+		return -ENOMEM;
+
+	__kstate_register(state, obj, se);
+	return 0;
+}
+

From patchwork Mon Mar 10 12:03:13 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrey Ryabinin
X-Patchwork-Id: 14009705
From: Andrey Ryabinin
To: linux-kernel@vger.kernel.org
Cc: Alexander Graf, James Gowans, Mike Rapoport, Andrew Morton,
 linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, x86@kernel.org, "H . Peter Anvin", Eric Biederman,
 kexec@lists.infradead.org, Pratyush Yadav, Jason Gunthorpe,
 Pasha Tatashin, David Rientjes, Andrey Ryabinin
Subject: [PATCH v2 2/7] kstate, kexec, x86: transfer kstate data across kexec
Date: Mon, 10 Mar 2025 13:03:13 +0100
Message-ID: <20250310120318.2124-3-arbn@yandex-team.com>
X-Mailer: git-send-email 2.45.3
In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com>
References: <20250310120318.2124-1-arbn@yandex-team.com>

Add the kstate data to the kexec segments so that it gets copied to the
new kernel. Use the kernel command line to inform the next kernel about
the location and size of the kstate data.

Signed-off-by: Andrey Ryabinin
---
 I've used cmdline as it's the simplest way to transfer the address to the
 new kernel. Perhaps passing it via dtb would be a more elegant solution,
 but I don't have a strong opinion here.
---
 arch/x86/Kconfig                  |  1 +
 arch/x86/kernel/kexec-bzimage64.c |  4 +++
 arch/x86/kernel/setup.c           |  2 ++
 include/linux/kexec.h             |  2 ++
 include/linux/kstate.h            |  5 ++++
 kernel/kexec_file.c               |  5 ++++
 kernel/kstate.c                   | 49 ++++++++++++++++++++++++++++++-
 7 files changed, 67 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0e27ebd7e36a..7358d9e15957 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -90,6 +90,7 @@ config X86
 	select ARCH_HAS_GCOV_PROFILE_ALL
 	select ARCH_HAS_KCOV			if X86_64
 	select ARCH_HAS_KERNEL_FPU_SUPPORT
+	select ARCH_HAS_KSTATE			if X86_64
 	select ARCH_HAS_MEM_ENCRYPT
 	select ARCH_HAS_MEMBARRIER_SYNC_CORE
 	select ARCH_HAS_NMI_SAFE_THIS_CPU_OPS
diff --git a/arch/x86/kernel/kexec-bzimage64.c b/arch/x86/kernel/kexec-bzimage64.c
index 68530fad05f7..d3c98c8bda29 100644
--- a/arch/x86/kernel/kexec-bzimage64.c
+++ b/arch/x86/kernel/kexec-bzimage64.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -77,6 +78,9 @@ static int setup_cmdline(struct kimage *image, struct boot_params *params,
 		len = sprintf(cmdline_ptr,
 			"elfcorehdr=0x%lx ", image->elf_load_addr);
 	}
+	if (IS_ENABLED(CONFIG_KSTATE))
+		len = sprintf(cmdline_ptr, "kstate_stream=0x0%lx@%ld ",
+			image->kstate_stream_addr, image->kstate_size);
 	memcpy(cmdline_ptr + len, cmdline, cmdline_len);
 	cmdline_len += len;

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index cebee310e200..b32c141ffcdd 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -992,6 +993,7 @@ void __init setup_arch(char **cmdline_p)

 	memblock_set_current_limit(ISA_END_ADDRESS);
 	e820__memblock_setup();
+	kstate_init();

 	/*
 	 * Needs to run after memblock setup because it needs the physical
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index f0e9f8eda7a3..bd82f04888a1 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -299,6 +299,8 @@ struct kimage {
 	unsigned long start;
 	struct page *control_code_page;
 	struct page *swap_page;
+	unsigned long kstate_stream_addr;
+	size_t kstate_size;
 	void *vmcoreinfo_data_copy; /* locates in the crash memory */

 	unsigned long nr_segments;
diff --git a/include/linux/kstate.h b/include/linux/kstate.h
index 4fc01e535bc0..ae583d090111 100644
--- a/include/linux/kstate.h
+++ b/include/linux/kstate.h
@@ -126,6 +128,8 @@ static inline unsigned long kstate_get_ulong(struct kstate_stream *stream)

 #ifdef CONFIG_KSTATE

+void kstate_init(void);
+
 int kstate_save_state(void);
 void free_kstate_stream(void);

@@ -137,14 +139,17 @@ int save_kstate(struct kstate_stream *stream, int id,
 		void *obj);
 void restore_kstate(struct kstate_stream *stream, int id,
 		const struct kstate_description *kstate, void *obj);
+int kstate_load_migrate_buf(struct kimage *image);

 #else

+static inline void kstate_init(void) { }
 #define kstate_register(state, obj)

 static inline int kstate_save_state(void) { return 0; }
 static inline void free_kstate_stream(void) { }
+static inline int kstate_load_migrate_buf(struct kimage *image) { return 0; }

 #endif


diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 3eedb8c226ad..a024ff379133 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -14,6 +14,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -253,6 +254,10 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
 	/* IMA needs to pass the measurement list to the next kernel. */
 	ima_add_kexec_buffer(image);

+	ret = kstate_load_migrate_buf(image);
+	if (ret)
+		goto out;
+
 	/* Call image load handler */
 	ldata = kexec_image_load_default(image);

diff --git a/kernel/kstate.c b/kernel/kstate.c
index a73a9a42e55b..d35996287b76 100644
--- a/kernel/kstate.c
+++ b/kernel/kstate.c
@@ -2,6 +2,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -182,6 +183,31 @@ int kstate_save_state(void)
 	return 0;
 }

+int kstate_load_migrate_buf(struct kimage *image)
+{
+	int ret;
+	struct kexec_buf kbuf = { .image = image, .buf_min = 0,
+				  .buf_max = ULONG_MAX, .top_down = true };
+
+	kbuf.bufsz = kstate_stream.size;
+	kbuf.buffer = kstate_stream.start;
+
+	kbuf.memsz = kstate_stream.size;
+
+	kbuf.buf_align = PAGE_SIZE;
+	kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
+	ret = kexec_add_buffer(&kbuf);
+	if (ret)
+		return ret;
+	image->kstate_stream_addr = kbuf.mem;
+	image->kstate_size = kstate_stream.size;
+
+	pr_info("kstate: Loaded mig_stream at 0x%lx bufsz=0x%lx memsz=0x%lx\n",
+		kbuf.mem, kbuf.bufsz, kbuf.memsz);
+
+	return ret;
+}
+
 void restore_kstate(struct kstate_stream *stream, int id,
 		const struct kstate_description *kstate, void *obj)
 {
@@ -258,6 +284,9 @@ static void restore_migrate_state(unsigned long kstate_data,
 	}
 }

+static unsigned long kstate_stream_addr = -1;
+static unsigned long kstate_size;
+
 static void __kstate_register(struct kstate_description *state, void *obj,
 			struct state_entry *se)
 {
@@ -265,7 +294,7 @@ static void __kstate_register(struct kstate_description *state, void *obj,
 	se->id = atomic_inc_return(&state->instance_id);
 	se->obj = obj;
 	list_add(&se->list, &states);
-	restore_migrate_state(0 /*migrate_stream_addr*/, se);
+	restore_migrate_state(kstate_stream_addr, se);
 }

 int kstate_register(struct kstate_description *state, void *obj)
@@ -280,3 +309,21 @@ int kstate_register(struct kstate_description *state, void *obj)
 	return 0;
 }

+static int __init setup_kstate(char *arg)
+{
+	char *end;
+
+	if (!arg)
+		return -EINVAL;
+	kstate_stream_addr = memparse(arg, &end);
+	if (*end == '@')
+		kstate_size = memparse(end + 1, &end);
+
+	return end > arg ? 0 : -EINVAL;
+}
+early_param("kstate_stream", setup_kstate);
+
+void __init kstate_init(void)
+{
+	memblock_reserve(kstate_stream_addr, kstate_size);
+}

From patchwork Mon Mar 10 12:03:14 2025
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Andrey Ryabinin
X-Patchwork-Id: 14009706
Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=yandex-team.com header.s=default header.b="Zt+/vVXD"; dmarc=pass (policy=none)
header.from=yandex-team.com; spf=pass (imf19.hostedemail.com: domain of arbn@yandex-team.com designates 178.154.239.72 as permitted sender) smtp.mailfrom=arbn@yandex-team.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741608250; a=rsa-sha256; cv=none; b=yZ2bHlrMrqNHd97GOyxrhj9SED1I7TOJeqadL7WZEj4k63ZnEtF+fgYwtitrhrjlGASjC0 rQ1zPmYyEC8qr1nn+4G1vuiZNYOvcgylwKIvfGKXBpGk6JvTzzteAeeaIFMdBGX8FUpo2g OBUJ61YPzGNDSAvnGMdmaXNdLFbCaqw= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=yandex-team.com header.s=default header.b="Zt+/vVXD"; dmarc=pass (policy=none) header.from=yandex-team.com; spf=pass (imf19.hostedemail.com: domain of arbn@yandex-team.com designates 178.154.239.72 as permitted sender) smtp.mailfrom=arbn@yandex-team.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741608250; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=dCfHmi7JSqwfRn9hK0doUgOfig37g7tF5ZyIee0j3QQ=; b=XK3jXFUAytgpN3Fv8k/kHGQOC2B+CK9x7vScDo3rGPbZv+3J/yEHySUO3tR/1xzTkmGhpa PvShfva9SxbyunnhGsJaD44CXeUjdtoDWY4M+gJwCZYtNAiwJouQA4AgxJDJvbk3D8zXiE 5JZw3GFCzWrwUrsFLSU2dxaQzTwXHDM= Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id 5958960DCE; Mon, 10 Mar 2025 15:04:08 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-61iTaEl4; Mon, 10 Mar 2025 15:04:07 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608247; bh=dCfHmi7JSqwfRn9hK0doUgOfig37g7tF5ZyIee0j3QQ=; 
h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=Zt+/vVXDKpSCu8KDQZJ8u8uEpxL5R4fTTyn3U+HF42BdMdqDXozWrf/zuiX14bZkH oTU0E2qHEUTnKlbABGK7W3Ny4IiGns9mKIZhLDvhYoLMNrAZ/l3Mgu8cIFmpSlrxFX CW2cOcIjkxXIsVbLsjrUPNEkV+dqhsmZyuQeWDOQ= From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 3/7] kexec: exclude control pages from the destination addresses Date: Mon, 10 Mar 2025 13:03:14 +0100 Message-ID: <20250310120318.2124-4-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: B54071A0009 X-Rspamd-Server: rspam11 X-Stat-Signature: psck33okw99abzea93dw55zouq5ebqew X-Rspam-User: X-HE-Tag: 1741608249-715414 X-HE-Meta: 
U2FsdGVkX18H64H+TGEMODIEFR+Vggd05YtMDR0da6FjhqmW0CH7sbTFr+09/L7J3n20bJOy2TGmnCtVXz5rAVOoyzSW/9qvU44VgsQ6aeC203nhTlH5I7ufMm9q5J2HbgX3Kb9XY9ganfSM2PSi61FiR4cY1JIOFuFe9IYwqk8OTbGErvw6WujIQyazKMt1gelFPmf7G3zdR3ff2KCpcvl/sIUzJG7W0IA6t5FF0+XzyZ6v340Ja0AElijkgGafFE4E/bBFmoLSEPqamLchSf7DJ/L4NIhvQuc9dYVoVIf5qI6IiRjEk3BiejrQWV3NxuHVHltJnbD/sObmIe58YtI3WCzi48l3An3Sd4cvVfku/hn9+gk/IPjt8ReRYFIWV89xjeDaO6MAU1wovMcw0zQAwrapY4VgaHFZ1eQQBRv7m3Y2+v39YCYCAz5ik4nUU97uMB0oA7h+NpdIt+dtaz7bMPRrimdk7NQ3+5F4zn9syJucbQGA7cv/yDxqkauFI4OEKiG8cmr6y+rG9phTjZCwHnII2HYJZxmuHqHV8sysioDatEkDEmEGsU+ftwCuyL6hfFddb5ExcsDVJbZsL2epgPe/Hdaz74Rtq0v9mpBcNgFxEBiI0wua128Y0IuK44DR/NW1WUrF+O2LuYnHu2oDV59EupKlTP0hSoZsYvZYn2epFs8TN0BVrotDfNoHKmgXzu2fjyjqIpVuR+7EO4JYEYgUtgyPIufHoDsjk1kxMgQ7WlgHus/ukTLqwvElPoYTn9Na3yOQ/K3mHl2PSF0rOiD1EzaPJph5fVG2NjFKOILNduXvolHdBD2A6dWfxFIfofkmurIkYuD9TTG/hzqIAZChGqt8xI/3WlrzW8vvcmbZKtdHc71THH9+ztsTU0mkAMxXH4D7ObeYbVcMzulBC3CZKi+4vf6Mr0UXtlEYNJrEuJmGiCHpK56gDrd4dj94HBalk3vwiXr2Zyv mikpKx/2 KL9x93KfTzmXXjYfJLsleVs3YMwfbTmDEYnTBeWdCIS1GobK/LVLkNWtgGCnbSBgOZBaH6EA1i/wJ5btRuoO8d7D4ccUpGTycXK7ULI9LrmH8xqITu/W2ifgKlf6XRwyoUh/FIDuw6N3MFQIkzA2Q82QAy1WEWLNP/seRO3QWg8PdmT7Xd3qRs4OaVi8UdCN62xnkJ3iJs0xp+cm2AWYFQ8WspOxl0qVlkhIwYsdg76c6X1qipDWl/KNsYoDlPnombD6X8JsQlra2C/P2t/dbE+KP1avC3P1X6o27E4a4cApUHjX6m5+6psHRa0oytiHiyGRBFRMwXSmpeoV+u79aZteFgvAgezr0EZ17 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Kexec relies on control pages allocated after all destination ranges have been chosen. To be able to preserve memory across kexec we need to be able to pick destination ranges after the control pages allocated. Add check for control pages to locate_mem_hole() callbacks so it excludes control pages, hence we can allocate them in any order. 
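
For illustration only (not part of the patch): the check boils down to an
inclusive-range overlap test between a candidate hole and each control page
allocation. Below is a minimal userspace sketch of that test. The names
struct ctrl_alloc, SKETCH_PAGE_SIZE, ctrl_alloc_end(), range_hits_alloc()
and range_hits_any() are hypothetical stand-ins, and an array replaces the
kernel's image->control_pages list.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed page size, for this sketch only. */
#define SKETCH_PAGE_SIZE 4096UL

/* Hypothetical stand-in for one control page allocation. */
struct ctrl_alloc {
	unsigned long start;	/* physical address of the first byte */
	unsigned int order;	/* allocation spans (1 << order) pages */
};

/* Address of the last byte covered by the allocation (inclusive). */
unsigned long ctrl_alloc_end(const struct ctrl_alloc *a)
{
	return a->start + (SKETCH_PAGE_SIZE << a->order) - 1;
}

/* True if the inclusive range [start, end] touches the allocation. */
bool range_hits_alloc(unsigned long start, unsigned long end,
		      const struct ctrl_alloc *a)
{
	return end >= a->start && start <= ctrl_alloc_end(a);
}

/* True if [start, end] overlaps any of the n allocations. */
bool range_hits_any(unsigned long start, unsigned long end,
		    const struct ctrl_alloc *allocs, int n)
{
	for (int i = 0; i < n; i++)
		if (range_hits_alloc(start, end, &allocs[i]))
			return true;
	return false;
}

/* Small self-check: an order-1 allocation covers exactly two pages. */
void sketch_self_check(void)
{
	struct ctrl_alloc a = { 0x10000UL, 1 };

	assert(range_hits_alloc(0x11000UL, 0x11fffUL, &a));
	assert(!range_hits_alloc(0x12000UL, 0x12fffUL, &a));
}
```

Both ends are inclusive, so a candidate range that ends exactly on the first
byte of an allocation still counts as a conflict, while a range that starts
on the page right after the allocation does not.
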
Signed-off-by: Andrey Ryabinin
---
 kernel/kexec_core.c     | 18 ++++++++++++++++++
 kernel/kexec_file.c     | 18 ++++--------------
 kernel/kexec_internal.h |  3 +++
 3 files changed, 25 insertions(+), 14 deletions(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index c0bdc1686154..647ab5705c37 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -264,6 +264,24 @@ int kimage_is_destination_range(struct kimage *image,
 	return 0;
 }
+int kimage_is_control_page(struct kimage *image,
+			   unsigned long start,
+			   unsigned long end)
+{
+
+	struct page *page;
+
+	list_for_each_entry(page, &image->control_pages, lru) {
+		unsigned long pstart, pend;
+		pstart = page_to_boot_pfn(page) << PAGE_SHIFT;
+		pend = pstart + PAGE_SIZE * (1 << page_private(page)) - 1;
+		if ((end >= pstart) && (start <= pend))
+			return 1;
+	}
+
+	return 0;
+}
+
 static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 {
 	struct page *pages;
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index a024ff379133..8ecd34071bfa 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -464,7 +464,8 @@ static int locate_mem_hole_top_down(unsigned long start, unsigned long end,
 		 * Make sure this does not conflict with any of existing
 		 * segments
 		 */
-		if (kimage_is_destination_range(image, temp_start, temp_end)) {
+		if (kimage_is_destination_range(image, temp_start, temp_end) ||
+		    kimage_is_control_page(image, temp_start, temp_end)) {
 			temp_start = temp_start - PAGE_SIZE;
 			continue;
 		}
@@ -498,7 +499,8 @@ static int locate_mem_hole_bottom_up(unsigned long start, unsigned long end,
 		 * Make sure this does not conflict with any of existing
 		 * segments
 		 */
-		if (kimage_is_destination_range(image, temp_start, temp_end)) {
+		if (kimage_is_destination_range(image, temp_start, temp_end) ||
+		    kimage_is_control_page(image, temp_start, temp_end)) {
 			temp_start = temp_start + PAGE_SIZE;
 			continue;
 		}
@@ -671,18 +673,6 @@ int kexec_add_buffer(struct kexec_buf *kbuf)
 	if (kbuf->image->nr_segments >= KEXEC_SEGMENT_MAX)
 		return -EINVAL;
-	/*
-	 * Make sure we are not trying to add buffer after allocating
-	 * control pages. All segments need to be placed first before
-	 * any control pages are allocated. As control page allocation
-	 * logic goes through list of segments to make sure there are
-	 * no destination overlaps.
-	 */
-	if (!list_empty(&kbuf->image->control_pages)) {
-		WARN_ON(1);
-		return -EINVAL;
-	}
-
 	/* Ensure minimum alignment needed for segments. */
 	kbuf->memsz = ALIGN(kbuf->memsz, PAGE_SIZE);
 	kbuf->buf_align = max(kbuf->buf_align, PAGE_SIZE);
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index d35d9792402d..12e655a70e25 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -14,6 +14,9 @@ int kimage_load_segment(struct kimage *image, struct kexec_segment *segment);
 void kimage_terminate(struct kimage *image);
 int kimage_is_destination_range(struct kimage *image,
 				unsigned long start, unsigned long end);
+int kimage_is_control_page(struct kimage *image,
+			   unsigned long start,
+			   unsigned long end);
 /*
  * Whatever is used to serialize accesses to the kexec_crash_image needs to be

From patchwork Mon Mar 10 12:03:15 2025
X-Patchwork-Submitter: Andrey Ryabinin
X-Patchwork-Id: 14009707
From: Andrey Ryabinin
To: linux-kernel@vger.kernel.org
Cc: Alexander Graf, James Gowans, Mike Rapoport, Andrew Morton,
 linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, x86@kernel.org, "H . Peter Anvin", Eric Biederman,
 kexec@lists.infradead.org, Pratyush Yadav, Jason Gunthorpe,
 Pasha Tatashin, David Rientjes, Andrey Ryabinin
Subject: [PATCH v2 4/7] kexec, kstate: delay loading of kexec segments
Date: Mon, 10 Mar 2025 13:03:15 +0100
Message-ID: <20250310120318.2124-5-arbn@yandex-team.com>
In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com>
References: <20250310120318.2124-1-arbn@yandex-team.com>

KSTATE's purpose is to preserve some memory across kexec. For this to
work, kexec must choose its destination ranges after KSTATE has recorded
the preserved memory, so that these ranges don't collide with it.

Kexec chooses destination ranges at the kexec load stage, which may
happen long before the actual reboot into the new kernel. This means
that KSTATE would have to know all preserved memory before
kexec_file_load(), unless we delay loading the kexec segments and
choosing their destination addresses until later, at the point of reboot
into the new kernel. So let's do that.

Signed-off-by: Andrey Ryabinin
---
 include/linux/kexec.h   |   1 +
 kernel/kexec_core.c     |   6 ++
 kernel/kexec_file.c     | 144 ++++++++++++++++++++++++++--------------
 kernel/kexec_internal.h |   6 ++
 4 files changed, 108 insertions(+), 49 deletions(-)

diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index bd82f04888a1..539aaacfd3fd 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -377,6 +377,7 @@ extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
 extern int kernel_kexec(void);
+extern int kexec_file_load_segments(struct kimage *image);
 extern struct page *kimage_alloc_control_pages(struct kimage *image,
 						unsigned int order);
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 647ab5705c37..7c79addeb93b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1017,6 +1017,12 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
+	if (kexec_late_load(kexec_image)) {
+		error = kexec_file_load_segments(kexec_image);
+		if (error)
+			goto Unlock;
+	}
+
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
 		/*
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 8ecd34071bfa..634e2ed4cc4c 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -187,6 +187,34 @@ kimage_validate_signature(struct kimage *image)
 }
 #endif
+static int kimage_add_buffers(struct kimage *image)
+{
+	void *ldata;
+	int ret = 0;
+
+	/* IMA needs to pass the measurement list to the next kernel. */
+	ima_add_kexec_buffer(image);
+
+	ret = kstate_load_migrate_buf(image);
+	if (ret)
+		goto out;
+
+	/* Call image load handler */
+	ldata = kexec_image_load_default(image);
+
+	if (IS_ERR(ldata)) {
+		ret = PTR_ERR(ldata);
+		goto out;
+	}
+
+	image->image_loader_data = ldata;
+out:
+	/* In case of error, free up all allocated memory in this function */
+	if (ret)
+		kimage_file_post_load_cleanup(image);
+	return ret;
+
+}
 /*
  * In file mode list of segments is prepared by kernel. Copy relevant
  * data from user space, do error checking, prepare segment list
@@ -197,7 +225,6 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
 			     unsigned long cmdline_len, unsigned flags)
 {
 	ssize_t ret;
-	void *ldata;
 	ret = kernel_read_file_from_fd(kernel_fd, 0, &image->kernel_buf,
 				       KEXEC_FILE_SIZE_MAX, NULL,
@@ -251,22 +278,6 @@ kimage_file_prepare_segments(struct kimage *image, int kernel_fd, int initrd_fd,
 				  image->cmdline_buf_len - 1);
 	}
-	/* IMA needs to pass the measurement list to the next kernel. */
-	ima_add_kexec_buffer(image);
-
-	ret = kstate_load_migrate_buf(image);
-	if (ret)
-		goto out;
-
-	/* Call image load handler */
-	ldata = kexec_image_load_default(image);
-
-	if (IS_ERR(ldata)) {
-		ret = PTR_ERR(ldata);
-		goto out;
-	}
-
-	image->image_loader_data = ldata;
 out:
 	/* In case of error, free up all allocated memory in this function */
 	if (ret)
@@ -303,10 +314,6 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd,
 	if (ret)
 		goto out_free_image;
-	ret = sanity_check_segment_list(image);
-	if (ret)
-		goto out_free_post_load_bufs;
-
 	ret = -ENOMEM;
 	image->control_code_page = kimage_alloc_control_pages(image,
 					   get_order(KEXEC_CONTROL_PAGE_SIZE));
@@ -334,6 +341,70 @@ kimage_file_alloc_init(struct kimage **rimage, int kernel_fd,
 	return ret;
 }
+static int kimage_post_load(struct kimage *image)
+{
+	int ret, i;
+
+	ret = kexec_calculate_store_digests(image);
+	if (ret)
+		goto out;
+
+	kexec_dprintk("nr_segments = %lu\n", image->nr_segments);
+	for (i = 0; i < image->nr_segments; i++) {
+		struct kexec_segment *ksegment;
+
+		ksegment = &image->segment[i];
+		kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n",
+			      i, ksegment->buf, ksegment->bufsz, ksegment->mem,
+			      ksegment->memsz);
+
+		ret = kimage_load_segment(image, &image->segment[i]);
+		if (ret)
+			goto out;
+	}
+
+	kimage_terminate(image);
+
+	ret = machine_kexec_post_load(image);
+	if (ret)
+		goto out;
+
+	kexec_dprintk("kexec_file_load: type:%u, start:0x%lx head:0x%lx\n",
+		      image->type, image->start, image->head);
+out:
+	return ret;
+}
+
+int kexec_file_load_segments(struct kimage *image)
+{
+	int ret;
+
+	ret = kimage_add_buffers(image);
+	if (ret) {
+		pr_err("failed to add kimage buffers %d\n", ret);
+		goto out;
+	}
+
+	ret = sanity_check_segment_list(image);
+	if (ret) {
+		pr_err("sanity check failed %d\n", ret);
+		goto out;
+	}
+
+	ret = kimage_post_load(image);
+	if (ret)
+		pr_err("kimage post load failed %d\n", ret);
+
+out:
+	/*
+	 * Free up any temporary buffers allocated which are not needed
+	 * after image has been loaded
+	 */
+	kimage_file_post_load_cleanup(image);
+
+	return ret;
+}
+
 SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 		unsigned long, cmdline_len, const char __user *, cmdline_ptr,
 		unsigned long, flags)
@@ -341,7 +412,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	int image_type = (flags & KEXEC_FILE_ON_CRASH) ?
 			 KEXEC_TYPE_CRASH : KEXEC_TYPE_DEFAULT;
 	struct kimage **dest_image, *image;
-	int ret = 0, i;
+	int ret = 0;
 	/* We only trust the superuser with rebooting the system. */
 	if (!kexec_load_permitted(image_type))
@@ -398,37 +469,12 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	if (ret)
 		goto out;
-	ret = kexec_calculate_store_digests(image);
-	if (ret)
-		goto out;
-
-	kexec_dprintk("nr_segments = %lu\n", image->nr_segments);
-	for (i = 0; i < image->nr_segments; i++) {
-		struct kexec_segment *ksegment;
-
-		ksegment = &image->segment[i];
-		kexec_dprintk("segment[%d]: buf=0x%p bufsz=0x%zx mem=0x%lx memsz=0x%zx\n",
-			      i, ksegment->buf, ksegment->bufsz, ksegment->mem,
-			      ksegment->memsz);
-
-		ret = kimage_load_segment(image, &image->segment[i]);
+	if (!kexec_late_load(image)) {
+		ret = kexec_file_load_segments(image);
 		if (ret)
 			goto out;
 	}
-	kimage_terminate(image);
-
-	ret = machine_kexec_post_load(image);
-	if (ret)
-		goto out;
-
-	kexec_dprintk("kexec_file_load: type:%u, start:0x%lx head:0x%lx flags:0x%lx\n",
-		      image->type, image->start, image->head, flags);
-	/*
-	 * Free up any temporary buffers allocated which are not needed
-	 * after image has been loaded
-	 */
-	kimage_file_post_load_cleanup(image);
 exchange:
 	image = xchg(dest_image, image);
 out:
diff --git a/kernel/kexec_internal.h b/kernel/kexec_internal.h
index 12e655a70e25..690b1c21b642 100644
--- a/kernel/kexec_internal.h
+++ b/kernel/kexec_internal.h
@@ -34,6 +34,12 @@ static inline void kexec_unlock(void)
 	atomic_set_release(&__kexec_lock, 0);
 }
+static inline bool kexec_late_load(struct kimage *image)
+{
+	return IS_ENABLED(CONFIG_KSTATE) && image->file_mode &&
+		(image->type == KEXEC_TYPE_DEFAULT);
+}
+
 #ifdef CONFIG_KEXEC_FILE
 #include
 void kimage_file_post_load_cleanup(struct kimage *image);

From patchwork Mon Mar 10 12:03:16 2025
X-Patchwork-Submitter: Andrey Ryabinin
X-Patchwork-Id: 14009708
From: Andrey Ryabinin
To: linux-kernel@vger.kernel.org
Cc: Alexander Graf, James Gowans, Mike Rapoport, Andrew Morton,
 linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, x86@kernel.org, "H . Peter Anvin", Eric Biederman,
 kexec@lists.infradead.org, Pratyush Yadav, Jason Gunthorpe,
 Pasha Tatashin, David Rientjes, Andrey Ryabinin
Subject: [PATCH v2 5/7] x86, kstate: Add the ability to preserve memory pages across kexec.
Date: Mon, 10 Mar 2025 13:03:16 +0100
Message-ID: <20250310120318.2124-6-arbn@yandex-team.com>
In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com>
References: <20250310120318.2124-1-arbn@yandex-team.com>

Add the ability to specify pages of memory that kstate needs to preserve
across kexec. kstate_register_page() stores the struct page in a
dedicated list of 'struct kpage_state' entries. At the kexec reboot
stage this list is iterated and the order and physical address of each
entry are saved into kstate's data stream. After kexec, the new kernel
reads them back from the stream and marks the memory as reserved to keep
it intact.

Signed-off-by: Andrey Ryabinin
---
 include/linux/kstate.h |  30 ++++++++++
 kernel/kexec_core.c    |   3 +-
 kernel/kstate.c        | 124 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 156 insertions(+), 1 deletion(-)

diff --git a/include/linux/kstate.h b/include/linux/kstate.h
index ae583d090111..36cfefd87572 100644
--- a/include/linux/kstate.h
+++ b/include/linux/kstate.h
@@ -88,6 +88,8 @@ struct kstate_field {
 };
 enum kstate_ids {
+	KSTATE_RSVD_MEM_ID = 1,
+	KSTATE_STRUCT_PAGE_ID,
 	KSTATE_LAST_ID = -1,
 };
@@ -124,6 +126,8 @@ static inline unsigned long kstate_get_ulong(struct kstate_stream *stream)
 	return ret;
 }
+extern struct kstate_description page_state;
+
 #ifdef CONFIG_KSTATE
 void kstate_init(void);
@@ -141,6 +145,12 @@ void restore_kstate(struct kstate_stream *stream, int id,
 		const struct kstate_description *kstate, void *obj);
 int kstate_load_migrate_buf(struct kimage *image);
+int kstate_page_save(struct kstate_stream *stream, void *obj,
+		const struct kstate_field *field);
+int kstate_register_page(struct page *page, int order);
+
+bool kstate_range_is_preserved(unsigned long start, unsigned long end);
+
 #else
 static inline void kstate_init(void) { }
@@ -150,6 +160,11 @@ static inline int kstate_save_state(void) { return 0; }
 static inline void free_kstate_stream(void) { }
 static inline int kstate_load_migrate_buf(struct kimage *image) { return 0; }
+
+static inline bool kstate_range_is_preserved(unsigned long start,
+					     unsigned long end)
+{ return 0; }
+
 #endif
@@ -176,6 +191,21 @@ static inline int kstate_load_migrate_buf(struct kimage *image) { return 0; }
 		.offset = offsetof(_state, _f), \
 	}
+#define KSTATE_PAGE(_f, _state) \
+	{ \
+		.name = "page", \
+		.flags = KS_CUSTOM, \
+		.offset = offsetof(_state, _f), \
+		.save = kstate_page_save, \
+	}, \
+	KSTATE_ADDRESS(_f, _state, KS_VMEMMAP_ADDR), \
+	{ \
+		.name = "struct_page", \
+		.flags = KS_STRUCT | KS_POINTER, \
+		.offset = offsetof(_state, _f), \
+		.ksd = &page_state, \
+	}
+
 #define KSTATE_END_OF_LIST() { \
 		.flags = KS_END,\
 	}
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 7c79addeb93b..5d001b7a9e44 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -261,7 +262,7 @@ int kimage_is_destination_range(struct kimage *image,
 			return 1;
 	}
-	return 0;
+	return kstate_range_is_preserved(start, end);
 }
 int kimage_is_control_page(struct kimage *image,
diff --git a/kernel/kstate.c b/kernel/kstate.c
index d35996287b76..68a1272abceb 100644
--- a/kernel/kstate.c
+++ b/kernel/kstate.c
@@ -309,6 +309,13 @@ int kstate_register(struct kstate_description *state, void *obj)
 	return 0;
 }
+int kstate_page_save(struct kstate_stream *stream, void *obj,
+		const struct kstate_field *field)
+{
+	kstate_register_page(*(struct page **)obj, 0);
+	return 0;
+}
+
 static int __init setup_kstate(char *arg)
 {
 	char *end;
@@ -323,7 +330,124 @@ static int __init setup_kstate(char *arg)
 }
 early_param("kstate_stream", setup_kstate);
+/*
+ * TODO: probably should use folio instead/in addition,
+ * also will need to think/decide what fields
+ * to preserve or not
+ */
+struct kstate_description page_state = {
+	.name = "struct_page",
+	.id = KSTATE_STRUCT_PAGE_ID,
+	.state_list = LIST_HEAD_INIT(page_state.state_list),
+	.fields = (const struct kstate_field[]) {
+		KSTATE_BASE_TYPE(_mapcount, struct page, atomic_t),
+		KSTATE_BASE_TYPE(_refcount, struct page, atomic_t),
+		KSTATE_END_OF_LIST()
+	},
+};
+
+struct state_entry preserved_se;
+
+struct preserved_pages {
+	unsigned int nr_pages;
+	struct list_head list;
+};
+struct kpage_state {
+	struct list_head list;
+	u8 order;
+	struct page *page;
+};
+
+struct preserved_pages preserved_pages = {
+	.list = LIST_HEAD_INIT(preserved_pages.list)
+};
+
+int kstate_register_page(struct page *page, int order)
+{
+	struct kpage_state *state;
+
+	state = kmalloc(sizeof(*state), GFP_KERNEL);
+	if (!state)
+		return -ENOMEM;
+
+	state->page = page;
+	state->order = order;
+	list_add(&state->list, &preserved_pages.list);
+	preserved_pages.nr_pages++;
+	return 0;
+}
+
+static int kstate_pages_save(struct kstate_stream *stream, void *obj,
+			const struct kstate_field *field)
+{
+	struct kpage_state *p_state;
+	int ret;
+
+	list_for_each_entry(p_state, &preserved_pages.list, list) {
+		unsigned long paddr = page_to_phys(p_state->page);
+
+		ret = kstate_save_data(stream, &p_state->order,
+				sizeof(p_state->order));
+		if (ret)
+			return ret;
+		ret = kstate_save_data(stream, &paddr, sizeof(paddr));
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+bool kstate_range_is_preserved(unsigned long start, unsigned long end)
+{
+	struct kpage_state *p_state;
+
+	list_for_each_entry(p_state, &preserved_pages.list, list) {
+		unsigned long pstart, pend;
+		pstart = page_to_boot_pfn(p_state->page) << PAGE_SHIFT;
+		pend = pstart + (PAGE_SIZE << p_state->order) - 1;
+		if ((end >= pstart) && (start <= pend))
+			return 1;
+	}
+	return 0;
+}
+
+static int __init kstate_pages_restore(struct kstate_stream *stream, void *obj,
+				const struct kstate_field *field)
+{
+	struct preserved_pages *preserved_pages = obj;
+	int nr_pages, i;
+
+	nr_pages = preserved_pages->nr_pages;
+	for (i = 0; i < nr_pages; i++) {
+		int order = kstate_get_byte(stream);
+		unsigned long phys = kstate_get_ulong(stream);
+
+		memblock_reserve(phys, PAGE_SIZE << order);
+	}
+	return 0;
+}
+
+struct kstate_description kstate_preserved_mem = {
+	.name = "preserved_range",
+	.id = KSTATE_RSVD_MEM_ID,
+	.state_list = LIST_HEAD_INIT(kstate_preserved_mem.state_list),
+	.fields = (const struct kstate_field[]) {
+		KSTATE_BASE_TYPE(nr_pages, struct preserved_pages, unsigned int),
+		{
+			.name = "pages",
+			.flags = KS_CUSTOM,
+			.size = sizeof(struct preserved_pages),
+			.save = kstate_pages_save,
+			.restore = kstate_pages_restore,
+		},
+
+		KSTATE_END_OF_LIST()
+	},
+};
+
 void __init kstate_init(void)
 {
 	memblock_reserve(kstate_stream_addr, kstate_size);
+	__kstate_register(&kstate_preserved_mem, &preserved_pages,
+			  &preserved_se);
 }

From patchwork Mon Mar 10 12:03:17 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Ryabinin X-Patchwork-Id: 14009709 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 09621C282DE for ; Mon, 10 Mar 2025 12:04:26 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id AAD8428000A; Mon, 10 Mar 2025 08:04:16 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id A0CDF280001; Mon, 10 Mar 2025 08:04:16 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 83CAA28000A; Mon, 10 Mar 2025 08:04:16 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id 62936280001 for ; Mon, 10 Mar
2025 08:04:16 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay08.hostedemail.com (Postfix) with ESMTP id 32016140EE8 for ; Mon, 10 Mar 2025 12:04:18 +0000 (UTC) X-FDA: 83205508596.20.41A854F Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) by imf13.hostedemail.com (Postfix) with ESMTP id 5CEB72000E for ; Mon, 10 Mar 2025 12:04:16 +0000 (UTC) Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=yandex-team.com header.s=default header.b=X8XMfwpK; dmarc=pass (policy=none) header.from=yandex-team.com; spf=pass (imf13.hostedemail.com: domain of arbn@yandex-team.com designates 178.154.239.72 as permitted sender) smtp.mailfrom=arbn@yandex-team.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741608256; a=rsa-sha256; cv=none; b=cDnaMP+PdxpKjoomY8cIlIJjYhwDGMWbogRn8X4GN2OTyV60wLqDvy2QvNMdqJ5pz1Wv3/ /Q3V/vRp7FHW+T0gdlr2+b7WrUxVWJPVU22zsspfkmlfbA8+89ExCUukwjY/GE3BC2Doj0 Qj/tENyMiIiNU9zpWaSHITg2a56OhtI= ARC-Authentication-Results: i=1; imf13.hostedemail.com; dkim=pass header.d=yandex-team.com header.s=default header.b=X8XMfwpK; dmarc=pass (policy=none) header.from=yandex-team.com; spf=pass (imf13.hostedemail.com: domain of arbn@yandex-team.com designates 178.154.239.72 as permitted sender) smtp.mailfrom=arbn@yandex-team.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741608256; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=x8DAP9eUkaG75fkMO+dR1XpuCnXAgRk3qwZNeEbFzSE=; b=f3oEo3UCzrLzF8U6lJzx5Ar92bS6FF7xHzqMlGsGhAEosOlOFiYqe9+etYTkBSXVUy7Tnc Ru6O92Y/s8rajlh2X0ZzsRxYa577ffoRNSG9F3Mx9zFZgtfTCSQSS5NMJYzxBSL4g3CIl6 tNWnhen6rpwRoLmMCjNRTUAQnhmbIfE= Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net 
(mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id EB62A60EA5; Mon, 10 Mar 2025 15:04:14 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-4JGepy3f; Mon, 10 Mar 2025 15:04:14 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608254; bh=x8DAP9eUkaG75fkMO+dR1XpuCnXAgRk3qwZNeEbFzSE=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=X8XMfwpKpDu3TGM8eK67J+BXWD2pI0JbT66tVSPMZZ0DiCY2JiXh7/uXc320kZaMU IJHEWFV9cPloQKHvb18wQ0KuvOvE+q0ePmRIbCRYatLzo/haokQJ41l3Mfs/0sOpwX 2f3uIuhOrLk7paFJV3a+Y4xsnqPm9YHI58QoRTVk= From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . 
Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 6/7] kexec, kstate: save kstate data before kexec'ing Date: Mon, 10 Mar 2025 13:03:17 +0100 Message-ID: <20250310120318.2124-7-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> MIME-Version: 1.0 X-Rspamd-Queue-Id: 5CEB72000E X-Stat-Signature: cwdhpkzbnafhs98hiaw5gep56swhkemk X-Rspamd-Server: rspam02 X-Rspam-User: X-HE-Tag: 1741608256-230064 X-HE-Meta: U2FsdGVkX1+HOyUKjVcVG8sl3yJqWDfmhrkOUsT+LmbPdp4MpwcUCZ8cIWEw0ejnwYSk13XbPE/0UX6mS1sI9ZFu0V2w98F3wVwkjBtUzvQKx6TYr2FivC7t870fabC2t/yL/IVNWl3l8+Oh8b1HF3azp6MbNiWAeMBz7/Ilwo+EZTk3s4z+tUT0FBzPz1SwMv3k0W47S7QUwzDyhwfK0cuYMnCiNZiwD2H6lFEa7xpcWKTbMOFr6TMFXWXMBu9xGLyBO45F4u5N2+CW0tLVmUYqtUbxAGzDjGntC35xrXQSSIeLSs5Stb3EZw4pJZMydsBPMmGcKDEBvOJYo7WyCS01G//y69RYhkyJbm5x0tl4PimjhUFWcYCIwOyywEnKVsnjxKnSGz3rVceNx3wzL5YFWp2gscqVTzFBtmv0pBhiUTV8GuY+wDJXpR36srFUbgoLxbdToyKs1JWqrWXWaugcBB3j1BBWw7sW2K+u7LZFQI06INMyYngI96ywiSXoUD8Pa0fYsrdeW1/ti6qTOIylDkoGxz4Sk7Y6J8QJFGoI9X554Wv8ADJ8QUAOpqMp6sH7kmv6AVjPam2YehxzahRzDGeNqLu5gfri/g0YWbTJj77WAbXKcaouNObJECg8CdyX/JoDC9zt6KFCl2nhuoiW89skq3Mi4kY6HNK6j9hq3LEgMSjQc62qJ8Fbpizv7cfQKbdmd0Hh/cm/vkeKeTb5oNQXyn5FzsBWx2NrN+1Ohs/PpCia2acr4XMEyvV/RTeXmrQs1KGGQl1ojMwx/X+3UNqZOgklAiFkH5J0Xongrd6T5gMbd7gY0LzVwllEHbO+KZ4QTQ4DCjoe/08D8oSDmUIvg2qrBoXlgQ9bo9mt51OG0nmcpMV6hJ8FdFg0ZcV3n67QmpOmahZpFVqdNDSgDZSM/Ft/9vnhPSeV9X45TFcL2PZJCD7sp/K+glI59TMmCwbeFYEgfQdihSV U6QY2B5p rjeOA1MK4xZqvmyQFfl5FQn2JnQ+YYu0c+Ktdr8JEXn+A8ceYvrp/GHFUPa5rQkT74GIqIgo0o15K5PaRYxrECdnlGh9RMB/m4MGQbbC0BMUmGqHrrNv8dKidCDn1e/a029/Z0HFKwuFC1cBvKqpXYnQSSG5qZ0+5PkO4uxQhq3fABpu9zRc30fzgK1BpsGgR8BiuYQ5id5J+vlNBhOsOvM0WuNuGAgvRW71Vj6eA+MnmZS++CDoHLcWK1Wi3CUyb4jiZMuRkvwYWYVWkai6AlwecuEEjnWIebsr9kP2T/TTN+ATzZQnbUDeOvbRRVz2GNESFiHXqmw4KYzXJ9SHkNA2vGlvNsN7b/nSN+ey3SnPltjM= 
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe:

Call kstate_save_state() to serialize all the required data into the
kstate data stream.

Signed-off-by: Andrey Ryabinin
---
 kernel/kexec_core.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 5d001b7a9e44..7dcdaee14bfa 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1017,11 +1017,14 @@ int kernel_kexec(void)
 		error = -EINVAL;
 		goto Unlock;
 	}
+	error = kstate_save_state();
+	if (error)
+		goto Unlock;

 	if (kexec_late_load(kexec_image)) {
 		error = kexec_file_load_segments(kexec_image);
 		if (error)
-			goto Unlock;
+			goto Free_kstate;
 	}

 #ifdef CONFIG_KEXEC_JUMP
@@ -1104,6 +1107,8 @@ int kernel_kexec(void)
 	}
 #endif

+ Free_kstate:
+	free_kstate_stream();
  Unlock:
 	kexec_unlock();
 	return error;

From patchwork Mon Mar 10 12:03:18 2025 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andrey Ryabinin X-Patchwork-Id: 14009710 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9F7DFC282DE for ; Mon, 10 Mar 2025 12:04:29 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 125C428000B; Mon, 10 Mar 2025 08:04:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0AEBB280001; Mon, 10 Mar 2025 08:04:18 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DF4E828000B; Mon, 10 Mar 2025 08:04:18 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id BBB9A280001 for ; Mon, 10
Mar 2025 08:04:18 -0400 (EDT) Received: from smtpin01.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 8E412C0EA7 for ; Mon, 10 Mar 2025 12:04:20 +0000 (UTC) X-FDA: 83205508680.01.45A9D67 Received: from forwardcorp1a.mail.yandex.net (forwardcorp1a.mail.yandex.net [178.154.239.72]) by imf08.hostedemail.com (Postfix) with ESMTP id 7B07016001B for ; Mon, 10 Mar 2025 12:04:18 +0000 (UTC) Authentication-Results: imf08.hostedemail.com; dkim=pass header.d=yandex-team.com header.s=default header.b=3N8MAR+n; dmarc=pass (policy=none) header.from=yandex-team.com; spf=pass (imf08.hostedemail.com: domain of arbn@yandex-team.com designates 178.154.239.72 as permitted sender) smtp.mailfrom=arbn@yandex-team.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1741608258; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=yQbSR/IJoyQ0CM0slHnQK+Nqd9vWCYt9LC8hjrwd9hg=; b=p8kMd33LAj0TOywa0qj63g9HBTnDzz/2K4s6PZVBtQW6xYnkVLCwUAIxVPqICxBVdiJA/s /AytgosG97wbl+QWAoxwgEXtBCmZIRsBwJc4+Iqy+EWkttezXzxfqJVLFg/lYuZdSYJaIM 7HoRycceil12X6FLvPNy7tk3HU7aB3o= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1741608258; a=rsa-sha256; cv=none; b=hAumWROwbqiJQkm+hFYSzNHJs4tk6k70ICe2hk84lhs++I1ildcw7e78woani8+9KH6/ZN bhX9aEyzrCVBoMs+jV3D6f9BgSt7r2IBPfoNMOPiaTvX9yuKvbJ5Wvu1P3earTsR4it8Ug o81/xOj6eolT0B0ZvoANcNU/uhZ8X8U= ARC-Authentication-Results: i=1; imf08.hostedemail.com; dkim=pass header.d=yandex-team.com header.s=default header.b=3N8MAR+n; dmarc=pass (policy=none) header.from=yandex-team.com; spf=pass (imf08.hostedemail.com: domain of arbn@yandex-team.com designates 178.154.239.72 as permitted sender) smtp.mailfrom=arbn@yandex-team.com Received: from mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net 
(mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net [IPv6:2a02:6b8:c1f:600c:0:640:a431:0]) by forwardcorp1a.mail.yandex.net (Yandex) with ESMTPS id 1ED3460EB2; Mon, 10 Mar 2025 15:04:17 +0300 (MSK) Received: from dellarbn.yandex.net (unknown [10.214.35.248]) by mail-nwsmtp-smtp-corp-main-83.vla.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id s3o0lL2FT0U0-HnZRPCd1; Mon, 10 Mar 2025 15:04:16 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.com; s=default; t=1741608256; bh=yQbSR/IJoyQ0CM0slHnQK+Nqd9vWCYt9LC8hjrwd9hg=; h=Message-ID:Date:In-Reply-To:Cc:Subject:References:To:From; b=3N8MAR+n+0LltwNhIsUCsS4z9ZdlN3mVjT3YjhVxhQapXN9CB0W8hACTOOcCmQHqZ mfYcPKoqXadSjs+nvISkRWgKPHTOWWT9SOuKc3vWVBZnWWdW6b1pj0AUVYfILPIgOg 7txAX17EwUCueYGi5I8why2gQaWxCynL+viNswMM= From: Andrey Ryabinin To: linux-kernel@vger.kernel.org Cc: Alexander Graf , James Gowans , Mike Rapoport , Andrew Morton , linux-mm@kvack.org, Thomas Gleixner , Ingo Molnar , Borislav Petkov , Dave Hansen , x86@kernel.org, "H . Peter Anvin" , Eric Biederman , kexec@lists.infradead.org, Pratyush Yadav , Jason Gunthorpe , Pasha Tatashin , David Rientjes , Andrey Ryabinin Subject: [PATCH v2 7/7] kstate, test: add test module for testing kstate subsystem. 
Date: Mon, 10 Mar 2025 13:03:18 +0100 Message-ID: <20250310120318.2124-8-arbn@yandex-team.com> X-Mailer: git-send-email 2.45.3 In-Reply-To: <20250310120318.2124-1-arbn@yandex-team.com> References: <20250310120318.2124-1-arbn@yandex-team.com> MIME-Version: 1.0 X-Rspam-User: X-Rspamd-Queue-Id: 7B07016001B X-Rspamd-Server: rspam08 X-Stat-Signature: iy99gredieog9ioqg86aoirqmj5453ke X-HE-Tag: 1741608258-676211 X-HE-Meta: U2FsdGVkX1+JYgqx4uJJKZTzmW9TWSwI8jpvotT5dcon7y/wMbb3aWYEDNnGyqWo6eJnEItwwIVDoGo75QvLIXxiv6DP36qn/Zvyd26JNvkXBHH66PEf85Rh7hDk5JcwjKdZoUyoi0hICADoELJzttjGUMBC/++tkDPUK89SC5TalrJBXynXa+7szmp65M8B63ei+lEJgWNJRu24K93T41pIsxfe/llcIz6hvWSTeS1JDAVvwARwQjcmLXdP6XcSHlQ+bZIitVX8fUYn7FLjCDGxHWbOjIYqlQvwVowBH41/WLO56HDHyfQrJkX5hMDL5I+6EqufzdCkF2HaK3frpiBUOqzwcCp7vDM3h1mDo97UiNbwtvRMb917CosO4ns2dcXGbjEYXr9kT2B8NV8dniwgrqhHxNIH8GHkJ220Rw90H+lMfA+I4rzA4QpJs6iAKB2csgZWGmckLEF/h37syhuhSYInCyWuyLW7neTWklTjccsx7Q5d2Wqa33NgUrwwzmR3V0XW30KGHc+gK48Ni36w9jAYAhE8mjO0OPorZJmJYZNXEL1OEOOqUHP1IAjso5H1q5y0en/mL42T+ID/ql/4q/bVkXLz/cRlieWux7qsbranZNiKDvCZ9kQlc70NchwJLWNg+yeDu8tJS2ZnXJePM7sx3We0yPl/rj/9Z7O6JOriHx1sS2Rk7E4fkgn19DdZuXBZ/Tscx8BUw4Yayr59xUzBv+2dbQ2g8GQSQsjMNMUqZCl5xtUxqvXpldPu47S9zmxHksNMrFbyqaNiVwLJ3Fxm5pqVikznAPCVanR2iKUPbIYiIr4D3BY093OElVe7zCTCxNT1vzU7sUynOMPi7REo4eLBreReOFNDYN1tMfD6S8GN+cViUcTxWBeyYnJJzr5FK6rOZpcCLsbdUDKXhKBg4pkHu8Il7ejTTOLrOYWgYRbmNA/ci21yTR2VdT/iFOww5eqj/Rbo8XE dsr46JDE ovYKkuljxny97F9WXyXH4rlCGuHCUIPHw2HL0zUaKYWL0UEtWTy3QAPBG6k5pD+wNXtDU4zncBUTh5u73WE1xqW9p8RGbCrYSwtL6ZQwjPAiplxrcaRMfRFbQ6r8py5FBKcCChfyLOqp2VZAPxdnCcSso70QjuHCzjVC3iYDUm2Nm/epzJJ3I0qPkEFD8ozbA6bcODRUY8PUglTxYKbCvzr2QQ24QrBFucP3HNB6R00vCfaoPf69c9iylbNc/r2FqvMkmEjU0tOyObAD4NjfPP0fNTItarV25ZG0f6ClnUfXuq+cRkfzD/RnIYWx4IRUZLoi9gXoxOLMopIJQsLkj1/QA0FB9PIDj8aCd X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: This is simple test and playground 
useful for kstate subsystem development. It contains a structure with
different kinds of data, which is migrated across kexec to the new
kernel using kstate.

Signed-off-by: Andrey Ryabinin
---
 include/linux/kstate.h |  3 ++
 kernel/kstate.c        |  5 +++
 lib/Makefile           |  2 +
 lib/test_kstate.c      | 86 ++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 96 insertions(+)
 create mode 100644 lib/test_kstate.c

diff --git a/include/linux/kstate.h b/include/linux/kstate.h
index 36cfefd87572..0bde76aa4d8f 100644
--- a/include/linux/kstate.h
+++ b/include/linux/kstate.h
@@ -90,6 +90,7 @@ struct kstate_field {
 enum kstate_ids {
 	KSTATE_RSVD_MEM_ID = 1,
 	KSTATE_STRUCT_PAGE_ID,
+	KSTATE_TEST_ID,
 	KSTATE_LAST_ID = -1,
 };

@@ -132,6 +133,8 @@ extern struct kstate_description page_state;

 void kstate_init(void);

+bool is_kstate_kernel(void);
+
 int kstate_save_state(void);
 void free_kstate_stream(void);

diff --git a/kernel/kstate.c b/kernel/kstate.c
index 68a1272abceb..3d9b786da72a 100644
--- a/kernel/kstate.c
+++ b/kernel/kstate.c
@@ -287,6 +287,11 @@ static void restore_migrate_state(unsigned long kstate_data,
 static unsigned long kstate_stream_addr = -1;
 static unsigned long kstate_size;

+bool is_kstate_kernel(void)
+{
+	return kstate_stream_addr != -1;
+}
+
 static void __kstate_register(struct kstate_description *state, void *obj,
 			      struct state_entry *se)
 {
diff --git a/lib/Makefile b/lib/Makefile
index d5cfc7afbbb8..1395b852b58d 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -356,6 +356,8 @@ obj-$(CONFIG_PARMAN) += parman.o

 obj-y += group_cpus.o

+obj-$(CONFIG_KSTATE) += test_kstate.o
+
 # GCC library routines
 obj-$(CONFIG_GENERIC_LIB_ASHLDI3) += ashldi3.o
 obj-$(CONFIG_GENERIC_LIB_ASHRDI3) += ashrdi3.o
diff --git a/lib/test_kstate.c b/lib/test_kstate.c
new file mode 100644
index 000000000000..1d9feb017415
--- /dev/null
+++ b/lib/test_kstate.c
@@ -0,0 +1,86 @@
+// SPDX-License-Identifier: GPL-2.0
+#include
+#include
+#include
+#include
+
+static unsigned long ulong_val;
+struct kstate_test_data {
+	int i;
+	unsigned long *p_ulong;
+	char s[10];
+	struct page *page;
+};
+
+struct kstate_description test_state = {
+	.name = "test",
+	.version_id = 1,
+	.id = KSTATE_TEST_ID,
+	.state_list = LIST_HEAD_INIT(test_state.state_list),
+	.fields = (const struct kstate_field[]) {
+		KSTATE_BASE_TYPE(i, struct kstate_test_data, int),
+		KSTATE_BASE_TYPE(s, struct kstate_test_data, char [10]),
+		KSTATE_POINTER(p_ulong, struct kstate_test_data),
+		KSTATE_PAGE(page, struct kstate_test_data),
+		KSTATE_END_OF_LIST()
+	},
+};
+
+static struct kstate_test_data test_data;
+
+static int init_test_data(void)
+{
+	struct page *page;
+	int i;
+
+	test_data.i = 10;
+	ulong_val = 20;
+	memcpy(test_data.s, "abcdefghk", sizeof(test_data.s));
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		return -ENOMEM;
+
+	for (i = 0; i < PAGE_SIZE/4; i += 4)
+		*((u32 *)page_address(page) + i) = 0xdeadbeef;
+	test_data.page = page;
+	return 0;
+}
+
+static void validate_test_data(void)
+{
+	int i;
+
+	if (WARN_ON(test_data.i != 10))
+		return;
+	if (WARN_ON(*test_data.p_ulong != 20))
+		return;
+	if (WARN_ON(strcmp(test_data.s, "abcdefghk") != 0))
+		return;
+
+	for (i = 0; i < PAGE_SIZE/4; i += 4) {
+		u32 val = *((u32 *)page_address(test_data.page) + i);
+
+		WARN_ON(val != 0xdeadbeef);
+	}
+}
+
+static int __init test_kstate_init(void)
+{
+	int ret = 0;
+
+	test_data.p_ulong = &ulong_val;
+
+	if (!is_kstate_kernel()) {
+		ret = init_test_data();
+		if (ret)
+			goto out;
+	}
+
+	kstate_register(&test_state, &test_data);
+
+	validate_test_data();
+
+out:
+	return ret;
+}
+__initcall(test_kstate_init);
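
For readers following the series outside a kernel tree, the preserved-page
records can be sketched in user space. This is only an illustration under
stated assumptions, not the kstate implementation: the record layout (a
one-byte order followed by an unsigned long physical address) is taken from
the save and restore callbacks in this series, while struct sketch_stream,
every sketch_* function, and the 4 KiB page size are hypothetical names and
values chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12UL	/* assumption: 4 KiB pages */
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical stand-in for the kstate stream: a flat byte buffer. */
struct sketch_stream {
	unsigned char buf[256];
	size_t wr;	/* write offset */
	size_t rd;	/* read offset */
};

/* Save one record: a one-byte order followed by the physical address. */
static int sketch_save_page(struct sketch_stream *s, uint8_t order,
			    unsigned long paddr)
{
	size_t len = sizeof(order) + sizeof(paddr);

	/* Refuse partial records so the reader never sees a torn entry. */
	if (len > sizeof(s->buf) - s->wr)
		return -1;
	memcpy(s->buf + s->wr, &order, sizeof(order));
	s->wr += sizeof(order);
	memcpy(s->buf + s->wr, &paddr, sizeof(paddr));
	s->wr += sizeof(paddr);
	return 0;
}

/* Read one record back in the same field order it was written. */
static int sketch_load_page(struct sketch_stream *s, uint8_t *order,
			    unsigned long *paddr)
{
	if (s->wr - s->rd < sizeof(*order) + sizeof(*paddr))
		return -1;
	memcpy(order, s->buf + s->rd, sizeof(*order));
	s->rd += sizeof(*order);
	memcpy(paddr, s->buf + s->rd, sizeof(*paddr));
	s->rd += sizeof(*paddr);
	return 0;
}

/*
 * Inclusive byte-range overlap between [start, end] and the block of
 * 2^order pages that begins at physical address paddr.
 */
static bool sketch_block_overlaps(unsigned long paddr, uint8_t order,
				  unsigned long start, unsigned long end)
{
	unsigned long pend = paddr + (SKETCH_PAGE_SIZE << order) - 1;

	return end >= paddr && start <= pend;
}
```

A restored record gives the new kernel the same (address, size) pair that
the old kernel registered, with the size being the page size shifted left by
the order. The overlap helper treats both ranges as inclusive byte ranges,
which is how kimage_is_destination_range() compares a candidate range
against the image segments.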