From: Andrey Ryabinin
To: linux-kernel@vger.kernel.org
Cc: Alexander Graf, James Gowans, Mike Rapoport, Andrew Morton, linux-mm@kvack.org, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, x86@kernel.org, "H. Peter Anvin", Eric Biederman, kexec@lists.infradead.org, Steven Rostedt, Masami Hiramatsu, Mathieu Desnoyers, linux-trace-kernel@vger.kernel.org, valesini@yandex-team.com, Andrey Ryabinin
Subject: [RFC PATCH 0/7] KSTATE: a mechanism to migrate some part of the kernel state across kexec
Date: Wed, 2 Oct 2024 18:07:15 +0200
Message-ID: <20241002160722.20025-1-arbn@yandex-team.com>

kstate (kernel state) is a mechanism to describe some part of the kernel's internal state, save it into memory, and restore that state in the new kernel after kexec.

This is a very early RFC with a lot of hacks and cut corners; its purpose is to demonstrate the concept itself. Some parts of this feature aren't well thought through yet (e.g. dealing with struct changes between the old and new kernel, the fixed size of the migration stream memory, the absence of boundary checks, and so on).

The end goal, and the main use case for this, is to be able to update the host kernel underneath running VMs that use VFIO pass-through devices. We are still pretty far from that goal. This patchset only tries to establish some basic infrastructure to describe and migrate complex in-kernel state.

The inspiration for this came from QEMU and its VMSTATE machinery, which solves a similar problem: migrating complex internal state across different versions of QEMU. So there is some similarity here.

The alternative to kstate is KHO (Kexec Hand Over) [1]. Since KHO migrates trace buffers, I chose them as the victim for kstate too, so that both approaches can be compared. In my very biased opinion, kstate makes it much easier to describe the state that has to be migrated to the new kernel, and it seems to require almost no intervention into the existing code paths of the subsystem.

Now to how this works.

States (usually some struct) are described by 'struct kstate_description', which contains an array of individual field descriptions, 'struct kstate_field'. Fields have different types:

  KS_SIMPLE           - trivial type that is simply copied by value
  KS_POINTER          - the field contains a pointer; it is dereferenced to copy the value during the save/restore phases
  KS_STRUCT           - contains another struct; field->ksd must point to another 'struct kstate_description'
  KS_CUSTOM           - something that doesn't fit the trivial types above; for these fields the field->save()/->restore() callbacks must do all the work
  KS_ARRAY_OF_POINTER - array of pointers; the size of the array is determined by the field->count() callback
  KS_END              - special flag indicating the end of the migration stream data

kstate_register() accepts a kstate_description along with an instance of an object and registers it in the global 'states' list. During the kexec reboot phase this list is iterated, and for each instance in the list a 'struct kstate_entry' is formed and saved in the migration stream. A 'kstate_entry' contains information such as the ID of the kstate_description, its version, the size of the migration data, and the data itself.

After the reboot, when kstate_register() is called, it parses the migration stream, finds the appropriate 'kstate_entry' and restores the contents of the object.

The content of this patchset: patch 1 contains most of the basic KSTATE infrastructure. Patches 2 and 3 are temporary hacks needed to pass the memory used to store the migration data across kexec; they will be completely redone later. Patches 4 and 5 are the bits needed to preserve pages intact across kexec. Patch 6 is a test & playground patch to develop and test kstate itself. Patch 7 demonstrates how to migrate the trace buffers using kstate.
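
To make the description-driven save/restore flow above more concrete, here is a small userspace C sketch. It is not the code from this patchset: the layouts of 'struct kstate_field', 'struct kstate_description' and the entry header, the 'struct stream' buffer, and the helpers xfer(), walk(), save_entry() and restore_entry() are all hypothetical. Only KS_SIMPLE, KS_POINTER and KS_STRUCT are modelled, and KS_END is used here simply to terminate a field array. The entry layout (description ID, version, payload size, then the data) follows the 'kstate_entry' description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical subset of the field types; KS_END terminates a field array. */
enum ks_type { KS_SIMPLE, KS_POINTER, KS_STRUCT, KS_END };
struct kstate_field {
	enum ks_type type;
	size_t offset;                        /* where the field lives in the object */
	size_t size;                          /* bytes to copy (SIMPLE, POINTER) */
	const struct kstate_description *ksd; /* nested description (STRUCT) */
};
struct kstate_description {
	uint32_t id, version;
	const struct kstate_field *fields;
};
/* Hypothetical entry header: description ID, version, payload size. */
struct kstate_entry_hdr {
	uint32_t id, version;
	uint64_t size;
};
/* Fixed-size migration stream. */
struct stream {
	uint8_t buf[4096];
	size_t pos;
};

/* Copy n bytes between ptr and the stream; the direction depends on saving. */
static void xfer(struct stream *s, void *ptr, size_t n, int saving)
{
	assert(s->pos + n <= sizeof(s->buf)); /* boundary check */
	if (saving)
		memcpy(s->buf + s->pos, ptr, n);
	else
		memcpy(ptr, s->buf + s->pos, n);
	s->pos += n;
}

/* Walk the field descriptions, saving or restoring each field's value. */
static void walk(struct stream *s, const struct kstate_description *ksd, void *obj, int saving)
{
	for (const struct kstate_field *f = ksd->fields; f->type != KS_END; f++) {
		uint8_t *p = (uint8_t *)obj + f->offset;
		void *target = p;                       /* KS_SIMPLE: the field itself */
		if (f->type == KS_STRUCT) {
			walk(s, f->ksd, p, saving);
			continue;
		}
		if (f->type == KS_POINTER)
			memcpy(&target, p, sizeof(target)); /* KS_POINTER: the pointee */
		xfer(s, target, f->size, saving);
	}
}

static void save_entry(struct stream *s, const struct kstate_description *ksd, void *obj)
{
	struct kstate_entry_hdr hdr = { ksd->id, ksd->version, 0 };
	size_t hdr_pos = s->pos;

	xfer(s, &hdr, sizeof(hdr), 1);               /* reserve the header */
	walk(s, ksd, obj, 1);
	hdr.size = s->pos - hdr_pos - sizeof(hdr);
	memcpy(s->buf + hdr_pos, &hdr, sizeof(hdr)); /* patch in the real size */
}

/* Restore obj from the entry matching ksd's ID and version in bytes [0, end). */
static int restore_entry(struct stream *s, const struct kstate_description *ksd,
			 void *obj, size_t end)
{
	struct kstate_entry_hdr hdr;
	for (size_t pos = 0; pos + sizeof(hdr) <= end; pos += sizeof(hdr) + hdr.size) {
		memcpy(&hdr, s->buf + pos, sizeof(hdr));
		if (hdr.id == ksd->id && hdr.version == ksd->version) {
			s->pos = pos + sizeof(hdr);
			walk(s, ksd, obj, 0);
			return 0;
		}
	}
	return -1; /* no saved state for this description */
}

/* Example object: a nested struct plus a pointer to separately owned data. */
struct inner { int a; long b; };
struct demo { int x; struct inner in; unsigned long *counter; };
static const struct kstate_field inner_fields[] = {
	{ .type = KS_SIMPLE, .offset = offsetof(struct inner, a), .size = sizeof(int) },
	{ .type = KS_SIMPLE, .offset = offsetof(struct inner, b), .size = sizeof(long) },
	{ .type = KS_END },
};
static const struct kstate_description inner_ksd = { 2, 1, inner_fields };
static const struct kstate_field demo_fields[] = {
	{ .type = KS_SIMPLE, .offset = offsetof(struct demo, x), .size = sizeof(int) },
	{ .type = KS_STRUCT, .offset = offsetof(struct demo, in), .ksd = &inner_ksd },
	{ .type = KS_POINTER, .offset = offsetof(struct demo, counter), .size = sizeof(unsigned long) },
	{ .type = KS_END },
};
static const struct kstate_description demo_ksd = { 1, 1, demo_fields };
```

In this sketch restore_entry() stands in for the parsing step that kstate_register() performs after the reboot: it scans the stream for an entry whose ID and version match the description and copies the saved values back into a fresh object. For KS_POINTER the new object must already point at valid storage, since only the pointed-to value is migrated, not the address. KS_CUSTOM and KS_ARRAY_OF_POINTER (with their save()/restore() and count() callbacks) are left out for brevity.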

[1] https://lore.kernel.org/all/20240117144704.602-1-graf@amazon.com/

Andrey Ryabinin (7):
  kstate: Add kstate - a mechanism to migrate some kernel state across kexec
  kexec: Hack and abuse crashkernel for the kstate's migration stream
  [hack] purgatory: disable purgatory verification.
  mm/memblock: Add MEMBLOCK_PRSRV flag
  kstate: Add mechanism to preserved specified memory pages across kexec.
  kstate, test: add test module for testing kstate subsystem.
  trace: migrate trace buffers across kexec

 arch/x86/kernel/kexec-bzimage64.c  |  36 +++++
 arch/x86/kernel/machine_kexec_64.c |   5 +-
 arch/x86/kernel/setup.c            |  81 ++++++++++++
 arch/x86/purgatory/purgatory.c     |   2 +
 include/linux/kexec.h              |   6 +-
 include/linux/kstate.h             | 129 ++++++++++++++++++
 include/linux/memblock.h           |   7 +
 include/uapi/linux/kexec.h         |   2 +
 kernel/Kconfig.kexec               |  12 ++
 kernel/Makefile                    |   1 +
 kernel/crash_core.c                |   3 +-
 kernel/kexec_core.c                |  10 +-
 kernel/kexec_file.c                |  15 ++-
 kernel/kstate.c                    | 205 +++++++++++++++++++++++++++++
 kernel/trace/ring_buffer.c         | 189 ++++++++++++++++++++++++++
 kernel/trace/trace.c               |  81 ++++++++++++
 lib/Makefile                       |   2 +
 lib/test_kstate.c                  |  89 +++++++++++++
 mm/memblock.c                      |   9 +-
 mm/mm_init.c                       |  19 +++
 20 files changed, 895 insertions(+), 8 deletions(-)
 create mode 100644 include/linux/kstate.h
 create mode 100644 kernel/kstate.c
 create mode 100644 lib/test_kstate.c