[PATCHv7,11/14] x86: Disable kexec if system has unaccepted memory

Message ID 20220614120231.48165-12-kirill.shutemov@linux.intel.com (mailing list archive)
State New
Series mm, x86/cc: Implement support for unaccepted memory

Commit Message

kirill.shutemov@linux.intel.com June 14, 2022, 12:02 p.m. UTC
On kexec, the target kernel has to know what memory has been accepted.
Information in EFI map is out of date and cannot be used.

boot_params.unaccepted_memory can be used to pass the bitmap between two
kernels on kexec, but the use-case is not yet implemented.

Disable kexec on machines with unaccepted memory for now.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/unaccepted_memory.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
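
Editor's illustration: the bitmap the commit message refers to can be modeled in userspace roughly as below. This is only a sketch under stated assumptions: one bit per 2 MiB chunk and "set bit means not yet accepted" are assumptions, not facts stated in this message, and every sketch_* name is hypothetical. It shows why the next kernel needs the bitmap: without it, it cannot tell which chunks still need accepting.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: one bit covers one 2 MiB chunk; a set bit means "not yet accepted". */
#define CHUNK_SHIFT 21
#define CHUNK_SIZE  (UINT64_C(1) << CHUNK_SHIFT)

static bool chunk_test(const uint8_t *bitmap, uint64_t chunk)
{
	return (bitmap[chunk / 8] >> (chunk % 8)) & 1u;
}

static void chunk_clear(uint8_t *bitmap, uint64_t chunk)
{
	bitmap[chunk / 8] &= (uint8_t)~(1u << (chunk % 8));
}

/* Userspace stand-in for a check like range_contains_unaccepted_memory(). */
bool sketch_range_contains_unaccepted(const uint8_t *bitmap,
				      uint64_t start, uint64_t end)
{
	assert(bitmap != NULL && start < end);

	/* Visit every chunk that overlaps the half-open range [start, end). */
	for (uint64_t c = start >> CHUNK_SHIFT; c <= (end - 1) >> CHUNK_SHIFT; c++)
		if (chunk_test(bitmap, c))
			return true;
	return false;
}

/*
 * Mark the chunk-aligned range [start, end) as accepted. Real code would
 * first ask the platform to accept each chunk; here only the bookkeeping
 * is modeled.
 */
void sketch_accept_range(uint8_t *bitmap, uint64_t start, uint64_t end)
{
	assert(bitmap != NULL && start < end);
	assert(start % CHUNK_SIZE == 0 && end % CHUNK_SIZE == 0);

	for (uint64_t c = start >> CHUNK_SHIFT; c < end >> CHUNK_SHIFT; c++)
		chunk_clear(bitmap, c);
}
```

A caller would test a range before touching it and accept it only if the test reports unaccepted chunks. The kexec problem is that this state lives only in the running kernel unless it is handed over explicitly.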

Comments

Dave Hansen June 23, 2022, 5:23 p.m. UTC | #1
... adding kexec folks

On 6/14/22 05:02, Kirill A. Shutemov wrote:
> On kexec, the target kernel has to know what memory has been accepted.
> Information in EFI map is out of date and cannot be used.
> 
> boot_params.unaccepted_memory can be used to pass the bitmap between two
> kernels on kexec, but the use-case is not yet implemented.
> 
> Disable kexec on machines with unaccepted memory for now.
...
> +static int __init unaccepted_init(void)
> +{
> +	if (!boot_params.unaccepted_memory)
> +		return 0;
> +
> +#ifdef CONFIG_KEXEC_CORE
> +	/*
> +	 * TODO: Information on memory acceptance status has to be communicated
> +	 * between kernel.
> +	 */
> +	pr_warn("Disable kexec: not yet supported on systems with unaccepted memory\n");
> +	kexec_load_disabled = 1;
> +#endif

This looks to be the *only* in-kernel user tweaking kexec_load_disabled.
 It doesn't feel great to just be disabling kexec like this.  Why not
just fix it properly?

What do the kexec folks think?
Eric W. Biederman June 23, 2022, 9:48 p.m. UTC | #2
Dave Hansen <dave.hansen@intel.com> writes:

> ... adding kexec folks
>
> On 6/14/22 05:02, Kirill A. Shutemov wrote:
>> On kexec, the target kernel has to know what memory has been accepted.
>> Information in EFI map is out of date and cannot be used.
>> 
>> boot_params.unaccepted_memory can be used to pass the bitmap between two
>> kernels on kexec, but the use-case is not yet implemented.
>> 
>> Disable kexec on machines with unaccepted memory for now.
> ...
>> +static int __init unaccepted_init(void)
>> +{
>> +	if (!boot_params.unaccepted_memory)
>> +		return 0;
>> +
>> +#ifdef CONFIG_KEXEC_CORE
>> +	/*
>> +	 * TODO: Information on memory acceptance status has to be communicated
>> +	 * between kernel.
>> +	 */
>> +	pr_warn("Disable kexec: not yet supported on systems with unaccepted memory\n");
>> +	kexec_load_disabled = 1;
>> +#endif
>
> This looks to be the *only* in-kernel user tweaking kexec_load_disabled.
>  It doesn't feel great to just be disabling kexec like this.  Why not
> just fix it properly?
>
> What do the kexec folks think?

I didn't realize someone had implemented kexec_load_disabled.  I am not
particularly happy about that.  It looks like an over-broad stick that
we will have to support forever.

This change looks like it just builds on that bad decision.

If people don't want to deal with this situation right now, then I
recommend they make this new code and KEXEC conflict at the Kconfig
level.  That would give serious incentive to adding the missing
implementation.

If there is some deep and fundamental reason why this cannot be supported
then it probably makes sense to put some code in the arch_kexec_load
hook that verifies that deep and fundamental reason is present.

With the kexec code all we have to verify it works is a little testing
and careful code review.  Something like this makes code review much
harder because the entire kernel has to be checked to see if some random
driver without locking changed a variable, rather than having it be
apparent that this special case exists when reading through the kexec
code.

Eric
kirill.shutemov@linux.intel.com June 24, 2022, 2 a.m. UTC | #3
On Thu, Jun 23, 2022 at 04:48:59PM -0500, Eric W. Biederman wrote:
> Dave Hansen <dave.hansen@intel.com> writes:
> 
> > ... adding kexec folks
> >
> > On 6/14/22 05:02, Kirill A. Shutemov wrote:
> >> On kexec, the target kernel has to know what memory has been accepted.
> >> Information in EFI map is out of date and cannot be used.
> >> 
> >> boot_params.unaccepted_memory can be used to pass the bitmap between two
> >> kernels on kexec, but the use-case is not yet implemented.
> >> 
> >> Disable kexec on machines with unaccepted memory for now.
> > ...
> >> +static int __init unaccepted_init(void)
> >> +{
> >> +	if (!boot_params.unaccepted_memory)
> >> +		return 0;
> >> +
> >> +#ifdef CONFIG_KEXEC_CORE
> >> +	/*
> >> +	 * TODO: Information on memory acceptance status has to be communicated
> >> +	 * between kernel.
> >> +	 */
> >> +	pr_warn("Disable kexec: not yet supported on systems with unaccepted memory\n");
> >> +	kexec_load_disabled = 1;
> >> +#endif
> >
> > This looks to be the *only* in-kernel user tweaking kexec_load_disabled.
> >  It doesn't feel great to just be disabling kexec like this.  Why not
> > just fix it properly?

Unfortunately, problems with kexec are not limited to unaccepted
memory. Isaku pointed out that MADT CPU wake is also problematic for
kexec. It doesn't allow CPU offlining, so the secondary kernel will not be
able to wake the CPUs up. So an additional limitation (as of now) for
kexec is !SMP on a TDX guest.

I guess we can implement CPU offlining by entering a loop that checks
the mailbox and responds to the command. That loop has to be somehow
protected from being overwritten on kexec.
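
Editor's illustration: the parking loop speculated about above could look roughly like this userspace model. It is a sketch only. The mailbox layout (a command word, a target APIC ID and a wakeup vector) and the command values are assumptions made for illustration, and all sketch_* names are hypothetical. Nothing here is taken from an actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mailbox layout shared between the waker and a parked CPU. */
struct sketch_mailbox {
	volatile uint16_t command;       /* SKETCH_CMD_* */
	volatile uint32_t apic_id;       /* which CPU the command targets */
	volatile uint64_t wakeup_vector; /* where the woken CPU should jump */
};

enum { SKETCH_CMD_NOOP = 0, SKETCH_CMD_WAKEUP = 1 };

/*
 * Model of an "offlined" CPU: poll the mailbox up to max_polls times.
 * Returns the wakeup vector if a wakeup command addressed to my_apic_id
 * shows up (acknowledging it by resetting the command), or 0 if nothing
 * arrived. A real parked CPU would loop forever and then jump to the vector.
 */
uint64_t sketch_parked_cpu_poll(struct sketch_mailbox *mb,
				uint32_t my_apic_id, unsigned int max_polls)
{
	assert(mb != NULL);

	for (unsigned int i = 0; i < max_polls; i++) {
		if (mb->command == SKETCH_CMD_WAKEUP &&
		    mb->apic_id == my_apic_id) {
			uint64_t vector = mb->wakeup_vector;

			mb->command = SKETCH_CMD_NOOP; /* acknowledge */
			return vector;
		}
		/* A real loop would execute a CPU-relax/pause hint here. */
	}
	return 0;
}
```

The model leaves out the hard part named above: the loop's code and the mailbox must sit in memory that the kexec'd kernel does not overwrite.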

Other issues may come up as we actually try to implement it.

That's all doable, but feels like scope creep for the unaccepted memory
enabling patchset :/

Is it a must for merge consideration?

> > What do the kexec folks think?
> 
> I didn't realized someone had implemented kexec_load_disabled.  I am not
> particularly happy about that.  It looks like an over-broad stick that
> we will have to support forever.
> 
> This change looks like it just builds on that bad decision.
> 
> If people don't want to deal with this situation right now, then I
> recommend they make this new code and KEXEC conflict at the Kconfig
> level.  That would give serious incentive to adding the missing
> implementation.

I tried to limit KEXEC at the Kconfig level before[1]. The naive approach does not work[2]:

WARNING: unmet direct dependencies detected for UNACCEPTED_MEMORY
  Depends on [n]: EFI [=y] && EFI_STUB [=y] && !KEXEC_CORE [=y]
  Selected by [y]:
  - INTEL_TDX_GUEST [=y] && HYPERVISOR_GUEST [=y] && X86_64 [=y] && CPU_SUP_INTEL [=y] && X86_X2APIC [=y]

Maybe my Kconfig-fu is not strong enough, I dunno.

[1] https://lore.kernel.org/all/20220425033934.68551-6-kirill.shutemov@linux.intel.com
[2] https://lore.kernel.org/all/YnOjJB8h3ZUR9sLX@zn.tnic

> If there is some deep and fundamental why this can not be supported
> then it probably makes sense to put some code in the arch_kexec_load
> hook that verifies that deep and fundamental reason is present.

Sounds straightforward. I can do this.

> With the kexec code all we have to verify it works is a little testing
> and careful code review.  Something like this makes code review much
> harder because the entire kernel has to be checked to see if some random
> driver without locking changed a variable.  Rather than having it
> apparent that this special case exists when reading through the kexec
> code.
> 
> Eric
>
kirill.shutemov@linux.intel.com June 28, 2022, 11:51 p.m. UTC | #4
On Fri, Jun 24, 2022 at 05:00:05AM +0300, Kirill A. Shutemov wrote:
> > If there is some deep and fundamental why this can not be supported
> > then it probably makes sense to put some code in the arch_kexec_load
> > hook that verifies that deep and fundamental reason is present.
> 
> Sounds straight-forward. I can do this.

What about the patch below?

From 0b758600e1eef5525f2a46630ab3559f118a272a Mon Sep 17 00:00:00 2001
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Date: Tue, 10 May 2022 19:02:18 +0300
Subject: [PATCH] x86: Disable kexec if system has unaccepted memory

On kexec, the target kernel has to know what memory has been accepted.
Information in EFI map is out of date and cannot be used.

boot_params.unaccepted_memory can be used to pass the bitmap between two
kernels on kexec, but the use-case is not yet implemented.

Disable kexec on machines with unaccepted memory for now.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
---
 arch/x86/mm/unaccepted_memory.c | 16 ++++++++++++++++
 include/linux/kexec.h           |  2 ++
 kernel/kexec.c                  |  4 ++++
 kernel/kexec_core.c             |  5 +++++
 kernel/kexec_file.c             |  4 ++++
 5 files changed, 31 insertions(+)

diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c
index 566c3a72aee8..529c3fd1dab3 100644
--- a/arch/x86/mm/unaccepted_memory.c
+++ b/arch/x86/mm/unaccepted_memory.c
@@ -1,4 +1,5 @@
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kexec.h>
 #include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/pfn.h>
@@ -98,3 +99,18 @@ bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
 
 	return ret;
 }
+
+#ifdef CONFIG_KEXEC_CORE
+int arch_kexec_load(void)
+{
+	if (!boot_params.unaccepted_memory)
+		return 0;
+
+	/*
+	 * TODO: Information on memory acceptance status has to be communicated
+	 * between kernels.
+	 */
+	pr_warn_once("Disable kexec: not yet supported on systems with unaccepted memory\n");
+	return -EOPNOTSUPP;
+}
+#endif
diff --git a/include/linux/kexec.h b/include/linux/kexec.h
index ce6536f1d269..dfd9493d0b4b 100644
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -396,6 +396,8 @@ void crash_free_reserved_phys_range(unsigned long begin, unsigned long end);
 void arch_kexec_protect_crashkres(void);
 void arch_kexec_unprotect_crashkres(void);
 
+int arch_kexec_load(void);
+
 #ifndef page_to_boot_pfn
 static inline unsigned long page_to_boot_pfn(struct page *page)
 {
diff --git a/kernel/kexec.c b/kernel/kexec.c
index b5e40f069768..352b3742f07a 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -195,6 +195,10 @@ static inline int kexec_load_check(unsigned long nr_segments,
 {
 	int result;
 
+	result = arch_kexec_load();
+	if (result)
+		return result;
+
 	/* We only trust the superuser with rebooting the system. */
 	if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
 		return -EPERM;
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index 4d34c78334ce..4d51b9271f6b 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -1238,3 +1238,8 @@ void __weak arch_kexec_protect_crashkres(void)
 
 void __weak arch_kexec_unprotect_crashkres(void)
 {}
+
+int __weak arch_kexec_load(void)
+{
+	return 0;
+}
diff --git a/kernel/kexec_file.c b/kernel/kexec_file.c
index 145321a5e798..d531df94ffbb 100644
--- a/kernel/kexec_file.c
+++ b/kernel/kexec_file.c
@@ -324,6 +324,10 @@ SYSCALL_DEFINE5(kexec_file_load, int, kernel_fd, int, initrd_fd,
 	int ret = 0, i;
 	struct kimage **dest_image, *image;
 
+	ret = arch_kexec_load();
+	if (ret)
+		return ret;
+
 	/* We only trust the superuser with rebooting the system. */
 	if (!capable(CAP_SYS_BOOT) || kexec_load_disabled)
 		return -EPERM;
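
Editor's illustration: the ordering of checks that this patch introduces can be modeled in userspace as below. This is a sketch, not kernel code. The sketch_state structure and all sketch_* names are hypothetical stand-ins for boot_params.unaccepted_memory, capable(CAP_SYS_BOOT) and kexec_load_disabled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel state consulted by the checks. */
struct sketch_state {
	bool has_unaccepted_memory; /* models boot_params.unaccepted_memory != 0 */
	bool capable_sys_boot;      /* models capable(CAP_SYS_BOOT) */
	bool kexec_load_disabled;   /* models the kexec_load_disabled sysctl */
};

/* Models the x86 override of the arch hook from the patch. */
static int sketch_arch_kexec_load(const struct sketch_state *s)
{
	return s->has_unaccepted_memory ? -EOPNOTSUPP : 0;
}

/* Models the start of kexec_load_check() / kexec_file_load() after the patch. */
int sketch_kexec_load_check(const struct sketch_state *s)
{
	int result;

	assert(s != NULL);

	/* The arch hook runs first, before any privilege check. */
	result = sketch_arch_kexec_load(s);
	if (result)
		return result;

	if (!s->capable_sys_boot || s->kexec_load_disabled)
		return -EPERM;

	return 0;
}
```

One consequence of this ordering is that on a system with unaccepted memory, an unprivileged caller gets -EOPNOTSUPP, not -EPERM, because the arch hook is consulted before the capability check.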
Dave Hansen June 29, 2022, 12:10 a.m. UTC | #5
On 6/28/22 16:51, Kirill A. Shutemov wrote:
> On Fri, Jun 24, 2022 at 05:00:05AM +0300, Kirill A. Shutemov wrote:
>>> If there is some deep and fundamental why this can not be supported
>>> then it probably makes sense to put some code in the arch_kexec_load
>>> hook that verifies that deep and fundamental reason is present.
...
> +	/*
> +	 * TODO: Information on memory acceptance status has to be communicated
> +	 * between kernel.
> +	 */

So, the deep and fundamental reason is... drum roll... you haven't
gotten around to implementing bitmap passing yet?!?!?   I have the
feeling that wasn't what Eric was looking for.
kirill.shutemov@linux.intel.com June 29, 2022, 12:59 a.m. UTC | #6
On Tue, Jun 28, 2022 at 05:10:56PM -0700, Dave Hansen wrote:
> On 6/28/22 16:51, Kirill A. Shutemov wrote:
> > On Fri, Jun 24, 2022 at 05:00:05AM +0300, Kirill A. Shutemov wrote:
> >>> If there is some deep and fundamental why this can not be supported
> >>> then it probably makes sense to put some code in the arch_kexec_load
> >>> hook that verifies that deep and fundamental reason is present.
> ...
> > +	/*
> > +	 * TODO: Information on memory acceptance status has to be communicated
> > +	 * between kernel.
> > +	 */
> 
> So, the deep and fundamental reason is... drum roll... you haven't
> gotten around to implementing bitmap passing yet?!?!?   I have the
> feeling that wasn't what Eric was looking for.

The deep fundamental reason is that not everything can be implemented and
upstreamed at once.
Dave Young July 4, 2022, 7:18 a.m. UTC | #7
On Wed, 29 Jun 2022 at 08:59, Kirill A. Shutemov
<kirill.shutemov@linux.intel.com> wrote:
>
> On Tue, Jun 28, 2022 at 05:10:56PM -0700, Dave Hansen wrote:
> > On 6/28/22 16:51, Kirill A. Shutemov wrote:
> > > On Fri, Jun 24, 2022 at 05:00:05AM +0300, Kirill A. Shutemov wrote:
> > >>> If there is some deep and fundamental why this can not be supported
> > >>> then it probably makes sense to put some code in the arch_kexec_load
> > >>> hook that verifies that deep and fundamental reason is present.
> > ...
> > > +   /*
> > > +    * TODO: Information on memory acceptance status has to be communicated
> > > +    * between kernel.
> > > +    */
> >
> > So, the deep and fundamental reason is... drum roll... you haven't
> > gotten around to implementing bitmap passing yet?!?!?   I have the
> > feeling that wasn't what Eric was looking for.
>
> The deep fundamental reason is that everything cannot be implemented and
> upstreamed at once.

If the only thing missing is passing the bitmap to the kexec'd kernel,
then since you have reserved the bitmap memory I guess it is
straightforward to set the new kernel's boot_params.unaccepted_memory to
the old value.  Not sure if there are problems when the decompression
code accepts memory again, though.

For kexec_file_load in the kernel, refer to setup_boot_parameters() in
arch/x86/kernel/kexec-bzimage64.c; for the kexec-tools kexec_load code,
refer to setup_linux_system_parameters() in
kexec/arch/i386/x86-linux-setup.c.

Thanks
Dave
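
Editor's illustration: the hand-over suggested above amounts to copying one field when the next kernel's boot parameters are built. The sketch below models that in userspace. The trimmed sketch_boot_params structure and the sketch_* function are hypothetical, and treating unaccepted_memory as a 64-bit physical address of the bitmap (0 meaning "no bitmap") is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily trimmed stand-in for struct boot_params. */
struct sketch_boot_params {
	uint64_t unaccepted_memory; /* assumed: address of the bitmap, 0 if none */
	char cmdline[64];
};

/*
 * Build the parameters for the next kernel. Besides the usual setup
 * (modeled here by the command line only), carry over the bitmap address
 * from the running kernel so the next kernel knows what is already accepted.
 */
void sketch_setup_boot_parameters(struct sketch_boot_params *next,
				  const struct sketch_boot_params *current,
				  const char *cmdline)
{
	assert(next != NULL && current != NULL && cmdline != NULL);

	memset(next, 0, sizeof(*next));
	/* The buffer was zeroed above, so the copy stays NUL-terminated. */
	strncpy(next->cmdline, cmdline, sizeof(next->cmdline) - 1);

	/* The one-line hand-over the suggestion above describes. */
	next->unaccepted_memory = current->unaccepted_memory;
}
```

This only works if the bitmap memory stays reserved across the kexec. As noted above, it is also unclear how the decompression code behaves if it tries to accept memory again.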
Patch

diff --git a/arch/x86/mm/unaccepted_memory.c b/arch/x86/mm/unaccepted_memory.c
index bcd56fe82b9e..05e216716690 100644
--- a/arch/x86/mm/unaccepted_memory.c
+++ b/arch/x86/mm/unaccepted_memory.c
@@ -1,4 +1,5 @@ 
 // SPDX-License-Identifier: GPL-2.0-only
+#include <linux/kexec.h>
 #include <linux/memblock.h>
 #include <linux/mm.h>
 #include <linux/pfn.h>
@@ -95,3 +96,21 @@  bool range_contains_unaccepted_memory(phys_addr_t start, phys_addr_t end)
 
 	return ret;
 }
+
+static int __init unaccepted_init(void)
+{
+	if (!boot_params.unaccepted_memory)
+		return 0;
+
+#ifdef CONFIG_KEXEC_CORE
+	/*
+	 * TODO: Information on memory acceptance status has to be communicated
+	 * between kernels.
+	 */
+	pr_warn("Disable kexec: not yet supported on systems with unaccepted memory\n");
+	kexec_load_disabled = 1;
+#endif
+
+	return 0;
+}
+fs_initcall(unaccepted_init);