diff mbox series

[v2,1/7] s390/kdump: implement is_kdump_kernel()

Message ID 20241014144622.876731-2-david@redhat.com (mailing list archive)
State New
Headers show
Series virtio-mem: s390 support | expand

Commit Message

David Hildenbrand Oct. 14, 2024, 2:46 p.m. UTC
s390 currently always results in is_kdump_kernel() == false, because
it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to
deactivate the elfcorehdr= kernel parameter.

Let's follow the powerpc example and implement our own logic.

This is required for virtio-mem to reliably identify a kdump
environment to not try hotplugging memory.

Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/s390/include/asm/kexec.h | 4 ++++
 arch/s390/kernel/crash_dump.c | 6 ++++++
 2 files changed, 10 insertions(+)

Comments

Heiko Carstens Oct. 14, 2024, 6:20 p.m. UTC | #1
On Mon, Oct 14, 2024 at 04:46:13PM +0200, David Hildenbrand wrote:
> s390 currently always results in is_kdump_kernel() == false, because
> it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to
> deactivate the elfcorehdr= kernel parameter.
> 
> Let's follow the powerpc example and implement our own logic.
> 
> This is required for virtio-mem to reliably identify a kdump
> environment to not try hotplugging memory.
> 
> Tested-by: Mario Casquero <mcasquer@redhat.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  arch/s390/include/asm/kexec.h | 4 ++++
>  arch/s390/kernel/crash_dump.c | 6 ++++++
>  2 files changed, 10 insertions(+)

Looks like this could work. But the comment in smp.c above
dump_available() needs to be updated.

Are you willing to do that, or should I provide an addon patch?
David Hildenbrand Oct. 14, 2024, 7:26 p.m. UTC | #2
On 14.10.24 20:20, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 04:46:13PM +0200, David Hildenbrand wrote:
>> s390 currently always results in is_kdump_kernel() == false, because
>> it sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to
>> deactivate the elfcorehdr= kernel parameter.
>>
>> Let's follow the powerpc example and implement our own logic.
>>
>> This is required for virtio-mem to reliably identify a kdump
>> environment to not try hotplugging memory.
>>
>> Tested-by: Mario Casquero <mcasquer@redhat.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>   arch/s390/include/asm/kexec.h | 4 ++++
>>   arch/s390/kernel/crash_dump.c | 6 ++++++
>>   2 files changed, 10 insertions(+)
> 
> Looks like this could work. But the comment in smp.c above
> dump_available() needs to be updated.

A right, I remember that there was some outdated documentation.

> 
> Are you willing to do that, or should I provide an addon patch?
> 

I can squash the following:

diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4df56fdb2488..a4f538876462 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -587,16 +587,16 @@ int smp_store_status(int cpu)
   *    with sigp stop-and-store-status. The firmware or the boot-loader
   *    stored the registers of the boot CPU in the absolute lowcore in the
   *    memory of the old system.
- * 3) kdump and the old kernel did not store the CPU state,
- *    or stand-alone kdump for DASD
- *    condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
+ * 3) kdump or stand-alone kdump for DASD
+ *    condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
   *    The state for all CPUs except the boot CPU needs to be collected
   *    with sigp stop-and-store-status. The kexec code or the boot-loader
   *    stored the registers of the boot CPU in the memory of the old system.
- * 4) kdump and the old kernel stored the CPU state
- *    condition: OLDMEM_BASE != NULL && is_kdump_kernel()
- *    This case does not exist for s390 anymore, setup_arch explicitly
- *    deactivates the elfcorehdr= kernel parameter
+ *
+ * Note that the old Kdump mode where the old kernel stored the CPU state
+ * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
+ * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
+ * of the elfcorehdr= parameter.
   */
  static bool dump_available(void)
  {


Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ?
Heiko Carstens Oct. 15, 2024, 8:30 a.m. UTC | #3
On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
> On 14.10.24 20:20, Heiko Carstens wrote:
> > Looks like this could work. But the comment in smp.c above
> > dump_available() needs to be updated.
> 
> A right, I remember that there was some outdated documentation.
> 
> > 
> > Are you willing to do that, or should I provide an addon patch?
> > 
> 
> I can squash the following:
> 
> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
> index 4df56fdb2488..a4f538876462 100644
> --- a/arch/s390/kernel/smp.c
> +++ b/arch/s390/kernel/smp.c
> @@ -587,16 +587,16 @@ int smp_store_status(int cpu)
>   *    with sigp stop-and-store-status. The firmware or the boot-loader
>   *    stored the registers of the boot CPU in the absolute lowcore in the
>   *    memory of the old system.
> - * 3) kdump and the old kernel did not store the CPU state,
> - *    or stand-alone kdump for DASD
> - *    condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
> + * 3) kdump or stand-alone kdump for DASD
> + *    condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
>   *    The state for all CPUs except the boot CPU needs to be collected
>   *    with sigp stop-and-store-status. The kexec code or the boot-loader
>   *    stored the registers of the boot CPU in the memory of the old system.
> - * 4) kdump and the old kernel stored the CPU state
> - *    condition: OLDMEM_BASE != NULL && is_kdump_kernel()
> - *    This case does not exist for s390 anymore, setup_arch explicitly
> - *    deactivates the elfcorehdr= kernel parameter
> + *
> + * Note that the old Kdump mode where the old kernel stored the CPU state

To be consistent with the rest of the comment, please write kdump in
all lower case characters, please.

> + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent

Typo: kudmp.

> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ?

Yes, it is some sort of kdump, even though a bit odd. But the comment
as it is doesn't need to be changed. Only at the very top, please also
change: "There are four cases" into "There are three cases".

Then it all looks good. Thanks a lot!
David Hildenbrand Oct. 15, 2024, 8:41 a.m. UTC | #4
On 15.10.24 10:30, Heiko Carstens wrote:
> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
>> On 14.10.24 20:20, Heiko Carstens wrote:
>>> Looks like this could work. But the comment in smp.c above
>>> dump_available() needs to be updated.
>>
>> A right, I remember that there was some outdated documentation.
>>
>>>
>>> Are you willing to do that, or should I provide an addon patch?
>>>
>>
>> I can squash the following:
>>
>> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
>> index 4df56fdb2488..a4f538876462 100644
>> --- a/arch/s390/kernel/smp.c
>> +++ b/arch/s390/kernel/smp.c
>> @@ -587,16 +587,16 @@ int smp_store_status(int cpu)
>>    *    with sigp stop-and-store-status. The firmware or the boot-loader
>>    *    stored the registers of the boot CPU in the absolute lowcore in the
>>    *    memory of the old system.
>> - * 3) kdump and the old kernel did not store the CPU state,
>> - *    or stand-alone kdump for DASD
>> - *    condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
>> + * 3) kdump or stand-alone kdump for DASD
>> + *    condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
>>    *    The state for all CPUs except the boot CPU needs to be collected
>>    *    with sigp stop-and-store-status. The kexec code or the boot-loader
>>    *    stored the registers of the boot CPU in the memory of the old system.
>> - * 4) kdump and the old kernel stored the CPU state
>> - *    condition: OLDMEM_BASE != NULL && is_kdump_kernel()
>> - *    This case does not exist for s390 anymore, setup_arch explicitly
>> - *    deactivates the elfcorehdr= kernel parameter
>> + *
>> + * Note that the old Kdump mode where the old kernel stored the CPU state
> 
> To be consistent with the rest of the comment, please write kdump in
> all lower case characters, please.

It obviously was too late in the evening for me :) Thanks!

> 
>> + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
>> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
> 
> Typo: kudmp.
> 
>> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
>> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ?
> 
> Yes, it is some sort of kdump, even though a bit odd.

My concern is that we'll now have

bool is_kdump_kernel(void)
{
        return oldmem_data.start && !is_ipl_type_dump();
}

Which matches 3), but if 2) is also called "kdump", then should it 
actually be

bool is_kdump_kernel(void)
{
        return oldmem_data.start;
}

?

When I wrote that code I was rather convinced that the variant in this 
patch is the right thing to do.
David Hildenbrand Oct. 15, 2024, 8:53 a.m. UTC | #5
On 15.10.24 10:41, David Hildenbrand wrote:
> On 15.10.24 10:30, Heiko Carstens wrote:
>> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
>>> On 14.10.24 20:20, Heiko Carstens wrote:
>>>> Looks like this could work. But the comment in smp.c above
>>>> dump_available() needs to be updated.
>>>
>>> A right, I remember that there was some outdated documentation.
>>>
>>>>
>>>> Are you willing to do that, or should I provide an addon patch?
>>>>
>>>
>>> I can squash the following:
>>>
>>> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
>>> index 4df56fdb2488..a4f538876462 100644
>>> --- a/arch/s390/kernel/smp.c
>>> +++ b/arch/s390/kernel/smp.c
>>> @@ -587,16 +587,16 @@ int smp_store_status(int cpu)
>>>     *    with sigp stop-and-store-status. The firmware or the boot-loader
>>>     *    stored the registers of the boot CPU in the absolute lowcore in the
>>>     *    memory of the old system.
>>> - * 3) kdump and the old kernel did not store the CPU state,
>>> - *    or stand-alone kdump for DASD
>>> - *    condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
>>> + * 3) kdump or stand-alone kdump for DASD
>>> + *    condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
>>>     *    The state for all CPUs except the boot CPU needs to be collected
>>>     *    with sigp stop-and-store-status. The kexec code or the boot-loader
>>>     *    stored the registers of the boot CPU in the memory of the old system.
>>> - * 4) kdump and the old kernel stored the CPU state
>>> - *    condition: OLDMEM_BASE != NULL && is_kdump_kernel()
>>> - *    This case does not exist for s390 anymore, setup_arch explicitly
>>> - *    deactivates the elfcorehdr= kernel parameter
>>> + *
>>> + * Note that the old Kdump mode where the old kernel stored the CPU state
>>
>> To be consistent with the rest of the comment, please write kdump in
>> all lower case characters, please.
> 
> It obviously was too late in the evening for me :) Thanks!
> 
>>
>>> + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
>>> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
>>
>> Typo: kudmp.
>>
>>> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
>>> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ?
>>
>> Yes, it is some sort of kdump, even though a bit odd.
> 
> My concern is that we'll now have
> 
> bool is_kdump_kernel(void)
> {
>          return oldmem_data.start && !is_ipl_type_dump();
> }
> 
> Which matches 3), but if 2) is also called "kdump", then should it
> actually be
> 
> bool is_kdump_kernel(void)
> {
>          return oldmem_data.start;
> }
> 
> ?
> 
> When I wrote that code I was rather convinced that the variant in this
> patch is the right thing to do.
> 

I think we can do some follow up cleanups, assuming is_kdump_kernel() here is correct:

diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index cca1827d3d2e..fbc5de66d03b 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
         u64 hdr_off;
  
         /* If we are not in kdump or zfcp/nvme dump mode return */
-       if (!oldmem_data.start && !is_ipl_type_dump())
+       if (!dump_available())
                 return 0;
         /* If we cannot get HSA size for zfcp/nvme dump return error */
         if (is_ipl_type_dump() && !sclp.hsa_size)
diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c
index b695f980bbde..09578f400ef7 100644
--- a/arch/s390/kernel/os_info.c
+++ b/arch/s390/kernel/os_info.c
@@ -148,7 +148,7 @@ static void os_info_old_init(void)
  
         if (os_info_init)
                 return;
-       if (!oldmem_data.start && !is_ipl_type_dump())
+       if (!dump_available())
                 goto fail;
         if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr)))
                 goto fail;
diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
index 33cebb91b933..6a194b4f6ba5 100644
--- a/drivers/s390/char/zcore.c
+++ b/drivers/s390/char/zcore.c
@@ -300,9 +300,7 @@ static int __init zcore_init(void)
         unsigned char arch;
         int rc;
  
-       if (!is_ipl_type_dump())
-               return -ENODATA;
-       if (oldmem_data.start)
+       if (is_kdump_kernel())
                 return -ENODATA;
  
         zcore_dbf = debug_register("zcore", 4, 1, 4 * sizeof(long));
David Hildenbrand Oct. 15, 2024, 8:56 a.m. UTC | #6
On 15.10.24 10:53, David Hildenbrand wrote:
> On 15.10.24 10:41, David Hildenbrand wrote:
>> On 15.10.24 10:30, Heiko Carstens wrote:
>>> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
>>>> On 14.10.24 20:20, Heiko Carstens wrote:
>>>>> Looks like this could work. But the comment in smp.c above
>>>>> dump_available() needs to be updated.
>>>>
>>>> A right, I remember that there was some outdated documentation.
>>>>
>>>>>
>>>>> Are you willing to do that, or should I provide an addon patch?
>>>>>
>>>>
>>>> I can squash the following:
>>>>
>>>> diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
>>>> index 4df56fdb2488..a4f538876462 100644
>>>> --- a/arch/s390/kernel/smp.c
>>>> +++ b/arch/s390/kernel/smp.c
>>>> @@ -587,16 +587,16 @@ int smp_store_status(int cpu)
>>>>      *    with sigp stop-and-store-status. The firmware or the boot-loader
>>>>      *    stored the registers of the boot CPU in the absolute lowcore in the
>>>>      *    memory of the old system.
>>>> - * 3) kdump and the old kernel did not store the CPU state,
>>>> - *    or stand-alone kdump for DASD
>>>> - *    condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
>>>> + * 3) kdump or stand-alone kdump for DASD
>>>> + *    condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
>>>>      *    The state for all CPUs except the boot CPU needs to be collected
>>>>      *    with sigp stop-and-store-status. The kexec code or the boot-loader
>>>>      *    stored the registers of the boot CPU in the memory of the old system.
>>>> - * 4) kdump and the old kernel stored the CPU state
>>>> - *    condition: OLDMEM_BASE != NULL && is_kdump_kernel()
>>>> - *    This case does not exist for s390 anymore, setup_arch explicitly
>>>> - *    deactivates the elfcorehdr= kernel parameter
>>>> + *
>>>> + * Note that the old Kdump mode where the old kernel stored the CPU state
>>>
>>> To be consistent with the rest of the comment, please write kdump in
>>> all lower case characters, please.
>>
>> It obviously was too late in the evening for me :) Thanks!
>>
>>>
>>>> + * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
>>>> + * kernel parameter. The is_kudmp_kernel() implementation on s390 is independent
>>>
>>> Typo: kudmp.
>>>
>>>> Does that sound reasonable? I'm not so sure about the "2) stand-alone kdump for
>>>> SCSI/NVMe (zfcp/nvme dump with swapped memory)": is that really "kdump" ?
>>>
>>> Yes, it is some sort of kdump, even though a bit odd.
>>
>> My concern is that we'll now have
>>
>> bool is_kdump_kernel(void)
>> {
>>           return oldmem_data.start && !is_ipl_type_dump();
>> }
>>
>> Which matches 3), but if 2) is also called "kdump", then should it
>> actually be
>>
>> bool is_kdump_kernel(void)
>> {
>>           return oldmem_data.start;
>> }
>>
>> ?
>>
>> When I wrote that code I was rather convinced that the variant in this
>> patch is the right thing to do.
>>
> 
> I think we can do some follow up cleanups, assuming is_kdump_kernel() here is correct:
> 
> diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
> index cca1827d3d2e..fbc5de66d03b 100644
> --- a/arch/s390/kernel/crash_dump.c
> +++ b/arch/s390/kernel/crash_dump.c
> @@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
>           u64 hdr_off;
>    
>           /* If we are not in kdump or zfcp/nvme dump mode return */
> -       if (!oldmem_data.start && !is_ipl_type_dump())
> +       if (!dump_available())
>                   return 0;
>           /* If we cannot get HSA size for zfcp/nvme dump return error */
>           if (is_ipl_type_dump() && !sclp.hsa_size)
> diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c
> index b695f980bbde..09578f400ef7 100644
> --- a/arch/s390/kernel/os_info.c
> +++ b/arch/s390/kernel/os_info.c
> @@ -148,7 +148,7 @@ static void os_info_old_init(void)
>    
>           if (os_info_init)
>                   return;
> -       if (!oldmem_data.start && !is_ipl_type_dump())
> +       if (!dump_available())
>                   goto fail;
>           if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr)))
>                   goto fail;
> diff --git a/drivers/s390/char/zcore.c b/drivers/s390/char/zcore.c
> index 33cebb91b933..6a194b4f6ba5 100644
> --- a/drivers/s390/char/zcore.c
> +++ b/drivers/s390/char/zcore.c
> @@ -300,9 +300,7 @@ static int __init zcore_init(void)
>           unsigned char arch;
>           int rc;
>    
> -       if (!is_ipl_type_dump())
> -               return -ENODATA;
> -       if (oldmem_data.start)
> +       if (is_kdump_kernel())
>                   return -ENODATA;
>    
>           zcore_dbf = debug_register("zcore", 4, 1, 4 * sizeof(long));
> 
> 

Ugh, ignore the last one, I'm just confused about dumping options on 
s390x at this point :)
David Hildenbrand Oct. 15, 2024, 10:40 a.m. UTC | #7
On 15.10.24 12:08, Heiko Carstens wrote:
> On Tue, Oct 15, 2024 at 10:41:21AM +0200, David Hildenbrand wrote:
>> On 15.10.24 10:30, Heiko Carstens wrote:
>>> On Mon, Oct 14, 2024 at 09:26:03PM +0200, David Hildenbrand wrote:
>>>> On 14.10.24 20:20, Heiko Carstens wrote:
>>>>> Looks like this could work. But the comment in smp.c above
>>>>> dump_available() needs to be updated.
>>>>
>>>> A right, I remember that there was some outdated documentation.
> 
> ...
> 
>> My concern is that we'll now have
>>
>> bool is_kdump_kernel(void)
>> {
>>         return oldmem_data.start && !is_ipl_type_dump();
>> }
>>
>> Which matches 3), but if 2) is also called "kdump", then should it actually
>> be
>>
>> bool is_kdump_kernel(void)
>> {
>>         return oldmem_data.start;
>> }
>>
>> ?
>>
>> When I wrote that code I was rather convinced that the variant in this patch
>> is the right thing to do.
> 
> Oh well, we simply of too many dump options. I'll let Alexander and
> Mikhail better comment on this :)

Thanks!

> 
> Alexander, Mikhail, the discussion starts here, and we need a sane
> is_kdump_kernel() for s390:
> https://lore.kernel.org/all/20241014144622.876731-1-david@redhat.com/
> 

With the following cleanup in mind:

 From 9fbbff43f725a8482ce9e85857316ab8484ff3c8 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Tue, 15 Oct 2024 11:07:43 +0200
Subject: [PATCH] s390: use !dump_available() instead of "!oldmem_data.start &&
  !is_ipl_type_dump()"

Let's make the code more readable:

    !dump_available()
-> !(oldmem_data.start || is_ipl_type_dump())
-> !oldmem_data.start && !is_ipl_type_dump()

Signed-off-by: David Hildenbrand <david@redhat.com>
---
  arch/s390/kernel/crash_dump.c | 2 +-
  arch/s390/kernel/os_info.c    | 2 +-
  2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index cca1827d3d2e..fbc5de66d03b 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -609,7 +609,7 @@ int elfcorehdr_alloc(unsigned long long *addr, unsigned long long *size)
  	u64 hdr_off;
  
  	/* If we are not in kdump or zfcp/nvme dump mode return */
-	if (!oldmem_data.start && !is_ipl_type_dump())
+	if (!dump_available())
  		return 0;
  	/* If we cannot get HSA size for zfcp/nvme dump return error */
  	if (is_ipl_type_dump() && !sclp.hsa_size)
diff --git a/arch/s390/kernel/os_info.c b/arch/s390/kernel/os_info.c
index b695f980bbde..09578f400ef7 100644
--- a/arch/s390/kernel/os_info.c
+++ b/arch/s390/kernel/os_info.c
@@ -148,7 +148,7 @@ static void os_info_old_init(void)
  
  	if (os_info_init)
  		return;
-	if (!oldmem_data.start && !is_ipl_type_dump())
+	if (!dump_available())
  		goto fail;
  	if (copy_oldmem_kernel(&addr, __LC_OS_INFO, sizeof(addr)))
  		goto fail;
Alexander Egorenkov Oct. 16, 2024, 1:35 p.m. UTC | #8
Hi David,

> My concern is that we'll now have
>
> bool is_kdump_kernel(void)
> {
>         return oldmem_data.start && !is_ipl_type_dump();
> }
>
> Which matches 3), but if 2) is also called "kdump", then should it actually
> be
>
> bool is_kdump_kernel(void)
> {
>         return oldmem_data.start;
> }
>
> ?
>
> When I wrote that code I was rather convinced that the variant in this patch
> is the right thing to do.

A short explanation about what a stand-alone kdump is.

* First, it's not really a _regular_ kdump activated with kexec-tools and
  executed by Linux itself but a regular stand-alone dump (SCSI) from the
  FW's perspective (one has to use HMC or dumpconf to execute it and not
  with kexec-tools like for the _regular_ kdump).
* One has to reserve crashkernel memory region in the old crashed kernel
  even if it remains unused until the dump starts.
* zipl uses regular kdump kernel and initramfs to create stand-alone
  dumper images and to write them to a dump disk which is used for
  IPLIng the stand-alone dumper.
* The zipl bootloader takes care of transferring the old kernel memory
  saved in HSA by the FW to the crashkernel memory region reserved by the old
  crashed kernel before it enters the dumper. The HSA memory is released
  by the zipl bootloader _before_ the dumper image is entered,
  therefore, we cannot use HSA to read old kernel memory, and instead
  use memory from crashkernel region, just like the regular kdump.
* is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
  the dumper like a regular stand-alone dump (e.g. zfcpdump).
* Summarized, zipl bootloader prepares an environment which is expected by
  the regular kdump for a stand-alone kdump dumper before it is entered.

In my opinion, the correct version of is_kdump_kernel() would be

bool is_kdump_kernel(void)
{
        return oldmem_data.start;
}

because Linux kernel doesn't differentiate between both the regular
and the stand-alone kdump where it matters while performing dumper
operations (e.g. reading saved old kernel memory from crashkernel memory region).

Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel()
is to tell us whether Linux kernel runs in a kdump like environment and not
whether the current mode is identical to the proper and true kdump,
right ? And if stand-alone kdump swims like a duck, quacks like one, then it
is one, regardless how it was started, by kexecing or IPLing
from a disk.

The stand-alone kdump has a very special use case which most users will
never encounter. And usually, one just takes zfcpdump instead which is
more robust and much smaller considering how big kdump initrd can get.
stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.

Regards
Alex
David Hildenbrand Oct. 16, 2024, 3:47 p.m. UTC | #9
>>
>> When I wrote that code I was rather convinced that the variant in this patch
>> is the right thing to do.
> 
> A short explanation about what a stand-alone kdump is.
> 
> * First, it's not really a _regular_ kdump activated with kexec-tools and
>    executed by Linux itself but a regular stand-alone dump (SCSI) from the
>    FW's perspective (one has to use HMC or dumpconf to execute it and not
>    with kexec-tools like for the _regular_ kdump).

Ah, that makes sense.

> * One has to reserve crashkernel memory region in the old crashed kernel
>    even if it remains unused until the dump starts.
> * zipl uses regular kdump kernel and initramfs to create stand-alone
>    dumper images and to write them to a dump disk which is used for
>    IPLIng the stand-alone dumper.
> * The zipl bootloader takes care of transferring the old kernel memory
>    saved in HSA by the FW to the crashkernel memory region reserved by the old
>    crashed kernel before it enters the dumper. The HSA memory is released
>    by the zipl bootloader _before_ the dumper image is entered,
>    therefore, we cannot use HSA to read old kernel memory, and instead
>    use memory from crashkernel region, just like the regular kdump.
> * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
>    the dumper like a regular stand-alone dump (e.g. zfcpdump).
> * Summarized, zipl bootloader prepares an environment which is expected by
>    the regular kdump for a stand-alone kdump dumper before it is entered.

Thanks for the details!

> 
> In my opinion, the correct version of is_kdump_kernel() would be
> 
> bool is_kdump_kernel(void)
> {
>          return oldmem_data.start;
> }
> 
> because Linux kernel doesn't differentiate between both the regular
> and the stand-alone kdump where it matters while performing dumper
> operations (e.g. reading saved old kernel memory from crashkernel memory region).
> 

Right, but if we consider "/proc/vmcore is available", a better version 
would IMHO be:

bool is_kdump_kernel(void)
{
           return dump_available();
}

Because that is mostly (not completely) how is_kdump_kernel() would have 
worked right now *after* we had the elfcorehdr_alloc() during the 
fs_init call.


> Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel()
> is to tell us whether Linux kernel runs in a kdump like environment and not
> whether the current mode is identical to the proper and true kdump,
> right ? And if stand-alone kdump swims like a duck, quacks like one, then it
> is one, regardless how it was started, by kexecing or IPLing
> from a disk.

Same thinking here.

> 
> The stand-alone kdump has a very special use case which most users will
> never encounter. And usually, one just takes zfcpdump instead which is
> more robust and much smaller considering how big kdump initrd can get.
> stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.

Makes sense, so it boils down to either

bool is_kdump_kernel(void)
{
          return oldmem_data.start;
}

Which means is_kdump_kernel() can be "false" even though /proc/vmcore is 
available or

bool is_kdump_kernel(void)
{
          return dump_available();
}

Which means is_kdump_kernel() can never be "false" if /proc/vmcore is 
available. There is the chance of is_kdump_kernel() being "true" if 
"elfcorehdr_alloc()" fails with -ENODEV.


You're call :) Thanks!
David Hildenbrand Oct. 16, 2024, 3:54 p.m. UTC | #10
On 16.10.24 17:47, David Hildenbrand wrote:
>>>
>>> When I wrote that code I was rather convinced that the variant in this patch
>>> is the right thing to do.
>>
>> A short explanation about what a stand-alone kdump is.
>>
>> * First, it's not really a _regular_ kdump activated with kexec-tools and
>>     executed by Linux itself but a regular stand-alone dump (SCSI) from the
>>     FW's perspective (one has to use HMC or dumpconf to execute it and not
>>     with kexec-tools like for the _regular_ kdump).
> 
> Ah, that makes sense.
> 
>> * One has to reserve crashkernel memory region in the old crashed kernel
>>     even if it remains unused until the dump starts.
>> * zipl uses regular kdump kernel and initramfs to create stand-alone
>>     dumper images and to write them to a dump disk which is used for
>>     IPLIng the stand-alone dumper.
>> * The zipl bootloader takes care of transferring the old kernel memory
>>     saved in HSA by the FW to the crashkernel memory region reserved by the old
>>     crashed kernel before it enters the dumper. The HSA memory is released
>>     by the zipl bootloader _before_ the dumper image is entered,
>>     therefore, we cannot use HSA to read old kernel memory, and instead
>>     use memory from crashkernel region, just like the regular kdump.
>> * is_ipl_type_dump() will be true for a stand-alone kdump because we IPL
>>     the dumper like a regular stand-alone dump (e.g. zfcpdump).
>> * Summarized, zipl bootloader prepares an environment which is expected by
>>     the regular kdump for a stand-alone kdump dumper before it is entered.
> 
> Thanks for the details!
> 
>>
>> In my opinion, the correct version of is_kdump_kernel() would be
>>
>> bool is_kdump_kernel(void)
>> {
>>           return oldmem_data.start;
>> }
>>
>> because Linux kernel doesn't differentiate between both the regular
>> and the stand-alone kdump where it matters while performing dumper
>> operations (e.g. reading saved old kernel memory from crashkernel memory region).
>>
> 
> Right, but if we consider "/proc/vmcore is available", a better version
> would IMHO be:
> 
> bool is_kdump_kernel(void)
> {
>             return dump_available();
> }
> 
> Because that is mostly (not completely) how is_kdump_kernel() would have
> worked right now *after* we had the elfcorehdr_alloc() during the
> fs_init call.
> 
> 
>> Furthermore, if i'm not mistaken then the purpose of is_kdump_kernel()
>> is to tell us whether Linux kernel runs in a kdump like environment and not
>> whether the current mode is identical to the proper and true kdump,
>> right ? And if stand-alone kdump swims like a duck, quacks like one, then it
>> is one, regardless how it was started, by kexecing or IPLing
>> from a disk.
> 
> Same thinking here.
> 
>>
>> The stand-alone kdump has a very special use case which most users will
>> never encounter. And usually, one just takes zfcpdump instead which is
>> more robust and much smaller considering how big kdump initrd can get.
>> stand-alone kdump dumper images cannot exceed HSA memory limit on a Z machine.
> 
> Makes sense, so it boils down to either
> 
> bool is_kdump_kernel(void)
> {
>            return oldmem_data.start;
> }
> 
> Which means is_kdump_kernel() can be "false" even though /proc/vmcore is
> available or
> 
> bool is_kdump_kernel(void)
> {
>            return dump_available();
> }
> 
> Which means is_kdump_kernel() can never be "false" if /proc/vmcore is
> available. There is the chance of is_kdump_kernel() being "true" if
> "elfcorehdr_alloc()" fails with -ENODEV.
> 
> 
> You're call :) Thanks!
> 

What I think we should do is the following (improved comment + patch
description), but I'll do whatever you think is better:


 From e86194b5195c743eff33f563796b9c725fecc65f Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david@redhat.com>
Date: Wed, 4 Sep 2024 14:57:10 +0200
Subject: [PATCH] s390/kdump: provide custom is_kdump_kernel()

s390 currently always results in is_kdump_kernel() == false until
vmcore_init()->elfcorehdr_alloc() ran, because it sets
"elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch to deactivate
any elfcorehdr= kernel parameter.

Let's follow the powerpc example and implement our own logic. Let's use
"dump_available()", because this is mostly (with one exception when
elfcorehdr_alloc() fails with -ENODEV) when we would create /proc/vmcore
and when is_kdump_kernel() would have returned "true" after
vmcore_init().

This is required for virtio-mem to reliably identify a kdump
environment before vmcore_init() was called to not try hotplugging memory.

Update the documentation above dump_available().

Tested-by: Mario Casquero <mcasquer@redhat.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
  arch/s390/include/asm/kexec.h |  4 ++++
  arch/s390/kernel/crash_dump.c |  6 ++++++
  arch/s390/kernel/smp.c        | 16 ++++++++--------
  3 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 1bd08eb56d5f..bd20543515f5 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -94,6 +94,9 @@ void arch_kexec_protect_crashkres(void);
  
  void arch_kexec_unprotect_crashkres(void);
  #define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres
+
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
  #endif
  
  #ifdef CONFIG_KEXEC_FILE
@@ -107,4 +110,5 @@ int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
  int arch_kimage_file_post_load_cleanup(struct kimage *image);
  #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
  #endif
+
  #endif /*_S390_KEXEC_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index 51313ed7e617..43bbaf534dd2 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -237,6 +237,12 @@ int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
  						       prot);
  }
  
+bool is_kdump_kernel(void)
+{
+	return dump_available();
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
  static const char *nt_name(Elf64_Word type)
  {
  	const char *name = "LINUX";
diff --git a/arch/s390/kernel/smp.c b/arch/s390/kernel/smp.c
index 4df56fdb2488..bd41e35a27a0 100644
--- a/arch/s390/kernel/smp.c
+++ b/arch/s390/kernel/smp.c
@@ -574,7 +574,7 @@ int smp_store_status(int cpu)
  
  /*
   * Collect CPU state of the previous, crashed system.
- * There are four cases:
+ * There are three cases:
   * 1) standard zfcp/nvme dump
   *    condition: OLDMEM_BASE == NULL && is_ipl_type_dump() == true
   *    The state for all CPUs except the boot CPU needs to be collected
@@ -587,16 +587,16 @@ int smp_store_status(int cpu)
   *    with sigp stop-and-store-status. The firmware or the boot-loader
   *    stored the registers of the boot CPU in the absolute lowcore in the
   *    memory of the old system.
- * 3) kdump and the old kernel did not store the CPU state,
- *    or stand-alone kdump for DASD
- *    condition: OLDMEM_BASE != NULL && !is_kdump_kernel()
+ * 3) kdump or stand-alone kdump for DASD
+ *    condition: OLDMEM_BASE != NULL && !is_ipl_type_dump() == false
   *    The state for all CPUs except the boot CPU needs to be collected
   *    with sigp stop-and-store-status. The kexec code or the boot-loader
   *    stored the registers of the boot CPU in the memory of the old system.
- * 4) kdump and the old kernel stored the CPU state
- *    condition: OLDMEM_BASE != NULL && is_kdump_kernel()
- *    This case does not exist for s390 anymore, setup_arch explicitly
- *    deactivates the elfcorehdr= kernel parameter
+ *
+ * Note that the old kdump mode where the old kernel stored the CPU state
+ * does no longer exist: setup_arch explicitly deactivates the elfcorehdr=
+ * kernel parameter. The is_kdump_kernel() implementation on s390 is independent
+ * of the elfcorehdr= parameter, and is purely based on dump_available().
   */
  static bool dump_available(void)
  {
diff mbox series

Patch

diff --git a/arch/s390/include/asm/kexec.h b/arch/s390/include/asm/kexec.h
index 1bd08eb56d5f..bd20543515f5 100644
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -94,6 +94,9 @@  void arch_kexec_protect_crashkres(void);
 
 void arch_kexec_unprotect_crashkres(void);
 #define arch_kexec_unprotect_crashkres arch_kexec_unprotect_crashkres
+
+bool is_kdump_kernel(void);
+#define is_kdump_kernel is_kdump_kernel
 #endif
 
 #ifdef CONFIG_KEXEC_FILE
@@ -107,4 +110,5 @@  int arch_kexec_apply_relocations_add(struct purgatory_info *pi,
 int arch_kimage_file_post_load_cleanup(struct kimage *image);
 #define arch_kimage_file_post_load_cleanup arch_kimage_file_post_load_cleanup
 #endif
+
 #endif /*_S390_KEXEC_H */
diff --git a/arch/s390/kernel/crash_dump.c b/arch/s390/kernel/crash_dump.c
index edae13416196..cca1827d3d2e 100644
--- a/arch/s390/kernel/crash_dump.c
+++ b/arch/s390/kernel/crash_dump.c
@@ -237,6 +237,12 @@  int remap_oldmem_pfn_range(struct vm_area_struct *vma, unsigned long from,
 						       prot);
 }
 
+bool is_kdump_kernel(void)
+{
+	return oldmem_data.start && !is_ipl_type_dump();
+}
+EXPORT_SYMBOL_GPL(is_kdump_kernel);
+
 static const char *nt_name(Elf64_Word type)
 {
 	const char *name = "LINUX";