[RFC,v3,0/7] device backed vmemmap crash dump support

Message ID 20240306102846.1020868-1-lizhijian@fujitsu.com (mailing list archive)

Zhijian Li (Fujitsu) March 6, 2024, 10:28 a.m. UTC
Hello folks,

Compared with V2 [1], which I posted a long time ago, this version is a
completely new design.

### Background and motivation overview ###
---
Crash dump is an important feature for troubleshooting the kernel. It is the
last resort for finding out what happened at a kernel panic, slowdown, and so
on, and it is one of the most important tools for customer support.

Currently, there are two syscalls, kexec_file_load(2) and kexec_load(2), to
configure the dumpable regions. Generally, (A) iomem resources registered with
the flags (IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY) for kexec_file_load(2), or
(B) iomem resources registered with the "System RAM" name prefix for
kexec_load(2), are dumpable.
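
To make the two criteria concrete, here is a minimal user-space sketch of
both checks. It is an illustration only, not kernel code: struct
demo_resource and the two helper names are hypothetical, and the flag values
are as I recall them from include/linux/ioport.h, so treat them as
assumptions.

```c
#include <stdbool.h>
#include <string.h>

/* Flag values as recalled from include/linux/ioport.h (assumption). */
#define IORESOURCE_MEM        0x00000200UL
#define IORESOURCE_SYSRAM     0x01000000UL
#define IORESOURCE_BUSY       0x80000000UL
#define IORESOURCE_SYSTEM_RAM (IORESOURCE_MEM | IORESOURCE_SYSRAM)

/* Hypothetical, simplified stand-in for the kernel's struct resource. */
struct demo_resource {
	const char *name;
	unsigned long flags;
};

/* (A) kexec_file_load(2): every System RAM bit and the BUSY bit must be set. */
static bool dumpable_by_flags(const struct demo_resource *res)
{
	unsigned long want = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;

	return (res->flags & want) == want;
}

/* (B) kexec_load(2): user space matches the name prefix from /proc/iomem. */
static bool dumpable_by_name(const struct demo_resource *res)
{
	return strncmp(res->name, "System RAM", strlen("System RAM")) == 0;
}
```

A "Persistent Memory" resource carrying only IORESOURCE_MEM fails both
checks, which is why a vmemmap placed on the device is left out of the dump.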

The pmem use cases, including fsdax and devdax, can map their vmemmap to
their own devices. In this case, that part of the vmemmap will not be dumped
when a crash happens, since these regions satisfy neither (A) nor (B) above.

In fsdax, the vmemmap (struct page array) becomes very important: it is one of
the key pieces of data for finding the status of the reverse map. Lacking this
information makes it difficult to analyze problems around pmem (especially
Filesystem-DAX), because troubleshooters cannot check pmem details from the
dumpfile.

### Proposal ###
---
This proposal registers the device backed vmemmap as a separate resource.
The resource has its own new flag and name, and kexec_file_load(2) and
kexec_load(2) are taught to mark it as dumpable.

Proposed flag: IORESOURCE_DEVICE_BACKED_VMEMMAP
Proposed name: "Device Backed Vmemmap"

NOTE: crash-utils also needs to adapt to this new name for kexec_load()

With the current proposal, /proc/iomem should show the following for device
backed vmemmap:
# cat /proc/iomem
...
fffc0000-ffffffff : Reserved
100000000-13fffffff : Persistent Memory
  100000000-10fffffff : namespace0.0
    100000000-1005fffff : Device Backed Vmemmap  # fsdax
a80000000-b7fffffff : CXL Window 0
  a80000000-affffffff : Persistent Memory
    a80000000-affffffff : region1
      a80000000-a811fffff : namespace1.0
        a80000000-a811fffff : Device Backed Vmemmap # devdax
      a81200000-abfffffff : dax1.0
b80000000-c7fffffff : CXL Window 1
c80000000-147fffffff : PCI Bus 0000:00
  c80000000-c801fffff : PCI Bus 0000:01
...
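
For user-space tools that key off the new name, a rough sketch of picking the
range out of one /proc/iomem line could look like the following. The helper
name parse_vmemmap_line() is hypothetical and is not taken from kexec-tools;
it only assumes the "<start>-<end> : <name>" layout shown above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one /proc/iomem line of the form
 * "  <start>-<end> : <name>" and report whether it names a device backed
 * vmemmap range. Leading indentation is skipped by the first %llx.
 */
static bool parse_vmemmap_line(const char *line,
			       unsigned long long *start,
			       unsigned long long *end)
{
	static const char name[] = "Device Backed Vmemmap";
	int consumed = 0;

	/* consumed stays 0 if the " : " separator was not matched. */
	if (sscanf(line, "%llx-%llx : %n", start, end, &consumed) != 2 ||
	    consumed == 0)
		return false;

	/* Compare the prefix only, so a trailing newline is tolerated. */
	return strncmp(line + consumed, name, strlen(name)) == 0;
}
```

The nesting depth (fsdax under namespaceX.Y, devdax under regionN and
namespaceX.Y) does not matter to this sketch, since only the name is checked.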

### Kdump service reloading ###
---
Once the kdump service is loaded, if changes to CPUs or memory occur,
either by hot un/plug or off/onlining, the crash elfcorehdr should also
be updated. There are 2 approaches to make the reloading more efficient.
1) Use udev rules to watch CPU and memory events, then reload kdump
2) Enable kernel crash hotplug to automatically reload elfcorehdr (>= 6.5)

This reloading is also needed when the device backed vmemmap layout changes.
Similar to what 1) does now, one could add the following as the first lines of
the RHEL udev rule file /usr/lib/udev/rules.d/98-kexec.rules:

# namespace updated: watch daxX.Y(devdax) and pfnX.Y(fsdax) of nd
SUBSYSTEM=="nd", KERNEL=="[dp][af][xn][0-9].*", ACTION=="bind", GOTO="kdump_reload"
SUBSYSTEM=="nd", KERNEL=="[dp][af][xn][0-9].*", ACTION=="unbind", GOTO="kdump_reload"
# devdax <-> system-ram updated: watch daxX.Y of dax
SUBSYSTEM=="dax", KERNEL=="dax[0-9].*", ACTION=="bind", GOTO="kdump_reload"
SUBSYSTEM=="dax", KERNEL=="dax[0-9].*", ACTION=="unbind", GOTO="kdump_reload"
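
To check which device names these KERNEL== globs would catch, the patterns
can be tried with fnmatch(3). This is only a sketch under the assumption that
udev's glob matching behaves like fnmatch() with no flags for these patterns;
kernel_name_matches() is a hypothetical helper name.

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Assumption: udev's KERNEL== glob behaves like fnmatch(3) with flags == 0
 * for the simple patterns used in the rules above.
 */
static bool kernel_name_matches(const char *pattern, const char *name)
{
	return fnmatch(pattern, name, 0) == 0;
}
```

Under that assumption, "[dp][af][xn][0-9].*" matches dax0.0 and pfn1.0, but
not dax10.0, because [0-9] consumes exactly one digit before the dot. Systems
with two-digit region ids may need a wider pattern.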

Regarding 2), my idea is to call memory_notify() in
devm_memremap_pages_release() and devm_memremap_pages() to trigger the crash
hotplug. This part is not yet mature, but it does not block the feature as a
whole, because method 1) can still be used as an alternative.

[1] https://lore.kernel.org/lkml/02066f0f-dbc0-0388-4233-8e24b6f8435b@fujitsu.com/T/
--------------------------------------------
changes from V2[1]
- new proposal design

CC: Alison Schofield <alison.schofield@intel.com>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Baoquan He <bhe@redhat.com>
CC: Borislav Petkov <bp@alien8.de>
CC: Dan Williams <dan.j.williams@intel.com>
CC: Dave Hansen <dave.hansen@linux.intel.com>
CC: Dave Jiang <dave.jiang@intel.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: "H. Peter Anvin" <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: Ira Weiny <ira.weiny@intel.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Vishal Verma <vishal.l.verma@intel.com>
CC: linux-cxl@vger.kernel.org
CC: linux-mm@kvack.org
CC: nvdimm@lists.linux.dev
CC: x86@kernel.org

Li Zhijian (7):
  mm: memremap: register/unregister altmap region to a separate resource
  mm: memremap: add pgmap_parent_resource() helper
  nvdimm: pmem: assign a parent resource for vmemmap region for the
    fsdax
  dax: pmem: assign a parent resource for vmemmap region for the devdax
  resource: Introduce walk device_backed_vmemmap res() helper
  x86/crash: make device backed vmemmap dumpable for kexec_file_load
  nvdimm: set force_raw=1 in kdump kernel

 arch/x86/kernel/crash.c         |  5 +++++
 drivers/dax/pmem.c              |  8 ++++++--
 drivers/nvdimm/namespace_devs.c |  3 +++
 drivers/nvdimm/pmem.c           |  9 ++++++---
 include/linux/ioport.h          |  4 ++++
 include/linux/memremap.h        |  4 ++++
 kernel/resource.c               | 13 +++++++++++++
 mm/memremap.c                   | 30 +++++++++++++++++++++++++++++-
 8 files changed, 70 insertions(+), 6 deletions(-)

Comments

Zhijian Li (Fujitsu) March 21, 2024, 5:40 a.m. UTC | #1
ping


Any comment is welcome.


Baoquan He March 21, 2024, 6:17 a.m. UTC | #2
On 03/21/24 at 05:40am, Zhijian Li (Fujitsu) wrote:
> ping
> 
> 
> Any comment is welcome.

I will have a look at this from kdump side. How do you test your code?

By the way, there's an issue reported by the test robot.

Thanks
Baoquan

Zhijian Li (Fujitsu) March 21, 2024, 6:57 a.m. UTC | #3
Baoquan,



On 21/03/2024 14:17, Baoquan He wrote:
> On 03/21/24 at 05:40am, Zhijian Li (Fujitsu) wrote:
>> ping
>>
>>
>> Any comment is welcome.
> 
> I will have a look at this from kdump side. How do you test your code?

Thanks for your support.

- no change is required for makedumpfile and crash-utils
- no change is required for kexec-tools if you are using kexec_file_load(2)
- kexec-tools needs the patch below applied if you are using kexec_load(2)


After the coredump is collected, crash-utils is able to inspect the device backed vmemmap memory.


 From 4a2f0f91cdd8b084bb4ebafdc4f71b8e2a333720 Mon Sep 17 00:00:00 2001
From: Li Zhijian <lizhijian@fujitsu.com>
Date: Fri, 1 Mar 2024 13:44:36 +0800
Subject: [PATCH] consider device_backed_vmemmap dumpable
                                                          
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
  kexec/arch/i386/crashdump-x86.c    | 2 ++
  kexec/arch/i386/kexec-x86-common.c | 3 +++
  kexec/crashdump-elf.c              | 2 +-
  kexec/kexec.h                      | 1 +
  4 files changed, 7 insertions(+), 1 deletion(-)
                                                          
diff --git a/kexec/arch/i386/crashdump-x86.c b/kexec/arch/i386/crashdump-x86.c
index a01031e570aa..757922dc3a57 100644
--- a/kexec/arch/i386/crashdump-x86.c
+++ b/kexec/arch/i386/crashdump-x86.c
@@ -284,6 +284,8 @@ static int get_crash_memory_ranges(struct memory_range **range, int *ranges,
                         type = RANGE_PRAM;
                 } else if(memcmp(str,"Persistent Memory\n",18) == 0 ) {
                         type = RANGE_PMEM;
+               } else if (memcmp(str, "Device Backed Vmemmap\n", 22) == 0) {
+                       type = RANGE_DEVICE_BACKED_VMEMMAP;
                 } else if(memcmp(str,"reserved\n",9) == 0 ) {
                         type = RANGE_RESERVED;
                 } else if (memcmp(str, "Reserved\n", 9) == 0) {
diff --git a/kexec/arch/i386/kexec-x86-common.c b/kexec/arch/i386/kexec-x86-common.c
index ffc95a9e43f8..0dd76903e7fc 100644
--- a/kexec/arch/i386/kexec-x86-common.c
+++ b/kexec/arch/i386/kexec-x86-common.c
@@ -111,6 +111,9 @@ static int get_memory_ranges_proc_iomem(struct memory_range **range, int *ranges
                 else if (memcmp(str, "Persistent Memory\n", 18) == 0) {
                         type = RANGE_PMEM;
                 }
+               else if (memcmp(str, "Device Backed Vmemmap\n", 22) == 0) {
+                       type = RANGE_DEVICE_BACKED_VMEMMAP;
+               }
                 else {
                         continue;
                 }
diff --git a/kexec/crashdump-elf.c b/kexec/crashdump-elf.c
index b8bb686a17ca..98ad854e3f8d 100644
--- a/kexec/crashdump-elf.c
+++ b/kexec/crashdump-elf.c
@@ -199,7 +199,7 @@ int FUNC(struct kexec_info *info,
          * A seprate program header for Backup Region*/
         for (i = 0; i < ranges; i++, range++) {
                 unsigned long long mstart, mend;
-               if (range->type != RANGE_RAM)
+               if (range->type != RANGE_RAM && range->type != RANGE_DEVICE_BACKED_VMEMMAP)
                         continue;
                 mstart = range->start;
                 mend = range->end;
diff --git a/kexec/kexec.h b/kexec/kexec.h
index 1004aff15945..c0481bb2727d 100644
--- a/kexec/kexec.h
+++ b/kexec/kexec.h
@@ -139,6 +139,7 @@ struct memory_range {
  #define RANGE_UNCACHED 4
  #define RANGE_PMEM             6
  #define RANGE_PRAM             11
+#define RANGE_DEVICE_BACKED_VMEMMAP 12
  };
   
  struct memory_ranges {
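
The intent of the crashdump-elf.c hunk is that a range gets a program header
when it is either RAM or device backed vmemmap, and is skipped only when it is
neither. A small sketch of that predicate, with range_is_dumpable() as a
hypothetical helper name and RANGE_RAM's value assumed from kexec-tools'
kexec.h:

```c
#include <stdbool.h>

#define RANGE_RAM                    0  /* assumed value from kexec/kexec.h */
#define RANGE_PMEM                   6  /* as shown in the kexec.h hunk */
#define RANGE_DEVICE_BACKED_VMEMMAP 12  /* value proposed in the patch */

/* A range is dumped when it is RAM or device backed vmemmap. */
static bool range_is_dumpable(unsigned int type)
{
	return type == RANGE_RAM || type == RANGE_DEVICE_BACKED_VMEMMAP;
}
```

The loop would then use "if (!range_is_dumpable(range->type)) continue;", so
RANGE_PMEM and other types are still skipped.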
Zhijian Li (Fujitsu) April 19, 2024, 2:05 a.m. UTC | #4
ping again,

I would appreciate feedback from the nvdimm and linux-mm communities.

Thanks
Zhijian

