diff mbox

[v9,04/11] arm64: kexec_file: allocate memory walking through memblock list

Message ID 20180425062629.29404-5-takahiro.akashi@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

AKASHI Takahiro April 25, 2018, 6:26 a.m. UTC
We need to prevent firmware-reserved memory regions, particularly EFI
memory map as well as ACPI tables, from being corrupted by loading
kernel/initrd (or other kexec buffers). We also want to support memory
allocation in top-down manner in addition to default bottom-up.
So let's have arm64 specific arch_kexec_walk_mem() which will search
for available memory ranges in usable memblock list,
i.e. !NOMAP & !reserved, instead of system resource tree.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/kernel/Makefile             |  3 +-
 arch/arm64/kernel/machine_kexec_file.c | 57 ++++++++++++++++++++++++++
 2 files changed, 59 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm64/kernel/machine_kexec_file.c

Comments

James Morse May 1, 2018, 5:46 p.m. UTC | #1
Hi Akashi,

On 25/04/18 07:26, AKASHI Takahiro wrote:
> We need to prevent firmware-reserved memory regions, particularly EFI
> memory map as well as ACPI tables, from being corrupted by loading
> kernel/initrd (or other kexec buffers). We also want to support memory
> allocation in top-down manner in addition to default bottom-up.
> So let's have arm64 specific arch_kexec_walk_mem() which will search
> for available memory ranges in usable memblock list,
> i.e. !NOMAP & !reserved, 

> instead of system resource tree.

Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
be safe in the EFI-memory-map/ACPI-tables case?

It would be good to avoid having two ways of doing this, and I would like to
avoid having extra arch code...


> diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> new file mode 100644
> index 000000000000..f9ebf54ca247
> --- /dev/null
> +++ b/arch/arm64/kernel/machine_kexec_file.c
> @@ -0,0 +1,57 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * kexec_file for arm64
> + *
> + * Copyright (C) 2018 Linaro Limited
> + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> + *

> + * Most code is derived from arm64 port of kexec-tools

How does kexec-tools walk memblock?


> + */
> +
> +#define pr_fmt(fmt) "kexec_file: " fmt
> +
> +#include <linux/ioport.h>
> +#include <linux/kernel.h>
> +#include <linux/kexec.h>
> +#include <linux/memblock.h>
> +
> +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> +				int (*func)(struct resource *, void *))
> +{
> +	phys_addr_t start, end;
> +	struct resource res;
> +	u64 i;
> +	int ret = 0;
> +
> +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> +		return func(&crashk_res, kbuf);
> +
> +	if (kbuf->top_down)
> +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> +				NUMA_NO_NODE, MEMBLOCK_NONE,
> +				&start, &end, NULL) {

for_each_free_mem_range_reverse() is a more readable version of this helper.

> +			if (!memblock_is_map_memory(start))
> +				continue;

Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.


> +			res.start = start;
> +			res.end = end;
> +			ret = func(&res, kbuf);
> +			if (ret)
> +				break;
> +		}
> +	else
> +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> +				NUMA_NO_NODE, MEMBLOCK_NONE,
> +				&start, &end, NULL) {

for_each_free_mem_range()?

> +			if (!memblock_is_map_memory(start))
> +				continue;
> +
> +			res.start = start;
> +			res.end = end;
> +			ret = func(&res, kbuf);
> +			if (ret)
> +				break;
> +		}
> +
> +	return ret;
> +}
> 

With these changes, what we have is almost:
arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
(the difference being powerpc doesn't yet support crash-kernels here)

If the argument is walking memblock gives a better answer than the stringy
walk_system_ram_res() thing, is there any mileage in moving this code into
kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?

This would save arm64/powerpc having near-identical implementations.
32bit arm keeps memblock if it has kexec, so it may be useful there too if
kexec_file_load() support is added.


Thanks,

James
AKASHI Takahiro May 7, 2018, 5:59 a.m. UTC | #2
James,

On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> Hi Akashi,
> 
> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > We need to prevent firmware-reserved memory regions, particularly EFI
> > memory map as well as ACPI tables, from being corrupted by loading
> > kernel/initrd (or other kexec buffers). We also want to support memory
> > allocation in top-down manner in addition to default bottom-up.
> > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > for available memory ranges in usable memblock list,
> > i.e. !NOMAP & !reserved, 
> 
> > instead of system resource tree.
> 
> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> be safe in the EFI-memory-map/ACPI-tables case?
> 
> It would be good to avoid having two ways of doing this, and I would like to
> avoid having extra arch code...

I know what you mean.
/proc/iomem or system resource is, in my opinion, not the best place to
describe memory usage of kernel but rather to describe *physical* hardware
layout. As we are still discussing about "reserved" memory, I don't want
to depend on it.
Along with memblock list, we will have more accurate control over memory
usage.

> 
> > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > new file mode 100644
> > index 000000000000..f9ebf54ca247
> > --- /dev/null
> > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > @@ -0,0 +1,57 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * kexec_file for arm64
> > + *
> > + * Copyright (C) 2018 Linaro Limited
> > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > + *
> 
> > + * Most code is derived from arm64 port of kexec-tools
> 
> How does kexec-tools walk memblock?

Will remove this comment from this patch.
Obviously, this comment is for the rest of the code which will be
added to succeeding patches (patch #5 and #7).


> 
> > + */
> > +
> > +#define pr_fmt(fmt) "kexec_file: " fmt
> > +
> > +#include <linux/ioport.h>
> > +#include <linux/kernel.h>
> > +#include <linux/kexec.h>
> > +#include <linux/memblock.h>
> > +
> > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > +				int (*func)(struct resource *, void *))
> > +{
> > +	phys_addr_t start, end;
> > +	struct resource res;
> > +	u64 i;
> > +	int ret = 0;
> > +
> > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > +		return func(&crashk_res, kbuf);
> > +
> > +	if (kbuf->top_down)
> > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > +				&start, &end, NULL) {
> 
> for_each_free_mem_range_reverse() is a more readable version of this helper.

OK. I used to use my own limited list of reserved memory instead of
memblock.reserved here to exclude verbose ranges.


> > +			if (!memblock_is_map_memory(start))
> > +				continue;
> 
> Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.

Sure, I confirmed it.

> 
> > +			res.start = start;
> > +			res.end = end;
> > +			ret = func(&res, kbuf);
> > +			if (ret)
> > +				break;
> > +		}
> > +	else
> > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > +				&start, &end, NULL) {
> 
> for_each_free_mem_range()?

OK.

> > +			if (!memblock_is_map_memory(start))
> > +				continue;
> > +
> > +			res.start = start;
> > +			res.end = end;
> > +			ret = func(&res, kbuf);
> > +			if (ret)
> > +				break;
> > +		}
> > +
> > +	return ret;
> > +}
> > 
> 
> With these changes, what we have is almost:
> arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> (the difference being powerpc doesn't yet support crash-kernels here)
> 
> If the argument is walking memblock gives a better answer than the stringy
> walk_system_ram_res() thing, is there any mileage in moving this code into
> kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> 
> This would save arm64/powerpc having near-identical implementations.
> 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> kexec_file_load() support is added.

Thanks. I've forgot ppc.

-Takahiro AKASHI


> 
> Thanks,
> 
> James
AKASHI Takahiro May 15, 2018, 4:35 a.m. UTC | #3
James,

On Mon, May 07, 2018 at 02:59:07PM +0900, AKASHI Takahiro wrote:
> James,
> 
> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > Hi Akashi,
> > 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > We need to prevent firmware-reserved memory regions, particularly EFI
> > > memory map as well as ACPI tables, from being corrupted by loading
> > > kernel/initrd (or other kexec buffers). We also want to support memory
> > > allocation in top-down manner in addition to default bottom-up.
> > > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > > for available memory ranges in usable memblock list,
> > > i.e. !NOMAP & !reserved, 
> > 
> > > instead of system resource tree.
> > 
> > Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > be safe in the EFI-memory-map/ACPI-tables case?
> > 
> > It would be good to avoid having two ways of doing this, and I would like to
> > avoid having extra arch code...
> 
> I know what you mean.
> /proc/iomem or system resource is, in my opinion, not the best place to
> describe memory usage of kernel but rather to describe *physical* hardware
> layout. As we are still discussing about "reserved" memory, I don't want
> to depend on it.
> Along with memblock list, we will have more accurate control over memory
> usage.

If you don't have further objection, I will take memblock approach
(with factoring out powerpc's arch_kexec_walk_mem()).

Thanks,
-Takahiro AKASHI


> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > new file mode 100644
> > > index 000000000000..f9ebf54ca247
> > > --- /dev/null
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -0,0 +1,57 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * kexec_file for arm64
> > > + *
> > > + * Copyright (C) 2018 Linaro Limited
> > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > + *
> > 
> > > + * Most code is derived from arm64 port of kexec-tools
> > 
> > How does kexec-tools walk memblock?
> 
> Will remove this comment from this patch.
> Obviously, this comment is for the rest of the code which will be
> added to succeeding patches (patch #5 and #7).
> 
> 
> > 
> > > + */
> > > +
> > > +#define pr_fmt(fmt) "kexec_file: " fmt
> > > +
> > > +#include <linux/ioport.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/kexec.h>
> > > +#include <linux/memblock.h>
> > > +
> > > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > > +				int (*func)(struct resource *, void *))
> > > +{
> > > +	phys_addr_t start, end;
> > > +	struct resource res;
> > > +	u64 i;
> > > +	int ret = 0;
> > > +
> > > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > > +		return func(&crashk_res, kbuf);
> > > +
> > > +	if (kbuf->top_down)
> > > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > +				&start, &end, NULL) {
> > 
> > for_each_free_mem_range_reverse() is a more readable version of this helper.
> 
> OK. I used to use my own limited list of reserved memory instead of
> memblock.reserved here to exclude verbose ranges.
> 
> 
> > > +			if (!memblock_is_map_memory(start))
> > > +				continue;
> > 
> > Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.
> 
> Sure, I confirmed it.
> 
> > 
> > > +			res.start = start;
> > > +			res.end = end;
> > > +			ret = func(&res, kbuf);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +	else
> > > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > +				&start, &end, NULL) {
> > 
> > for_each_free_mem_range()?
> 
> OK.
> 
> > > +			if (!memblock_is_map_memory(start))
> > > +				continue;
> > > +
> > > +			res.start = start;
> > > +			res.end = end;
> > > +			ret = func(&res, kbuf);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +
> > > +	return ret;
> > > +}
> > > 
> > 
> > With these changes, what we have is almost:
> > arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> > (the difference being powerpc doesn't yet support crash-kernels here)
> > 
> > If the argument is walking memblock gives a better answer than the stringy
> > walk_system_ram_res() thing, is there any mileage in moving this code into
> > kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> > 
> > This would save arm64/powerpc having near-identical implementations.
> > 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> > kexec_file_load() support is added.
> 
> Thanks. I've forgot ppc.
> 
> -Takahiro AKASHI
> 
> 
> > 
> > Thanks,
> > 
> > James
James Morse May 15, 2018, 4:17 p.m. UTC | #4
Hi Akashi,

On 15/05/18 05:35, AKASHI Takahiro wrote:
> On Mon, May 07, 2018 at 02:59:07PM +0900, AKASHI Takahiro wrote:
>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>>> We need to prevent firmware-reserved memory regions, particularly EFI
>>>> memory map as well as ACPI tables, from being corrupted by loading
>>>> kernel/initrd (or other kexec buffers). We also want to support memory
>>>> allocation in top-down manner in addition to default bottom-up.
>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
>>>> for available memory ranges in usable memblock list,
>>>> i.e. !NOMAP & !reserved, 
>>>
>>>> instead of system resource tree.
>>>
>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
>>> be safe in the EFI-memory-map/ACPI-tables case?
>>>
>>> It would be good to avoid having two ways of doing this, and I would like to
>>> avoid having extra arch code...
>>
>> I know what you mean.
>> /proc/iomem or system resource is, in my opinion, not the best place to
>> describe memory usage of kernel but rather to describe *physical* hardware
>> layout. As we are still discussing about "reserved" memory, I don't want
>> to depend on it.

I agree. We have funny stuff that isn't hardware-layout, but is important for
the next boot. The kernel doesn't have an ABI to support when it queries the
list itself.


>> Along with memblock list, we will have more accurate control over memory
>> usage.

>>> If the argument is walking memblock gives a better answer than the stringy
>>> walk_system_ram_res() thing, is there any mileage in moving this code into
>>> kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
>>>
>>> This would save arm64/powerpc having near-identical implementations.
>>> 32bit arm keeps memblock if it has kexec, so it may be useful there too if
>>> kexec_file_load() support is added.

> If you don't have further objection, I will take memblock approach
> (with factoring out powerpc's arch_kexec_walk_mem()).

If we're agreed that the memblock walking is generic, then it would be quicker
to make the arm64 version as close as possible and merge them as a later series.
(saves a cross arch dependency)

With that,
Reviewed-by: James Morse <james.morse@arm.com>


Thanks,

James
Baoquan He May 17, 2018, 2:10 a.m. UTC | #5
On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> James,
> 
> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > Hi Akashi,
> > 
> > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > We need to prevent firmware-reserved memory regions, particularly EFI
> > > memory map as well as ACPI tables, from being corrupted by loading
> > > kernel/initrd (or other kexec buffers). We also want to support memory
> > > allocation in top-down manner in addition to default bottom-up.
> > > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > > for available memory ranges in usable memblock list,
> > > i.e. !NOMAP & !reserved, 
> > 
> > > instead of system resource tree.
> > 
> > Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > be safe in the EFI-memory-map/ACPI-tables case?
> > 
> > It would be good to avoid having two ways of doing this, and I would like to
> > avoid having extra arch code...
> 
> I know what you mean.
> /proc/iomem or system resource is, in my opinion, not the best place to
> describe memory usage of kernel but rather to describe *physical* hardware
> layout. As we are still discussing about "reserved" memory, I don't want
> to depend on it.
> Along with memblock list, we will have more accurate control over memory
> usage.

In kexec-tools, we see any usable memory as candidate which can be used
to load kexec kernel image/initrd etc. However kexec loading is a
preparation work, it just books those position for later kexec kernel
jumping after "kexec -e", that is why we need kexec_buf to remember
them and do the real content copy of kernel/initrd. Here you use
memblock to search available memory, isn't it deviating too far away
from the original design in kexec-tools. Assume kexec loading and
kexec_file loading should be consistent on loading even though they are
done in different space, kernel space and user space.

I didn't follow the earlier post, may miss something.

Thanks
Baoquan

> 
> > 
> > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > new file mode 100644
> > > index 000000000000..f9ebf54ca247
> > > --- /dev/null
> > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > @@ -0,0 +1,57 @@
> > > +// SPDX-License-Identifier: GPL-2.0
> > > +/*
> > > + * kexec_file for arm64
> > > + *
> > > + * Copyright (C) 2018 Linaro Limited
> > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > + *
> > 
> > > + * Most code is derived from arm64 port of kexec-tools
> > 
> > How does kexec-tools walk memblock?
> 
> Will remove this comment from this patch.
> Obviously, this comment is for the rest of the code which will be
> added to succeeding patches (patch #5 and #7).
> 
> 
> > 
> > > + */
> > > +
> > > +#define pr_fmt(fmt) "kexec_file: " fmt
> > > +
> > > +#include <linux/ioport.h>
> > > +#include <linux/kernel.h>
> > > +#include <linux/kexec.h>
> > > +#include <linux/memblock.h>
> > > +
> > > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > > +				int (*func)(struct resource *, void *))
> > > +{
> > > +	phys_addr_t start, end;
> > > +	struct resource res;
> > > +	u64 i;
> > > +	int ret = 0;
> > > +
> > > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > > +		return func(&crashk_res, kbuf);
> > > +
> > > +	if (kbuf->top_down)
> > > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > +				&start, &end, NULL) {
> > 
> > for_each_free_mem_range_reverse() is a more readable version of this helper.
> 
> OK. I used to use my own limited list of reserved memory instead of
> memblock.reserved here to exclude verbose ranges.
> 
> 
> > > +			if (!memblock_is_map_memory(start))
> > > +				continue;
> > 
> > Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.
> 
> Sure, I confirmed it.
> 
> > 
> > > +			res.start = start;
> > > +			res.end = end;
> > > +			ret = func(&res, kbuf);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +	else
> > > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > +				&start, &end, NULL) {
> > 
> > for_each_free_mem_range()?
> 
> OK.
> 
> > > +			if (!memblock_is_map_memory(start))
> > > +				continue;
> > > +
> > > +			res.start = start;
> > > +			res.end = end;
> > > +			ret = func(&res, kbuf);
> > > +			if (ret)
> > > +				break;
> > > +		}
> > > +
> > > +	return ret;
> > > +}
> > > 
> > 
> > With these changes, what we have is almost:
> > arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> > (the difference being powerpc doesn't yet support crash-kernels here)
> > 
> > If the argument is walking memblock gives a better answer than the stringy
> > walk_system_ram_res() thing, is there any mileage in moving this code into
> > kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> > 
> > This would save arm64/powerpc having near-identical implementations.
> > 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> > kexec_file_load() support is added.
> 
> Thanks. I've forgot ppc.
> 
> -Takahiro AKASHI
> 
> 
> > 
> > Thanks,
> > 
> > James
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
Baoquan He May 17, 2018, 2:15 a.m. UTC | #6
On 05/17/18 at 10:10am, Baoquan He wrote:
> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> > James,
> > 
> > On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > > Hi Akashi,
> > > 
> > > On 25/04/18 07:26, AKASHI Takahiro wrote:
> > > > We need to prevent firmware-reserved memory regions, particularly EFI
> > > > memory map as well as ACPI tables, from being corrupted by loading
> > > > kernel/initrd (or other kexec buffers). We also want to support memory
> > > > allocation in top-down manner in addition to default bottom-up.
> > > > So let's have arm64 specific arch_kexec_walk_mem() which will search
> > > > for available memory ranges in usable memblock list,
> > > > i.e. !NOMAP & !reserved, 
> > > 
> > > > instead of system resource tree.
> > > 
> > > Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > > be safe in the EFI-memory-map/ACPI-tables case?
> > > 
> > > It would be good to avoid having two ways of doing this, and I would like to
> > > avoid having extra arch code...
> > 
> > I know what you mean.
> > /proc/iomem or system resource is, in my opinion, not the best place to
> > describe memory usage of kernel but rather to describe *physical* hardware
> > layout. As we are still discussing about "reserved" memory, I don't want
> > to depend on it.
> > Along with memblock list, we will have more accurate control over memory
> > usage.
> 
> In kexec-tools, we see any usable memory as candidate which can be used

Here I said 'any', it's not accurate. Those memory which need be passed
to 2nd kernel for use need be excluded, just as we have done in
kexec-tools.

> to load kexec kernel image/initrd etc. However kexec loading is a
> preparation work, it just books those position for later kexec kernel
> jumping after "kexec -e", that is why we need kexec_buf to remember
> them and do the real content copy of kernel/initrd. Here you use
> memblock to search available memory, isn't it deviating too far away
> from the original design in kexec-tools. Assume kexec loading and
> kexec_file loading should be consistent on loading even though they are
> done in different space, kernel space and user space.
> 
> I didn't follow the earlier post, may miss something.
> 
> Thanks
> Baoquan
> 
> > 
> > > 
> > > > diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
> > > > new file mode 100644
> > > > index 000000000000..f9ebf54ca247
> > > > --- /dev/null
> > > > +++ b/arch/arm64/kernel/machine_kexec_file.c
> > > > @@ -0,0 +1,57 @@
> > > > +// SPDX-License-Identifier: GPL-2.0
> > > > +/*
> > > > + * kexec_file for arm64
> > > > + *
> > > > + * Copyright (C) 2018 Linaro Limited
> > > > + * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
> > > > + *
> > > 
> > > > + * Most code is derived from arm64 port of kexec-tools
> > > 
> > > How does kexec-tools walk memblock?
> > 
> > Will remove this comment from this patch.
> > Obviously, this comment is for the rest of the code which will be
> > added to succeeding patches (patch #5 and #7).
> > 
> > 
> > > 
> > > > + */
> > > > +
> > > > +#define pr_fmt(fmt) "kexec_file: " fmt
> > > > +
> > > > +#include <linux/ioport.h>
> > > > +#include <linux/kernel.h>
> > > > +#include <linux/kexec.h>
> > > > +#include <linux/memblock.h>
> > > > +
> > > > +int arch_kexec_walk_mem(struct kexec_buf *kbuf,
> > > > +				int (*func)(struct resource *, void *))
> > > > +{
> > > > +	phys_addr_t start, end;
> > > > +	struct resource res;
> > > > +	u64 i;
> > > > +	int ret = 0;
> > > > +
> > > > +	if (kbuf->image->type == KEXEC_TYPE_CRASH)
> > > > +		return func(&crashk_res, kbuf);
> > > > +
> > > > +	if (kbuf->top_down)
> > > > +		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
> > > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > > +				&start, &end, NULL) {
> > > 
> > > for_each_free_mem_range_reverse() is a more readable version of this helper.
> > 
> > OK. I used to use my own limited list of reserved memory instead of
> > memblock.reserved here to exclude verbose ranges.
> > 
> > 
> > > > +			if (!memblock_is_map_memory(start))
> > > > +				continue;
> > > 
> > > Passing MEMBLOCK_NONE means this walk will never find MEMBLOCK_NOMAP memory.
> > 
> > Sure, I confirmed it.
> > 
> > > 
> > > > +			res.start = start;
> > > > +			res.end = end;
> > > > +			ret = func(&res, kbuf);
> > > > +			if (ret)
> > > > +				break;
> > > > +		}
> > > > +	else
> > > > +		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
> > > > +				NUMA_NO_NODE, MEMBLOCK_NONE,
> > > > +				&start, &end, NULL) {
> > > 
> > > for_each_free_mem_range()?
> > 
> > OK.
> > 
> > > > +			if (!memblock_is_map_memory(start))
> > > > +				continue;
> > > > +
> > > > +			res.start = start;
> > > > +			res.end = end;
> > > > +			ret = func(&res, kbuf);
> > > > +			if (ret)
> > > > +				break;
> > > > +		}
> > > > +
> > > > +	return ret;
> > > > +}
> > > > 
> > > 
> > > With these changes, what we have is almost:
> > > arch/powerpc/kernel/machine_kexec_file_64.c::arch_kexec_walk_mem() !
> > > (the difference being powerpc doesn't yet support crash-kernels here)
> > > 
> > > If the argument is walking memblock gives a better answer than the stringy
> > > walk_system_ram_res() thing, is there any mileage in moving this code into
> > > kexec_file.c, and using it if !IS_ENABLED(CONFIG_ARCH_DISCARD_MEMBLOCK)?
> > > 
> > > This would save arm64/powerpc having near-identical implementations.
> > > 32bit arm keeps memblock if it has kexec, so it may be useful there too if
> > > kexec_file_load() support is added.
> > 
> > Thanks. I've forgot ppc.
> > 
> > -Takahiro AKASHI
> > 
> > 
> > > 
> > > Thanks,
> > > 
> > > James
> > 
> > _______________________________________________
> > kexec mailing list
> > kexec@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/kexec
> 
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
James Morse May 17, 2018, 6:04 p.m. UTC | #7
Hi Baoquan,

On 17/05/18 03:15, Baoquan He wrote:
> On 05/17/18 at 10:10am, Baoquan He wrote:
>> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
>>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
>>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
>>>>> We need to prevent firmware-reserved memory regions, particularly EFI
>>>>> memory map as well as ACPI tables, from being corrupted by loading
>>>>> kernel/initrd (or other kexec buffers). We also want to support memory
>>>>> allocation in top-down manner in addition to default bottom-up.
>>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
>>>>> for available memory ranges in usable memblock list,
>>>>> i.e. !NOMAP & !reserved, 
>>>>
>>>>> instead of system resource tree.
>>>>
>>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
>>>> be safe in the EFI-memory-map/ACPI-tables case?
>>>>
>>>> It would be good to avoid having two ways of doing this, and I would like to
>>>> avoid having extra arch code...
>>>
>>> I know what you mean.
>>> /proc/iomem or system resource is, in my opinion, not the best place to
>>> describe memory usage of kernel but rather to describe *physical* hardware
>>> layout. As we are still discussing about "reserved" memory, I don't want
>>> to depend on it.
>>> Along with memblock list, we will have more accurate control over memory
>>> usage.
>>
>> In kexec-tools, we see any usable memory as candidate which can be used
> 
> Here I said 'any', it's not accurate. Those memory which need be passed
> to 2nd kernel for use need be excluded, just as we have done in
> kexec-tools.
> 
>> to load kexec kernel image/initrd etc. However kexec loading is a
>> preparation work, it just books those position for later kexec kernel
>> jumping after "kexec -e", that is why we need kexec_buf to remember
>> them and do the real content copy of kernel/initrd.

The problem we have on arm64 is /proc/iomem is being used for two things.
1) Kexec's this is memory I can book for the new kernel.
2) Kdump's this is memory I must describe for vmcore.

We get the memory map from UEFI via the EFI stub, and leave it in
memblock_reserved() memory. A new kexec kernel needs this to boot: it mustn't
overwrite it. The same goes for the ACPI tables, they could be reclaimed and
used as memory, but the new kexec kernel needs them to boot, they are
memblock_reserved() too.

If we knock all memblock_reserved() regions out of /proc/iomem then kdump
doesn't work, because /proc/iomem is only generated once. Its a snapshot. The
initcode/data is an example of memory we release from memblock_reserve() after
this, then gets used for data we need in the vmcore.

Ideally we would describe all this in /proc/iomem with:
| 8001e80000-83ff186fff : System RAM
|   8002080000-8002feffff : [Data you really need to boot]

kexec-tools should not overwrite 'data you really need to boot' unless it knows
what it is, and that the system will never need it again. (examples: overwrite
the ACPI tables when booting a non-acpi kernel, overwrite the UEFI memory map if
the DT has been regenerated for a non-uefi kernel)

But, kexec-tools doesn't parse those second level entries properly. We have a
bug in user-space, and a bug in the kernel.

Because /proc/iomem is being used for two things, and kexec-tools only parses
one level, I don't think we can fix this in the kernel without breaking one of
the use-cases. I think Akashi's fix user-space too approach is the most
pragmatic approach.


>> Here you use
>> memblock to search available memory, isn't it deviating too far away
>> from the original design in kexec-tools. Assume kexec loading and
>> kexec_file loading should be consistent on loading even though they are
>> done in different space, kernel space and user space.

Its much easier for us to parse memblock in the kernel as the helpers step over
the regions we know we don't want. For the resource list we would need to
strcmp(), and a bunch of handling for the second level entries.


Thanks,

James
Baoquan He May 18, 2018, 1:37 a.m. UTC | #8
On 05/17/18 at 07:04pm, James Morse wrote:
> Hi Baoquan,
> 
> On 17/05/18 03:15, Baoquan He wrote:
> > On 05/17/18 at 10:10am, Baoquan He wrote:
> >> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> >>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> >>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
> >>>>> We need to prevent firmware-reserved memory regions, particularly EFI
> >>>>> memory map as well as ACPI tables, from being corrupted by loading
> >>>>> kernel/initrd (or other kexec buffers). We also want to support memory
> >>>>> allocation in top-down manner in addition to default bottom-up.
> >>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
> >>>>> for available memory ranges in usable memblock list,
> >>>>> i.e. !NOMAP & !reserved, 
> >>>>
> >>>>> instead of system resource tree.
> >>>>
> >>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> >>>> be safe in the EFI-memory-map/ACPI-tables case?
> >>>>
> >>>> It would be good to avoid having two ways of doing this, and I would like to
> >>>> avoid having extra arch code...
> >>>
> >>> I know what you mean.
> >>> /proc/iomem or system resource is, in my opinion, not the best place to
> >>> describe memory usage of kernel but rather to describe *physical* hardware
> >>> layout. As we are still discussing about "reserved" memory, I don't want
> >>> to depend on it.
> >>> Along with memblock list, we will have more accurate control over memory
> >>> usage.
> >>
> >> In kexec-tools, we see any usable memory as candidate which can be used
> > 
> > Here I said 'any', it's not accurate. Those memory which need be passed
> > to 2nd kernel for use need be excluded, just as we have done in
> > kexec-tools.
> > 
> >> to load kexec kernel image/initrd etc. However kexec loading is a
> >> preparation work, it just books those position for later kexec kernel
> >> jumping after "kexec -e", that is why we need kexec_buf to remember
> >> them and do the real content copy of kernel/initrd.
> 
> The problem we have on arm64 is /proc/iomem is being used for two things.
> 1) Kexec's this is memory I can book for the new kernel.
> 2) Kdump's this is memory I must describe for vmcore.
> 
> We get the memory map from UEFI via the EFI stub, and leave it in
> memblock_reserved() memory. A new kexec kernel needs this to boot: it mustn't
> overwrite it. The same goes for the ACPI tables, they could be reclaimed and
> used as memory, but the new kexec kernel needs them to boot, they are
> memblock_reserved() too.

Thanks for these details. Seems arm64 is different. In x86 64 memblock
is used as bootmem allocator and will be released when buddy takes over.
Mainly, using memblock may bring concern that kexec kernel
will jump to a unfixed position. This creates an unexpected effect as
KASLR is doing, namely kernel could be put at a random position. As we
know, kexec was invented for fast kernel dev testing by bypassing
firmware reset, and has been taken to reboot those huge server with
thousands of devices and large memory for business currently. This extra
unpected KASLR effect may cause annoyance even though people have
disabled KASLR explicitly for a specific testing purpose.

Besides, discarding the /proc/iomem scanning but taking memblock instead
in kernel space works for kexec loading for the time being, the flaw of
/proc/iomem still exists and cause problem for user space kexec-tools,
as pointed out. Do we have a plan for that?

> 
> If we knock all memblock_reserved() regions out of /proc/iomem then kdump
> doesn't work, because /proc/iomem is only generated once. Its a snapshot. The
> initcode/data is an example of memory we release from memblock_reserve() after
> this, then gets used for data we need in the vmcore.

Hmm, I'm a little confused here. We have defined different iores type
for different memory region. If acpi need be reused by kdump/kexec, we
can change to not reclaim it, and add them into /proc/iomem in order to
notify components which rely on them to process.


enum {  
        IORES_DESC_NONE                         = 0,
        IORES_DESC_CRASH_KERNEL                 = 1,
        IORES_DESC_ACPI_TABLES                  = 2,
        IORES_DESC_ACPI_NV_STORAGE              = 3,
        IORES_DESC_PERSISTENT_MEMORY            = 4,
        IORES_DESC_PERSISTENT_MEMORY_LEGACY     = 5,
        IORES_DESC_DEVICE_PRIVATE_MEMORY        = 6,
        IORES_DESC_DEVICE_PUBLIC_MEMORY         = 7,
};


Just walk around and talk about it, limited by poor arm64 knowledge, I
may not have a complete view. If it's not like what I think about, I
will stop, and can come back when I get more background knowledge.

Thanks
Baoquan

> 
> Ideally we would describe all this in /proc/iomem with:
> | 8001e80000-83ff186fff : System RAM
> |   8002080000-8002feffff : [Data you really need to boot]
> 
> kexec-tools should not overwrite 'data you really need to boot' unless it knows
> what it is, and that the system will never need it again. (examples: overwrite
> the ACPI tables when booting a non-acpi kernel, overwrite the UEFI memory map if
> the DT has been regenerated for a non-uefi kernel)
> 
> But, kexec-tools doesn't parse those second level entries properly. We have a
> bug in user-space, and a bug in the kernel.
> 
> Because /proc/iomem is being used for two things, and kexec-tools only parses
> one level, I don't think we can fix this in the kernel without breaking one of
> the use-cases. I think Akashi's fix user-space too approach is the most
> pragmatic approach.
> 
> 
> >> Here you use
> >> memblock to search available memory, isn't it deviating too far away
> >> from the original design in kexec-tools. Assume kexec loading and
> >> kexec_file loading should be consistent on loading even though they are
> >> done in different space, kernel space and user space.
> 
> Its much easier for us to parse memblock in the kernel as the helpers step over
> the regions we know we don't want. For the resource list we would need to
> strcmp(), and a bunch of handling for the second level entries.
> 
> 
> Thanks,
> 
> James
AKASHI Takahiro May 18, 2018, 5:07 a.m. UTC | #9
Baoquan,

On Fri, May 18, 2018 at 09:37:35AM +0800, Baoquan He wrote:
> On 05/17/18 at 07:04pm, James Morse wrote:
> > Hi Baoquan,
> > 
> > On 17/05/18 03:15, Baoquan He wrote:
> > > On 05/17/18 at 10:10am, Baoquan He wrote:
> > >> On 05/07/18 at 02:59pm, AKASHI Takahiro wrote:
> > >>> On Tue, May 01, 2018 at 06:46:09PM +0100, James Morse wrote:
> > >>>> On 25/04/18 07:26, AKASHI Takahiro wrote:
> > >>>>> We need to prevent firmware-reserved memory regions, particularly EFI
> > >>>>> memory map as well as ACPI tables, from being corrupted by loading
> > >>>>> kernel/initrd (or other kexec buffers). We also want to support memory
> > >>>>> allocation in top-down manner in addition to default bottom-up.
> > >>>>> So let's have arm64 specific arch_kexec_walk_mem() which will search
> > >>>>> for available memory ranges in usable memblock list,
> > >>>>> i.e. !NOMAP & !reserved, 
> > >>>>
> > >>>>> instead of system resource tree.
> > >>>>
> > >>>> Didn't we try to fix the system-resource-tree in order to fix regular-kexec to
> > >>>> be safe in the EFI-memory-map/ACPI-tables case?
> > >>>>
> > >>>> It would be good to avoid having two ways of doing this, and I would like to
> > >>>> avoid having extra arch code...
> > >>>
> > >>> I know what you mean.
> > >>> /proc/iomem or system resource is, in my opinion, not the best place to
> > >>> describe memory usage of kernel but rather to describe *physical* hardware
> > >>> layout. As we are still discussing about "reserved" memory, I don't want
> > >>> to depend on it.
> > >>> Along with memblock list, we will have more accurate control over memory
> > >>> usage.
> > >>
> > >> In kexec-tools, we see any usable memory as candidate which can be used
> > > 
> > > Here I said 'any', it's not accurate. Those memory which need be passed
> > > to 2nd kernel for use need be excluded, just as we have done in
> > > kexec-tools.
> > > 
> > >> to load kexec kernel image/initrd etc. However kexec loading is a
> > >> preparation work, it just books those position for later kexec kernel
> > >> jumping after "kexec -e", that is why we need kexec_buf to remember
> > >> them and do the real content copy of kernel/initrd.
> > 
> > The problem we have on arm64 is /proc/iomem is being used for two things.
> > 1) Kexec's this is memory I can book for the new kernel.
> > 2) Kdump's this is memory I must describe for vmcore.
> > 
> > We get the memory map from UEFI via the EFI stub, and leave it in
> > memblock_reserved() memory. A new kexec kernel needs this to boot: it mustn't
> > overwrite it. The same goes for the ACPI tables, they could be reclaimed and
> > used as memory, but the new kexec kernel needs them to boot, they are
> > memblock_reserved() too.
> 
> Thanks for these details. Seems arm64 is different. In x86 64 memblock

Thanks to James from me, too.

> is used as bootmem allocator and will be released when buddy takes over.
> Mainly, using memblock may bring concern that kexec kernel
> will jump to a unfixed position. This creates an unexpected effect as
> KASLR is doing, namely kernel could be put at a random position. As we

I don't think that this would be a problem on arm64.

> know, kexec was invented for fast kernel dev testing by bypassing
> firmware reset, and has been taken to reboot those huge server with
> thousands of devices and large memory for business currently. This extra
> unpected KASLR effect may cause annoyance even though people have
> disabled KASLR explicitly for a specific testing purpose.
> 
> Besides, discarding the /proc/iomem scanning but taking memblock instead
> in kernel space works for kexec loading for the time being, the flaw of
> /proc/iomem still exists and cause problem for user space kexec-tools,
> as pointed out. Do we have a plan for that?

This was the difference between my and James' standpoint (at leas initially).
James didn't want to require userspace changes to fix the issue, but
the reality is that, without modifying it, we can't support kexec and kdump
perfectly as James explained in his email.

> > 
> > If we knock all memblock_reserved() regions out of /proc/iomem then kdump
> > doesn't work, because /proc/iomem is only generated once. Its a snapshot. The
> > initcode/data is an example of memory we release from memblock_reserve() after
> > this, then gets used for data we need in the vmcore.
> 
> Hmm, I'm a little confused here. We have defined different iores type
> for different memory region. If acpi need be reused by kdump/kexec, we
> can change to not reclaim it, and add them into /proc/iomem in order to
> notify components which rely on them to process.
> 
> 
> enum {  
>         IORES_DESC_NONE                         = 0,
>         IORES_DESC_CRASH_KERNEL                 = 1,
>         IORES_DESC_ACPI_TABLES                  = 2,
>         IORES_DESC_ACPI_NV_STORAGE              = 3,
>         IORES_DESC_PERSISTENT_MEMORY            = 4,
>         IORES_DESC_PERSISTENT_MEMORY_LEGACY     = 5,
>         IORES_DESC_DEVICE_PRIVATE_MEMORY        = 6,
>         IORES_DESC_DEVICE_PUBLIC_MEMORY         = 7,
> };

I don't think that is the point.
Let me give you analogy: x86 has e820 and handles memory layout in kexec/
kdump with *x86-specific* code in kexec-tools, right? We want to do
something similar without introducing e820-like data.
In the current implementation on arm64, however, kexec-tools will only
recognize top-level entries in /proc/iomem leaving subsequent level of
entries ignored (except kernel text & data).
So adding extra hierarchy to /proc/iomem will break the compatibility
in any way.

The main reason that I insist on memblock in my kexec_file patch
is that we seem to be still far from reaching to agreement and
final solution in kexec (opposite to kexec_file) case.

Thanks,
-Takahiro AKASHI


> 
> Just walk around and talk about it, limited by poor arm64 knowledge, I
> may not have a complete view. If it's not like what I think about, I
> will stop, and can come back when I get more background knowledge.
> 
> Thanks
> Baoquan
> 
> > 
> > Ideally we would describe all this in /proc/iomem with:
> > | 8001e80000-83ff186fff : System RAM
> > |   8002080000-8002feffff : [Data you really need to boot]
> > 
> > kexec-tools should not overwrite 'data you really need to boot' unless it knows
> > what it is, and that the system will never need it again. (examples: overwrite
> > the ACPI tables when booting a non-acpi kernel, overwrite the UEFI memory map if
> > the DT has been regenerated for a non-uefi kernel)
> > 
> > But, kexec-tools doesn't parse those second level entries properly. We have a
> > bug in user-space, and a bug in the kernel.
> > 
> > Because /proc/iomem is being used for two things, and kexec-tools only parses
> > one level, I don't think we can fix this in the kernel without breaking one of
> > the use-cases. I think Akashi's fix user-space too approach is the most
> > pragmatic approach.
> > 
> > 
> > >> Here you use
> > >> memblock to search available memory, isn't it deviating too far away
> > >> from the original design in kexec-tools. Assume kexec loading and
> > >> kexec_file loading should be consistent on loading even though they are
> > >> done in different space, kernel space and user space.
> > 
> > Its much easier for us to parse memblock in the kernel as the helpers step over
> > the regions we know we don't want. For the resource list we would need to
> > strcmp(), and a bunch of handling for the second level entries.
> > 
> > 
> > Thanks,
> > 
> > James
diff mbox

Patch

diff --git a/arch/arm64/kernel/Makefile b/arch/arm64/kernel/Makefile
index bf825f38d206..2f2b2757ae7a 100644
--- a/arch/arm64/kernel/Makefile
+++ b/arch/arm64/kernel/Makefile
@@ -48,8 +48,9 @@  arm64-obj-$(CONFIG_ARM64_ACPI_PARKING_PROTOCOL)	+= acpi_parking_protocol.o
 arm64-obj-$(CONFIG_PARAVIRT)		+= paravirt.o
 arm64-obj-$(CONFIG_RANDOMIZE_BASE)	+= kaslr.o
 arm64-obj-$(CONFIG_HIBERNATION)		+= hibernate.o hibernate-asm.o
-arm64-obj-$(CONFIG_KEXEC)		+= machine_kexec.o relocate_kernel.o	\
+arm64-obj-$(CONFIG_KEXEC_CORE)		+= machine_kexec.o relocate_kernel.o	\
 					   cpu-reset.o
+arm64-obj-$(CONFIG_KEXEC_FILE)		+= machine_kexec_file.o
 arm64-obj-$(CONFIG_ARM64_RELOC_TEST)	+= arm64-reloc-test.o
 arm64-reloc-test-y := reloc_test_core.o reloc_test_syms.o
 arm64-obj-$(CONFIG_CRASH_DUMP)		+= crash_dump.o
diff --git a/arch/arm64/kernel/machine_kexec_file.c b/arch/arm64/kernel/machine_kexec_file.c
new file mode 100644
index 000000000000..f9ebf54ca247
--- /dev/null
+++ b/arch/arm64/kernel/machine_kexec_file.c
@@ -0,0 +1,57 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * kexec_file for arm64
+ *
+ * Copyright (C) 2018 Linaro Limited
+ * Author: AKASHI Takahiro <takahiro.akashi@linaro.org>
+ *
+ * Most code is derived from arm64 port of kexec-tools
+ */
+
+#define pr_fmt(fmt) "kexec_file: " fmt
+
+#include <linux/ioport.h>
+#include <linux/kernel.h>
+#include <linux/kexec.h>
+#include <linux/memblock.h>
+
+int arch_kexec_walk_mem(struct kexec_buf *kbuf,
+				int (*func)(struct resource *, void *))
+{
+	phys_addr_t start, end;
+	struct resource res;
+	u64 i;
+	int ret = 0;
+
+	if (kbuf->image->type == KEXEC_TYPE_CRASH)
+		return func(&crashk_res, kbuf);
+
+	if (kbuf->top_down)
+		for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,
+				NUMA_NO_NODE, MEMBLOCK_NONE,
+				&start, &end, NULL) {
+			if (!memblock_is_map_memory(start))
+				continue;
+
+			res.start = start;
+			res.end = end;
+			ret = func(&res, kbuf);
+			if (ret)
+				break;
+		}
+	else
+		for_each_mem_range(i, &memblock.memory, &memblock.reserved,
+				NUMA_NO_NODE, MEMBLOCK_NONE,
+				&start, &end, NULL) {
+			if (!memblock_is_map_memory(start))
+				continue;
+
+			res.start = start;
+			res.end = end;
+			ret = func(&res, kbuf);
+			if (ret)
+				break;
+		}
+
+	return ret;
+}