diff mbox

[06/12] ARM: kexec: advertise location of bootable RAM

Message ID E1aviF4-0000jG-MO@rmk-PC.arm.linux.org.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Russell King April 28, 2016, 9:28 a.m. UTC
Advertise the location of bootable RAM to kexec-tools.  kexec needs to
know where it can place the kernel in RAM, and so be executable when
the system needs to jump into it.

Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
tag.

Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
---
 arch/arm/kernel/setup.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

Comments

Pratyush Anand April 29, 2016, 2:56 p.m. UTC | #1
Hi Russell,

On Thu, Apr 28, 2016 at 2:58 PM, Russell King
<rmk+kernel@arm.linux.org.uk> wrote:
> Advertise the location of bootable RAM to kexec-tools.  kexec needs to
> know where it can place the kernel in RAM, and so be executable when
> the system needs to jump into it.
>
> Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
> tag.
>
> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Can you please also share git tree path of corresponding kexec-tools changes?

Could it be a better idea (if things in user space become simpler)
that in stead of patch 5 and 6, we pass arch_phys_to_idmap_offset to
user space, and then user space manipulates existing "Crash kernel"
and "System RAM" resources.

~Pratyush
Russell King - ARM Linux April 29, 2016, 6 p.m. UTC | #2
On Fri, Apr 29, 2016 at 08:26:00PM +0530, Pratyush Anand wrote:
> Hi Russell,
> 
> On Thu, Apr 28, 2016 at 2:58 PM, Russell King
> <rmk+kernel@arm.linux.org.uk> wrote:
> > Advertise the location of bootable RAM to kexec-tools.  kexec needs to
> > know where it can place the kernel in RAM, and so be executable when
> > the system needs to jump into it.
> >
> > Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
> > tag.
> >
> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> 
> Can you please also share git tree path of corresponding kexec-tools changes?
> 
> Could it be a better idea (if things in user space become simpler)
> that in stead of patch 5 and 6, we pass arch_phys_to_idmap_offset to
> user space, and then user space manipulates existing "Crash kernel"
> and "System RAM" resources.

Given that it's only _one_ platform right now, I don't think that
additional complexity is worth it.  It means that we have to invent
some API to do it, and I don't see why userspace should even care
about having the delta exported - especially when the conversion
may not be as trivial.

The method I've implemented here keeps things completely independent
of whatever conversion between boot and running physical addresses
may be present on the kernel side as far as userspace is concerned.
Pratyush Anand April 30, 2016, 3:27 a.m. UTC | #3
On Fri, Apr 29, 2016 at 11:30 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Fri, Apr 29, 2016 at 08:26:00PM +0530, Pratyush Anand wrote:
>> Hi Russell,
>>
>> On Thu, Apr 28, 2016 at 2:58 PM, Russell King
>> <rmk+kernel@arm.linux.org.uk> wrote:
>> > Advertise the location of bootable RAM to kexec-tools.  kexec needs to
>> > know where it can place the kernel in RAM, and so be executable when
>> > the system needs to jump into it.
>> >
>> > Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
>> > tag.
>> >
>> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>>
>> Can you please also share git tree path of corresponding kexec-tools changes?
>>
>> Could it be a better idea (if things in user space become simpler)
>> that in stead of patch 5 and 6, we pass arch_phys_to_idmap_offset to
>> user space, and then user space manipulates existing "Crash kernel"
>> and "System RAM" resources.
>
> Given that it's only _one_ platform right now, I don't think that
> additional complexity is worth it.  It means that we have to invent

Probably, I could not communicate it well.  I was not trying  to have
*additional* complexity. Wanted to see if things could be simpler
rather. So this is what my understanding was:
-- We create one patch to pass arch_phys_to_idmap_offset to user space
(say in /sys/kernel/bootmem_idmap_offset)
-- We do not use patch 5,6,11 and 12 of this series. Probably few more
content of the series will go away.
--  In kexec-tools code , we read /sys/kernel/bootmem_idmap_offset and
add that value in "start" and "end" of "Crash Kernel" and "System RAM"
resources.

> some API to do it, and I don't see why userspace should even care
> about having the delta exported - especially when the conversion
> may not be as trivial.
>

Yes, I agree, if translation is not trivial like that of keystone,
then what I am proposing will not work.

Reviewed-by: Pratyush Anand <panand@redhat.com>
Russell King - ARM Linux April 30, 2016, 8:20 a.m. UTC | #4
On Sat, Apr 30, 2016 at 08:57:34AM +0530, Pratyush Anand wrote:
> On Fri, Apr 29, 2016 at 11:30 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Fri, Apr 29, 2016 at 08:26:00PM +0530, Pratyush Anand wrote:
> >> Hi Russell,
> >>
> >> On Thu, Apr 28, 2016 at 2:58 PM, Russell King
> >> <rmk+kernel@arm.linux.org.uk> wrote:
> >> > Advertise the location of bootable RAM to kexec-tools.  kexec needs to
> >> > know where it can place the kernel in RAM, and so be executable when
> >> > the system needs to jump into it.
> >> >
> >> > Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
> >> > tag.
> >> >
> >> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> >>
> >> Can you please also share git tree path of corresponding kexec-tools changes?
> >>
> >> Could it be a better idea (if things in user space become simpler)
> >> that in stead of patch 5 and 6, we pass arch_phys_to_idmap_offset to
> >> user space, and then user space manipulates existing "Crash kernel"
> >> and "System RAM" resources.
> >
> > Given that it's only _one_ platform right now, I don't think that
> > additional complexity is worth it.  It means that we have to invent
> 
> Probably, I could not communicate it well.  I was not trying  to have
> *additional* complexity. Wanted to see if things could be simpler
> rather. So this is what my understanding was:
> -- We create one patch to pass arch_phys_to_idmap_offset to user space
> (say in /sys/kernel/bootmem_idmap_offset)
> -- We do not use patch 5,6,11 and 12 of this series. Probably few more
> content of the series will go away.

Patches 11 and 12 don't go away with what you're suggesting.  Patches
11 and 12 are necessary to allow the boot-view addresses to be passed
into the kernel through kexec, and to allow kexec to find appropriate
memory resources.

For example, from patch 11:

@@ -48,7 +48,8 @@ static int kimage_alloc_init(struct kimage **rimage, unsigned
long entry,

-               if ((entry < crashk_res.start) || (entry > crashk_res.end))
+               if ((entry < phys_to_boot_phys(crashk_res.start)) ||
+                   (entry > phys_to_boot_phys(crashk_res.end)))

"entry" is limited to a 32-bit physical address as it is unsigned long,
and is the boot-view physical address.  crashk_res.start is the
running-view physical address.  Without this change, the rest will
always be true on Keystone 2.

@@ -229,8 +229,8 @@ int sanity_check_segment_list(struct kimage *image)
-                       if ((mstart < crashk_res.start) ||
-                           (mend > crashk_res.end))
+                       if ((mstart < phys_to_boot_phys(crashk_res.start)) ||
+                           (mend > phys_to_boot_phys(crashk_res.end)))

Same problem - mstart and mend are both 32-bit quantities.  The result
is the segment list validation always fails.

@@ -354,7 +354,7 @@ static struct page *kimage_alloc_normal_control_pages(struct kimage *image,
-               pfn   = page_to_pfn(pages);
+               pfn   = page_to_boot_pfn(pages);

The result without this change is that we allocate _all_ system memory
looking for a suitable page, never finding one because we never find
a page which matches.  Without a previous patch, killing many
processes and taking the system down.

@@ -480,7 +480,7 @@ static int kimage_add_entry(struct kimage *image, kimage_entry_t entry)
-               *image->entry = virt_to_phys(ind_page) | IND_INDIRECTION;
+               *image->entry = virt_to_boot_phys(ind_page) | IND_INDIRECTION;

The physical address would end up being truncated to 32-bits, but
would actually be larger than 4GiB.  So, *image->entry would point
at the incorrect address, and kexec would fail.

@@ -535,13 +535,13 @@ void kimage_terminate(struct kimage *image)
 #define for_each_kimage_entry(image, ptr, entry) \
        for (ptr = &image->head; (entry = *ptr) && !(entry & IND_DONE); \
                ptr = (entry & IND_INDIRECTION) ? \
-                       phys_to_virt((entry & PAGE_MASK)) : ptr + 1)
+                       boot_phys_to_virt((entry & PAGE_MASK)) : ptr + 1)

"entry" is truncated to 32-bit, and so this passes an invalid
physical address which is not part of the lowmem mapping to
phys_to_virt().  The resulting virtual address is undefined.

-       page = pfn_to_page(entry >> PAGE_SHIFT);
+       page = boot_pfn_to_page(entry >> PAGE_SHIFT);

Same, except the resulting struct page pointer is undefined.

... and so it goes on.

The only patches which get replaced are patches 5 and 6 with a new
userspace API.
Pratyush Anand May 2, 2016, 7:34 a.m. UTC | #5
On Sat, Apr 30, 2016 at 1:50 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Sat, Apr 30, 2016 at 08:57:34AM +0530, Pratyush Anand wrote:
>> On Fri, Apr 29, 2016 at 11:30 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Fri, Apr 29, 2016 at 08:26:00PM +0530, Pratyush Anand wrote:
>> >> Hi Russell,
>> >>
>> >> On Thu, Apr 28, 2016 at 2:58 PM, Russell King
>> >> <rmk+kernel@arm.linux.org.uk> wrote:
>> >> > Advertise the location of bootable RAM to kexec-tools.  kexec needs to
>> >> > know where it can place the kernel in RAM, and so be executable when
>> >> > the system needs to jump into it.
>> >> >
>> >> > Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
>> >> > tag.
>> >> >
>> >> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>> >>
>> >> Can you please also share git tree path of corresponding kexec-tools changes?
>> >>
>> >> Could it be a better idea (if things in user space become simpler)
>> >> that in stead of patch 5 and 6, we pass arch_phys_to_idmap_offset to
>> >> user space, and then user space manipulates existing "Crash kernel"
>> >> and "System RAM" resources.
>> >
>> > Given that it's only _one_ platform right now, I don't think that
>> > additional complexity is worth it.  It means that we have to invent
>>
>> Probably, I could not communicate it well.  I was not trying  to have
>> *additional* complexity. Wanted to see if things could be simpler
>> rather. So this is what my understanding was:
>> -- We create one patch to pass arch_phys_to_idmap_offset to user space
>> (say in /sys/kernel/bootmem_idmap_offset)
>> -- We do not use patch 5,6,11 and 12 of this series. Probably few more
>> content of the series will go away.
>
> Patches 11 and 12 don't go away with what you're suggesting.  Patches
> 11 and 12 are necessary to allow the boot-view addresses to be passed
> into the kernel through kexec, and to allow kexec to find appropriate
> memory resources.

But once we would have manipulated "start" and "end" of "Crash Kernel"
and "System RAM" resources in user space using
/sys/kernel/bootmem_idmap_offset , then kernel through kexec system
call would have already receive boot-view addresses, no?

~Pratyush
Russell King - ARM Linux May 2, 2016, 10:10 a.m. UTC | #6
On Mon, May 02, 2016 at 01:04:28PM +0530, Pratyush Anand wrote:
> On Sat, Apr 30, 2016 at 1:50 PM, Russell King - ARM Linux
> <linux@arm.linux.org.uk> wrote:
> > On Sat, Apr 30, 2016 at 08:57:34AM +0530, Pratyush Anand wrote:
> >> On Fri, Apr 29, 2016 at 11:30 PM, Russell King - ARM Linux
> >> <linux@arm.linux.org.uk> wrote:
> >> > On Fri, Apr 29, 2016 at 08:26:00PM +0530, Pratyush Anand wrote:
> >> >> Hi Russell,
> >> >>
> >> >> On Thu, Apr 28, 2016 at 2:58 PM, Russell King
> >> >> <rmk+kernel@arm.linux.org.uk> wrote:
> >> >> > Advertise the location of bootable RAM to kexec-tools.  kexec needs to
> >> >> > know where it can place the kernel in RAM, and so be executable when
> >> >> > the system needs to jump into it.
> >> >> >
> >> >> > Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
> >> >> > tag.
> >> >> >
> >> >> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
> >> >>
> >> >> Can you please also share git tree path of corresponding kexec-tools changes?
> >> >>
> >> >> Could it be a better idea (if things in user space become simpler)
> >> >> that in stead of patch 5 and 6, we pass arch_phys_to_idmap_offset to
> >> >> user space, and then user space manipulates existing "Crash kernel"
> >> >> and "System RAM" resources.
> >> >
> >> > Given that it's only _one_ platform right now, I don't think that
> >> > additional complexity is worth it.  It means that we have to invent
> >>
> >> Probably, I could not communicate it well.  I was not trying  to have
> >> *additional* complexity. Wanted to see if things could be simpler
> >> rather. So this is what my understanding was:
> >> -- We create one patch to pass arch_phys_to_idmap_offset to user space
> >> (say in /sys/kernel/bootmem_idmap_offset)
> >> -- We do not use patch 5,6,11 and 12 of this series. Probably few more
> >> content of the series will go away.
> >
> > Patches 11 and 12 don't go away with what you're suggesting.  Patches
> > 11 and 12 are necessary to allow the boot-view addresses to be passed
> > into the kernel through kexec, and to allow kexec to find appropriate
> > memory resources.
> 
> But once we would have manipulated "start" and "end" of "Crash Kernel"
> and "System RAM" resources in user space using
> /sys/kernel/bootmem_idmap_offset , then kernel through kexec system
> call would have already receive boot-view addresses, no?

Correct, but that's still a problem for all the reasons I gave in the
email to which you replied to.

I'm not sure where the misunderstanding is.

Let me repeat: even if we do what you're suggesting, patches 11 and 12
do *not* go away.  I've explained in detail why each of the changes are
necessary (which you have cut from your reply.)

In other words: exporting this offset via
/sys/kernel/bootmem_idmap_offset is technically inferior to the solution
I have come up with here, and it saves very little complexity and code.
Pratyush Anand May 2, 2016, 10:48 a.m. UTC | #7
On Mon, May 2, 2016 at 3:40 PM, Russell King - ARM Linux
<linux@arm.linux.org.uk> wrote:
> On Mon, May 02, 2016 at 01:04:28PM +0530, Pratyush Anand wrote:
>> On Sat, Apr 30, 2016 at 1:50 PM, Russell King - ARM Linux
>> <linux@arm.linux.org.uk> wrote:
>> > On Sat, Apr 30, 2016 at 08:57:34AM +0530, Pratyush Anand wrote:
>> >> On Fri, Apr 29, 2016 at 11:30 PM, Russell King - ARM Linux
>> >> <linux@arm.linux.org.uk> wrote:
>> >> > On Fri, Apr 29, 2016 at 08:26:00PM +0530, Pratyush Anand wrote:
>> >> >> Hi Russell,
>> >> >>
>> >> >> On Thu, Apr 28, 2016 at 2:58 PM, Russell King
>> >> >> <rmk+kernel@arm.linux.org.uk> wrote:
>> >> >> > Advertise the location of bootable RAM to kexec-tools.  kexec needs to
>> >> >> > know where it can place the kernel in RAM, and so be executable when
>> >> >> > the system needs to jump into it.
>> >> >> >
>> >> >> > Advertise these areas in /proc/iomem with a "System RAM (boot alias)"
>> >> >> > tag.
>> >> >> >
>> >> >> > Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
>> >> >>
>> >> >> Can you please also share git tree path of corresponding kexec-tools changes?
>> >> >>
>> >> >> Could it be a better idea (if things in user space become simpler)
>> >> >> that in stead of patch 5 and 6, we pass arch_phys_to_idmap_offset to
>> >> >> user space, and then user space manipulates existing "Crash kernel"
>> >> >> and "System RAM" resources.
>> >> >
>> >> > Given that it's only _one_ platform right now, I don't think that
>> >> > additional complexity is worth it.  It means that we have to invent
>> >>
>> >> Probably, I could not communicate it well.  I was not trying  to have
>> >> *additional* complexity. Wanted to see if things could be simpler
>> >> rather. So this is what my understanding was:
>> >> -- We create one patch to pass arch_phys_to_idmap_offset to user space
>> >> (say in /sys/kernel/bootmem_idmap_offset)
>> >> -- We do not use patch 5,6,11 and 12 of this series. Probably few more
>> >> content of the series will go away.
>> >
>> > Patches 11 and 12 don't go away with what you're suggesting.  Patches
>> > 11 and 12 are necessary to allow the boot-view addresses to be passed
>> > into the kernel through kexec, and to allow kexec to find appropriate
>> > memory resources.
>>
>> But once we would have manipulated "start" and "end" of "Crash Kernel"
>> and "System RAM" resources in user space using
>> /sys/kernel/bootmem_idmap_offset , then kernel through kexec system
>> call would have already receive boot-view addresses, no?
>
> Correct, but that's still a problem for all the reasons I gave in the
> email to which you replied to.
>
> I'm not sure where the misunderstanding is.

No, no..there is no misunderstanding. I agreed to your implementation
because that will work for generic cases and for me complete series is
OK.

I just wanted to clarify my understanding, and so was the last argument.

>
> Let me repeat: even if we do what you're suggesting, patches 11 and 12
> do *not* go away.  I've explained in detail why each of the changes are
> necessary (which you have cut from your reply.)
>

Again, it is just for clarifying myself.
I cut the reply because I understood that in patch 11 and 12, you
convert addresses passed by kexec tools from run time view to boot
view using different helpers like phys_to_boot_phys(). So, had kexec
system call passed boot view addresses, we would have not needed 11
and 12. This is what I wanted to clarify.

> In other words: exporting this offset via
> /sys/kernel/bootmem_idmap_offset is technically inferior to the solution
> I have come up with here, and it saves very little complexity and code.

I still have opinion that code will probably be more simple and reduce
significantly, however solution will siege to work the moment
idmap_offset is not a simple additive value.

Therefore, I am OK with your implementation.

~Pratyush
Russell King - ARM Linux May 3, 2016, 10:29 a.m. UTC | #8
On Fri, Apr 29, 2016 at 08:26:00PM +0530, Pratyush Anand wrote:
> Can you please also share git tree path of corresponding kexec-tools changes?

On their way as a 32-patch series.  I've found the userspace tools to be
in particularly poor shape, so some of the patches are fixing that up.
I can't believe that anyone uses these userspace tools, because there
are some fundamental bugs present in this code which go back years.
However, the repository does seem to be maintained, with the latest
commit within the last two months (or day when I started working on
this.)

They're based upon 2.0.12-rc1 (93d3a617) of the

	git://git.kernel.org/pub/scm/utils/kernel/kexec/kexec-tools.git

repository, which was the latest commit when I started working on this
back in March.  Things may have moved forwards since then - I've not
been able to check yet.  However, this set of changes should be
sufficient to get an idea of what's required on the userspace side.

Thanks.
diff mbox

Patch

diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
index 19b25ad61385..7cf1c0d4f773 100644
--- a/arch/arm/kernel/setup.c
+++ b/arch/arm/kernel/setup.c
@@ -847,10 +847,29 @@  static void __init request_standard_resources(const struct machine_desc *mdesc)
 	kernel_data.end     = virt_to_phys(_end - 1);
 
 	for_each_memblock(memory, region) {
+		phys_addr_t start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
+		phys_addr_t end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+		unsigned long boot_alias_start;
+
+		/*
+		 * Some systems have a special memory alias which is only
+		 * used for booting.  We need to advertise this region to
+		 * kexec-tools so they know where bootable RAM is located.
+		 */
+		boot_alias_start = phys_to_idmap(start);
+		if (arm_has_idmap_alias() && boot_alias_start != INVALID_IDMAP_ADDR) {
+			res = memblock_virt_alloc(sizeof(*res), 0);
+			res->name = "System RAM (boot alias)";
+			res->start = boot_alias_start;
+			res->end = phys_to_idmap(end);
+			res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
+			request_resource(&iomem_resource, res);
+		}
+
 		res = memblock_virt_alloc(sizeof(*res), 0);
 		res->name  = "System RAM";
-		res->start = __pfn_to_phys(memblock_region_memory_base_pfn(region));
-		res->end = __pfn_to_phys(memblock_region_memory_end_pfn(region)) - 1;
+		res->start = start;
+		res->end = end;
 		res->flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 
 		request_resource(&iomem_resource, res);