diff mbox

[v2] arm64: Allocate elfcorehdr & crashkernel mem before any reservation

Message ID 1515080048-9154-1-git-send-email-poonam.aggrwal@nxp.com (mailing list archive)
State New, archived
Headers show

Commit Message

Poonam Aggrwal Jan. 4, 2018, 3:34 p.m. UTC
Address for both crashkernel memory and elfcorehdr can be assigned
statically. For crashkernel memory it is via crashkernel=SIZE@ADDRESS
and elfcorehdr_addr via by kexec-utils in dump kernel device tree.

So memory should be reserved for both the above areas before any
other memory reservations are done. Otherwise overlaps may happen and
memory allocations will fail leading to failure in booting the 
dump capture kernel.

Signed-off-by: Guanhua <guanhua.gao@nxp.com>
Signed-off-by: Poonam Aggrwal <poonam.aggrwal@nxp.com>
Signed-off-by: Abhimanyu Saini <abhimanyu.saini@nxp.com>
---
v2:
 corrected the email address

reworked based on the discussions:(with Will Deacon and Takahiro)
https://patchwork.kernel.org/patch/10104249/
 arch/arm64/mm/init.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

James Morse Jan. 5, 2018, 11:06 a.m. UTC | #1
Hi Poonam,

On 04/01/18 15:34, Poonam Aggrwal wrote:
> Address for both crashkernel memory and elfcorehdr can be assigned
> statically. For crashkernel memory it is via crashkernel=SIZE@ADDRESS
> and elfcorehdr_addr via by kexec-utils in dump kernel device tree.

There are some crashkernel=SIZE@ADDRESS values that the user can supply that we
must reject. The obvious one is if it overlaps with the kernel text. (this patch
won't spot this). We need to read the hardware's reserved regions from the DT
before we allocate the crashkernel region, for example if the bootloader
reserved a chunk of memory for a frame-buffer, I shouldn't be able to use that
as crashkernel memory.

(you shouldn't need to specify an address, doing so prevents the kernel from
picking memory it can use)


> So memory should be reserved for both the above areas before any
> other memory reservations are done. Otherwise overlaps may happen and
> memory allocations will fail leading to failure in booting the 
> dump capture kernel.

> Signed-off-by: Guanhua <guanhua.gao@nxp.com>

The first signed-off-by should be the patch author.
If you want to credit your colleagues you can use 'CC', or they can give
reviewed-by/acked-by to the patch.


> Signed-off-by: Poonam Aggrwal <poonam.aggrwal@nxp.com>
> Signed-off-by: Abhimanyu Saini <abhimanyu.saini@nxp.com>

And the last signed-of-by should be from the person posting to the mailing list.


> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 00e7b90..24ce15d 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -453,6 +453,14 @@ void __init arm64_memblock_init(void)
>  	 * Register the kernel text, kernel data, initrd, and initial
>  	 * pagetables with memblock.
>  	 */

Please don't insert this code between 'memblock_reserve(__pa_symbol(_text), _end
- _text);' and the comment that describes it.


> +
> +	/* make these the first reservations to avoid chances of
> +	 * overlap
> +	 */

Nit: comment style


> +	reserve_elfcorehdr();

(Moving reserve_elfcorehdr() makes sense, but..)


> +	reserve_crashkernel();

reserve_crashkernel() does the allocation for the crashkernels reserved memory.
I expect this to always fail in the kdump kernel because there isn't enough
memory. (fdt_enforce_memory_region() at the top of this function calls
memblock_cap_memory_range()).

Moving this allocation above the early_init_fdt_scan_reserved_mem() below means
we may allocate memory for the crashdump that is in use by firmware/hardware and
described as reserved in the DT.


>  	memblock_reserve(__pa_symbol(_text), _end - _text);
>  #ifdef CONFIG_BLK_DEV_INITRD
>  	if (initrd_start) {
> @@ -472,10 +480,6 @@ void __init arm64_memblock_init(void)
>  	else
>  		arm64_dma_phys_limit = PHYS_MASK + 1;
>  
> -	reserve_crashkernel();
> -
> -	reserve_elfcorehdr();
> -
>  	high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
>  
>  	dma_contiguous_reserve(arm64_dma_phys_limit);
> 


Thanks,

James
Poonam Aggrwal Jan. 8, 2018, 4:31 a.m. UTC | #2
Thanks James for  the feedback.
Please find replies inline.

Regards
Poonam

> -----Original Message-----
> From: James Morse [mailto:james.morse@arm.com]
> Sent: Friday, January 05, 2018 4:37 PM
> To: Poonam Aggrwal <poonam.aggrwal@nxp.com>; linux-arm-
> kernel@lists.infradead.org
> Cc: takahiro.akashi@linaro.org; will.deacon@arm.com; G.h. Gao
> <guanhua.gao@nxp.com>; Abhimanyu Saini <abhimanyu.saini@nxp.com>
> Subject: Re: [PATCH][v2] arm64: Allocate elfcorehdr & crashkernel mem before
> any reservation
> 
> Hi Poonam,
> 
> On 04/01/18 15:34, Poonam Aggrwal wrote:
> > Address for both crashkernel memory and elfcorehdr can be assigned
> > statically. For crashkernel memory it is via crashkernel=SIZE@ADDRESS
> > and elfcorehdr_addr via by kexec-utils in dump kernel device tree.
> 
> There are some crashkernel=SIZE@ADDRESS values that the user can supply that
> we must reject. The obvious one is if it overlaps with the kernel text. (this patch
> won't spot this). We need to read the hardware's reserved regions from the DT
> before we allocate the crashkernel region, for example if the bootloader
> reserved a chunk of memory for a frame-buffer, I shouldn't be able to use that
> as crashkernel memory.
> 
> (you shouldn't need to specify an address, doing so prevents the kernel from
> picking memory it can use)
> 
> 
> > So memory should be reserved for both the above areas before any other
> > memory reservations are done. Otherwise overlaps may happen and memory
> > allocations will fail leading to failure in booting the dump capture
> > kernel.
> 
> > Signed-off-by: Guanhua <guanhua.gao@nxp.com>
> 
> The first signed-off-by should be the patch author.
> If you want to credit your colleagues you can use 'CC', or they can give
> reviewed-by/acked-by to the patch.
> 
> 
> > Signed-off-by: Poonam Aggrwal <poonam.aggrwal@nxp.com>
> > Signed-off-by: Abhimanyu Saini <abhimanyu.saini@nxp.com>
> 
> And the last signed-of-by should be from the person posting to the mailing list.
> 
Sure will correct this.
> 
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index
> > 00e7b90..24ce15d 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -453,6 +453,14 @@ void __init arm64_memblock_init(void)
> >  	 * Register the kernel text, kernel data, initrd, and initial
> >  	 * pagetables with memblock.
> >  	 */
> 
> Please don't insert this code between 'memblock_reserve(__pa_symbol(_text),
> _end
> - _text);' and the comment that describes it.
Thanks for catching this, will fix it.
> 
> 
> > +
> > +	/* make these the first reservations to avoid chances of
> > +	 * overlap
> > +	 */
> 
> Nit: comment style
Right, will fix it.
> 
> 
> > +	reserve_elfcorehdr();
> 
> (Moving reserve_elfcorehdr() makes sense, but..)
> 
> 
> > +	reserve_crashkernel();
> 
> reserve_crashkernel() does the allocation for the crashkernels reserved memory.
> I expect this to always fail in the kdump kernel because there isn't enough
> memory. (fdt_enforce_memory_region() at the top of this function calls
> memblock_cap_memory_range()).
> 
> Moving this allocation above the early_init_fdt_scan_reserved_mem() below
> means we may allocate memory for the crashdump that is in use by
> firmware/hardware and described as reserved in the DT.
Yeah, this is a good point. So ideally the address of the crash kernel should be diligently provided by the user based on the system.
> 
> 
> >  	memblock_reserve(__pa_symbol(_text), _end - _text);  #ifdef
> > CONFIG_BLK_DEV_INITRD
> >  	if (initrd_start) {
> > @@ -472,10 +480,6 @@ void __init arm64_memblock_init(void)
> >  	else
> >  		arm64_dma_phys_limit = PHYS_MASK + 1;
> >
> > -	reserve_crashkernel();
> > -
> > -	reserve_elfcorehdr();
> > -
> >  	high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
> >
> >  	dma_contiguous_reserve(arm64_dma_phys_limit);
> >
> 
> 
> Thanks,
> 
> James
James Morse Jan. 12, 2018, 5:58 p.m. UTC | #3
Hi Poonam,

On 08/01/18 04:31, Poonam Aggrwal wrote:
> James Morse wrote:
>> On 04/01/18 15:34, Poonam Aggrwal wrote:
>>> Address for both crashkernel memory and elfcorehdr can be assigned
>>> statically. For crashkernel memory it is via crashkernel=SIZE@ADDRESS
>>> and elfcorehdr_addr via by kexec-utils in dump kernel device tree.
>>
>> There are some crashkernel=SIZE@ADDRESS values that the user can supply that
>> we must reject. The obvious one is if it overlaps with the kernel text. (this patch
>> won't spot this). We need to read the hardware's reserved regions from the DT
>> before we allocate the crashkernel region, for example if the bootloader
>> reserved a chunk of memory for a frame-buffer, I shouldn't be able to use that
>> as crashkernel memory.
>>
>> (you shouldn't need to specify an address, doing so prevents the kernel from
>> picking memory it can use)
>>
>>
>>> So memory should be reserved for both the above areas before any other
>>> memory reservations are done. Otherwise overlaps may happen and memory
>>> allocations will fail leading to failure in booting the dump capture
>>> kernel.

>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index
>>> 00e7b90..24ce15d 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -453,6 +453,14 @@ void __init arm64_memblock_init(void)

>>> +	reserve_elfcorehdr();
>>
>> (Moving reserve_elfcorehdr() makes sense, but..)
>>
>>
>>> +	reserve_crashkernel();
>>
>> reserve_crashkernel() does the allocation for the crashkernels reserved memory.
>> I expect this to always fail in the kdump kernel because there isn't enough
>> memory. (fdt_enforce_memory_region() at the top of this function calls
>> memblock_cap_memory_range()).
>>
>> Moving this allocation above the early_init_fdt_scan_reserved_mem() below
>> means we may allocate memory for the crashdump that is in use by
>> firmware/hardware and described as reserved in the DT.

> Yeah, this is a good point. So ideally the address of the crash kernel should
> be diligently provided by the user based on the system.

Even better: the region to store the crash kernel in should be chosen by the kernel.
When using kdump I boot with 'crashkernel=1G', the kernel chooses where to place
the reserved region. Even if I specified a reasonable physical address, the
efistub may relocate the kernel over the top during boot as part of its KASLR work.

(Why does anyone ever need to specify an offset here?)


Thanks,

James
Poonam Aggrwal Jan. 13, 2018, 3:07 a.m. UTC | #4
Hi James

Regards
Poonam

> -----Original Message-----
> From: James Morse [mailto:james.morse@arm.com]
> Sent: Friday, January 12, 2018 11:29 PM
> To: Poonam Aggrwal <poonam.aggrwal@nxp.com>
> Cc: linux-arm-kernel@lists.infradead.org; takahiro.akashi@linaro.org;
> will.deacon@arm.com; G.h. Gao <guanhua.gao@nxp.com>; Abhimanyu Saini
> <abhimanyu.saini@nxp.com>; kexec@lists.infradead.org
> Subject: Re: [PATCH][v2] arm64: Allocate elfcorehdr & crashkernel mem before
> any reservation
> 
> Hi Poonam,
> 
> On 08/01/18 04:31, Poonam Aggrwal wrote:
> > James Morse wrote:
> >> On 04/01/18 15:34, Poonam Aggrwal wrote:
> >>> Address for both crashkernel memory and elfcorehdr can be assigned
> >>> statically. For crashkernel memory it is via
> >>> crashkernel=SIZE@ADDRESS and elfcorehdr_addr via by kexec-utils in dump
> kernel device tree.
> >>
> >> There are some crashkernel=SIZE@ADDRESS values that the user can
> >> supply that we must reject. The obvious one is if it overlaps with
> >> the kernel text. (this patch won't spot this). We need to read the
> >> hardware's reserved regions from the DT before we allocate the
> >> crashkernel region, for example if the bootloader reserved a chunk of
> >> memory for a frame-buffer, I shouldn't be able to use that as crashkernel
> memory.
> >>
> >> (you shouldn't need to specify an address, doing so prevents the
> >> kernel from picking memory it can use)
> >>
> >>
> >>> So memory should be reserved for both the above areas before any
> >>> other memory reservations are done. Otherwise overlaps may happen
> >>> and memory allocations will fail leading to failure in booting the
> >>> dump capture kernel.
> 
> >>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index
> >>> 00e7b90..24ce15d 100644
> >>> --- a/arch/arm64/mm/init.c
> >>> +++ b/arch/arm64/mm/init.c
> >>> @@ -453,6 +453,14 @@ void __init arm64_memblock_init(void)
> 
> >>> +	reserve_elfcorehdr();
> >>
> >> (Moving reserve_elfcorehdr() makes sense, but..)
> >>
> >>
> >>> +	reserve_crashkernel();
> >>
> >> reserve_crashkernel() does the allocation for the crashkernels reserved
> memory.
> >> I expect this to always fail in the kdump kernel because there isn't
> >> enough memory. (fdt_enforce_memory_region() at the top of this
> >> function calls memblock_cap_memory_range()).
> >>
> >> Moving this allocation above the early_init_fdt_scan_reserved_mem()
> >> below means we may allocate memory for the crashdump that is in use
> >> by firmware/hardware and described as reserved in the DT.
> 
> > Yeah, this is a good point. So ideally the address of the crash kernel
> > should be diligently provided by the user based on the system.
> 
> Even better: the region to store the crash kernel in should be chosen by the
> kernel.
> When using kdump I boot with 'crashkernel=1G', the kernel chooses where to
> place the reserved region.
Right. This is a commonly used.
 Even if I specified a reasonable physical address, the
> efistub may relocate the kernel over the top during boot as part of its KASLR
> work.
Agree
> 
> (Why does anyone ever need to specify an offset here?)
offset is normally an optional argument. Request Takahiro to provide his inputs.  Does this imply any updates in kexec design/implementation/documentation?

> 
> 
> Thanks,
> 
> James
Bhupesh Sharma Jan. 15, 2018, 4:44 a.m. UTC | #5
Hi Poonam, James

On Sat, Jan 13, 2018 at 8:37 AM, Poonam Aggrwal <poonam.aggrwal@nxp.com> wrote:
> Hi James
>
> Regards
> Poonam
>
>> -----Original Message-----
>> From: James Morse [mailto:james.morse@arm.com]
>> Sent: Friday, January 12, 2018 11:29 PM
>> To: Poonam Aggrwal <poonam.aggrwal@nxp.com>
>> Cc: linux-arm-kernel@lists.infradead.org; takahiro.akashi@linaro.org;
>> will.deacon@arm.com; G.h. Gao <guanhua.gao@nxp.com>; Abhimanyu Saini
>> <abhimanyu.saini@nxp.com>; kexec@lists.infradead.org
>> Subject: Re: [PATCH][v2] arm64: Allocate elfcorehdr & crashkernel mem before
>> any reservation
>>
>> Hi Poonam,
>>
>> On 08/01/18 04:31, Poonam Aggrwal wrote:
>> > James Morse wrote:
>> >> On 04/01/18 15:34, Poonam Aggrwal wrote:
>> >>> Address for both crashkernel memory and elfcorehdr can be assigned
>> >>> statically. For crashkernel memory it is via
>> >>> crashkernel=SIZE@ADDRESS and elfcorehdr_addr via by kexec-utils in dump
>> kernel device tree.
>> >>
>> >> There are some crashkernel=SIZE@ADDRESS values that the user can
>> >> supply that we must reject. The obvious one is if it overlaps with
>> >> the kernel text. (this patch won't spot this). We need to read the
>> >> hardware's reserved regions from the DT before we allocate the
>> >> crashkernel region, for example if the bootloader reserved a chunk of
>> >> memory for a frame-buffer, I shouldn't be able to use that as crashkernel
>> memory.
>> >>
>> >> (you shouldn't need to specify an address, doing so prevents the
>> >> kernel from picking memory it can use)
>> >>
>> >>
>> >>> So memory should be reserved for both the above areas before any
>> >>> other memory reservations are done. Otherwise overlaps may happen
>> >>> and memory allocations will fail leading to failure in booting the
>> >>> dump capture kernel.
>>
>> >>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index
>> >>> 00e7b90..24ce15d 100644
>> >>> --- a/arch/arm64/mm/init.c
>> >>> +++ b/arch/arm64/mm/init.c
>> >>> @@ -453,6 +453,14 @@ void __init arm64_memblock_init(void)
>>
>> >>> + reserve_elfcorehdr();
>> >>
>> >> (Moving reserve_elfcorehdr() makes sense, but..)
>> >>
>> >>
>> >>> + reserve_crashkernel();
>> >>
>> >> reserve_crashkernel() does the allocation for the crashkernels reserved
>> memory.
>> >> I expect this to always fail in the kdump kernel because there isn't
>> >> enough memory. (fdt_enforce_memory_region() at the top of this
>> >> function calls memblock_cap_memory_range()).
>> >>
>> >> Moving this allocation above the early_init_fdt_scan_reserved_mem()
>> >> below means we may allocate memory for the crashdump that is in use
>> >> by firmware/hardware and described as reserved in the DT.
>>
>> > Yeah, this is a good point. So ideally the address of the crash kernel
>> > should be diligently provided by the user based on the system.
>>
>> Even better: the region to store the crash kernel in should be chosen by the
>> kernel.
>> When using kdump I boot with 'crashkernel=1G', the kernel chooses where to
>> place the reserved region.
> Right. This is a commonly used.
>  Even if I specified a reasonable physical address, the
>> efistub may relocate the kernel over the top during boot as part of its KASLR
>> work.
> Agree
>>
>> (Why does anyone ever need to specify an offset here?)
> offset is normally an optional argument. Request Takahiro to provide his inputs.  Does this imply any updates in kexec design/implementation/documentation?

offset is a optional argument. For relocatable kernels (and kernels
which support KASLR), specifying offset is normally not needed.

Please refer to the 'Extended crashkernel syntax' documentation
(<https://github.com/torvalds/linux/blob/master/Documentation/kdump/kdump.txt#L259>):

Extended crashkernel syntax
===========================

While the "crashkernel=size[@offset]" syntax is sufficient for most
configurations, sometimes it's handy to have the reserved memory dependent
on the value of System RAM -- that's mostly for distributors that pre-setup
the kernel command line to avoid a unbootable system after some memory has
been removed from the machine.

The syntax is:

    crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
    range=start-[end]

For example:

    crashkernel=512M-2G:64M,2G-:128M

This would mean:

    1) if the RAM is smaller than 512M, then don't reserve anything
       (this is the "rescue" case)
    2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
    3) if the RAM size is larger than 2G, then reserve 128M

As James mentioned for arm64, in case of relocatable/kaslr kernels,
the efistub may relocate the kernel over the top during boot as part
of its KASLR. So, the offset field may make more sense for
non-relocatable/static kernels, but for newer kernels, its better to
use the 'Extended crashkernel syntax' syntax which is also supported
by newer distribution versions.

For e.g. see ubuntu trusty kdump-config man page -
<http://manpages.ubuntu.com/manpages/trusty/man8/kdump-config.8.html>:

  kdump kernel relocation address does not match crashkernel= parameter:
              For non-relocatable architectures,  the  kdump  kernel  must  be
              built   with   a  predetermined  start  address.   This  message
              indicates that the start address of the  kdump  kernel  and  the
              start address in the crashkernel= parameter do not match.

Regards,
Bhupesh
AKASHI Takahiro Jan. 16, 2018, 7:07 a.m. UTC | #6
On Mon, Jan 15, 2018 at 10:14:05AM +0530, Bhupesh SHARMA wrote:
> Hi Poonam, James
> 
> On Sat, Jan 13, 2018 at 8:37 AM, Poonam Aggrwal <poonam.aggrwal@nxp.com> wrote:
> > Hi James
> >
> > Regards
> > Poonam
> >
> >> -----Original Message-----
> >> From: James Morse [mailto:james.morse@arm.com]
> >> Sent: Friday, January 12, 2018 11:29 PM
> >> To: Poonam Aggrwal <poonam.aggrwal@nxp.com>
> >> Cc: linux-arm-kernel@lists.infradead.org; takahiro.akashi@linaro.org;
> >> will.deacon@arm.com; G.h. Gao <guanhua.gao@nxp.com>; Abhimanyu Saini
> >> <abhimanyu.saini@nxp.com>; kexec@lists.infradead.org
> >> Subject: Re: [PATCH][v2] arm64: Allocate elfcorehdr & crashkernel mem before
> >> any reservation
> >>
> >> Hi Poonam,
> >>
> >> On 08/01/18 04:31, Poonam Aggrwal wrote:
> >> > James Morse wrote:
> >> >> On 04/01/18 15:34, Poonam Aggrwal wrote:
> >> >>> Address for both crashkernel memory and elfcorehdr can be assigned
> >> >>> statically. For crashkernel memory it is via
> >> >>> crashkernel=SIZE@ADDRESS and elfcorehdr_addr via by kexec-utils in dump
> >> kernel device tree.
> >> >>
> >> >> There are some crashkernel=SIZE@ADDRESS values that the user can
> >> >> supply that we must reject. The obvious one is if it overlaps with
> >> >> the kernel text. (this patch won't spot this). We need to read the
> >> >> hardware's reserved regions from the DT before we allocate the
> >> >> crashkernel region, for example if the bootloader reserved a chunk of
> >> >> memory for a frame-buffer, I shouldn't be able to use that as crashkernel
> >> memory.
> >> >>
> >> >> (you shouldn't need to specify an address, doing so prevents the
> >> >> kernel from picking memory it can use)
> >> >>
> >> >>
> >> >>> So memory should be reserved for both the above areas before any
> >> >>> other memory reservations are done. Otherwise overlaps may happen
> >> >>> and memory allocations will fail leading to failure in booting the
> >> >>> dump capture kernel.
> >>
> >> >>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index
> >> >>> 00e7b90..24ce15d 100644
> >> >>> --- a/arch/arm64/mm/init.c
> >> >>> +++ b/arch/arm64/mm/init.c
> >> >>> @@ -453,6 +453,14 @@ void __init arm64_memblock_init(void)
> >>
> >> >>> + reserve_elfcorehdr();
> >> >>
> >> >> (Moving reserve_elfcorehdr() makes sense, but..)
> >> >>
> >> >>
> >> >>> + reserve_crashkernel();
> >> >>
> >> >> reserve_crashkernel() does the allocation for the crashkernels reserved
> >> memory.
> >> >> I expect this to always fail in the kdump kernel because there isn't
> >> >> enough memory. (fdt_enforce_memory_region() at the top of this
> >> >> function calls memblock_cap_memory_range()).
> >> >>
> >> >> Moving this allocation above the early_init_fdt_scan_reserved_mem()
> >> >> below means we may allocate memory for the crashdump that is in use
> >> >> by firmware/hardware and described as reserved in the DT.
> >>
> >> > Yeah, this is a good point. So ideally the address of the crash kernel
> >> > should be diligently provided by the user based on the system.
> >>
> >> Even better: the region to store the crash kernel in should be chosen by the
> >> kernel.
> >> When using kdump I boot with 'crashkernel=1G', the kernel chooses where to
> >> place the reserved region.
> > Right. This is a commonly used.
> >  Even if I specified a reasonable physical address, the
> >> efistub may relocate the kernel over the top during boot as part of its KASLR
> >> work.
> > Agree
> >>
> >> (Why does anyone ever need to specify an offset here?)
> > offset is normally an optional argument. Request Takahiro to provide his inputs.  Does this imply any updates in kexec design/implementation/documentation?
> 
> offset is a optional argument. For relocatable kernels (and kernels
> which support KASLR), specifying offset is normally not needed.
> 
> Please refer to the 'Extended crashkernel syntax' documentation
> (<https://github.com/torvalds/linux/blob/master/Documentation/kdump/kdump.txt#L259>):
> 
> Extended crashkernel syntax
> ===========================
> 
> While the "crashkernel=size[@offset]" syntax is sufficient for most
> configurations, sometimes it's handy to have the reserved memory dependent
> on the value of System RAM -- that's mostly for distributors that pre-setup
> the kernel command line to avoid a unbootable system after some memory has
> been removed from the machine.
> 
> The syntax is:
> 
>     crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
>     range=start-[end]
> 
> For example:
> 
>     crashkernel=512M-2G:64M,2G-:128M
> 
> This would mean:
> 
>     1) if the RAM is smaller than 512M, then don't reserve anything
>        (this is the "rescue" case)
>     2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
>     3) if the RAM size is larger than 2G, then reserve 128M
> 
> As James mentioned for arm64, in case of relocatable/kaslr kernels,
> the efistub may relocate the kernel over the top during boot as part
> of its KASLR.

It would be sad if we couldn't specify kaslr and kdump at the same time.
Since kaslr will skip any of memory regions whose attributes are not
CONVENTIONAL_MEMORY for allocating a relocated kernel image, we will be
able to have a dedicated range of memory reserved for kdump.
In this case, using an "offset" in "crashkernel=" will be crucial.

(I don't know how we can notify uefi of the region though.)

> So, the offset field may make more sense for
> non-relocatable/static kernels, but for newer kernels, its better to
> use the 'Extended crashkernel syntax' syntax which is also supported
> by newer distribution versions.

How better is it for the case?

I don't know exactly what you mean by "newer kernel/distribution", but
kdump on arm64 supports this feature from the day one.
(It is basically independent from architectures.)

Thanks,
-Takahiro AKASHI


> For e.g. see ubuntu trusty kdump-config man page -
> <http://manpages.ubuntu.com/manpages/trusty/man8/kdump-config.8.html>:
> 
>   kdump kernel relocation address does not match crashkernel= parameter:
>               For non-relocatable architectures,  the  kdump  kernel  must  be
>               built   with   a  predetermined  start  address.   This  message
>               indicates that the start address of the  kdump  kernel  and  the
>               start address in the crashkernel= parameter do not match.
> 
> Regards,
> Bhupesh
James Morse Jan. 19, 2018, 12:16 p.m. UTC | #7
Hello,

On 16/01/18 07:07, takahiro.akashi@linaro.org wrote:
> On Mon, Jan 15, 2018 at 10:14:05AM +0530, Bhupesh SHARMA wrote:
>> On Sat, Jan 13, 2018 at 8:37 AM, Poonam Aggrwal <poonam.aggrwal@nxp.com> wrote:
>>>> On 08/01/18 04:31, Poonam Aggrwal wrote:
>>>>> Yeah, this is a good point. So ideally the address of the crash kernel
>>>>> should be diligently provided by the user based on the system.

>>>> Even better: the region to store the crash kernel in should be chosen by the
>>>> kernel.
>>>> When using kdump I boot with 'crashkernel=1G', the kernel chooses where to
>>>> place the reserved region.
>>>> Even if I specified a reasonable physical address, the
>>>> efistub may relocate the kernel over the top during boot as part of its KASLR
>>>> work.

>>> Agree
>>>>
>>>> (Why does anyone ever need to specify an offset here?)
>>> offset is normally an optional argument. Request Takahiro to provide his inputs.  Does this imply any updates in kexec design/implementation/documentation?
>>
>> offset is a optional argument. For relocatable kernels (and kernels
>> which support KASLR), specifying offset is normally not needed.
>>
>> Please refer to the 'Extended crashkernel syntax' documentation
>> (<https://github.com/torvalds/linux/blob/master/Documentation/kdump/kdump.txt#L259>):
>>
>> Extended crashkernel syntax
>> ===========================
>>
>> While the "crashkernel=size[@offset]" syntax is sufficient for most
>> configurations, sometimes it's handy to have the reserved memory dependent
>> on the value of System RAM -- that's mostly for distributors that pre-setup
>> the kernel command line to avoid a unbootable system after some memory has
>> been removed from the machine.

This is to let the distribution provide a single value that works on machines
with very little RAM, and machines that need gigabytes in order to boot.

Where does specifying the offset/absolute-address come in?


>> As James mentioned for arm64, in case of relocatable/kaslr kernels,
>> the efistub may relocate the kernel over the top during boot as part
>> of its KASLR.
> 
> It would be sad if we couldn't specify kaslr and kdump at the same time.

We can. The problem comes when you specify an absolute-address that should be
reserved for user-space to eventually load the kdump kernel into. This is
fragile for a number of reasons.


> Since kaslr will skip any of memory regions whose attributes are not
> CONVENTIONAL_MEMORY for allocating a relocated kernel image, we will be
> able to have a dedicated range of memory reserved for kdump.
> In this case, using an "offset" in "crashkernel=" will be crucial.
> 
> (I don't know how we can notify uefi of the region though.)

I'm confused. panic()->kdump:boot doesn't go via UEFI. It passes information in
the DT:/chosen that may have been generated by the EFIStub, but it doesn't (and
must not) change the EFI memory map.

We need to decide where the crashdump kernel region is when the first-kernel
generates its page tables, as the protect/unprotect mechanism wants to be able
to unmap them.
There is no reason for UEFI to know about the kdump region, BootServices are
long-gone by the time its location has to be decided.


>> So, the offset field may make more sense for
>> non-relocatable/static kernels, but for newer kernels, its better to
>> use the 'Extended crashkernel syntax' syntax which is also supported
>> by newer distribution versions.

(this extended crashkernel syntax looks like a tangent: its about specifying
one-value string to specify a reservation-size on both small-memory and
large-memory machines).

I don't think 'relocatable kernels' are relevant here. The KASLR series changed
the kernel to no longer run from the linear map, so where in the linear map we
allocate memory for the crash-kernel to boot from can't matter. These changes
were merged before kexec/kdump support was added.

Even before this change, (I recall that:) the kernel would discard memory below
its text. This isn't a problem as kexec-tools (typically) locates the kernel at
the bottom of the region, and physical memory outside this range isn't
accessible anyway because of the "linux,usable-memory" property. When we do want
to access it, we remap it using the vmcore helpers.

I think this @offset must be for kernels that have to run from a
physical/virtual address that is known at compile time. We don't have this
problem on arm64, and specifying @offset makes kdump less reliable:
| cannot reserve crashkernel: region overlaps reserved memory


> How better is it for the case?
> 
> I don't know exactly what you mean by "newer kernel/distribution", but
> kdump on arm64 supports this feature from the day one.
> (It is basically independent from architectures.)

Support @offset? Yes, its core code allowing this.


>> For e.g. see ubuntu trusty kdump-config man page -
>> <http://manpages.ubuntu.com/manpages/trusty/man8/kdump-config.8.html>:
>>
>>   kdump kernel relocation address does not match crashkernel= parameter:
>>               For non-relocatable architectures,  the  kdump  kernel  must  be
>>               built   with   a  predetermined  start  address.   This  message
>>               indicates that the start address of the  kdump  kernel  and  the
>>               start address in the crashkernel= parameter do not match.

arm64 doesn't have this issue. The 'predetermined start address' is a relative
value stored in the header. This is so the bootloader can place the kernel
anywhere in memory and still have it boot.

I think we're in the weeds here: adding @offset to your 'crashkernel=' cmdline
option tells the kernel you know this address will be free and not-reserved when
it comes to reservation time. This isn't generally true.
Unless you wrote the DT and the bootloader, you can't know this.


Wasn't this patch moving the elfcorehdr reservation up to be before any dynamic
reservations, to prevent them overlapping?


Thanks,

James
Bhupesh Sharma Jan. 23, 2018, 5:08 a.m. UTC | #8
Hi,

On Fri, Jan 19, 2018 at 5:46 PM, James Morse <james.morse@arm.com> wrote:
> Hello,
>
> On 16/01/18 07:07, takahiro.akashi@linaro.org wrote:
>> On Mon, Jan 15, 2018 at 10:14:05AM +0530, Bhupesh SHARMA wrote:
>>> On Sat, Jan 13, 2018 at 8:37 AM, Poonam Aggrwal <poonam.aggrwal@nxp.com> wrote:
>>>>> On 08/01/18 04:31, Poonam Aggrwal wrote:
>>>>>> Yeah, this is a good point. So ideally the address of the crash kernel
>>>>>> should be diligently provided by the user based on the system.
>
>>>>> Even better: the region to store the crash kernel in should be chosen by the
>>>>> kernel.
>>>>> When using kdump I boot with 'crashkernel=1G', the kernel chooses where to
>>>>> place the reserved region.
>>>>> Even if I specified a reasonable physical address, the
>>>>> efistub may relocate the kernel over the top during boot as part of its KASLR
>>>>> work.
>
>>>> Agree
>>>>>
>>>>> (Why does anyone ever need to specify an offset here?)
>>>> offset is normally an optional argument. Request Takahiro to provide his inputs.  Does this imply any updates in kexec design/implementation/documentation?
>>>
>>> offset is a optional argument. For relocatable kernels (and kernels
>>> which support KASLR), specifying offset is normally not needed.
>>>
>>> Please refer to the 'Extended crashkernel syntax' documentation
>>> (<https://github.com/torvalds/linux/blob/master/Documentation/kdump/kdump.txt#L259>):
>>>
>>> Extended crashkernel syntax
>>> ===========================
>>>
>>> While the "crashkernel=size[@offset]" syntax is sufficient for most
>>> configurations, sometimes it's handy to have the reserved memory dependent
>>> on the value of System RAM -- that's mostly for distributors that pre-setup
>>> the kernel command line to avoid a unbootable system after some memory has
>>> been removed from the machine.
>
> This is to let the distribution provide a single value that works on machines
> with very little RAM, and machines that need gigabytes in order to boot.
>
> Where does specifying the offset/absolute-address come in?

It comes into picture where we know at compile time where we want to
place the crashkernel at.
I remember using it on a few arm 32-bit machines but they had a static
and well defined layout of the memory
map from the available system RAM, and required crashkernel to be
placed at predefined offset to prevent 'undefined' behaviour.

I don't think this is relevant though now with the newer KASLR enabled
kernels where we can RANDOMIZE the _text and also the module load
area.
So using a static offset value in this case would be risky to say the least ...

>
>>> As James mentioned for arm64, in case of relocatable/kaslr kernels,
>>> the efistub may relocate the kernel over the top during boot as part
>>> of its KASLR.
>>
>> It would be sad if we couldn't specify kaslr and kdump at the same time.
>
> We can. The problem comes when you specify an absolute-address that should be
> reserved for user-space to eventually load the kdump kernel into. This is
> fragile for a number of reasons.

... Fedora allows using kdump and KASLR both, so I am not sure that's a problem.

>> Since kaslr will skip any of memory regions whose attributes are not
>> CONVENTIONAL_MEMORY for allocating a relocated kernel image, we will be
>> able to have a dedicated range of memory reserved for kdump.
>> In this case, using an "offset" in "crashkernel=" will be crucial.
>>
>> (I don't know how we can notify uefi of the region though.)
>
> I'm confused. panic()->kdump:boot doesn't go via UEFI. It passes information in
> the DT:/chosen that may have been generated by the EFIStub, but it doesn't (and
> must not) change the EFI memory map.
>
> We need to decide where the crashdump kernel region is when the first-kernel
> generates its page tables, as the protect/unprotect mechanism wants to be able
> to unmap them.
> There is no reason for UEFI to know about the kdump region, BootServices are
> long-gone by the time its location has to be decided.

>>> So, the offset field may make more sense for
>>> non-relocatable/static kernels, but for newer kernels, its better to
>>> use the 'Extended crashkernel syntax' syntax which is also supported
>>> by newer distribution versions.
>
> (this extended crashkernel syntax looks like a tangent: its about specifying
> one-value string to specify a reservation-size on both small-memory and
> large-memory machines).
>
> I don't think 'relocatable kernels' are relevant here. The KASLR series changed
> the kernel to no longer run from the linear map, so where in the linear map we
> allocate memory for the crash-kernel to boot from can't matter. These changes
> were merged before kexec/kdump support was added.

Agreed.

> Even before this change, (I recall that:) the kernel would discard memory below
> its text. This isn't a problem as kexec-tools (typically) locates the kernel at
> the bottom of the region, and physical memory outside this range isn't
> accessible anyway because of the "linux,usable-memory" property. When we do want
> to access it, we remap it using the vmcore helpers.
>
> I think this @offset must be for kernels that have to run from a
> physical/virtual address that is known at compile time. We don't have this
> problem on arm64, and specifying @offset makes kdump less reliable:
> | cannot reserve crashkernel: region overlaps reserved memory
>
>
>> How better is it for the case?
>>
>> I don't know exactly what you mean by "newer kernel/distribution", but
>> kdump on arm64 supports this feature from the day one.
>> (It is basically independent from architectures.)
>
> Support @offset? Yes, its core code allowing this.
>
>
>>> For e.g. see ubuntu trusty kdump-config man page -
>>> <http://manpages.ubuntu.com/manpages/trusty/man8/kdump-config.8.html>:
>>>
>>>   kdump kernel relocation address does not match crashkernel= parameter:
>>>               For non-relocatable architectures,  the  kdump  kernel  must  be
>>>               built   with   a  predetermined  start  address.   This  message
>>>               indicates that the start address of the  kdump  kernel  and  the
>>>               start address in the crashkernel= parameter do not match.
>
> arm64 doesn't have this issue. The 'predetermined start address' is a relative
> value stored in the header. This is so the bootloader can place the kernel
> anywhere in memory and still have it boot.
>
> I think we're in the weeds here: adding @offset to your 'crashkernel=' cmdline
> option tells the kernel you know this address will be free and not-reserved when
> it comes to reservation time. This isn't generally true.
> Unless you wrote the DT and the bootloader, you can't know this.

Agreed. See my comments above. What I meant was that using @offset with newer
kernels/distributions which support KASLR can lead to undefined
behavior. What I tried to capture
from the ubuntu trusty man-page was that "kdump kernel must be built
with a predetermined start  address
for non-relocatable architectures".

Normally we use the same primary kernel as the kdump kernel, so if the
primary kernels is Relocatable + KASLR enabled,
the kdump kernel would support the same as well. In that case assuming
a fixed offset at which the kdump kernel can be loaded at or assuming
a static
start address for the kdump kernel can lead to issues.

Regards,
Bhupesh

> Wasn't this patch moving the elfcorehdr reservation up to be before any dynamic
> reservations, to prevent them overlapping?
>
>
> Thanks,
>
> James
diff mbox

Patch

diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 00e7b90..24ce15d 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -453,6 +453,14 @@  void __init arm64_memblock_init(void)
 	 * Register the kernel text, kernel data, initrd, and initial
 	 * pagetables with memblock.
 	 */
+
+	/* make these the first reservations to avoid chances of
+	 * overlap
+	 */
+	reserve_elfcorehdr();
+
+	reserve_crashkernel();
+
 	memblock_reserve(__pa_symbol(_text), _end - _text);
 #ifdef CONFIG_BLK_DEV_INITRD
 	if (initrd_start) {
@@ -472,10 +480,6 @@  void __init arm64_memblock_init(void)
 	else
 		arm64_dma_phys_limit = PHYS_MASK + 1;
 
-	reserve_crashkernel();
-
-	reserve_elfcorehdr();
-
 	high_memory = __va(memblock_end_of_DRAM() - 1) + 1;
 
 	dma_contiguous_reserve(arm64_dma_phys_limit);