diff mbox series

mm/hotplug: Only respect mem= parameter during boot stage

Message ID 20191206150524.14687-1-bhe@redhat.com (mailing list archive)
State New, archived
Headers show
Series mm/hotplug: Only respect mem= parameter during boot stage | expand

Commit Message

Baoquan He Dec. 6, 2019, 3:05 p.m. UTC
In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
parameter") a global varialbe global max_mem_size is added to store
the value which is parsed from 'mem= '. This truly stops those
DIMM from being added into system memory during boot.

However, it also limits the later memory hotplug functionality. Any
memory board can't be hot added any more if its region is beyond the
max_mem_size. System will print error like below:

[  216.387164] acpi PNP0C80:02: add_memory failed
[  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
[  216.392187] acpi PNP0C80:02: Enumeration failure

From document of 'mem =' parameter, it should be a restriction during
boot, but not impact the system memory adding/removing after booting.

  mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory

So fix it by also checking if it's during SYSTEM_BOOTING stage when
restrict memory adding. Otherwise, skip the restriction.

Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/memory_hotplug.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

David Hildenbrand Dec. 9, 2019, 9:43 a.m. UTC | #1
On 06.12.19 16:05, Baoquan He wrote:
> In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
> parameter") a global varialbe global max_mem_size is added to store
> the value which is parsed from 'mem= '. This truly stops those
> DIMM from being added into system memory during boot.
> 
> However, it also limits the later memory hotplug functionality. Any
> memory board can't be hot added any more if its region is beyond the
> max_mem_size. System will print error like below:
> 
> [  216.387164] acpi PNP0C80:02: add_memory failed
> [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
> [  216.392187] acpi PNP0C80:02: Enumeration failure
> 
> From document of 'mem =' parameter, it should be a restriction during
> boot, but not impact the system memory adding/removing after booting.
> 
>   mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
> 
> So fix it by also checking if it's during SYSTEM_BOOTING stage when
> restrict memory adding. Otherwise, skip the restriction.
> 
> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/memory_hotplug.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 55ac23ef11c1..5466a0a00901 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>  	char *resource_name = "System RAM";
>  
> -	if (start + size > max_mem_size)
> +	if (start + size > max_mem_size && system_state == SYSTEM_BOOTING)
>  		return ERR_PTR(-E2BIG);
>  
>  	/*
> 

Sounds sane to me.

Acked-by: David Hildenbrand <david@redhat.com>
Michal Hocko Dec. 9, 2019, 10:07 a.m. UTC | #2
On Fri 06-12-19 23:05:24, Baoquan He wrote:
> In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
> parameter") a global varialbe global max_mem_size is added to store
> the value which is parsed from 'mem= '. This truly stops those
> DIMM from being added into system memory during boot.
> 
> However, it also limits the later memory hotplug functionality. Any
> memory board can't be hot added any more if its region is beyond the
> max_mem_size. System will print error like below:
> 
> [  216.387164] acpi PNP0C80:02: add_memory failed
> [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
> [  216.392187] acpi PNP0C80:02: Enumeration failure
> 
> >From document of 'mem =' parameter, it should be a restriction during
> boot, but not impact the system memory adding/removing after booting.
> 
>   mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
> 
> So fix it by also checking if it's during SYSTEM_BOOTING stage when
> restrict memory adding. Otherwise, skip the restriction.

Could you be more specific about why the boot vs. later hotplug makes
any difference? The documentation is explicit about the boot time but
considering this seems to be like that since ever I strongly suspect
that this is just an omission.

Btw. how have you tested the situation fixed by 357b4da50a62?

> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
> Signed-off-by: Baoquan He <bhe@redhat.com>
> ---
>  mm/memory_hotplug.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 55ac23ef11c1..5466a0a00901 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>  	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>  	char *resource_name = "System RAM";
>  
> -	if (start + size > max_mem_size)
> +	if (start + size > max_mem_size && system_state == SYSTEM_BOOTING)
>  		return ERR_PTR(-E2BIG);
>  
>  	/*
> -- 
> 2.17.2
Jürgen Groß Dec. 9, 2019, 10:24 a.m. UTC | #3
On 09.12.19 11:07, Michal Hocko wrote:
> On Fri 06-12-19 23:05:24, Baoquan He wrote:
>> In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
>> parameter") a global varialbe global max_mem_size is added to store
>> the value which is parsed from 'mem= '. This truly stops those
>> DIMM from being added into system memory during boot.
>>
>> However, it also limits the later memory hotplug functionality. Any
>> memory board can't be hot added any more if its region is beyond the
>> max_mem_size. System will print error like below:
>>
>> [  216.387164] acpi PNP0C80:02: add_memory failed
>> [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
>> [  216.392187] acpi PNP0C80:02: Enumeration failure
>>
>> >From document of 'mem =' parameter, it should be a restriction during
>> boot, but not impact the system memory adding/removing after booting.
>>
>>    mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
>>
>> So fix it by also checking if it's during SYSTEM_BOOTING stage when
>> restrict memory adding. Otherwise, skip the restriction.
> 
> Could you be more specific about why the boot vs. later hotplug makes
> any difference? The documentation is explicit about the boot time but
> considering this seems to be like that since ever I strongly suspect
> that this is just an omission.
> 
> Btw. how have you tested the situation fixed by 357b4da50a62?

I guess he hasn't.

The backtrace of the problem at that time was:

[ 8321.876844]  [<ffffffff81019ab9>] dump_trace+0x59/0x340
[ 8321.882683]  [<ffffffff81019e8a>] show_stack_log_lvl+0xea/0x170
[ 8321.889298]  [<ffffffff8101ac31>] show_stack+0x21/0x40
[ 8321.895043]  [<ffffffff81319530>] dump_stack+0x5c/0x7c
[ 8321.900779]  [<ffffffff8107fbf1>] warn_slowpath_common+0x81/0xb0
[ 8321.907482]  [<ffffffff81009f54>] xen_alloc_pte+0x1d4/0x390
[ 8321.913718]  [<ffffffff81064950>] 
pmd_populate_kernel.constprop.6+0x40/0x80
[ 8321.921498]  [<ffffffff815ef0a8>] phys_pmd_init+0x210/0x255
[ 8321.927724]  [<ffffffff815ef2c7>] phys_pud_init+0x1da/0x247
[ 8321.933951]  [<ffffffff815efb81>] kernel_physical_mapping_init+0xf5/0x1d4
[ 8321.941533]  [<ffffffff815ebc7d>] init_memory_mapping+0x18d/0x380
[ 8321.948341]  [<ffffffff810647f9>] arch_add_memory+0x59/0xf0
[ 8321.954570]  [<ffffffff815eceed>] add_memory_resource+0x8d/0x160
[ 8321.961283]  [<ffffffff815ecff2>] add_memory+0x32/0xf0
[ 8321.967025]  [<ffffffff813e1c91>] acpi_memory_device_add+0x131/0x2e0
[ 8321.974128]  [<ffffffff8139f752>] acpi_bus_attach+0xe2/0x190
[ 8321.980453]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
[ 8321.986778]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
[ 8321.993103]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
[ 8321.999428]  [<ffffffff813a1157>] acpi_bus_scan+0x37/0x70
[ 8322.005461]  [<ffffffff81fba955>] acpi_scan_init+0x77/0x1b4
[ 8322.011690]  [<ffffffff81fba70c>] acpi_init+0x297/0x2b3
[ 8322.017530]  [<ffffffff8100213a>] do_one_initcall+0xca/0x1f0
[ 8322.023855]  [<ffffffff81f74266>] kernel_init_freeable+0x194/0x226
[ 8322.030760]  [<ffffffff815eb1ba>] kernel_init+0xa/0xe0
[ 8322.036503]  [<ffffffff815f7bc5>] ret_from_fork+0x55/0x80

So this patch would break it again.

I'd recommend ...

> 
>> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
>> Signed-off-by: Baoquan He <bhe@redhat.com>
>> ---
>>   mm/memory_hotplug.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 55ac23ef11c1..5466a0a00901 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>   	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>   	char *resource_name = "System RAM";
>>   
>> -	if (start + size > max_mem_size)
>> +	if (start + size > max_mem_size && system_state == SYSTEM_BOOTING)

... changing this to: ... && system_state != SYSTEM_RUNNING

With that you can add my:

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
David Hildenbrand Dec. 9, 2019, 11:01 a.m. UTC | #4
On 09.12.19 11:24, Jürgen Groß wrote:
> On 09.12.19 11:07, Michal Hocko wrote:
>> On Fri 06-12-19 23:05:24, Baoquan He wrote:
>>> In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
>>> parameter") a global varialbe global max_mem_size is added to store
>>> the value which is parsed from 'mem= '. This truly stops those
>>> DIMM from being added into system memory during boot.
>>>
>>> However, it also limits the later memory hotplug functionality. Any
>>> memory board can't be hot added any more if its region is beyond the
>>> max_mem_size. System will print error like below:
>>>
>>> [  216.387164] acpi PNP0C80:02: add_memory failed
>>> [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
>>> [  216.392187] acpi PNP0C80:02: Enumeration failure
>>>
>>> >From document of 'mem =' parameter, it should be a restriction during
>>> boot, but not impact the system memory adding/removing after booting.
>>>
>>>    mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
>>>
>>> So fix it by also checking if it's during SYSTEM_BOOTING stage when
>>> restrict memory adding. Otherwise, skip the restriction.
>>
>> Could you be more specific about why the boot vs. later hotplug makes
>> any difference? The documentation is explicit about the boot time but
>> considering this seems to be like that since ever I strongly suspect
>> that this is just an omission.
>>
>> Btw. how have you tested the situation fixed by 357b4da50a62?
> 
> I guess he hasn't.
> 
> The backtrace of the problem at that time was:
> 
> [ 8321.876844]  [<ffffffff81019ab9>] dump_trace+0x59/0x340
> [ 8321.882683]  [<ffffffff81019e8a>] show_stack_log_lvl+0xea/0x170
> [ 8321.889298]  [<ffffffff8101ac31>] show_stack+0x21/0x40
> [ 8321.895043]  [<ffffffff81319530>] dump_stack+0x5c/0x7c
> [ 8321.900779]  [<ffffffff8107fbf1>] warn_slowpath_common+0x81/0xb0
> [ 8321.907482]  [<ffffffff81009f54>] xen_alloc_pte+0x1d4/0x390
> [ 8321.913718]  [<ffffffff81064950>] 
> pmd_populate_kernel.constprop.6+0x40/0x80
> [ 8321.921498]  [<ffffffff815ef0a8>] phys_pmd_init+0x210/0x255
> [ 8321.927724]  [<ffffffff815ef2c7>] phys_pud_init+0x1da/0x247
> [ 8321.933951]  [<ffffffff815efb81>] kernel_physical_mapping_init+0xf5/0x1d4
> [ 8321.941533]  [<ffffffff815ebc7d>] init_memory_mapping+0x18d/0x380
> [ 8321.948341]  [<ffffffff810647f9>] arch_add_memory+0x59/0xf0
> [ 8321.954570]  [<ffffffff815eceed>] add_memory_resource+0x8d/0x160
> [ 8321.961283]  [<ffffffff815ecff2>] add_memory+0x32/0xf0
> [ 8321.967025]  [<ffffffff813e1c91>] acpi_memory_device_add+0x131/0x2e0
> [ 8321.974128]  [<ffffffff8139f752>] acpi_bus_attach+0xe2/0x190
> [ 8321.980453]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
> [ 8321.986778]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
> [ 8321.993103]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
> [ 8321.999428]  [<ffffffff813a1157>] acpi_bus_scan+0x37/0x70
> [ 8322.005461]  [<ffffffff81fba955>] acpi_scan_init+0x77/0x1b4
> [ 8322.011690]  [<ffffffff81fba70c>] acpi_init+0x297/0x2b3
> [ 8322.017530]  [<ffffffff8100213a>] do_one_initcall+0xca/0x1f0
> [ 8322.023855]  [<ffffffff81f74266>] kernel_init_freeable+0x194/0x226
> [ 8322.030760]  [<ffffffff815eb1ba>] kernel_init+0xa/0xe0
> [ 8322.036503]  [<ffffffff815f7bc5>] ret_from_fork+0x55/0x80
> 
> So this patch would break it again.
> 
> I'd recommend ...
> 
>>
>>> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
>>> Signed-off-by: Baoquan He <bhe@redhat.com>
>>> ---
>>>   mm/memory_hotplug.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>> index 55ac23ef11c1..5466a0a00901 100644
>>> --- a/mm/memory_hotplug.c
>>> +++ b/mm/memory_hotplug.c
>>> @@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>>   	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>>   	char *resource_name = "System RAM";
>>>   
>>> -	if (start + size > max_mem_size)
>>> +	if (start + size > max_mem_size && system_state == SYSTEM_BOOTING)
> 
> ... changing this to: ... && system_state != SYSTEM_RUNNING

I think we usually use system_state < SYSTEM_RUNNING
Jürgen Groß Dec. 9, 2019, 11:08 a.m. UTC | #5
On 09.12.19 12:01, David Hildenbrand wrote:
> On 09.12.19 11:24, Jürgen Groß wrote:
>> On 09.12.19 11:07, Michal Hocko wrote:
>>> On Fri 06-12-19 23:05:24, Baoquan He wrote:
>>>> In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
>>>> parameter") a global varialbe global max_mem_size is added to store
>>>> the value which is parsed from 'mem= '. This truly stops those
>>>> DIMM from being added into system memory during boot.
>>>>
>>>> However, it also limits the later memory hotplug functionality. Any
>>>> memory board can't be hot added any more if its region is beyond the
>>>> max_mem_size. System will print error like below:
>>>>
>>>> [  216.387164] acpi PNP0C80:02: add_memory failed
>>>> [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
>>>> [  216.392187] acpi PNP0C80:02: Enumeration failure
>>>>
>>>> >From document of 'mem =' parameter, it should be a restriction during
>>>> boot, but not impact the system memory adding/removing after booting.
>>>>
>>>>     mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
>>>>
>>>> So fix it by also checking if it's during SYSTEM_BOOTING stage when
>>>> restrict memory adding. Otherwise, skip the restriction.
>>>
>>> Could you be more specific about why the boot vs. later hotplug makes
>>> any difference? The documentation is explicit about the boot time but
>>> considering this seems to be like that since ever I strongly suspect
>>> that this is just an omission.
>>>
>>> Btw. how have you tested the situation fixed by 357b4da50a62?
>>
>> I guess he hasn't.
>>
>> The backtrace of the problem at that time was:
>>
>> [ 8321.876844]  [<ffffffff81019ab9>] dump_trace+0x59/0x340
>> [ 8321.882683]  [<ffffffff81019e8a>] show_stack_log_lvl+0xea/0x170
>> [ 8321.889298]  [<ffffffff8101ac31>] show_stack+0x21/0x40
>> [ 8321.895043]  [<ffffffff81319530>] dump_stack+0x5c/0x7c
>> [ 8321.900779]  [<ffffffff8107fbf1>] warn_slowpath_common+0x81/0xb0
>> [ 8321.907482]  [<ffffffff81009f54>] xen_alloc_pte+0x1d4/0x390
>> [ 8321.913718]  [<ffffffff81064950>]
>> pmd_populate_kernel.constprop.6+0x40/0x80
>> [ 8321.921498]  [<ffffffff815ef0a8>] phys_pmd_init+0x210/0x255
>> [ 8321.927724]  [<ffffffff815ef2c7>] phys_pud_init+0x1da/0x247
>> [ 8321.933951]  [<ffffffff815efb81>] kernel_physical_mapping_init+0xf5/0x1d4
>> [ 8321.941533]  [<ffffffff815ebc7d>] init_memory_mapping+0x18d/0x380
>> [ 8321.948341]  [<ffffffff810647f9>] arch_add_memory+0x59/0xf0
>> [ 8321.954570]  [<ffffffff815eceed>] add_memory_resource+0x8d/0x160
>> [ 8321.961283]  [<ffffffff815ecff2>] add_memory+0x32/0xf0
>> [ 8321.967025]  [<ffffffff813e1c91>] acpi_memory_device_add+0x131/0x2e0
>> [ 8321.974128]  [<ffffffff8139f752>] acpi_bus_attach+0xe2/0x190
>> [ 8321.980453]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
>> [ 8321.986778]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
>> [ 8321.993103]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
>> [ 8321.999428]  [<ffffffff813a1157>] acpi_bus_scan+0x37/0x70
>> [ 8322.005461]  [<ffffffff81fba955>] acpi_scan_init+0x77/0x1b4
>> [ 8322.011690]  [<ffffffff81fba70c>] acpi_init+0x297/0x2b3
>> [ 8322.017530]  [<ffffffff8100213a>] do_one_initcall+0xca/0x1f0
>> [ 8322.023855]  [<ffffffff81f74266>] kernel_init_freeable+0x194/0x226
>> [ 8322.030760]  [<ffffffff815eb1ba>] kernel_init+0xa/0xe0
>> [ 8322.036503]  [<ffffffff815f7bc5>] ret_from_fork+0x55/0x80
>>
>> So this patch would break it again.
>>
>> I'd recommend ...
>>
>>>
>>>> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
>>>> Signed-off-by: Baoquan He <bhe@redhat.com>
>>>> ---
>>>>    mm/memory_hotplug.c | 2 +-
>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>> index 55ac23ef11c1..5466a0a00901 100644
>>>> --- a/mm/memory_hotplug.c
>>>> +++ b/mm/memory_hotplug.c
>>>> @@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>>>    	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>>>    	char *resource_name = "System RAM";
>>>>    
>>>> -	if (start + size > max_mem_size)
>>>> +	if (start + size > max_mem_size && system_state == SYSTEM_BOOTING)
>>
>> ... changing this to: ... && system_state != SYSTEM_RUNNING
> 
> I think we usually use system_state < SYSTEM_RUNNING
> 

Works for me as well. :-)


Juergen
David Hildenbrand Dec. 9, 2019, 11:11 a.m. UTC | #6
On 09.12.19 12:08, Jürgen Groß wrote:
> On 09.12.19 12:01, David Hildenbrand wrote:
>> On 09.12.19 11:24, Jürgen Groß wrote:
>>> On 09.12.19 11:07, Michal Hocko wrote:
>>>> On Fri 06-12-19 23:05:24, Baoquan He wrote:
>>>>> In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
>>>>> parameter") a global varialbe global max_mem_size is added to store
>>>>> the value which is parsed from 'mem= '. This truly stops those
>>>>> DIMM from being added into system memory during boot.
>>>>>
>>>>> However, it also limits the later memory hotplug functionality. Any
>>>>> memory board can't be hot added any more if its region is beyond the
>>>>> max_mem_size. System will print error like below:
>>>>>
>>>>> [  216.387164] acpi PNP0C80:02: add_memory failed
>>>>> [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
>>>>> [  216.392187] acpi PNP0C80:02: Enumeration failure
>>>>>
>>>>> >From document of 'mem =' parameter, it should be a restriction during
>>>>> boot, but not impact the system memory adding/removing after booting.
>>>>>
>>>>>     mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
>>>>>
>>>>> So fix it by also checking if it's during SYSTEM_BOOTING stage when
>>>>> restrict memory adding. Otherwise, skip the restriction.
>>>>
>>>> Could you be more specific about why the boot vs. later hotplug makes
>>>> any difference? The documentation is explicit about the boot time but
>>>> considering this seems to be like that since ever I strongly suspect
>>>> that this is just an omission.
>>>>
>>>> Btw. how have you tested the situation fixed by 357b4da50a62?
>>>
>>> I guess he hasn't.
>>>
>>> The backtrace of the problem at that time was:
>>>
>>> [ 8321.876844]  [<ffffffff81019ab9>] dump_trace+0x59/0x340
>>> [ 8321.882683]  [<ffffffff81019e8a>] show_stack_log_lvl+0xea/0x170
>>> [ 8321.889298]  [<ffffffff8101ac31>] show_stack+0x21/0x40
>>> [ 8321.895043]  [<ffffffff81319530>] dump_stack+0x5c/0x7c
>>> [ 8321.900779]  [<ffffffff8107fbf1>] warn_slowpath_common+0x81/0xb0
>>> [ 8321.907482]  [<ffffffff81009f54>] xen_alloc_pte+0x1d4/0x390
>>> [ 8321.913718]  [<ffffffff81064950>]
>>> pmd_populate_kernel.constprop.6+0x40/0x80
>>> [ 8321.921498]  [<ffffffff815ef0a8>] phys_pmd_init+0x210/0x255
>>> [ 8321.927724]  [<ffffffff815ef2c7>] phys_pud_init+0x1da/0x247
>>> [ 8321.933951]  [<ffffffff815efb81>] kernel_physical_mapping_init+0xf5/0x1d4
>>> [ 8321.941533]  [<ffffffff815ebc7d>] init_memory_mapping+0x18d/0x380
>>> [ 8321.948341]  [<ffffffff810647f9>] arch_add_memory+0x59/0xf0
>>> [ 8321.954570]  [<ffffffff815eceed>] add_memory_resource+0x8d/0x160
>>> [ 8321.961283]  [<ffffffff815ecff2>] add_memory+0x32/0xf0
>>> [ 8321.967025]  [<ffffffff813e1c91>] acpi_memory_device_add+0x131/0x2e0
>>> [ 8321.974128]  [<ffffffff8139f752>] acpi_bus_attach+0xe2/0x190
>>> [ 8321.980453]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
>>> [ 8321.986778]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
>>> [ 8321.993103]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
>>> [ 8321.999428]  [<ffffffff813a1157>] acpi_bus_scan+0x37/0x70
>>> [ 8322.005461]  [<ffffffff81fba955>] acpi_scan_init+0x77/0x1b4
>>> [ 8322.011690]  [<ffffffff81fba70c>] acpi_init+0x297/0x2b3
>>> [ 8322.017530]  [<ffffffff8100213a>] do_one_initcall+0xca/0x1f0
>>> [ 8322.023855]  [<ffffffff81f74266>] kernel_init_freeable+0x194/0x226
>>> [ 8322.030760]  [<ffffffff815eb1ba>] kernel_init+0xa/0xe0
>>> [ 8322.036503]  [<ffffffff815f7bc5>] ret_from_fork+0x55/0x80
>>>
>>> So this patch would break it again.
>>>
>>> I'd recommend ...
>>>
>>>>
>>>>> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
>>>>> Signed-off-by: Baoquan He <bhe@redhat.com>
>>>>> ---
>>>>>    mm/memory_hotplug.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>>>>> index 55ac23ef11c1..5466a0a00901 100644
>>>>> --- a/mm/memory_hotplug.c
>>>>> +++ b/mm/memory_hotplug.c
>>>>> @@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
>>>>>    	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
>>>>>    	char *resource_name = "System RAM";
>>>>>    
>>>>> -	if (start + size > max_mem_size)
>>>>> +	if (start + size > max_mem_size && system_state == SYSTEM_BOOTING)
>>>
>>> ... changing this to: ... && system_state != SYSTEM_RUNNING
>>
>> I think we usually use system_state < SYSTEM_RUNNING
>>
> 
> Works for me as well. :-)
> 

As this patch has to be resent, I'd also enjoy a comment explaining why
this special check is in place

/* Make sure memory hotplug works although mem= was specified */

or sth. like that :)

> 
> Juergen
>
Baoquan He Dec. 10, 2019, 7:24 a.m. UTC | #7
On 12/09/19 at 11:07am, Michal Hocko wrote:
> On Fri 06-12-19 23:05:24, Baoquan He wrote:
> > In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
> > parameter") a global varialbe global max_mem_size is added to store
> > the value which is parsed from 'mem= '. This truly stops those
> > DIMM from being added into system memory during boot.
> > 
> > However, it also limits the later memory hotplug functionality. Any
> > memory board can't be hot added any more if its region is beyond the
> > max_mem_size. System will print error like below:
> > 
> > [  216.387164] acpi PNP0C80:02: add_memory failed
> > [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
> > [  216.392187] acpi PNP0C80:02: Enumeration failure
> > 
> > >From document of 'mem =' parameter, it should be a restriction during
> > boot, but not impact the system memory adding/removing after booting.
> > 
> >   mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
> > 
> > So fix it by also checking if it's during SYSTEM_BOOTING stage when
> > restrict memory adding. Otherwise, skip the restriction.
> 
> Could you be more specific about why the boot vs. later hotplug makes
> any difference? The documentation is explicit about the boot time but
> considering this seems to be like that since ever I strongly suspect
> that this is just an omission.

I think the 'mem=' updating in commit 357b4da50a62 will only affect
those hotplugable memory regions. When I tested it, there are three
memmory boards, one is the normal memory region with 4G memory, and the
other two are hotpluggable memory boards and recorded in ACPI tables,
each is 1GB. When put all them three onsite before boot, they will be
recognized by firmware and written into e820 table and/or EFI table, then
kernel can read them from e820 and them as system memory, we get 6G
memory. 

However, if add 'mem=', like 'mme=3G', w/o commit 357b4da50a62, in e820,
we will only get 3G memory. Later in acpi_init(), acpi scanning will
search those two memory regions, and try to add them into system call
because the two hotpluggable memory boards are power on and ready.
Then we will get 3G + 1G + 1G, 5G memory. the 1st 3G is from the normal
memory board, its last 1G is trimmed. Jurgen's patch is trying to fix this
because the adding happens during boot time, and conflicts with 'mem='. 

But after system bootup, we should be able to hot add/remove any memory
board. This should not be restricted by a boot-time kernel parameter
'mme='. This is what I am trying to fix.

> 
> Btw. how have you tested the situation fixed by 357b4da50a62?

I just focused on the testing of the hotplug after booting, missed this
one. Sorry for the confusion.

Thanks
Baoquan
Baoquan He Dec. 10, 2019, 8:04 a.m. UTC | #8
On 12/09/19 at 12:11pm, David Hildenbrand wrote:
> On 09.12.19 12:08, Jürgen Groß wrote:
> > On 09.12.19 12:01, David Hildenbrand wrote:
> >> On 09.12.19 11:24, Jürgen Groß wrote:
> >>> On 09.12.19 11:07, Michal Hocko wrote:
> >>>> On Fri 06-12-19 23:05:24, Baoquan He wrote:
> >>>>> In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
> >>>>> parameter") a global varialbe global max_mem_size is added to store
> >>>>> the value which is parsed from 'mem= '. This truly stops those
> >>>>> DIMM from being added into system memory during boot.
> >>>>>
> >>>>> However, it also limits the later memory hotplug functionality. Any
> >>>>> memory board can't be hot added any more if its region is beyond the
> >>>>> max_mem_size. System will print error like below:
> >>>>>
> >>>>> [  216.387164] acpi PNP0C80:02: add_memory failed
> >>>>> [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
> >>>>> [  216.392187] acpi PNP0C80:02: Enumeration failure
> >>>>>
> >>>>> >From document of 'mem =' parameter, it should be a restriction during
> >>>>> boot, but not impact the system memory adding/removing after booting.
> >>>>>
> >>>>>     mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
> >>>>>
> >>>>> So fix it by also checking if it's during SYSTEM_BOOTING stage when
> >>>>> restrict memory adding. Otherwise, skip the restriction.
> >>>>
> >>>> Could you be more specific about why the boot vs. later hotplug makes
> >>>> any difference? The documentation is explicit about the boot time but
> >>>> considering this seems to be like that since ever I strongly suspect
> >>>> that this is just an omission.
> >>>>
> >>>> Btw. how have you tested the situation fixed by 357b4da50a62?
> >>>
> >>> I guess he hasn't.
> >>>
> >>> The backtrace of the problem at that time was:
> >>>
> >>> [ 8321.876844]  [<ffffffff81019ab9>] dump_trace+0x59/0x340
> >>> [ 8321.882683]  [<ffffffff81019e8a>] show_stack_log_lvl+0xea/0x170
> >>> [ 8321.889298]  [<ffffffff8101ac31>] show_stack+0x21/0x40
> >>> [ 8321.895043]  [<ffffffff81319530>] dump_stack+0x5c/0x7c
> >>> [ 8321.900779]  [<ffffffff8107fbf1>] warn_slowpath_common+0x81/0xb0
> >>> [ 8321.907482]  [<ffffffff81009f54>] xen_alloc_pte+0x1d4/0x390
> >>> [ 8321.913718]  [<ffffffff81064950>]
> >>> pmd_populate_kernel.constprop.6+0x40/0x80
> >>> [ 8321.921498]  [<ffffffff815ef0a8>] phys_pmd_init+0x210/0x255
> >>> [ 8321.927724]  [<ffffffff815ef2c7>] phys_pud_init+0x1da/0x247
> >>> [ 8321.933951]  [<ffffffff815efb81>] kernel_physical_mapping_init+0xf5/0x1d4
> >>> [ 8321.941533]  [<ffffffff815ebc7d>] init_memory_mapping+0x18d/0x380
> >>> [ 8321.948341]  [<ffffffff810647f9>] arch_add_memory+0x59/0xf0
> >>> [ 8321.954570]  [<ffffffff815eceed>] add_memory_resource+0x8d/0x160
> >>> [ 8321.961283]  [<ffffffff815ecff2>] add_memory+0x32/0xf0
> >>> [ 8321.967025]  [<ffffffff813e1c91>] acpi_memory_device_add+0x131/0x2e0
> >>> [ 8321.974128]  [<ffffffff8139f752>] acpi_bus_attach+0xe2/0x190
> >>> [ 8321.980453]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
> >>> [ 8321.986778]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
> >>> [ 8321.993103]  [<ffffffff8139f6ce>] acpi_bus_attach+0x5e/0x190
> >>> [ 8321.999428]  [<ffffffff813a1157>] acpi_bus_scan+0x37/0x70
> >>> [ 8322.005461]  [<ffffffff81fba955>] acpi_scan_init+0x77/0x1b4
> >>> [ 8322.011690]  [<ffffffff81fba70c>] acpi_init+0x297/0x2b3
> >>> [ 8322.017530]  [<ffffffff8100213a>] do_one_initcall+0xca/0x1f0
> >>> [ 8322.023855]  [<ffffffff81f74266>] kernel_init_freeable+0x194/0x226
> >>> [ 8322.030760]  [<ffffffff815eb1ba>] kernel_init+0xa/0xe0
> >>> [ 8322.036503]  [<ffffffff815f7bc5>] ret_from_fork+0x55/0x80
> >>>
> >>> So this patch would break it again.
> >>>
> >>> I'd recommend ...
> >>>
> >>>>
> >>>>> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
> >>>>> Signed-off-by: Baoquan He <bhe@redhat.com>
> >>>>> ---
> >>>>>    mm/memory_hotplug.c | 2 +-
> >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> >>>>> index 55ac23ef11c1..5466a0a00901 100644
> >>>>> --- a/mm/memory_hotplug.c
> >>>>> +++ b/mm/memory_hotplug.c
> >>>>> @@ -105,7 +105,7 @@ static struct resource *register_memory_resource(u64 start, u64 size)
> >>>>>    	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
> >>>>>    	char *resource_name = "System RAM";
> >>>>>    
> >>>>> -	if (start + size > max_mem_size)
> >>>>> +	if (start + size > max_mem_size && system_state == SYSTEM_BOOTING)
> >>>
> >>> ... changing this to: ... && system_state != SYSTEM_RUNNING
> >>
> >> I think we usually use system_state < SYSTEM_RUNNING
> >>
> > 
> > Works for me as well. :-)
> > 

Thanks for reviewing and suggestions, will correct it as
'system_state < SYSTEM_RUNNING'. 

> 
> As this patch has to be resent, I'd also enjoy a comment explaining why
> this special check is in place
> 
> /* Make sure memory hotplug works although mem= was specified */
> 
> or sth. like that :)

OK, will consider what is better to be placed here. Thanks.
Michal Hocko Dec. 10, 2019, 10:28 a.m. UTC | #9
On Tue 10-12-19 15:24:53, Baoquan He wrote:
> On 12/09/19 at 11:07am, Michal Hocko wrote:
> > On Fri 06-12-19 23:05:24, Baoquan He wrote:
> > > In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
> > > parameter") a global varialbe global max_mem_size is added to store
> > > the value which is parsed from 'mem= '. This truly stops those
> > > DIMM from being added into system memory during boot.
> > > 
> > > However, it also limits the later memory hotplug functionality. Any
> > > memory board can't be hot added any more if its region is beyond the
> > > max_mem_size. System will print error like below:
> > > 
> > > [  216.387164] acpi PNP0C80:02: add_memory failed
> > > [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
> > > [  216.392187] acpi PNP0C80:02: Enumeration failure
> > > 
> > > >From document of 'mem =' parameter, it should be a restriction during
> > > boot, but not impact the system memory adding/removing after booting.
> > > 
> > >   mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
> > > 
> > > So fix it by also checking if it's during SYSTEM_BOOTING stage when
> > > restrict memory adding. Otherwise, skip the restriction.
> > 
> > Could you be more specific about why the boot vs. later hotplug makes
> > any difference? The documentation is explicit about the boot time but
> > considering this seems to be like that since ever I strongly suspect
> > that this is just an omission.
> 
> I think the 'mem=' updating in commit 357b4da50a62 will only affect
> those hotplugable memory regions. When I tested it, there are three
> memmory boards, one is the normal memory region with 4G memory, and the
> other two are hotpluggable memory boards and recorded in ACPI tables,
> each is 1GB. When put all them three onsite before boot, they will be
> recognized by firmware and written into e820 table and/or EFI table, then
> kernel can read them from e820 and them as system memory, we get 6G
> memory. 
> 
> However, if add 'mem=', like 'mme=3G', w/o commit 357b4da50a62, in e820,
> we will only get 3G memory. Later in acpi_init(), acpi scanning will
> search those two memory regions, and try to add them into system call
> because the two hotpluggable memory boards are power on and ready.
> Then we will get 3G + 1G + 1G, 5G memory. the 1st 3G is from the normal
> memory board, its last 1G is trimmed. Jurgen's patch is trying to fix this
> because the adding happens during boot time, and conflicts with 'mem='. 

Unless I misunderstand what you are saying this all is just expected.
You have restricted the memory explicitly and the result is that not all
the memory is visible.

> But after system bootup, we should be able to hot add/remove any memory
> board. This should not be restricted by a boot-time kernel parameter
> 'mme='. This is what I am trying to fix.

This is a simple statement without any actual explanation on why. Why is
hotplug memory special? What is the usecase? Who would want to use mem
parameter and later expect a memory above the restrected area to be
hotplugable?

David has provided an actual usecase [1] but this needs to be documented
somewhere so that we do not break that accidentally in the future.
Ideally both in code which adds the boot restriction and the kernel
command line documentation to be explicit about BOOT restriction.

[1] http://lkml.kernel.org/r/429622cf-f0f4-5d80-d39d-b0d8a6c6605f@redhat.com
Baoquan He Dec. 10, 2019, 10:43 a.m. UTC | #10
On 12/10/19 at 11:28am, Michal Hocko wrote:
> On Tue 10-12-19 15:24:53, Baoquan He wrote:
> > On 12/09/19 at 11:07am, Michal Hocko wrote:
> > > On Fri 06-12-19 23:05:24, Baoquan He wrote:
> > > > In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
> > > > parameter") a global varialbe global max_mem_size is added to store
> > > > the value which is parsed from 'mem= '. This truly stops those
> > > > DIMM from being added into system memory during boot.
> > > > 
> > > > However, it also limits the later memory hotplug functionality. Any
> > > > memory board can't be hot added any more if its region is beyond the
> > > > max_mem_size. System will print error like below:
> > > > 
> > > > [  216.387164] acpi PNP0C80:02: add_memory failed
> > > > [  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
> > > > [  216.392187] acpi PNP0C80:02: Enumeration failure
> > > > 
> > > > >From document of 'mem =' parameter, it should be a restriction during
> > > > boot, but not impact the system memory adding/removing after booting.
> > > > 
> > > >   mem=nn[KMG]     [KNL,BOOT] Force usage of a specific amount of memory
> > > > 
> > > > So fix it by also checking if it's during SYSTEM_BOOTING stage when
> > > > restrict memory adding. Otherwise, skip the restriction.
> > > 
> > > Could you be more specific about why the boot vs. later hotplug makes
> > > any difference? The documentation is explicit about the boot time but
> > > considering this seems to be like that since ever I strongly suspect
> > > that this is just an omission.
> > 
> > I think the 'mem=' updating in commit 357b4da50a62 will only affect
> > those hotplugable memory regions. When I tested it, there are three
> > memmory boards, one is the normal memory region with 4G memory, and the
> > other two are hotpluggable memory boards and recorded in ACPI tables,
> > each is 1GB. When put all them three onsite before boot, they will be
> > recognized by firmware and written into e820 table and/or EFI table, then
> > kernel can read them from e820 and them as system memory, we get 6G
> > memory. 
> > 
> > However, if add 'mem=', like 'mme=3G', w/o commit 357b4da50a62, in e820,
> > we will only get 3G memory. Later in acpi_init(), acpi scanning will
> > search those two memory regions, and try to add them into system call
> > because the two hotpluggable memory boards are power on and ready.
> > Then we will get 3G + 1G + 1G, 5G memory. the 1st 3G is from the normal
> > memory board, its last 1G is trimmed. Jurgen's patch is trying to fix this
> > because the adding happens during boot time, and conflicts with 'mem='. 
> 
> Unless I misunderstand what you are saying this all is just expected.
> You have restricted the memory explicitly and the result is that not all
> the memory is visible.

Yes.

> 
> > But after system bootup, we should be able to hot add/remove any memory
> > board. This should not be restricted by a boot-time kernel parameter
> > 'mme='. This is what I am trying to fix.
> 
> This is a simple statement without any actual explanation on why. Why is
> hotplug memory special? What is the usecase? Who would want to use mem
> parameter and later expect a memory above the restrected area to be
> hotplugable?

The why is 'mem=' is used to restrict the amount of system ram during
boot. We have two ways to add system memory, one is installing DIMMs
before boot, the other is hot adding memory after boot. Without David's 
use case, we may need redefine 'mem=' and change its documentation in
kernel-parameters.txt, if we don't want to fix it like this. Otherwise,
'mem=' will limit the system's upper system ram always, that is not
expected.

> 
> David has provided an actual usecase [1] but this needs to be documented
> somewhere so that we do not break that accidentally in the future.
> Ideally both in code which adds the boot restriction and the kernel
> command line documentation to be explicit about BOOT restriction.
> 
> [1] http://lkml.kernel.org/r/429622cf-f0f4-5d80-d39d-b0d8a6c6605f@redhat.com

Yes, agree. As I replied in the v2 thread, I will add that into log, and
also will change text of 'mem=' in kernel-parameters.txt.

Thanks
Baoquan
Michal Hocko Dec. 10, 2019, 11:33 a.m. UTC | #11
On Tue 10-12-19 18:43:03, Baoquan He wrote:
> On 12/10/19 at 11:28am, Michal Hocko wrote:
> > On Tue 10-12-19 15:24:53, Baoquan He wrote:
[...]
> > > But after system bootup, we should be able to hot add/remove any memory
> > > board. This should not be restricted by a boot-time kernel parameter
> > > 'mme='. This is what I am trying to fix.
> > 
> > This is a simple statement without any actual explanation on why. Why is
> > hotplug memory special? What is the usecase? Who would want to use mem
> > parameter and later expect a memory above the restrected area to be
> > hotplugable?
> 
> The why is 'mem=' is used to restrict the amount of system ram during
> boot. We have two ways to add system memory, one is installing DIMMs
> before boot, the other is hot adding memory after boot. Without David's 
> use case, we may need redefine 'mem=' and change its documentation in
> kernel-parameters.txt, if we don't want to fix it like this. Otherwise,
> 'mem=' will limit the system's upper system ram always, that is not
> expected.

I really do not see why. It seems a pretty consistent behavior to me.
Because it essentially cut any memory above the given size. If a new
hotplugable memory fits into that cap then it just shows up. Quite
contrary I would consider it unexpected that a memory higher than the
given mem=XYZ is really there. But I do recognize a real usecase
mentioned elsewhere which beats the consistency argument here because
all setups where such a restriction would be really important are
debugging/workaround AFAICS.
Baoquan He Dec. 10, 2019, 12:55 p.m. UTC | #12
On 12/10/19 at 12:33pm, Michal Hocko wrote:
> On Tue 10-12-19 18:43:03, Baoquan He wrote:
> > On 12/10/19 at 11:28am, Michal Hocko wrote:
> > > On Tue 10-12-19 15:24:53, Baoquan He wrote:
> [...]
> > > > But after system bootup, we should be able to hot add/remove any memory
> > > > board. This should not be restricted by a boot-time kernel parameter
> > > > 'mme='. This is what I am trying to fix.
> > > 
> > > This is a simple statement without any actual explanation on why. Why is
> > > hotplug memory special? What is the usecase? Who would want to use mem
> > > parameter and later expect a memory above the restrected area to be
> > > hotplugable?
> > 
> > The why is 'mem=' is used to restrict the amount of system ram during
> > boot. We have two ways to add system memory, one is installing DIMMs
> > before boot, the other is hot adding memory after boot. Without David's 
> > use case, we may need redefine 'mem=' and change its documentation in
> > kernel-parameters.txt, if we don't want to fix it like this. Otherwise,
> > 'mem=' will limit the system's upper system ram always, that is not
> > expected.
> 
> I really do not see why. It seems a pretty consistent behavior to me.
> Because it essentially cut any memory above the given size. If a new
> hotplugable memory fits into that cap then it just shows up. Quite
> contrary I would consider it unexpected that a memory higher than the
> given mem=XYZ is really there. But I do recognize a real usecase
> mentioned elsewhere which beats the consistency argument here because
> all setups where such a restriction would be really important are
> debugging/workaround AFAICS.

All right. There could be me who have this misunderstanding. 

Anyway, I think now we all agree it's only a boot-time restriction on
the system RAM.

Btw, as you said at above, I am confused by the '[KNL,BOOT]', what does
the 'BOOT' mean in the documentation of 'mem='? I checked all parameters
with 'BOOT', still don't get it clearly.

Thanks
Baoquan
Michal Hocko Dec. 10, 2019, 1:32 p.m. UTC | #13
On Tue 10-12-19 20:55:57, Baoquan He wrote:
[...]
> Btw, as you said at above, I am confused by the '[KNL,BOOT]', what does
> the 'BOOT' mean in the documentation of 'mem='? I checked all parameters
> with 'BOOT', still don't get it clearly.

This is a good question indeed. I have checked closer and this is what
documentation says
Documentation/admin-guide/kernel-parameters.rst
"
        BOOT    Is a boot loader parameter.

Parameters denoted with BOOT are actually interpreted by the boot
loader, and have no meaning to the kernel directly.
"

and that really doesn't fit, right? So I went to check the full history
git tree just to get to 2.4.0-test5 and no explanation whatsoever.
Fun, isn't it? ;)
Baoquan He Dec. 10, 2019, 2:05 p.m. UTC | #14
On 12/10/19 at 02:32pm, Michal Hocko wrote:
> On Tue 10-12-19 20:55:57, Baoquan He wrote:
> [...]
> > Btw, as you said at above, I am confused by the '[KNL,BOOT]', what does
> > the 'BOOT' mean in the documentation of 'mem='? I checked all parameters
> > with 'BOOT', still don't get it clearly.
> 
> This is a good question indeed. I have checked closer and this is what
> documentation says
> Documentation/admin-guide/kernel-parameters.rst
> "
>         BOOT    Is a boot loader parameter.
> 
> Parameters denoted with BOOT are actually interpreted by the boot
> loader, and have no meaning to the kernel directly.
> "
> 
> and that really doesn't fit, right? So I went to check the full history
> git tree just to get to 2.4.0-test5 and no explanation whatsoever.
> Fun, isn't it? ;)

Yeah, very interesting. Finally I got their original purpose from
Documentation/x86/boot.rst.


Special Command Line Options
============================

If the command line provided by the boot loader is entered by the
user, the user may expect the following command line options to work.
They should normally not be deleted from the kernel command line even
though not all of them are actually meaningful to the kernel.  Boot
loader authors who need additional command line options for the boot
loader itself should get them registered in
Documentation/admin-guide/kernel-parameters.rst to make sure they will not
conflict with actual kernel options now or in the future.

...

So here, [KNL,BOOT], KNL means it's used for kernel, BOOT means it's
needed by boot loader.

I think we should at least add a note in kernel-parameters.txt to
explain this. Will add it.

Thanks
Baoquan
Michal Hocko Dec. 10, 2019, 2:19 p.m. UTC | #15
On Tue 10-12-19 22:05:34, Baoquan He wrote:
> On 12/10/19 at 02:32pm, Michal Hocko wrote:
> > On Tue 10-12-19 20:55:57, Baoquan He wrote:
> > [...]
> > > Btw, as you said at above, I am confused by the '[KNL,BOOT]', what does
> > > the 'BOOT' mean in the documentation of 'mem='? I checked all parameters
> > > with 'BOOT', still don't get it clearly.
> > 
> > This is a good question indeed. I have checked closer and this is what
> > documentation says
> > Documentation/admin-guide/kernel-parameters.rst
> > "
> >         BOOT    Is a boot loader parameter.
> > 
> > Parameters denoted with BOOT are actually interpreted by the boot
> > loader, and have no meaning to the kernel directly.
> > "
> > 
> > and that really doesn't fit, right? So I went to check the full history
> > git tree just to get to 2.4.0-test5 and no explanation whatsoever.
> > Fun, isn't it? ;)
> 
> Yeah, very interesting. Finally I got their original purpose from
> Documentation/x86/boot.rst.
> 
> 
> Special Command Line Options
> ============================
> 
> If the command line provided by the boot loader is entered by the
> user, the user may expect the following command line options to work.
> They should normally not be deleted from the kernel command line even
> though not all of them are actually meaningful to the kernel.  Boot
> loader authors who need additional command line options for the boot
> loader itself should get them registered in
> Documentation/admin-guide/kernel-parameters.rst to make sure they will not
> conflict with actual kernel options now or in the future.
> 
> ...
> 
> So here, [KNL,BOOT], KNL means it's used for kernel, BOOT means it's
> needed by boot loader.

OK, that clarifies this a bit. Thanks for referencing to it!
That should explain how the behavior is not boot time restricted at all
and the current implementation is actually correct. So a change to it
should clearly state the new usecase as we have already discussed. In
case there are bootloaders which really rely on the original strict
meaning then we should be able to compare cost/benfits of those two
usecases.
Baoquan He Dec. 11, 2019, 1:20 p.m. UTC | #16
On 12/10/19 at 03:19pm, Michal Hocko wrote:
> On Tue 10-12-19 22:05:34, Baoquan He wrote:
> > On 12/10/19 at 02:32pm, Michal Hocko wrote:
> > > On Tue 10-12-19 20:55:57, Baoquan He wrote:
> > > [...]
> > > > Btw, as you said at above, I am confused by the '[KNL,BOOT]', what does
> > > > the 'BOOT' mean in the documentation of 'mem='? I checked all parameters
> > > > with 'BOOT', still don't get it clearly.
> > > 
> > > This is a good question indeed. I have checked closer and this is what
> > > documentation says
> > > Documentation/admin-guide/kernel-parameters.rst
> > > "
> > >         BOOT    Is a boot loader parameter.
> > > 
> > > Parameters denoted with BOOT are actually interpreted by the boot
> > > loader, and have no meaning to the kernel directly.
> > > "
> > > 
> > > and that really doesn't fit, right? So I went to check the full history
> > > git tree just to get to 2.4.0-test5 and no explanation whatsoever.
> > > Fun, isn't it? ;)
> > 
> > Yeah, very interesting. Finally I got their original purpose from
> > Documentation/x86/boot.rst.
> > 
> > 
> > Special Command Line Options
> > ============================
> > 
> > If the command line provided by the boot loader is entered by the
> > user, the user may expect the following command line options to work.
> > They should normally not be deleted from the kernel command line even
> > though not all of them are actually meaningful to the kernel.  Boot
> > loader authors who need additional command line options for the boot
> > loader itself should get them registered in
> > Documentation/admin-guide/kernel-parameters.rst to make sure they will not
> > conflict with actual kernel options now or in the future.
> > 
> > ...
> > 
> > So here, [KNL,BOOT], KNL means it's used for kernel, BOOT means it's
> > needed by boot loader.
> 
> OK, that clarifies this a bit. Thanks for referencing to it!
> That should explain how the behavior is not boot time restricted at all
> and the current implementation is actually correct. So a change to it
> should clearly state the new usecase as we have already discussed. In
> case there are bootloaders which really rely on the original strict
> meaning then we should be able to compare cost/benfits of those two
> usecases.

Sounds reasonable to me. From the current parameters for x86, it only
impact the bootloader during boot-time, e.g 'mem= ' says bootloader need
this to place initrd. It might not give the trouble, anyway, we will
say.

Just a little more, we have test case for memory hotplug which only test
the DIMM added after boot. And the old 'mem= ' implementation in x86
only erazes memory regions above 'mem= ' in e820 table. That is why the
behaviour change immediately gave me a surprise when I noticed people
back ported Jurgen's patch to our distros. 

So glad to see all is clear, thanks.
diff mbox series

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 55ac23ef11c1..5466a0a00901 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -105,7 +105,7 @@  static struct resource *register_memory_resource(u64 start, u64 size)
 	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 	char *resource_name = "System RAM";
 
-	if (start + size > max_mem_size)
+	if (start + size > max_mem_size && system_state == SYSTEM_BOOTING)
 		return ERR_PTR(-E2BIG);
 
 	/*