
[v3] mm/hotplug: Only respect mem= parameter during boot stage

Message ID 20200204050643.20925-1-bhe@redhat.com
State New, archived
Series [v3] mm/hotplug: Only respect mem= parameter during boot stage

Commit Message

Baoquan He Feb. 4, 2020, 5:06 a.m. UTC
In commit 357b4da50a62 ("x86: respect memory size limiting via mem=
parameter"), a global variable max_mem_size was added to store the
value parsed from 'mem=', and it is checked whenever a memory region
is added. This does stop DIMMs beyond that limit from being added to
system memory during boot.

However, it also restricts memory hotplug later on: no DIMM whose
region lies beyond max_mem_size can be hot-added any more, and we get
errors like:

[  216.387164] acpi PNP0C80:02: add_memory failed
[  216.389301] acpi PNP0C80:02: acpi_memory_enable_device() error
[  216.392187] acpi PNP0C80:02: Enumeration failure

This breaks a known use case in which 'mem=' is passed to the
hypervisor kernel and the memory beyond the 'mem=' boundary is
assigned to KVM guests. Since commit 357b4da50a62 was merged, memory
can no longer be added dynamically when the hypervisor's own system
memory is insufficient.

Fix it by applying the max_mem_size restriction only during boot and
skipping it once the system is running.

Also document this use case under the 'mem=' kernel parameter.

Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
Signed-off-by: Baoquan He <bhe@redhat.com>
---
v2->v3:
  During the discussion of v1 and v2, people raised concerns about the
  use case behind the code change, so the use case has been added to the
  patch log and to the 'mem=' documentation in kernel-parameters.txt.

 Documentation/admin-guide/kernel-parameters.txt | 13 +++++++++++--
 mm/memory_hotplug.c                             |  8 +++++++-
 2 files changed, 18 insertions(+), 3 deletions(-)

Comments

Jürgen Groß Feb. 4, 2020, 7:53 a.m. UTC | #1
On 04.02.20 06:06, Baoquan He wrote:
> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
> Signed-off-by: Baoquan He <bhe@redhat.com>

Reviewed-by: Juergen Gross <jgross@suse.com>


Juergen
Baoquan He Feb. 6, 2020, 2:58 a.m. UTC | #2
On 02/04/20 at 08:53am, Jürgen Groß wrote:
> Reviewed-by: Juergen Gross <jgross@suse.com>

Thanks, Juergen. It seems I should add more details to explain this. I will
post v4 with your Reviewed-by.
David Hildenbrand Feb. 6, 2020, 8:55 a.m. UTC | #3
On 04.02.20 06:06, Baoquan He wrote:
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index ddc5ccdd4cd1..b809767e5f74 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -2533,13 +2533,22 @@
>  			For details see: Documentation/admin-guide/hw-vuln/mds.rst
>  
>  	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
> -			Amount of memory to be used when the kernel is not able
> -			to see the whole system memory or for test.
> +			Amount of memory to be used in cases as follows:
> +
> +			1 for test;
> +			2 when the kernel is not able to see the whole system memory;
> +			3 memory that lies after 'mem=' boundary is excluded from
> +			 the hypervisor, then assigned to KVM guests.

I remember that there were more use cases, but forgot where that was
documented :)

I do wonder if we want to change that now without anybody complaining.
Yes, I brought up a possible use case but don't know if it is relevant
in practice (IOW, nobody complained yet :) ).

Would like to get Michal's opinion on this.
Baoquan He Feb. 6, 2020, 9:44 a.m. UTC | #4
On 02/06/20 at 09:55am, David Hildenbrand wrote:
> I remember that there were more use cases, but forgot where that was
> documented :)

In fact, as long as 'mem=' is not used for testing, hotplug is helpful,
e.g. in the 2nd use case. We use 'mem=' to skip the bad parts of the boot
memory DIMMs, while hotplug lets us extend RAM with good DIMMs. There is
no need to discard a partly broken memory board.

> 
> I do wonder if we want to change that now without anybody complaining.
> Yes, I brought up a possible use case but don't know if it is relevant
> in practice (IOW, nobody complained yet :) ).

Yes, I should hold it for a while. I worry it's not clear enough, but in
kernel-parameters.txt I can't write too many details about the use case.

> 
> Would like to get Michal's opinion on this.

Sure, will hold.  Thanks.
Michal Hocko Feb. 14, 2020, 10:53 a.m. UTC | #5
On Tue 04-02-20 13:06:43, Baoquan He wrote:
> This will cause issue in a known use case where 'mem=' is added to
> the hypervisor. The memory that lies after 'mem=' boundary will be
> assigned to KVM guests. After commit 357b4da50a62 merged, memory
> can't be extended dynamically if system memory on hypervisor is not
> sufficient.
> 
> So fix it by also checking if it's during boot-time restricting to add
> memory. Otherwise, skip the restriction.
> 
> And also add this use case to document of 'mem=' kernel parameter.

I have to say I am not entirely happy about this change, but the breakage
seems to be real, so we have to live with it. If there are use cases that
really need to restrict the physical memory range, we would have to add
a new command line parameter.

> Fixes: 357b4da50a62 ("x86: respect memory size limiting via mem= parameter")
> Signed-off-by: Baoquan He <bhe@redhat.com>

Acked-by: Michal Hocko <mhocko@suse.com>


Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index ddc5ccdd4cd1..b809767e5f74 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2533,13 +2533,22 @@ 
 			For details see: Documentation/admin-guide/hw-vuln/mds.rst
 
 	mem=nn[KMG]	[KNL,BOOT] Force usage of a specific amount of memory
-			Amount of memory to be used when the kernel is not able
-			to see the whole system memory or for test.
+			Amount of memory to be used in cases as follows:
+
+			1 for test;
+			2 when the kernel is not able to see the whole system memory;
+			3 memory that lies after 'mem=' boundary is excluded from
+			 the hypervisor, then assigned to KVM guests.
+
 			[X86] Work as limiting max address. Use together
 			with memmap= to avoid physical address space collisions.
 			Without memmap= PCI devices could be placed at addresses
 			belonging to unused RAM.
 
+			Note that this only takes effect during boot, since in
+			case 3 above, memory may need to be hot-added after boot
+			if the hypervisor's system memory is insufficient.
+
 	mem=nopentium	[BUGS=X86-32] Disable usage of 4MB pages for kernel
 			memory.
 
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 36d80915ddc2..e6c75ceacf9a 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -105,7 +105,13 @@ static struct resource *register_memory_resource(u64 start, u64 size)
 	unsigned long flags =  IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
 	char *resource_name = "System RAM";
 
-	if (start + size > max_mem_size)
+	/*
+	 * Make sure the value parsed from 'mem=' only restricts adding
+	 * memory while booting, so that memory hotplug is not affected.
+	 * Please refer to the 'mem=' documentation in kernel-parameters.txt
+	 * for more details.
+	 */
+	if (start + size > max_mem_size && system_state < SYSTEM_RUNNING)
 		return ERR_PTR(-E2BIG);
 
 	/*