diff mbox series

hugetlb: Add nohugepages parameter to prevent hugepages creation

Message ID 20191011223955.1435-1-gpiccoli@canonical.com (mailing list archive)
State New, archived
Headers show
Series hugetlb: Add nohugepages parameter to prevent hugepages creation | expand

Commit Message

Guilherme Piccoli Oct. 11, 2019, 10:39 p.m. UTC
Currently there are 2 ways for setting HugeTLB hugepages in kernel; either
users pass parameters on kernel command-line or they can write to sysfs
files (which is effectively the sysctl way).

Kdump kernels won't benefit from hugepages - in fact it's quite opposite,
it may be the case hugepages on kdump kernel can lead to OOM if kernel
gets unable to allocate demanded pages due to the fact the preallocated
hugepages are consuming a lot of memory.

This patch proposes a new kernel parameter to prevent the creation of
HugeTLB hugepages - we currently don't have a way to do that. We can
even have kdump scripts removing the kernel command-line options to
set hugepages, but it's not straightforward to prevent sysctl/sysfs
configuration, given it happens in later boot or anytime when the
system is running.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
---

About some decisions took in this patch:

* early_param() was used because I couldn't find a way to enforce
parameters' ordering when using __setup(), and we need nohugepages
processed before all other hugepages options.

* The return when sysctl handler is prevented to progress due to
nohugepages is -EINVAL, but could be changed; I've just followed
present code there, but I'm OK changing that if we have suggestions.

Thanks in advance for the review!
Cheers,


Guilherme

 Documentation/admin-guide/kernel-parameters.txt |  4 ++++
 mm/hugetlb.c                                    | 16 ++++++++++++++++
 2 files changed, 20 insertions(+)

Comments

Qian Cai Oct. 11, 2019, 11:35 p.m. UTC | #1
> On Oct 11, 2019, at 6:40 PM, Guilherme G. Piccoli <gpiccoli@canonical.com> wrote:
> 
> Kdump kernels won't benefit from hugepages - in fact it's quite opposite,
> it may be the case hugepages on kdump kernel can lead to OOM if kernel
> gets unable to allocate demanded pages due to the fact the preallocated
> hugepages are consuming a lot of memory.
> 
> This patch proposes a new kernel parameter to prevent the creation of
> HugeTLB hugepages - we currently don't have a way to do that. We can
> even have kdump scripts removing the kernel command-line options to
> set hugepages, but it's not straightforward to prevent sysctl/sysfs
> configuration, given it happens in later boot or anytime when the
> system is running.

Typically, kdump kernel has its own initramfs, and don’t even need to mount a rootfs, so I can’t see how sysfs/sysctl is relevant here.
Guilherme Piccoli Oct. 11, 2019, 11:41 p.m. UTC | #2
On Fri, Oct 11, 2019 at 8:36 PM Qian Cai <cai@lca.pw> wrote:
>
> Typically, kdump kernel has its own initramfs, and don’t even need to mount a rootfs, so I can’t see how sysfs/sysctl is relevant here.

Thanks for the quick response. Kdump in Ubuntu, for example, rely in
mounting the root filesystem.
Even in initrd-only approaches, we could have sysctl.conf being copied
to initramfs, and hugepages end-up getting set.

Also, I don't think the "nohugepages" is useful only for kdump - I
found odd we cannot prevent the creation of hugepages at all when
using sysfs writes.

Cheers,


Guilherme
Qian Cai Oct. 11, 2019, 11:52 p.m. UTC | #3
> On Oct 11, 2019, at 7:42 PM, Guilherme Piccoli <gpiccoli@canonical.com> wrote:
> 
> Thanks for the quick response. Kdump in Ubuntu, for example, rely in
> mounting the root filesystem.
> Even in initrd-only approaches, we could have sysctl.conf being copied
> to initramfs, and hugepages end-up getting set.

It simply error-prone to reuse the sysctl.conf from the first kernel, as it could contains lots of things that will kill kdump kernel.
Guilherme Piccoli Oct. 12, 2019, 12:41 a.m. UTC | #4
On Fri, Oct 11, 2019 at 8:52 PM Qian Cai <cai@lca.pw> wrote:
>
>
> It simply error-prone to reuse the sysctl.conf from the first kernel, as it could contains lots of things that will kill kdump kernel.

Makes sense, I agree with you. But still, there's no
formal/right/single way to do kdump, so I don't think we should rely
on "rootfs kdump is wrong" to refuse this patch's idea. If so, we
should then formalize the right way of kdumping.
Of course, this is only my opinion, let's see what people think about it!

Thanks,


Guilherme
Mike Kravetz Oct. 14, 2019, 6:25 p.m. UTC | #5
On 10/11/19 3:39 PM, Guilherme G. Piccoli wrote:
> Currently there are 2 ways for setting HugeTLB hugepages in kernel; either
> users pass parameters on kernel command-line or they can write to sysfs
> files (which is effectively the sysctl way).
> 
> Kdump kernels won't benefit from hugepages - in fact it's quite opposite,
> it may be the case hugepages on kdump kernel can lead to OOM if kernel
> gets unable to allocate demanded pages due to the fact the preallocated
> hugepages are consuming a lot of memory.
> 
> This patch proposes a new kernel parameter to prevent the creation of
> HugeTLB hugepages - we currently don't have a way to do that. We can
> even have kdump scripts removing the kernel command-line options to
> set hugepages, but it's not straightforward to prevent sysctl/sysfs
> configuration, given it happens in later boot or anytime when the
> system is running.
> 
> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
> ---
> 
> About some decisions took in this patch:
> 
> * early_param() was used because I couldn't find a way to enforce
> parameters' ordering when using __setup(), and we need nohugepages
> processed before all other hugepages options.

I don't know much about early_param(), so I will assume this works as you
describe.  However, a quick grep shows hugepage options for ia64 also with
early_param.

> * The return when sysctl handler is prevented to progress due to
> nohugepages is -EINVAL, but could be changed; I've just followed
> present code there, but I'm OK changing that if we have suggestions.

It looks like you only have short circuited/prevented nr_hugepages via
sysfs/sysctl.  Theoretically, one could set nr_overcommit_hugepages and
still allocate hugetlb pages.  So, if you REALLY want to shut things down
you need to stop this as well.

There is already a macro hugepages_supported() that can be set by arch
specific code.  I wonder how difficult it would be to 'overwrite' the
macro if nohugepages is specified.  Perhaps just a level of naming
indirection.  This would use the existing code to prevent all hugetlb usage.

It seems like there may be some discussion about 'the right' way to
do kdump.  I can't add to that discussion, but if such an option as
nohugepages is needed, I can help.
Guilherme Piccoli Oct. 15, 2019, 4:50 a.m. UTC | #6
On 14/10/2019 15:25, Mike Kravetz wrote:
>  [...]
> I don't know much about early_param(), so I will assume this works as you
> describe.  However, a quick grep shows hugepage options for ia64 also with
> early_param.
> 

Thanks a lot for the prompt and quite informative reply Mike.
I've checked this IA64 parameter after your mention, and it just sets
the hugepages size, I don't think it'll affect the purpose of this patch.


>> * The return when sysctl handler is prevented to progress due to
>> nohugepages is -EINVAL, but could be changed; I've just followed
>> present code there, but I'm OK changing that if we have suggestions.
> 
> It looks like you only have short circuited/prevented nr_hugepages via
> sysfs/sysctl.  Theoretically, one could set nr_overcommit_hugepages and
> still allocate hugetlb pages.  So, if you REALLY want to shut things down
> you need to stop this as well.
> 
> There is already a macro hugepages_supported() that can be set by arch
> specific code.  I wonder how difficult it would be to 'overwrite' the
> macro if nohugepages is specified.  Perhaps just a level of naming
> indirection.  This would use the existing code to prevent all hugetlb usage.
>

Outstanding! It's a much better idea to use hugepages_supported()
infrastructure, it prevents even the creation of hugepages-related sysfs
entries; I've worked a V2 with this modification, and it worked fine,
thanks for the suggestion.


> It seems like there may be some discussion about 'the right' way to
> do kdump.  I can't add to that discussion, but if such an option as
> nohugepages is needed, I can help.
> 

I think this parameter may be important/useful not only for kdump - it
is a legitimate way of disabling hugepages, something we don't currently
have on kernel. I'll submit a V2, for Ubuntu kdump specially this is
quite helpful!

Cheers,


Guilherme
Michal Hocko Oct. 15, 2019, 12:18 p.m. UTC | #7
On Fri 11-10-19 21:41:01, Guilherme Piccoli wrote:
> On Fri, Oct 11, 2019 at 8:52 PM Qian Cai <cai@lca.pw> wrote:
> >
> >
> > It simply error-prone to reuse the sysctl.conf from the first kernel, as it could contains lots of things that will kill kdump kernel.
> 
> Makes sense, I agree with you. But still, there's no
> formal/right/single way to do kdump, so I don't think we should rely
> on "rootfs kdump is wrong" to refuse this patch's idea. If so, we
> should then formalize the right way of kdumping.
> Of course, this is only my opinion, let's see what people think about it!

I do agree with Qian Cai here. Kdump kernel requires a very tailored
environment considering it is running in a very restricted
configuration. The hugetlb pre-allocation sounds like a tooling problem
and should be fixed at that layer.
Guilherme G. Piccoli Oct. 15, 2019, 1:58 p.m. UTC | #8
On 10/15/19 9:18 AM, Michal Hocko wrote:
> I do agree with Qian Cai here. Kdump kernel requires a very tailored
> environment considering it is running in a very restricted
> configuration. The hugetlb pre-allocation sounds like a tooling problem
> and should be fixed at that layer.
> 

Hi Michal, thanks for your response. Can you suggest me a current way of 
preventing hugepages for being created, using userspace? The goal for 
this patch is exactly this, introduce such a way.

As I've said before, Ubuntu kdump rely on rootfs mounting and there's no 
official statement or documentation that say it's wrong to do this way - 
although I agree initrd-only approach is more safe. But since Ubuntu 
kdump rely on rootfs mount, we couldn't find a way to effectively 
prevent the creation of hugepages completely, hence we tried to 
introduce one.

Cheers,


Guilherme
Michal Hocko Oct. 15, 2019, 2:05 p.m. UTC | #9
On Tue 15-10-19 10:58:36, Guilherme G. Piccoli wrote:
> On 10/15/19 9:18 AM, Michal Hocko wrote:
> > I do agree with Qian Cai here. Kdump kernel requires a very tailored
> > environment considering it is running in a very restricted
> > configuration. The hugetlb pre-allocation sounds like a tooling problem
> > and should be fixed at that layer.
> > 
> 
> Hi Michal, thanks for your response. Can you suggest me a current way of
> preventing hugepages for being created, using userspace? The goal for this
> patch is exactly this, introduce such a way.

Simply restrict the environment to not allocate any hugepages? Kdump
already controls the kernel command line and it also starts only a very
minimal subset of services. So who is allocating those hugepages?
sysctls should be already excluded by default as Qian mentioned.
Guilherme G. Piccoli Oct. 15, 2019, 2:09 p.m. UTC | #10
On 10/15/19 11:05 AM, Michal Hocko wrote:
> On Tue 15-10-19 10:58:36, Guilherme G. Piccoli wrote:
>> On 10/15/19 9:18 AM, Michal Hocko wrote:
>>> I do agree with Qian Cai here. Kdump kernel requires a very tailored
>>> environment considering it is running in a very restricted
>>> configuration. The hugetlb pre-allocation sounds like a tooling problem
>>> and should be fixed at that layer.
>>>
>>
>> Hi Michal, thanks for your response. Can you suggest me a current way of
>> preventing hugepages for being created, using userspace? The goal for this
>> patch is exactly this, introduce such a way.
> 
> Simply restrict the environment to not allocate any hugepages? Kdump
> already controls the kernel command line and it also starts only a very
> minimal subset of services. So who is allocating those hugepages?
> sysctls should be already excluded by default as Qian mentioned.
> 


OK, thanks Michal and Qian, I'll try to make things work from kdump 
perspective. The trick part is exactly preventing the sysctl to get 
applied heh

Cheers,


Guilherme
Mike Kravetz Oct. 18, 2019, 6:29 p.m. UTC | #11
On 10/15/19 7:09 AM, Guilherme G. Piccoli wrote:
> 
> 
> On 10/15/19 11:05 AM, Michal Hocko wrote:
>> On Tue 15-10-19 10:58:36, Guilherme G. Piccoli wrote:
>>> On 10/15/19 9:18 AM, Michal Hocko wrote:
>>>> I do agree with Qian Cai here. Kdump kernel requires a very tailored
>>>> environment considering it is running in a very restricted
>>>> configuration. The hugetlb pre-allocation sounds like a tooling problem
>>>> and should be fixed at that layer.
>>>>
>>>
>>> Hi Michal, thanks for your response. Can you suggest me a current way of
>>> preventing hugepages for being created, using userspace? The goal for this
>>> patch is exactly this, introduce such a way.
>>
>> Simply restrict the environment to not allocate any hugepages? Kdump
>> already controls the kernel command line and it also starts only a very
>> minimal subset of services. So who is allocating those hugepages?
>> sysctls should be already excluded by default as Qian mentioned.
>>
> 
> 
> OK, thanks Michal and Qian, I'll try to make things work from kdump perspective. The trick part is exactly preventing the sysctl to get applied heh
> 

Please do let us know if this can be done in tooling.

I am not opposed to the approach taken in your v2 patch as it essentially
uses the hugepages_supported() functionality that exists today.  However,
it seems that other distros have ways around this issue.  As such, I would
prefer if the issue was addressed in the tooling.
Guilherme G. Piccoli Oct. 24, 2019, 4:21 p.m. UTC | #12
On 10/18/19 3:29 PM, Mike Kravetz wrote:
> On 10/15/19 7:09 AM, Guilherme G. Piccoli wrote:
> Please do let us know if this can be done in tooling.
> 
> I am not opposed to the approach taken in your v2 patch as it essentially
> uses the hugepages_supported() functionality that exists today.  However,
> it seems that other distros have ways around this issue.  As such, I would
> prefer if the issue was addressed in the tooling.
> 

Thanks a lot Mike! I'll work on this and let you know if I were able to 
prevent hugepages from the tooling.

Cheers,


Guilherme
diff mbox series

Patch

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index c7ac2f3ac99f..eebe0e7b30cf 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -2982,6 +2982,10 @@ 
 
 	nohugeiomap	[KNL,x86,PPC] Disable kernel huge I/O mappings.
 
+	nohugepages	[KNL] Disable HugeTLB hugepages completely, preventing
+			its setting either by kernel parameter or sysfs;
+			useful specially in kdump kernel.
+
 	nosmt		[KNL,S390] Disable symmetric multithreading (SMT).
 			Equivalent to smt=1.
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ef37c85423a5..a6c7a68152e5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -43,6 +43,7 @@ 
 int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;
 struct hstate hstates[HUGE_MAX_HSTATE];
+static int disable_hugepages;
 /*
  * Minimum page order among possible hugepage sizes, set to a proper value
  * at boot time.
@@ -2550,6 +2551,9 @@  static ssize_t __nr_hugepages_store_common(bool obey_mempolicy,
 	int err;
 	nodemask_t nodes_allowed, *n_mask;
 
+	if (disable_hugepages)
+		return -EINVAL;
+
 	if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
 		return -EINVAL;
 
@@ -2978,6 +2982,9 @@  static int __init hugetlb_nrpages_setup(char *s)
 	unsigned long *mhp;
 	static unsigned long *last_mhp;
 
+	if (disable_hugepages)
+		return 1;
+
 	if (!parsed_valid_hugepagesz) {
 		pr_warn("hugepages = %s preceded by "
 			"an unsupported hugepagesz, ignoring\n", s);
@@ -3022,6 +3029,15 @@  static int __init hugetlb_default_setup(char *s)
 }
 __setup("default_hugepagesz=", hugetlb_default_setup);
 
+static int __init nohugepages_setup(char *str)
+{
+	disable_hugepages = 1;
+	pr_info("HugeTLB: hugepages disabled by kernel parameter\n");
+
+	return 0;
+}
+early_param("nohugepages", nohugepages_setup);
+
 static unsigned int cpuset_mems_nr(unsigned int *array)
 {
 	int node;