Message ID | 20191011223955.1435-1-gpiccoli@canonical.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | hugetlb: Add nohugepages parameter to prevent hugepages creation | expand |
> On Oct 11, 2019, at 6:40 PM, Guilherme G. Piccoli <gpiccoli@canonical.com> wrote: > > Kdump kernels won't benefit from hugepages - in fact it's quite opposite, > it may be the case hugepages on kdump kernel can lead to OOM if kernel > gets unable to allocate demanded pages due to the fact the preallocated > hugepages are consuming a lot of memory. > > This patch proposes a new kernel parameter to prevent the creation of > HugeTLB hugepages - we currently don't have a way to do that. We can > even have kdump scripts removing the kernel command-line options to > set hugepages, but it's not straightforward to prevent sysctl/sysfs > configuration, given it happens in later boot or anytime when the > system is running. Typically, kdump kernel has its own initramfs, and don’t even need to mount a rootfs, so I can’t see how sysfs/sysctl is relevant here.
On Fri, Oct 11, 2019 at 8:36 PM Qian Cai <cai@lca.pw> wrote: > > Typically, kdump kernel has its own initramfs, and don’t even need to mount a rootfs, so I can’t see how sysfs/sysctl is relevant here. Thanks for the quick response. Kdump in Ubuntu, for example, rely in mounting the root filesystem. Even in initrd-only approaches, we could have sysctl.conf being copied to initramfs, and hugepages end-up getting set. Also, I don't think the "nohugepages" is useful only for kdump - I found odd we cannot prevent the creation of hugepages at all when using sysfs writes. Cheers, Guilherme
> On Oct 11, 2019, at 7:42 PM, Guilherme Piccoli <gpiccoli@canonical.com> wrote: > > Thanks for the quick response. Kdump in Ubuntu, for example, rely in > mounting the root filesystem. > Even in initrd-only approaches, we could have sysctl.conf being copied > to initramfs, and hugepages end-up getting set. It simply error-prone to reuse the sysctl.conf from the first kernel, as it could contains lots of things that will kill kdump kernel.
On Fri, Oct 11, 2019 at 8:52 PM Qian Cai <cai@lca.pw> wrote: > > > It simply error-prone to reuse the sysctl.conf from the first kernel, as it could contains lots of things that will kill kdump kernel. Makes sense, I agree with you. But still, there's no formal/right/single way to do kdump, so I don't think we should rely on "rootfs kdump is wrong" to refuse this patch's idea. If so, we should then formalize the right way of kdumping. Of course, this is only my opinion, let's see what people think about it! Thanks, Guilherme
On 10/11/19 3:39 PM, Guilherme G. Piccoli wrote: > Currently there are 2 ways for setting HugeTLB hugepages in kernel; either > users pass parameters on kernel command-line or they can write to sysfs > files (which is effectively the sysctl way). > > Kdump kernels won't benefit from hugepages - in fact it's quite opposite, > it may be the case hugepages on kdump kernel can lead to OOM if kernel > gets unable to allocate demanded pages due to the fact the preallocated > hugepages are consuming a lot of memory. > > This patch proposes a new kernel parameter to prevent the creation of > HugeTLB hugepages - we currently don't have a way to do that. We can > even have kdump scripts removing the kernel command-line options to > set hugepages, but it's not straightforward to prevent sysctl/sysfs > configuration, given it happens in later boot or anytime when the > system is running. > > Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> > --- > > About some decisions took in this patch: > > * early_param() was used because I couldn't find a way to enforce > parameters' ordering when using __setup(), and we need nohugepages > processed before all other hugepages options. I don't know much about early_param(), so I will assume this works as you describe. However, a quick grep shows hugepage options for ia64 also with early_param. > * The return when sysctl handler is prevented to progress due to > nohugepages is -EINVAL, but could be changed; I've just followed > present code there, but I'm OK changing that if we have suggestions. It looks like you only have short circuited/prevented nr_hugepages via sysfs/sysctl. Theoretically, one could set nr_overcommit_hugepages and still allocate hugetlb pages. So, if you REALLY want to shut things down you need to stop this as well. There is already a macro hugepages_supported() that can be set by arch specific code. I wonder how difficult it would be to 'overwrite' the macro if nohugepages is specified. Perhaps just a level of naming indirection. This would use the existing code to prevent all hugetlb usage. It seems like there may be some discussion about 'the right' way to do kdump. I can't add to that discussion, but if such an option as nohugepages is needed, I can help.
On 14/10/2019 15:25, Mike Kravetz wrote: > [...] > I don't know much about early_param(), so I will assume this works as you > describe. However, a quick grep shows hugepage options for ia64 also with > early_param. > Thanks a lot for the prompt and quite informative reply Mike. I've checked this IA64 parameter after your mention, and it just sets the hugepages size, I don't think it'll affect the purpose of this patch. >> * The return when sysctl handler is prevented to progress due to >> nohugepages is -EINVAL, but could be changed; I've just followed >> present code there, but I'm OK changing that if we have suggestions. > > It looks like you only have short circuited/prevented nr_hugepages via > sysfs/sysctl. Theoretically, one could set nr_overcommit_hugepages and > still allocate hugetlb pages. So, if you REALLY want to shut things down > you need to stop this as well. > > There is already a macro hugepages_supported() that can be set by arch > specific code. I wonder how difficult it would be to 'overwrite' the > macro if nohugepages is specified. Perhaps just a level of naming > indirection. This would use the existing code to prevent all hugetlb usage. > Outstanding! It's a much better idea to use hugepages_supported() infrastructure, it prevents even the creation of hugepages-related sysfs entries; I've worked a V2 with this modification, and it worked fine, thanks for the suggestion. > It seems like there may be some discussion about 'the right' way to > do kdump. I can't add to that discussion, but if such an option as > nohugepages is needed, I can help. > I think this parameter may be important/useful not only for kdump - it is a legitimate way of disabling hugepages, something we don't currently have on kernel. I'll submit a V2, for Ubuntu kdump specially this is quite helpful! Cheers, Guilherme
On Fri 11-10-19 21:41:01, Guilherme Piccoli wrote: > On Fri, Oct 11, 2019 at 8:52 PM Qian Cai <cai@lca.pw> wrote: > > > > > > It simply error-prone to reuse the sysctl.conf from the first kernel, as it could contains lots of things that will kill kdump kernel. > > Makes sense, I agree with you. But still, there's no > formal/right/single way to do kdump, so I don't think we should rely > on "rootfs kdump is wrong" to refuse this patch's idea. If so, we > should then formalize the right way of kdumping. > Of course, this is only my opinion, let's see what people think about it! I do agree with Qian Cai here. Kdump kernel requires a very tailored environment considering it is running in a very restricted configuration. The hugetlb pre-allocation sounds like a tooling problem and should be fixed at that layer.
On 10/15/19 9:18 AM, Michal Hocko wrote: > I do agree with Qian Cai here. Kdump kernel requires a very tailored > environment considering it is running in a very restricted > configuration. The hugetlb pre-allocation sounds like a tooling problem > and should be fixed at that layer. > Hi Michal, thanks for your response. Can you suggest me a current way of preventing hugepages for being created, using userspace? The goal for this patch is exactly this, introduce such a way. As I've said before, Ubuntu kdump rely on rootfs mounting and there's no official statement or documentation that say it's wrong to do this way - although I agree initrd-only approach is more safe. But since Ubuntu kdump rely on rootfs mount, we couldn't find a way to effectively prevent the creation of hugepages completely, hence we tried to introduce one. Cheers, Guilherme
On Tue 15-10-19 10:58:36, Guilherme G. Piccoli wrote: > On 10/15/19 9:18 AM, Michal Hocko wrote: > > I do agree with Qian Cai here. Kdump kernel requires a very tailored > > environment considering it is running in a very restricted > > configuration. The hugetlb pre-allocation sounds like a tooling problem > > and should be fixed at that layer. > > > > Hi Michal, thanks for your response. Can you suggest me a current way of > preventing hugepages for being created, using userspace? The goal for this > patch is exactly this, introduce such a way. Simply restrict the environment to not allocate any hugepages? Kdump already controls the kernel command line and it also starts only a very minimal subset of services. So who is allocating those hugepages? sysctls should be already excluded by default as Qian mentioned.
On 10/15/19 11:05 AM, Michal Hocko wrote: > On Tue 15-10-19 10:58:36, Guilherme G. Piccoli wrote: >> On 10/15/19 9:18 AM, Michal Hocko wrote: >>> I do agree with Qian Cai here. Kdump kernel requires a very tailored >>> environment considering it is running in a very restricted >>> configuration. The hugetlb pre-allocation sounds like a tooling problem >>> and should be fixed at that layer. >>> >> >> Hi Michal, thanks for your response. Can you suggest me a current way of >> preventing hugepages for being created, using userspace? The goal for this >> patch is exactly this, introduce such a way. > > Simply restrict the environment to not allocate any hugepages? Kdump > already controls the kernel command line and it also starts only a very > minimal subset of services. So who is allocating those hugepages? > sysctls should be already excluded by default as Qian mentioned. > OK, thanks Michal and Qian, I'll try to make things work from kdump perspective. The trick part is exactly preventing the sysctl to get applied heh Cheers, Guilherme
On 10/15/19 7:09 AM, Guilherme G. Piccoli wrote: > > > On 10/15/19 11:05 AM, Michal Hocko wrote: >> On Tue 15-10-19 10:58:36, Guilherme G. Piccoli wrote: >>> On 10/15/19 9:18 AM, Michal Hocko wrote: >>>> I do agree with Qian Cai here. Kdump kernel requires a very tailored >>>> environment considering it is running in a very restricted >>>> configuration. The hugetlb pre-allocation sounds like a tooling problem >>>> and should be fixed at that layer. >>>> >>> >>> Hi Michal, thanks for your response. Can you suggest me a current way of >>> preventing hugepages for being created, using userspace? The goal for this >>> patch is exactly this, introduce such a way. >> >> Simply restrict the environment to not allocate any hugepages? Kdump >> already controls the kernel command line and it also starts only a very >> minimal subset of services. So who is allocating those hugepages? >> sysctls should be already excluded by default as Qian mentioned. >> > > > OK, thanks Michal and Qian, I'll try to make things work from kdump perspective. The trick part is exactly preventing the sysctl to get applied heh > Please do let us know if this can be done in tooling. I am not opposed to the approach taken in your v2 patch as it essentially uses the hugepages_supported() functionality that exists today. However, it seems that other distros have ways around this issue. As such, I would prefer if the issue was addressed in the tooling.
On 10/18/19 3:29 PM, Mike Kravetz wrote: > On 10/15/19 7:09 AM, Guilherme G. Piccoli wrote: > Please do let us know if this can be done in tooling. > > I am not opposed to the approach taken in your v2 patch as it essentially > uses the hugepages_supported() functionality that exists today. However, > it seems that other distros have ways around this issue. As such, I would > prefer if the issue was addressed in the tooling. > Thanks a lot Mike! I'll work on this and let you know if I were able to prevent hugepages from the tooling. Cheers, Guilherme
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index c7ac2f3ac99f..eebe0e7b30cf 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2982,6 +2982,10 @@ nohugeiomap [KNL,x86,PPC] Disable kernel huge I/O mappings. + nohugepages [KNL] Disable HugeTLB hugepages completely, preventing + its setting either by kernel parameter or sysfs; + useful specially in kdump kernel. + nosmt [KNL,S390] Disable symmetric multithreading (SMT). Equivalent to smt=1. diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ef37c85423a5..a6c7a68152e5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -43,6 +43,7 @@ int hugetlb_max_hstate __read_mostly; unsigned int default_hstate_idx; struct hstate hstates[HUGE_MAX_HSTATE]; +static int disable_hugepages; /* * Minimum page order among possible hugepage sizes, set to a proper value * at boot time. @@ -2550,6 +2551,9 @@ static ssize_t __nr_hugepages_store_common(bool obey_mempolicy, int err; nodemask_t nodes_allowed, *n_mask; + if (disable_hugepages) + return -EINVAL; + if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported()) return -EINVAL; @@ -2978,6 +2982,9 @@ static int __init hugetlb_nrpages_setup(char *s) unsigned long *mhp; static unsigned long *last_mhp; + if (disable_hugepages) + return 1; + if (!parsed_valid_hugepagesz) { pr_warn("hugepages = %s preceded by " "an unsupported hugepagesz, ignoring\n", s); @@ -3022,6 +3029,15 @@ static int __init hugetlb_default_setup(char *s) } __setup("default_hugepagesz=", hugetlb_default_setup); +static int __init nohugepages_setup(char *str) +{ + disable_hugepages = 1; + pr_info("HugeTLB: hugepages disabled by kernel parameter\n"); + + return 0; +} +early_param("nohugepages", nohugepages_setup); + static unsigned int cpuset_mems_nr(unsigned int *array) { int node;
Currently there are 2 ways for setting HugeTLB hugepages in kernel; either users pass parameters on kernel command-line or they can write to sysfs files (which is effectively the sysctl way). Kdump kernels won't benefit from hugepages - in fact it's quite opposite, it may be the case hugepages on kdump kernel can lead to OOM if kernel gets unable to allocate demanded pages due to the fact the preallocated hugepages are consuming a lot of memory. This patch proposes a new kernel parameter to prevent the creation of HugeTLB hugepages - we currently don't have a way to do that. We can even have kdump scripts removing the kernel command-line options to set hugepages, but it's not straightforward to prevent sysctl/sysfs configuration, given it happens in later boot or anytime when the system is running. Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> --- About some decisions took in this patch: * early_param() was used because I couldn't find a way to enforce parameters' ordering when using __setup(), and we need nohugepages processed before all other hugepages options. * The return when sysctl handler is prevented to progress due to nohugepages is -EINVAL, but could be changed; I've just followed present code there, but I'm OK changing that if we have suggestions. Thanks in advance for the review! Cheers, Guilherme Documentation/admin-guide/kernel-parameters.txt | 4 ++++ mm/hugetlb.c | 16 ++++++++++++++++ 2 files changed, 20 insertions(+)