Message ID | 20200723032248.24772-5-bhe@redhat.com |
---|---|
State | New, archived |
Series | mm/hugetlb: Small cleanup and improvement |

On 07/23/2020 08:52 AM, Baoquan He wrote:
> A customer complained that no message is logged when the number of
> persistent huge pages is not changed to the exact value written to
> the sysfs or proc nr_hugepages file.
>
> In the current code, a best effort is made to satisfy requests made
> via the nr_hugepages file. However, requests may be only partially
> satisfied.
>
> Log a message if the code was unsuccessful in fully satisfying a
> request. This includes both increasing and decreasing the number
> of persistent huge pages.

But is the kernel expected to warn for all such situations where the user-requested resources could not be allocated completely? Otherwise, it does not make sense to add a warning for just one such situation.

On 07/23/20 at 11:46am, Anshuman Khandual wrote:
> On 07/23/2020 08:52 AM, Baoquan He wrote:
> > A customer complained that no message is logged when the number of
> > persistent huge pages is not changed to the exact value written to
> > the sysfs or proc nr_hugepages file.
> ...
>
> But is the kernel expected to warn for all such situations where the
> user-requested resources could not be allocated completely? Otherwise,
> it does not make sense to add a warning for just one such situation.

It's not just one such situation; there is already a similar warning in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().

As Mike said, for a single setting of the persistent huge page number, comparing the old value with the new value is good enough for the customer to get the information. However, if the customer wants to detect and analyze a previous setting failure, a log message will be helpful. So I think logging the failure or partial success makes sense.

Thanks
Baoquan

On 7/23/20 2:11 AM, Baoquan He wrote:
> On 07/23/20 at 11:46am, Anshuman Khandual wrote:
>> On 07/23/2020 08:52 AM, Baoquan He wrote:
>> ...
>>
>> But is the kernel expected to warn for all such situations where the
>> user-requested resources could not be allocated completely? Otherwise,
>> it does not make sense to add a warning for just one such situation.
>
> It's not just one such situation; there is already a similar warning
> in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().

Those are a little different in that they are warnings based on kernel command line parameters.

> As Mike said, for a single setting of the persistent huge page number,
> comparing the old value with the new value is good enough for the
> customer to get the information. However, if the customer wants to
> detect and analyze a previous setting failure, a log message will be
> helpful. So I think logging the failure or partial success makes sense.

I can understand the argument against adding a new warning for this. You could even argue that this condition has existed since the time hugetlb was added to the kernel, which was long ago, and nobody has complained enough to add a warning. I have even heard of a sysadmin practice of asking for a ridiculously large number of hugetlb pages just so that the kernel will allocate as many as possible. They do not 'expect' to get the ridiculous amount they asked for. In such cases, this will be a new warning in their log.

As mentioned in a previous e-mail, when one sets nr_hugepages by writing to the sysfs or proc file, one needs to read the file to determine if the requested number of pages was actually allocated. Anyone who does not do this is just asking for trouble. Yet, I imagine that it may happen.

To be honest, I do not see this log message as something that would be helpful to end users. Rather, I could see this as being useful to support people. Support always asks for system logs and this could point out a possible issue with hugetlb usage.

I do not feel strongly one way or another about adding the warning. Since it is fairly trivial and could help diagnose issues, I am in favor of adding it. If people feel strongly that it should not be added, I am open to those arguments.
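Mike's point, that the only feedback available today is to read nr_hugepages back after writing it, can be shown as a minimal C sketch. The helper name `set_and_verify()` and its return convention are hypothetical, not a kernel or libc API. The caller passes the path, for example `/proc/sys/vm/nr_hugepages` or a per-size file such as `/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages`. Writing either file normally requires root.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not a kernel or libc API): write `requested` to an
 * nr_hugepages-style file, then read the file back. The kernel satisfies
 * such writes on a best-effort basis and a partial result is not reported
 * through the write itself, so the read-back is the caller's feedback.
 *
 * Returns 0 if the value read back equals the request, 1 if it differs,
 * and -1 on an open, write, or parse error. The value read back is stored
 * in *actual.
 */
static int set_and_verify(const char *path, unsigned long requested,
                          unsigned long *actual)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    /* stdio output is buffered: a write error may only surface at fclose(). */
    int wrote = fprintf(f, "%lu\n", requested);
    if (fclose(f) != 0 || wrote < 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;
    int parsed = fscanf(f, "%lu", actual);
    fclose(f);
    if (parsed != 1)
        return -1;

    return *actual == requested ? 0 : 1;
}
```

For instance, a return of 1 from `set_and_verify("/proc/sys/vm/nr_hugepages", 512, &got)` means the pool did not end up at the requested size, and `got` holds the value the kernel reported instead.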

On 07/23/20 at 11:21am, Mike Kravetz wrote:
> On 7/23/20 2:11 AM, Baoquan He wrote:
...
> I do not feel strongly one way or another about adding the warning. Since
> it is fairly trivial and could help diagnose issues, I am in favor of adding
> it. If people feel strongly that it should not be added, I am open to
> those arguments.

Seems it's all done, and very fair. I appreciate your understanding on this issue. Let's see if any strong concern is raised about adding the log.

Hi Mike,

On 07/23/20 at 11:21am, Mike Kravetz wrote:
> On 7/23/20 2:11 AM, Baoquan He wrote:
...
> I do not feel strongly one way or another about adding the warning. Since
> it is fairly trivial and could help diagnose issues, I am in favor of adding
> it. If people feel strongly that it should not be added, I am open to
> those arguments.

Ping!

It's been a while, and there seems to be no objection to logging the message. Would you consider accepting this patch or offering an Ack?

Thanks
Baoquan

Cc: Michal

On 8/10/20 7:11 PM, Baoquan He wrote:
> Hi Mike,
>
> On 07/23/20 at 11:21am, Mike Kravetz wrote:
>> On 7/23/20 2:11 AM, Baoquan He wrote:
> ...
>
> Ping!
>
> It's been a while, and there seems to be no objection to logging the
> message. Would you consider accepting this patch or offering an Ack?
>
> Thanks
> Baoquan

Adding Michal as he has had opinions about hugetlbfs log messages in the past.

On Mon 10-08-20 20:35:25, Mike Kravetz wrote:
> Cc: Michal
>
> On 8/10/20 7:11 PM, Baoquan He wrote:
...
> Adding Michal as he has had opinions about hugetlbfs log messages in the past.

My opinion is that the warning is too late to add at this stage. It would have been much better if the user interface had provided reasonable feedback on how successful the request was. But this is not the case (except for a few error cases), and we have to live with an interface where the caller has to read the value after writing to it. Lame, but a reality.

I have heard about people making an opportunistic attempt to grab as many hugetlb pages as possible, and they do expect the failure and scale the request size down. I do not think those would appreciate warnings.

That being said, I would rather keep the existing behavior even though it is suboptimal. It is just trivial to add the check in userspace without risking complaints from other users. Besides, the warning is not really telling us much more than a subsequent read anyway. You are not going to learn why the allocation has failed, because that one is done (intentionally) with __GFP_NOWARN.
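Michal's remark that the warning tells us little more than a subsequent read can be made concrete. Every field of the patch's pr_warn() format string can be computed in userspace from three values: the pool size read before the write, the value written, and the pool size read back afterwards. The helper `describe_shortfall()` below is hypothetical. It mirrors the patch's format string but takes the page-size label as a caller-supplied string, since string_get_size() is kernel-internal.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: rebuild in userspace the text the proposed
 * pr_warn() would have logged, from the pool size before the write
 * (old_max), the value written (count), and the pool size read back
 * afterwards (new_max). `size` is a label such as "2048kB".
 *
 * Returns the snprintf() result, or 0 with an empty string in `buf`
 * when the request was fully satisfied.
 */
static int describe_shortfall(unsigned long old_max, unsigned long count,
                              unsigned long new_max, const char *size,
                              char *buf, size_t len)
{
    if (count == new_max) {
        if (len > 0)
            buf[0] = '\0';
        return 0;
    }
    int increasing = count > old_max;
    /* Order each unsigned subtraction so that it cannot wrap around. */
    unsigned long asked = increasing ? count - old_max : old_max - count;
    unsigned long done = new_max > old_max ? new_max - old_max
                                           : old_max - new_max;
    return snprintf(buf, len,
                    "HugeTLB: %s %lu of page size %s failed. Only %s %lu hugepages.",
                    increasing ? "increasing" : "decreasing", asked, size,
                    increasing ? "increased" : "decreased", done);
}
```

For example, reading 0 before writing 100 and reading 60 afterwards produces "HugeTLB: increasing 100 of page size 2048kB failed. Only increased 60 hugepages." The patch passes a difference of two unsigned long values to the kernel's abs(). The sketch orders each subtraction explicitly so that the arithmetic is easy to check in plain C.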

On 8/11/20 12:24 AM, Michal Hocko wrote:
>
> My opinion is that the warning is too late to add at this stage.
...
> That being said, I would rather keep the existing behavior even though
> it is suboptimal.
...

Thanks Michal.

As previously stated, I do not have a strong opinion about this. Because of this, let's just leave things as they are and not add the message. It is pretty clear that a user needs to read the value after writing to determine if all pages were allocated. The log message would add little benefit to the end user.

A customer complained that no message is logged when the number of persistent huge pages is not changed to the exact value written to the sysfs or proc nr_hugepages file.

In the current code, a best effort is made to satisfy requests made via the nr_hugepages file. However, requests may be only partially satisfied.

Log a message if the code was unsuccessful in fully satisfying a request. This includes both increasing and decreasing the number of persistent huge pages.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/hugetlb.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c14837854392..b5aa32a13569 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2661,7 +2661,7 @@ static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			      nodemask_t *nodes_allowed)
 {
-	unsigned long min_count, ret;
+	unsigned long min_count, ret, old_max, new_max;
 	NODEMASK_ALLOC(nodemask_t, node_alloc_noretry, GFP_KERNEL);
 
 	/*
@@ -2723,6 +2723,7 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * pool might be one hugepage larger than it needs to be, but
 	 * within all the constraints specified by the sysctls.
 	 */
+	old_max = persistent_huge_pages(h);
 	while (h->surplus_huge_pages && count > persistent_huge_pages(h)) {
 		if (!adjust_pool_surplus(h, nodes_allowed, -1))
 			break;
@@ -2779,8 +2780,20 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	}
 out:
 	h->max_huge_pages = persistent_huge_pages(h);
+	new_max = h->max_huge_pages;
 	spin_unlock(&hugetlb_lock);
 
+	if (count != new_max) {
+		char buf[32];
+
+		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+		pr_warn("HugeTLB: %s %lu of page size %s failed. Only %s %lu hugepages.\n",
+			count > old_max ? "increasing" : "decreasing",
+			abs(count - old_max), buf,
+			count > old_max ? "increased" : "decreased",
+			abs(old_max - new_max));
+	}
+
 	NODEMASK_FREE(node_alloc_noretry);
 
 	return 0;