[v2,4/4] mm/hugetlb.c: warn out if expected count of huge pages adjustment is not achieved

Message ID 20200723032248.24772-5-bhe@redhat.com (mailing list archive)
State New, archived
Series mm/hugetlb: Small cleanup and improvement

Commit Message

Baoquan He July 23, 2020, 3:22 a.m. UTC
A customer complained that no message is logged when the number of
persistent huge pages is not changed to the exact value written to
the sysfs or proc nr_hugepages file.

In the current code, a best effort is made to satisfy requests made
via the nr_hugepages file.  However, requests may be only partially
satisfied.

Log a message if the code was unsuccessful in fully satisfying a
request. This includes both increasing and decreasing the number
of persistent huge pages.

Signed-off-by: Baoquan He <bhe@redhat.com>
---
 mm/hugetlb.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Anshuman Khandual July 23, 2020, 6:16 a.m. UTC | #1
On 07/23/2020 08:52 AM, Baoquan He wrote:
> A customer complained that no message is logged when the number of
> persistent huge pages is not changed to the exact value written to
> the sysfs or proc nr_hugepages file.
> 
> In the current code, a best effort is made to satisfy requests made
> via the nr_hugepages file.  However, requests may be only partially
> satisfied.
> 
> Log a message if the code was unsuccessful in fully satisfying a
> request. This includes both increasing and decreasing the number
> of persistent huge pages.

But is kernel expected to warn for all such situations where the user
requested resources could not be allocated completely? Otherwise, it
does not make sense to add a warning for just one such situation.
Baoquan He July 23, 2020, 9:11 a.m. UTC | #2
On 07/23/20 at 11:46am, Anshuman Khandual wrote:
> 
> 
> On 07/23/2020 08:52 AM, Baoquan He wrote:
> > A customer complained that no message is logged when the number of
> > persistent huge pages is not changed to the exact value written to
> > the sysfs or proc nr_hugepages file.
> > 
> > In the current code, a best effort is made to satisfy requests made
> > via the nr_hugepages file.  However, requests may be only partially
> > satisfied.
> > 
> > Log a message if the code was unsuccessful in fully satisfying a
> > request. This includes both increasing and decreasing the number
> > of persistent huge pages.
> 
> But is kernel expected to warn for all such situations where the user
> requested resources could not be allocated completely? Otherwise, it
> does not make sense to add a warning for just one such situation.

It's not for just one such situation, we have already had one to warn
out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().

As Mike said, in one time of persistent huge page number setting,
comparing the old value with the new value is good enough for customer
to get the information. However, if customer want to detect and analyze
previous setting failure, logging message will be helpful. So I think
logging the failure or partial success makes sense.

Thanks
Baoquan
Mike Kravetz July 23, 2020, 6:21 p.m. UTC | #3
On 7/23/20 2:11 AM, Baoquan He wrote:
> On 07/23/20 at 11:46am, Anshuman Khandual wrote:
>>
>>
>> On 07/23/2020 08:52 AM, Baoquan He wrote:
>>> A customer complained that no message is logged when the number of
>>> persistent huge pages is not changed to the exact value written to
>>> the sysfs or proc nr_hugepages file.
>>>
>>> In the current code, a best effort is made to satisfy requests made
>>> via the nr_hugepages file.  However, requests may be only partially
>>> satisfied.
>>>
>>> Log a message if the code was unsuccessful in fully satisfying a
>>> request. This includes both increasing and decreasing the number
>>> of persistent huge pages.
>>
>> But is kernel expected to warn for all such situations where the user
>> requested resources could not be allocated completely? Otherwise, it
>> does not make sense to add a warning for just one such situation.
> 
> It's not for just one such situation, we have already had one to warn
> out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().

Those are a little different in that they are warnings based on kernel
command line parameters.

> As Mike said, in one time of persistent huge page number setting,
> comparing the old value with the new value is good enough for customer
> to get the information. However, if customer want to detect and analyze
> previous setting failure, logging message will be helpful. So I think
> logging the failure or partial success makes sense.

I can understand the argument against adding a new warning for this.
You could even argue that this condition has existed since the time
hugetlb was added to the kernel which was long ago.  And, nobody has
complained enough to add a warning.  I have even heard of a sysadmin
practice of asking for a ridiculously large amount of hugetlb pages
just so that the kernel will allocate as many as possible.  They do
not 'expect' to get the ridiculous amount they asked for.  In such
cases, this will be a new warning in their log.

As mentioned in a previous e-mail, when one sets nr_hugepages by writing
to the sysfs or proc file, one needs to read the file to determine if the
number of requested pages were actually allocated.  Anyone who does not
do this is just asking for trouble.  Yet, I imagine that it may happen.

To be honest, I do not see this log message as something that would be
helpful to end users.  Rather, I could see this as being useful to support
people.  Support always asks for system logs and this could point out a
possible issue with hugetlb usage.

I do not feel strongly one way or another about adding the warning.  Since
it is fairly trivial and could help diagnose issues I am in favor of adding
it.  If people feel strongly that it should not be added, I am open to
those arguments.
Baoquan He July 24, 2020, 2:59 p.m. UTC | #4
On 07/23/20 at 11:21am, Mike Kravetz wrote:
> On 7/23/20 2:11 AM, Baoquan He wrote:
> > On 07/23/20 at 11:46am, Anshuman Khandual wrote:
> >>
> >>
> >> On 07/23/2020 08:52 AM, Baoquan He wrote:
> >>> A customer complained that no message is logged when the number of
> >>> persistent huge pages is not changed to the exact value written to
> >>> the sysfs or proc nr_hugepages file.
> >>>
> >>> In the current code, a best effort is made to satisfy requests made
> >>> via the nr_hugepages file.  However, requests may be only partially
> >>> satisfied.
> >>>
> >>> Log a message if the code was unsuccessful in fully satisfying a
> >>> request. This includes both increasing and decreasing the number
> >>> of persistent huge pages.
> >>
> >> But is kernel expected to warn for all such situations where the user
> >> requested resources could not be allocated completely? Otherwise, it
> >> does not make sense to add a warning for just one such situation.
> > 
> > It's not for just one such situation, we have already had one to warn
> > out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
> 
> Those are a little different in that they are warnings based on kernel
> command line parameters.
> 
> > As Mike said, in one time of persistent huge page number setting,
> > comparing the old value with the new value is good enough for customer
> > to get the information. However, if customer want to detect and analyze
> > previous setting failure, logging message will be helpful. So I think
> > logging the failure or partial success makes sense.
> 
> I can understand the argument against adding a new warning for this.
> You could even argue that this condition has existed since the time
> hugetlb was added to the kernel which was long ago.  And, nobody has
> complained enough to add a warning.  I have even heard of a sysadmin
> practice of asking for a ridiculously large amount of hugetlb pages
> just so that the kernel will allocate as many as possible.  They do
> not 'expect' to get the ridiculous amount they asked for.  In such
> cases, this will be a new warning in their log.
> 
> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
> to the sysfs or proc file, one needs to read the file to determine if the
> number of requested pages were actually allocated.  Anyone who does not
> do this is just asking for trouble.  Yet, I imagine that it may happen.
> 
> To be honest, I do not see this log message as something that would be
> helpful to end users.  Rather, I could see this as being useful to support
> people.  Support always asks for system logs and this could point out a
> possible issue with hugetlb usage.
> 
> I do not feel strongly one way or another about adding the warning.  Since
> it is fairly trivial and could help diagnose issues I am in favor of adding
> it.  If people feel strongly that it should not be added, I am open to
> those arguments.

Seems it's all done, and very fair. I appreciate your understanding on
this issue. Will see if any strong concern is raised on the log adding.
Baoquan He Aug. 11, 2020, 2:11 a.m. UTC | #5
Hi Mike,

On 07/23/20 at 11:21am, Mike Kravetz wrote:
> On 7/23/20 2:11 AM, Baoquan He wrote:
...
> >> But is kernel expected to warn for all such situations where the user
> >> requested resources could not be allocated completely? Otherwise, it
> >> does not make sense to add a warning for just one such situation.
> > 
> > It's not for just one such situation, we have already had one to warn
> > out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
> 
> Those are a little different in that they are warnings based on kernel
> command line parameters.
> 
> > As Mike said, in one time of persistent huge page number setting,
> > comparing the old value with the new value is good enough for customer
> > to get the information. However, if customer want to detect and analyze
> > previous setting failure, logging message will be helpful. So I think
> > logging the failure or partial success makes sense.
> 
> I can understand the argument against adding a new warning for this.
> You could even argue that this condition has existed since the time
> hugetlb was added to the kernel which was long ago.  And, nobody has
> complained enough to add a warning.  I have even heard of a sysadmin
> practice of asking for a ridiculously large amount of hugetlb pages
> just so that the kernel will allocate as many as possible.  They do
> not 'expect' to get the ridiculous amount they asked for.  In such
> cases, this will be a new warning in their log.
> 
> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
> to the sysfs or proc file, one needs to read the file to determine if the
> number of requested pages were actually allocated.  Anyone who does not
> do this is just asking for trouble.  Yet, I imagine that it may happen.
> 
> To be honest, I do not see this log message as something that would be
> helpful to end users.  Rather, I could see this as being useful to support
> people.  Support always asks for system logs and this could point out a
> possible issue with hugetlb usage.
> 
> I do not feel strongly one way or another about adding the warning.  Since
> it is fairly trivial and could help diagnose issues I am in favor of adding
> it.  If people feel strongly that it should not be added, I am open to
> those arguments.

Ping!

It's been a while, seems no objection to log the message. Do you
consider accepting this patch or offering an Ack?

Thanks
Baoquan
Mike Kravetz Aug. 11, 2020, 3:35 a.m. UTC | #6
Cc: Michal

On 8/10/20 7:11 PM, Baoquan He wrote:
> Hi Mike,
> 
> On 07/23/20 at 11:21am, Mike Kravetz wrote:
>> On 7/23/20 2:11 AM, Baoquan He wrote:
> ...
>>>> But is kernel expected to warn for all such situations where the user
>>>> requested resources could not be allocated completely? Otherwise, it
>>>> does not make sense to add a warning for just one such situation.
>>>
>>> It's not for just one such situation, we have already had one to warn
>>> out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
>>
>> Those are a little different in that they are warnings based on kernel
>> command line parameters.
>>
>>> As Mike said, in one time of persistent huge page number setting,
>>> comparing the old value with the new value is good enough for customer
>>> to get the information. However, if customer want to detect and analyze
>>> previous setting failure, logging message will be helpful. So I think
>>> logging the failure or partial success makes sense.
>>
>> I can understand the argument against adding a new warning for this.
>> You could even argue that this condition has existed since the time
>> hugetlb was added to the kernel which was long ago.  And, nobody has
>> complained enough to add a warning.  I have even heard of a sysadmin
>> practice of asking for a ridiculously large amount of hugetlb pages
>> just so that the kernel will allocate as many as possible.  They do
>> not 'expect' to get the ridiculous amount they asked for.  In such
>> cases, this will be a new warning in their log.
>>
>> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
>> to the sysfs or proc file, one needs to read the file to determine if the
>> number of requested pages were actually allocated.  Anyone who does not
>> do this is just asking for trouble.  Yet, I imagine that it may happen.
>>
>> To be honest, I do not see this log message as something that would be
>> helpful to end users.  Rather, I could see this as being useful to support
>> people.  Support always asks for system logs and this could point out a
>> possible issue with hugetlb usage.
>>
>> I do not feel strongly one way or another about adding the warning.  Since
>> it is fairly trivial and could help diagnose issues I am in favor of adding
>> it.  If people feel strongly that it should not be added, I am open to
>> those arguments.
> 
> Ping!
> 
> It's been a while, seems no objection to log the message. Do you
> consider accepting this patch or offering an Ack?
> 
> Thanks
> Baoquan

Adding Michal as he has had opinions about hugetlbfs log messages in the past.
Michal Hocko Aug. 11, 2020, 7:24 a.m. UTC | #7
On Mon 10-08-20 20:35:25, Mike Kravetz wrote:
> Cc: Michal
> 
> On 8/10/20 7:11 PM, Baoquan He wrote:
> > Hi Mike,
> > 
> > On 07/23/20 at 11:21am, Mike Kravetz wrote:
> >> On 7/23/20 2:11 AM, Baoquan He wrote:
> > ...
> >>>> But is kernel expected to warn for all such situations where the user
> >>>> requested resources could not be allocated completely? Otherwise, it
> >>>> does not make sense to add a warning for just one such situation.
> >>>
> >>> It's not for just one such situation, we have already had one to warn
> >>> out in mm/hugetlb.c, please check hugetlb_hstate_alloc_pages().
> >>
> >> Those are a little different in that they are warnings based on kernel
> >> command line parameters.
> >>
> >>> As Mike said, in one time of persistent huge page number setting,
> >>> comparing the old value with the new value is good enough for customer
> >>> to get the information. However, if customer want to detect and analyze
> >>> previous setting failure, logging message will be helpful. So I think
> >>> logging the failure or partial success makes sense.
> >>
> >> I can understand the argument against adding a new warning for this.
> >> You could even argue that this condition has existed since the time
> >> hugetlb was added to the kernel which was long ago.  And, nobody has
> >> complained enough to add a warning.  I have even heard of a sysadmin
> >> practice of asking for a ridiculously large amount of hugetlb pages
> >> just so that the kernel will allocate as many as possible.  They do
> >> not 'expect' to get the ridiculous amount they asked for.  In such
> >> cases, this will be a new warning in their log.
> >>
> >> As mentioned in a previous e-mail, when one sets nr_hugepages by writing
> >> to the sysfs or proc file, one needs to read the file to determine if the
> >> number of requested pages were actually allocated.  Anyone who does not
> >> do this is just asking for trouble.  Yet, I imagine that it may happen.
> >>
> >> To be honest, I do not see this log message as something that would be
> >> helpful to end users.  Rather, I could see this as being useful to support
> >> people.  Support always asks for system logs and this could point out a
> >> possible issue with hugetlb usage.
> >>
> >> I do not feel strongly one way or another about adding the warning.  Since
> >> it is fairly trivial and could help diagnose issues I am in favor of adding
> >> it.  If people feel strongly that it should not be added, I am open to
> >> those arguments.
> > 
> > Ping!
> > 
> > It's been a while, seems no objection to log the message. Do you
> > consider accepting this patch or offering an Ack?
> > 
> > Thanks
> > Baoquan
> 
> Adding Michal as he has had opinions about hugetlbfs log messages in the past.

My opinion is that the warning is too late to add at this stage. It
would have been much better if the user interface has provided a
reasonable feedback on how much the request was successful. But this
is not the case (except for few error cases) and we have to live with
the interface where the caller has to read the value after writing to
it. Lame but a reality.

I have heard about people making an opportunistic attempt to grab as
many hugetlb pages as possible and they do expect the failure and scale
the request size down. I do not think those would appreciate warnings.

That being said I would rather keep the existing behavior even though it
is suboptimal. It is just trivial to add the check in the userspace
without risking complaints by other users. Besides the warning is not
really telling us much more than a subsequent read anyway. You are not
going to learn why the allocation has failed because that one is done
(intentionally) as __GFP_NOWARN.
Mike Kravetz Aug. 11, 2020, 11:11 p.m. UTC | #8
On 8/11/20 12:24 AM, Michal Hocko wrote:
> 
> My opinion is that the warning is too late to add at this stage. It
> would have been much better if the user interface has provided a
> reasonable feedback on how much the request was successful. But this
> is not the case (except for few error cases) and we have to live with
> the interface where the caller has to read the value after writing to
> it. Lame but a reality.
> 
> I have heard about people making an opportunistic attempt to grab as
> many hugetlb pages as possible and they do expect the failure and scale
> the request size down. I do not think those would appreciate warnings.
> 
> That being said I would rather keep the existing behavior even though it
> is suboptimal. It is just trivial to add the check in the userspace
> without risking complaints by other users. Besides the warning is not
> really telling us much more than a subsequent read anyway. You are not
> going to learn why the allocation has failed because that one is done
> (intentionally) as __GFP_NOWARN.
> 

Thanks Michal.

As previously stated, I do not have a strong opinion about this.  Because of
this, let's just leave things as they are and not add the message.

It is pretty clear that a user needs to read the value after writing to
determine if all pages were allocated.  The log message would add little
benefit to the end user.
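
The read-back check the thread settles on can be sketched in userspace roughly as follows. This is only an illustration, not code from the series: the helper name set_and_verify_hugepages() is hypothetical, and the path is passed in by the caller (for example /proc/sys/vm/nr_hugepages, which normally requires root to write).

```c
#include <stdio.h>

/*
 * Hypothetical helper: write `requested` to an nr_hugepages-style file at
 * `path`, then read the file back to learn what the kernel actually set.
 * Returns 0 and stores the value read in *actual on success, or -1 on any
 * I/O or parse error. A successful write alone says nothing about whether
 * the request was fully satisfied; only the read-back does.
 */
static int set_and_verify_hugepages(const char *path, unsigned long requested,
                                    unsigned long *actual)
{
    FILE *f = fopen(path, "w");
    int n;

    if (!f)
        return -1;
    if (fprintf(f, "%lu\n", requested) < 0) {
        fclose(f);
        return -1;
    }
    /* fclose() flushes the buffered write, so check its result too. */
    if (fclose(f) != 0)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fscanf(f, "%lu", actual);
    fclose(f);

    return n == 1 ? 0 : -1;
}
```

A caller would compare *actual with the requested value and decide for itself whether a shortfall is an error or, as in the opportunistic case described above, simply the amount it has to work with.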

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c14837854392..b5aa32a13569 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2661,7 +2661,7 @@  static int adjust_pool_surplus(struct hstate *h, nodemask_t *nodes_allowed,
 static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 			      nodemask_t *nodes_allowed)
 {
-	unsigned long min_count, ret;
+	unsigned long min_count, ret, old_max, new_max;
 	NODEMASK_ALLOC(nodemask_t, node_alloc_noretry, GFP_KERNEL);
 
 	/*
@@ -2723,6 +2723,7 @@  static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	 * pool might be one hugepage larger than it needs to be, but
 	 * within all the constraints specified by the sysctls.
 	 */
+	old_max = persistent_huge_pages(h);
 	while (h->surplus_huge_pages && count > persistent_huge_pages(h)) {
 		if (!adjust_pool_surplus(h, nodes_allowed, -1))
 			break;
@@ -2779,8 +2780,20 @@  static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 	}
 out:
 	h->max_huge_pages = persistent_huge_pages(h);
+	new_max = h->max_huge_pages;
 	spin_unlock(&hugetlb_lock);
 
+	if (count != new_max) {
+		char buf[32];
+
+		string_get_size(huge_page_size(h), 1, STRING_UNITS_2, buf, 32);
+		pr_warn("HugeTLB: %s %lu of page size %s failed. Only %s %lu hugepages.\n",
+			count > old_max ? "increasing" : "decreasing",
+			abs(count - old_max), buf,
+			count > old_max ? "increased" : "decreased",
+			abs(old_max - new_max));
+	}
+
 	NODEMASK_FREE(node_alloc_noretry);
 
 	return 0;
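
To make the three quantities in the last hunk concrete (count, old_max, new_max), here is a minimal userspace restatement of the warning condition and message. It is a sketch under assumptions: format_hugepage_warning() is a hypothetical name, the page-size label is passed in as a string rather than derived with string_get_size(), and the differences are computed with explicit comparisons instead of the kernel's abs().

```c
#include <stdio.h>

/*
 * count:   value written to nr_hugepages
 * old_max: persistent huge pages before the adjustment
 * new_max: persistent huge pages after the adjustment
 * size:    human-readable page size label supplied by the caller
 *
 * Returns 0 if the request was fully satisfied (no message needed),
 * otherwise the snprintf() result for the formatted warning.
 */
static int format_hugepage_warning(char *out, size_t len, unsigned long count,
                                   unsigned long old_max, unsigned long new_max,
                                   const char *size)
{
    int growing = count > old_max;
    /* Unsigned values: subtract the smaller from the larger explicitly. */
    unsigned long wanted = growing ? count - old_max : old_max - count;
    unsigned long done = new_max > old_max ? new_max - old_max
                                           : old_max - new_max;

    if (count == new_max)
        return 0;

    return snprintf(out, len,
                    "HugeTLB: %s %lu of page size %s failed. Only %s %lu hugepages.\n",
                    growing ? "increasing" : "decreasing", wanted, size,
                    growing ? "increased" : "decreased", done);
}
```

For instance, with count = 100, old_max = 10 and new_max = 40, the request was to add 90 pages and only 30 were added, so a message is produced; with count equal to new_max the function returns 0 and nothing is logged.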