page_alloc: avoid the negative free for meminfo available

Message ID 20230103072807.19578-1-jaewon31.kim@samsung.com (mailing list archive)
State New

Commit Message

Jaewon Kim Jan. 3, 2023, 7:28 a.m. UTC
The totalreserve_pages could be higher than the free because of
watermark high or watermark boost. Handle this situation and fix it to 0
free size.

Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
---
 mm/page_alloc.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Lorenzo Stoakes Jan. 3, 2023, 7:35 a.m. UTC | #1
On Tue, Jan 03, 2023 at 04:28:07PM +0900, Jaewon Kim wrote:
> The totalreserve_pages could be higher than the free because of
> watermark high or watermark boost. Handle this situation and fix it to 0
> free size.
>
> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> ---
>  mm/page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 218b28ee49ed..e510ae83d5f3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>  	 * without causing swapping or OOM.
>  	 */
>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
> +	if (available < 0)
> +		available = 0;
>
>  	/*
>  	 * Not all the page cache can be freed, otherwise the system will
> --
> 2.17.1
>

We already reset to zero at the end of the function, wouldn't resetting to zero
here potentially skew the result?
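
To make the question concrete, here is a minimal userspace sketch of the estimate, with and without the proposed early clamp. It is a model, not the kernel function: `estimate_available`, `min_l`, and the `clamp_early` flag are hypothetical names used only for illustration, and all inputs are plain page counts passed in by the caller.

```c
/* Userspace model of the MemAvailable estimate; not kernel code. */

static long min_l(long a, long b)
{
	return a < b ? a : b;
}

/*
 * All arguments are page counts. clamp_early selects whether the
 * proposed two-line change (clamp right after the subtraction) is
 * applied; the clamp at the end is always applied.
 */
long estimate_available(long free_pages, long reserve, long pagecache,
			long reclaimable, long wmark_low, int clamp_early)
{
	long available = free_pages - reserve;

	if (clamp_early && available < 0)
		available = 0;		/* the proposed change */

	/* Only part of the page cache and reclaimable memory is counted. */
	available += pagecache - min_l(pagecache / 2, wmark_low);
	available += reclaimable - min_l(reclaimable / 2, wmark_low);

	if (available < 0)		/* the existing reset at the end */
		available = 0;
	return available;
}
```

With made-up inputs of free = 100, reserve = 300, pagecache = 400, reclaimable = 0 and wmark_low = 50, the model returns 150 without the early clamp and 350 with it. When free is at least as large as the reserve, both variants return the same value, so the two only diverge in the case the patch targets.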
Jaewon Kim Jan. 3, 2023, 7:50 a.m. UTC | #2
>--------- Original Message ---------
>Sender : Lorenzo Stoakes <lstoakes@gmail.com>
>Date : 2023-01-03 16:35 (GMT+9)
>Title : Re: [PATCH] page_alloc: avoid the negative free for meminfo available
>
>On Tue, Jan 03, 2023 at 04:28:07PM +0900, Jaewon Kim wrote:
>> The totalreserve_pages could be higher than the free because of
>> watermark high or watermark boost. Handle this situation and fix it to 0
>> free size.
>>
>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>> ---
>>  mm/page_alloc.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 218b28ee49ed..e510ae83d5f3 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>>  	 * without causing swapping or OOM.
>>  	 */
>>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
>> +	if (available < 0)
>> +		available = 0;
>>
>>  	/*
>>  	 * Not all the page cache can be freed, otherwise the system will
>> --
>> 2.17.1
>>
>
>We already reset to zero at the end of the function, wouldn't resetting to zero
>here potentially skew the result?
>

Hello

I did not mean the final available value going negative. We should compute the actual size
by leaving out the improper portion. From that perspective, the free term should not be negative.
If it is negative, it reduces the other terms, pagecache and reclaimable.

Actually, pagecache and reclaimable are each calculated with min(), so I think resetting to zero
at the end of the function is not necessary.

br
Michal Hocko Jan. 3, 2023, 8:03 a.m. UTC | #3
On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
> The totalreserve_pages could be higher than the free because of
> watermark high or watermark boost. Handle this situation and fix it to 0
> free size.

What is the actual problem you are trying to address by this change?

> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> ---
>  mm/page_alloc.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 218b28ee49ed..e510ae83d5f3 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>  	 * without causing swapping or OOM.
>  	 */
>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
> +	if (available < 0)
> +		available = 0;
>  
>  	/*
>  	 * Not all the page cache can be freed, otherwise the system will
> -- 
> 2.17.1
Jaewon Kim Jan. 3, 2023, 8:20 a.m. UTC | #4
>On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
>> The totalreserve_pages could be higher than the free because of
>> watermark high or watermark boost. Handle this situation and fix it to 0
>> free size.
>
>What is the actual problem you are trying to address by this change?

Hello

As described in the original commit,
  34e431b0ae39 /proc/meminfo: provide estimated available memory
mm is trying to provide the available memory to user space.

But if free is negative, the available memory shown to userspace
would be smaller than the actual available size. Userspace
may then take unwanted memory-shrinking actions such as process kills.

I think the logic should account for the positive size only.

BR

>
>> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>> ---
>>  mm/page_alloc.c | 2 ++
>>  1 file changed, 2 insertions(+)
>> 
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 218b28ee49ed..e510ae83d5f3 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>>  	 * without causing swapping or OOM.
>>  	 */
>>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
>> +	if (available < 0)
>> +		available = 0;
>>  
>>  	/*
>>  	 * Not all the page cache can be freed, otherwise the system will
>> -- 
>> 2.17.1
>
>-- 
>Michal Hocko
>SUSE Labs



 
Michal Hocko Jan. 3, 2023, 8:32 a.m. UTC | #5
On Tue 03-01-23 17:20:08, 김재원 wrote:
> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
> >> The totalreserve_pages could be higher than the free because of
> >> watermark high or watermark boost. Handle this situation and fix it to 0
> >> free size.
> >
> >What is the actual problem you are trying to address by this change?
> 
> Hello
> 
> As described in the original commit,
>   34e431b0ae39 /proc/meminfo: provide estimated available memory
> mm is trying to provide the available memory to user space.
> 
> But if free is negative, the available memory shown to userspace
> would be smaller than the actual available size. Userspace
> may then take unwanted memory-shrinking actions such as process kills.

Do you have any specific example? Have you seen this happening in
practice or is this based on the code inspection?

Also does this patch actually fix anything? Say the system is really
struggling and we are under min watermark. Shouldn't that lead to
Available to be reported as 0 without even looking at other counters?

> I think the logic should account for the positive size only.
> 
> BR
> 
> >
> >> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
> >> ---
> >>  mm/page_alloc.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >> 
> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >> index 218b28ee49ed..e510ae83d5f3 100644
> >> --- a/mm/page_alloc.c
> >> +++ b/mm/page_alloc.c
> >> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
> >>  	 * without causing swapping or OOM.
> >>  	 */
> >>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
> >> +	if (available < 0)
> >> +		available = 0;
> >>  
> >>  	/*
> >>  	 * Not all the page cache can be freed, otherwise the system will
> >> -- 
> >> 2.17.1
> >
> >-- 
> >Michal Hocko
> >SUSE Labs
> 
> 
> 
>  
Jaewon Kim Jan. 3, 2023, 9:22 a.m. UTC | #6
>> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
>> >> The totalreserve_pages could be higher than the free because of
>> >> watermark high or watermark boost. Handle this situation and fix it to 0
>> >> free size.
>> >
>> >What is the actual problem you are trying to address by this change?
>> 
>> Hello
>> 
>> As described in the original commit,
>>   34e431b0ae39 /proc/meminfo: provide estimated available memory
>> mm is trying to provide the available memory to user space.
>> 
>> But if free is negative, the available memory shown to userspace
>> would be smaller than the actual available size. Userspace
>> may then take unwanted memory-shrinking actions such as process kills.
>
>Do you have any specific example? Have you seen this happening in
>practice or is this based on the code inspection?

I found this on a device running a v5.10-based kernel.
The log below was printed by user space, in its own format, after reading /proc/meminfo.

MemFree          38220 KB
MemAvailable     90008 KB
Active(file)    137116 KB
Inactive(file)  124128 KB
SReclaimable    100960 KB

Here's /proc/zoneinfo for wmark info.

------ ZONEINFO (/proc/zoneinfo) ------
Node 0, zone    DMA32
  pages free     17059
        min      862
        low      9790
        high     18718
        spanned  524288
        present  497920
        managed  413348
Node 0, zone   Normal
  pages free     12795
        min      1044
        low      11855
        high     22666
        spanned  8388608
        present  524288
        managed  500548

The pagecache term at this time seems to be 174,664 KB:
  pagecache -= min(pagecache / 2, wmark_low)
We also need to add the reclaimable and the actual free to it to get MemAvailable.

MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB
because the big wmark high of 165,536 KB seems to have been subtracted.
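
As a cross-check of the figures above, here is a small C sketch that recomputes them from the /proc/zoneinfo and /proc/meminfo values quoted in this message. It assumes 4 KB pages (the thread does not state the page size, but the quoted 165,536 figure is consistent with it). `struct zone_wmarks`, `PAGE_KB`, and the three helper functions are hypothetical names for illustration, not kernel interfaces.

```c
#include <stddef.h>

/* Assumption: 4 KB pages; not stated in the thread. */
#define PAGE_KB 4UL

/* Per-zone watermarks in pages, as printed by /proc/zoneinfo. */
struct zone_wmarks {
	unsigned long low;
	unsigned long high;
};

/* Sum of the low watermarks over all zones, converted to KB. */
unsigned long wmark_low_kb(const struct zone_wmarks *z, size_t n)
{
	unsigned long pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += z[i].low;
	return pages * PAGE_KB;
}

/* Sum of the high watermarks over all zones, converted to KB. */
unsigned long wmark_high_kb(const struct zone_wmarks *z, size_t n)
{
	unsigned long pages = 0;

	for (size_t i = 0; i < n; i++)
		pages += z[i].high;
	return pages * PAGE_KB;
}

/* pagecache - min(pagecache / 2, wmark_low), all values in KB. */
unsigned long pagecache_term_kb(unsigned long active_file_kb,
				unsigned long inactive_file_kb,
				unsigned long low_kb)
{
	unsigned long pagecache = active_file_kb + inactive_file_kb;
	unsigned long half = pagecache / 2;

	return pagecache - (half < low_kb ? half : low_kb);
}
```

For the two zones above (low 9790 and 11855, high 18718 and 22666 pages) this gives 86,580 KB for the summed low watermarks and 165,536 KB for the summed high watermarks. With Active(file) 137116 KB and Inactive(file) 124128 KB, the pagecache term comes to 174,664 KB.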

>
>Also does this patch actually fix anything? Say the system is really
>struggling and we are under min watermark. Shouldn't that lead to
>Available to be reported as 0 without even looking at other counters?
>

Sorry, but I did not understand. This miscalculation can happen
above the min watermark. Do you think the wmark high should be subtracted
all the time even if the free is negative?


>> I think the logic should account for the positive size only.
>> 
>> BR
>> 
>> >
>> >> Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
>> >> ---
>> >>  mm/page_alloc.c | 2 ++
>> >>  1 file changed, 2 insertions(+)
>> >> 
>> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> >> index 218b28ee49ed..e510ae83d5f3 100644
>> >> --- a/mm/page_alloc.c
>> >> +++ b/mm/page_alloc.c
>> >> @@ -5948,6 +5948,8 @@ long si_mem_available(void)
>> >>  	 * without causing swapping or OOM.
>> >>  	 */
>> >>  	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
>> >> +	if (available < 0)
>> >> +		available = 0;
>> >>  
>> >>  	/*
>> >>  	 * Not all the page cache can be freed, otherwise the system will
>> >> -- 
>> >> 2.17.1
>> >
>> >-- 
>> >Michal Hocko
>> >SUSE Labs
>> 
>> 
>> 
>>  
Michal Hocko Jan. 3, 2023, 10:20 a.m. UTC | #7
On Tue 03-01-23 18:22:32, 김재원 wrote:
> >> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
> >> >> The totalreserve_pages could be higher than the free because of
> >> >> watermark high or watermark boost. Handle this situation and fix it to 0
> >> >> free size.
> >> >
> >> >What is the actual problem you are trying to address by this change?
> >> 
> >> Hello
> >> 
> >> As described in the original commit,
> >>   34e431b0ae39 /proc/meminfo: provide estimated available memory
> >> mm is trying to provide the available memory to user space.
> >> 
> >> But if free is negative, the available memory shown to userspace
> >> would be smaller than the actual available size. Userspace
> >> may then take unwanted memory-shrinking actions such as process kills.
> >
> >Do you have any specific example? Have you seen this happening in
> >practice or is this based on the code inspection?
> 
> I found this on a device running a v5.10-based kernel.
> The log below was printed by user space, in its own format, after reading /proc/meminfo.
> 
> MemFree          38220 KB
> MemAvailable     90008 KB
> Active(file)    137116 KB
> Inactive(file)  124128 KB
> SReclaimable    100960 KB
> 
> Here's /proc/zoneinfo for wmark info.
> 
> ------ ZONEINFO (/proc/zoneinfo) ------
> Node 0, zone    DMA32
>   pages free     17059
>         min      862
>         low      9790
>         high     18718
>         spanned  524288
>         present  497920
>         managed  413348
> Node 0, zone   Normal
>   pages free     12795
>         min      1044
>         low      11855
>         high     22666
>         spanned  8388608
>         present  524288
>         managed  500548
> 
> The pagecache term at this time seems to be 174,664 KB:
>   pagecache -= min(pagecache / 2, wmark_low)
> We also need to add the reclaimable and the actual free to it to get MemAvailable.
> 
> MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB
> because the big wmark high of 165,536 KB seems to have been subtracted.

How have you concluded that? Are you saying that a userspace would be
behaving more sanely when considering more memory to be available?
Please see more on the semantics below.

> >Also does this patch actually fix anything? Say the system is really
> >struggling and we are under min watermark. Shouldn't that lead to
> >Available to be reported as 0 without even looking at other counters?
> >
> 
> Sorry but I did not understand,

What I meant here is that the core of the high level definition says:
"An estimate of how much memory is available for starting new
applications, without swapping." If the system is close enough to watermarks 
that NR_FREE_PAGES < reserves then it is likely that further memory
allocations will not do without reclaim and potentially swapout.

So the question really is whether just clamping the value to 0 is
actually making MemAvailable more "correct"? See my point?

The actual value is never going to be laser cut precise. Close to
watermark behavior will vary wildly depending on the memory
reclaimability. Kswapd might easily keep up with memory demand but it
also could get stuck. MemAvailable should be considered a hint rather
than an exact value IMHO.
Jaewon Kim Jan. 3, 2023, 10:39 a.m. UTC | #8
>> >> >On Tue 03-01-23 16:28:07, Jaewon Kim wrote:
>> >> >> The totalreserve_pages could be higher than the free because of
>> >> >> watermark high or watermark boost. Handle this situation and fix it to 0
>> >> >> free size.
>> >> >
>> >> >What is the actual problem you are trying to address by this change?
>> >> 
>> >> Hello
>> >> 
>> >> As described in the original commit,
>> >>   34e431b0ae39 /proc/meminfo: provide estimated available memory
>> >> mm is trying to provide the available memory to user space.
>> >> 
>> >> But if free is negative, the available memory shown to userspace
>> >> would be smaller than the actual available size. Userspace
>> >> may then take unwanted memory-shrinking actions such as process kills.
>> >
>> >Do you have any specific example? Have you seen this happening in
>> >practice or is this based on the code inspection?
>> 
>> I found this on a device running a v5.10-based kernel.
>> The log below was printed by user space, in its own format, after reading /proc/meminfo.
>> 
>> MemFree          38220 KB
>> MemAvailable     90008 KB
>> Active(file)    137116 KB
>> Inactive(file)  124128 KB
>> SReclaimable    100960 KB
>> 
>> Here's /proc/zoneinfo for wmark info.
>> 
>> ------ ZONEINFO (/proc/zoneinfo) ------
>> Node 0, zone    DMA32
>>   pages free     17059
>>         min      862
>>         low      9790
>>         high     18718
>>         spanned  524288
>>         present  497920
>>         managed  413348
>> Node 0, zone   Normal
>>   pages free     12795
>>         min      1044
>>         low      11855
>>         high     22666
>>         spanned  8388608
>>         present  524288
>>         managed  500548
>> 
>> The pagecache term at this time seems to be 174,664 KB:
>>   pagecache -= min(pagecache / 2, wmark_low)
>> We also need to add the reclaimable and the actual free to it to get MemAvailable.
>> 
>> MemAvailable should be at least this 174,664 KB, but it was only 90,008 KB
>> because the big wmark high of 165,536 KB seems to have been subtracted.
>
>How have you concluded that? Are you saying that a userspace would be
>behaving more sanely when considering more memory to be available?
>Please see more on the semantics below.
>
>> >Also does this patch actually fix anything? Say the system is really
>> >struggling and we are under min watermark. Shouldn't that lead to
>> >Available to be reported as 0 without even looking at other counters?
>> >
>> 
>> Sorry but I did not understand,
>
>What I meant here is that the core of the high level definition says:
>"An estimate of how much memory is available for starting new
>applications, without swapping." If the system is close enough to watermarks 
>that NR_FREE_PAGES < reserves then it is likely that further memory
>allocations will not do without reclaim and potentially swapout.

Yes, reclaim would be needed in that case.

I think it is just a matter of perspective.
If I follow you, totalreserve_pages should be considered
the must-have free size.

>
>So the question really is whether just clamping the value to 0 is
>actually making MemAvailable more "correct"? See my point?
>
>The actual value is never going to be laser cut precise. Close to
>watermark behavior will vary wildly depending on the memory
>reclaimability. Kswapd might easily keep up with memory demand but it
>also could get stuck. MemAvailable should be considered a hint rather
>than an exact value IMHO.

Yeah, correct, it is not perfect.
I will drop my patch.
It was a nice discussion.
Thank you

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 218b28ee49ed..e510ae83d5f3 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5948,6 +5948,8 @@  long si_mem_available(void)
 	 * without causing swapping or OOM.
 	 */
 	available = global_zone_page_state(NR_FREE_PAGES) - totalreserve_pages;
+	if (available < 0)
+		available = 0;
 
 	/*
 	 * Not all the page cache can be freed, otherwise the system will