diff mbox series

mm: shmem: use proper gfp flags for shmem_writepage()

Message ID 1572991351-86061-1-git-send-email-yang.shi@linux.alibaba.com (mailing list archive)
State New, archived
Headers show
Series mm: shmem: use proper gfp flags for shmem_writepage() | expand

Commit Message

Yang Shi Nov. 5, 2019, 10:02 p.m. UTC
The shmem_writepage() uses GFP_ATOMIC to allocate swap cache.
GFP_ATOMIC used to mean __GFP_HIGH, but now it means __GFP_HIGH |
__GFP_ATOMIC | __GFP_KSWAPD_RECLAIM.  However, shmem_writepage() should
write out to swap only in response to memory pressure, so
__GFP_KSWAPD_RECLAIM looks useless since the caller may be kswapd itself
or in direct reclaim already.

In addition, XArray node allocations from PF_MEMALLOC contexts could
completely exhaust the page allocator, __GFP_NOMEMALLOC stops emergency
reserves from being allocated.

Here just copy the gfp flags used by add_to_swap().

Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
---
 mm/shmem.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Michal Hocko Nov. 6, 2019, 3:18 p.m. UTC | #1
On Wed 06-11-19 06:02:31, Yang Shi wrote:
> The shmem_writepage() uses GFP_ATOMIC to allocate swap cache.
> GFP_ATOMIC used to mean __GFP_HIGH, but now it means __GFP_HIGH |
> __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM.  However, shmem_writepage() should
> write out to swap only in response to memory pressure, so
> __GFP_KSWAPD_RECLAIM looks useless since the caller may be kswapd itself
> or in direct reclaim already.

What kind of problem are you trying to fix here?

> In addition, XArray node allocations from PF_MEMALLOC contexts could
> completely exhaust the page allocator, __GFP_NOMEMALLOC stops emergency
> reserves from being allocated.

I am not really familiar with XArray much, could you be more specific
please?

> Here just copy the gfp flags used by add_to_swap().
> 
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> ---
>  mm/shmem.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 220be9f..9691dec 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1369,7 +1369,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>  	if (list_empty(&info->swaplist))
>  		list_add(&info->swaplist, &shmem_swaplist);
>  
> -	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
> +	if (add_to_swap_cache(page, swap,
> +			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
>  		spin_lock_irq(&info->lock);
>  		shmem_recalc_inode(inode);
>  		info->swapped++;
> -- 
> 1.8.3.1
Yang Shi Nov. 6, 2019, 6:23 p.m. UTC | #2
On 11/6/19 7:18 AM, Michal Hocko wrote:
> On Wed 06-11-19 06:02:31, Yang Shi wrote:
>> The shmem_writepage() uses GFP_ATOMIC to allocate swap cache.
>> GFP_ATOMIC used to mean __GFP_HIGH, but now it means __GFP_HIGH |
>> __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM.  However, shmem_writepage() should
>> write out to swap only in response to memory pressure, so
>> __GFP_KSWAPD_RECLAIM looks useless since the caller may be kswapd itself
>> or in direct reclaim already.
> What kind of problem are you trying to fix here?

I didn't run into any visible problem. I just happened to find this 
inconsistency when I was looking into the other problem.

The add_to_swap() does:

int add_to_swap(struct page *page)
{
...
err = add_to_swap_cache(page, entry,
                         __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
...
}

Actually, shmem_writepage() does almost the same thing and both of them 
are called in reclaim context, so I didn't see why they should use 
different gfp flag. And, GFP_ATOMIC is also different from the old 
definition as I mentioned in the commit log.

>
>> In addition, XArray node allocations from PF_MEMALLOC contexts could
>> completely exhaust the page allocator, __GFP_NOMEMALLOC stops emergency
>> reserves from being allocated.
> I am not really familiar with XArray much, could you be more specific
> please?

It comes from the comments of add_to_swap(), says:

/*
          * XArray node allocations from PF_MEMALLOC contexts could
          * completely exhaust the page allocator. __GFP_NOMEMALLOC
          * stops emergency reserves from being allocated.

And, it looks the original comment came from pre-git time, TBH I'm not 
quite sure about the specific problem which incurred this. I suspect it 
may be because PF_MEMALLOC context allows ALLOC_NO_WATERMARK.

>
>> Here just copy the gfp flags used by add_to_swap().
>>
>> Cc: Hugh Dickins <hughd@google.com>
>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>> ---
>>   mm/shmem.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 220be9f..9691dec 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1369,7 +1369,8 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>>   	if (list_empty(&info->swaplist))
>>   		list_add(&info->swaplist, &shmem_swaplist);
>>   
>> -	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
>> +	if (add_to_swap_cache(page, swap,
>> +			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
>>   		spin_lock_irq(&info->lock);
>>   		shmem_recalc_inode(inode);
>>   		info->swapped++;
>> -- 
>> 1.8.3.1
Hugh Dickins Nov. 6, 2019, 6:59 p.m. UTC | #3
On Wed, 6 Nov 2019, Yang Shi wrote:
> On 11/6/19 7:18 AM, Michal Hocko wrote:
> > On Wed 06-11-19 06:02:31, Yang Shi wrote:
> > > The shmem_writepage() uses GFP_ATOMIC to allocate swap cache.
> > > GFP_ATOMIC used to mean __GFP_HIGH, but now it means __GFP_HIGH |
> > > __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM.  However, shmem_writepage() should
> > > write out to swap only in response to memory pressure, so
> > > __GFP_KSWAPD_RECLAIM looks useless since the caller may be kswapd itself
> > > or in direct reclaim already.
> > What kind of problem are you trying to fix here?
> 
> I didn't run into any visible problem. I just happened to find this
> inconsistency when I was looking into the other problem.

Yes, I don't think it fixes any actual problem: just a cleanup to
make the two calls look the same when they don't need to be different
(whereas the call from __read_swap_cache_async() rightly uses a 
lower priority gfp).

If it does fix a problem, then you need to worry also about the
	 * TODO: this could cause a theoretical memory reclaim
	 * deadlock in the swap out path.
comment still against the call in add_to_swap(): but I think that
is equally theoretical, demanding no attention since 2.6.12.

> 
> The add_to_swap() does:
> 
> int add_to_swap(struct page *page)
> {
> ...
> err = add_to_swap_cache(page, entry,
>                         __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
> ...
> }
> 
> Actually, shmem_writepage() does almost the same thing and both of them are
> called in reclaim context, so I didn't see why they should use different gfp
> flag. And, GFP_ATOMIC is also different from the old definition as I
> mentioned in the commit log.
> 
> > 
> > > In addition, XArray node allocations from PF_MEMALLOC contexts could
> > > completely exhaust the page allocator, __GFP_NOMEMALLOC stops emergency
> > > reserves from being allocated.
> > I am not really familiar with XArray much, could you be more specific
> > please?
> 
> It comes from the comments of add_to_swap(), says:
> 
> /*
>          * XArray node allocations from PF_MEMALLOC contexts could
>          * completely exhaust the page allocator. __GFP_NOMEMALLOC
>          * stops emergency reserves from being allocated.
> 
> And, it looks the original comment came from pre-git time, TBH I'm not quite
> sure about the specific problem which incurred this. I suspect it may be
> because PF_MEMALLOC context allows ALLOC_NO_WATERMARK.
> 
> > 
> > > Here just copy the gfp flags used by add_to_swap().
> > > 
> > > Cc: Hugh Dickins <hughd@google.com>
> > > Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
> > > ---
> > >   mm/shmem.c | 3 ++-
> > >   1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/mm/shmem.c b/mm/shmem.c
> > > index 220be9f..9691dec 100644
> > > --- a/mm/shmem.c
> > > +++ b/mm/shmem.c
> > > @@ -1369,7 +1369,8 @@ static int shmem_writepage(struct page *page,
> > > struct writeback_control *wbc)
> > >   	if (list_empty(&info->swaplist))
> > >   		list_add(&info->swaplist, &shmem_swaplist);
> > >   -	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
> > > +	if (add_to_swap_cache(page, swap,
> > > +			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
> > >   		spin_lock_irq(&info->lock);
> > >   		shmem_recalc_inode(inode);
> > >   		info->swapped++;
> > > -- 
> > > 1.8.3.1
Yang Shi Nov. 7, 2019, 8:19 p.m. UTC | #4
On 11/6/19 10:59 AM, Hugh Dickins wrote:
> On Wed, 6 Nov 2019, Yang Shi wrote:
>> On 11/6/19 7:18 AM, Michal Hocko wrote:
>>> On Wed 06-11-19 06:02:31, Yang Shi wrote:
>>>> The shmem_writepage() uses GFP_ATOMIC to allocate swap cache.
>>>> GFP_ATOMIC used to mean __GFP_HIGH, but now it means __GFP_HIGH |
>>>> __GFP_ATOMIC | __GFP_KSWAPD_RECLAIM.  However, shmem_writepage() should
>>>> write out to swap only in response to memory pressure, so
>>>> __GFP_KSWAPD_RECLAIM looks useless since the caller may be kswapd itself
>>>> or in direct reclaim already.
>>> What kind of problem are you trying to fix here?
>> I didn't run into any visible problem. I just happened to find this
>> inconsistency when I was looking into the other problem.
> Yes, I don't think it fixes any actual problem: just a cleanup to
> make the two calls look the same when they don't need to be different
> (whereas the call from __read_swap_cache_async() rightly uses a
> lower priority gfp).

I'm supposed it is because __read_swap_cache_async()is typically called 
from page fault context which is less crucial than reclaim.

Shall I consider this as an ack but with commit log rephrased to reflect 
the cleanup?

>
> If it does fix a problem, then you need to worry also about the
> 	 * TODO: this could cause a theoretical memory reclaim
> 	 * deadlock in the swap out path.
> comment still against the call in add_to_swap(): but I think that
> is equally theoretical, demanding no attention since 2.6.12.
>
>> The add_to_swap() does:
>>
>> int add_to_swap(struct page *page)
>> {
>> ...
>> err = add_to_swap_cache(page, entry,
>>                          __GFP_HIGH|__GFP_NOMEMALLOC|__GFP_NOWARN);
>> ...
>> }
>>
>> Actually, shmem_writepage() does almost the same thing and both of them are
>> called in reclaim context, so I didn't see why they should use different gfp
>> flag. And, GFP_ATOMIC is also different from the old definition as I
>> mentioned in the commit log.
>>
>>>> In addition, XArray node allocations from PF_MEMALLOC contexts could
>>>> completely exhaust the page allocator, __GFP_NOMEMALLOC stops emergency
>>>> reserves from being allocated.
>>> I am not really familiar with XArray much, could you be more specific
>>> please?
>> It comes from the comments of add_to_swap(), says:
>>
>> /*
>>           * XArray node allocations from PF_MEMALLOC contexts could
>>           * completely exhaust the page allocator. __GFP_NOMEMALLOC
>>           * stops emergency reserves from being allocated.
>>
>> And, it looks the original comment came from pre-git time, TBH I'm not quite
>> sure about the specific problem which incurred this. I suspect it may be
>> because PF_MEMALLOC context allows ALLOC_NO_WATERMARK.
>>
>>>> Here just copy the gfp flags used by add_to_swap().
>>>>
>>>> Cc: Hugh Dickins <hughd@google.com>
>>>> Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
>>>> ---
>>>>    mm/shmem.c | 3 ++-
>>>>    1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>>> index 220be9f..9691dec 100644
>>>> --- a/mm/shmem.c
>>>> +++ b/mm/shmem.c
>>>> @@ -1369,7 +1369,8 @@ static int shmem_writepage(struct page *page,
>>>> struct writeback_control *wbc)
>>>>    	if (list_empty(&info->swaplist))
>>>>    		list_add(&info->swaplist, &shmem_swaplist);
>>>>    -	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
>>>> +	if (add_to_swap_cache(page, swap,
>>>> +			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
>>>>    		spin_lock_irq(&info->lock);
>>>>    		shmem_recalc_inode(inode);
>>>>    		info->swapped++;
>>>> -- 
>>>> 1.8.3.1
Hugh Dickins Nov. 7, 2019, 10:59 p.m. UTC | #5
On Thu, 7 Nov 2019, Yang Shi wrote:
> On 11/6/19 10:59 AM, Hugh Dickins wrote:
> > Yes, I don't think it fixes any actual problem: just a cleanup to
> > make the two calls look the same when they don't need to be different
> > (whereas the call from __read_swap_cache_async() rightly uses a
> > lower priority gfp).
> 
> I'm supposed it is because __read_swap_cache_async()is typically called from
> page fault context which is less crucial than reclaim.

Exactly.

> 
> Shall I consider this as an ack but with commit log rephrased to reflect the
> cleanup?

Okay,
Acked-by: Hugh Dickins <hughd@google.com>
diff mbox series

Patch

diff --git a/mm/shmem.c b/mm/shmem.c
index 220be9f..9691dec 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1369,7 +1369,8 @@  static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	if (list_empty(&info->swaplist))
 		list_add(&info->swaplist, &shmem_swaplist);
 
-	if (add_to_swap_cache(page, swap, GFP_ATOMIC) == 0) {
+	if (add_to_swap_cache(page, swap,
+			__GFP_HIGH | __GFP_NOMEMALLOC | __GFP_NOWARN) == 0) {
 		spin_lock_irq(&info->lock);
 		shmem_recalc_inode(inode);
 		info->swapped++;