diff mbox series

[v2,1/5] mm: compaction: get reference before non LRU movable folio isolation

Message ID 20240829145456.2591719-2-wangkefeng.wang@huawei.com (mailing list archive)
State New
Headers show
Series mm: convert to folio_isolate_movable() | expand

Commit Message

Kefeng Wang Aug. 29, 2024, 2:54 p.m. UTC
Non-LRU movable folio isolation will fail if it can't grab a reference
in isolate_movable_page(), so folio_get_nontail_page() could be called
ahead to unify the handling of non-LRU movable/LRU folio isolation a bit,
this is also prepare to convert isolate_movable_page() to take a folio.
Since the reference count of the non-LRU movable folio is increased,
a folio_put() is needed whether the folio is isolated or not.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 mm/compaction.c | 38 +++++++++++++++++++-------------------
 1 file changed, 19 insertions(+), 19 deletions(-)

Comments

Matthew Wilcox Aug. 31, 2024, 2:04 p.m. UTC | #1
On Thu, Aug 29, 2024 at 10:54:52PM +0800, Kefeng Wang wrote:
> Non-LRU movable folio isolation will fail if it can't grab a reference
> in isolate_movable_page(), so folio_get_nontail_page() could be called
> ahead to unify the handling of non-LRU movable/LRU folio isolation a bit,
> this is also prepare to convert isolate_movable_page() to take a folio.
> Since the reference count of the non-LRU movable folio is increased,
> a folio_put() is needed whether the folio is isolated or not.

There's a reason I stopped where I did when converting this function
to use folios.  Usually I would explain, but I think it would do you
good to think about why for a bit.

Andrew, please drop these patches.

> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  mm/compaction.c | 38 +++++++++++++++++++-------------------
>  1 file changed, 19 insertions(+), 19 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a2b16b08cbbf..8998574da065 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1074,41 +1074,41 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>  			}
>  		}
>  
> +		/*
> +		 * Be careful not to clear lru flag until after we're
> +		 * sure the folio is not being freed elsewhere -- the
> +		 * folio release code relies on it.
> +		 */
> +		folio = folio_get_nontail_page(page);
> +		if (unlikely(!folio))
> +			goto isolate_fail;
> +
>  		/*
>  		 * Check may be lockless but that's ok as we recheck later.
> -		 * It's possible to migrate LRU and non-lru movable pages.
> -		 * Skip any other type of page
> +		 * It's possible to migrate LRU and non-lru movable folioss.
> +		 * Skip any other type of folios.
>  		 */
> -		if (!PageLRU(page)) {
> +		if (!folio_test_lru(folio)) {
>  			/*
> -			 * __PageMovable can return false positive so we need
> -			 * to verify it under page_lock.
> +			 * __folio_test_movable can return false positive so
> +			 * we need to verify it under page_lock.
>  			 */
> -			if (unlikely(__PageMovable(page)) &&
> -					!PageIsolated(page)) {
> +			if (unlikely(__folio_test_movable(folio)) &&
> +					!folio_test_isolated(folio)) {
>  				if (locked) {
>  					unlock_page_lruvec_irqrestore(locked, flags);
>  					locked = NULL;
>  				}
>  
> -				if (isolate_movable_page(page, mode)) {
> -					folio = page_folio(page);
> +				if (isolate_movable_page(&folio->page, mode)) {
> +					folio_put(folio);
>  					goto isolate_success;
>  				}
>  			}
>  
> -			goto isolate_fail;
> +			goto isolate_fail_put;
>  		}
>  
> -		/*
> -		 * Be careful not to clear PageLRU until after we're
> -		 * sure the page is not being freed elsewhere -- the
> -		 * page release code relies on it.
> -		 */
> -		folio = folio_get_nontail_page(page);
> -		if (unlikely(!folio))
> -			goto isolate_fail;
> -
>  		/*
>  		 * Migration will fail if an anonymous page is pinned in memory,
>  		 * so avoid taking lru_lock and isolating it unnecessarily in an
> -- 
> 2.27.0
>
Andrew Morton Sept. 1, 2024, 10:54 p.m. UTC | #2
On Sat, 31 Aug 2024 15:04:16 +0100 Matthew Wilcox <willy@infradead.org> wrote:

> On Thu, Aug 29, 2024 at 10:54:52PM +0800, Kefeng Wang wrote:
> > Non-LRU movable folio isolation will fail if it can't grab a reference
> > in isolate_movable_page(), so folio_get_nontail_page() could be called
> > ahead to unify the handling of non-LRU movable/LRU folio isolation a bit,
> > this is also prepare to convert isolate_movable_page() to take a folio.
> > Since the reference count of the non-LRU movable folio is increased,
> > a folio_put() is needed whether the folio is isolated or not.
> 
> There's a reason I stopped where I did when converting this function
> to use folios.  Usually I would explain, but I think it would do you
> good to think about why for a bit.

Can we have a code comment explaining this thinking to other readers?

> Andrew, please drop these patches.

Sure.
Baolin Wang Sept. 2, 2024, 7:57 a.m. UTC | #3
On 2024/8/29 22:54, Kefeng Wang wrote:
> Non-LRU movable folio isolation will fail if it can't grab a reference
> in isolate_movable_page(), so folio_get_nontail_page() could be called
> ahead to unify the handling of non-LRU movable/LRU folio isolation a bit,
> this is also prepare to convert isolate_movable_page() to take a folio.
> Since the reference count of the non-LRU movable folio is increased,
> a folio_put() is needed whether the folio is isolated or not.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>

The changes look good to me. However, need to address Matthew's concerns 
first.
Kefeng Wang Sept. 2, 2024, 10:44 a.m. UTC | #4
On 2024/8/31 22:04, Matthew Wilcox wrote:
> On Thu, Aug 29, 2024 at 10:54:52PM +0800, Kefeng Wang wrote:
>> Non-LRU movable folio isolation will fail if it can't grab a reference
>> in isolate_movable_page(), so folio_get_nontail_page() could be called
>> ahead to unify the handling of non-LRU movable/LRU folio isolation a bit,
>> this is also prepare to convert isolate_movable_page() to take a folio.
>> Since the reference count of the non-LRU movable folio is increased,
>> a folio_put() is needed whether the folio is isolated or not.
> 
> There's a reason I stopped where I did when converting this function
> to use folios.  Usually I would explain, but I think it would do you
> good to think about why for a bit.

Hm, I don't find the reason,

The major change is that we move folio_get_nontail_page ahead, so we
may try add a reference for each page, it always fails to isolate
with/without this changes, so I suppose that there is no issue here,
for other cases, PageBuddy/PageHuge is also handled before !PageLRU,
although the check is lockless, we will re-check under lock. so is
some page's reference can't try to grab or there are some special
handling for movable page.

The minimized changes could be something like below, just add a new
folio_get_nontail_page() before isolate_movable_page(), other patches
don't need to be changed.

diff --git a/mm/compaction.c b/mm/compaction.c
index a2b16b08cbbf..68331ba331e5 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1091,9 +1091,13 @@ isolate_migratepages_block(struct compact_control 
*cc, unsigned long low_pfn,
                                         locked = NULL;
                                 }

-                               if (isolate_movable_page(page, mode)) {
-                                       folio = page_folio(page);
-                                       goto isolate_success;
+                               folio = folio_get_nontail_page(page);
+                               if (folio) {
+                                       if 
(isolate_movable_page(&folio->page, mode)) {
+                                               folio_put(folio);
+                                               goto isolate_success;
+                                       }
+                                       goto isolate_fail_put;
                                 }
                         }


but I am not clear about the reason why this has issue, could you share
more tips, thank.


> 
> Andrew, please drop these patches.
> 
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   mm/compaction.c | 38 +++++++++++++++++++-------------------
>>   1 file changed, 19 insertions(+), 19 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index a2b16b08cbbf..8998574da065 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -1074,41 +1074,41 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>>   			}
>>   		}
>>   
>> +		/*
>> +		 * Be careful not to clear lru flag until after we're
>> +		 * sure the folio is not being freed elsewhere -- the
>> +		 * folio release code relies on it.
>> +		 */
>> +		folio = folio_get_nontail_page(page);
>> +		if (unlikely(!folio))
>> +			goto isolate_fail;
>> +
>>   		/*
>>   		 * Check may be lockless but that's ok as we recheck later.
>> -		 * It's possible to migrate LRU and non-lru movable pages.
>> -		 * Skip any other type of page
>> +		 * It's possible to migrate LRU and non-lru movable folioss.
>> +		 * Skip any other type of folios.
>>   		 */
>> -		if (!PageLRU(page)) {
>> +		if (!folio_test_lru(folio)) {
>>   			/*
>> -			 * __PageMovable can return false positive so we need
>> -			 * to verify it under page_lock.
>> +			 * __folio_test_movable can return false positive so
>> +			 * we need to verify it under page_lock.
>>   			 */
>> -			if (unlikely(__PageMovable(page)) &&
>> -					!PageIsolated(page)) {
>> +			if (unlikely(__folio_test_movable(folio)) &&
>> +					!folio_test_isolated(folio)) {
>>   				if (locked) {
>>   					unlock_page_lruvec_irqrestore(locked, flags);
>>   					locked = NULL;
>>   				}
>>   
>> -				if (isolate_movable_page(page, mode)) {
>> -					folio = page_folio(page);
>> +				if (isolate_movable_page(&folio->page, mode)) {
>> +					folio_put(folio);
>>   					goto isolate_success;
>>   				}
>>   			}
>>   
>> -			goto isolate_fail;
>> +			goto isolate_fail_put;
>>   		}
>>   
>> -		/*
>> -		 * Be careful not to clear PageLRU until after we're
>> -		 * sure the page is not being freed elsewhere -- the
>> -		 * page release code relies on it.
>> -		 */
>> -		folio = folio_get_nontail_page(page);
>> -		if (unlikely(!folio))
>> -			goto isolate_fail;
>> -
>>   		/*
>>   		 * Migration will fail if an anonymous page is pinned in memory,
>>   		 * so avoid taking lru_lock and isolating it unnecessarily in an
>> -- 
>> 2.27.0
>>
>
Kefeng Wang Sept. 4, 2024, 9:57 a.m. UTC | #5
On 2024/9/2 18:44, Kefeng Wang wrote:
> 
> 
> On 2024/8/31 22:04, Matthew Wilcox wrote:
>> On Thu, Aug 29, 2024 at 10:54:52PM +0800, Kefeng Wang wrote:
>>> Non-LRU movable folio isolation will fail if it can't grab a reference
>>> in isolate_movable_page(), so folio_get_nontail_page() could be called
>>> ahead to unify the handling of non-LRU movable/LRU folio isolation a 
>>> bit,
>>> this is also prepare to convert isolate_movable_page() to take a folio.
>>> Since the reference count of the non-LRU movable folio is increased,
>>> a folio_put() is needed whether the folio is isolated or not.
>>
>> There's a reason I stopped where I did when converting this function
>> to use folios.  Usually I would explain, but I think it would do you
>> good to think about why for a bit.

Hi Matthew, could you explain it, many thanks.

> 
> Hm, I don't find the reason,
> 
> The major change is that we move folio_get_nontail_page ahead, so we
> may try add a reference for each page, it always fails to isolate
> with/without this changes, so I suppose that there is no issue here,
> for other cases, PageBuddy/PageHuge is also handled before !PageLRU,
> although the check is lockless, we will re-check under lock. so is
> some page's reference can't try to grab or there are some special
> handling for movable page.
> 
> The minimized changes could be something like below, just add a new
> folio_get_nontail_page() before isolate_movable_page(), other patches
> don't need to be changed.
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index a2b16b08cbbf..68331ba331e5 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -1091,9 +1091,13 @@ isolate_migratepages_block(struct compact_control 
> *cc, unsigned long low_pfn,
>                                          locked = NULL;
>                                  }
> 
> -                               if (isolate_movable_page(page, mode)) {
> -                                       folio = page_folio(page);
> -                                       goto isolate_success;
> +                               folio = folio_get_nontail_page(page);
> +                               if (folio) {
> +                                       if 
> (isolate_movable_page(&folio->page, mode)) {
> +                                               folio_put(folio);
> +                                               goto isolate_success;
> +                                       }
> +                                       goto isolate_fail_put;
>                                  }
>                          }
> 
> 
> but I am not clear about the reason why this has issue, could you share
> more tips, thank.
> 
> 
>>
>> Andrew, please drop these patches.
>>
>>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>>> ---
>>>   mm/compaction.c | 38 +++++++++++++++++++-------------------
>>>   1 file changed, 19 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>> index a2b16b08cbbf..8998574da065 100644
>>> --- a/mm/compaction.c
>>> +++ b/mm/compaction.c
>>> @@ -1074,41 +1074,41 @@ isolate_migratepages_block(struct 
>>> compact_control *cc, unsigned long low_pfn,
>>>               }
>>>           }
>>> +        /*
>>> +         * Be careful not to clear lru flag until after we're
>>> +         * sure the folio is not being freed elsewhere -- the
>>> +         * folio release code relies on it.
>>> +         */
>>> +        folio = folio_get_nontail_page(page);
>>> +        if (unlikely(!folio))
>>> +            goto isolate_fail;
>>> +
>>>           /*
>>>            * Check may be lockless but that's ok as we recheck later.
>>> -         * It's possible to migrate LRU and non-lru movable pages.
>>> -         * Skip any other type of page
>>> +         * It's possible to migrate LRU and non-lru movable folioss.
>>> +         * Skip any other type of folios.
>>>            */
>>> -        if (!PageLRU(page)) {
>>> +        if (!folio_test_lru(folio)) {
>>>               /*
>>> -             * __PageMovable can return false positive so we need
>>> -             * to verify it under page_lock.
>>> +             * __folio_test_movable can return false positive so
>>> +             * we need to verify it under page_lock.
>>>                */
>>> -            if (unlikely(__PageMovable(page)) &&
>>> -                    !PageIsolated(page)) {
>>> +            if (unlikely(__folio_test_movable(folio)) &&
>>> +                    !folio_test_isolated(folio)) {
>>>                   if (locked) {
>>>                       unlock_page_lruvec_irqrestore(locked, flags);
>>>                       locked = NULL;
>>>                   }
>>> -                if (isolate_movable_page(page, mode)) {
>>> -                    folio = page_folio(page);
>>> +                if (isolate_movable_page(&folio->page, mode)) {
>>> +                    folio_put(folio);
>>>                       goto isolate_success;
>>>                   }
>>>               }
>>> -            goto isolate_fail;
>>> +            goto isolate_fail_put;
>>>           }
>>> -        /*
>>> -         * Be careful not to clear PageLRU until after we're
>>> -         * sure the page is not being freed elsewhere -- the
>>> -         * page release code relies on it.
>>> -         */
>>> -        folio = folio_get_nontail_page(page);
>>> -        if (unlikely(!folio))
>>> -            goto isolate_fail;
>>> -
>>>           /*
>>>            * Migration will fail if an anonymous page is pinned in 
>>> memory,
>>>            * so avoid taking lru_lock and isolating it unnecessarily 
>>> in an
>>> -- 
>>> 2.27.0
>>>
>>
>
Matthew Wilcox Sept. 4, 2024, 7:02 p.m. UTC | #6
On Mon, Sep 02, 2024 at 06:44:09PM +0800, Kefeng Wang wrote:
> On 2024/8/31 22:04, Matthew Wilcox wrote:
> > On Thu, Aug 29, 2024 at 10:54:52PM +0800, Kefeng Wang wrote:
> > > Non-LRU movable folio isolation will fail if it can't grab a reference
> > > in isolate_movable_page(), so folio_get_nontail_page() could be called
> > > ahead to unify the handling of non-LRU movable/LRU folio isolation a bit,
> > > this is also prepare to convert isolate_movable_page() to take a folio.
> > > Since the reference count of the non-LRU movable folio is increased,
> > > a folio_put() is needed whether the folio is isolated or not.
> > 
> > There's a reason I stopped where I did when converting this function
> > to use folios.  Usually I would explain, but I think it would do you
> > good to think about why for a bit.
> 
> Hm, I don't find the reason,
> 
> The major change is that we move folio_get_nontail_page ahead, so we
> may try add a reference for each page, it always fails to isolate
> with/without this changes, so I suppose that there is no issue here,

You haven't considered the effect on others.  Taking the refcount on a
page will necessarily dirty the cacheline.  This is compaction code, so
someone else may have this page allocated.  The check is done without a
refcount in order to minimise the effect if this page cannot be migrated.

Try doing this on a NUMA system to really see the effects.

More broadly, the problem is that you're sending patches faster than I
can review them, and Andrew is picking them up.  I don't know what to
do about that.
Kefeng Wang Sept. 5, 2024, 4:39 a.m. UTC | #7
On 2024/9/5 3:02, Matthew Wilcox wrote:
> On Mon, Sep 02, 2024 at 06:44:09PM +0800, Kefeng Wang wrote:
>> On 2024/8/31 22:04, Matthew Wilcox wrote:
>>> On Thu, Aug 29, 2024 at 10:54:52PM +0800, Kefeng Wang wrote:
>>>> Non-LRU movable folio isolation will fail if it can't grab a reference
>>>> in isolate_movable_page(), so folio_get_nontail_page() could be called
>>>> ahead to unify the handling of non-LRU movable/LRU folio isolation a bit,
>>>> this is also prepare to convert isolate_movable_page() to take a folio.
>>>> Since the reference count of the non-LRU movable folio is increased,
>>>> a folio_put() is needed whether the folio is isolated or not.
>>>
>>> There's a reason I stopped where I did when converting this function
>>> to use folios.  Usually I would explain, but I think it would do you
>>> good to think about why for a bit.
>>
>> Hm, I don't find the reason,
>>
>> The major change is that we move folio_get_nontail_page ahead, so we
>> may try add a reference for each page, it always fails to isolate
>> with/without this changes, so I suppose that there is no issue here,
> 
> You haven't considered the effect on others.  Taking the refcount on a
> page will necessarily dirty the cacheline.  This is compaction code, so
> someone else may have this page allocated.  The check is done without a
> refcount in order to minimise the effect if this page cannot be migrated.
> 

Indeed, I asked whether the taking the refcount on a page
unconditionally is accepted or not in v1, but I should wait for more
time to see if there are more comments, thanks for your explanation,
will consider not only the current code modification, but also other
impacts of changes as much as possible.

> Try doing this on a NUMA system to really see the effects.
> 
> More broadly, the problem is that you're sending patches faster than I
> can review them, and Andrew is picking them up.  I don't know what to
> do about that.

OK, I will slowdown for the new version to collect more comments.
diff mbox series

Patch

diff --git a/mm/compaction.c b/mm/compaction.c
index a2b16b08cbbf..8998574da065 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -1074,41 +1074,41 @@  isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 			}
 		}
 
+		/*
+		 * Be careful not to clear lru flag until after we're
+		 * sure the folio is not being freed elsewhere -- the
+		 * folio release code relies on it.
+		 */
+		folio = folio_get_nontail_page(page);
+		if (unlikely(!folio))
+			goto isolate_fail;
+
 		/*
 		 * Check may be lockless but that's ok as we recheck later.
-		 * It's possible to migrate LRU and non-lru movable pages.
-		 * Skip any other type of page
+		 * It's possible to migrate LRU and non-lru movable folioss.
+		 * Skip any other type of folios.
 		 */
-		if (!PageLRU(page)) {
+		if (!folio_test_lru(folio)) {
 			/*
-			 * __PageMovable can return false positive so we need
-			 * to verify it under page_lock.
+			 * __folio_test_movable can return false positive so
+			 * we need to verify it under page_lock.
 			 */
-			if (unlikely(__PageMovable(page)) &&
-					!PageIsolated(page)) {
+			if (unlikely(__folio_test_movable(folio)) &&
+					!folio_test_isolated(folio)) {
 				if (locked) {
 					unlock_page_lruvec_irqrestore(locked, flags);
 					locked = NULL;
 				}
 
-				if (isolate_movable_page(page, mode)) {
-					folio = page_folio(page);
+				if (isolate_movable_page(&folio->page, mode)) {
+					folio_put(folio);
 					goto isolate_success;
 				}
 			}
 
-			goto isolate_fail;
+			goto isolate_fail_put;
 		}
 
-		/*
-		 * Be careful not to clear PageLRU until after we're
-		 * sure the page is not being freed elsewhere -- the
-		 * page release code relies on it.
-		 */
-		folio = folio_get_nontail_page(page);
-		if (unlikely(!folio))
-			goto isolate_fail;
-
 		/*
 		 * Migration will fail if an anonymous page is pinned in memory,
 		 * so avoid taking lru_lock and isolating it unnecessarily in an