diff mbox series

[v2] mm/page_isolation: fix potential missing call to unset_migratetype_isolate()

Message ID 20210913115125.33617-1-linmiaohe@huawei.com (mailing list archive)
State New
Headers show
Series [v2] mm/page_isolation: fix potential missing call to unset_migratetype_isolate() | expand

Commit Message

Miaohe Lin Sept. 13, 2021, 11:51 a.m. UTC
In start_isolate_page_range() undo path, pfn_to_online_page() just checks
the first pfn in a pageblock while __first_valid_page() will traverse the
pageblock until the first online pfn is found. So we may miss the call to
unset_migratetype_isolate() in undo path and pages will remain isolated
unexpectedly. Fix this by calling undo_isolate_page_range() and this will
also help to simplify the code further.

Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: <stable@vger.kernel.org>
---
v1->v2:
  Simplify the code further per David Hildenbrand.
---
 mm/page_isolation.c | 20 +++-----------------
 1 file changed, 3 insertions(+), 17 deletions(-)

Comments

Michal Hocko Sept. 13, 2021, 12:12 p.m. UTC | #1
On Mon 13-09-21 19:51:25, Miaohe Lin wrote:
> In start_isolate_page_range() undo path, pfn_to_online_page() just checks
> the first pfn in a pageblock while __first_valid_page() will traverse the
> pageblock until the first online pfn is found. So we may miss the call to
> unset_migratetype_isolate() in undo path and pages will remain isolated
> unexpectedly. Fix this by calling undo_isolate_page_range() and this will
> also help to simplify the code further.

I like the clean up part but is this a real problem that requires CC
stable? Have you ever seen this to be a real problem? It looks like
something based on reading the code.

> Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> Cc: <stable@vger.kernel.org>
> ---
> v1->v2:
>   Simplify the code further per David Hildenbrand.
> ---
>  mm/page_isolation.c | 20 +++-----------------
>  1 file changed, 3 insertions(+), 17 deletions(-)
> 
> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
> index a95c2c6562d0..f93cc63d8fa1 100644
> --- a/mm/page_isolation.c
> +++ b/mm/page_isolation.c
> @@ -183,7 +183,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  			     unsigned migratetype, int flags)
>  {
>  	unsigned long pfn;
> -	unsigned long undo_pfn;
>  	struct page *page;
>  
>  	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
> @@ -193,25 +192,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>  	     pfn < end_pfn;
>  	     pfn += pageblock_nr_pages) {
>  		page = __first_valid_page(pfn, pageblock_nr_pages);
> -		if (page) {
> -			if (set_migratetype_isolate(page, migratetype, flags)) {
> -				undo_pfn = pfn;
> -				goto undo;
> -			}
> +		if (page && set_migratetype_isolate(page, migratetype, flags)) {
> +			undo_isolate_page_range(start_pfn, pfn, migratetype);
> +			return -EBUSY;
>  		}
>  	}
>  	return 0;
> -undo:
> -	for (pfn = start_pfn;
> -	     pfn < undo_pfn;
> -	     pfn += pageblock_nr_pages) {
> -		struct page *page = pfn_to_online_page(pfn);
> -		if (!page)
> -			continue;
> -		unset_migratetype_isolate(page, migratetype);
> -	}
> -
> -	return -EBUSY;
>  }
>  
>  /*
> -- 
> 2.23.0
David Hildenbrand Sept. 13, 2021, 12:20 p.m. UTC | #2
On 13.09.21 14:12, Michal Hocko wrote:
> On Mon 13-09-21 19:51:25, Miaohe Lin wrote:
>> In start_isolate_page_range() undo path, pfn_to_online_page() just checks
>> the first pfn in a pageblock while __first_valid_page() will traverse the
>> pageblock until the first online pfn is found. So we may miss the call to
>> unset_migratetype_isolate() in undo path and pages will remain isolated
>> unexpectedly. Fix this by calling undo_isolate_page_range() and this will
>> also help to simplify the code further.
> 
> I like the clean up part but is this a real problem that requires CC
> stable? Have you ever seen this to be a real problem? It looks like
> something based on reading the code.

We discussed that it isn't an issue anymore (we never call it on memory 
holes), but might have been an issue on older kernels, back when we 
didn't have the "memory holes" check in the memory offlining path in place.

Agreed, these details belong into this description.

> 
>> Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>> v1->v2:
>>    Simplify the code further per David Hildenbrand.
>> ---
>>   mm/page_isolation.c | 20 +++-----------------
>>   1 file changed, 3 insertions(+), 17 deletions(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index a95c2c6562d0..f93cc63d8fa1 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -183,7 +183,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>   			     unsigned migratetype, int flags)
>>   {
>>   	unsigned long pfn;
>> -	unsigned long undo_pfn;
>>   	struct page *page;
>>   
>>   	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>> @@ -193,25 +192,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>   	     pfn < end_pfn;
>>   	     pfn += pageblock_nr_pages) {
>>   		page = __first_valid_page(pfn, pageblock_nr_pages);
>> -		if (page) {
>> -			if (set_migratetype_isolate(page, migratetype, flags)) {
>> -				undo_pfn = pfn;
>> -				goto undo;
>> -			}
>> +		if (page && set_migratetype_isolate(page, migratetype, flags)) {
>> +			undo_isolate_page_range(start_pfn, pfn, migratetype);
>> +			return -EBUSY;
>>   		}
>>   	}
>>   	return 0;
>> -undo:
>> -	for (pfn = start_pfn;
>> -	     pfn < undo_pfn;
>> -	     pfn += pageblock_nr_pages) {
>> -		struct page *page = pfn_to_online_page(pfn);
>> -		if (!page)
>> -			continue;
>> -		unset_migratetype_isolate(page, migratetype);
>> -	}
>> -
>> -	return -EBUSY;
>>   }
>>   
>>   /*
>> -- 
>> 2.23.0
>
Miaohe Lin Sept. 13, 2021, 12:43 p.m. UTC | #3
On 2021/9/13 20:20, David Hildenbrand wrote:
> On 13.09.21 14:12, Michal Hocko wrote:
>> On Mon 13-09-21 19:51:25, Miaohe Lin wrote:
>>> In start_isolate_page_range() undo path, pfn_to_online_page() just checks
>>> the first pfn in a pageblock while __first_valid_page() will traverse the
>>> pageblock until the first online pfn is found. So we may miss the call to
>>> unset_migratetype_isolate() in undo path and pages will remain isolated
>>> unexpectedly. Fix this by calling undo_isolate_page_range() and this will
>>> also help to simplify the code further.
>>
>> I like the clean up part but is this a real problem that requires CC
>> stable? Have you ever seen this to be a real problem? It looks like
>> something based on reading the code.

I'm sorry but I haven't seen this to be a real problem. It's a theoretical bug.

> 
> We discussed that it isn't an issue anymore (we never call it on memory holes), but might have been an issue on older kernels, back when we didn't have the "memory holes" check in the memory offlining path in place.

So is the Cc:stable needed in this case?

Many thanks for both of you.

> 
> Agreed, these details belong into this description.
> 
>>
>>> Fixes: 2ce13640b3f4 ("mm: __first_valid_page skip over offline pages")
>>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>>> Cc: <stable@vger.kernel.org>
>>> ---
>>> v1->v2:
>>>    Simplify the code further per David Hildenbrand.
>>> ---
>>>   mm/page_isolation.c | 20 +++-----------------
>>>   1 file changed, 3 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>>> index a95c2c6562d0..f93cc63d8fa1 100644
>>> --- a/mm/page_isolation.c
>>> +++ b/mm/page_isolation.c
>>> @@ -183,7 +183,6 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>>                    unsigned migratetype, int flags)
>>>   {
>>>       unsigned long pfn;
>>> -    unsigned long undo_pfn;
>>>       struct page *page;
>>>         BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
>>> @@ -193,25 +192,12 @@ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>>            pfn < end_pfn;
>>>            pfn += pageblock_nr_pages) {
>>>           page = __first_valid_page(pfn, pageblock_nr_pages);
>>> -        if (page) {
>>> -            if (set_migratetype_isolate(page, migratetype, flags)) {
>>> -                undo_pfn = pfn;
>>> -                goto undo;
>>> -            }
>>> +        if (page && set_migratetype_isolate(page, migratetype, flags)) {
>>> +            undo_isolate_page_range(start_pfn, pfn, migratetype);
>>> +            return -EBUSY;
>>>           }
>>>       }
>>>       return 0;
>>> -undo:
>>> -    for (pfn = start_pfn;
>>> -         pfn < undo_pfn;
>>> -         pfn += pageblock_nr_pages) {
>>> -        struct page *page = pfn_to_online_page(pfn);
>>> -        if (!page)
>>> -            continue;
>>> -        unset_migratetype_isolate(page, migratetype);
>>> -    }
>>> -
>>> -    return -EBUSY;
>>>   }
>>>     /*
>>> -- 
>>> 2.23.0
>>
> 
>
Michal Hocko Sept. 13, 2021, 12:59 p.m. UTC | #4
On Mon 13-09-21 20:43:35, Miaohe Lin wrote:
> On 2021/9/13 20:20, David Hildenbrand wrote:
> > On 13.09.21 14:12, Michal Hocko wrote:
> >> On Mon 13-09-21 19:51:25, Miaohe Lin wrote:
> >>> In start_isolate_page_range() undo path, pfn_to_online_page() just checks
> >>> the first pfn in a pageblock while __first_valid_page() will traverse the
> >>> pageblock until the first online pfn is found. So we may miss the call to
> >>> unset_migratetype_isolate() in undo path and pages will remain isolated
> >>> unexpectedly. Fix this by calling undo_isolate_page_range() and this will
> >>> also help to simplify the code further.
> >>
> >> I like the clean up part but is this a real problem that requires CC
> >> stable? Have you ever seen this to be a real problem? It looks like
> >> something based on reading the code.
> 
> I'm sorry but I haven't seen this to be a real problem. It's a theoretical bug.

Make it clear in the changelog

> > We discussed that it isn't an issue anymore (we never call it on
> > memory holes), but might have been an issue on older kernels, back
> > when we didn't have the "memory holes" check in the memory offlining
> > path in place.
> 
> So is the Cc:stable needed in this case?

I do not think so. Even if this was happening in the practice then the
practical consequences would be pretty minor, right? (few pageblocks
stay isolated thus unavailable).

I do realize that the stable tree is in a hoarding mode for quite some
years but my general approach has been (in line with the documentation)
to mark and backport only fixes that really do matter.
Andrew Morton Sept. 14, 2021, 2:51 a.m. UTC | #5
On Mon, 13 Sep 2021 14:59:42 +0200 Michal Hocko <mhocko@suse.com> wrote:

> I do realize that the stable tree is in a hoarding mode for quite some
> years but my general approach has been (in line with the documentation)
> to mark and backport only fixes that really do matter.

Me2.  There has to be some risk-vs-reward test to be passed...
Miaohe Lin Sept. 14, 2021, 3:09 a.m. UTC | #6
On 2021/9/13 20:59, Michal Hocko wrote:
> On Mon 13-09-21 20:43:35, Miaohe Lin wrote:
>> On 2021/9/13 20:20, David Hildenbrand wrote:
>>> On 13.09.21 14:12, Michal Hocko wrote:
>>>> On Mon 13-09-21 19:51:25, Miaohe Lin wrote:
>>>>> In start_isolate_page_range() undo path, pfn_to_online_page() just checks
>>>>> the first pfn in a pageblock while __first_valid_page() will traverse the
>>>>> pageblock until the first online pfn is found. So we may miss the call to
>>>>> unset_migratetype_isolate() in undo path and pages will remain isolated
>>>>> unexpectedly. Fix this by calling undo_isolate_page_range() and this will
>>>>> also help to simplify the code further.
>>>>
>>>> I like the clean up part but is this a real problem that requires CC
>>>> stable? Have you ever seen this to be a real problem? It looks like
>>>> something based on reading the code.
>>
>> I'm sorry but I haven't seen this to be a real problem. It's a theoretical bug.
> 
> Make it clear in the changelog

Will do.

> 
>>> We discussed that it isn't an issue anymore (we never call it on
>>> memory holes), but might have been an issue on older kernels, back
>>> when we didn't have the "memory holes" check in the memory offlining
>>> path in place.
>>
>> So is the Cc:stable needed in this case?
> 
> I do not think so. Even if this was happening in the practice then the
> practical consequences would be pretty minor, right? (few pageblocks
> stay isolated thus unavailable).
> 
> I do realize that the stable tree is in a hoarding mode for quite some
> years but my general approach has been (in line with the documentation)
> to mark and backport only fixes that really do matter.

So even the Fixes tag should be removed ?

Many thanks.

>
Miaohe Lin Sept. 14, 2021, 3:10 a.m. UTC | #7
On 2021/9/14 10:51, Andrew Morton wrote:
> On Mon, 13 Sep 2021 14:59:42 +0200 Michal Hocko <mhocko@suse.com> wrote:
> 
>> I do realize that the stable tree is in a hoarding mode for quite some
>> years but my general approach has been (in line with the documentation)
>> to mark and backport only fixes that really do matter.
> 
> Me2.  There has to be some risk-vs-reward test to be passed...

Will keep this in mind when I try to cc stable. Thanks.

> .
>
Michal Hocko Sept. 14, 2021, 7:06 a.m. UTC | #8
On Tue 14-09-21 11:09:47, Miaohe Lin wrote:
[...]
> So even the Fixes tag should be removed ?

I would keep that one there. Fixes tag is useful to frame the scope of
the fix. For example when somebody is backporting the commit mentioned
in the Fixes tag then a) a lot of follow up patches with Fixes can tell
you this won't be an easy ride and you might want to reconsider risks
vs. benefit b) it helps to collect follow up fixes more easily.

That is a different story from cc: stable which just collects patches
and push them to all consumers of the stable branch if they apply.

To conclude, the Fixes tag is a generaly useful tag to bind patches
together and let people evaluate how important that is while Cc stable
is an indication that a fix is serious enough to push to all stable
users.
Miaohe Lin Sept. 14, 2021, 9:27 a.m. UTC | #9
On 2021/9/14 15:06, Michal Hocko wrote:
> On Tue 14-09-21 11:09:47, Miaohe Lin wrote:
> [...]
>> So even the Fixes tag should be removed ?
> 
> I would keep that one there. Fixes tag is useful to frame the scope of
> the fix. For example when somebody is backporting the commit mentioned
> in the Fixes tag then a) a lot of follow up patches with Fixes can tell
> you this won't be an easy ride and you might want to reconsider risks
> vs. benefit b) it helps to collect follow up fixes more easily.
> 
> That is a different story from cc: stable which just collects patches
> and push them to all consumers of the stable branch if they apply.
> 
> To conclude, the Fixes tag is a generaly useful tag to bind patches
> together and let people evaluate how important that is while Cc stable
> is an indication that a fix is serious enough to push to all stable
> users.
> 

I see. Many thanks for your detailed explanation! :)
diff mbox series

Patch

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index a95c2c6562d0..f93cc63d8fa1 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -183,7 +183,6 @@  int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 			     unsigned migratetype, int flags)
 {
 	unsigned long pfn;
-	unsigned long undo_pfn;
 	struct page *page;
 
 	BUG_ON(!IS_ALIGNED(start_pfn, pageblock_nr_pages));
@@ -193,25 +192,12 @@  int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
 	     pfn < end_pfn;
 	     pfn += pageblock_nr_pages) {
 		page = __first_valid_page(pfn, pageblock_nr_pages);
-		if (page) {
-			if (set_migratetype_isolate(page, migratetype, flags)) {
-				undo_pfn = pfn;
-				goto undo;
-			}
+		if (page && set_migratetype_isolate(page, migratetype, flags)) {
+			undo_isolate_page_range(start_pfn, pfn, migratetype);
+			return -EBUSY;
 		}
 	}
 	return 0;
-undo:
-	for (pfn = start_pfn;
-	     pfn < undo_pfn;
-	     pfn += pageblock_nr_pages) {
-		struct page *page = pfn_to_online_page(pfn);
-		if (!page)
-			continue;
-		unset_migratetype_isolate(page, migratetype);
-	}
-
-	return -EBUSY;
 }
 
 /*