diff mbox series

[3/3] mm/memory_hotplug: make HWPoisoned dirty swapcache pages unmovable

Message ID 20210821094246.10149-4-linmiaohe@huawei.com (mailing list archive)
State New
Headers show
Series Cleanup and fixups for memory hotplug | expand

Commit Message

Miaohe Lin Aug. 21, 2021, 9:42 a.m. UTC
HWPoisoned dirty swapcache pages are kept for killing owner processes.
We should not offline these pages or do_swap_page() would access the
offline pages and lead to bad ending.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
---
 mm/memory_hotplug.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

HORIGUCHI NAOYA(堀口 直也) Aug. 23, 2021, 8:26 a.m. UTC | #1
On Sat, Aug 21, 2021 at 05:42:46PM +0800, Miaohe Lin wrote:
> HWPoisoned dirty swapcache pages are kept for killing owner processes.
> We should not offline these pages or do_swap_page() would access the
> offline pages and lead to bad ending.
> 

Thank you for the report.  I'm not yet sure of the whole picture of this
issue.  do_swap_page() is expected to return with fault VM_FAULT_HWPOISON
when called via the access to the error page, so I wonder why this doesn't
work for your situation.  And what is the "bad ending" in the description?

I feel that aborting memory hotremove due to a hwpoisoned dirty swapcache
might be too hard, so I'd like to find another solution if we have.
# You may separate this patch from former two to make them merged to
# mainline soon.

Thanks,
Naoya Horiguchi

> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
> ---
>  mm/memory_hotplug.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 9fd0be32a281..0488eed3327c 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1664,6 +1664,12 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>  		 */
>  		if (PageOffline(page) && page_count(page))
>  			return -EBUSY;
> +		/*
> +		 * HWPoisoned dirty swapcache pages are definitely unmovable
> +		 * because they are kept for killing owner processes.
> +		 */
> +		if (PageHWPoison(page) && PageSwapCache(page))
> +			return -EBUSY;
Miaohe Lin Aug. 23, 2021, 9:14 a.m. UTC | #2
On 2021/8/23 16:26, HORIGUCHI NAOYA(堀口 直也) wrote:
> On Sat, Aug 21, 2021 at 05:42:46PM +0800, Miaohe Lin wrote:
>> HWPoisoned dirty swapcache pages are kept for killing owner processes.
>> We should not offline these pages or do_swap_page() would access the
>> offline pages and lead to bad ending.
>>
> 
> Thank you for the report.  I'm not yet sure of the whole picture of this
> issue.  do_swap_page() is expected to return with fault VM_FAULT_HWPOISON
> when called via the access to the error page, so I wonder why this doesn't
> work for your situation.  And what is the "bad ending" in the description?
> 

IMO we might hotremove the page while SwapCache still have ref to it. Thus the page
struct would be accessed after offlined. The page struct should be invalid in this case
and this would make do_swap_page fragile. Or am I miss something?

> I feel that aborting memory hotremove due to a hwpoisoned dirty swapcache
> might be too hard, so I'd like to find another solution if we have.

If there is a better way, we can just drop this one.

Many thanks for your review and reply! :)

> # You may separate this patch from former two to make them merged to
> # mainline soon.
> 
> Thanks,
> Naoya Horiguchi
> 
>> Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
>> ---
>>  mm/memory_hotplug.c | 6 ++++++
>>  1 file changed, 6 insertions(+)
>>
>> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
>> index 9fd0be32a281..0488eed3327c 100644
>> --- a/mm/memory_hotplug.c
>> +++ b/mm/memory_hotplug.c
>> @@ -1664,6 +1664,12 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
>>  		 */
>>  		if (PageOffline(page) && page_count(page))
>>  			return -EBUSY;
>> +		/*
>> +		 * HWPoisoned dirty swapcache pages are definitely unmovable
>> +		 * because they are kept for killing owner processes.
>> +		 */
>> +		if (PageHWPoison(page) && PageSwapCache(page))
>> +			return -EBUSY;
Andrew Morton Nov. 4, 2021, 10:07 p.m. UTC | #3
On Mon, 23 Aug 2021 17:14:29 +0800 Miaohe Lin <linmiaohe@huawei.com> wrote:

> On 2021/8/23 16:26, HORIGUCHI NAOYA(堀口 直也) wrote:
> > On Sat, Aug 21, 2021 at 05:42:46PM +0800, Miaohe Lin wrote:
> >> HWPoisoned dirty swapcache pages are kept for killing owner processes.
> >> We should not offline these pages or do_swap_page() would access the
> >> offline pages and lead to bad ending.
> >>
> > 
> > Thank you for the report.  I'm not yet sure of the whole picture of this
> > issue.  do_swap_page() is expected to return with fault VM_FAULT_HWPOISON
> > when called via the access to the error page, so I wonder why this doesn't
> > work for your situation.  And what is the "bad ending" in the description?
> > 
> 
> IMO we might hotremove the page while SwapCache still have ref to it. Thus the page
> struct would be accessed after offlined. The page struct should be invalid in this case
> and this would make do_swap_page fragile. Or am I miss something?
> 
> > I feel that aborting memory hotremove due to a hwpoisoned dirty swapcache
> > might be too hard, so I'd like to find another solution if we have.
> 
> If there is a better way, we can just drop this one.
> 
> Many thanks for your review and reply! :)
> 
> > # You may separate this patch from former two to make them merged to
> > # mainline soon.
>
> ...
>
> >> --- a/mm/memory_hotplug.c
> >> +++ b/mm/memory_hotplug.c
> >> @@ -1664,6 +1664,12 @@ static int scan_movable_pages(unsigned long start, unsigned long end,
> >>  		 */
> >>  		if (PageOffline(page) && page_count(page))
> >>  			return -EBUSY;
> >> +		/*
> >> +		 * HWPoisoned dirty swapcache pages are definitely unmovable
> >> +		 * because they are kept for killing owner processes.
> >> +		 */
> >> +		if (PageHWPoison(page) && PageSwapCache(page))
> >> +			return -EBUSY;
> 

I'll drop this.  Please resend something if you still believe that
changes are desirable.
diff mbox series

Patch

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9fd0be32a281..0488eed3327c 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1664,6 +1664,12 @@  static int scan_movable_pages(unsigned long start, unsigned long end,
 		 */
 		if (PageOffline(page) && page_count(page))
 			return -EBUSY;
+		/*
+		 * HWPoisoned dirty swapcache pages are definitely unmovable
+		 * because they are kept for killing owner processes.
+		 */
+		if (PageHWPoison(page) && PageSwapCache(page))
+			return -EBUSY;
 
 		if (!PageHuge(page))
 			continue;