diff mbox series

[RFC,v2,1/9] mm: optimize do_wp_page() for exclusive pages in the swapcache

Message ID 20220126095557.32392-2-david@redhat.com (mailing list archive)
State New
Headers show
Series mm: COW fixes part 1: fix the COW security issue for THP and swap | expand

Commit Message

David Hildenbrand Jan. 26, 2022, 9:55 a.m. UTC
Liang Zhang reported [1] that the current COW logic in do_wp_page() is
sub-optimal when it comes to swap+read fault+write fault of anonymous
pages that have a single user, visible via a performance degradation in
the redis benchmark. Something similar was previously reported [2] by
Nadav with a simple reproducer.

Let's optimize for pages that have been added to the swapcache but only
have an exclusive owner. Try removing the swapcache reference if there is
hope that we're the exclusive user.

We will fail removing the swapcache reference in two scenarios:
(1) There are additional swap entries referencing the page: copying
    instead of reusing is the right thing to do.
(2) The page is under writeback: theoretically we might be able to reuse
    in some cases, however, we cannot remove the additional reference
    and will have to copy.

Further, we might have additional references from the LRU pagevecs,
which will force us to copy instead of being able to reuse. We'll try
handling such references for some scenarios next. Concurrent writeback
cannot be handled easily and we'll always have to copy.

While at it, remove the superfluous page_mapcount() check: it's
implicitly covered by the page_count() for ordinary anon pages.

[1] https://lkml.kernel.org/r/20220113140318.11117-1-zhangliang5@huawei.com
[2] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com

Reported-by: Liang Zhang <zhangliang5@huawei.com>
Reported-by: Nadav Amit <nadav.amit@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

Comments

Matthew Wilcox Jan. 26, 2022, 2:25 p.m. UTC | #1
On Wed, Jan 26, 2022 at 10:55:49AM +0100, David Hildenbrand wrote:
> Liang Zhang reported [1] that the current COW logic in do_wp_page() is
> sub-optimal when it comes to swap+read fault+write fault of anonymous
> pages that have a single user, visible via a performance degradation in
> the redis benchmark. Something similar was previously reported [2] by
> Nadav with a simple reproducer.
> 
> Let's optimize for pages that have been added to the swapcache but only
> have an exclusive owner. Try removing the swapcache reference if there is
> hope that we're the exclusive user.
> 
> We will fail removing the swapcache reference in two scenarios:
> (1) There are additional swap entries referencing the page: copying
>     instead of reusing is the right thing to do.
> (2) The page is under writeback: theoretically we might be able to reuse
>     in some cases, however, we cannot remove the additional reference
>     and will have to copy.
> 
> Further, we might have additional references from the LRU pagevecs,
> which will force us to copy instead of being able to reuse. We'll try
> handling such references for some scenarios next. Concurrent writeback
> cannot be handled easily and we'll always have to copy.
> 
> While at it, remove the superfluous page_mapcount() check: it's
> implicitly covered by the page_count() for ordinary anon pages.
> 
> [1] https://lkml.kernel.org/r/20220113140318.11117-1-zhangliang5@huawei.com
> [2] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com
> 
> Reported-by: Liang Zhang <zhangliang5@huawei.com>
> Reported-by: Nadav Amit <nadav.amit@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Vlastimil Babka Jan. 28, 2022, 12:53 p.m. UTC | #2
On 1/26/22 10:55, David Hildenbrand wrote:
> Liang Zhang reported [1] that the current COW logic in do_wp_page() is
> sub-optimal when it comes to swap+read fault+write fault of anonymous
> pages that have a single user, visible via a performance degradation in
> the redis benchmark. Something similar was previously reported [2] by
> Nadav with a simple reproducer.

Can we make the description more self-contained? I.e. describe that
sub-optimal COW means we copy when it's not necessary, and this can happen
if swap-out is followed by a swap-in for read and a then a write fault
(IIUC), because the swap cache reference increases page_count()...

> Let's optimize for pages that have been added to the swapcache but only
> have an exclusive owner. Try removing the swapcache reference if there is
> hope that we're the exclusive user.

Can we expect any downside for reclaim efficiency due to the more aggressive
removal from swapcache? Probably not, as we are doing the removal when the
page is about to get dirty, so we wouldn't be able to reuse any previously
swapped out content anyway. Maybe it's even beneficial?

> We will fail removing the swapcache reference in two scenarios:
> (1) There are additional swap entries referencing the page: copying
>     instead of reusing is the right thing to do.
> (2) The page is under writeback: theoretically we might be able to reuse
>     in some cases, however, we cannot remove the additional reference
>     and will have to copy.
> 
> Further, we might have additional references from the LRU pagevecs,
> which will force us to copy instead of being able to reuse. We'll try
> handling such references for some scenarios next. Concurrent writeback
> cannot be handled easily and we'll always have to copy.
> 
> While at it, remove the superfluous page_mapcount() check: it's
> implicitly covered by the page_count() for ordinary anon pages.
> 
> [1] https://lkml.kernel.org/r/20220113140318.11117-1-zhangliang5@huawei.com
> [2] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com
> 
> Reported-by: Liang Zhang <zhangliang5@huawei.com>
> Reported-by: Nadav Amit <nadav.amit@gmail.com>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

> ---
>  mm/memory.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index c125c4969913..bcd3b7c50891 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3291,19 +3291,27 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>  	if (PageAnon(vmf->page)) {
>  		struct page *page = vmf->page;
>  
> -		/* PageKsm() doesn't necessarily raise the page refcount */
> -		if (PageKsm(page) || page_count(page) != 1)
> +		/*
> +		 * We have to verify under page lock: these early checks are
> +		 * just an optimization to avoid locking the page and freeing
> +		 * the swapcache if there is little hope that we can reuse.
> +		 *
> +		 * PageKsm() doesn't necessarily raise the page refcount.
> +		 */
> +		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
>  			goto copy;
>  		if (!trylock_page(page))
>  			goto copy;
> -		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
> +		if (PageSwapCache(page))
> +			try_to_free_swap(page);
> +		if (PageKsm(page) || page_count(page) != 1) {
>  			unlock_page(page);
>  			goto copy;
>  		}
>  		/*
> -		 * Ok, we've got the only map reference, and the only
> -		 * page count reference, and the page is locked,
> -		 * it's dark out, and we're wearing sunglasses. Hit it.
> +		 * Ok, we've got the only page reference from our mapping
> +		 * and the page is locked, it's dark out, and we're wearing
> +		 * sunglasses. Hit it.
>  		 */
>  		unlock_page(page);
>  		wp_page_reuse(vmf);
David Hildenbrand Jan. 28, 2022, 1:44 p.m. UTC | #3
On 28.01.22 13:53, Vlastimil Babka wrote:
> On 1/26/22 10:55, David Hildenbrand wrote:
>> Liang Zhang reported [1] that the current COW logic in do_wp_page() is
>> sub-optimal when it comes to swap+read fault+write fault of anonymous
>> pages that have a single user, visible via a performance degradation in
>> the redis benchmark. Something similar was previously reported [2] by
>> Nadav with a simple reproducer.
> 
> Can we make the description more self-contained? I.e. describe that
> sub-optimal COW means we copy when it's not necessary, and this can happen
> if swap-out is followed by a swap-in for read and a then a write fault
> (IIUC), because the swap cache reference increases page_count()..

Sure, I can add some more details.

> 
>> Let's optimize for pages that have been added to the swapcache but only
>> have an exclusive owner. Try removing the swapcache reference if there is
>> hope that we're the exclusive user.
> 
> Can we expect any downside for reclaim efficiency due to the more aggressive
> removal from swapcache? Probably not, as we are doing the removal when the
> page is about to get dirty, so we wouldn't be able to reuse any previously
> swapped out content anyway. Maybe it's even beneficial?

We're only try removing the swapcache reference if it's likely that we
succeed in reusing the page after we succeeded in removing the swapcache
reference (IOW, no other swap entries, no writeback).

So the early "page_count(page) > 1 + PageSwapCache(page)" check is of
big value I think. It should be rare that we remove the reference and
still end up with "page_count(page) != 1".

And if we're reusing the page (re-dirtying it), removing it from the
swapache is the right thing to do: reuse_swap_cache() would have done
the same.

[...]

>>
>> Reported-by: Liang Zhang <zhangliang5@huawei.com>
>> Reported-by: Nadav Amit <nadav.amit@gmail.com>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!
diff mbox series

Patch

diff --git a/mm/memory.c b/mm/memory.c
index c125c4969913..bcd3b7c50891 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3291,19 +3291,27 @@  static vm_fault_t do_wp_page(struct vm_fault *vmf)
 	if (PageAnon(vmf->page)) {
 		struct page *page = vmf->page;
 
-		/* PageKsm() doesn't necessarily raise the page refcount */
-		if (PageKsm(page) || page_count(page) != 1)
+		/*
+		 * We have to verify under page lock: these early checks are
+		 * just an optimization to avoid locking the page and freeing
+		 * the swapcache if there is little hope that we can reuse.
+		 *
+		 * PageKsm() doesn't necessarily raise the page refcount.
+		 */
+		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
 			goto copy;
 		if (!trylock_page(page))
 			goto copy;
-		if (PageKsm(page) || page_mapcount(page) != 1 || page_count(page) != 1) {
+		if (PageSwapCache(page))
+			try_to_free_swap(page);
+		if (PageKsm(page) || page_count(page) != 1) {
 			unlock_page(page);
 			goto copy;
 		}
 		/*
-		 * Ok, we've got the only map reference, and the only
-		 * page count reference, and the page is locked,
-		 * it's dark out, and we're wearing sunglasses. Hit it.
+		 * Ok, we've got the only page reference from our mapping
+		 * and the page is locked, it's dark out, and we're wearing
+		 * sunglasses. Hit it.
 		 */
 		unlock_page(page);
 		wp_page_reuse(vmf);