diff mbox series

[RFC,v2,2/9] mm: optimize do_wp_page() for fresh pages in local LRU pagevecs

Message ID 20220126095557.32392-3-david@redhat.com (mailing list archive)
State New
Headers show
Series mm: COW fixes part 1: fix the COW security issue for THP and swap | expand

Commit Message

David Hildenbrand Jan. 26, 2022, 9:55 a.m. UTC
For example, if a page just got swapped in via a read fault, the LRU
pagevecs might still hold a reference to the page. If we trigger a
write fault on such a page, the additional reference from the LRU
pagevecs will prohibit reusing the page.

Let's conditionally drain the local LRU pagevecs when we stumble over a
!PageLRU() page. We cannot easily drain remote LRU pagevecs and it might
not be desirable performance-wise. Consequently, this will only avoid
copying in some cases.

We cannot easily handle the following cases and we will always have to
copy:

(1) The page is referenced in the LRU pagevecs of other CPUs. We really
    would have to drain the LRU pagevecs of all CPUs -- most probably
    copying is much cheaper.

(2) The page is already PageLRU() but is getting moved between LRU
    lists, for example, for activation (e.g., mark_page_accessed()),
    deactivation (MADV_COLD), or lazyfree (MADV_FREE). We'd have to
    drain mostly unconditionally, which might be bad performance-wise.
    Most probably this won't happen too often in practice.

Note that there are other reasons why an anon page might temporarily not
be PageLRU(): for example, compaction and migration have to isolate LRU
pages from the LRU lists first (isolate_lru_page()), moving them to
temporary local lists and clearing PageLRU() and holding an additional
reference on the page. In that case, we'll always copy.

This change seems to be fairly effective with the reproducer [1] shared
by Nadav, as long as writeback is done synchronously, for example, using
zram. However, with asynchronous writeback, we'll usually fail to free the
swapcache because the page is still under writeback: something we cannot
easily optimize for, and maybe it's not really relevant in practice.

[1] https://lkml.kernel.org/r/0480D692-D9B2-429A-9A88-9BBA1331AC3A@gmail.com

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/memory.c | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox Jan. 26, 2022, 2:31 p.m. UTC | #1
On Wed, Jan 26, 2022 at 10:55:50AM +0100, David Hildenbrand wrote:
> diff --git a/mm/memory.c b/mm/memory.c
> index bcd3b7c50891..61d67ceef734 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -3298,7 +3298,17 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>  		 *
>  		 * PageKsm() doesn't necessarily raise the page refcount.
>  		 */
> -		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
> +		if (PageKsm(page))
> +			goto copy;
> +		if (page_count(page) > 1 + PageSwapCache(page) + !PageLRU(page))
> +			goto copy;
> +		if (!PageLRU(page))
> +			/*
> +			 * Note: We cannot easily detect+handle references from
> +			 * remote LRU pagevecs or references to PageLRU() pages.
> +			 */
> +			lru_add_drain();
> +		if (page_count(page) > 1 + PageSwapCache(page))
>  			goto copy;

I worry we're starting to get too accurate here.  How about:

		if (PageKsm(page) || page_count(page) > 3)
			goto copy;
		if (!PageLRU(page))
			lru_add_drain();
		if (!trylock_page(page))
			goto copy;
...

>  		if (!trylock_page(page))
>  			goto copy;
> -- 
> 2.34.1
>
David Hildenbrand Jan. 26, 2022, 2:36 p.m. UTC | #2
On 26.01.22 15:31, Matthew Wilcox wrote:
> On Wed, Jan 26, 2022 at 10:55:50AM +0100, David Hildenbrand wrote:
>> diff --git a/mm/memory.c b/mm/memory.c
>> index bcd3b7c50891..61d67ceef734 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -3298,7 +3298,17 @@ static vm_fault_t do_wp_page(struct vm_fault *vmf)
>>  		 *
>>  		 * PageKsm() doesn't necessarily raise the page refcount.
>>  		 */
>> -		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
>> +		if (PageKsm(page))
>> +			goto copy;
>> +		if (page_count(page) > 1 + PageSwapCache(page) + !PageLRU(page))
>> +			goto copy;
>> +		if (!PageLRU(page))
>> +			/*
>> +			 * Note: We cannot easily detect+handle references from
>> +			 * remote LRU pagevecs or references to PageLRU() pages.
>> +			 */
>> +			lru_add_drain();
>> +		if (page_count(page) > 1 + PageSwapCache(page))
>>  			goto copy;
> 
> I worry we're starting to get too accurate here.  How about:
> 

In that simplified case we'll essentially trylock_page() and for most
pages that are ordinarily shared by e.g., 2 processes.

> 		if (PageKsm(page) || page_count(page) > 3)
> 			goto copy;
> 		if (!PageLRU(page))
> 			lru_add_drain();
> 		if (!trylock_page(page))
> 			goto copy;
> ...
> 

I think we might at least want the

if (page_count(page) > 1 + PageSwapCache(page))
	goto copy;

check here.

>>  		if (!trylock_page(page))
>>  			goto copy;
>> -- 
>> 2.34.1
>>
>
diff mbox series

Patch

diff --git a/mm/memory.c b/mm/memory.c
index bcd3b7c50891..61d67ceef734 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3298,7 +3298,17 @@  static vm_fault_t do_wp_page(struct vm_fault *vmf)
 		 *
 		 * PageKsm() doesn't necessarily raise the page refcount.
 		 */
-		if (PageKsm(page) || page_count(page) > 1 + PageSwapCache(page))
+		if (PageKsm(page))
+			goto copy;
+		if (page_count(page) > 1 + PageSwapCache(page) + !PageLRU(page))
+			goto copy;
+		if (!PageLRU(page))
+			/*
+			 * Note: We cannot easily detect+handle references from
+			 * remote LRU pagevecs or references to PageLRU() pages.
+			 */
+			lru_add_drain();
+		if (page_count(page) > 1 + PageSwapCache(page))
 			goto copy;
 		if (!trylock_page(page))
 			goto copy;