
[0/7] support large folio swap-out and swap-in for shmem

Message ID: cover.1717673614.git.baolin.wang@linux.alibaba.com

Message

Baolin Wang June 6, 2024, 11:58 a.m. UTC
Shmem will support large folio allocation [1] [2] to get better performance;
however, memory reclaim still splits the precious large folios when trying to
swap out shmem, which may lead to memory fragmentation and prevents shmem from
taking advantage of large folios.

Moreover, the swap code already supports swapping out large folios without
splitting them, and the large folio swap-in series [3] is queued in the
mm-unstable branch. Hence this patch set adds large folio swap-out and swap-in
support for shmem.

[1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
[2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
[3] https://lore.kernel.org/all/20240508224040.190469-6-21cnbao@gmail.com/T/

Changes from RFC:
 - Rebased to the latest mm-unstable.
 - Drop the counter name fixing patch, which was queued into mm-hotfixes-stable
 branch.

Baolin Wang (7):
  mm: vmscan: add validation before spliting shmem large folio
  mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM
    flag setting
  mm: shmem: support large folio allocation for shmem_replace_folio()
  mm: shmem: extend shmem_partial_swap_usage() to support large folio
    swap
  mm: add new 'orders' parameter for find_get_entries() and
    find_lock_entries()
  mm: shmem: use swap_free_nr() to free shmem swap entries
  mm: shmem: support large folio swap out

 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  1 +
 include/linux/swap.h                      |  4 +-
 include/linux/writeback.h                 |  1 +
 mm/filemap.c                              | 27 ++++++-
 mm/internal.h                             |  4 +-
 mm/shmem.c                                | 58 ++++++++------
 mm/swapfile.c                             | 98 ++++++++++++-----------
 mm/truncate.c                             |  8 +-
 mm/vmscan.c                               | 22 ++++-
 9 files changed, 140 insertions(+), 83 deletions(-)

Comments

Daniel Gomez June 10, 2024, 3:23 p.m. UTC | #1
Hi Baolin,

On Thu, Jun 06, 2024 at 07:58:55PM +0800, Baolin Wang wrote:
> In the following patches, shmem will support swapping out large folios, which
> means shmem mappings may contain large order swap entries. An 'orders' array
> is therefore added for find_get_entries() and find_lock_entries() to obtain
> the order of shmem swap entries, which will help when releasing shmem large
> folio swap entries.
> 
> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  mm/filemap.c  | 27 +++++++++++++++++++++++++--
>  mm/internal.h |  4 ++--
>  mm/shmem.c    | 17 +++++++++--------
>  mm/truncate.c |  8 ++++----
>  4 files changed, 40 insertions(+), 16 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 37061aafd191..47fcd9ee6012 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -2036,14 +2036,24 @@ static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
>   * Return: The number of entries which were found.
>   */
>  unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
> -		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices)
> +		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices,
> +		int *orders)
>  {
>  	XA_STATE(xas, &mapping->i_pages, *start);
>  	struct folio *folio;
> +	int order;
>  
>  	rcu_read_lock();
>  	while ((folio = find_get_entry(&xas, end, XA_PRESENT)) != NULL) {
>  		indices[fbatch->nr] = xas.xa_index;
> +		if (orders) {
> +			if (!xa_is_value(folio))
> +				order = folio_order(folio);
> +			else
> +				order = xa_get_order(xas.xa, xas.xa_index);
> +
> +			orders[fbatch->nr] = order;
> +		}
>  		if (!folio_batch_add(fbatch, folio))
>  			break;
>  	}
> @@ -2056,6 +2066,8 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
>  		folio = fbatch->folios[idx];
>  		if (!xa_is_value(folio))
>  			nr = folio_nr_pages(folio);
> +		else if (orders)
> +			nr = 1 << orders[idx];
>  		*start = indices[idx] + nr;
>  	}
>  	return folio_batch_count(fbatch);
> @@ -2082,10 +2094,12 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
>   * Return: The number of entries which were found.
>   */
>  unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
> -		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices)
> +		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices,
> +		int *orders)
>  {
>  	XA_STATE(xas, &mapping->i_pages, *start);
>  	struct folio *folio;
> +	int order;
>  
>  	rcu_read_lock();
>  	while ((folio = find_get_entry(&xas, end, XA_PRESENT))) {
> @@ -2099,9 +2113,16 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
>  			if (folio->mapping != mapping ||
>  			    folio_test_writeback(folio))
>  				goto unlock;
> +			if (orders)
> +				order = folio_order(folio);
>  			VM_BUG_ON_FOLIO(!folio_contains(folio, xas.xa_index),
>  					folio);
> +		} else if (orders) {
> +			order = xa_get_order(xas.xa, xas.xa_index);
>  		}
> +
> +		if (orders)
> +			orders[fbatch->nr] = order;
>  		indices[fbatch->nr] = xas.xa_index;
>  		if (!folio_batch_add(fbatch, folio))
>  			break;
> @@ -2120,6 +2141,8 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
>  		folio = fbatch->folios[idx];
>  		if (!xa_is_value(folio))
>  			nr = folio_nr_pages(folio);
> +		else if (orders)
> +			nr = 1 << orders[idx];
>  		*start = indices[idx] + nr;
>  	}
>  	return folio_batch_count(fbatch);
> diff --git a/mm/internal.h b/mm/internal.h
> index 3419c329b3bc..0b5adb6c33cc 100644
> --- a/mm/internal.h
> +++ b/mm/internal.h
> @@ -339,9 +339,9 @@ static inline void force_page_cache_readahead(struct address_space *mapping,
>  }
>  
>  unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
> -		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices);
> +		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices, int *orders);
>  unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
> -		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices);
> +		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices, int *orders);
>  void filemap_free_folio(struct address_space *mapping, struct folio *folio);
>  int truncate_inode_folio(struct address_space *mapping, struct folio *folio);
>  bool truncate_inode_partial_folio(struct folio *folio, loff_t start,
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 0ac71580decb..28ba603d87b8 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -840,14 +840,14 @@ static void shmem_delete_from_page_cache(struct folio *folio, void *radswap)
>   * Remove swap entry from page cache, free the swap and its page cache.
>   */
>  static int shmem_free_swap(struct address_space *mapping,
> -			   pgoff_t index, void *radswap)
> +			   pgoff_t index, void *radswap, int order)
>  {
>  	void *old;

Matthew Wilcox suggested [1] returning the number of pages freed in shmem_free_swap().

[1] https://lore.kernel.org/all/ZQRf2pGWurrE0uO+@casper.infradead.org/

Which I submitted here:
https://lore.kernel.org/all/20231028211518.3424020-5-da.gomez@samsung.com/

Do you agree with the suggestion? If so, could we update my patch to use
free_swap_and_cache_nr() or include that here?
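
For illustration, a rough sketch of that suggestion (not the submitted patch;
it assumes free_swap_and_cache_nr() from this series and uses xa_get_order()
to look up the entry's order): shmem_free_swap() would return the number of
pages freed, so callers no longer need a separate 'order' argument:

	static long shmem_free_swap(struct address_space *mapping,
				    pgoff_t index, void *radswap)
	{
		/* Read the order before the entry is replaced below. */
		int order = xa_get_order(&mapping->i_pages, index);
		void *old;

		old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0);
		if (old != radswap)
			return 0;
		free_swap_and_cache_nr(radix_to_swp_entry(radswap), 1 << order);

		return 1 << order;
	}

Callers in shmem_undo_range() could then simply do
'nr_swaps_freed += shmem_free_swap(mapping, indices[i], folio);'.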

>  
>  	old = xa_cmpxchg_irq(&mapping->i_pages, index, radswap, NULL, 0);
>  	if (old != radswap)
>  		return -ENOENT;
> -	free_swap_and_cache(radix_to_swp_entry(radswap));
> +	free_swap_and_cache_nr(radix_to_swp_entry(radswap), 1 << order);
>  	return 0;
>  }
>  
> @@ -981,6 +981,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  	pgoff_t end = (lend + 1) >> PAGE_SHIFT;
>  	struct folio_batch fbatch;
>  	pgoff_t indices[PAGEVEC_SIZE];
> +	int orders[PAGEVEC_SIZE];
>  	struct folio *folio;
>  	bool same_folio;
>  	long nr_swaps_freed = 0;
> @@ -996,15 +997,15 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  	folio_batch_init(&fbatch);
>  	index = start;
>  	while (index < end && find_lock_entries(mapping, &index, end - 1,
> -			&fbatch, indices)) {
> +			&fbatch, indices, orders)) {
>  		for (i = 0; i < folio_batch_count(&fbatch); i++) {
>  			folio = fbatch.folios[i];
>  
>  			if (xa_is_value(folio)) {
>  				if (unfalloc)
>  					continue;
> -				nr_swaps_freed += !shmem_free_swap(mapping,
> -							indices[i], folio);
> +				if (!shmem_free_swap(mapping, indices[i], folio, orders[i]))
> +					nr_swaps_freed += 1 << orders[i];
>  				continue;
>  			}
>  
> @@ -1058,7 +1059,7 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  		cond_resched();
>  
>  		if (!find_get_entries(mapping, &index, end - 1, &fbatch,
> -				indices)) {
> +				indices, orders)) {
>  			/* If all gone or hole-punch or unfalloc, we're done */
>  			if (index == start || end != -1)
>  				break;
> @@ -1072,12 +1073,12 @@ static void shmem_undo_range(struct inode *inode, loff_t lstart, loff_t lend,
>  			if (xa_is_value(folio)) {
>  				if (unfalloc)
>  					continue;
> -				if (shmem_free_swap(mapping, indices[i], folio)) {
> +				if (shmem_free_swap(mapping, indices[i], folio, orders[i])) {
>  					/* Swap was replaced by page: retry */
>  					index = indices[i];
>  					break;
>  				}
> -				nr_swaps_freed++;
> +				nr_swaps_freed += 1 << orders[i];
>  				continue;
>  			}
>  
> diff --git a/mm/truncate.c b/mm/truncate.c
> index 5ce62a939e55..3a4bc9dba451 100644
> --- a/mm/truncate.c
> +++ b/mm/truncate.c
> @@ -352,7 +352,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  	folio_batch_init(&fbatch);
>  	index = start;
>  	while (index < end && find_lock_entries(mapping, &index, end - 1,
> -			&fbatch, indices)) {
> +			&fbatch, indices, NULL)) {
>  		truncate_folio_batch_exceptionals(mapping, &fbatch, indices);
>  		for (i = 0; i < folio_batch_count(&fbatch); i++)
>  			truncate_cleanup_folio(fbatch.folios[i]);
> @@ -392,7 +392,7 @@ void truncate_inode_pages_range(struct address_space *mapping,
>  	while (index < end) {
>  		cond_resched();
>  		if (!find_get_entries(mapping, &index, end - 1, &fbatch,
> -				indices)) {
> +				indices, NULL)) {
>  			/* If all gone from start onwards, we're done */
>  			if (index == start)
>  				break;
> @@ -496,7 +496,7 @@ unsigned long mapping_try_invalidate(struct address_space *mapping,
>  	int i;
>  
>  	folio_batch_init(&fbatch);
> -	while (find_lock_entries(mapping, &index, end, &fbatch, indices)) {
> +	while (find_lock_entries(mapping, &index, end, &fbatch, indices, NULL)) {
>  		for (i = 0; i < folio_batch_count(&fbatch); i++) {
>  			struct folio *folio = fbatch.folios[i];
>  
> @@ -622,7 +622,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
>  
>  	folio_batch_init(&fbatch);
>  	index = start;
> -	while (find_get_entries(mapping, &index, end, &fbatch, indices)) {
> +	while (find_get_entries(mapping, &index, end, &fbatch, indices, NULL)) {
>  		for (i = 0; i < folio_batch_count(&fbatch); i++) {
>  			struct folio *folio = fbatch.folios[i];
>  
> -- 
> 2.39.3
> 

Daniel
Matthew Wilcox June 10, 2024, 4:59 p.m. UTC | #2
On Thu, Jun 06, 2024 at 07:58:55PM +0800, Baolin Wang wrote:
> In the following patches, shmem will support swapping out large folios, which
> means shmem mappings may contain large order swap entries. An 'orders' array
> is therefore added for find_get_entries() and find_lock_entries() to obtain
> the order of shmem swap entries, which will help when releasing shmem large
> folio swap entries.

I am not a fan.  I was hoping that 'order' would be encoded in the swap
entry, not passed as a separate parameter.

As I understand it, we currently have a free bit, or
swp_to_radix_entry() would not work.  We can use that as detailed
here to encode the order in a single bit.

https://kernelnewbies.org/MatthewWilcox/NaturallyAlignedOrder
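
For context, the existing helpers referred to above (include/linux/swapops.h)
store swap entries in the page cache as XArray value entries:

	static inline void *swp_to_radix_entry(swp_entry_t entry)
	{
		return xa_mk_value(entry.val);
	}

	static inline swp_entry_t radix_to_swp_entry(void *arg)
	{
		swp_entry_t entry;

		entry.val = xa_to_value(arg);
		return entry;
	}

A value entry uses bit 0 as a tag, so a swap entry must already fit in the
remaining bits; that headroom is the free bit mentioned above, which the
linked scheme proposes using to mark a naturally aligned large entry.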
Baolin Wang June 11, 2024, 3:31 a.m. UTC | #3
On 2024/6/10 23:23, Daniel Gomez wrote:
> Hi Baolin,
> 
> On Thu, Jun 06, 2024 at 07:58:55PM +0800, Baolin Wang wrote:
>> In the following patches, shmem will support swapping out large folios, which
>> means shmem mappings may contain large order swap entries. An 'orders' array
>> is therefore added for find_get_entries() and find_lock_entries() to obtain
>> the order of shmem swap entries, which will help when releasing shmem large
>> folio swap entries.
>>
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   mm/filemap.c  | 27 +++++++++++++++++++++++++--
>>   mm/internal.h |  4 ++--
>>   mm/shmem.c    | 17 +++++++++--------
>>   mm/truncate.c |  8 ++++----
>>   4 files changed, 40 insertions(+), 16 deletions(-)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 37061aafd191..47fcd9ee6012 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -2036,14 +2036,24 @@ static inline struct folio *find_get_entry(struct xa_state *xas, pgoff_t max,
>>    * Return: The number of entries which were found.
>>    */
>>   unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
>> -		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices)
>> +		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices,
>> +		int *orders)
>>   {
>>   	XA_STATE(xas, &mapping->i_pages, *start);
>>   	struct folio *folio;
>> +	int order;
>>   
>>   	rcu_read_lock();
>>   	while ((folio = find_get_entry(&xas, end, XA_PRESENT)) != NULL) {
>>   		indices[fbatch->nr] = xas.xa_index;
>> +		if (orders) {
>> +			if (!xa_is_value(folio))
>> +				order = folio_order(folio);
>> +			else
>> +				order = xa_get_order(xas.xa, xas.xa_index);
>> +
>> +			orders[fbatch->nr] = order;
>> +		}
>>   		if (!folio_batch_add(fbatch, folio))
>>   			break;
>>   	}
>> @@ -2056,6 +2066,8 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
>>   		folio = fbatch->folios[idx];
>>   		if (!xa_is_value(folio))
>>   			nr = folio_nr_pages(folio);
>> +		else if (orders)
>> +			nr = 1 << orders[idx];
>>   		*start = indices[idx] + nr;
>>   	}
>>   	return folio_batch_count(fbatch);
>> @@ -2082,10 +2094,12 @@ unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
>>    * Return: The number of entries which were found.
>>    */
>>   unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
>> -		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices)
>> +		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices,
>> +		int *orders)
>>   {
>>   	XA_STATE(xas, &mapping->i_pages, *start);
>>   	struct folio *folio;
>> +	int order;
>>   
>>   	rcu_read_lock();
>>   	while ((folio = find_get_entry(&xas, end, XA_PRESENT))) {
>> @@ -2099,9 +2113,16 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
>>   			if (folio->mapping != mapping ||
>>   			    folio_test_writeback(folio))
>>   				goto unlock;
>> +			if (orders)
>> +				order = folio_order(folio);
>>   			VM_BUG_ON_FOLIO(!folio_contains(folio, xas.xa_index),
>>   					folio);
>> +		} else if (orders) {
>> +			order = xa_get_order(xas.xa, xas.xa_index);
>>   		}
>> +
>> +		if (orders)
>> +			orders[fbatch->nr] = order;
>>   		indices[fbatch->nr] = xas.xa_index;
>>   		if (!folio_batch_add(fbatch, folio))
>>   			break;
>> @@ -2120,6 +2141,8 @@ unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
>>   		folio = fbatch->folios[idx];
>>   		if (!xa_is_value(folio))
>>   			nr = folio_nr_pages(folio);
>> +		else if (orders)
>> +			nr = 1 << orders[idx];
>>   		*start = indices[idx] + nr;
>>   	}
>>   	return folio_batch_count(fbatch);
>> diff --git a/mm/internal.h b/mm/internal.h
>> index 3419c329b3bc..0b5adb6c33cc 100644
>> --- a/mm/internal.h
>> +++ b/mm/internal.h
>> @@ -339,9 +339,9 @@ static inline void force_page_cache_readahead(struct address_space *mapping,
>>   }
>>   
>>   unsigned find_lock_entries(struct address_space *mapping, pgoff_t *start,
>> -		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices);
>> +		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices, int *orders);
>>   unsigned find_get_entries(struct address_space *mapping, pgoff_t *start,
>> -		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices);
>> +		pgoff_t end, struct folio_batch *fbatch, pgoff_t *indices, int *orders);
>>   void filemap_free_folio(struct address_space *mapping, struct folio *folio);
>>   int truncate_inode_folio(struct address_space *mapping, struct folio *folio);
>>   bool truncate_inode_partial_folio(struct folio *folio, loff_t start,
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 0ac71580decb..28ba603d87b8 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -840,14 +840,14 @@ static void shmem_delete_from_page_cache(struct folio *folio, void *radswap)
>>    * Remove swap entry from page cache, free the swap and its page cache.
>>    */
>>   static int shmem_free_swap(struct address_space *mapping,
>> -			   pgoff_t index, void *radswap)
>> +			   pgoff_t index, void *radswap, int order)
>>   {
>>   	void *old;
> 
> Matthew Wilcox suggested [1] returning the number of pages freed in shmem_free_swap().
> 
> [1] https://lore.kernel.org/all/ZQRf2pGWurrE0uO+@casper.infradead.org/
> 
> Which I submitted here:
> https://lore.kernel.org/all/20231028211518.3424020-5-da.gomez@samsung.com/
> 
> Do you agree with the suggestion? If so, could we update my patch to use
> free_swap_and_cache_nr() or include that here?

Yes, this looks good to me. But we still need some modifications to
find_lock_entries() and find_get_entries() so that '*start' is updated
correctly. I will include your changes in this patch in the next version.
Thanks.
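
(For illustration, the '*start' fix-up mentioned above could look roughly like
the sketch below, which re-reads the order with xa_get_order() instead of
carrying an 'orders[]' array, and relies on large swap entries being naturally
aligned; this is a sketch, not the actual follow-up patch.)

	/* At the end of find_get_entries()/find_lock_entries(): */
	if (folio_batch_count(fbatch)) {
		unsigned long nr;
		int idx = folio_batch_count(fbatch) - 1;

		folio = fbatch->folios[idx];
		if (!xa_is_value(folio))
			nr = folio_nr_pages(folio);
		else
			nr = 1 << xa_get_order(&mapping->i_pages, indices[idx]);
		/*
		 * round_down() keeps *start correct even if the batch ended
		 * part-way into a large, naturally aligned swap entry.
		 */
		*start = round_down(indices[idx] + nr, nr);
	}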
Baolin Wang June 11, 2024, 3:38 a.m. UTC | #4
On 2024/6/11 00:59, Matthew Wilcox wrote:
> On Thu, Jun 06, 2024 at 07:58:55PM +0800, Baolin Wang wrote:
>> In the following patches, shmem will support swapping out large folios, which
>> means shmem mappings may contain large order swap entries. An 'orders' array
>> is therefore added for find_get_entries() and find_lock_entries() to obtain
>> the order of shmem swap entries, which will help when releasing shmem large
>> folio swap entries.
> 
> I am not a fan.

With Daniel's suggestion, I think I can drop the 'order' parameter if 
you don't like it.

> I was hoping that 'order' would be encoded in the swap
> entry, not passed as a separate parameter.
> 
> As I understand it, we currently have a free bit, or
> swp_to_radix_entry() would not work.  We can use that as detailed
> here to encode the order in a single bit.
> 
> https://kernelnewbies.org/MatthewWilcox/NaturallyAlignedOrder

OK. This seems to deserve a separate patch set. I will look at your 
suggestion in detail. Thanks.
Hugh Dickins June 12, 2024, 5:46 a.m. UTC | #5
On Thu, 6 Jun 2024, Baolin Wang wrote:

> Shmem will support large folio allocation [1] [2] to get better performance;
> however, memory reclaim still splits the precious large folios when trying to
> swap out shmem, which may lead to memory fragmentation and prevents shmem from
> taking advantage of large folios.
> 
> Moreover, the swap code already supports swapping out large folios without
> splitting them, and the large folio swap-in series [3] is queued in the
> mm-unstable branch. Hence this patch set adds large folio swap-out and swap-in
> support for shmem.
> 
> [1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
> [2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
> [3] https://lore.kernel.org/all/20240508224040.190469-6-21cnbao@gmail.com/T/
> 
> Changes from RFC:
>  - Rebased to the latest mm-unstable.
>  - Drop the counter name fixing patch, which was queued into mm-hotfixes-stable
>  branch.
> 
> Baolin Wang (7):
>   mm: vmscan: add validation before spliting shmem large folio
>   mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM
>     flag setting
>   mm: shmem: support large folio allocation for shmem_replace_folio()
>   mm: shmem: extend shmem_partial_swap_usage() to support large folio
>     swap
>   mm: add new 'orders' parameter for find_get_entries() and
>     find_lock_entries()
>   mm: shmem: use swap_free_nr() to free shmem swap entries
>   mm: shmem: support large folio swap out
> 
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  1 +
>  include/linux/swap.h                      |  4 +-
>  include/linux/writeback.h                 |  1 +
>  mm/filemap.c                              | 27 ++++++-
>  mm/internal.h                             |  4 +-
>  mm/shmem.c                                | 58 ++++++++------
>  mm/swapfile.c                             | 98 ++++++++++++-----------
>  mm/truncate.c                             |  8 +-
>  mm/vmscan.c                               | 22 ++++-
>  9 files changed, 140 insertions(+), 83 deletions(-)

I wanted to have some tests running, while looking through these
and your shmem mTHP patches; but I wasted too much time on that by
applying these on top and hitting a crash, OOMs and dreadful thrashing -
testing did not get very far at all.

Perhaps all easily fixed, but I don't have more time to spend on it,
and think this series cannot expect to go into 6.11: I'll have another
try with it next cycle.

I really must turn my attention to your shmem mTHP series: no doubt
I'll have minor adjustments to ask there - but several other people
are also waiting for me to respond (or given up on me completely).

The little crash fix needed in this series appears to be:

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2053,7 +2053,8 @@ static int shmem_swapin_folio(struct ino
 			goto failed;
 	}
 
-	error = shmem_add_to_page_cache(folio, mapping, index,
+	error = shmem_add_to_page_cache(folio, mapping,
+					round_down(index, nr_pages),
 					swp_to_radix_entry(swap), gfp);
 	if (error)
 		goto failed;

Then the OOMs and dreadful thrashing are due to refcount confusion:
I did not even glance at these patches to work out what's wanted,
but a printk in __remove_mapping() showed that folio->_refcount was
1024 where 513 was expected, so reclaim was freeing none of them.

Hugh
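
(Context for the numbers above: __remove_mapping() in mm/vmscan.c freezes the
refcount against one reference per subpage held by the page/swap cache plus
one for the isolating caller, roughly:

	refcount = 1 + folio_nr_pages(folio);
	if (!folio_ref_freeze(folio, refcount))
		goto cannot_free;

so 513 is the expected count for an order-9 folio; 1024 means 511 unexpected
extra references, the freeze fails, and reclaim frees none of those folios.)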
Baolin Wang June 12, 2024, 6:23 a.m. UTC | #6
Hi Hugh,

On 2024/6/12 13:46, Hugh Dickins wrote:
> On Thu, 6 Jun 2024, Baolin Wang wrote:
> 
>> Shmem will support large folio allocation [1] [2] to get better performance;
>> however, memory reclaim still splits the precious large folios when trying to
>> swap out shmem, which may lead to memory fragmentation and prevents shmem from
>> taking advantage of large folios.
>>
>> Moreover, the swap code already supports swapping out large folios without
>> splitting them, and the large folio swap-in series [3] is queued in the
>> mm-unstable branch. Hence this patch set adds large folio swap-out and swap-in
>> support for shmem.
>>
>> [1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
>> [2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
>> [3] https://lore.kernel.org/all/20240508224040.190469-6-21cnbao@gmail.com/T/
>>
>> Changes from RFC:
>>   - Rebased to the latest mm-unstable.
>>   - Drop the counter name fixing patch, which was queued into mm-hotfixes-stable
>>   branch.
>>
>> Baolin Wang (7):
>>    mm: vmscan: add validation before spliting shmem large folio
>>    mm: swap: extend swap_shmem_alloc() to support batch SWAP_MAP_SHMEM
>>      flag setting
>>    mm: shmem: support large folio allocation for shmem_replace_folio()
>>    mm: shmem: extend shmem_partial_swap_usage() to support large folio
>>      swap
>>    mm: add new 'orders' parameter for find_get_entries() and
>>      find_lock_entries()
>>    mm: shmem: use swap_free_nr() to free shmem swap entries
>>    mm: shmem: support large folio swap out
>>
>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  1 +
>>   include/linux/swap.h                      |  4 +-
>>   include/linux/writeback.h                 |  1 +
>>   mm/filemap.c                              | 27 ++++++-
>>   mm/internal.h                             |  4 +-
>>   mm/shmem.c                                | 58 ++++++++------
>>   mm/swapfile.c                             | 98 ++++++++++++-----------
>>   mm/truncate.c                             |  8 +-
>>   mm/vmscan.c                               | 22 ++++-
>>   9 files changed, 140 insertions(+), 83 deletions(-)
> 
> I wanted to have some tests running, while looking through these
> and your shmem mTHP patches; but I wasted too much time on that by
> applying these on top and hitting a crash, OOMs and dreadful thrashing -
> testing did not get very far at all.

Thanks for testing. I am sorry I did not catch these issues in my own testing.

> Perhaps all easily fixed, but I don't have more time to spend on it,
> and think this series cannot expect to go into 6.11: I'll have another
> try with it next cycle.
> 
> I really must turn my attention to your shmem mTHP series: no doubt
> I'll have minor adjustments to ask there - but several other people
> are also waiting for me to respond (or given up on me completely).

Sure. Thanks.

> 
> The little crash fix needed in this series appears to be:
> 
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2053,7 +2053,8 @@ static int shmem_swapin_folio(struct ino
>   			goto failed;
>   	}
>   
> -	error = shmem_add_to_page_cache(folio, mapping, index,
> +	error = shmem_add_to_page_cache(folio, mapping,
> +					round_down(index, nr_pages),
>   					swp_to_radix_entry(swap), gfp);
>   	if (error)
>   		goto failed;

Good catch. I missed this.

> Then the OOMs and dreadful thrashing are due to refcount confusion:
> I did not even glance at these patches to work out what's wanted,
> but a printk in __remove_mapping() showed that folio->_refcount was
> 1024 where 513 was expected, so reclaim was freeing none of them.

I will look into this issue and continue to do more testing before sending
out a new version. Thanks.