[v5,9/9] mm: shmem: support large folio swap out

Message ID d80c21abd20e1b0f5ca66b330f074060fb2f082d.1723434324.git.baolin.wang@linux.alibaba.com (mailing list archive)
State New
Series: support large folio swap-out and swap-in for shmem

Commit Message

Baolin Wang Aug. 12, 2024, 7:42 a.m. UTC
Shmem will support large folio allocation [1] [2] to get better performance;
however, memory reclaim still splits these precious large folios when trying
to swap out shmem, which may lead to memory fragmentation and loses the
benefit of large folios for shmem.

Moreover, the swap code already supports swapping out large folios without
splitting them, hence this patch set adds support for swapping out large
folios for shmem.

Note the i915_gem_shmem driver still needs its large folios to be split when
swapping out, so add a new flag 'split_large_folio' to writeback_control to
indicate splitting of the large folio.

[1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
[2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  1 +
 include/linux/writeback.h                 |  4 +++
 mm/shmem.c                                | 12 ++++++---
 mm/vmscan.c                               | 32 ++++++++++++++++++-----
 4 files changed, 38 insertions(+), 11 deletions(-)
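
For orientation, a condensed sketch of the swap-out path these hunks change,
paraphrased from the diff at the bottom of this page (not the literal kernel
code: locking, error handling and the non-shmem paths are omitted):

	/* mm/vmscan.c: shrink_folio_list() -> pageout() -> ->writepage() */
	static pageout_t pageout(struct folio *folio, struct address_space *mapping,
				 struct swap_iocb **plug, struct list_head *folio_list)
	{
		struct writeback_control wbc = { .for_reclaim = 1, /* ... */ };
		int res;

		/*
		 * Only ask shmem_writepage() to split when large folio
		 * swapout is unavailable; split-off tail pages go back
		 * onto the reclaim list via wbc.list.
		 */
		if (shmem_mapping(mapping) && folio_test_large(folio)) {
			wbc.list = folio_list;
			wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
		}
		folio_set_reclaim(folio);
		res = mapping->a_ops->writepage(&folio->page, &wbc);
		return res < 0 ? PAGE_ACTIVATE : PAGE_SUCCESS;	/* simplified */
	}

	/* mm/shmem.c: shmem_writepage() now falls back to splitting only
	 * when a large swap allocation fails. */
	swap = folio_alloc_swap(folio);
	if (!swap.val) {
		if (nr_pages > 1)
			goto try_split;		/* retry as order-0 pages */
		goto redirty;
	}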

Comments

Hugh Dickins Aug. 25, 2024, 11:14 p.m. UTC | #1
On Mon, 12 Aug 2024, Baolin Wang wrote:

> Shmem will support large folio allocation [1] [2] to get better performance;
> however, memory reclaim still splits these precious large folios when trying
> to swap out shmem, which may lead to memory fragmentation and loses the
> benefit of large folios for shmem.
> 
> Moreover, the swap code already supports swapping out large folios without
> splitting them, hence this patch set adds support for swapping out large
> folios for shmem.
> 
> Note the i915_gem_shmem driver still needs its large folios to be split when
> swapping out, so add a new flag 'split_large_folio' to writeback_control to
> indicate splitting of the large folio.

Is that last paragraph a misunderstanding? The i915 THP splitting in
shmem_writepage() was to avoid mm VM_BUG_ONs and crashes when shmem.c
did not support huge page swapout: but now you are enabling that support,
and such VM_BUG_ONs and crashes are gone (so far as I can see: and this
is written on a laptop using the i915 driver).

I cannot think of why i915 itself would care how mm implements swapout
(beyond enjoying the speedup): I think all the wbc->split_large_folio you
introduce here should be reverted.  But you may know better!

I do need a further change to shmem_writepage() here: see fixup patch
below: that's written to apply on top of this 9/9, but I'd be glad to
see a replacement with wbc->split_large_folio gone, and just one
!IS_ENABLED(CONFIG_THP_SWAP) instead.
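
Roughly, an untested sketch of that replacement, combining the single
!IS_ENABLED(CONFIG_THP_SWAP) test with the EOF check from the fixup patch
below:

	if (folio_test_large(folio)) {
		/* No large folio swapout without CONFIG_THP_SWAP */
		split = !IS_ENABLED(CONFIG_THP_SWAP);
		index = shmem_fallocend(inode,
			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));
		/* Split a large folio which crosses i_size (EOF) */
		if (index > folio->index && index < folio_next_index(folio))
			split = true;
	}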

> 
> [1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
> [2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/

I get "Not found" for that [2] link.

> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  1 +
>  include/linux/writeback.h                 |  4 +++
>  mm/shmem.c                                | 12 ++++++---
>  mm/vmscan.c                               | 32 ++++++++++++++++++-----
>  4 files changed, 38 insertions(+), 11 deletions(-)

[PATCH] mm: shmem: shmem_writepage() split folio at EOF before swapout

Working in a constrained (size= or nr_blocks=) huge=always tmpfs relies
on swapout to split a large folio at EOF, to trim off its excess before
hitting premature ENOSPC: shmem_unused_huge_shrink() contains no code to
handle splitting huge swap blocks, and nobody would want that to be added.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
 mm/shmem.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 37c300f69baf..4dd0570962fa 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1459,6 +1459,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	swp_entry_t swap;
 	pgoff_t index;
 	int nr_pages;
+	bool split = false;
 
 	/*
 	 * Our capabilities prevent regular writeback or sync from ever calling
@@ -1480,8 +1481,20 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
 	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
 	 * and its shmem_writeback() needs them to be split when swapping.
+	 *
+	 * And shrinkage of pages beyond i_size does not split swap, so
+	 * swapout of a large folio crossing i_size needs to split too
+	 * (unless fallocate has been used to preallocate beyond EOF).
 	 */
-	if (wbc->split_large_folio && folio_test_large(folio)) {
+	if (folio_test_large(folio)) {
+		split = wbc->split_large_folio;
+		index = shmem_fallocend(inode,
+			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));
+		if (index > folio->index && index < folio_next_index(folio))
+			split = true;
+	}
+
+	if (split) {
 try_split:
 		/* Ensure the subpages are still dirty */
 		folio_test_set_dirty(folio);
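
A worked example of the EOF check above, assuming nothing was fallocated
beyond EOF: with i_size at 5 pages, shmem_fallocend() returns index 5; for
an order-4 folio at index 0, folio_next_index() is 16, and since 0 < 5 < 16
the folio crosses EOF and is split before swapout, so the excess beyond EOF
can be trimmed rather than tied up in an unsplittable huge swap block.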
Baolin Wang Aug. 27, 2024, 6:58 a.m. UTC | #2
On 2024/8/26 07:14, Hugh Dickins wrote:
> On Mon, 12 Aug 2024, Baolin Wang wrote:
> 
>> Shmem will support large folio allocation [1] [2] to get better performance;
>> however, memory reclaim still splits these precious large folios when trying
>> to swap out shmem, which may lead to memory fragmentation and loses the
>> benefit of large folios for shmem.
>>
>> Moreover, the swap code already supports swapping out large folios without
>> splitting them, hence this patch set adds support for swapping out large
>> folios for shmem.
>>
>> Note the i915_gem_shmem driver still needs its large folios to be split when
>> swapping out, so add a new flag 'split_large_folio' to writeback_control to
>> indicate splitting of the large folio.
> 
> Is that last paragraph a misunderstanding? The i915 THP splitting in
> shmem_writepage() was to avoid mm VM_BUG_ONs and crashes when shmem.c
> did not support huge page swapout: but now you are enabling that support,
> and such VM_BUG_ONs and crashes are gone (so far as I can see: and this
> is written on a laptop using the i915 driver).

Thanks for the history; now I understand.

> I cannot think of why i915 itself would care how mm implements swapout
> (beyond enjoying the speedup): I think all the wbc->split_large_folio you
> introduce here should be reverted.  But you may know better!
> 
> I do need a further change to shmem_writepage() here: see fixup patch
> below: that's written to apply on top of this 9/9, but I'd be glad to
> see a replacement with wbc->split_large_folio gone, and just one
> !IS_ENABLED(CONFIG_THP_SWAP) instead.

Sure. After Andrew queues your fixes, I will send a proper follow-up patch
to remove 'wbc->split_large_folio'.

>> [1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
>> [2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
> 
> I get "Not found" for that [2] link.

Weird, I can access the link; not sure why it fails for you.

> 
>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>   drivers/gpu/drm/i915/gem/i915_gem_shmem.c |  1 +
>>   include/linux/writeback.h                 |  4 +++
>>   mm/shmem.c                                | 12 ++++++---
>>   mm/vmscan.c                               | 32 ++++++++++++++++++-----
>>   4 files changed, 38 insertions(+), 11 deletions(-)
> 
> [PATCH] mm: shmem: shmem_writepage() split folio at EOF before swapout
> 
> Working in a constrained (size= or nr_blocks=) huge=always tmpfs relies
> on swapout to split a large folio at EOF, to trim off its excess before
> hitting premature ENOSPC: shmem_unused_huge_shrink() contains no code to
> handle splitting huge swap blocks, and nobody would want that to be added.
> 
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
>   mm/shmem.c | 15 ++++++++++++++-
>   1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 37c300f69baf..4dd0570962fa 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1459,6 +1459,7 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>   	swp_entry_t swap;
>   	pgoff_t index;
>   	int nr_pages;
> +	bool split = false;
>   
>   	/*
>   	 * Our capabilities prevent regular writeback or sync from ever calling
> @@ -1480,8 +1481,20 @@ static int shmem_writepage(struct page *page, struct writeback_control *wbc)
>   	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
>   	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
>   	 * and its shmem_writeback() needs them to be split when swapping.
> +	 *
> +	 * And shrinkage of pages beyond i_size does not split swap, so
> +	 * swapout of a large folio crossing i_size needs to split too
> +	 * (unless fallocate has been used to preallocate beyond EOF).
>   	 */
> -	if (wbc->split_large_folio && folio_test_large(folio)) {
> +	if (folio_test_large(folio)) {
> +		split = wbc->split_large_folio;
> +		index = shmem_fallocend(inode,
> +			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));
> +		if (index > folio->index && index < folio_next_index(folio))
> +			split = true;
> +	}
> +
> +	if (split) {
>   try_split:
>   		/* Ensure the subpages are still dirty */
>   		folio_test_set_dirty(folio);

Thanks for the fix, Hugh. Your review is much appreciated.

Patch

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
index c5e1c718a6d2..c66cb9c585e1 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
@@ -308,6 +308,7 @@  void __shmem_writeback(size_t size, struct address_space *mapping)
 		.range_start = 0,
 		.range_end = LLONG_MAX,
 		.for_reclaim = 1,
+		.split_large_folio = 1,
 	};
 	unsigned long i;
 
diff --git a/include/linux/writeback.h b/include/linux/writeback.h
index 1a54676d843a..10100e22d5c6 100644
--- a/include/linux/writeback.h
+++ b/include/linux/writeback.h
@@ -63,6 +63,7 @@  struct writeback_control {
 	unsigned range_cyclic:1;	/* range_start is cyclic */
 	unsigned for_sync:1;		/* sync(2) WB_SYNC_ALL writeback */
 	unsigned unpinned_netfs_wb:1;	/* Cleared I_PINNING_NETFS_WB */
+	unsigned split_large_folio:1;	/* Split large folio for shmem writeback */
 
 	/*
 	 * When writeback IOs are bounced through async layers, only the
@@ -79,6 +80,9 @@  struct writeback_control {
 	 */
 	struct swap_iocb **swap_plug;
 
+	/* Target list for splitting a large folio */
+	struct list_head *list;
+
 	/* internal fields used by the ->writepages implementation: */
 	struct folio_batch fbatch;
 	pgoff_t index;
diff --git a/mm/shmem.c b/mm/shmem.c
index 996062dc196b..50aeb03c4d34 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -795,7 +795,6 @@  static int shmem_add_to_page_cache(struct folio *folio,
 	VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_swapbacked(folio), folio);
-	VM_BUG_ON(expected && folio_test_large(folio));
 
 	folio_ref_add(folio, nr);
 	folio->mapping = mapping;
@@ -1482,10 +1481,11 @@  static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
 	 * and its shmem_writeback() needs them to be split when swapping.
 	 */
-	if (folio_test_large(folio)) {
+	if (wbc->split_large_folio && folio_test_large(folio)) {
+try_split:
 		/* Ensure the subpages are still dirty */
 		folio_test_set_dirty(folio);
-		if (split_huge_page(page) < 0)
+		if (split_huge_page_to_list_to_order(page, wbc->list, 0))
 			goto redirty;
 		folio = page_folio(page);
 		folio_clear_dirty(folio);
@@ -1527,8 +1527,12 @@  static int shmem_writepage(struct page *page, struct writeback_control *wbc)
 	}
 
 	swap = folio_alloc_swap(folio);
-	if (!swap.val)
+	if (!swap.val) {
+		if (nr_pages > 1)
+			goto try_split;
+
 		goto redirty;
+	}
 
 	/*
 	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 96ce889ea3d0..ba7b67218caf 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -628,7 +628,7 @@  typedef enum {
  * Calls ->writepage().
  */
 static pageout_t pageout(struct folio *folio, struct address_space *mapping,
-			 struct swap_iocb **plug)
+			 struct swap_iocb **plug, struct list_head *folio_list)
 {
 	/*
 	 * If the folio is dirty, only perform writeback if that write
@@ -676,6 +676,16 @@  static pageout_t pageout(struct folio *folio, struct address_space *mapping,
 			.swap_plug = plug,
 		};
 
+		/*
+		 * A large shmem folio can be split if CONFIG_THP_SWAP is
+		 * not enabled, or if contiguous swap entries cannot be
+		 * allocated.
+		 */
+		if (shmem_mapping(mapping) && folio_test_large(folio)) {
+			wbc.list = folio_list;
+			wbc.split_large_folio = !IS_ENABLED(CONFIG_THP_SWAP);
+		}
+
 		folio_set_reclaim(folio);
 		res = mapping->a_ops->writepage(&folio->page, &wbc);
 		if (res < 0)
@@ -1257,11 +1267,6 @@  static unsigned int shrink_folio_list(struct list_head *folio_list,
 						goto activate_locked_split;
 				}
 			}
-		} else if (folio_test_swapbacked(folio) &&
-			   folio_test_large(folio)) {
-			/* Split shmem folio */
-			if (split_folio_to_list(folio, folio_list))
-				goto keep_locked;
 		}
 
 		/*
@@ -1362,12 +1367,25 @@  static unsigned int shrink_folio_list(struct list_head *folio_list,
 			 * starts and then write it out here.
 			 */
 			try_to_unmap_flush_dirty();
-			switch (pageout(folio, mapping, &plug)) {
+			switch (pageout(folio, mapping, &plug, folio_list)) {
 			case PAGE_KEEP:
 				goto keep_locked;
 			case PAGE_ACTIVATE:
+				/*
+				 * If a shmem folio is split during writeback to
+				 * swap, the tail pages will make their own pass
+				 * through this function and be accounted for then.
+				 */
+				if (nr_pages > 1 && !folio_test_large(folio)) {
+					sc->nr_scanned -= (nr_pages - 1);
+					nr_pages = 1;
+				}
 				goto activate_locked;
 			case PAGE_SUCCESS:
+				if (nr_pages > 1 && !folio_test_large(folio)) {
+					sc->nr_scanned -= (nr_pages - 1);
+					nr_pages = 1;
+				}
 				stat->nr_pageout += nr_pages;
 
 				if (folio_test_writeback(folio))