diff mbox series

[linux-next,v3] swap_state: update shadow_nodes for anonymous page

Message ID 202301131736452546903@zte.com.cn (mailing list archive)
State New

Commit Message

Yang Yang Jan. 13, 2023, 9:36 a.m. UTC
From: Yang Yang <yang.yang29@zte.com.cn>

shadow_nodes is the list that workingset handling uses to reclaim shadow
nodes. It has been updated on page cache add and delete for a long time,
because workingset originally supported only the page cache. When
workingset gained anonymous page detection, we missed updating
shadow_nodes for it. As a result, shadow nodes of anonymous pages are
never reclaimed by scan_shadow_nodes(), even when they use a lot of
memory and the system is under memory pressure.

So update shadow_nodes for anonymous pages when a swap cache entry is
added or deleted, by calling xas_set_update(&xas, workingset_update_node).

Fixes: aae466b0052e ("mm/swap: implement workingset detection for anonymous LRU")
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
---
Changes in v3:
- Rewrite the commit message in imperative mood. Thanks to Bagas Sanjaya.
Changes in v2:
- Include a description of the user-visible effect. Add a Fixes tag. Update
comments. Also call workingset_update_node() in
clear_shadow_from_swap_cache(). Thanks to Matthew Wilcox.
---
 include/linux/xarray.h | 3 ++-
 mm/swap_state.c        | 6 ++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

Comments

Matthew Wilcox Jan. 16, 2023, 7:51 p.m. UTC | #1
On Fri, Jan 13, 2023 at 05:36:45PM +0800, yang.yang29@zte.com.cn wrote:
> From: Yang Yang <yang.yang29@zte.com.cn>
> 
> shadow_nodes is the list that workingset handling uses to reclaim shadow
> nodes. It has been updated on page cache add and delete for a long time,
> because workingset originally supported only the page cache. When
> workingset gained anonymous page detection, we missed updating
> shadow_nodes for it. As a result, shadow nodes of anonymous pages are
> never reclaimed by scan_shadow_nodes(), even when they use a lot of
> memory and the system is under memory pressure.
> 
> So update shadow_nodes for anonymous pages when a swap cache entry is
> added or deleted, by calling xas_set_update(&xas, workingset_update_node).

What testing did you do of this?  I have this crash in today's testing:

04304 BUG: kernel NULL pointer dereference, address: 0000000000000080
04304 #PF: supervisor read access in kernel mode
04304 #PF: error_code(0x0000) - not-present page
04304 PGD 0 P4D 0
04304 Oops: 0000 [#1] PREEMPT SMP NOPTI
04304 CPU: 4 PID: 3219629 Comm: sh Kdump: loaded Not tainted 6.2.0-rc4-next-20230116-00016-gd289d3de8ce5-dirty #69
04304 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-4 04/01/2014
04304 RIP: 0010:_raw_spin_trylock+0x12/0x50
04304 Code: e0 41 5c 5d c3 89 c6 48 89 df e8 89 06 00 00 4c 89 e0 5b 41 5c 5d c3 90 55 48 89 e5 53 48 89 fb bf 01 00 00 00 e8 be 5b 71 ff <8b> 03 85 c0 75 16 ba 01 00 00 00 f0 0f b1 13 b8 01 00 00 00 75 06
04304 RSP: 0018:ffff888059afbbb8 EFLAGS: 00010093
04304 RAX: 0000000000000003 RBX: 0000000000000080 RCX: 0000000000000000
04304 RDX: 0000000000000000 RSI: ffff8880033e24c8 RDI: 0000000000000001
04304 RBP: ffff888059afbbc0 R08: 0000000000000000 R09: ffff888059afbd68
04304 R10: ffff88807d9db868 R11: 0000000000000000 R12: ffff8880033e24c0
04304 R13: ffff88800a1d8008 R14: ffff8880033e24c8 R15: ffff8880033e24c0
04304 FS:  00007feeeabc6740(0000) GS:ffff88807d900000(0000) knlGS:0000000000000000
04304 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
04304 CR2: 0000000000000080 CR3: 0000000059830003 CR4: 0000000000770ea0
04304 PKRU: 55555554
04304 Call Trace:
04304  <TASK>
04304  shadow_lru_isolate+0x3a/0x120
04304  __list_lru_walk_one+0xa3/0x190
04304  ? memcg_list_lru_alloc+0x330/0x330
04304  ? memcg_list_lru_alloc+0x330/0x330
04304  list_lru_walk_one_irq+0x59/0x80
04304  scan_shadow_nodes+0x27/0x30
04304  do_shrink_slab+0x13b/0x2e0
04304  shrink_slab+0x92/0x250
04304  drop_slab+0x41/0x90
04304  drop_caches_sysctl_handler+0x70/0x80
04304  proc_sys_call_handler+0x162/0x210
04304  proc_sys_write+0xe/0x10
04304  vfs_write+0x1c7/0x3a0
04304  ksys_write+0x57/0xd0
04304  __x64_sys_write+0x14/0x20
04304  do_syscall_64+0x34/0x80
04304  entry_SYSCALL_64_after_hwframe+0x63/0xcd
04304 RIP: 0033:0x7feeeacc1190

Decoding it, shadow_lru_isolate+0x3a/0x120 maps back to this line:

        if (!spin_trylock(&mapping->host->i_lock)) {

i_lock is at offset 128 of struct inode, so that matches the dump.
I believe that swapper_spaces never have ->host set, so I don't
believe you've tested this patch since 51b8c1fe250d went in
back in 2021.

Yang Yang Jan. 17, 2023, 1:27 a.m. UTC | #2
> What testing did you do of this?  I have this crash in today's testing:

My test was:
1. Configure zram as swap.
2. Run programs that malloc and access a large amount of memory, enough
to cause swapping.
3. Add printk() calls to count_shadow_nodes() and shadow_lru_isolate() to
confirm that shadow_nodes are really being shrunk.

Sorry for the inadequate testing. I will run more tests, including
drop_caches via sysctl.

Yang Yang Jan. 18, 2023, 12:17 p.m. UTC | #3
> i_lock is at offset 128 of struct inode, so that matches the dump.
> I believe that swapper_spaces never have ->host set, so I don't
> believe you've tested this patch since 51b8c1fe250d went in
> back in 2021.

You are totally right. I reproduced the panic on linux-next and fixed
it in patch v4. I should have been more careful: I tested the patch on
Linux 5.14, which was a mistake.

My apologies for the time wasted.

Thanks.
Patch

diff --git a/include/linux/xarray.h b/include/linux/xarray.h
index 44dd6d6e01bc..5cc1f718fec9 100644
--- a/include/linux/xarray.h
+++ b/include/linux/xarray.h
@@ -1643,7 +1643,8 @@  static inline void xas_set_order(struct xa_state *xas, unsigned long index,
  * @update: Function to call when updating a node.
  *
  * The XArray can notify a caller after it has updated an xa_node.
- * This is advanced functionality and is only needed by the page cache.
+ * This is advanced functionality and is only needed by the page cache
+ * and swap cache.
  */
 static inline void xas_set_update(struct xa_state *xas, xa_update_node_t update)
 {
diff --git a/mm/swap_state.c b/mm/swap_state.c
index cb9aaa00951d..7a003d8abb37 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -94,6 +94,8 @@  int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
 	unsigned long i, nr = folio_nr_pages(folio);
 	void *old;

+	xas_set_update(&xas, workingset_update_node);
+
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_swapcache(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_swapbacked(folio), folio);
@@ -145,6 +147,8 @@  void __delete_from_swap_cache(struct folio *folio,
 	pgoff_t idx = swp_offset(entry);
 	XA_STATE(xas, &address_space->i_pages, idx);

+	xas_set_update(&xas, workingset_update_node);
+
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_swapcache(folio), folio);
 	VM_BUG_ON_FOLIO(folio_test_writeback(folio), folio);
@@ -252,6 +256,8 @@  void clear_shadow_from_swap_cache(int type, unsigned long begin,
 		struct address_space *address_space = swap_address_space(entry);
 		XA_STATE(xas, &address_space->i_pages, curr);

+		xas_set_update(&xas, workingset_update_node);
+
 		xa_lock_irq(&address_space->i_pages);
 		xas_for_each(&xas, old, end) {
 			if (!xa_is_value(old))