diff mbox series

mm: swap: prevent possible data-race in __try_to_reclaim_swap

Message ID 20241004142504.4379-1-aha310510@gmail.com (mailing list archive)
State New
Headers show
Series mm: swap: prevent possible data-race in __try_to_reclaim_swap | expand

Commit Message

Jeongjun Park Oct. 4, 2024, 2:25 p.m. UTC
A report [1] was uploaded from syzbot.

In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip 
slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages 
from folio without folio_lock protection. 

In the currently reported KCSAN log, it is assumed that the actual data-race 
will not occur because the calltrace that does WRITE already obtains the 
folio_lock and then writes. 

However, the existing __try_to_reclaim_swap() function was already implemented 
to perform reads under folio_lock protection [1], and there is a risk of a 
data-race occurring through a function other than the one shown in the KCSAN 
log. 

Therefore, I think it is appropriate to change all read operations for 
folio to be performed under folio_lock.

[1]

==================================================================
BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap

write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
 __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
 delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
 folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
 free_swap_cache mm/swap_state.c:293 [inline]
 free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
 __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
 tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
 tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
 zap_pte_range mm/memory.c:1700 [inline]
 zap_pmd_range mm/memory.c:1739 [inline]
 zap_pud_range mm/memory.c:1768 [inline]
 zap_p4d_range mm/memory.c:1789 [inline]
 unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
 unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
 unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
 exit_mmap+0x18a/0x690 mm/mmap.c:1864
 __mmput+0x28/0x1b0 kernel/fork.c:1347
 mmput+0x4c/0x60 kernel/fork.c:1369
 exit_mm+0xe4/0x190 kernel/exit.c:571
 do_exit+0x55e/0x17f0 kernel/exit.c:926
 do_group_exit+0x102/0x150 kernel/exit.c:1088
 get_signal+0xf2a/0x1070 kernel/signal.c:2917
 arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
 do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
 __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
 free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
 zap_pte_range mm/memory.c:1656 [inline]
 zap_pmd_range mm/memory.c:1739 [inline]
 zap_pud_range mm/memory.c:1768 [inline]
 zap_p4d_range mm/memory.c:1789 [inline]
 unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
 unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
 unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
 exit_mmap+0x18a/0x690 mm/mmap.c:1864
 __mmput+0x28/0x1b0 kernel/fork.c:1347
 mmput+0x4c/0x60 kernel/fork.c:1369
 exit_mm+0xe4/0x190 kernel/exit.c:571
 do_exit+0x55e/0x17f0 kernel/exit.c:926
 __do_sys_exit kernel/exit.c:1055 [inline]
 __se_sys_exit kernel/exit.c:1053 [inline]
 __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
 x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x0000000000000242 -> 0x0000000000000000

Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
---
 mm/swapfile.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--

Comments

Matthew Wilcox Oct. 4, 2024, 2:34 p.m. UTC | #1
On Fri, Oct 04, 2024 at 11:25:04PM +0900, Jeongjun Park wrote:
> A report [1] was uploaded from syzbot.
> 
> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip 
> slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages 
> from folio without folio_lock protection. 

Umm.  You don't need folio_lock to read nr_pages.  Holding a refcount
is sufficient to stabilise nr_pages.  I cannot speak to folio->swap
though (and the KCSAN report does appear to be pointing to folio->swap).
Jeongjun Park Oct. 4, 2024, 2:50 p.m. UTC | #2
Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 04, 2024 at 11:25:04PM +0900, Jeongjun Park wrote:
> > A report [1] was uploaded from syzbot.
> >
> > In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
> > slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
> > from folio without folio_lock protection.
>
> Umm.  You don't need folio_lock to read nr_pages.  Holding a refcount
> is sufficient to stabilise nr_pages.  I cannot speak to folio->swap
> though (and the KCSAN report does appear to be pointing to folio->swap).
>

That's right. It looks like KCSAN log occurs when reading folio->swap.
In fact, since most of the code reads folio->swap under the protection
of folio_lock, it is possible to modify only the part that reads folio->swap
and the code that reads offset to operate under the protection of
folio_lock.

However, even if reading nr_pages does not require folio_lock, I don't
think it is very desirable to modify only this code to not be protected
by folio_lock.

Regards,
Jeongjun Park
diff mbox series

Patch

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0cded32414a1..904c21256fc2 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -193,13 +193,6 @@  static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	folio = filemap_get_folio(address_space, swap_cache_index(entry));
 	if (IS_ERR(folio))
 		return 0;
-
-	/* offset could point to the middle of a large folio */
-	entry = folio->swap;
-	offset = swp_offset(entry);
-	nr_pages = folio_nr_pages(folio);
-	ret = -nr_pages;
-
 	/*
 	 * When this function is called from scan_swap_map_slots() and it's
 	 * called by vmscan.c at reclaiming folios. So we hold a folio lock
@@ -210,6 +203,12 @@  static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	if (!folio_trylock(folio))
 		goto out;
 
+	/* offset could point to the middle of a large folio */
+	entry = folio->swap;
+	offset = swp_offset(entry);
+	nr_pages = folio_nr_pages(folio);
+	ret = -nr_pages;
+
 	need_reclaim = ((flags & TTRS_ANYWAY) ||
 			((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
 			((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));