mm: swap: prevent possible data-race in __try_to_reclaim_swap

Message ID 20241004142504.4379-1-aha310510@gmail.com (mailing list archive)
State New
Series mm: swap: prevent possible data-race in __try_to_reclaim_swap

Commit Message

Jeongjun Park Oct. 4, 2024, 2:25 p.m. UTC
syzbot reported a KCSAN data-race [1].

Since commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
slot cache"), the __try_to_reclaim_swap() function reads offset and
nr_pages from the folio without folio_lock protection.

In the currently reported KCSAN log, it is assumed that an actual
data-race will not occur, because the call trace that performs the WRITE
obtains the folio_lock before writing.

However, __try_to_reclaim_swap() was otherwise already written to perform
its reads under folio_lock protection [1], and there is a risk of a
data-race occurring through a path other than the one shown in the KCSAN
log.

Therefore, I think it is appropriate to perform all reads of the folio
under folio_lock.

[1]

==================================================================
BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap

write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
 __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
 delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
 folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
 free_swap_cache mm/swap_state.c:293 [inline]
 free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
 __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
 tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
 tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
 tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
 zap_pte_range mm/memory.c:1700 [inline]
 zap_pmd_range mm/memory.c:1739 [inline]
 zap_pud_range mm/memory.c:1768 [inline]
 zap_p4d_range mm/memory.c:1789 [inline]
 unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
 unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
 unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
 exit_mmap+0x18a/0x690 mm/mmap.c:1864
 __mmput+0x28/0x1b0 kernel/fork.c:1347
 mmput+0x4c/0x60 kernel/fork.c:1369
 exit_mm+0xe4/0x190 kernel/exit.c:571
 do_exit+0x55e/0x17f0 kernel/exit.c:926
 do_group_exit+0x102/0x150 kernel/exit.c:1088
 get_signal+0xf2a/0x1070 kernel/signal.c:2917
 arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
 exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
 exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
 __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
 syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
 do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
 __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
 free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
 zap_pte_range mm/memory.c:1656 [inline]
 zap_pmd_range mm/memory.c:1739 [inline]
 zap_pud_range mm/memory.c:1768 [inline]
 zap_p4d_range mm/memory.c:1789 [inline]
 unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
 unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
 unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
 exit_mmap+0x18a/0x690 mm/mmap.c:1864
 __mmput+0x28/0x1b0 kernel/fork.c:1347
 mmput+0x4c/0x60 kernel/fork.c:1369
 exit_mm+0xe4/0x190 kernel/exit.c:571
 do_exit+0x55e/0x17f0 kernel/exit.c:926
 __do_sys_exit kernel/exit.c:1055 [inline]
 __se_sys_exit kernel/exit.c:1053 [inline]
 __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
 x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

value changed: 0x0000000000000242 -> 0x0000000000000000

Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
Signed-off-by: Jeongjun Park <aha310510@gmail.com>
---
 mm/swapfile.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

--

Comments

Matthew Wilcox Oct. 4, 2024, 2:34 p.m. UTC | #1
On Fri, Oct 04, 2024 at 11:25:04PM +0900, Jeongjun Park wrote:
> A report [1] was uploaded from syzbot.
> 
> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip 
> slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages 
> from folio without folio_lock protection. 

Umm.  You don't need folio_lock to read nr_pages.  Holding a refcount
is sufficient to stabilise nr_pages.  I cannot speak to folio->swap
though (and the KCSAN report does appear to be pointing to folio->swap).
Jeongjun Park Oct. 4, 2024, 2:50 p.m. UTC | #2
Matthew Wilcox <willy@infradead.org> wrote:
>
> On Fri, Oct 04, 2024 at 11:25:04PM +0900, Jeongjun Park wrote:
> > A report [1] was uploaded from syzbot.
> >
> > In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
> > slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
> > from folio without folio_lock protection.
>
> Umm.  You don't need folio_lock to read nr_pages.  Holding a refcount
> is sufficient to stabilise nr_pages.  I cannot speak to folio->swap
> though (and the KCSAN report does appear to be pointing to folio->swap).
>

That's right. It looks like the KCSAN report is triggered by the read of
folio->swap. Since most of the code reads folio->swap under the
protection of folio_lock, it would be possible to move only the read of
folio->swap, and the offset computed from it, under folio_lock.

However, even if reading nr_pages does not require folio_lock, I don't
think it is desirable to leave only that one read outside folio_lock.

Regards,
Jeongjun Park
Kairui Song Oct. 6, 2024, 8:15 p.m. UTC | #3
On Fri, Oct 4, 2024 at 10:26 PM Jeongjun Park <aha310510@gmail.com> wrote:
>
> A report [1] was uploaded from syzbot.
>
> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
> slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
> from folio without folio_lock protection.
>
> In the currently reported KCSAN log, it is assumed that the actual data-race
> will not occur because the calltrace that does WRITE already obtains the
> folio_lock and then writes.
>
> However, the existing __try_to_reclaim_swap() function was already implemented
> to perform reads under folio_lock protection [1], and there is a risk of a
> data-race occurring through a function other than the one shown in the KCSAN
> log.
>
> Therefore, I think it is appropriate to change all read operations for
> folio to be performed under folio_lock.
>
> [1]
>
> ==================================================================
> BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap
>
> write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
>  __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
>  delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
>  folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
>  free_swap_cache mm/swap_state.c:293 [inline]
>  free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
>  __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
>  tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
>  tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
>  tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
>  zap_pte_range mm/memory.c:1700 [inline]
>  zap_pmd_range mm/memory.c:1739 [inline]
>  zap_pud_range mm/memory.c:1768 [inline]
>  zap_p4d_range mm/memory.c:1789 [inline]
>  unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
>  unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
>  unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
>  exit_mmap+0x18a/0x690 mm/mmap.c:1864
>  __mmput+0x28/0x1b0 kernel/fork.c:1347
>  mmput+0x4c/0x60 kernel/fork.c:1369
>  exit_mm+0xe4/0x190 kernel/exit.c:571
>  do_exit+0x55e/0x17f0 kernel/exit.c:926
>  do_group_exit+0x102/0x150 kernel/exit.c:1088
>  get_signal+0xf2a/0x1070 kernel/signal.c:2917
>  arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
>  exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>  exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
>  __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>  syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
>  do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
>  __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
>  free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
>  zap_pte_range mm/memory.c:1656 [inline]
>  zap_pmd_range mm/memory.c:1739 [inline]
>  zap_pud_range mm/memory.c:1768 [inline]
>  zap_p4d_range mm/memory.c:1789 [inline]
>  unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
>  unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
>  unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
>  exit_mmap+0x18a/0x690 mm/mmap.c:1864
>  __mmput+0x28/0x1b0 kernel/fork.c:1347
>  mmput+0x4c/0x60 kernel/fork.c:1369
>  exit_mm+0xe4/0x190 kernel/exit.c:571
>  do_exit+0x55e/0x17f0 kernel/exit.c:926
>  __do_sys_exit kernel/exit.c:1055 [inline]
>  __se_sys_exit kernel/exit.c:1053 [inline]
>  __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
>  x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
>  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>  do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
>
> value changed: 0x0000000000000242 -> 0x0000000000000000
>
> Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
> Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
> ---
>  mm/swapfile.c | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 0cded32414a1..904c21256fc2 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -193,13 +193,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>         folio = filemap_get_folio(address_space, swap_cache_index(entry));
>         if (IS_ERR(folio))
>                 return 0;
> -
> -       /* offset could point to the middle of a large folio */
> -       entry = folio->swap;
> -       offset = swp_offset(entry);
> -       nr_pages = folio_nr_pages(folio);
> -       ret = -nr_pages;
> -
>         /*
>          * When this function is called from scan_swap_map_slots() and it's
>          * called by vmscan.c at reclaiming folios. So we hold a folio lock
> @@ -210,6 +203,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>         if (!folio_trylock(folio))
>                 goto out;
>
> +       /* offset could point to the middle of a large folio */
> +       entry = folio->swap;
> +       offset = swp_offset(entry);
> +       nr_pages = folio_nr_pages(folio);
> +       ret = -nr_pages;
> +
>         need_reclaim = ((flags & TTRS_ANYWAY) ||
>                         ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
>                         ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
> --
>

Thanks for catching this!

This could lead to real problems: holding a reference is not enough to
protect folio->swap. There are several BUG_ONs later that will be
triggered if it changes.

But you still have to keep the `nr_pages` and `ret` assignments before
`folio_trylock`, or `ret` will be uninitialized if folio_trylock fails.
This function should always return the number of pages, even if the
trylock fails. And as Willy said, `folio_nr_pages` doesn't require the
folio lock.
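
The ordering described above can be sketched in user-space C. This is only a minimal model under stated assumptions, not kernel code and not the v2 patch: `struct model_folio`, `model_trylock()`, `model_unlock()` and `model_try_reclaim()` are hypothetical names, a C11 `atomic_flag` stands in for the folio lock, and the caller is assumed to already hold a reference. The point it illustrates is that the size and the return value are set before the trylock, while the lock-protected field is read only after the trylock succeeds.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a folio; NOT the kernel's struct folio. */
struct model_folio {
	atomic_flag locked;	/* models the folio lock */
	long nr_pages;		/* assumed stable while a reference is held */
	unsigned long swap_val;	/* models folio->swap; stable only under the lock */
};

static bool model_trylock(struct model_folio *f)
{
	/* test_and_set returns the previous state: false means we got the lock */
	return !atomic_flag_test_and_set_explicit(&f->locked,
						  memory_order_acquire);
}

static void model_unlock(struct model_folio *f)
{
	atomic_flag_clear_explicit(&f->locked, memory_order_release);
}

/*
 * Returns +nr_pages when the (modelled) reclaim succeeds and -nr_pages
 * when the folio exists but the trylock fails. *entry_out is written
 * only while the lock is held.
 */
static long model_try_reclaim(struct model_folio *f, unsigned long *entry_out)
{
	long nr_pages, ret;

	assert(f->nr_pages > 0);

	nr_pages = f->nr_pages;	/* no lock needed in this model */
	ret = -nr_pages;	/* set before any early exit */

	if (!model_trylock(f))
		return ret;

	*entry_out = f->swap_val;	/* read only under the lock */
	ret = nr_pages;

	model_unlock(f);
	return ret;
}
```

With a 4-page model folio, a call made while another holder owns the lock returns -4 and leaves `*entry_out` untouched; a call on an unlocked folio returns 4 and copies the protected value.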
Jeongjun Park Oct. 7, 2024, 12:49 a.m. UTC | #4
Kairui Song <ryncsn@gmail.com> wrote:
> 
> On Fri, Oct 4, 2024 at 10:26 PM Jeongjun Park <aha310510@gmail.com> wrote:
>> 
>> A report [1] was uploaded from syzbot.
>> 
>> In the previous commit 862590ac3708 ("mm: swap: allow cache reclaim to skip
>> slot cache"), the __try_to_reclaim_swap() function reads offset and nr_pages
>> from folio without folio_lock protection.
>> 
>> In the currently reported KCSAN log, it is assumed that the actual data-race
>> will not occur because the calltrace that does WRITE already obtains the
>> folio_lock and then writes.
>> 
>> However, the existing __try_to_reclaim_swap() function was already implemented
>> to perform reads under folio_lock protection [1], and there is a risk of a
>> data-race occurring through a function other than the one shown in the KCSAN
>> log.
>> 
>> Therefore, I think it is appropriate to change all read operations for
>> folio to be performed under folio_lock.
>> 
>> [1]
>> 
>> ==================================================================
>> BUG: KCSAN: data-race in __delete_from_swap_cache / __try_to_reclaim_swap
>> 
>> write to 0xffffea0004c90328 of 8 bytes by task 5186 on cpu 0:
>> __delete_from_swap_cache+0x1f0/0x290 mm/swap_state.c:163
>> delete_from_swap_cache+0x72/0xe0 mm/swap_state.c:243
>> folio_free_swap+0x1d8/0x1f0 mm/swapfile.c:1850
>> free_swap_cache mm/swap_state.c:293 [inline]
>> free_pages_and_swap_cache+0x1fc/0x410 mm/swap_state.c:325
>> __tlb_batch_free_encoded_pages mm/mmu_gather.c:136 [inline]
>> tlb_batch_pages_flush mm/mmu_gather.c:149 [inline]
>> tlb_flush_mmu_free mm/mmu_gather.c:366 [inline]
>> tlb_flush_mmu+0x2cf/0x440 mm/mmu_gather.c:373
>> zap_pte_range mm/memory.c:1700 [inline]
>> zap_pmd_range mm/memory.c:1739 [inline]
>> zap_pud_range mm/memory.c:1768 [inline]
>> zap_p4d_range mm/memory.c:1789 [inline]
>> unmap_page_range+0x1f3c/0x22d0 mm/memory.c:1810
>> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
>> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
>> exit_mmap+0x18a/0x690 mm/mmap.c:1864
>> __mmput+0x28/0x1b0 kernel/fork.c:1347
>> mmput+0x4c/0x60 kernel/fork.c:1369
>> exit_mm+0xe4/0x190 kernel/exit.c:571
>> do_exit+0x55e/0x17f0 kernel/exit.c:926
>> do_group_exit+0x102/0x150 kernel/exit.c:1088
>> get_signal+0xf2a/0x1070 kernel/signal.c:2917
>> arch_do_signal_or_restart+0x95/0x4b0 arch/x86/kernel/signal.c:337
>> exit_to_user_mode_loop kernel/entry/common.c:111 [inline]
>> exit_to_user_mode_prepare include/linux/entry-common.h:328 [inline]
>> __syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
>> syscall_exit_to_user_mode+0x59/0x130 kernel/entry/common.c:218
>> do_syscall_64+0xd6/0x1c0 arch/x86/entry/common.c:89
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> 
>> read to 0xffffea0004c90328 of 8 bytes by task 5189 on cpu 1:
>> __try_to_reclaim_swap+0x9d/0x510 mm/swapfile.c:198
>> free_swap_and_cache_nr+0x45d/0x8a0 mm/swapfile.c:1915
>> zap_pte_range mm/memory.c:1656 [inline]
>> zap_pmd_range mm/memory.c:1739 [inline]
>> zap_pud_range mm/memory.c:1768 [inline]
>> zap_p4d_range mm/memory.c:1789 [inline]
>> unmap_page_range+0xcf8/0x22d0 mm/memory.c:1810
>> unmap_single_vma+0x142/0x1d0 mm/memory.c:1856
>> unmap_vmas+0x18d/0x2b0 mm/memory.c:1900
>> exit_mmap+0x18a/0x690 mm/mmap.c:1864
>> __mmput+0x28/0x1b0 kernel/fork.c:1347
>> mmput+0x4c/0x60 kernel/fork.c:1369
>> exit_mm+0xe4/0x190 kernel/exit.c:571
>> do_exit+0x55e/0x17f0 kernel/exit.c:926
>> __do_sys_exit kernel/exit.c:1055 [inline]
>> __se_sys_exit kernel/exit.c:1053 [inline]
>> __x64_sys_exit+0x1f/0x20 kernel/exit.c:1053
>> x64_sys_call+0x2d46/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:61
>> do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>> do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83
>> entry_SYSCALL_64_after_hwframe+0x77/0x7f
>> 
>> value changed: 0x0000000000000242 -> 0x0000000000000000
>> 
>> Reported-by: syzbot+fa43f1b63e3aa6f66329@syzkaller.appspotmail.com
>> Fixes: 862590ac3708 ("mm: swap: allow cache reclaim to skip slot cache")
>> Signed-off-by: Jeongjun Park <aha310510@gmail.com>
>> ---
>> mm/swapfile.c | 13 ++++++-------
>> 1 file changed, 6 insertions(+), 7 deletions(-)
>> 
>> diff --git a/mm/swapfile.c b/mm/swapfile.c
>> index 0cded32414a1..904c21256fc2 100644
>> --- a/mm/swapfile.c
>> +++ b/mm/swapfile.c
>> @@ -193,13 +193,6 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>>        folio = filemap_get_folio(address_space, swap_cache_index(entry));
>>        if (IS_ERR(folio))
>>                return 0;
>> -
>> -       /* offset could point to the middle of a large folio */
>> -       entry = folio->swap;
>> -       offset = swp_offset(entry);
>> -       nr_pages = folio_nr_pages(folio);
>> -       ret = -nr_pages;
>> -
>>        /*
>>         * When this function is called from scan_swap_map_slots() and it's
>>         * called by vmscan.c at reclaiming folios. So we hold a folio lock
>> @@ -210,6 +203,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>>        if (!folio_trylock(folio))
>>                goto out;
>> 
>> +       /* offset could point to the middle of a large folio */
>> +       entry = folio->swap;
>> +       offset = swp_offset(entry);
>> +       nr_pages = folio_nr_pages(folio);
>> +       ret = -nr_pages;
>> +
>>        need_reclaim = ((flags & TTRS_ANYWAY) ||
>>                        ((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
>>                        ((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
>> --
>> 
> 
> Thanks for catching this!
> 
> This could lead to real problems: holding a reference is not enough to
> protect folio->swap. There are several BUG_ONs later that will be
> triggered if it changes.
> 
> But you still have to keep the `nr_pages` and `ret` assignments before
> `folio_trylock`, or `ret` will be uninitialized if folio_trylock fails.
> This function should always return the number of pages, even if the
> trylock fails. And as Willy said, `folio_nr_pages` doesn't require the
> folio lock.

Oh, I see. After looking at the code again, I realized that
if we can get the folio, we should return a valid nr_pages
even if folio_trylock fails.

I'll send v2 patch with that shortly.

Regards,
Jeongjun Park
kernel test robot Oct. 7, 2024, 5:06 a.m. UTC | #5
Hi Jeongjun,

kernel test robot noticed the following build warnings:

[auto build test WARNING on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Jeongjun-Park/mm-swap-prevent-possible-data-race-in-__try_to_reclaim_swap/20241004-222733
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20241004142504.4379-1-aha310510%40gmail.com
patch subject: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/config)
compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202410071223.t0yF8vP8-lkp@intel.com/

All warnings (new ones prefixed by >>):

>> mm/swapfile.c:203:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
     203 |         if (!folio_trylock(folio))
         |             ^~~~~~~~~~~~~~~~~~~~~
   mm/swapfile.c:254:9: note: uninitialized use occurs here
     254 |         return ret;
         |                ^~~
   mm/swapfile.c:203:2: note: remove the 'if' if its condition is always false
     203 |         if (!folio_trylock(folio))
         |         ^~~~~~~~~~~~~~~~~~~~~~~~~~
     204 |                 goto out;
         |                 ~~~~~~~~
   mm/swapfile.c:190:9: note: initialize the variable 'ret' to silence this warning
     190 |         int ret, nr_pages;
         |                ^
         |                 = 0
   1 warning generated.
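
The control-flow shape behind this warning can be shown in isolation. The following is a minimal sketch with hypothetical names (`try_acquire`, `reclaim_fixed`), not the kernel function: the flagged shape is kept in a comment so that the block compiles cleanly, and the compiled function shows the variant in which `ret` has a value on every path that reaches the return.

```c
#include <stdbool.h>

/* Hypothetical helper: stands in for a trylock that may fail. */
static bool try_acquire(bool available)
{
	return available;
}

/*
 * Shape that triggers -Wsometimes-uninitialized (shown as a comment only):
 *
 *	int ret;
 *
 *	if (!try_acquire(lock_available))
 *		goto out;		// ret not yet assigned on this path
 *	ret = nr_pages;
 * out:
 *	return ret;			// uninitialized use if the goto was taken
 */

/* Fixed shape: ret is assigned before the first goto. */
static int reclaim_fixed(bool lock_available, int nr_pages)
{
	int ret = -nr_pages;

	if (!try_acquire(lock_available))
		goto out;

	ret = nr_pages;
out:
	return ret;
}
```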


vim +203 mm/swapfile.c

bea67dcc5eea0f Barry Song              2024-08-08  177  
a62fb92ac12ed3 Ryan Roberts            2024-04-08  178  /*
a62fb92ac12ed3 Ryan Roberts            2024-04-08  179   * returns number of pages in the folio that backs the swap entry. If positive,
a62fb92ac12ed3 Ryan Roberts            2024-04-08  180   * the folio was reclaimed. If negative, the folio was not reclaimed. If 0, no
a62fb92ac12ed3 Ryan Roberts            2024-04-08  181   * folio was associated with the swap entry.
a62fb92ac12ed3 Ryan Roberts            2024-04-08  182   */
bcd49e86710b42 Huang Ying              2018-10-26  183  static int __try_to_reclaim_swap(struct swap_info_struct *si,
bcd49e86710b42 Huang Ying              2018-10-26  184  				 unsigned long offset, unsigned long flags)
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  185  {
efa90a981bbc89 Hugh Dickins            2009-12-14  186  	swp_entry_t entry = swp_entry(si->type, offset);
862590ac3708e1 Kairui Song             2024-07-30  187  	struct address_space *address_space = swap_address_space(entry);
862590ac3708e1 Kairui Song             2024-07-30  188  	struct swap_cluster_info *ci;
2c3f6194b008b2 Matthew Wilcox (Oracle  2022-09-02  189) 	struct folio *folio;
862590ac3708e1 Kairui Song             2024-07-30  190  	int ret, nr_pages;
862590ac3708e1 Kairui Song             2024-07-30  191  	bool need_reclaim;
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  192  
862590ac3708e1 Kairui Song             2024-07-30  193  	folio = filemap_get_folio(address_space, swap_cache_index(entry));
66dabbb65d673a Christoph Hellwig       2023-03-07  194  	if (IS_ERR(folio))
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  195  		return 0;
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  196  	/*
bcd49e86710b42 Huang Ying              2018-10-26  197  	 * When this function is called from scan_swap_map_slots() and it's
2c3f6194b008b2 Matthew Wilcox (Oracle  2022-09-02  198) 	 * called by vmscan.c at reclaiming folios. So we hold a folio lock
bcd49e86710b42 Huang Ying              2018-10-26  199  	 * here. We have to use trylock for avoiding deadlock. This is a special
2c3f6194b008b2 Matthew Wilcox (Oracle  2022-09-02  200) 	 * case and you should use folio_free_swap() with explicit folio_lock()
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  201  	 * in usual operations.
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  202  	 */
862590ac3708e1 Kairui Song             2024-07-30 @203  	if (!folio_trylock(folio))
862590ac3708e1 Kairui Song             2024-07-30  204  		goto out;
862590ac3708e1 Kairui Song             2024-07-30  205  
b2dbc30a2a909d Jeongjun Park           2024-10-04  206  	/* offset could point to the middle of a large folio */
b2dbc30a2a909d Jeongjun Park           2024-10-04  207  	entry = folio->swap;
b2dbc30a2a909d Jeongjun Park           2024-10-04  208  	offset = swp_offset(entry);
b2dbc30a2a909d Jeongjun Park           2024-10-04  209  	nr_pages = folio_nr_pages(folio);
b2dbc30a2a909d Jeongjun Park           2024-10-04  210  	ret = -nr_pages;
b2dbc30a2a909d Jeongjun Park           2024-10-04  211  
862590ac3708e1 Kairui Song             2024-07-30  212  	need_reclaim = ((flags & TTRS_ANYWAY) ||
2c3f6194b008b2 Matthew Wilcox (Oracle  2022-09-02  213) 			((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
862590ac3708e1 Kairui Song             2024-07-30  214  			((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));
862590ac3708e1 Kairui Song             2024-07-30  215  	if (!need_reclaim || !folio_swapcache_freeable(folio))
862590ac3708e1 Kairui Song             2024-07-30  216  		goto out_unlock;
862590ac3708e1 Kairui Song             2024-07-30  217  
862590ac3708e1 Kairui Song             2024-07-30  218  	/*
862590ac3708e1 Kairui Song             2024-07-30  219  	 * It's safe to delete the folio from swap cache only if the folio's
862590ac3708e1 Kairui Song             2024-07-30  220  	 * swap_map is HAS_CACHE only, which means the slots have no page table
862590ac3708e1 Kairui Song             2024-07-30  221  	 * reference or pending writeback, and can't be allocated to others.
862590ac3708e1 Kairui Song             2024-07-30  222  	 */
862590ac3708e1 Kairui Song             2024-07-30  223  	ci = lock_cluster_or_swap_info(si, offset);
862590ac3708e1 Kairui Song             2024-07-30  224  	need_reclaim = swap_is_has_cache(si, offset, nr_pages);
862590ac3708e1 Kairui Song             2024-07-30  225  	unlock_cluster_or_swap_info(si, ci);
862590ac3708e1 Kairui Song             2024-07-30  226  	if (!need_reclaim)
862590ac3708e1 Kairui Song             2024-07-30  227  		goto out_unlock;
862590ac3708e1 Kairui Song             2024-07-30  228  
862590ac3708e1 Kairui Song             2024-07-30  229  	if (!(flags & TTRS_DIRECT)) {
862590ac3708e1 Kairui Song             2024-07-30  230  		/* Free through slot cache */
862590ac3708e1 Kairui Song             2024-07-30  231  		delete_from_swap_cache(folio);
862590ac3708e1 Kairui Song             2024-07-30  232  		folio_set_dirty(folio);
862590ac3708e1 Kairui Song             2024-07-30  233  		ret = nr_pages;
862590ac3708e1 Kairui Song             2024-07-30  234  		goto out_unlock;
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  235  	}
862590ac3708e1 Kairui Song             2024-07-30  236  
862590ac3708e1 Kairui Song             2024-07-30  237  	xa_lock_irq(&address_space->i_pages);
862590ac3708e1 Kairui Song             2024-07-30  238  	__delete_from_swap_cache(folio, entry, NULL);
862590ac3708e1 Kairui Song             2024-07-30  239  	xa_unlock_irq(&address_space->i_pages);
862590ac3708e1 Kairui Song             2024-07-30  240  	folio_ref_sub(folio, nr_pages);
862590ac3708e1 Kairui Song             2024-07-30  241  	folio_set_dirty(folio);
862590ac3708e1 Kairui Song             2024-07-30  242  
862590ac3708e1 Kairui Song             2024-07-30  243  	spin_lock(&si->lock);
862590ac3708e1 Kairui Song             2024-07-30  244  	/* Only sinple page folio can be backed by zswap */
862590ac3708e1 Kairui Song             2024-07-30  245  	if (nr_pages == 1)
862590ac3708e1 Kairui Song             2024-07-30  246  		zswap_invalidate(entry);
862590ac3708e1 Kairui Song             2024-07-30  247  	swap_entry_range_free(si, entry, nr_pages);
862590ac3708e1 Kairui Song             2024-07-30  248  	spin_unlock(&si->lock);
862590ac3708e1 Kairui Song             2024-07-30  249  	ret = nr_pages;
862590ac3708e1 Kairui Song             2024-07-30  250  out_unlock:
862590ac3708e1 Kairui Song             2024-07-30  251  	folio_unlock(folio);
862590ac3708e1 Kairui Song             2024-07-30  252  out:
2c3f6194b008b2 Matthew Wilcox (Oracle  2022-09-02  253) 	folio_put(folio);
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  254  	return ret;
c9e444103b5e7a KAMEZAWA Hiroyuki       2009-06-16  255  }
355cfa73ddff2f KAMEZAWA Hiroyuki       2009-06-16  256
Andrew Morton Oct. 8, 2024, 1:30 a.m. UTC | #6
On Mon, 7 Oct 2024 13:06:49 +0800 kernel test robot <lkp@intel.com> wrote:

> Hi Jeongjun,
> 
> kernel test robot noticed the following build warnings:
> 
> [auto build test WARNING on akpm-mm/mm-everything]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Jeongjun-Park/mm-swap-prevent-possible-data-race-in-__try_to_reclaim_swap/20241004-222733
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20241004142504.4379-1-aha310510%40gmail.com
> patch subject: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
> config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/config)
> compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202410071223.t0yF8vP8-lkp@intel.com/
> 
> All warnings (new ones prefixed by >>):
> 
> >> mm/swapfile.c:203:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
>      203 |         if (!folio_trylock(folio))
>          |             ^~~~~~~~~~~~~~~~~~~~~
>    mm/swapfile.c:254:9: note: uninitialized use occurs here
>      254 |         return ret;

This warning can't be correct?
Jeongjun Park Oct. 8, 2024, 2:35 a.m. UTC | #7
Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 7 Oct 2024 13:06:49 +0800 kernel test robot <lkp@intel.com> wrote:
>
> > Hi Jeongjun,
> >
> > kernel test robot noticed the following build warnings:
> >
> > [auto build test WARNING on akpm-mm/mm-everything]
> >
> > url:    https://github.com/intel-lab-lkp/linux/commits/Jeongjun-Park/mm-swap-prevent-possible-data-race-in-__try_to_reclaim_swap/20241004-222733
> > base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> > patch link:    https://lore.kernel.org/r/20241004142504.4379-1-aha310510%40gmail.com
> > patch subject: [PATCH] mm: swap: prevent possible data-race in __try_to_reclaim_swap
> > config: x86_64-kexec (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/config)
> > compiler: clang version 18.1.8 (https://github.com/llvm/llvm-project 3b5b5c1ec4a3095ab096dd780e84d7ab81f3d7ff)
> > reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20241007/202410071223.t0yF8vP8-lkp@intel.com/reproduce)
> >
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > | Reported-by: kernel test robot <lkp@intel.com>
> > | Closes: https://lore.kernel.org/oe-kbuild-all/202410071223.t0yF8vP8-lkp@intel.com/
> >
> > All warnings (new ones prefixed by >>):
> >
> > >> mm/swapfile.c:203:6: warning: variable 'ret' is used uninitialized whenever 'if' condition is true [-Wsometimes-uninitialized]
> >      203 |         if (!folio_trylock(folio))
> >          |             ^~~~~~~~~~~~~~~~~~~~~
> >    mm/swapfile.c:254:9: note: uninitialized use occurs here
> >      254 |         return ret;
>
> This warning can't be correct?

I think it's correct. Even if folio_trylock fails, the return value
should be -nr_pages. Not initializing ret like in the v1 patch
goes against the design purpose of the function.

So I think it's right to apply the v2 patch that I sent you.

Regards,
Jeongjun Park
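
One way to see why the sign convention matters is from a caller's point of view. The sketch below is an assumption for illustration, not code from the kernel tree: `slots_to_skip()` is a hypothetical helper, and it ignores any alignment handling a real caller might need. It assumes the return contract quoted in the thread (positive: reclaimed, negative: not reclaimed, 0: no folio) and shows that the magnitude is usable either way, which only works if the function returns -nr_pages when the trylock fails.

```c
#include <stdlib.h>

/*
 * Hypothetical helper: given the result of one reclaim attempt, decide
 * how many swap slots a scanning caller could step over.
 */
static long slots_to_skip(long ret)
{
	if (ret == 0)
		return 1;	/* no folio for this entry: advance one slot */

	/* folio present: its size is |ret| whether or not it was reclaimed */
	return labs(ret);
}
```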

Patch

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0cded32414a1..904c21256fc2 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -193,13 +193,6 @@  static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	folio = filemap_get_folio(address_space, swap_cache_index(entry));
 	if (IS_ERR(folio))
 		return 0;
-
-	/* offset could point to the middle of a large folio */
-	entry = folio->swap;
-	offset = swp_offset(entry);
-	nr_pages = folio_nr_pages(folio);
-	ret = -nr_pages;
-
 	/*
 	 * When this function is called from scan_swap_map_slots() and it's
 	 * called by vmscan.c at reclaiming folios. So we hold a folio lock
@@ -210,6 +203,12 @@  static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	if (!folio_trylock(folio))
 		goto out;
 
+	/* offset could point to the middle of a large folio */
+	entry = folio->swap;
+	offset = swp_offset(entry);
+	nr_pages = folio_nr_pages(folio);
+	ret = -nr_pages;
+
 	need_reclaim = ((flags & TTRS_ANYWAY) ||
 			((flags & TTRS_UNMAPPED) && !folio_mapped(folio)) ||
 			((flags & TTRS_FULL) && mem_cgroup_swap_full(folio)));