diff mbox series

mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page

Message ID 20250222024617.2790609-1-mawupeng1@huawei.com (mailing list archive)
State New
Headers show
Series mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page | expand

Commit Message

mawupeng Feb. 22, 2025, 2:46 a.m. UTC
From: Ma Wupeng <mawupeng1@huawei.com>

During our test, infinite loop is produced during #PF will lead to infinite
error log as follow:

   get_swap_device: Bad swap file entry 114000000

Digging into the source, we found that the swap entry is invalid due to
unknown reason, and this lead to invalid swap_info_struct. Excessive log
printing can fill up the prioritized log space, leading to the purging of
originally valid logs and hindering problem troubleshooting. To make this
more robust, kill this task.

Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
---
 include/linux/swap.h | 1 +
 mm/memory.c          | 9 ++++++++-
 mm/swapfile.c        | 2 +-
 3 files changed, 10 insertions(+), 2 deletions(-)

Comments

Matthew Wilcox Feb. 22, 2025, 3:45 a.m. UTC | #1
On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
> Digging into the source, we found that the swap entry is invalid due to
> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> printing can fill up the prioritized log space, leading to the purging of
> originally valid logs and hindering problem troubleshooting. To make this
> more robust, kill this task.

this seems like a very bad way to fix this problem
mawupeng Feb. 22, 2025, 3:59 a.m. UTC | #2
On 2025/2/22 11:45, Matthew Wilcox wrote:
> On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote:
>> Digging into the source, we found that the swap entry is invalid due to
>> unknown reason, and this lead to invalid swap_info_struct. Excessive log
>> printing can fill up the prioritized log space, leading to the purging of
>> originally valid logs and hindering problem troubleshooting. To make this
>> more robust, kill this task.
> 
> this seems like a very bad way to fix this problem

Sure, It's a bad way to fix this. Just a proper way to make it more robust?
Since it will produce lots of invalid and same log?

>
Kairui Song Feb. 22, 2025, 7:33 a.m. UTC | #3
On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote:
>
> From: Ma Wupeng <mawupeng1@huawei.com>
>
> During our test, infinite loop is produced during #PF will lead to infinite
> error log as follow:
>
>    get_swap_device: Bad swap file entry 114000000
>
> Digging into the source, we found that the swap entry is invalid due to
> unknown reason, and this lead to invalid swap_info_struct. Excessive log

Hi Wupeng,

What is the kernel version you are using? If it's another bug causing
this invalid swap entry, we should fix that bug instead, not
workaround it.

This looks kind of similar to another PATCH & Bug report, corrupted
page table or swap entry:
https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/

Might be the same kernel bug? Gaoxu mentioned the bug was observed on
Kernel 6.6.30 (android version), and neither of these two workarounds
will fix it completely, the invalid value could cause many other
issues too. We definitely need to find out the root cause.

> printing can fill up the prioritized log space, leading to the purging of
> originally valid logs and hindering problem troubleshooting. To make this
> more robust, kill this task.
>
> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> ---
>  include/linux/swap.h | 1 +
>  mm/memory.c          | 9 ++++++++-
>  mm/swapfile.c        | 2 +-
>  3 files changed, 10 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index b13b72645db3..0fa39cf66bc4 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -508,6 +508,7 @@ struct backing_dev_info;
>  extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
>  extern void exit_swap_address_space(unsigned int type);
>  extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
> +struct swap_info_struct *_swap_info_get(swp_entry_t entry);
>  sector_t swap_folio_sector(struct folio *folio);
>
>  static inline void put_swap_device(struct swap_info_struct *si)
> diff --git a/mm/memory.c b/mm/memory.c
> index b4d3d4893267..2d36e5a644d1 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>
>         /* Prevent swapoff from happening to us. */
>         si = get_swap_device(entry);
> -       if (unlikely(!si))
> +       if (unlikely(!si)) {
> +               if (unlikely(!_swap_info_get(entry)))
> +                       /*
> +                        * return VM_FAULT_SIGBUS for invalid swap entry to
> +                        * avoid infinite #PF.
> +                        */
> +                       ret = VM_FAULT_SIGBUS;

This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff
get_swap_device will return NULL.
mawupeng Feb. 22, 2025, 7:41 a.m. UTC | #4
On 2025/2/22 15:33, Kairui Song wrote:
> On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote:
>>
>> From: Ma Wupeng <mawupeng1@huawei.com>
>>
>> During our test, infinite loop is produced during #PF will lead to infinite
>> error log as follow:
>>
>>    get_swap_device: Bad swap file entry 114000000
>>
>> Digging into the source, we found that the swap entry is invalid due to
>> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> 
> Hi Wupeng,
> 
> What is the kernel version you are using? If it's another bug causing
> this invalid swap entry, we should fix that bug instead, not
> workaround it.
> 
> This looks kind of similar to another PATCH & Bug report, corrupted
> page table or swap entry:
> https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
> 
> Might be the same kernel bug? Gaoxu mentioned the bug was observed on
> Kernel 6.6.30 (android version), and neither of these two workarounds
> will fix it completely, the invalid value could cause many other
> issues too. We definitely need to find out the root cause.

We are having this problem in linux-v5.10, since the log is lost and swap
is not enabled in this machines, maybe memory corrupted in the pt.

> 
>> printing can fill up the prioritized log space, leading to the purging of
>> originally valid logs and hindering problem troubleshooting. To make this
>> more robust, kill this task.
>>
>> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
>> ---
>>  include/linux/swap.h | 1 +
>>  mm/memory.c          | 9 ++++++++-
>>  mm/swapfile.c        | 2 +-
>>  3 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/swap.h b/include/linux/swap.h
>> index b13b72645db3..0fa39cf66bc4 100644
>> --- a/include/linux/swap.h
>> +++ b/include/linux/swap.h
>> @@ -508,6 +508,7 @@ struct backing_dev_info;
>>  extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
>>  extern void exit_swap_address_space(unsigned int type);
>>  extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
>> +struct swap_info_struct *_swap_info_get(swp_entry_t entry);
>>  sector_t swap_folio_sector(struct folio *folio);
>>
>>  static inline void put_swap_device(struct swap_info_struct *si)
>> diff --git a/mm/memory.c b/mm/memory.c
>> index b4d3d4893267..2d36e5a644d1 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
>>
>>         /* Prevent swapoff from happening to us. */
>>         si = get_swap_device(entry);
>> -       if (unlikely(!si))
>> +       if (unlikely(!si)) {
>> +               if (unlikely(!_swap_info_get(entry)))
>> +                       /*
>> +                        * return VM_FAULT_SIGBUS for invalid swap entry to
>> +                        * avoid infinite #PF.
>> +                        */
>> +                       ret = VM_FAULT_SIGBUS;
> 
> This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff
> get_swap_device will return NULL.

If swap is off, All swap pages should be swap in as expected, so
such entry can not trigger do_swap_page?
Kairui Song Feb. 22, 2025, 8:02 a.m. UTC | #5
On Sat, Feb 22, 2025 at 3:41 PM mawupeng <mawupeng1@huawei.com> wrote:
> On 2025/2/22 15:33, Kairui Song wrote:
> > On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote:
> >>
> >> From: Ma Wupeng <mawupeng1@huawei.com>
> >>
> >> During our test, infinite loop is produced during #PF will lead to infinite
> >> error log as follow:
> >>
> >>    get_swap_device: Bad swap file entry 114000000
> >>
> >> Digging into the source, we found that the swap entry is invalid due to
> >> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> >
> > Hi Wupeng,
> >
> > What is the kernel version you are using? If it's another bug causing
> > this invalid swap entry, we should fix that bug instead, not
> > workaround it.
> >
> > This looks kind of similar to another PATCH & Bug report, corrupted
> > page table or swap entry:
> > https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
> >
> > Might be the same kernel bug? Gaoxu mentioned the bug was observed on
> > Kernel 6.6.30 (android version), and neither of these two workarounds
> > will fix it completely, the invalid value could cause many other
> > issues too. We definitely need to find out the root cause.
>
> We are having this problem in linux-v5.10, since the log is lost and swap
> is not enabled in this machines, maybe memory corrupted in the pt.

Thanks for the info, that's very strange. Since you didn't even enable
SWAP, it must be something else corrupted the page table I think

> >
> >> printing can fill up the prioritized log space, leading to the purging of
> >> originally valid logs and hindering problem troubleshooting. To make this
> >> more robust, kill this task.
> >>
> >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> >> ---
> >>  include/linux/swap.h | 1 +
> >>  mm/memory.c          | 9 ++++++++-
> >>  mm/swapfile.c        | 2 +-
> >>  3 files changed, 10 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> >> index b13b72645db3..0fa39cf66bc4 100644
> >> --- a/include/linux/swap.h
> >> +++ b/include/linux/swap.h
> >> @@ -508,6 +508,7 @@ struct backing_dev_info;
> >>  extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
> >>  extern void exit_swap_address_space(unsigned int type);
> >>  extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
> >> +struct swap_info_struct *_swap_info_get(swp_entry_t entry);
> >>  sector_t swap_folio_sector(struct folio *folio);
> >>
> >>  static inline void put_swap_device(struct swap_info_struct *si)
> >> diff --git a/mm/memory.c b/mm/memory.c
> >> index b4d3d4893267..2d36e5a644d1 100644
> >> --- a/mm/memory.c
> >> +++ b/mm/memory.c
> >> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >>
> >>         /* Prevent swapoff from happening to us. */
> >>         si = get_swap_device(entry);
> >> -       if (unlikely(!si))
> >> +       if (unlikely(!si)) {
> >> +               if (unlikely(!_swap_info_get(entry)))
> >> +                       /*
> >> +                        * return VM_FAULT_SIGBUS for invalid swap entry to
> >> +                        * avoid infinite #PF.
> >> +                        */
> >> +                       ret = VM_FAULT_SIGBUS;
> >
> > This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff
> > get_swap_device will return NULL.
>
> If swap is off, All swap pages should be swap in as expected, so
> such entry can not trigger do_swap_page?

do_swap_page may get blocked due to some random reason, and then a
concurrent swapoff could swap in the entry and disable the device.
Very unlikely to trigger but in theory possible.
Barry Song Feb. 22, 2025, 9:58 a.m. UTC | #6
On Sat, Feb 22, 2025 at 9:03 PM Kairui Song <ryncsn@gmail.com> wrote:
>
> On Sat, Feb 22, 2025 at 3:41 PM mawupeng <mawupeng1@huawei.com> wrote:
> > On 2025/2/22 15:33, Kairui Song wrote:
> > > On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote:
> > >>
> > >> From: Ma Wupeng <mawupeng1@huawei.com>
> > >>
> > >> During our test, infinite loop is produced during #PF will lead to infinite
> > >> error log as follow:
> > >>
> > >>    get_swap_device: Bad swap file entry 114000000
> > >>
> > >> Digging into the source, we found that the swap entry is invalid due to
> > >> unknown reason, and this lead to invalid swap_info_struct. Excessive log
> > >
> > > Hi Wupeng,
> > >
> > > What is the kernel version you are using? If it's another bug causing
> > > this invalid swap entry, we should fix that bug instead, not
> > > workaround it.
> > >
> > > This looks kind of similar to another PATCH & Bug report, corrupted
> > > page table or swap entry:
> > > https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/
> > >
> > > Might be the same kernel bug? Gaoxu mentioned the bug was observed on
> > > Kernel 6.6.30 (android version), and neither of these two workarounds
> > > will fix it completely, the invalid value could cause many other
> > > issues too. We definitely need to find out the root cause.
> >
> > We are having this problem in linux-v5.10, since the log is lost and swap
> > is not enabled in this machines, maybe memory corrupted in the pt.
>
> Thanks for the info, that's very strange. Since you didn't even enable
> SWAP, it must be something else corrupted the page table I think
>
> > >
> > >> printing can fill up the prioritized log space, leading to the purging of
> > >> originally valid logs and hindering problem troubleshooting. To make this
> > >> more robust, kill this task.
> > >>
> > >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com>
> > >> ---
> > >>  include/linux/swap.h | 1 +
> > >>  mm/memory.c          | 9 ++++++++-
> > >>  mm/swapfile.c        | 2 +-
> > >>  3 files changed, 10 insertions(+), 2 deletions(-)
> > >>
> > >> diff --git a/include/linux/swap.h b/include/linux/swap.h
> > >> index b13b72645db3..0fa39cf66bc4 100644
> > >> --- a/include/linux/swap.h
> > >> +++ b/include/linux/swap.h
> > >> @@ -508,6 +508,7 @@ struct backing_dev_info;
> > >>  extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
> > >>  extern void exit_swap_address_space(unsigned int type);
> > >>  extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
> > >> +struct swap_info_struct *_swap_info_get(swp_entry_t entry);
> > >>  sector_t swap_folio_sector(struct folio *folio);
> > >>
> > >>  static inline void put_swap_device(struct swap_info_struct *si)
> > >> diff --git a/mm/memory.c b/mm/memory.c
> > >> index b4d3d4893267..2d36e5a644d1 100644
> > >> --- a/mm/memory.c
> > >> +++ b/mm/memory.c
> > >> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> > >>
> > >>         /* Prevent swapoff from happening to us. */
> > >>         si = get_swap_device(entry);
> > >> -       if (unlikely(!si))
> > >> +       if (unlikely(!si)) {
> > >> +               if (unlikely(!_swap_info_get(entry)))
> > >> +                       /*
> > >> +                        * return VM_FAULT_SIGBUS for invalid swap entry to
> > >> +                        * avoid infinite #PF.
> > >> +                        */
> > >> +                       ret = VM_FAULT_SIGBUS;
> > >
> > > This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff
> > > get_swap_device will return NULL.
> >
> > If swap is off, All swap pages should be swap in as expected, so
> > such entry can not trigger do_swap_page?
>
> do_swap_page may get blocked due to some random reason, and then a
> concurrent swapoff could swap in the entry and disable the device.
> Very unlikely to trigger but in theory possible.

The "goto out" in do_swap_page() should have handled this case. If swapoff
occurred before the actual swap-in began, we should have aborted the
swap-in, and userspace would retry.

        /* Prevent swapoff from happening to us. */
        si = get_swap_device(entry);
        if (unlikely(!si))
                goto out;

Thanks
Barry
diff mbox series

Patch

diff --git a/include/linux/swap.h b/include/linux/swap.h
index b13b72645db3..0fa39cf66bc4 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -508,6 +508,7 @@  struct backing_dev_info;
 extern int init_swap_address_space(unsigned int type, unsigned long nr_pages);
 extern void exit_swap_address_space(unsigned int type);
 extern struct swap_info_struct *get_swap_device(swp_entry_t entry);
+struct swap_info_struct *_swap_info_get(swp_entry_t entry);
 sector_t swap_folio_sector(struct folio *folio);
 
 static inline void put_swap_device(struct swap_info_struct *si)
diff --git a/mm/memory.c b/mm/memory.c
index b4d3d4893267..2d36e5a644d1 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4365,8 +4365,15 @@  vm_fault_t do_swap_page(struct vm_fault *vmf)
 
 	/* Prevent swapoff from happening to us. */
 	si = get_swap_device(entry);
-	if (unlikely(!si))
+	if (unlikely(!si)) {
+		if (unlikely(!_swap_info_get(entry)))
+			/*
+			 * return VM_FAULT_SIGBUS for invalid swap entry to
+			 * avoid infinite #PF.
+			 */
+			ret = VM_FAULT_SIGBUS;
 		goto out;
+	}
 
 	folio = swap_cache_get_folio(entry, vma, vmf->address);
 	if (folio)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index ba19430dd4ea..8f580eff0ecb 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1287,7 +1287,7 @@  int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order)
 	return n_ret;
 }
 
-static struct swap_info_struct *_swap_info_get(swp_entry_t entry)
+struct swap_info_struct *_swap_info_get(swp_entry_t entry)
 {
 	struct swap_info_struct *si;
 	unsigned long offset;