Message ID | 20250222024617.2790609-1-mawupeng1@huawei.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: swap: Avoid infinite loop if no valid swap entry found during do_swap_page | expand |
On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote: > Digging into the source, we found that the swap entry is invalid due to > unknown reason, and this lead to invalid swap_info_struct. Excessive log > printing can fill up the prioritized log space, leading to the purging of > originally valid logs and hindering problem troubleshooting. To make this > more robust, kill this task. this seems like a very bad way to fix this problem
On 2025/2/22 11:45, Matthew Wilcox wrote: > On Sat, Feb 22, 2025 at 10:46:17AM +0800, Wupeng Ma wrote: >> Digging into the source, we found that the swap entry is invalid due to >> unknown reason, and this lead to invalid swap_info_struct. Excessive log >> printing can fill up the prioritized log space, leading to the purging of >> originally valid logs and hindering problem troubleshooting. To make this >> more robust, kill this task. > > this seems like a very bad way to fix this problem Sure, It's a bad way to fix this. Just a proper way to make it more robust? Since it will produce lots of invalid and same log? >
On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote: > > From: Ma Wupeng <mawupeng1@huawei.com> > > During our test, infinite loop is produced during #PF will lead to infinite > error log as follow: > > get_swap_device: Bad swap file entry 114000000 > > Digging into the source, we found that the swap entry is invalid due to > unknown reason, and this lead to invalid swap_info_struct. Excessive log Hi Wupeng, What is the kernel version you are using? If it's another bug causing this invalid swap entry, we should fix that bug instead, not workaround it. This looks kind of similar to another PATCH & Bug report, corrupted page table or swap entry: https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/ Might be the same kernel bug? Gaoxu mentioned the bug was observed on Kernel 6.6.30 (android version), and neither of these two workarounds will fix it completely, the invalid value could cause many other issues too. We definitely need to find out the root cause. > printing can fill up the prioritized log space, leading to the purging of > originally valid logs and hindering problem troubleshooting. To make this > more robust, kill this task. > > Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> > --- > include/linux/swap.h | 1 + > mm/memory.c | 9 ++++++++- > mm/swapfile.c | 2 +- > 3 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index b13b72645db3..0fa39cf66bc4 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -508,6 +508,7 @@ struct backing_dev_info; > extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); > extern void exit_swap_address_space(unsigned int type); > extern struct swap_info_struct *get_swap_device(swp_entry_t entry); > +struct swap_info_struct *_swap_info_get(swp_entry_t entry); > sector_t swap_folio_sector(struct folio *folio); > > static inline void put_swap_device(struct swap_info_struct *si) > diff --git a/mm/memory.c b/mm/memory.c > index b4d3d4893267..2d36e5a644d1 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > > /* Prevent swapoff from happening to us. */ > si = get_swap_device(entry); > - if (unlikely(!si)) > + if (unlikely(!si)) { > + if (unlikely(!_swap_info_get(entry))) > + /* > + * return VM_FAULT_SIGBUS for invalid swap entry to > + * avoid infinite #PF. > + */ > + ret = VM_FAULT_SIGBUS; This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff get_swap_device will return NULL.
On 2025/2/22 15:33, Kairui Song wrote: > On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote: >> >> From: Ma Wupeng <mawupeng1@huawei.com> >> >> During our test, infinite loop is produced during #PF will lead to infinite >> error log as follow: >> >> get_swap_device: Bad swap file entry 114000000 >> >> Digging into the source, we found that the swap entry is invalid due to >> unknown reason, and this lead to invalid swap_info_struct. Excessive log > > Hi Wupeng, > > What is the kernel version you are using? If it's another bug causing > this invalid swap entry, we should fix that bug instead, not > workaround it. > > This looks kind of similar to another PATCH & Bug report, corrupted > page table or swap entry: > https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/ > > Might be the same kernel bug? Gaoxu mentioned the bug was observed on > Kernel 6.6.30 (android version), and neither of these two workarounds > will fix it completely, the invalid value could cause many other > issues too. We definitely need to find out the root cause. We are having this problem in linux-v5.10, since the log is lost and swap is not enabled in this machines, maybe memory corrupted in the pt. > >> printing can fill up the prioritized log space, leading to the purging of >> originally valid logs and hindering problem troubleshooting. To make this >> more robust, kill this task. >> >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> >> --- >> include/linux/swap.h | 1 + >> mm/memory.c | 9 ++++++++- >> mm/swapfile.c | 2 +- >> 3 files changed, 10 insertions(+), 2 deletions(-) >> >> diff --git a/include/linux/swap.h b/include/linux/swap.h >> index b13b72645db3..0fa39cf66bc4 100644 >> --- a/include/linux/swap.h >> +++ b/include/linux/swap.h >> @@ -508,6 +508,7 @@ struct backing_dev_info; >> extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); >> extern void exit_swap_address_space(unsigned int type); >> extern struct swap_info_struct *get_swap_device(swp_entry_t entry); >> +struct swap_info_struct *_swap_info_get(swp_entry_t entry); >> sector_t swap_folio_sector(struct folio *folio); >> >> static inline void put_swap_device(struct swap_info_struct *si) >> diff --git a/mm/memory.c b/mm/memory.c >> index b4d3d4893267..2d36e5a644d1 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) >> >> /* Prevent swapoff from happening to us. */ >> si = get_swap_device(entry); >> - if (unlikely(!si)) >> + if (unlikely(!si)) { >> + if (unlikely(!_swap_info_get(entry))) >> + /* >> + * return VM_FAULT_SIGBUS for invalid swap entry to >> + * avoid infinite #PF. >> + */ >> + ret = VM_FAULT_SIGBUS; > > This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff > get_swap_device will return NULL. If swap is off, All swap pages should be swap in as expected, so such entry can not trigger do_swap_page?
On Sat, Feb 22, 2025 at 3:41 PM mawupeng <mawupeng1@huawei.com> wrote: > On 2025/2/22 15:33, Kairui Song wrote: > > On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote: > >> > >> From: Ma Wupeng <mawupeng1@huawei.com> > >> > >> During our test, infinite loop is produced during #PF will lead to infinite > >> error log as follow: > >> > >> get_swap_device: Bad swap file entry 114000000 > >> > >> Digging into the source, we found that the swap entry is invalid due to > >> unknown reason, and this lead to invalid swap_info_struct. Excessive log > > > > Hi Wupeng, > > > > What is the kernel version you are using? If it's another bug causing > > this invalid swap entry, we should fix that bug instead, not > > workaround it. > > > > This looks kind of similar to another PATCH & Bug report, corrupted > > page table or swap entry: > > https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/ > > > > Might be the same kernel bug? Gaoxu mentioned the bug was observed on > > Kernel 6.6.30 (android version), and neither of these two workarounds > > will fix it completely, the invalid value could cause many other > > issues too. We definitely need to find out the root cause. > > We are having this problem in linux-v5.10, since the log is lost and swap > is not enabled in this machines, maybe memory corrupted in the pt. Thanks for the info, that's very strange. Since you didn't even enable SWAP, it must be something else corrupted the page table I think > > > >> printing can fill up the prioritized log space, leading to the purging of > >> originally valid logs and hindering problem troubleshooting. To make this > >> more robust, kill this task. > >> > >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> > >> --- > >> include/linux/swap.h | 1 + > >> mm/memory.c | 9 ++++++++- > >> mm/swapfile.c | 2 +- > >> 3 files changed, 10 insertions(+), 2 deletions(-) > >> > >> diff --git a/include/linux/swap.h b/include/linux/swap.h > >> index b13b72645db3..0fa39cf66bc4 100644 > >> --- a/include/linux/swap.h > >> +++ b/include/linux/swap.h > >> @@ -508,6 +508,7 @@ struct backing_dev_info; > >> extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); > >> extern void exit_swap_address_space(unsigned int type); > >> extern struct swap_info_struct *get_swap_device(swp_entry_t entry); > >> +struct swap_info_struct *_swap_info_get(swp_entry_t entry); > >> sector_t swap_folio_sector(struct folio *folio); > >> > >> static inline void put_swap_device(struct swap_info_struct *si) > >> diff --git a/mm/memory.c b/mm/memory.c > >> index b4d3d4893267..2d36e5a644d1 100644 > >> --- a/mm/memory.c > >> +++ b/mm/memory.c > >> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > >> > >> /* Prevent swapoff from happening to us. */ > >> si = get_swap_device(entry); > >> - if (unlikely(!si)) > >> + if (unlikely(!si)) { > >> + if (unlikely(!_swap_info_get(entry))) > >> + /* > >> + * return VM_FAULT_SIGBUS for invalid swap entry to > >> + * avoid infinite #PF. > >> + */ > >> + ret = VM_FAULT_SIGBUS; > > > > This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff > > get_swap_device will return NULL. > > If swap is off, All swap pages should be swap in as expected, so > such entry can not trigger do_swap_page? do_swap_page may get blocked due to some random reason, and then a concurrent swapoff could swap in the entry and disable the device. Very unlikely to trigger but in theory possible.
On Sat, Feb 22, 2025 at 9:03 PM Kairui Song <ryncsn@gmail.com> wrote: > > On Sat, Feb 22, 2025 at 3:41 PM mawupeng <mawupeng1@huawei.com> wrote: > > On 2025/2/22 15:33, Kairui Song wrote: > > > On Sat, Feb 22, 2025 at 10:56 AM Wupeng Ma <mawupeng1@huawei.com> wrote: > > >> > > >> From: Ma Wupeng <mawupeng1@huawei.com> > > >> > > >> During our test, infinite loop is produced during #PF will lead to infinite > > >> error log as follow: > > >> > > >> get_swap_device: Bad swap file entry 114000000 > > >> > > >> Digging into the source, we found that the swap entry is invalid due to > > >> unknown reason, and this lead to invalid swap_info_struct. Excessive log > > > > > > Hi Wupeng, > > > > > > What is the kernel version you are using? If it's another bug causing > > > this invalid swap entry, we should fix that bug instead, not > > > workaround it. > > > > > > This looks kind of similar to another PATCH & Bug report, corrupted > > > page table or swap entry: > > > https://lore.kernel.org/linux-mm/e223b0e6ba2f4924984b1917cc717bd5@honor.com/ > > > > > > Might be the same kernel bug? Gaoxu mentioned the bug was observed on > > > Kernel 6.6.30 (android version), and neither of these two workarounds > > > will fix it completely, the invalid value could cause many other > > > issues too. We definitely need to find out the root cause. > > > > We are having this problem in linux-v5.10, since the log is lost and swap > > is not enabled in this machines, maybe memory corrupted in the pt. > > Thanks for the info, that's very strange. Since you didn't even enable > SWAP, it must be something else corrupted the page table I think > > > > > > >> printing can fill up the prioritized log space, leading to the purging of > > >> originally valid logs and hindering problem troubleshooting. To make this > > >> more robust, kill this task. > > >> > > >> Signed-off-by: Ma Wupeng <mawupeng1@huawei.com> > > >> --- > > >> include/linux/swap.h | 1 + > > >> mm/memory.c | 9 ++++++++- > > >> mm/swapfile.c | 2 +- > > >> 3 files changed, 10 insertions(+), 2 deletions(-) > > >> > > >> diff --git a/include/linux/swap.h b/include/linux/swap.h > > >> index b13b72645db3..0fa39cf66bc4 100644 > > >> --- a/include/linux/swap.h > > >> +++ b/include/linux/swap.h > > >> @@ -508,6 +508,7 @@ struct backing_dev_info; > > >> extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); > > >> extern void exit_swap_address_space(unsigned int type); > > >> extern struct swap_info_struct *get_swap_device(swp_entry_t entry); > > >> +struct swap_info_struct *_swap_info_get(swp_entry_t entry); > > >> sector_t swap_folio_sector(struct folio *folio); > > >> > > >> static inline void put_swap_device(struct swap_info_struct *si) > > >> diff --git a/mm/memory.c b/mm/memory.c > > >> index b4d3d4893267..2d36e5a644d1 100644 > > >> --- a/mm/memory.c > > >> +++ b/mm/memory.c > > >> @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > > >> > > >> /* Prevent swapoff from happening to us. */ > > >> si = get_swap_device(entry); > > >> - if (unlikely(!si)) > > >> + if (unlikely(!si)) { > > >> + if (unlikely(!_swap_info_get(entry))) > > >> + /* > > >> + * return VM_FAULT_SIGBUS for invalid swap entry to > > >> + * avoid infinite #PF. > > >> + */ > > >> + ret = VM_FAULT_SIGBUS; > > > > > > This could lead to VM_FAULT_SIGBUS on swapoff. After swapoff > > > get_swap_device will return NULL. > > > > If swap is off, All swap pages should be swap in as expected, so > > such entry can not trigger do_swap_page? > > do_swap_page may get blocked due to some random reason, and then a > concurrent swapoff could swap in the entry and disable the device. > Very unlikely to trigger but in theory possible. The "goto out" in do_swap_page() should have handled this case. If swapoff occurred before the actual swap-in began, we should have aborted the swap-in, and userspace would retry. /* Prevent swapoff from happening to us. */ si = get_swap_device(entry); if (unlikely(!si)) goto out; Thanks Barry
diff --git a/include/linux/swap.h b/include/linux/swap.h index b13b72645db3..0fa39cf66bc4 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -508,6 +508,7 @@ struct backing_dev_info; extern int init_swap_address_space(unsigned int type, unsigned long nr_pages); extern void exit_swap_address_space(unsigned int type); extern struct swap_info_struct *get_swap_device(swp_entry_t entry); +struct swap_info_struct *_swap_info_get(swp_entry_t entry); sector_t swap_folio_sector(struct folio *folio); static inline void put_swap_device(struct swap_info_struct *si) diff --git a/mm/memory.c b/mm/memory.c index b4d3d4893267..2d36e5a644d1 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4365,8 +4365,15 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) /* Prevent swapoff from happening to us. */ si = get_swap_device(entry); - if (unlikely(!si)) + if (unlikely(!si)) { + if (unlikely(!_swap_info_get(entry))) + /* + * return VM_FAULT_SIGBUS for invalid swap entry to + * avoid infinite #PF. + */ + ret = VM_FAULT_SIGBUS; goto out; + } folio = swap_cache_get_folio(entry, vma, vmf->address); if (folio) diff --git a/mm/swapfile.c b/mm/swapfile.c index ba19430dd4ea..8f580eff0ecb 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1287,7 +1287,7 @@ int get_swap_pages(int n_goal, swp_entry_t swp_entries[], int entry_order) return n_ret; } -static struct swap_info_struct *_swap_info_get(swp_entry_t entry) +struct swap_info_struct *_swap_info_get(swp_entry_t entry) { struct swap_info_struct *si; unsigned long offset;