diff mbox series

mm: Fix possible NULL pointer dereference in __swap_duplicate

Message ID 44655569e3a1419f800952004f07e714@honor.com (mailing list archive)
State New
Headers show
Series mm: Fix possible NULL pointer dereference in __swap_duplicate | expand

Commit Message

gaoxu Feb. 12, 2025, 3:13 a.m. UTC
swp_swap_info() may return null; it is necessary to check the return value
to avoid NULL pointer dereference. The code for other calls to
swp_swap_info() includes checks, and __swap_duplicate() should also
include checks.

The reason why swp_swap_info() returns NULL is unclear; it may be due to
CPU cache issues or DDR bit flips. The probability of this issue is very
small, and the stack info we encountered is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000058
[RB/E]rb_sreason_str_set: sreason_str set null_pointer
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
[0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Skip md ftrace buffer dump for: 0x1609e0
...
pc : swap_duplicate+0x44/0x164
lr : copy_page_range+0x508/0x1e78
sp : ffffffc0f2a699e0
x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
Call trace:
 swap_duplicate+0x44/0x164
 copy_page_range+0x508/0x1e78
 copy_process+0x1278/0x21cc
 kernel_clone+0x90/0x438
 __arm64_sys_clone+0x5c/0x8c
 invoke_syscall+0x58/0x110
 do_el0_svc+0x8c/0xe0
 el0_svc+0x38/0x9c
 el0t_64_sync_handler+0x44/0xec
 el0t_64_sync+0x1a8/0x1ac
Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs

The patch seems to only provide a workaround, but there are no more
effective software solutions to handle the bit flips problem. This path
will change the issue from a system crash to a process exception, thereby
reducing the impact on the entire machine.

Signed-off-by: gao xu <gaoxu2@honor.com>
---
 mm/swapfile.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Andrew Morton Feb. 13, 2025, 12:18 a.m. UTC | #1
On Wed, 12 Feb 2025 03:13:46 +0000 gaoxu <gaoxu2@honor.com> wrote:

> swp_swap_info() may return null; it is necessary to check the return value
> to avoid NULL pointer dereference. The code for other calls to
> swp_swap_info() includes checks, and __swap_duplicate() should also
> include checks.

Actually very few of the swp_swap_info() callers check for a NULL return.

> The reason why swp_swap_info() returns NULL is unclear; it may be due to
> CPU cache issues or DDR bit flips.

Quite possibly it's a kernel bug.

> The probability of this issue is very
> small, and the stack info we encountered is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000058
>
> ...
>
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>  	int err, i;
>  
>  	si = swp_swap_info(entry);
> +	if (unlikely(!si))
> +		return -EINVAL;
>  
>  	offset = swp_offset(entry);
>  	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);

OK, I guess avoiding the crash is good.  But please let's include a
WARN so that we can perhaps fix the bug, if one is there.
Nhat Pham Feb. 13, 2025, 1:41 a.m. UTC | #2
On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@honor.com> wrote:
>
> swp_swap_info() may return null; it is necessary to check the return value
> to avoid NULL pointer dereference. The code for other calls to
> swp_swap_info() includes checks, and __swap_duplicate() should also
> include checks.
>
> The reason why swp_swap_info() returns NULL is unclear; it may be due to
> CPU cache issues or DDR bit flips. The probability of this issue is very
> small, and the stack info we encountered is as follows:
> Unable to handle kernel NULL pointer dereference at virtual address
> 0000000000000058
> [RB/E]rb_sreason_str_set: sreason_str set null_pointer
> Mem abort info:
>   ESR = 0x0000000096000005
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
>   FSC = 0x05: level 1 translation fault
> Data abort info:
>   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
>   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
>   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
> user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
> [0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
> pud=0000000000000000
> Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
> Skip md ftrace buffer dump for: 0x1609e0
> ...
> pc : swap_duplicate+0x44/0x164
> lr : copy_page_range+0x508/0x1e78
> sp : ffffffc0f2a699e0
> x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> Call trace:
>  swap_duplicate+0x44/0x164
>  copy_page_range+0x508/0x1e78
>  copy_process+0x1278/0x21cc
>  kernel_clone+0x90/0x438
>  __arm64_sys_clone+0x5c/0x8c
>  invoke_syscall+0x58/0x110
>  do_el0_svc+0x8c/0xe0
>  el0_svc+0x38/0x9c
>  el0t_64_sync_handler+0x44/0xec
>  el0t_64_sync+0x1a8/0x1ac
> Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception
> SMP: stopping secondary CPUs
>
> The patch seems to only provide a workaround, but there are no more
> effective software solutions to handle the bit flips problem. This path
> will change the issue from a system crash to a process exception, thereby
> reducing the impact on the entire machine.
>
> Signed-off-by: gao xu <gaoxu2@honor.com>

Yeah this smells like a bug. A bit strange though - I have eyeballed
the code, and we (should have?) locked the PTE before resolving it
into the swap entry format. Which should have been enough to prevent
the swap entry from being unmapped and freed up. Which should have
been enough to prevent swapoff...?

(are you even doing concurrent swapoff?)

Can you provide more context? What kernel version is this, what kind
of workload is this, any reproducer, etc.?
gaoxu Feb. 13, 2025, 8:51 a.m. UTC | #3
> 
> On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@honor.com> wrote:
> >
> > swp_swap_info() may return null; it is necessary to check the return
> > value to avoid NULL pointer dereference. The code for other calls to
> > swp_swap_info() includes checks, and __swap_duplicate() should also
> > include checks.
> >
> > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > to CPU cache issues or DDR bit flips. The probability of this issue is
> > very small, and the stack info we encountered is as follows:
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 0000000000000058
> > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> >   ESR = 0x0000000096000005
> >   EC = 0x25: DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> >   FSC = 0x05: level 1 translation fault Data abort info:
> >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages,
> > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > pgd=0000000000000000, p4d=0000000000000000,
> > pud=0000000000000000
> > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace
> > buffer dump for: 0x1609e0 ...
> > pc : swap_duplicate+0x44/0x164
> > lr : copy_page_range+0x508/0x1e78
> > sp : ffffffc0f2a699e0
> > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call
> > trace:
> >  swap_duplicate+0x44/0x164
> >  copy_page_range+0x508/0x1e78
> >  copy_process+0x1278/0x21cc
> >  kernel_clone+0x90/0x438
> >  __arm64_sys_clone+0x5c/0x8c
> >  invoke_syscall+0x58/0x110
> >  do_el0_svc+0x8c/0xe0
> >  el0_svc+0x38/0x9c
> >  el0t_64_sync_handler+0x44/0xec
> >  el0t_64_sync+0x1a8/0x1ac
> > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace
> > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > exception
> > SMP: stopping secondary CPUs
> >
> > The patch seems to only provide a workaround, but there are no more
> > effective software solutions to handle the bit flips problem. This
> > path will change the issue from a system crash to a process exception,
> > thereby reducing the impact on the entire machine.
> >
> > Signed-off-by: gao xu <gaoxu2@honor.com>
> 
> Yeah this smells like a bug. A bit strange though - I have eyeballed the code, and
> we (should have?) locked the PTE before resolving it into the swap entry format.
> Which should have been enough to prevent the swap entry from being
> unmapped and freed up. Which should have been enough to prevent swapoff...?
> 
> (are you even doing concurrent swapoff?)
No, the swapoff operation was not executed.
> 
> Can you provide more context? What kernel version is this, what kind of
> workload is this, any reproducer, etc.?
kernel version is linux 6.6,  Android15 - linux6.6.30.

The issues encountered by mobile users during usage.
The system load should not be high, as there is no info related to low
memory found in the logs.
The probability of this issue occurring is very low and irregular.
We cannot reproduce the problem during stress testing in the laboratory.

I found someone reporting a similar issue on the web, see:
https://lkml.indiana.edu/hypermail/linux/kernel/2406.0/02380.html
https://forum.proxmox.com/threads/get_swap_device-bad-swap-file-entry.155581/
https://forums.unraid.net/topic/145497-server-crashes-with-repeated-get_swap_device-bad-swap-file-entry-3ffffffffffff/
gaoxu Feb. 13, 2025, 1:08 p.m. UTC | #4
> 
> On Wed, 12 Feb 2025 03:13:46 +0000 gaoxu <gaoxu2@honor.com> wrote:
> 
> > swp_swap_info() may return null; it is necessary to check the return
> > value to avoid NULL pointer dereference. The code for other calls to
> > swp_swap_info() includes checks, and __swap_duplicate() should also
> > include checks.
> 
> Actually very few of the swp_swap_info() callers check for a NULL return.
The swapfile.c file contains three instances where the return value of
swp_swap_info() is checked for a NULL return. In other files that call
swp_swap_info(), I have confirmed that there are no such checks.
The description in the patch is inaccurate, and I have made modifications
in patch v2.
> 
> > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > to CPU cache issues or DDR bit flips.
> 
> Quite possibly it's a kernel bug.
> 
> > The probability of this issue is very
> > small, and the stack info we encountered is as follows:
> > Unable to handle kernel NULL pointer dereference at virtual address
> > 0000000000000058
> >
> > ...
> >
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry,
> unsigned char usage, int nr)
> >  	int err, i;
> >
> >  	si = swp_swap_info(entry);
> > +	if (unlikely(!si))
> > +		return -EINVAL;
> >
> >  	offset = swp_offset(entry);
> >  	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> 
> OK, I guess avoiding the crash is good.  But please let's include a WARN so that
> we can perhaps fix the bug, if one is there.
Good. I'll change it as mentioned and send a new patch.
	si = swp_swap_info(entry);
+	if (unlikely(!si)) {
+		WARN(1, KERN_ERR "%s: %s%08lx\n", __func__, Bad_file, entry.val);
+		return -EINVAL;
+	}
Barry Song Feb. 13, 2025, 11:07 p.m. UTC | #5
On Thu, Feb 13, 2025 at 9:52 PM gaoxu <gaoxu2@honor.com> wrote:
>
> >
> > On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@honor.com> wrote:
> > >
> > > swp_swap_info() may return null; it is necessary to check the return
> > > value to avoid NULL pointer dereference. The code for other calls to
> > > swp_swap_info() includes checks, and __swap_duplicate() should also
> > > include checks.
> > >
> > > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > > to CPU cache issues or DDR bit flips. The probability of this issue is
> > > very small, and the stack info we encountered is as follows:
> > > Unable to handle kernel NULL pointer dereference at virtual address
> > > 0000000000000058
> > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> > >   ESR = 0x0000000096000005
> > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > >   SET = 0, FnV = 0
> > >   EA = 0, S1PTW = 0
> > >   FSC = 0x05: level 1 translation fault Data abort info:
> > >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages,
> > > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > > pgd=0000000000000000, p4d=0000000000000000,
> > > pud=0000000000000000
> > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace
> > > buffer dump for: 0x1609e0 ...
> > > pc : swap_duplicate+0x44/0x164
> > > lr : copy_page_range+0x508/0x1e78
> > > sp : ffffffc0f2a699e0
> > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
> > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
> > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call
> > > trace:
> > >  swap_duplicate+0x44/0x164
> > >  copy_page_range+0x508/0x1e78
> > >  copy_process+0x1278/0x21cc
> > >  kernel_clone+0x90/0x438
> > >  __arm64_sys_clone+0x5c/0x8c
> > >  invoke_syscall+0x58/0x110
> > >  do_el0_svc+0x8c/0xe0
> > >  el0_svc+0x38/0x9c
> > >  el0t_64_sync_handler+0x44/0xec
> > >  el0t_64_sync+0x1a8/0x1ac
> > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace
> > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > > exception
> > > SMP: stopping secondary CPUs
> > >
> > > The patch seems to only provide a workaround, but there are no more
> > > effective software solutions to handle the bit flips problem. This
> > > path will change the issue from a system crash to a process exception,
> > > thereby reducing the impact on the entire machine.
> > >
> > > Signed-off-by: gao xu <gaoxu2@honor.com>
> >
> > Yeah this smells like a bug. A bit strange though - I have eyeballed the code, and
> > we (should have?) locked the PTE before resolving it into the swap entry format.
> > Which should have been enough to prevent the swap entry from being
> > unmapped and freed up. Which should have been enough to prevent swapoff...?
> >
> > (are you even doing concurrent swapoff?)
> No, the swapoff operation was not executed.
> >
> > Can you provide more context? What kernel version is this, what kind of
> > workload is this, any reproducer, etc.?
> kernel version is linux 6.6,  Android15 - linux6.6.30.
>
> The issues encountered by mobile users during usage.
> The system load should not be high, as there is no info related to low
> memory found in the logs.
> The probability of this issue occurring is very low and irregular.
> We cannot reproduce the problem during stress testing in the laboratory.
>
> I found someone reporting a similar issue on the web, see:
> https://lkml.indiana.edu/hypermail/linux/kernel/2406.0/02380.html
> https://forum.proxmox.com/threads/get_swap_device-bad-swap-file-entry.155581/
> https://forums.unraid.net/topic/145497-server-crashes-with-repeated-get_swap_device-bad-swap-file-entry-3ffffffffffff/

It might be a non-swap entry mistakenly passed to swap functions. I remember
fixing a similar issue in the Android Common Kernel 6.6:

https://android.googlesource.com/kernel/common/+/119351fe20bc73b71c6

where a migration entry is mistakenly passed to swap APIs.

In any case, we need to identify and fix the actual bug.

>
>
>

Thanks
Barry
Yosry Ahmed Feb. 13, 2025, 11:35 p.m. UTC | #6
On Thu, Feb 13, 2025 at 01:08:54PM +0000, gaoxu wrote:
> > 
> > On Wed, 12 Feb 2025 03:13:46 +0000 gaoxu <gaoxu2@honor.com> wrote:
> > 
> > > swp_swap_info() may return null; it is necessary to check the return
> > > value to avoid NULL pointer dereference. The code for other calls to
> > > swp_swap_info() includes checks, and __swap_duplicate() should also
> > > include checks.
> > 
> > Actually very few of the swp_swap_info() callers check for a NULL return.
> The swapfile.c file contains three instances where the return value of
> swp_swap_info() is checked for a NULL return. In other files that call
> swp_swap_info(), I have confirmed that there are no such checks.
> The description in the patch is inaccurate, and I have made modifications
> in patch v2.
> > 
> > > The reason why swp_swap_info() returns NULL is unclear; it may be due
> > > to CPU cache issues or DDR bit flips.
> > 
> > Quite possibly it's a kernel bug.
> > 
> > > The probability of this issue is very
> > > small, and the stack info we encountered is as follows:
> > > Unable to handle kernel NULL pointer dereference at virtual address
> > > 0000000000000058
> > >
> > > ...
> > >
> > > --- a/mm/swapfile.c
> > > +++ b/mm/swapfile.c
> > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry,
> > unsigned char usage, int nr)
> > >  	int err, i;
> > >
> > >  	si = swp_swap_info(entry);
> > > +	if (unlikely(!si))
> > > +		return -EINVAL;
> > >
> > >  	offset = swp_offset(entry);
> > >  	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > 
> > OK, I guess avoiding the crash is good.  But please let's include a WARN so that
> > we can perhaps fix the bug, if one is there.
> Good. I'll change it as mentioned and send a new patch.
> 	si = swp_swap_info(entry);
> +	if (unlikely(!si)) {
> +		WARN(1, KERN_ERR "%s: %s%08lx\n", __func__, Bad_file, entry.val);

WARN() already contains unlikely(). Also, no need to print the function
name it's already in the stack trace.

We should probably just do if (WARN_ON_ONCE(!si)).

> +		return -EINVAL;
> +	}
> 
>
gaoxu Feb. 14, 2025, 10 a.m. UTC | #7
> >
> > >
> > > On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@honor.com> wrote:
> > > >
> > > > swp_swap_info() may return null; it is necessary to check the
> > > > return value to avoid NULL pointer dereference. The code for other
> > > > calls to
> > > > swp_swap_info() includes checks, and __swap_duplicate() should
> > > > also include checks.
> > > >
> > > > The reason why swp_swap_info() returns NULL is unclear; it may be
> > > > due to CPU cache issues or DDR bit flips. The probability of this
> > > > issue is very small, and the stack info we encountered is as
> > > > follows:
> > > > Unable to handle kernel NULL pointer dereference at virtual
> > > > address
> > > > 0000000000000058
> > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info:
> > > >   ESR = 0x0000000096000005
> > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > >   SET = 0, FnV = 0
> > > >   EA = 0, S1PTW = 0
> > > >   FSC = 0x05: level 1 translation fault Data abort info:
> > > >   ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
> > > >   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
> > > >   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k
> > > > pages, 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058]
> > > > pgd=0000000000000000, p4d=0000000000000000,
> > > > pud=0000000000000000
> > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md
> > > > ftrace buffer dump for: 0x1609e0 ...
> > > > pc : swap_duplicate+0x44/0x164
> > > > lr : copy_page_range+0x508/0x1e78
> > > > sp : ffffffc0f2a699e0
> > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
> > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
> > > > x23: 00000000002d2d2f x22: 0000000000000008 x21:
> 0000000000000000
> > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
> > > > x17: 0000000000000000 x16: 0010000000000001 x15:
> 0040000000000001
> > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
> > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
> > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
> > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
> > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
> > > > Call
> > > > trace:
> > > >  swap_duplicate+0x44/0x164
> > > >  copy_page_range+0x508/0x1e78
> > > >  copy_process+0x1278/0x21cc
> > > >  kernel_clone+0x90/0x438
> > > >  __arm64_sys_clone+0x5c/0x8c
> > > >  invoke_syscall+0x58/0x110
> > > >  do_el0_svc+0x8c/0xe0
> > > >  el0_svc+0x38/0x9c
> > > >  el0t_64_sync_handler+0x44/0xec
> > > >  el0t_64_sync+0x1a8/0x1ac
> > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end
> > > > trace
> > > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal
> > > > exception
> > > > SMP: stopping secondary CPUs
> > > >
> > > > The patch seems to only provide a workaround, but there are no
> > > > more effective software solutions to handle the bit flips problem.
> > > > This path will change the issue from a system crash to a process
> > > > exception, thereby reducing the impact on the entire machine.
> > > >
> > > > Signed-off-by: gao xu <gaoxu2@honor.com>
> > >
> > > Yeah this smells like a bug. A bit strange though - I have eyeballed
> > > the code, and we (should have?) locked the PTE before resolving it into the
> swap entry format.
> > > Which should have been enough to prevent the swap entry from being
> > > unmapped and freed up. Which should have been enough to prevent
> swapoff...?
> > >
> > > (are you even doing concurrent swapoff?)
> > No, the swapoff operation was not executed.
> > >
> > > Can you provide more context? What kernel version is this, what kind
> > > of workload is this, any reproducer, etc.?
> > kernel version is linux 6.6,  Android15 - linux6.6.30.
> >
> > The issues encountered by mobile users during usage.
> > The system load should not be high, as there is no info related to low
> > memory found in the logs.
> > The probability of this issue occurring is very low and irregular.
> > We cannot reproduce the problem during stress testing in the laboratory.
> >
> > I found someone reporting a similar issue on the web, see:
> > https://lkml.indiana.edu/hypermail/linux/kernel/2406.0/02380.html
> > https://forum.proxmox.com/threads/get_swap_device-bad-swap-file-entry.
> > 155581/
> > https://forums.unraid.net/topic/145497-server-crashes-with-repeated-ge
> > t_swap_device-bad-swap-file-entry-3ffffffffffff/
> 
> It might be a non-swap entry mistakenly passed to swap functions. I remember
> fixing a similar issue in the Android Common Kernel 6.6:
> 
> https://android.googlesource.com/kernel/common/+/119351fe20bc73b71c6
> 
> where a migration entry is mistakenly passed to swap APIs.
> 
> In any case, we need to identify and fix the actual bug.
As I mentioned in my previous email, the root cause of this issue is
unclear, and we do not know why this entry is behaving abnormally.
The probability of this issue occurring is very low, and it cannot be
reproduced locally. We are now looking to transform a null pointer crash
issue into an individual process issue, aiming to reduce the impact of
the problem on actual users. I believe everyone understands that this
approach is also valuable.

Of course, we can also refer to the link you provided to make the corresponding modifications,
This modification can also help avoid the null pointer issue. such as:
-   if (swap_duplicate(entry) < 0) {
+  if (!non_swap_entry(entry) && swap_duplicate(entry) < 0) {
	   set_pte_at(mm, address, pvmw.pte, pteval);
	   ret = false;
	   page_vma_mapped_walk_done(&pvmw);
	   break;
   }

> 
> >
> >
> >
> 
> Thanks
> Barry
gaoxu Feb. 14, 2025, 10:04 a.m. UTC | #8
> > >
> > > > swp_swap_info() may return null; it is necessary to check the
> > > > return value to avoid NULL pointer dereference. The code for other
> > > > calls to
> > > > swp_swap_info() includes checks, and __swap_duplicate() should
> > > > also include checks.
> > >
> > > Actually very few of the swp_swap_info() callers check for a NULL return.
> > The swapfile.c file contains three instances where the return value of
> > swp_swap_info() is checked for a NULL return. In other files that call
> > swp_swap_info(), I have confirmed that there are no such checks.
> > The description in the patch is inaccurate, and I have made
> > modifications in patch v2.
> > >
> > > > The reason why swp_swap_info() returns NULL is unclear; it may be
> > > > due to CPU cache issues or DDR bit flips.
> > >
> > > Quite possibly it's a kernel bug.
> > >
> > > > The probability of this issue is very small, and the stack info we
> > > > encountered is as follows:
> > > > Unable to handle kernel NULL pointer dereference at virtual
> > > > address
> > > > 0000000000000058
> > > >
> > > > ...
> > > >
> > > > --- a/mm/swapfile.c
> > > > +++ b/mm/swapfile.c
> > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t
> > > > entry,
> > > unsigned char usage, int nr)
> > > >  	int err, i;
> > > >
> > > >  	si = swp_swap_info(entry);
> > > > +	if (unlikely(!si))
> > > > +		return -EINVAL;
> > > >
> > > >  	offset = swp_offset(entry);
> > > >  	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset %
> SWAPFILE_CLUSTER);
> > >
> > > OK, I guess avoiding the crash is good.  But please let's include a
> > > WARN so that we can perhaps fix the bug, if one is there.
> > Good. I'll change it as mentioned and send a new patch.
> > 	si = swp_swap_info(entry);
> > +	if (unlikely(!si)) {
> > +		WARN(1, KERN_ERR "%s: %s%08lx\n", __func__, Bad_file, entry.val);
> 
> WARN() already contains unlikely(). Also, no need to print the function name it's
> already in the stack trace.
> 
> We should probably just do if (WARN_ON_ONCE(!si)).
> 
> > +		return -EINVAL;
> > +	}
Yes, thank you for your suggestion. This modification makes the code simpler.
	si = swp_swap_info(entry);
+	if (WARN_ON_ONCE(!si))
+		return -EINVAL;

> >
> >
diff mbox series

Patch

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7448a3876..0b253b8ca 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3521,6 +3521,8 @@  static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 	int err, i;
 
 	si = swp_swap_info(entry);
+	if (unlikely(!si))
+		return -EINVAL;
 
 	offset = swp_offset(entry);
 	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);