Message ID | 44655569e3a1419f800952004f07e714@honor.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: Fix possible NULL pointer dereference in __swap_duplicate | expand |
On Wed, 12 Feb 2025 03:13:46 +0000 gaoxu <gaoxu2@honor.com> wrote: > swp_swap_info() may return null; it is necessary to check the return value > to avoid NULL pointer dereference. The code for other calls to > swp_swap_info() includes checks, and __swap_duplicate() should also > include checks. Actually very few of the swp_swap_info() callers check for a NULL return. > The reason why swp_swap_info() returns NULL is unclear; it may be due to > CPU cache issues or DDR bit flips. Quite possibly it's a kernel bug. > The probability of this issue is very > small, and the stack info we encountered is as follows: > Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000058 > > ... > > --- a/mm/swapfile.c > +++ b/mm/swapfile.c > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr) > int err, i; > > si = swp_swap_info(entry); > + if (unlikely(!si)) > + return -EINVAL; > > offset = swp_offset(entry); > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER); OK, I guess avoiding the crash is good. But please let's include a WARN so that we can perhaps fix the bug, if one is there.
On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@honor.com> wrote: > > swp_swap_info() may return null; it is necessary to check the return value > to avoid NULL pointer dereference. The code for other calls to > swp_swap_info() includes checks, and __swap_duplicate() should also > include checks. > > The reason why swp_swap_info() returns NULL is unclear; it may be due to > CPU cache issues or DDR bit flips. The probability of this issue is very > small, and the stack info we encountered is as follows: > Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000058 > [RB/E]rb_sreason_str_set: sreason_str set null_pointer > Mem abort info: > ESR = 0x0000000096000005 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > FSC = 0x05: level 1 translation fault > Data abort info: > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000 > [0000000000000058] pgd=0000000000000000, p4d=0000000000000000, > pud=0000000000000000 > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP > Skip md ftrace buffer dump for: 0x1609e0 > ... > pc : swap_duplicate+0x44/0x164 > lr : copy_page_range+0x508/0x1e78 > sp : ffffffc0f2a699e0 > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000 > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001 > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f > Call trace: > swap_duplicate+0x44/0x164 > copy_page_range+0x508/0x1e78 > copy_process+0x1278/0x21cc > kernel_clone+0x90/0x438 > __arm64_sys_clone+0x5c/0x8c > invoke_syscall+0x58/0x110 > do_el0_svc+0x8c/0xe0 > el0_svc+0x38/0x9c > el0t_64_sync_handler+0x44/0xec > el0t_64_sync+0x1a8/0x1ac > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) > ---[ end trace 0000000000000000 ]--- > Kernel panic - not syncing: Oops: Fatal exception > SMP: stopping secondary CPUs > > The patch seems to only provide a workaround, but there are no more > effective software solutions to handle the bit flips problem. This path > will change the issue from a system crash to a process exception, thereby > reducing the impact on the entire machine. > > Signed-off-by: gao xu <gaoxu2@honor.com> Yeah this smells like a bug. A bit strange though - I have eyeballed the code, and we (should have?) locked the PTE before resolving it into the swap entry format. Which should have been enough to prevent the swap entry from being unmapped and freed up. Which should have been enough to prevent swapoff...? (are you even doing concurrent swapoff?) Can you provide more context? What kernel version is this, what kind of workload is this, any reproducer, etc.?
> > On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@honor.com> wrote: > > > > swp_swap_info() may return null; it is necessary to check the return > > value to avoid NULL pointer dereference. The code for other calls to > > swp_swap_info() includes checks, and __swap_duplicate() should also > > include checks. > > > > The reason why swp_swap_info() returns NULL is unclear; it may be due > > to CPU cache issues or DDR bit flips. The probability of this issue is > > very small, and the stack info we encountered is as follows: > > Unable to handle kernel NULL pointer dereference at virtual address > > 0000000000000058 > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info: > > ESR = 0x0000000096000005 > > EC = 0x25: DABT (current EL), IL = 32 bits > > SET = 0, FnV = 0 > > EA = 0, S1PTW = 0 > > FSC = 0x05: level 1 translation fault Data abort info: > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, > > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058] > > pgd=0000000000000000, p4d=0000000000000000, > > pud=0000000000000000 > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace > > buffer dump for: 0x1609e0 ... > > pc : swap_duplicate+0x44/0x164 > > lr : copy_page_range+0x508/0x1e78 > > sp : ffffffc0f2a699e0 > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000 > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001 > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call > > trace: > > swap_duplicate+0x44/0x164 > > copy_page_range+0x508/0x1e78 > > copy_process+0x1278/0x21cc > > kernel_clone+0x90/0x438 > > __arm64_sys_clone+0x5c/0x8c > > invoke_syscall+0x58/0x110 > > do_el0_svc+0x8c/0xe0 > > el0_svc+0x38/0x9c > > el0t_64_sync_handler+0x44/0xec > > el0t_64_sync+0x1a8/0x1ac > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal > > exception > > SMP: stopping secondary CPUs > > > > The patch seems to only provide a workaround, but there are no more > > effective software solutions to handle the bit flips problem. This > > path will change the issue from a system crash to a process exception, > > thereby reducing the impact on the entire machine. > > > > Signed-off-by: gao xu <gaoxu2@honor.com> > > Yeah this smells like a bug. A bit strange though - I have eyeballed the code, and > we (should have?) locked the PTE before resolving it into the swap entry format. > Which should have been enough to prevent the swap entry from being > unmapped and freed up. Which should have been enough to prevent swapoff...? > > (are you even doing concurrent swapoff?) No, the swapoff operation was not executed. > > Can you provide more context? What kernel version is this, what kind of > workload is this, any reproducer, etc.? kernel version is linux 6.6, Android15 - linux6.6.30. The issues encountered by mobile users during usage. The system load should not be high, as there is no info related to low memory found in the logs. The probability of this issue occurring is very low and irregular. We cannot reproduce the problem during stress testing in the laboratory. I found someone reporting a similar issue on the web, see: https://lkml.indiana.edu/hypermail/linux/kernel/2406.0/02380.html https://forum.proxmox.com/threads/get_swap_device-bad-swap-file-entry.155581/ https://forums.unraid.net/topic/145497-server-crashes-with-repeated-get_swap_device-bad-swap-file-entry-3ffffffffffff/
> > On Wed, 12 Feb 2025 03:13:46 +0000 gaoxu <gaoxu2@honor.com> wrote: > > > swp_swap_info() may return null; it is necessary to check the return > > value to avoid NULL pointer dereference. The code for other calls to > > swp_swap_info() includes checks, and __swap_duplicate() should also > > include checks. > > Actually very few of the swp_swap_info() callers check for a NULL return. The swapfile.c file contains three instances where the return value of swp_swap_info() is checked for a NULL return. In other files that call swp_swap_info(), I have confirmed that there are no such checks. The description in the patch is inaccurate, and I have made modifications in patch v2. > > > The reason why swp_swap_info() returns NULL is unclear; it may be due > > to CPU cache issues or DDR bit flips. > > Quite possibly it's a kernel bug. > > > The probability of this issue is very > > small, and the stack info we encountered is as follows: > > Unable to handle kernel NULL pointer dereference at virtual address > > 0000000000000058 > > > > ... > > > > --- a/mm/swapfile.c > > +++ b/mm/swapfile.c > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, > unsigned char usage, int nr) > > int err, i; > > > > si = swp_swap_info(entry); > > + if (unlikely(!si)) > > + return -EINVAL; > > > > offset = swp_offset(entry); > > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER); > > OK, I guess avoiding the crash is good. But please let's include a WARN so that > we can perhaps fix the bug, if one is there. Good. I'll change it as mentioned and send a new patch. si = swp_swap_info(entry); + if (unlikely(!si)) { + WARN(1, KERN_ERR "%s: %s%08lx\n", __func__, Bad_file, entry.val); + return -EINVAL; + }
On Thu, Feb 13, 2025 at 9:52 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > > swp_swap_info() may return null; it is necessary to check the return > > > value to avoid NULL pointer dereference. The code for other calls to > > > swp_swap_info() includes checks, and __swap_duplicate() should also > > > include checks. > > > > > > The reason why swp_swap_info() returns NULL is unclear; it may be due > > > to CPU cache issues or DDR bit flips. The probability of this issue is > > > very small, and the stack info we encountered is as follows: > > > Unable to handle kernel NULL pointer dereference at virtual address > > > 0000000000000058 > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info: > > > ESR = 0x0000000096000005 > > > EC = 0x25: DABT (current EL), IL = 32 bits > > > SET = 0, FnV = 0 > > > EA = 0, S1PTW = 0 > > > FSC = 0x05: level 1 translation fault Data abort info: > > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, > > > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058] > > > pgd=0000000000000000, p4d=0000000000000000, > > > pud=0000000000000000 > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace > > > buffer dump for: 0x1609e0 ... > > > pc : swap_duplicate+0x44/0x164 > > > lr : copy_page_range+0x508/0x1e78 > > > sp : ffffffc0f2a699e0 > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000 > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001 > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call > > > trace: > > > swap_duplicate+0x44/0x164 > > > copy_page_range+0x508/0x1e78 > > > copy_process+0x1278/0x21cc > > > kernel_clone+0x90/0x438 > > > __arm64_sys_clone+0x5c/0x8c > > > invoke_syscall+0x58/0x110 > > > do_el0_svc+0x8c/0xe0 > > > el0_svc+0x38/0x9c > > > el0t_64_sync_handler+0x44/0xec > > > el0t_64_sync+0x1a8/0x1ac > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace > > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal > > > exception > > > SMP: stopping secondary CPUs > > > > > > The patch seems to only provide a workaround, but there are no more > > > effective software solutions to handle the bit flips problem. This > > > path will change the issue from a system crash to a process exception, > > > thereby reducing the impact on the entire machine. > > > > > > Signed-off-by: gao xu <gaoxu2@honor.com> > > > > Yeah this smells like a bug. A bit strange though - I have eyeballed the code, and > > we (should have?) locked the PTE before resolving it into the swap entry format. > > Which should have been enough to prevent the swap entry from being > > unmapped and freed up. Which should have been enough to prevent swapoff...? > > > > (are you even doing concurrent swapoff?) > No, the swapoff operation was not executed. > > > > Can you provide more context? What kernel version is this, what kind of > > workload is this, any reproducer, etc.? > kernel version is linux 6.6, Android15 - linux6.6.30. > > The issues encountered by mobile users during usage. > The system load should not be high, as there is no info related to low > memory found in the logs. > The probability of this issue occurring is very low and irregular. > We cannot reproduce the problem during stress testing in the laboratory. > > I found someone reporting a similar issue on the web, see: > https://lkml.indiana.edu/hypermail/linux/kernel/2406.0/02380.html > https://forum.proxmox.com/threads/get_swap_device-bad-swap-file-entry.155581/ > https://forums.unraid.net/topic/145497-server-crashes-with-repeated-get_swap_device-bad-swap-file-entry-3ffffffffffff/ It might be a non-swap entry mistakenly passed to swap functions. I remember fixing a similar issue in the Android Common Kernel 6.6: https://android.googlesource.com/kernel/common/+/119351fe20bc73b71c6 where a migration entry is mistakenly passed to swap APIs. In any case, we need to identify and fix the actual bug. > > > Thanks Barry
On Thu, Feb 13, 2025 at 01:08:54PM +0000, gaoxu wrote: > > > > On Wed, 12 Feb 2025 03:13:46 +0000 gaoxu <gaoxu2@honor.com> wrote: > > > > > swp_swap_info() may return null; it is necessary to check the return > > > value to avoid NULL pointer dereference. The code for other calls to > > > swp_swap_info() includes checks, and __swap_duplicate() should also > > > include checks. > > > > Actually very few of the swp_swap_info() callers check for a NULL return. > The swapfile.c file contains three instances where the return value of > swp_swap_info() is checked for a NULL return. In other files that call > swp_swap_info(), I have confirmed that there are no such checks. > The description in the patch is inaccurate, and I have made modifications > in patch v2. > > > > > The reason why swp_swap_info() returns NULL is unclear; it may be due > > > to CPU cache issues or DDR bit flips. > > > > Quite possibly it's a kernel bug. > > > > > The probability of this issue is very > > > small, and the stack info we encountered is as follows: > > > Unable to handle kernel NULL pointer dereference at virtual address > > > 0000000000000058 > > > > > > ... > > > > > > --- a/mm/swapfile.c > > > +++ b/mm/swapfile.c > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, > > unsigned char usage, int nr) > > > int err, i; > > > > > > si = swp_swap_info(entry); > > > + if (unlikely(!si)) > > > + return -EINVAL; > > > > > > offset = swp_offset(entry); > > > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER); > > > > OK, I guess avoiding the crash is good. But please let's include a WARN so that > > we can perhaps fix the bug, if one is there. > Good. I'll change it as mentioned and send a new patch. > si = swp_swap_info(entry); > + if (unlikely(!si)) { > + WARN(1, KERN_ERR "%s: %s%08lx\n", __func__, Bad_file, entry.val); WARN() already contains unlikely(). Also, no need to print the function name it's already in the stack trace. We should probably just do if (WARN_ON_ONCE(!si)). > + return -EINVAL; > + } > >
> > > > > > > > On Tue, Feb 11, 2025 at 7:14 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > > > > swp_swap_info() may return null; it is necessary to check the > > > > return value to avoid NULL pointer dereference. The code for other > > > > calls to > > > > swp_swap_info() includes checks, and __swap_duplicate() should > > > > also include checks. > > > > > > > > The reason why swp_swap_info() returns NULL is unclear; it may be > > > > due to CPU cache issues or DDR bit flips. The probability of this > > > > issue is very small, and the stack info we encountered is as > > > > follows: > > > > Unable to handle kernel NULL pointer dereference at virtual > > > > address > > > > 0000000000000058 > > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info: > > > > ESR = 0x0000000096000005 > > > > EC = 0x25: DABT (current EL), IL = 32 bits > > > > SET = 0, FnV = 0 > > > > EA = 0, S1PTW = 0 > > > > FSC = 0x05: level 1 translation fault Data abort info: > > > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > > > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k > > > > pages, 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058] > > > > pgd=0000000000000000, p4d=0000000000000000, > > > > pud=0000000000000000 > > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md > > > > ftrace buffer dump for: 0x1609e0 ... > > > > pc : swap_duplicate+0x44/0x164 > > > > lr : copy_page_range+0x508/0x1e78 > > > > sp : ffffffc0f2a699e0 > > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > > > > x23: 00000000002d2d2f x22: 0000000000000008 x21: > 0000000000000000 > > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > > > > x17: 0000000000000000 x16: 0010000000000001 x15: > 0040000000000001 > > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f > > > > Call > > > > trace: > > > > swap_duplicate+0x44/0x164 > > > > copy_page_range+0x508/0x1e78 > > > > copy_process+0x1278/0x21cc > > > > kernel_clone+0x90/0x438 > > > > __arm64_sys_clone+0x5c/0x8c > > > > invoke_syscall+0x58/0x110 > > > > do_el0_svc+0x8c/0xe0 > > > > el0_svc+0x38/0x9c > > > > el0t_64_sync_handler+0x44/0xec > > > > el0t_64_sync+0x1a8/0x1ac > > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end > > > > trace > > > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal > > > > exception > > > > SMP: stopping secondary CPUs > > > > > > > > The patch seems to only provide a workaround, but there are no > > > > more effective software solutions to handle the bit flips problem. > > > > This path will change the issue from a system crash to a process > > > > exception, thereby reducing the impact on the entire machine. > > > > > > > > Signed-off-by: gao xu <gaoxu2@honor.com> > > > > > > Yeah this smells like a bug. A bit strange though - I have eyeballed > > > the code, and we (should have?) locked the PTE before resolving it into the > swap entry format. > > > Which should have been enough to prevent the swap entry from being > > > unmapped and freed up. Which should have been enough to prevent > swapoff...? > > > > > > (are you even doing concurrent swapoff?) > > No, the swapoff operation was not executed. > > > > > > Can you provide more context? What kernel version is this, what kind > > > of workload is this, any reproducer, etc.? > > kernel version is linux 6.6, Android15 - linux6.6.30. > > > > The issues encountered by mobile users during usage. > > The system load should not be high, as there is no info related to low > > memory found in the logs. > > The probability of this issue occurring is very low and irregular. > > We cannot reproduce the problem during stress testing in the laboratory. > > > > I found someone reporting a similar issue on the web, see: > > https://lkml.indiana.edu/hypermail/linux/kernel/2406.0/02380.html > > https://forum.proxmox.com/threads/get_swap_device-bad-swap-file-entry. > > 155581/ > > https://forums.unraid.net/topic/145497-server-crashes-with-repeated-ge > > t_swap_device-bad-swap-file-entry-3ffffffffffff/ > > It might be a non-swap entry mistakenly passed to swap functions. I remember > fixing a similar issue in the Android Common Kernel 6.6: > > https://android.googlesource.com/kernel/common/+/119351fe20bc73b71c6 > > where a migration entry is mistakenly passed to swap APIs. > > In any case, we need to identify and fix the actual bug. As I mentioned in my previous email, the root cause of this issue is unclear, and we do not know why this entry is behaving abnormally. The probability of this issue occurring is very low, and it cannot be reproduced locally. We are now looking to transform a null pointer crash issue into an individual process issue, aiming to reduce the impact of the problem on actual users. I believe everyone understands that this approach is also valuable. Of course, we can also refer to the link you provided to make the corresponding modifications, This modification can also help avoid the null pointer issue. such as: - if (swap_duplicate(entry) < 0) { + if (!non_swap_entry(entry) && swap_duplicate(entry) < 0) { set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; } > > > > > > > > > Thanks > Barry
> > > > > > > swp_swap_info() may return null; it is necessary to check the > > > > return value to avoid NULL pointer dereference. The code for other > > > > calls to > > > > swp_swap_info() includes checks, and __swap_duplicate() should > > > > also include checks. > > > > > > Actually very few of the swp_swap_info() callers check for a NULL return. > > The swapfile.c file contains three instances where the return value of > > swp_swap_info() is checked for a NULL return. In other files that call > > swp_swap_info(), I have confirmed that there are no such checks. > > The description in the patch is inaccurate, and I have made > > modifications in patch v2. > > > > > > > The reason why swp_swap_info() returns NULL is unclear; it may be > > > > due to CPU cache issues or DDR bit flips. > > > > > > Quite possibly it's a kernel bug. > > > > > > > The probability of this issue is very small, and the stack info we > > > > encountered is as follows: > > > > Unable to handle kernel NULL pointer dereference at virtual > > > > address > > > > 0000000000000058 > > > > > > > > ... > > > > > > > > --- a/mm/swapfile.c > > > > +++ b/mm/swapfile.c > > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t > > > > entry, > > > unsigned char usage, int nr) > > > > int err, i; > > > > > > > > si = swp_swap_info(entry); > > > > + if (unlikely(!si)) > > > > + return -EINVAL; > > > > > > > > offset = swp_offset(entry); > > > > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % > SWAPFILE_CLUSTER); > > > > > > OK, I guess avoiding the crash is good. But please let's include a > > > WARN so that we can perhaps fix the bug, if one is there. > > Good. I'll change it as mentioned and send a new patch. > > si = swp_swap_info(entry); > > + if (unlikely(!si)) { > > + WARN(1, KERN_ERR "%s: %s%08lx\n", __func__, Bad_file, entry.val); > > WARN() already contains unlikely(). Also, no need to print the function name it's > already in the stack trace. > > We should probably just do if (WARN_ON_ONCE(!si)). > > > + return -EINVAL; > > + } Yes, thank you for your suggestion. This modification makes the code simpler. si = swp_swap_info(entry); + if (WARN_ON_ONCE(!si)) + return -EINVAL; > > > >
diff --git a/mm/swapfile.c b/mm/swapfile.c index 7448a3876..0b253b8ca 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr) int err, i; si = swp_swap_info(entry); + if (unlikely(!si)) + return -EINVAL; offset = swp_offset(entry); VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
swp_swap_info() may return null; it is necessary to check the return value to avoid NULL pointer dereference. The code for other calls to swp_swap_info() includes checks, and __swap_duplicate() should also include checks. The reason why swp_swap_info() returns NULL is unclear; it may be due to CPU cache issues or DDR bit flips. The probability of this issue is very small, and the stack info we encountered is as follows: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000058 [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info: ESR = 0x0000000096000005 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x05: level 1 translation fault Data abort info: ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 CM = 0, WnR = 0, TnD = 0, TagAccess = 0 GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058] pgd=0000000000000000, p4d=0000000000000000, pud=0000000000000000 Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace buffer dump for: 0x1609e0 ... pc : swap_duplicate+0x44/0x164 lr : copy_page_range+0x508/0x1e78 sp : ffffffc0f2a699e0 x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000 x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001 x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call trace: swap_duplicate+0x44/0x164 copy_page_range+0x508/0x1e78 copy_process+0x1278/0x21cc kernel_clone+0x90/0x438 __arm64_sys_clone+0x5c/0x8c invoke_syscall+0x58/0x110 do_el0_svc+0x8c/0xe0 el0_svc+0x38/0x9c el0t_64_sync_handler+0x44/0xec el0t_64_sync+0x1a8/0x1ac Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal exception SMP: stopping secondary CPUs The patch seems to only provide a workaround, but there are no more effective software solutions to handle the bit flips problem. This path will change the issue from a system crash to a process exception, thereby reducing the impact on the entire machine. Signed-off-by: gao xu <gaoxu2@honor.com> --- mm/swapfile.c | 2 ++ 1 file changed, 2 insertions(+)