Message ID | f8be2606fa114184a17a48f9859ec592@honor.com (mailing list archive) |
---|---|
State | New |
Series | [v3] mm: Fix possible NULL pointer dereference in __swap_duplicate |
On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote: > > Add a NULL check on the return value of swp_swap_info in __swap_duplicate > to prevent crashes caused by NULL pointer dereference. > > The reason why swp_swap_info() returns NULL is unclear; it may be due to > CPU cache issues or DDR bit flips. The probability of this issue is very > small, and the stack info we encountered is as follows: > Unable to handle kernel NULL pointer dereference at virtual address > 0000000000000058 > [RB/E]rb_sreason_str_set: sreason_str set null_pointer > Mem abort info: > ESR = 0x0000000096000005 > EC = 0x25: DABT (current EL), IL = 32 bits > SET = 0, FnV = 0 > EA = 0, S1PTW = 0 > FSC = 0x05: level 1 translation fault > Data abort info: > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 > user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000 > [0000000000000058] pgd=0000000000000000, p4d=0000000000000000, > pud=0000000000000000 > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP > Skip md ftrace buffer dump for: 0x1609e0 > ... 
> pc : swap_duplicate+0x44/0x164 > lr : copy_page_range+0x508/0x1e78 > sp : ffffffc0f2a699e0 > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000 > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001 > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f > Call trace: > swap_duplicate+0x44/0x164 > copy_page_range+0x508/0x1e78 This is really strange since we already have a swap entry check before calling swap_duplicate(). copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma, unsigned long addr, int *rss) { unsigned long vm_flags = dst_vma->vm_flags; pte_t orig_pte = ptep_get(src_pte); pte_t pte = orig_pte; struct folio *folio; struct page *page; swp_entry_t entry = pte_to_swp_entry(orig_pte); if (likely(!non_swap_entry(entry))) { if (swap_duplicate(entry) < 0) return -EIO; ... } likely the swap_type is larger than MAX_SWAPFILES so we get a NULL? static struct swap_info_struct *swap_type_to_swap_info(int type) { if (type >= MAX_SWAPFILES) return NULL; return READ_ONCE(swap_info[type]); /* rcu_dereference() */ } But non_swap_entry() guarantees that swp_type is smaller than MAX_SWAPFILES. static inline int non_swap_entry(swp_entry_t entry) { return swp_type(entry) >= MAX_SWAPFILES; } So another possibility is that we have an overflow of swap_info[] where type is < MAX_SWAPFILES but is not a valid existing swapfile? 
I don't see how the current patch contributes to debugging or fixing anything
related to this dumped stack. Can we dump swp_type() as well?

> copy_process+0x1278/0x21cc
> kernel_clone+0x90/0x438
> __arm64_sys_clone+0x5c/0x8c
> invoke_syscall+0x58/0x110
> do_el0_svc+0x8c/0xe0
> el0_svc+0x38/0x9c
> el0t_64_sync_handler+0x44/0xec
> el0t_64_sync+0x1a8/0x1ac
> Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> ---[ end trace 0000000000000000 ]---
> Kernel panic - not syncing: Oops: Fatal exception
> SMP: stopping secondary CPUs
>
> The patch seems to only provide a workaround, but there are no more
> effective software solutions to handle the bit flips problem. This patch
> will change the issue from a system crash to a process exception, thereby
> reducing the impact on the entire machine.
>
> Signed-off-by: gao xu <gaoxu2@honor.com>
> ---
> v1 -> v2:
> - Add WARN_ON_ONCE.
> - update the commit info.
> v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> ---
>
> mm/swapfile.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 7448a3876..a0bfdba94 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
>         int err, i;
>
>         si = swp_swap_info(entry);
> +       if (WARN_ON_ONCE(!si))

I mean, printk something related to swp_type(). This is really strange, but
the current stack won't help with debugging.

> +       return -EINVAL;
>
>         offset = swp_offset(entry);
>         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> --
> 2.17.1

Thanks
Barry
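The scenario Barry raises (a type that passes the MAX_SWAPFILES bound but has no swap area behind it) can be modelled in user space. This is only a sketch: the `model_*` names are hypothetical, the limit of 28 and the single populated slot are illustrative values, and none of this is the kernel's actual code.

```c
#include <stddef.h>

/* Illustrative limit; the real MAX_SWAPFILES depends on the kernel config. */
#define MODEL_MAX_SWAPFILES 28

/* Hypothetical stand-in for struct swap_info_struct. */
struct model_swap_info {
    int type;
};

/* Zero-initialized static array, like the kernel's swap_info[]. */
static struct model_swap_info *model_swap_info[MODEL_MAX_SWAPFILES];

/* Mirrors the non_swap_entry() check: rejects only types >= the limit. */
static int model_non_swap_type(unsigned int type)
{
    return type >= MODEL_MAX_SWAPFILES;
}

/* Mirrors swap_type_to_swap_info(): bounds check, then a plain array read. */
static struct model_swap_info *model_type_to_info(unsigned int type)
{
    if (type >= MODEL_MAX_SWAPFILES)
        return NULL;
    return model_swap_info[type];
}

/* Populate slot 0 only, as on a system with a single swap device. */
static void model_swapon_slot0(void)
{
    static struct model_swap_info first = { .type = 0 };

    model_swap_info[0] = &first;
}
```

After `model_swapon_slot0()`, a type such as 6 is accepted by `model_non_swap_type()` yet `model_type_to_info(6)` still returns NULL, which is the combination that would lead to a NULL dereference in an unchecked caller.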
> > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote: > > > > Add a NULL check on the return value of swp_swap_info in > > __swap_duplicate to prevent crashes caused by NULL pointer dereference. > > > > The reason why swp_swap_info() returns NULL is unclear; it may be due > > to CPU cache issues or DDR bit flips. The probability of this issue is > > very small, and the stack info we encountered is as follows: > > Unable to handle kernel NULL pointer dereference at virtual address > > 0000000000000058 > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info: > > ESR = 0x0000000096000005 > > EC = 0x25: DABT (current EL), IL = 32 bits > > SET = 0, FnV = 0 > > EA = 0, S1PTW = 0 > > FSC = 0x05: level 1 translation fault Data abort info: > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, > > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058] > > pgd=0000000000000000, p4d=0000000000000000, > > pud=0000000000000000 > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace > > buffer dump for: 0x1609e0 ... 
> > pc : swap_duplicate+0x44/0x164 > > lr : copy_page_range+0x508/0x1e78 > > sp : ffffffc0f2a699e0 > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000 > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001 > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call > > trace: > > swap_duplicate+0x44/0x164 > > copy_page_range+0x508/0x1e78 > > This is really strange since we already have a swap entry check before calling > swap_duplicate(). > > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, > pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct > *dst_vma, > struct vm_area_struct *src_vma, unsigned long addr, int > *rss) { > unsigned long vm_flags = dst_vma->vm_flags; > pte_t orig_pte = ptep_get(src_pte); > pte_t pte = orig_pte; > struct folio *folio; > struct page *page; > swp_entry_t entry = pte_to_swp_entry(orig_pte); > > if (likely(!non_swap_entry(entry))) { > if (swap_duplicate(entry) < 0) > return -EIO; > ... > } > > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL? > > static struct swap_info_struct *swap_type_to_swap_info(int type) { > if (type >= MAX_SWAPFILES) > return NULL; > > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ } > > But non_swap_entry() guarantees that swp_type is smaller than > MAX_SWAPFILES. 
>
> static inline int non_swap_entry(swp_entry_t entry)
> {
>         return swp_type(entry) >= MAX_SWAPFILES;
> }
>
> So another possibility is that we have an overflow of swap_info[] where type
> is < MAX_SWAPFILES but is not a valid existing swapfile?

In the log of this issue, there is a printed entry:
get_swap_device: Bad swap file entry 18000000002d2d2f.
It can be calculated that swp_type(18000000002d2d2f) = 6.
On the Android 15 (Linux 6.6) system: MAX_SWAPFILES = 28, nr_swapfiles = 1.
Since swp_type(18000000002d2d2f) = 6 is less than MAX_SWAPFILES but greater
than nr_swapfiles, the value of this entry is abnormal.

static unsigned int nr_swapfiles;
static struct swap_info_struct *swap_info[MAX_SWAPFILES];

swap_info is a static array, with its values initialized to 0.
The size of the array is MAX_SWAPFILES, and the number of valid entries in
the array is nr_swapfiles. Therefore, when we check the validity of
swp_type(entry), we should compare it with nr_swapfiles, not MAX_SWAPFILES.
The code for validating swp_type may need to be modified as follows:

 static inline int non_swap_entry(swp_entry_t entry)
 {
-       return swp_type(entry) >= MAX_SWAPFILES;
+       return swp_type(entry) >= nr_swapfiles;
 }

 static struct swap_info_struct *swap_type_to_swap_info(int type)
 {
-       if (type >= MAX_SWAPFILES)
+       if (type >= nr_swapfiles)
                return NULL;

        return READ_ONCE(swap_info[type]); /* rcu_dereference() */
 }

>
> I don't see how the current patch contributes to debugging or fixing anything
> related to this dumped stack. Can we dump swp_type() as well?
>
> > copy_process+0x1278/0x21cc
> > kernel_clone+0x90/0x438
> > __arm64_sys_clone+0x5c/0x8c
> > invoke_syscall+0x58/0x110
> > do_el0_svc+0x8c/0xe0
> > el0_svc+0x38/0x9c
> > el0t_64_sync_handler+0x44/0xec
> > el0t_64_sync+0x1a8/0x1ac
> > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> > ---[ end trace 0000000000000000 ]---
> > Kernel panic - not syncing: Oops: Fatal exception
> > SMP: stopping secondary CPUs
> >
> > The patch seems to only provide a workaround, but there are no more
> > effective software solutions to handle the bit flips problem. This
> > patch will change the issue from a system crash to a process exception,
> > thereby reducing the impact on the entire machine.
> >
> > Signed-off-by: gao xu <gaoxu2@honor.com>
> > ---
> > v1 -> v2:
> > - Add WARN_ON_ONCE.
> > - update the commit info.
> > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > ---
> >
> > mm/swapfile.c | 2 ++
> > 1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > index 7448a3876..a0bfdba94 100644
> > --- a/mm/swapfile.c
> > +++ b/mm/swapfile.c
> > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> >         int err, i;
> >
> >         si = swp_swap_info(entry);
> > +       if (WARN_ON_ONCE(!si))
>
> I mean, printk something related to swp_type(). This is really strange, but the
> current stack won't help with debugging.

The log already contains a "get_swap_device: Bad swap file entry xxx" message
when an entry is abnormal.
We could add a log print like the following:

pr_err("Bad swap type %08d\n", swp_type(entry));

>
> > +       return -EINVAL;
> >
> >         offset = swp_offset(entry);
> >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > --
> > 2.17.1
>
> Thanks
> Barry
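The calculation swp_type(18000000002d2d2f) = 6 quoted in this message can be reproduced in user space. The sketch below assumes a 64-bit kernel where the generic swp_entry_t keeps the type in the top bits, with BITS_PER_XA_VALUE = 63 and MAX_SWAPFILES_SHIFT = 5 (so the type shift is 58). The `sketch_*` names are hypothetical and the constants are assumptions about the configuration, not values confirmed by the thread.

```c
#include <stdint.h>

/* Assumed configuration: 64-bit kernel, 5 bits reserved for the swap type. */
#define SKETCH_BITS_PER_XA_VALUE   63
#define SKETCH_MAX_SWAPFILES_SHIFT 5
#define SKETCH_SWP_TYPE_SHIFT \
    (SKETCH_BITS_PER_XA_VALUE - SKETCH_MAX_SWAPFILES_SHIFT)
#define SKETCH_SWP_OFFSET_MASK \
    ((UINT64_C(1) << SKETCH_SWP_TYPE_SHIFT) - 1)

/* Extract the swap type from a raw swp_entry_t value (top bits). */
static unsigned int sketch_swp_type(uint64_t val)
{
    return (unsigned int)(val >> SKETCH_SWP_TYPE_SHIFT);
}

/* Extract the swap offset from a raw swp_entry_t value (low 58 bits). */
static uint64_t sketch_swp_offset(uint64_t val)
{
    return val & SKETCH_SWP_OFFSET_MASK;
}
```

Under these assumptions, `sketch_swp_type(0x18000000002d2d2f)` evaluates to 6 and `sketch_swp_offset()` to 0x2d2d2f, which agrees with the values visible in registers x25 and x23 of the dump.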
Thank you! On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > > Add a NULL check on the return value of swp_swap_info in > > > __swap_duplicate to prevent crashes caused by NULL pointer dereference. > > > > > > The reason why swp_swap_info() returns NULL is unclear; it may be due > > > to CPU cache issues or DDR bit flips. The probability of this issue is > > > very small, and the stack info we encountered is as follows: > > > Unable to handle kernel NULL pointer dereference at virtual address > > > 0000000000000058 > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info: > > > ESR = 0x0000000096000005 > > > EC = 0x25: DABT (current EL), IL = 32 bits > > > SET = 0, FnV = 0 > > > EA = 0, S1PTW = 0 > > > FSC = 0x05: level 1 translation fault Data abort info: > > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k pages, > > > 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058] > > > pgd=0000000000000000, p4d=0000000000000000, > > > pud=0000000000000000 > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md ftrace > > > buffer dump for: 0x1609e0 ... 
> > > pc : swap_duplicate+0x44/0x164 > > > lr : copy_page_range+0x508/0x1e78 > > > sp : ffffffc0f2a699e0 > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > > > x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000 > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > > > x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001 > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f Call > > > trace: > > > swap_duplicate+0x44/0x164 > > > copy_page_range+0x508/0x1e78 > > > > This is really strange since we already have a swap entry check before calling > > swap_duplicate(). > > > > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, > > pte_t *dst_pte, pte_t *src_pte, struct vm_area_struct > > *dst_vma, > > struct vm_area_struct *src_vma, unsigned long addr, int > > *rss) { > > unsigned long vm_flags = dst_vma->vm_flags; > > pte_t orig_pte = ptep_get(src_pte); > > pte_t pte = orig_pte; > > struct folio *folio; > > struct page *page; > > swp_entry_t entry = pte_to_swp_entry(orig_pte); > > > > if (likely(!non_swap_entry(entry))) { > > if (swap_duplicate(entry) < 0) > > return -EIO; > > ... > > } > > > > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL? > > > > static struct swap_info_struct *swap_type_to_swap_info(int type) { > > if (type >= MAX_SWAPFILES) > > return NULL; > > > > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ } > > > > But non_swap_entry() guarantees that swp_type is smaller than > > MAX_SWAPFILES. 
> > > > static inline int non_swap_entry(swp_entry_t entry) { > > return swp_type(entry) >= MAX_SWAPFILES; } > > > > So another possibility is that we have an overflow of swap_info[] where type is < > > MAX_SWAPFILES but is not a valid existing swapfile? > In the log of this issue, there is a printed entry: get_swap_device: > Bad swap file entry 18000000002d2d2f. > It can be calculated that swp_type(18000000002d2d2f) = 6. > In the Android 15-linux6.6: > system: MAX_SWAPFILES = 28, nr_swapfiles = 1. > Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but greater > than nr_swapfiles, the value of this entry is abnormal. > > static unsigned int nr_swapfiles; > static struct swap_info_struct *swap_info[MAX_SWAPFILES]; > swap_info is a static array, with its values initialized to 0. > The size of the array is MAX_SWAPFILES, and the size of valid values in the array is > nr_swapfiles. Therefore, when we validate the validity of swp_type(entry), > we should compare it with nr_swapfiles, not MAX_SWAPFILES. > The code for validating swp_type may need to be modified as follows: That might be true, but on a normal system, we only need to distinguish between a swap entry and a migrate entry. Therefore, comparing with MAX_SWAPFILES is sufficient. > static inline int non_swap_entry(swp_entry_t entry) > { > - return swp_type(entry) >= MAX_SWAPFILES; > + return swp_type(entry) >= nr_swapfiles; > } > > static struct swap_info_struct *swap_type_to_swap_info(int type) > { > - if (type >= MAX_SWAPFILES) > + if (type >= nr_swapfiles) > return NULL; > > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ > } > > > > I don't see how the current patch contributes to debugging or fixing anything > > related to this dumped stack. Can we dump swp_type() as well? 
> > > > > copy_process+0x1278/0x21cc > > > kernel_clone+0x90/0x438 > > > __arm64_sys_clone+0x5c/0x8c > > > invoke_syscall+0x58/0x110 > > > do_el0_svc+0x8c/0xe0 > > > el0_svc+0x38/0x9c > > > el0t_64_sync_handler+0x44/0xec > > > el0t_64_sync+0x1a8/0x1ac > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end trace > > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal > > > exception > > > SMP: stopping secondary CPUs > > > > > > The patch seems to only provide a workaround, but there are no more > > > effective software solutions to handle the bit flips problem. This > > > path will change the issue from a system crash to a process exception, > > > thereby reducing the impact on the entire machine. > > > > > > Signed-off-by: gao xu <gaoxu2@honor.com> > > > --- > > > v1 -> v2: > > > - Add WARN_ON_ONCE. > > > - update the commit info. > > > v2 -> v3: Delete the review tags (This is my issue, and I apologize). > > > --- > > > > > > mm/swapfile.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c index 7448a3876..a0bfdba94 > > > 100644 > > > --- a/mm/swapfile.c > > > +++ b/mm/swapfile.c > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, > > unsigned char usage, int nr) > > > int err, i; > > > > > > si = swp_swap_info(entry); > > > + if (WARN_ON_ONCE(!si)) > > > > I mean, printk something related to swp_type(). This is really strange, but the > > current stack won't help with debugging. > The log can find info related to "get_swap_device: Bad swap file entry xxx" > when an entry encounters an exception. > Add a print info log like the following: > pr_err("%s%08d\n", Bad swap type, swp_type(entry)); This is really strange. It would be better to have the entire PTE value dumped so we can determine if a bit-flip occurred on critical bits like PTE_PRESENT. In that case, a present PTE could be misinterpreted as a swap entry. 
On arm64,

/*
 * Encode and decode a swap entry:
 *      bits 0-1:       present (must be zero)
 *      bit  2:         remember PG_anon_exclusive
 *      bits 3-7:       swap type
 *      bits 8-57:      swap offset
 *      bit  58:        PTE_PROT_NONE (must be zero)
 */

#define __SWP_TYPE_SHIFT        3
#define __SWP_TYPE_BITS         5
#define __SWP_OFFSET_BITS       50
#define __SWP_TYPE_MASK         ((1 << __SWP_TYPE_BITS) - 1)
#define __SWP_OFFSET_SHIFT      (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
#define __SWP_OFFSET_MASK       ((1UL << __SWP_OFFSET_BITS) - 1)

__swp_type is bits 3-7.

For a present pte, bits 3-7 are:
AP[7-6], NS[5], AttributeIndex[4-2].

> >
> > > +       return -EINVAL;
> > >
> > >         offset = swp_offset(entry);
> > >         VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > > --
> > > 2.17.1

Thanks
Barry
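To look at the suspect entry in the PTE layout described above, the type and offset reported earlier in the thread (6 and 0x2d2d2f) can be packed with the same shifts and masks. This is a user-space sketch only: the `sk_*` names are hypothetical, the constants are copied from the macros quoted in this message, and bit 2 (PG_anon_exclusive) cannot be recovered from a swp_entry_t, so it is left as zero here.

```c
#include <stdint.h>

/* Copied from the arm64 macros quoted in the message above. */
#define SK_SWP_TYPE_SHIFT   3
#define SK_SWP_TYPE_BITS    5
#define SK_SWP_OFFSET_BITS  50
#define SK_SWP_TYPE_MASK    ((UINT64_C(1) << SK_SWP_TYPE_BITS) - 1)
#define SK_SWP_OFFSET_SHIFT (SK_SWP_TYPE_BITS + SK_SWP_TYPE_SHIFT)
#define SK_SWP_OFFSET_MASK  ((UINT64_C(1) << SK_SWP_OFFSET_BITS) - 1)

/* Pack a swap type and offset into the arm64 swap PTE bit positions. */
static uint64_t sk_swap_pte(unsigned int type, uint64_t offset)
{
    return ((type & SK_SWP_TYPE_MASK) << SK_SWP_TYPE_SHIFT) |
           ((offset & SK_SWP_OFFSET_MASK) << SK_SWP_OFFSET_SHIFT);
}

/* Bits 3-7 of a raw PTE value: the swap type field. */
static unsigned int sk_pte_swp_type(uint64_t pte)
{
    return (unsigned int)((pte >> SK_SWP_TYPE_SHIFT) & SK_SWP_TYPE_MASK);
}

/* Bits 8-57 of a raw PTE value: the swap offset field. */
static uint64_t sk_pte_swp_offset(uint64_t pte)
{
    return (pte >> SK_SWP_OFFSET_SHIFT) & SK_SWP_OFFSET_MASK;
}

/* Bits 0-1 of a raw PTE value: must be zero for a swap PTE. */
static unsigned int sk_pte_present_bits(uint64_t pte)
{
    return (unsigned int)(pte & 0x3);
}
```

With these definitions, `sk_swap_pte(6, 0x2d2d2f)` gives 0x2d2d2f30. Comparing such a value bit by bit against a plausible present PTE is one way to judge whether a single flipped bit could explain the entry, though the original PTE itself was not captured in the dump.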
> > Thank you! > > On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > > > > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > > > > Add a NULL check on the return value of swp_swap_info in > > > > __swap_duplicate to prevent crashes caused by NULL pointer > dereference. > > > > > > > > The reason why swp_swap_info() returns NULL is unclear; it may be > > > > due to CPU cache issues or DDR bit flips. The probability of this > > > > issue is very small, and the stack info we encountered is as > > > > follows: > > > > Unable to handle kernel NULL pointer dereference at virtual > > > > address > > > > 0000000000000058 > > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info: > > > > ESR = 0x0000000096000005 > > > > EC = 0x25: DABT (current EL), IL = 32 bits > > > > SET = 0, FnV = 0 > > > > EA = 0, S1PTW = 0 > > > > FSC = 0x05: level 1 translation fault Data abort info: > > > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > > > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k > > > > pages, 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058] > > > > pgd=0000000000000000, p4d=0000000000000000, > > > > pud=0000000000000000 > > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md > > > > ftrace buffer dump for: 0x1609e0 ... 
> > > > pc : swap_duplicate+0x44/0x164 > > > > lr : copy_page_range+0x508/0x1e78 > > > > sp : ffffffc0f2a699e0 > > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > > > > x23: 00000000002d2d2f x22: 0000000000000008 x21: > 0000000000000000 > > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > > > > x17: 0000000000000000 x16: 0010000000000001 x15: > 0040000000000001 > > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f > > > > Call > > > > trace: > > > > swap_duplicate+0x44/0x164 > > > > copy_page_range+0x508/0x1e78 > > > > > > This is really strange since we already have a swap entry check > > > before calling swap_duplicate(). > > > > > > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct > *src_mm, > > > pte_t *dst_pte, pte_t *src_pte, struct > > > vm_area_struct *dst_vma, > > > struct vm_area_struct *src_vma, unsigned long addr, > > > int > > > *rss) { > > > unsigned long vm_flags = dst_vma->vm_flags; > > > pte_t orig_pte = ptep_get(src_pte); > > > pte_t pte = orig_pte; > > > struct folio *folio; > > > struct page *page; > > > swp_entry_t entry = pte_to_swp_entry(orig_pte); > > > > > > if (likely(!non_swap_entry(entry))) { > > > if (swap_duplicate(entry) < 0) > > > return -EIO; ... > > > } > > > > > > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL? 
> > > > > > static struct swap_info_struct *swap_type_to_swap_info(int type) { > > > if (type >= MAX_SWAPFILES) > > > return NULL; > > > > > > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ } > > > > > > But non_swap_entry() guarantees that swp_type is smaller than > > > MAX_SWAPFILES. > > > > > > static inline int non_swap_entry(swp_entry_t entry) { > > > return swp_type(entry) >= MAX_SWAPFILES; } > > > > > > So another possibility is that we have an overflow of swap_info[] > > > where type is < MAX_SWAPFILES but is not a valid existing swapfile? > > In the log of this issue, there is a printed entry: get_swap_device: > > Bad swap file entry 18000000002d2d2f. > > It can be calculated that swp_type(18000000002d2d2f) = 6. > > In the Android 15-linux6.6: > > system: MAX_SWAPFILES = 28, nr_swapfiles = 1. > > Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but > > greater than nr_swapfiles, the value of this entry is abnormal. > > > > static unsigned int nr_swapfiles; > > static struct swap_info_struct *swap_info[MAX_SWAPFILES]; swap_info is > > a static array, with its values initialized to 0. > > The size of the array is MAX_SWAPFILES, and the size of valid values > > in the array is nr_swapfiles. Therefore, when we validate the validity > > of swp_type(entry), we should compare it with nr_swapfiles, not > MAX_SWAPFILES. > > The code for validating swp_type may need to be modified as follows: > > That might be true, but on a normal system, we only need to distinguish > between a swap entry and a migrate entry. Therefore, comparing with > MAX_SWAPFILES is sufficient. 
> > > static inline int non_swap_entry(swp_entry_t entry) { > > - return swp_type(entry) >= MAX_SWAPFILES; > > + return swp_type(entry) >= nr_swapfiles; > > } > > > > static struct swap_info_struct *swap_type_to_swap_info(int type) { > > - if (type >= MAX_SWAPFILES) > > + if (type >= nr_swapfiles) > > return NULL; > > > > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ } > > > > > > I don't see how the current patch contributes to debugging or fixing > > > anything related to this dumped stack. Can we dump swp_type() as well? > > > > > > > copy_process+0x1278/0x21cc > > > > kernel_clone+0x90/0x438 > > > > __arm64_sys_clone+0x5c/0x8c > > > > invoke_syscall+0x58/0x110 > > > > do_el0_svc+0x8c/0xe0 > > > > el0_svc+0x38/0x9c > > > > el0t_64_sync_handler+0x44/0xec > > > > el0t_64_sync+0x1a8/0x1ac > > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8) ---[ end > > > > trace > > > > 0000000000000000 ]--- Kernel panic - not syncing: Oops: Fatal > > > > exception > > > > SMP: stopping secondary CPUs > > > > > > > > The patch seems to only provide a workaround, but there are no > > > > more effective software solutions to handle the bit flips problem. > > > > This path will change the issue from a system crash to a process > > > > exception, thereby reducing the impact on the entire machine. > > > > > > > > Signed-off-by: gao xu <gaoxu2@honor.com> > > > > --- > > > > v1 -> v2: > > > > - Add WARN_ON_ONCE. > > > > - update the commit info. > > > > v2 -> v3: Delete the review tags (This is my issue, and I apologize). 
> > > > ---
> > > >
> > > > mm/swapfile.c | 2 ++
> > > > 1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > index 7448a3876..a0bfdba94 100644
> > > > --- a/mm/swapfile.c
> > > > +++ b/mm/swapfile.c
> > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> > > >         int err, i;
> > > >
> > > >         si = swp_swap_info(entry);
> > > > +       if (WARN_ON_ONCE(!si))
> > >
> > > I mean, printk something related to swp_type(). This is really
> > > strange, but the current stack won't help with debugging.
> > The log can find info related to "get_swap_device: Bad swap file entry xxx"
> > when an entry encounters an exception.
> > Add a print info log like the following:
> > pr_err("Bad swap type %08d\n", swp_type(entry));
>
> This is really strange. It would be better to have the entire PTE value dumped so
> we can determine if a bit-flip occurred on critical bits like PTE_PRESENT.

Do you mean to convert the SWP entry to a PTE and then print it out?

pr_err("Bad pte %08llx\n", (unsigned long long)pte_val(swp_entry_to_pte(entry)));

Or is it sufficient to print the SWP entry directly?

pr_err("Bad swap entry %08lx\n", entry.val);

>
> In that case, a present PTE could be misinterpreted as a swap entry.
>
> On arm64,
> /*
>  * Encode and decode a swap entry:
>  *      bits 0-1:       present (must be zero)
>  *      bit  2:         remember PG_anon_exclusive
>  *      bits 3-7:       swap type
>  *      bits 8-57:      swap offset
>  *      bit  58:        PTE_PROT_NONE (must be zero)
>  */
>
> #define __SWP_TYPE_SHIFT        3
> #define __SWP_TYPE_BITS         5
> #define __SWP_OFFSET_BITS       50
> #define __SWP_TYPE_MASK         ((1 << __SWP_TYPE_BITS) - 1)
> #define __SWP_OFFSET_SHIFT      (__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
> #define __SWP_OFFSET_MASK       ((1UL << __SWP_OFFSET_BITS) - 1)
>
> __swp_type is bits 3-7.
>
> For a present pte, bits 3-7 are:
> AP[7-6], NS[5], AttributeIndex[4-2].
> > > > > > > > + return -EINVAL; > > > > > > > > offset = swp_offset(entry); > > > > VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % > > > SWAPFILE_CLUSTER); > > > > -- > > > > 2.17.1 > > Thanks > Barry
On Tue, Feb 18, 2025 at 8:13 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > Thank you! > > > > On Tue, Feb 18, 2025 at 3:51 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > > > > > > > On Sat, Feb 15, 2025 at 10:05 PM gaoxu <gaoxu2@honor.com> wrote: > > > > > > > > > > Add a NULL check on the return value of swp_swap_info in > > > > > __swap_duplicate to prevent crashes caused by NULL pointer > > dereference. > > > > > > > > > > The reason why swp_swap_info() returns NULL is unclear; it may be > > > > > due to CPU cache issues or DDR bit flips. The probability of this > > > > > issue is very small, and the stack info we encountered is as > > > > > follows: > > > > > Unable to handle kernel NULL pointer dereference at virtual > > > > > address > > > > > 0000000000000058 > > > > > [RB/E]rb_sreason_str_set: sreason_str set null_pointer Mem abort info: > > > > > ESR = 0x0000000096000005 > > > > > EC = 0x25: DABT (current EL), IL = 32 bits > > > > > SET = 0, FnV = 0 > > > > > EA = 0, S1PTW = 0 > > > > > FSC = 0x05: level 1 translation fault Data abort info: > > > > > ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000 > > > > > CM = 0, WnR = 0, TnD = 0, TagAccess = 0 > > > > > GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 user pgtable: 4k > > > > > pages, 39-bit VAs, pgdp=00000008a80e5000 [0000000000000058] > > > > > pgd=0000000000000000, p4d=0000000000000000, > > > > > pud=0000000000000000 > > > > > Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP Skip md > > > > > ftrace buffer dump for: 0x1609e0 ... 
> > > > > pc : swap_duplicate+0x44/0x164 > > > > > lr : copy_page_range+0x508/0x1e78 > > > > > sp : ffffffc0f2a699e0 > > > > > x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388 > > > > > x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073 > > > > > x23: 00000000002d2d2f x22: 0000000000000008 x21: > > 0000000000000000 > > > > > x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0 > > > > > x17: 0000000000000000 x16: 0010000000000001 x15: > > 0040000000000001 > > > > > x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff > > > > > x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006 > > > > > x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10 > > > > > x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000 > > > > > x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f > > > > > Call > > > > > trace: > > > > > swap_duplicate+0x44/0x164 > > > > > copy_page_range+0x508/0x1e78 > > > > > > > > This is really strange since we already have a swap entry check > > > > before calling swap_duplicate(). > > > > > > > > copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct > > *src_mm, > > > > pte_t *dst_pte, pte_t *src_pte, struct > > > > vm_area_struct *dst_vma, > > > > struct vm_area_struct *src_vma, unsigned long addr, > > > > int > > > > *rss) { > > > > unsigned long vm_flags = dst_vma->vm_flags; > > > > pte_t orig_pte = ptep_get(src_pte); > > > > pte_t pte = orig_pte; > > > > struct folio *folio; > > > > struct page *page; > > > > swp_entry_t entry = pte_to_swp_entry(orig_pte); > > > > > > > > if (likely(!non_swap_entry(entry))) { > > > > if (swap_duplicate(entry) < 0) > > > > return -EIO; ... > > > > } > > > > > > > > likely the swap_type is larger than MAX_SWAPFILES so we get a NULL? 
> > > > > > > > static struct swap_info_struct *swap_type_to_swap_info(int type) { > > > > if (type >= MAX_SWAPFILES) > > > > return NULL; > > > > > > > > return READ_ONCE(swap_info[type]); /* rcu_dereference() */ } > > > > > > > > But non_swap_entry() guarantees that swp_type is smaller than > > > > MAX_SWAPFILES. > > > > > > > > static inline int non_swap_entry(swp_entry_t entry) { > > > > return swp_type(entry) >= MAX_SWAPFILES; } > > > > > > > > So another possibility is that we have an overflow of swap_info[] > > > > where type is < MAX_SWAPFILES but is not a valid existing swapfile? > > > In the log of this issue, there is a printed entry: get_swap_device: > > > Bad swap file entry 18000000002d2d2f. > > > It can be calculated that swp_type(18000000002d2d2f) = 6. > > > In the Android 15-linux6.6: > > > system: MAX_SWAPFILES = 28, nr_swapfiles = 1. > > > Since swp_type(18000000002d2d2f)=6 is less than MAX_SWAPFILES but > > > greater than nr_swapfiles, the value of this entry is abnormal. > > > > > > static unsigned int nr_swapfiles; > > > static struct swap_info_struct *swap_info[MAX_SWAPFILES]; swap_info is > > > a static array, with its values initialized to 0. > > > The size of the array is MAX_SWAPFILES, and the size of valid values > > > in the array is nr_swapfiles. Therefore, when we validate the validity > > > of swp_type(entry), we should compare it with nr_swapfiles, not > > MAX_SWAPFILES. > > > The code for validating swp_type may need to be modified as follows: > > > > That might be true, but on a normal system, we only need to distinguish > > between a swap entry and a migrate entry. Therefore, comparing with > > MAX_SWAPFILES is sufficient. 
> >
> > > static inline int non_swap_entry(swp_entry_t entry) {
> > > -	return swp_type(entry) >= MAX_SWAPFILES;
> > > +	return swp_type(entry) >= nr_swapfiles;
> > > }
> > >
> > > static struct swap_info_struct *swap_type_to_swap_info(int type) {
> > > -	if (type >= MAX_SWAPFILES)
> > > +	if (type >= nr_swapfiles)
> > > 		return NULL;
> > >
> > > 	return READ_ONCE(swap_info[type]); /* rcu_dereference() */
> > > }
> > > >
> > > > I don't see how the current patch contributes to debugging or fixing
> > > > anything related to this dumped stack. Can we dump swp_type() as well?
> > > >
> > > > > copy_process+0x1278/0x21cc
> > > > > kernel_clone+0x90/0x438
> > > > > __arm64_sys_clone+0x5c/0x8c
> > > > > invoke_syscall+0x58/0x110
> > > > > do_el0_svc+0x8c/0xe0
> > > > > el0_svc+0x38/0x9c
> > > > > el0t_64_sync_handler+0x44/0xec
> > > > > el0t_64_sync+0x1a8/0x1ac
> > > > > Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
> > > > > ---[ end trace 0000000000000000 ]---
> > > > > Kernel panic - not syncing: Oops: Fatal exception
> > > > > SMP: stopping secondary CPUs
> > > > >
> > > > > The patch seems to only provide a workaround, but there are no
> > > > > more effective software solutions to handle the bit flips problem.
> > > > > This patch will change the issue from a system crash to a process
> > > > > exception, thereby reducing the impact on the entire machine.
> > > > >
> > > > > Signed-off-by: gao xu <gaoxu2@honor.com>
> > > > > ---
> > > > > v1 -> v2:
> > > > > - Add WARN_ON_ONCE.
> > > > > - update the commit info.
> > > > > v2 -> v3: Delete the review tags (This is my issue, and I apologize).
> > > > > ---
> > > > >
> > > > > mm/swapfile.c | 2 ++
> > > > > 1 file changed, 2 insertions(+)
> > > > >
> > > > > diff --git a/mm/swapfile.c b/mm/swapfile.c
> > > > > index 7448a3876..a0bfdba94 100644
> > > > > --- a/mm/swapfile.c
> > > > > +++ b/mm/swapfile.c
> > > > > @@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
> > > > > 	int err, i;
> > > > >
> > > > > 	si = swp_swap_info(entry);
> > > > > +	if (WARN_ON_ONCE(!si))
> > > >
> > > > I mean, printk something related to swp_type(). This is really
> > > > strange, but the current stack won't help with debugging.
> > > The log can find info related to "get_swap_device: Bad swap file entry xxx"
> > > when an entry encounters an exception.
> > > Add a print info log like the following:
> > > pr_err("%s%08d\n", "Bad swap type ", swp_type(entry));
> >
> > This is really strange. It would be better to have the entire PTE value dumped so
> > we can determine if a bit-flip occurred on critical bits like PTE_PRESENT.
> Do you mean to convert the SWP entry to PTE and then print it out?
> pr_err("%s%08lx\n", "Bad pte ", pte_val(swp_entry_to_pte(entry)));
>
> Or is it sufficient to print the SWP entry directly?
> pr_err("%s%08lx\n", "Bad swap entry ", entry.val);

Yes, I think so. With that, we can convert it to PTE offline and debug
using that value.

By the way, I don’t have a strong opinion on whether this patch gets merged
or not, but it’s still nice to have. :) I’m more interested in the bug itself
and curious whether other Android products using the same kernel will
encounter the same issue.

> >
> > In that case, a present PTE could be misinterpreted as a swap entry.
> >
> > On arm64,
> > /*
> >  * Encode and decode a swap entry:
> >  *	bits 0-1:	present (must be zero)
> >  *	bits 2:		remember PG_anon_exclusive
> >  *	bits 3-7:	swap type
> >  *	bits 8-57:	swap offset
> >  *	bit  58:	PTE_PROT_NONE (must be zero)
> >  */
> >
> > #define __SWP_TYPE_SHIFT	3
> > #define __SWP_TYPE_BITS		5
> > #define __SWP_OFFSET_BITS	50
> > #define __SWP_TYPE_MASK		((1 << __SWP_TYPE_BITS) - 1)
> > #define __SWP_OFFSET_SHIFT	(__SWP_TYPE_BITS + __SWP_TYPE_SHIFT)
> > #define __SWP_OFFSET_MASK	((1UL << __SWP_OFFSET_BITS) - 1)
> >
> > _swp_type is bits 3-7.
> >
> > For a present pte, bits 3-7 are:
> > AP[7-6], NS[5], AttributeIndex[4-2].
> >
> > > >
> > > > > +		return -EINVAL;
> > > > >
> > > > > 	offset = swp_offset(entry);
> > > > > 	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
> > > > > --
> > > > > 2.17.1
> >

Thanks
Barry
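The offline conversion from a swap entry to its arm64 PTE encoding can be sketched in userspace from the macros quoted above. This is an illustration only: the ARM64_* names are renamed copies of the quoted __SWP_* macros, the demo_* helpers are hypothetical, and the type and offset passed in are assumed to have been decoded from the logged entry.val beforehand.

```c
#include <assert.h>
#include <stdint.h>

/* Renamed copies of the arm64 __SWP_* macros quoted in the thread. */
#define ARM64_SWP_TYPE_SHIFT    3
#define ARM64_SWP_TYPE_BITS     5
#define ARM64_SWP_OFFSET_BITS   50
#define ARM64_SWP_TYPE_MASK     ((1u << ARM64_SWP_TYPE_BITS) - 1)
#define ARM64_SWP_OFFSET_SHIFT  (ARM64_SWP_TYPE_BITS + ARM64_SWP_TYPE_SHIFT)
#define ARM64_SWP_OFFSET_MASK   ((UINT64_C(1) << ARM64_SWP_OFFSET_BITS) - 1)

/* Hypothetical helper: build the arch-level PTE value for (type, offset). */
static uint64_t demo_swp_to_pte(unsigned int type, uint64_t offset)
{
	assert(type <= ARM64_SWP_TYPE_MASK);
	return ((uint64_t)type << ARM64_SWP_TYPE_SHIFT) |
	       ((offset & ARM64_SWP_OFFSET_MASK) << ARM64_SWP_OFFSET_SHIFT);
}

/* Hypothetical helper: swap type held in bits 3-7 of a non-present PTE. */
static unsigned int demo_pte_swp_type(uint64_t pte)
{
	return (unsigned int)((pte >> ARM64_SWP_TYPE_SHIFT) & ARM64_SWP_TYPE_MASK);
}

/* Hypothetical helper: swap offset held in bits 8-57. */
static uint64_t demo_pte_swp_offset(uint64_t pte)
{
	return (pte >> ARM64_SWP_OFFSET_SHIFT) & ARM64_SWP_OFFSET_MASK;
}

/* Bits 0-1 must be zero for a swap PTE; non-zero suggests a present PTE. */
static unsigned int demo_pte_present_bits(uint64_t pte)
{
	return (unsigned int)(pte & 0x3);
}
```

For type 6 and offset 0x2d2d2f, demo_swp_to_pte() yields 0x2d2d2f30 with bits 0-1 clear. Comparing such a value against plausible present-PTE patterns is one way to look for a flipped bit, though it cannot by itself confirm one.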
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 7448a3876..a0bfdba94 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3521,6 +3521,8 @@ static int __swap_duplicate(swp_entry_t entry, unsigned char usage, int nr)
 	int err, i;
 
 	si = swp_swap_info(entry);
+	if (WARN_ON_ONCE(!si))
+		return -EINVAL;
 
 	offset = swp_offset(entry);
 	VM_WARN_ON(nr > SWAPFILE_CLUSTER - offset % SWAPFILE_CLUSTER);
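The effect of the added check can be modelled in userspace. This is a rough sketch, not kernel code: the struct, the demo_* functions and the single registered slot are hypothetical stand-ins, and MAX_SWAPFILES = 28 with one active swapfile is the configuration reported in the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_SWAPFILES 28	/* value reported in the thread for this build */

/* The swap type must fit in 5 bits (MAX_SWAPFILES_SHIFT in the kernel). */
static_assert(MAX_SWAPFILES <= (1 << 5), "swap type must fit in 5 bits");

/* Hypothetical stand-in for struct swap_info_struct. */
struct demo_swap_info {
	int duplicated;	/* stands in for the swap_map count updates */
};

/* One active swapfile (nr_swapfiles = 1); all other slots stay NULL. */
static struct demo_swap_info demo_first_swap;
static struct demo_swap_info *demo_swap_info[MAX_SWAPFILES] = { &demo_first_swap };

/* Mirrors the bounds check in swap_type_to_swap_info(). */
static struct demo_swap_info *demo_type_to_info(unsigned int type)
{
	if (type >= MAX_SWAPFILES)
		return NULL;
	return demo_swap_info[type];
}

/* Models __swap_duplicate() with the NULL check from the patch applied. */
static int demo_swap_duplicate(unsigned int type)
{
	struct demo_swap_info *si = demo_type_to_info(type);

	if (!si)	/* without this check, the next line dereferences NULL */
		return -EINVAL;

	si->duplicated++;
	return 0;
}
```

In this model demo_swap_duplicate(0) succeeds, while type 6 passes the MAX_SWAPFILES bound, finds an empty slot, and returns -EINVAL instead of faulting. That is the crash-to-error conversion the patch aims for; it does not explain how the bad entry arose.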
Add a NULL check on the return value of swp_swap_info in __swap_duplicate
to prevent crashes caused by NULL pointer dereference.

The reason why swp_swap_info() returns NULL is unclear; it may be due to
CPU cache issues or DDR bit flips. The probability of this issue is very
small, and the stack info we encountered is as follows:
Unable to handle kernel NULL pointer dereference at virtual address
0000000000000058
[RB/E]rb_sreason_str_set: sreason_str set null_pointer
Mem abort info:
  ESR = 0x0000000096000005
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x05: level 1 translation fault
Data abort info:
  ISV = 0, ISS = 0x00000005, ISS2 = 0x00000000
  CM = 0, WnR = 0, TnD = 0, TagAccess = 0
  GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
user pgtable: 4k pages, 39-bit VAs, pgdp=00000008a80e5000
[0000000000000058] pgd=0000000000000000, p4d=0000000000000000,
pud=0000000000000000
Internal error: Oops: 0000000096000005 [#1] PREEMPT SMP
Skip md ftrace buffer dump for: 0x1609e0
...
pc : swap_duplicate+0x44/0x164
lr : copy_page_range+0x508/0x1e78
sp : ffffffc0f2a699e0
x29: ffffffc0f2a699e0 x28: ffffff8a5b28d388 x27: ffffff8b06603388
x26: ffffffdf7291fe70 x25: 0000000000000006 x24: 0000000000100073
x23: 00000000002d2d2f x22: 0000000000000008 x21: 0000000000000000
x20: 00000000002d2d2f x19: 18000000002d2d2f x18: ffffffdf726faec0
x17: 0000000000000000 x16: 0010000000000001 x15: 0040000000000001
x14: 0400000000000001 x13: ff7ffffffffffb7f x12: ffeffffffffffbff
x11: ffffff8a5c7e1898 x10: 0000000000000018 x9 : 0000000000000006
x8 : 1800000000000000 x7 : 0000000000000000 x6 : ffffff8057c01f10
x5 : 000000000000a318 x4 : 0000000000000000 x3 : 0000000000000000
x2 : 0000006daf200000 x1 : 0000000000000001 x0 : 18000000002d2d2f
Call trace:
 swap_duplicate+0x44/0x164
 copy_page_range+0x508/0x1e78
 copy_process+0x1278/0x21cc
 kernel_clone+0x90/0x438
 __arm64_sys_clone+0x5c/0x8c
 invoke_syscall+0x58/0x110
 do_el0_svc+0x8c/0xe0
 el0_svc+0x38/0x9c
 el0t_64_sync_handler+0x44/0xec
 el0t_64_sync+0x1a8/0x1ac
Code: 9139c35a 71006f3f 54000568 f8797b55 (f9402ea8)
---[ end trace 0000000000000000 ]---
Kernel panic - not syncing: Oops: Fatal exception
SMP: stopping secondary CPUs

The patch seems to only provide a workaround, but there are no more
effective software solutions to handle the bit flips problem. This patch
will change the issue from a system crash to a process exception, thereby
reducing the impact on the entire machine.

Signed-off-by: gao xu <gaoxu2@honor.com>
---
v1 -> v2:
- Add WARN_ON_ONCE.
- update the commit info.
v2 -> v3: Delete the review tags (This is my issue, and I apologize).
---

 mm/swapfile.c | 2 ++
 1 file changed, 2 insertions(+)