Message ID | 20250212064002.55598-1-zhengqi.arch@bytedance.com (mailing list archive) |
---|---|
State | New |
Series | arm: pgtable: fix NULL pointer dereference issue |
Hi Qi, Thanks for the fix. I will test it as well as I can. On Wed, Feb 12, 2025 at 7:41 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote: > > When update_mmu_cache_range() is called by update_mmu_cache(), the vmf > parameter is NULL, which will cause a NULL pointer dereference issue in > adjust_pte(): > > Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read > Hardware name: Atmel AT91SAM9 > PC is at update_mmu_cache_range+0x1e0/0x278 > LR is at pte_offset_map_rw_nolock+0x18/0x2c > Call trace: > update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec > remove_migration_pte from rmap_walk_file+0xcc/0x130 > rmap_walk_file from remove_migration_ptes+0x90/0xa4 > remove_migration_ptes from migrate_pages_batch+0x6d4/0x858 > migrate_pages_batch from migrate_pages+0x188/0x488 > migrate_pages from compact_zone+0x56c/0x954 > compact_zone from compact_node+0x90/0xf0 > compact_node from kcompactd+0x1d4/0x204 > kcompactd from kthread+0x120/0x12c > kthread from ret_from_fork+0x14/0x38 > Exception stack(0xc0d8bfb0 to 0xc0d8bff8) > > To fix it, do not rely on whether 'ptl' is equal to decide whether to hold > the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is > enabled. In addition, if two vmas map to the same PTE page, there is no > need to hold the pte lock again, otherwise a deadlock will occur. Just add > the need_lock parameter to let adjust_pte() know this information. > > Reported-by: Ezra Buehler <ezra@easyb.ch> Perhaps a detail but, maybe better use "Ezra Buehler <ezra.buehler@husqvarnagroup.com>" here. Cheers, Ezra.
Hi Ezra, On 2025/2/12 15:27, Ezra Buehler wrote: > Hi Qi, > > Thanks for the fix. I will test it as well as I can. Thanks! > > On Wed, Feb 12, 2025 at 7:41 AM Qi Zheng <zhengqi.arch@bytedance.com> wrote: >> >> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf >> parameter is NULL, which will cause a NULL pointer dereference issue in >> adjust_pte(): >> >> Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read >> Hardware name: Atmel AT91SAM9 >> PC is at update_mmu_cache_range+0x1e0/0x278 >> LR is at pte_offset_map_rw_nolock+0x18/0x2c >> Call trace: >> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec >> remove_migration_pte from rmap_walk_file+0xcc/0x130 >> rmap_walk_file from remove_migration_ptes+0x90/0xa4 >> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858 >> migrate_pages_batch from migrate_pages+0x188/0x488 >> migrate_pages from compact_zone+0x56c/0x954 >> compact_zone from compact_node+0x90/0xf0 >> compact_node from kcompactd+0x1d4/0x204 >> kcompactd from kthread+0x120/0x12c >> kthread from ret_from_fork+0x14/0x38 >> Exception stack(0xc0d8bfb0 to 0xc0d8bff8) >> >> To fix it, do not rely on whether 'ptl' is equal to decide whether to hold >> the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is >> enabled. In addition, if two vmas map to the same PTE page, there is no >> need to hold the pte lock again, otherwise a deadlock will occur. Just add >> the need_lock parameter to let adjust_pte() know this information. >> >> Reported-by: Ezra Buehler <ezra@easyb.ch> > > Perhaps a detail but, maybe better use "Ezra Buehler > <ezra.buehler@husqvarnagroup.com>" here. Got it. Will wait for your test results first. Thanks, Qi > > Cheers, > Ezra.
On 12.02.25 07:40, Qi Zheng wrote: > When update_mmu_cache_range() is called by update_mmu_cache(), the vmf > parameter is NULL, which will cause a NULL pointer dereference issue in > adjust_pte(): > > Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read > Hardware name: Atmel AT91SAM9 > PC is at update_mmu_cache_range+0x1e0/0x278 > LR is at pte_offset_map_rw_nolock+0x18/0x2c > Call trace: > update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec > remove_migration_pte from rmap_walk_file+0xcc/0x130 > rmap_walk_file from remove_migration_ptes+0x90/0xa4 > remove_migration_ptes from migrate_pages_batch+0x6d4/0x858 > migrate_pages_batch from migrate_pages+0x188/0x488 > migrate_pages from compact_zone+0x56c/0x954 > compact_zone from compact_node+0x90/0xf0 > compact_node from kcompactd+0x1d4/0x204 > kcompactd from kthread+0x120/0x12c > kthread from ret_from_fork+0x14/0x38 > Exception stack(0xc0d8bfb0 to 0xc0d8bff8) > > To fix it, do not rely on whether 'ptl' is equal to decide whether to hold > the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is > enabled. In addition, if two vmas map to the same PTE page, there is no > need to hold the pte lock again, otherwise a deadlock will occur. Just add > the need_lock parameter to let adjust_pte() know this information. > > Reported-by: Ezra Buehler <ezra@easyb.ch> > Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/ > Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()") > Cc: stable@vger.kernel.org > Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> > --- > arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------ > 1 file changed, 28 insertions(+), 12 deletions(-) > > diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c > index 2bec87c3327d2..3627bf0957c75 100644 > --- a/arch/arm/mm/fault-armv.c > +++ b/arch/arm/mm/fault-armv.c > @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address, > } > > static int adjust_pte(struct vm_area_struct *vma, unsigned long address, > - unsigned long pfn, struct vm_fault *vmf) > + unsigned long pfn, bool need_lock) > { > spinlock_t *ptl; > pgd_t *pgd; > @@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address, > if (!pte) > return 0; > > - /* > - * If we are using split PTE locks, then we need to take the page > - * lock here. Otherwise we are using shared mm->page_table_lock > - * which is already locked, thus cannot take it. > - */ > - if (ptl != vmf->ptl) { > + if (need_lock) { > + /* > + * Use nested version here to indicate that we are already > + * holding one similar spinlock. 
> + */ > spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); > if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) { > pte_unmap_unlock(pte, ptl); > @@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address, > > ret = do_adjust_pte(vma, address, pfn, pte); > > - if (ptl != vmf->ptl) > + if (need_lock) > spin_unlock(ptl); > pte_unmap(pte); > > @@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address, > > static void > make_coherent(struct address_space *mapping, struct vm_area_struct *vma, > - unsigned long addr, pte_t *ptep, unsigned long pfn, > - struct vm_fault *vmf) > + unsigned long addr, pte_t *ptep, unsigned long pfn) > { > struct mm_struct *mm = vma->vm_mm; > struct vm_area_struct *mpnt; > unsigned long offset; > + unsigned long start; > pgoff_t pgoff; > int aliases = 0; > > pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT); > + start = ALIGN_DOWN(addr, PMD_SIZE); I assume you can come up with a better name than "start" :) aligned_addr ... pmd_start_addr ... Maybe simply pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE); pmd_end_addr = addr + PMD_SIZE; Then the comparison below also becomes easier to read. > > /* > * If we have any shared mappings that are in the same mm > @@ -141,6 +141,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma, > */ > flush_dcache_mmap_lock(mapping); > vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) { > + unsigned long mpnt_addr; > + /* > + * If we are using split PTE locks, then we need to take the pte > + * lock. Otherwise we are using shared mm->page_table_lock which > + * is already locked, thus cannot take it. > + */ > + bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS); Nit: move "unsigned long mpnt_addr;" below this longer variable+init. > + > /* > * If this VMA is not in our MM, we can ignore it. > * Note that we intentionally mask out the VMA > @@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma, > if (!(mpnt->vm_flags & VM_MAYSHARE)) > continue; > offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT; > - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf); > + mpnt_addr = mpnt->vm_start + offset; > + /* > + * If mpnt_addr and addr are mapped to the same PTE page, there > + * is no need to hold the pte lock again, otherwise a deadlock > + * will occur. /* * Avoid deadlocks by not grabbing the PTE lock if we already hold the * PTE lock of this PTE table in the caller. */ ? > + */ > + if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE) > + need_lock = false;
On 2025/2/12 16:20, David Hildenbrand wrote: > On 12.02.25 07:40, Qi Zheng wrote: >> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf >> parameter is NULL, which will cause a NULL pointer dereference issue in >> adjust_pte(): >> >> Unable to handle kernel NULL pointer dereference at virtual address >> 00000030 when read >> Hardware name: Atmel AT91SAM9 >> PC is at update_mmu_cache_range+0x1e0/0x278 >> LR is at pte_offset_map_rw_nolock+0x18/0x2c >> Call trace: >> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec >> remove_migration_pte from rmap_walk_file+0xcc/0x130 >> rmap_walk_file from remove_migration_ptes+0x90/0xa4 >> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858 >> migrate_pages_batch from migrate_pages+0x188/0x488 >> migrate_pages from compact_zone+0x56c/0x954 >> compact_zone from compact_node+0x90/0xf0 >> compact_node from kcompactd+0x1d4/0x204 >> kcompactd from kthread+0x120/0x12c >> kthread from ret_from_fork+0x14/0x38 >> Exception stack(0xc0d8bfb0 to 0xc0d8bff8) >> >> To fix it, do not rely on whether 'ptl' is equal to decide whether to >> hold >> the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is >> enabled. In addition, if two vmas map to the same PTE page, there is no >> need to hold the pte lock again, otherwise a deadlock will occur. Just >> add >> the need_lock parameter to let adjust_pte() know this information. >> >> Reported-by: Ezra Buehler <ezra@easyb.ch> >> Closes: >> https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/ >> Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()") >> Cc: stable@vger.kernel.org >> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >> --- >> arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------ >> 1 file changed, 28 insertions(+), 12 deletions(-) >> >> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c >> index 2bec87c3327d2..3627bf0957c75 100644 >> --- a/arch/arm/mm/fault-armv.c >> +++ b/arch/arm/mm/fault-armv.c >> @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, >> unsigned long address, >> } >> static int adjust_pte(struct vm_area_struct *vma, unsigned long >> address, >> - unsigned long pfn, struct vm_fault *vmf) >> + unsigned long pfn, bool need_lock) >> { >> spinlock_t *ptl; >> pgd_t *pgd; >> @@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, >> unsigned long address, >> if (!pte) >> return 0; >> - /* >> - * If we are using split PTE locks, then we need to take the page >> - * lock here. Otherwise we are using shared mm->page_table_lock >> - * which is already locked, thus cannot take it. >> - */ >> - if (ptl != vmf->ptl) { >> + if (need_lock) { >> + /* >> + * Use nested version here to indicate that we are already >> + * holding one similar spinlock. 
>> + */ >> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); >> if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) { >> pte_unmap_unlock(pte, ptl); >> @@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, >> unsigned long address, >> ret = do_adjust_pte(vma, address, pfn, pte); >> - if (ptl != vmf->ptl) >> + if (need_lock) >> spin_unlock(ptl); >> pte_unmap(pte); >> @@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct >> *vma, unsigned long address, >> static void >> make_coherent(struct address_space *mapping, struct vm_area_struct >> *vma, >> - unsigned long addr, pte_t *ptep, unsigned long pfn, >> - struct vm_fault *vmf) >> + unsigned long addr, pte_t *ptep, unsigned long pfn) >> { >> struct mm_struct *mm = vma->vm_mm; >> struct vm_area_struct *mpnt; >> unsigned long offset; >> + unsigned long start; >> pgoff_t pgoff; >> int aliases = 0; >> pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT); >> + start = ALIGN_DOWN(addr, PMD_SIZE); > > I assume you can come up with a better name than "start" :) > > aligned_addr ... pmd_start_addr ... > > Maybe simply > > pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE); > pmd_end_addr = addr + PMD_SIZE; you mean: pmd_end_addr = pmd_start_addr + PMD_SIZE; Right? > > Then the comparison below also becomes easier to read. > >> /* >> * If we have any shared mappings that are in the same mm >> @@ -141,6 +141,14 @@ make_coherent(struct address_space *mapping, >> struct vm_area_struct *vma, >> */ >> flush_dcache_mmap_lock(mapping); >> vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) { >> + unsigned long mpnt_addr; >> + /* >> + * If we are using split PTE locks, then we need to take the pte >> + * lock. Otherwise we are using shared mm->page_table_lock which >> + * is already locked, thus cannot take it. >> + */ >> + bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS); > > Nit: move "unsigned long mpnt_addr;" below this longer variable+init. OK, will do. > >> + >> /* >> * If this VMA is not in our MM, we can ignore it. >> * Note that we intentionally mask out the VMA >> @@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, >> struct vm_area_struct *vma, >> if (!(mpnt->vm_flags & VM_MAYSHARE)) >> continue; >> offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT; >> - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf); >> + mpnt_addr = mpnt->vm_start + offset; >> + /* >> + * If mpnt_addr and addr are mapped to the same PTE page, there >> + * is no need to hold the pte lock again, otherwise a deadlock >> + * will occur. > > /* > * Avoid deadlocks by not grabbing the PTE lock if we already hold the > * PTE lock of this PTE table in the caller. > */ Maybe just: /* Avoid deadlocks by not grabbing the same PTE lock again. */ Thanks, Qi > > ? > >> + */ >> + if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE) >> + need_lock = false; > > >
On 12.02.25 09:28, Qi Zheng wrote: > > > On 2025/2/12 16:20, David Hildenbrand wrote: >> On 12.02.25 07:40, Qi Zheng wrote: >>> When update_mmu_cache_range() is called by update_mmu_cache(), the vmf >>> parameter is NULL, which will cause a NULL pointer dereference issue in >>> adjust_pte(): >>> >>> Unable to handle kernel NULL pointer dereference at virtual address >>> 00000030 when read >>> Hardware name: Atmel AT91SAM9 >>> PC is at update_mmu_cache_range+0x1e0/0x278 >>> LR is at pte_offset_map_rw_nolock+0x18/0x2c >>> Call trace: >>> update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec >>> remove_migration_pte from rmap_walk_file+0xcc/0x130 >>> rmap_walk_file from remove_migration_ptes+0x90/0xa4 >>> remove_migration_ptes from migrate_pages_batch+0x6d4/0x858 >>> migrate_pages_batch from migrate_pages+0x188/0x488 >>> migrate_pages from compact_zone+0x56c/0x954 >>> compact_zone from compact_node+0x90/0xf0 >>> compact_node from kcompactd+0x1d4/0x204 >>> kcompactd from kthread+0x120/0x12c >>> kthread from ret_from_fork+0x14/0x38 >>> Exception stack(0xc0d8bfb0 to 0xc0d8bff8) >>> >>> To fix it, do not rely on whether 'ptl' is equal to decide whether to >>> hold >>> the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is >>> enabled. In addition, if two vmas map to the same PTE page, there is no >>> need to hold the pte lock again, otherwise a deadlock will occur. Just >>> add >>> the need_lock parameter to let adjust_pte() know this information. >>> >>> Reported-by: Ezra Buehler <ezra@easyb.ch> >>> Closes: >>> https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/ >>> Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()") >>> Cc: stable@vger.kernel.org >>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> >>> --- >>> arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------ >>> 1 file changed, 28 insertions(+), 12 deletions(-) >>> >>> diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c >>> index 2bec87c3327d2..3627bf0957c75 100644 >>> --- a/arch/arm/mm/fault-armv.c >>> +++ b/arch/arm/mm/fault-armv.c >>> @@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, >>> unsigned long address, >>> } >>> static int adjust_pte(struct vm_area_struct *vma, unsigned long >>> address, >>> - unsigned long pfn, struct vm_fault *vmf) >>> + unsigned long pfn, bool need_lock) >>> { >>> spinlock_t *ptl; >>> pgd_t *pgd; >>> @@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, >>> unsigned long address, >>> if (!pte) >>> return 0; >>> - /* >>> - * If we are using split PTE locks, then we need to take the page >>> - * lock here. Otherwise we are using shared mm->page_table_lock >>> - * which is already locked, thus cannot take it. >>> - */ >>> - if (ptl != vmf->ptl) { >>> + if (need_lock) { >>> + /* >>> + * Use nested version here to indicate that we are already >>> + * holding one similar spinlock. 
>>> + */ >>> spin_lock_nested(ptl, SINGLE_DEPTH_NESTING); >>> if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) { >>> pte_unmap_unlock(pte, ptl); >>> @@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, >>> unsigned long address, >>> ret = do_adjust_pte(vma, address, pfn, pte); >>> - if (ptl != vmf->ptl) >>> + if (need_lock) >>> spin_unlock(ptl); >>> pte_unmap(pte); >>> @@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct >>> *vma, unsigned long address, >>> static void >>> make_coherent(struct address_space *mapping, struct vm_area_struct >>> *vma, >>> - unsigned long addr, pte_t *ptep, unsigned long pfn, >>> - struct vm_fault *vmf) >>> + unsigned long addr, pte_t *ptep, unsigned long pfn) >>> { >>> struct mm_struct *mm = vma->vm_mm; >>> struct vm_area_struct *mpnt; >>> unsigned long offset; >>> + unsigned long start; >>> pgoff_t pgoff; >>> int aliases = 0; >>> pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT); >>> + start = ALIGN_DOWN(addr, PMD_SIZE); >> >> I assume you can come up with a better name than "start" :) >> >> aligned_addr ... pmd_start_addr ... >> >> Maybe simply >> >> pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE); >> pmd_end_addr = addr + PMD_SIZE; > > you mean: > > pmd_end_addr = pmd_start_addr + PMD_SIZE; > > Right? Yes :) >> >>> + >>> /* >>> * If this VMA is not in our MM, we can ignore it. >>> * Note that we intentionally mask out the VMA >>> @@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, >>> struct vm_area_struct *vma, >>> if (!(mpnt->vm_flags & VM_MAYSHARE)) >>> continue; >>> offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT; >>> - aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf); >>> + mpnt_addr = mpnt->vm_start + offset; >>> + /* >>> + * If mpnt_addr and addr are mapped to the same PTE page, there >>> + * is no need to hold the pte lock again, otherwise a deadlock >>> + * will occur. >> >> /* >> * Avoid deadlocks by not grabbing the PTE lock if we already hold the >> * PTE lock of this PTE table in the caller. >> */ > > Maybe just: > > /* Avoid deadlocks by not grabbing the same PTE lock again. */ > Agreed.
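For readers following the thread, here is a rough sketch of how the make_coherent() loop could read once the suggestions above are folded in (the pmd_start_addr/pmd_end_addr naming and the shorter deadlock comment). This is only an illustration of the agreed direction, not code posted as part of the series; the surrounding declarations and the earlier mm/vma checks in the loop are elided.

```c
	/* Sketch only: make_coherent() loop with the discussed naming applied. */
	pmd_start_addr = ALIGN_DOWN(addr, PMD_SIZE);
	pmd_end_addr = pmd_start_addr + PMD_SIZE;

	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
		/*
		 * If we are using split PTE locks, then we need to take the pte
		 * lock. Otherwise we are using shared mm->page_table_lock which
		 * is already locked, thus cannot take it.
		 */
		bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
		unsigned long mpnt_addr;

		/* ... existing mm/vma and VM_MAYSHARE checks ... */

		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
		mpnt_addr = mpnt->vm_start + offset;

		/* Avoid deadlocks by not grabbing the same PTE lock again. */
		if (mpnt_addr >= pmd_start_addr && mpnt_addr < pmd_end_addr)
			need_lock = false;
		aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
	}
```

With an explicit end address, the range test reads more naturally than the `mpnt_addr - start < PMD_SIZE` form used in v1.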
diff --git a/arch/arm/mm/fault-armv.c b/arch/arm/mm/fault-armv.c
index 2bec87c3327d2..3627bf0957c75 100644
--- a/arch/arm/mm/fault-armv.c
+++ b/arch/arm/mm/fault-armv.c
@@ -62,7 +62,7 @@ static int do_adjust_pte(struct vm_area_struct *vma, unsigned long address,
 }
 
 static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
-		      unsigned long pfn, struct vm_fault *vmf)
+		      unsigned long pfn, bool need_lock)
 {
 	spinlock_t *ptl;
 	pgd_t *pgd;
@@ -99,12 +99,11 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 	if (!pte)
 		return 0;
 
-	/*
-	 * If we are using split PTE locks, then we need to take the page
-	 * lock here. Otherwise we are using shared mm->page_table_lock
-	 * which is already locked, thus cannot take it.
-	 */
-	if (ptl != vmf->ptl) {
+	if (need_lock) {
+		/*
+		 * Use nested version here to indicate that we are already
+		 * holding one similar spinlock.
+		 */
 		spin_lock_nested(ptl, SINGLE_DEPTH_NESTING);
 		if (unlikely(!pmd_same(pmdval, pmdp_get_lockless(pmd)))) {
 			pte_unmap_unlock(pte, ptl);
@@ -114,7 +113,7 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 
 	ret = do_adjust_pte(vma, address, pfn, pte);
 
-	if (ptl != vmf->ptl)
+	if (need_lock)
 		spin_unlock(ptl);
 	pte_unmap(pte);
 
@@ -123,16 +122,17 @@ static int adjust_pte(struct vm_area_struct *vma, unsigned long address,
 
 static void
 make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
-	unsigned long addr, pte_t *ptep, unsigned long pfn,
-	struct vm_fault *vmf)
+	unsigned long addr, pte_t *ptep, unsigned long pfn)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct vm_area_struct *mpnt;
 	unsigned long offset;
+	unsigned long start;
 	pgoff_t pgoff;
 	int aliases = 0;
 
 	pgoff = vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT);
+	start = ALIGN_DOWN(addr, PMD_SIZE);
 
 	/*
 	 * If we have any shared mappings that are in the same mm
@@ -141,6 +141,14 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 	 */
 	flush_dcache_mmap_lock(mapping);
 	vma_interval_tree_foreach(mpnt, &mapping->i_mmap, pgoff, pgoff) {
+		unsigned long mpnt_addr;
+		/*
+		 * If we are using split PTE locks, then we need to take the pte
+		 * lock. Otherwise we are using shared mm->page_table_lock which
+		 * is already locked, thus cannot take it.
+		 */
+		bool need_lock = IS_ENABLED(CONFIG_SPLIT_PTE_PTLOCKS);
+
 		/*
 		 * If this VMA is not in our MM, we can ignore it.
 		 * Note that we intentionally mask out the VMA
@@ -151,7 +159,15 @@ make_coherent(struct address_space *mapping, struct vm_area_struct *vma,
 		if (!(mpnt->vm_flags & VM_MAYSHARE))
 			continue;
 		offset = (pgoff - mpnt->vm_pgoff) << PAGE_SHIFT;
-		aliases += adjust_pte(mpnt, mpnt->vm_start + offset, pfn, vmf);
+		mpnt_addr = mpnt->vm_start + offset;
+		/*
+		 * If mpnt_addr and addr are mapped to the same PTE page, there
+		 * is no need to hold the pte lock again, otherwise a deadlock
+		 * will occur.
+		 */
+		if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
+			need_lock = false;
+		aliases += adjust_pte(mpnt, mpnt_addr, pfn, need_lock);
 	}
 	flush_dcache_mmap_unlock(mapping);
 	if (aliases)
@@ -194,7 +210,7 @@ void update_mmu_cache_range(struct vm_fault *vmf, struct vm_area_struct *vma,
 		__flush_dcache_folio(mapping, folio);
 	if (mapping) {
 		if (cache_is_vivt())
-			make_coherent(mapping, vma, addr, ptep, pfn, vmf);
+			make_coherent(mapping, vma, addr, ptep, pfn);
 		else if (vma->vm_flags & VM_EXEC)
 			__flush_icache_all();
 	}
When update_mmu_cache_range() is called by update_mmu_cache(), the vmf
parameter is NULL, which will cause a NULL pointer dereference issue in
adjust_pte():

Unable to handle kernel NULL pointer dereference at virtual address 00000030 when read
Hardware name: Atmel AT91SAM9
PC is at update_mmu_cache_range+0x1e0/0x278
LR is at pte_offset_map_rw_nolock+0x18/0x2c
Call trace:
 update_mmu_cache_range from remove_migration_pte+0x29c/0x2ec
 remove_migration_pte from rmap_walk_file+0xcc/0x130
 rmap_walk_file from remove_migration_ptes+0x90/0xa4
 remove_migration_ptes from migrate_pages_batch+0x6d4/0x858
 migrate_pages_batch from migrate_pages+0x188/0x488
 migrate_pages from compact_zone+0x56c/0x954
 compact_zone from compact_node+0x90/0xf0
 compact_node from kcompactd+0x1d4/0x204
 kcompactd from kthread+0x120/0x12c
 kthread from ret_from_fork+0x14/0x38
Exception stack(0xc0d8bfb0 to 0xc0d8bff8)

To fix it, do not rely on whether 'ptl' is equal to decide whether to hold
the pte lock, but decide it by whether CONFIG_SPLIT_PTE_PTLOCKS is
enabled. In addition, if two vmas map to the same PTE page, there is no
need to hold the pte lock again, otherwise a deadlock will occur. Just add
the need_lock parameter to let adjust_pte() know this information.

Reported-by: Ezra Buehler <ezra@easyb.ch>
Closes: https://lore.kernel.org/lkml/CAM1KZSmZ2T_riHvay+7cKEFxoPgeVpHkVFTzVVEQ1BO0cLkHEQ@mail.gmail.com/
Fixes: fc9c45b71f43 ("arm: adjust_pte() use pte_offset_map_rw_nolock()")
Cc: stable@vger.kernel.org
Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
---
 arch/arm/mm/fault-armv.c | 40 ++++++++++++++++++++++++++++------------
 1 file changed, 28 insertions(+), 12 deletions(-)
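As a supplement to the fix description, the following small user-space sketch shows the arithmetic behind the need_lock decision: the faulting address is rounded down to its PMD-sized (one PTE table's) boundary, and any alias address falling into the same range must not take the PTE lock again because the caller already holds it. The PMD_SHIFT value, the ALIGN_DOWN() macro, and the sample addresses here are simplified stand-ins for illustration, not the kernel's own definitions.

```c
/*
 * Stand-alone demo of the "same PTE table" check used to clear need_lock.
 * PMD_SHIFT/PMD_SIZE are hypothetical stand-ins (2 MiB of PTE-table
 * coverage); the kernel's real values depend on the configuration.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define PMD_SHIFT	21UL
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

/* Split PTE locks enabled in this example, mirroring IS_ENABLED(...). */
static bool split_pte_ptlocks = true;

static bool need_pte_lock(unsigned long addr, unsigned long mpnt_addr)
{
	unsigned long start = ALIGN_DOWN(addr, PMD_SIZE);
	bool need_lock = split_pte_ptlocks;

	/* Alias lives in the PTE table already locked by the caller: skip. */
	if (mpnt_addr >= start && mpnt_addr - start < PMD_SIZE)
		need_lock = false;
	return need_lock;
}

int main(void)
{
	unsigned long addr = 0x40123000UL;	/* faulting address (made up) */

	/* Alias in the same PMD-covered PTE table: lock is already held. */
	assert(!need_pte_lock(addr, 0x401ff000UL));
	/* Alias in a different PTE table: must take its split PTE lock. */
	assert(need_pte_lock(addr, 0x40600000UL));

	printf("need_lock decisions behave as expected\n");
	return 0;
}
```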