[v5,0/6] Fix some bugs related to rmap and dax

Message ID 20220318074529.5261-1-songmuchun@bytedance.com (mailing list archive)

Message

Muchun Song March 18, 2022, 7:45 a.m. UTC
This series is based on next-20220225.

Patches 1-2 fix a cache flush bug; they are placed in this series because
subsequent patches depend on those changes.  Patches 3-4 are preparation
for fixing a dax bug in patch 5.  Patch 6 is a code cleanup, since the
previous patch removes the usage of follow_invalidate_pte().

v5:
- Collect Reviewed-by from Dan Williams.
- Fix panic reported by kernel test robot <oliver.sang@intel.com>.
- Remove pmdpp parameter from follow_invalidate_pte() and fold it into follow_pte().

v4:
- Fix compilation error on riscv.

v3:
- Based on next-20220225.

v2:
- Avoid overly long lines in many places, as suggested by Christoph.
- Fix a compiler warning reported by the kernel test robot: pmd_pfn()
  is not defined when !CONFIG_TRANSPARENT_HUGEPAGE on the powerpc architecture.
- Split out a new patch 4 in preparation for fixing the dax bug.

Muchun Song (6):
  mm: rmap: fix cache flush on THP pages
  dax: fix cache flush on PMD-mapped pages
  mm: rmap: introduce pfn_mkclean_range() to cleans PTEs
  mm: pvmw: add support for walking devmap pages
  dax: fix missing writeprotect the pte entry
  mm: simplify follow_invalidate_pte()

 fs/dax.c             | 82 +++++-----------------------------------------------
 include/linux/mm.h   |  3 --
 include/linux/rmap.h |  3 ++
 mm/internal.h        | 26 +++++++++++------
 mm/memory.c          | 81 +++++++++++++++------------------------------------
 mm/page_vma_mapped.c | 16 +++++-----
 mm/rmap.c            | 68 +++++++++++++++++++++++++++++++++++--------
 7 files changed, 114 insertions(+), 165 deletions(-)

Comments

Qian Cai March 31, 2022, 3:55 p.m. UTC | #1
On Fri, Mar 18, 2022 at 03:45:23PM +0800, Muchun Song wrote:
> This series is based on next-20220225.
> 
> Patches 1-2 fix a cache flush bug; they are placed in this series because
> subsequent patches depend on those changes.  Patches 3-4 are preparation
> for fixing a dax bug in patch 5.  Patch 6 is a code cleanup, since the
> previous patch removes the usage of follow_invalidate_pte().

Reverting this series fixed boot crashes.

 KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
 Mem abort info:
   ESR = 0x96000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
   FSC = 0x04: level 0 translation fault
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 [dfff800000000003] address between user and kernel address ranges
 Internal error: Oops: 96000004 [#1] PREEMPT SMP
 Modules linked in: cdc_ether usbnet ipmi_devintf ipmi_msghandler cppc_cpufreq fuse ip_tables x_tables ipv6 btrfs blake2b_generic libcrc32c xor xor_neon raid6_pq zstd_compress dm_mod nouveau crct10dif_ce drm_ttm_helper mlx5_core ttm drm_dp_helper drm_kms_helper nvme mpt3sas nvme_core xhci_pci raid_class drm xhci_pci_renesas
 CPU: 3 PID: 1707 Comm: systemd-udevd Not tainted 5.17.0-next-20220331-00004-g2d550916a6b9 #51
 pstate: 104000c9 (nzcV daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : __lock_acquire
 lr : lock_acquire.part.0
 sp : ffff800030a16fd0
 x29: ffff800030a16fd0 x28: ffffdd876c4e9f90 x27: 0000000000000018
 x26: 0000000000000000 x25: 0000000000000018 x24: 0000000000000000
 x23: ffff08022beacf00 x22: ffffdd8772507660 x21: 0000000000000000
 x20: 0000000000000000 x19: 0000000000000000 x18: ffffdd8772417d2c
 x17: ffffdd876c5bc2e0 x16: 1fffe100457d5b06 x15: 0000000000000094
 x14: 000000000000f1f1 x13: 00000000f3f3f3f3 x12: ffff08022beacf08
 x11: 1ffffbb0ee482fa5 x10: ffffdd8772417d28 x9 : 0000000000000000
 x8 : 0000000000000003 x7 : ffffdd876c4e9f90 x6 : 0000000000000000
 x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
 x2 : 0000000000000000 x1 : 0000000000000003 x0 : dfff800000000000
 Call trace:
  __lock_acquire
  lock_acquire.part.0
  lock_acquire
  _raw_spin_lock
  page_vma_mapped_walk
  try_to_migrate_one
  rmap_walk_anon
  try_to_migrate
  __unmap_and_move
  unmap_and_move
  migrate_pages
  migrate_misplaced_page
  do_huge_pmd_numa_page
  __handle_mm_fault
  handle_mm_fault
  do_translation_fault
  do_mem_abort
  el0_da
  el0t_64_sync_handler
  el0t_64_sync
 Code: d65f03c0 d343ff61 d2d00000 f2fbffe0 (38e06820)
 ---[ end trace 0000000000000000 ]---
 Kernel panic - not syncing: Oops: Fatal exception
 SMP: stopping secondary CPUs
 Kernel Offset: 0x5d8763da0000 from 0xffff800008000000
 PHYS_OFFSET: 0x80000000
 CPU features: 0x000,00085c0d,19801c82
 Memory Limit: none
 ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---
Andrew Morton March 31, 2022, 10:36 p.m. UTC | #2
On Thu, 31 Mar 2022 11:55:47 -0400 Qian Cai <quic_qiancai@quicinc.com> wrote:

> On Fri, Mar 18, 2022 at 03:45:23PM +0800, Muchun Song wrote:
> > This series is based on next-20220225.
> > 
> > Patches 1-2 fix a cache flush bug; they are placed in this series because
> > subsequent patches depend on those changes.  Patches 3-4 are preparation
> > for fixing a dax bug in patch 5.  Patch 6 is a code cleanup, since the
> > previous patch removes the usage of follow_invalidate_pte().
> 
> Reverting this series fixed boot crashes.
> 

Thanks.  I'll drop

mm-rmap-fix-cache-flush-on-thp-pages.patch
dax-fix-cache-flush-on-pmd-mapped-pages.patch
mm-rmap-introduce-pfn_mkclean_range-to-cleans-ptes.patch
mm-rmap-introduce-pfn_mkclean_range-to-cleans-ptes-fix.patch
mm-pvmw-add-support-for-walking-devmap-pages.patch
dax-fix-missing-writeprotect-the-pte-entry.patch
dax-fix-missing-writeprotect-the-pte-entry-v6.patch
mm-simplify-follow_invalidate_pte.patch
Stephen Rothwell March 31, 2022, 10:48 p.m. UTC | #3
Hi Andrew,

On Thu, 31 Mar 2022 15:36:04 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
> Thanks.  I'll drop
> 
> mm-rmap-fix-cache-flush-on-thp-pages.patch
> dax-fix-cache-flush-on-pmd-mapped-pages.patch
> mm-rmap-introduce-pfn_mkclean_range-to-cleans-ptes.patch
> mm-rmap-introduce-pfn_mkclean_range-to-cleans-ptes-fix.patch
> mm-pvmw-add-support-for-walking-devmap-pages.patch
> dax-fix-missing-writeprotect-the-pte-entry.patch
> dax-fix-missing-writeprotect-the-pte-entry-v6.patch
> mm-simplify-follow_invalidate_pte.patch

I have removed those and the 4 patches that I had to revert yesterday.
Muchun Song April 1, 2022, 3:44 a.m. UTC | #4
On Thu, Mar 31, 2022 at 11:55 PM Qian Cai <quic_qiancai@quicinc.com> wrote:
>
> On Fri, Mar 18, 2022 at 03:45:23PM +0800, Muchun Song wrote:
> > This series is based on next-20220225.
> >
> > Patches 1-2 fix a cache flush bug; they are placed in this series because
> > subsequent patches depend on those changes.  Patches 3-4 are preparation
> > for fixing a dax bug in patch 5.  Patch 6 is a code cleanup, since the
> > previous patch removes the usage of follow_invalidate_pte().
>
> Reverting this series fixed boot crashes.
>
>  KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
>  Mem abort info:
>    ESR = 0x96000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
>  [dfff800000000003] address between user and kernel address ranges
>  Internal error: Oops: 96000004 [#1] PREEMPT SMP
>  Modules linked in: cdc_ether usbnet ipmi_devintf ipmi_msghandler cppc_cpufreq fuse ip_tables x_tables ipv6 btrfs blake2b_generic libcrc32c xor xor_neon raid6_pq zstd_compress dm_mod nouveau crct10dif_ce drm_ttm_helper mlx5_core ttm drm_dp_helper drm_kms_helper nvme mpt3sas nvme_core xhci_pci raid_class drm xhci_pci_renesas
>  CPU: 3 PID: 1707 Comm: systemd-udevd Not tainted 5.17.0-next-20220331-00004-g2d550916a6b9 #51
>  pstate: 104000c9 (nzcV daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : __lock_acquire
>  lr : lock_acquire.part.0
>  sp : ffff800030a16fd0
>  x29: ffff800030a16fd0 x28: ffffdd876c4e9f90 x27: 0000000000000018
>  x26: 0000000000000000 x25: 0000000000000018 x24: 0000000000000000
>  x23: ffff08022beacf00 x22: ffffdd8772507660 x21: 0000000000000000
>  x20: 0000000000000000 x19: 0000000000000000 x18: ffffdd8772417d2c
>  x17: ffffdd876c5bc2e0 x16: 1fffe100457d5b06 x15: 0000000000000094
>  x14: 000000000000f1f1 x13: 00000000f3f3f3f3 x12: ffff08022beacf08
>  x11: 1ffffbb0ee482fa5 x10: ffffdd8772417d28 x9 : 0000000000000000
>  x8 : 0000000000000003 x7 : ffffdd876c4e9f90 x6 : 0000000000000000
>  x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
>  x2 : 0000000000000000 x1 : 0000000000000003 x0 : dfff800000000000
>  Call trace:
>   __lock_acquire
>   lock_acquire.part.0
>   lock_acquire
>   _raw_spin_lock
>   page_vma_mapped_walk
>   try_to_migrate_one
>   rmap_walk_anon
>   try_to_migrate
>   __unmap_and_move
>   unmap_and_move
>   migrate_pages
>   migrate_misplaced_page
>   do_huge_pmd_numa_page
>   __handle_mm_fault
>   handle_mm_fault
>   do_translation_fault
>   do_mem_abort
>   el0_da
>   el0t_64_sync_handler
>   el0t_64_sync
>  Code: d65f03c0 d343ff61 d2d00000 f2fbffe0 (38e06820)
>  ---[ end trace 0000000000000000 ]---
>  Kernel panic - not syncing: Oops: Fatal exception
>  SMP: stopping secondary CPUs
>  Kernel Offset: 0x5d8763da0000 from 0xffff800008000000
>  PHYS_OFFSET: 0x80000000
>  CPU features: 0x000,00085c0d,19801c82
>  Memory Limit: none
>  ---[ end Kernel panic - not syncing: Oops: Fatal exception ]---

Thanks for your report. Would you mind providing the .config?
Muchun Song April 1, 2022, 8:50 a.m. UTC | #5
On Fri, Apr 1, 2022 at 11:44 AM Muchun Song <songmuchun@bytedance.com> wrote:
>
> Thanks for your report. Would you mind providing the .config?

Hi Qian Cai,

Would you mind helping me test if the following patch works properly?
Thanks.

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index b3bf802a6435..3da82bf65de8 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -210,7 +210,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
                 */
                pmde = READ_ONCE(*pvmw->pmd);

-               if (pmd_leaf(pmde) || is_pmd_migration_entry(pmde)) {
+               if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) ||
+                   (pmd_present(pmde) && pmd_devmap(pmde))) {
                        pvmw->ptl = pmd_lock(mm, pvmw->pmd);
                        pmde = *pvmw->pmd;
                        if (!pmd_present(pmde)) {
Qian Cai April 1, 2022, 11:07 a.m. UTC | #6
On Fri, Apr 01, 2022 at 11:44:16AM +0800, Muchun Song wrote:
> Thanks for your report. Would you mind providing the .config?

$ make ARCH=arm64 defconfig debug.config
Muchun Song April 2, 2022, 3:22 p.m. UTC | #7
On Thu, Mar 31, 2022 at 11:55 PM Qian Cai <quic_qiancai@quicinc.com> wrote:
>
> On Fri, Mar 18, 2022 at 03:45:23PM +0800, Muchun Song wrote:
> > This series is based on next-20220225.
> >
> > Patches 1-2 fix a cache flush bug; they are placed in this series because
> > subsequent patches depend on those changes.  Patches 3-4 are preparation
> > for fixing a dax bug in patch 5.  Patch 6 is a code cleanup, since the
> > previous patch removes the usage of follow_invalidate_pte().
>
> Reverting this series fixed boot crashes.
>
>  KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
>  Mem abort info:
>    ESR = 0x96000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>    FSC = 0x04: level 0 translation fault
>  Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
>  [dfff800000000003] address between user and kernel address ranges
>  Internal error: Oops: 96000004 [#1] PREEMPT SMP
>  Modules linked in: cdc_ether usbnet ipmi_devintf ipmi_msghandler cppc_cpufreq fuse ip_tables x_tables ipv6 btrfs blake2b_generic libcrc32c xor xor_neon raid6_pq zstd_compress dm_mod nouveau crct10dif_ce drm_ttm_helper mlx5_core ttm drm_dp_helper drm_kms_helper nvme mpt3sas nvme_core xhci_pci raid_class drm xhci_pci_renesas
>  CPU: 3 PID: 1707 Comm: systemd-udevd Not tainted 5.17.0-next-20220331-00004-g2d550916a6b9 #51
>  pstate: 104000c9 (nzcV daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>  pc : __lock_acquire
>  lr : lock_acquire.part.0
>  sp : ffff800030a16fd0
>  x29: ffff800030a16fd0 x28: ffffdd876c4e9f90 x27: 0000000000000018
>  x26: 0000000000000000 x25: 0000000000000018 x24: 0000000000000000
>  x23: ffff08022beacf00 x22: ffffdd8772507660 x21: 0000000000000000
>  x20: 0000000000000000 x19: 0000000000000000 x18: ffffdd8772417d2c
>  x17: ffffdd876c5bc2e0 x16: 1fffe100457d5b06 x15: 0000000000000094
>  x14: 000000000000f1f1 x13: 00000000f3f3f3f3 x12: ffff08022beacf08
>  x11: 1ffffbb0ee482fa5 x10: ffffdd8772417d28 x9 : 0000000000000000
>  x8 : 0000000000000003 x7 : ffffdd876c4e9f90 x6 : 0000000000000000
>  x5 : 0000000000000000 x4 : 0000000000000001 x3 : 0000000000000000
>  x2 : 0000000000000000 x1 : 0000000000000003 x0 : dfff800000000000
>  Call trace:
>   __lock_acquire
>   lock_acquire.part.0
>   lock_acquire
>   _raw_spin_lock
>   page_vma_mapped_walk
>   try_to_migrate_one
>   rmap_walk_anon
>   try_to_migrate
>   __unmap_and_move
>   unmap_and_move
>   migrate_pages
>   migrate_misplaced_page
>   do_huge_pmd_numa_page
>   __handle_mm_fault
>   handle_mm_fault
>   do_translation_fault
>   do_mem_abort
>   el0_da
>   el0t_64_sync_handler
>   el0t_64_sync
>  Code: d65f03c0 d343ff61 d2d00000 f2fbffe0 (38e06820)

Hi,

I have found the root cause: the implementation of pmd_leaf() on arm64
is wrong.  It does not account for PROT_NONE-mapped PMDs, so it does not
behave the way callers of pmd_leaf() expect.  I'll send a fix for arm64
along the lines of the following.

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 94e147e5456c..09eaae46a19b 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -535,7 +535,7 @@ extern pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
                                 PMD_TYPE_TABLE)
 #define pmd_sect(pmd)          ((pmd_val(pmd) & PMD_TYPE_MASK) == \
                                 PMD_TYPE_SECT)
-#define pmd_leaf(pmd)          pmd_sect(pmd)
+#define pmd_leaf(pmd)          (pmd_present(pmd) && !(pmd_val(pmd) & PMD_TABLE_BIT))
 #define pmd_bad(pmd)           (!pmd_table(pmd))

 #define pmd_leaf_size(pmd)     (pmd_cont(pmd) ? CONT_PMD_SIZE : PMD_SIZE)

Thanks.