Message ID | 20221021230722.370587-1-mike.kravetz@oracle.com
---|---
State | New
Series | hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
On Fri, 2022-10-21 at 16:07 -0700, Mike Kravetz wrote:
> madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear the
> page tables associated with the address range.  For hugetlb vmas,
> zap_page_range will call __unmap_hugepage_range_final.  However,
> __unmap_hugepage_range_final assumes the passed vma is about to be
> removed and deletes the vma_lock to prevent pmd sharing as the vma is
> on the way out.  In the case of madvise(MADV_DONTNEED) the vma remains,
> but the missing vma_lock prevents pmd sharing and could potentially
> lead to issues with truncation/fault races.
>
> This issue was originally reported here [1] as a BUG triggered in
> page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb
> vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
> prevent pmd sharing.  Subsequent faults on this vma were confused as
> VM_MAYSHARE indicates a sharable vma, but was not set so page_mapping
> was not set in new pages added to the page table.  This resulted in
> pages that appeared anonymous in a VM_SHARED vma and triggered the BUG.
>
> Create a new routine clear_hugetlb_page_range() that can be called from
> madvise(MADV_DONTNEED) for hugetlb vmas.  It has the same setup as
> zap_page_range, but does not delete the vma_lock.
>
Reviewed-by: Rik van Riel <riel@surriel.com>
Hi Mike,

I love your patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]
[also build test ERROR on linus/master v6.1-rc1 next-20221021]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Mike-Kravetz/hugetlb-don-t-delete-vma_lock-in-hugetlb-MADV_DONTNEED-processing/20221022-070934
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link:    https://lore.kernel.org/r/20221021230722.370587-1-mike.kravetz%40oracle.com
patch subject: [PATCH] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
config: powerpc-allnoconfig
compiler: powerpc-linux-gcc (GCC) 12.1.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/b52333ec636c58b31a006e7b4a0e6e9f1280ceaa
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Mike-Kravetz/hugetlb-don-t-delete-vma_lock-in-hugetlb-MADV_DONTNEED-processing/20221022-070934
        git checkout b52333ec636c58b31a006e7b4a0e6e9f1280ceaa
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/kernel/

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from arch/powerpc/kernel/setup-common.c:37:
>> include/linux/hugetlb.h:431:13: error: 'clear_hugetlb_page_range' defined but not used [-Werror=unused-function]
     431 | static void clear_hugetlb_page_range(struct vm_area_struct *vma,
         |             ^~~~~~~~~~~~~~~~~~~~~~~~
   cc1: all warnings being treated as errors


vim +/clear_hugetlb_page_range +431 include/linux/hugetlb.h

   430	
 > 431	static void clear_hugetlb_page_range(struct vm_area_struct *vma,
   432			unsigned long start, unsigned long end)
   433	{
   434	}
   435	
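For context, the error above comes from how the stub is declared rather than from the patch logic: a plain static (non-inline) function defined in a header gets its own copy in every C file that includes the header, and gcc's -Wunused-function warns in any translation unit that never calls it, which -Werror in the W=1 build turns into a hard error. A minimal sketch with made-up names, not the actual hugetlb.h code:

/* sketch.h -- illustrative only, hypothetical names */

/*
 * Plain static stub: every translation unit that includes this header
 * but never calls stub_plain() gets
 *   warning: 'stub_plain' defined but not used [-Wunused-function]
 * which a -Werror build (as in the W=1 report above) treats as an error.
 */
static void stub_plain(void)
{
}

/*
 * static inline stub: gcc does not warn about unused static inline
 * functions defined in headers, so this is the usual pattern for
 * config stubs.  Marking the plain static version __maybe_unused has
 * the same effect.
 */
static inline void stub_inline(void)
{
}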
On 10/22/22 10:45, kernel test robot wrote:
> Hi Mike,
>
> I love your patch! Yet something to improve:
>
> [auto build test ERROR on akpm-mm/mm-everything]
> [also build test ERROR on linus/master v6.1-rc1 next-20221021]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
>
> url:    https://github.com/intel-lab-lkp/linux/commits/Mike-Kravetz/hugetlb-don-t-delete-vma_lock-in-hugetlb-MADV_DONTNEED-processing/20221022-070934
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
> patch link:    https://lore.kernel.org/r/20221021230722.370587-1-mike.kravetz%40oracle.com
> patch subject: [PATCH] hugetlb: don't delete vma_lock in hugetlb MADV_DONTNEED processing
> config: powerpc-allnoconfig
> compiler: powerpc-linux-gcc (GCC) 12.1.0
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/b52333ec636c58b31a006e7b4a0e6e9f1280ceaa
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Mike-Kravetz/hugetlb-don-t-delete-vma_lock-in-hugetlb-MADV_DONTNEED-processing/20221022-070934
>         git checkout b52333ec636c58b31a006e7b4a0e6e9f1280ceaa
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-12.1.0 make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash arch/powerpc/kernel/
>
> If you fix the issue, kindly add following tag where applicable
> | Reported-by: kernel test robot <lkp@intel.com>
>
> All errors (new ones prefixed by >>):
>
>    In file included from arch/powerpc/kernel/setup-common.c:37:
> >> include/linux/hugetlb.h:431:13: error: 'clear_hugetlb_page_range' defined but not used [-Werror=unused-function]
>      431 | static void clear_hugetlb_page_range(struct vm_area_struct *vma,
>          |             ^~~~~~~~~~~~~~~~~~~~~~~~
>    cc1: all warnings being treated as errors

Wow!  I was unaware that madvise could be removed via a config option.

I will soon send a v2 with changes as follows:
- Use __maybe_unused on the static stub in the !CONFIG_HUGETLB_PAGE case
  in hugetlb.h.
- Wrap clear_hugetlb_page_range in hugetlb.c with #ifdef CONFIG_ADVISE_SYSCALLS
  so that it is not included if !CONFIG_ADVISE_SYSCALLS.
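A rough sketch of what the two changes described for v2 might look like; the actual v2 may well differ, so treat this as an illustration of the idea rather than the real follow-up patch:

/* include/linux/hugetlb.h, !CONFIG_HUGETLB_PAGE stub -- sketch only.
 * __maybe_unused silences -Wunused-function in files that include the
 * header but never call the stub.
 */
static void __maybe_unused clear_hugetlb_page_range(struct vm_area_struct *vma,
			unsigned long start, unsigned long end)
{
}

/* mm/hugetlb.c -- sketch only: if madvise() itself is not built
 * (CONFIG_ADVISE_SYSCALLS=n), clear_hugetlb_page_range() has no caller,
 * so compile it out entirely.
 */
#ifdef CONFIG_ADVISE_SYSCALLS
void clear_hugetlb_page_range(struct vm_area_struct *vma, unsigned long start,
				unsigned long end)
{
	/* body as in the patch below */
}
#endif /* CONFIG_ADVISE_SYSCALLS */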
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a899bc76d677..5cf2028ebad4 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -158,6 +158,8 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 void unmap_hugepage_range(struct vm_area_struct *,
 			  unsigned long, unsigned long, struct page *,
 			  zap_flags_t);
+void clear_hugetlb_page_range(struct vm_area_struct *vma,
+			unsigned long start, unsigned long end);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
@@ -426,6 +428,11 @@ static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 	BUG();
 }
 
+static void clear_hugetlb_page_range(struct vm_area_struct *vma,
+			unsigned long start, unsigned long end)
+{
+}
+
 static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 			struct vm_area_struct *vma, unsigned long address,
 			unsigned int flags)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 931789a8f734..3de717367e0a 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5202,27 +5202,63 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	tlb_flush_mmu_tlbonly(tlb);
 }
 
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+static void __unmap_hugepage_range_locking(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
 			  unsigned long end, struct page *ref_page,
-			  zap_flags_t zap_flags)
+			  zap_flags_t zap_flags, bool final)
 {
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 
 	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
-	/*
-	 * Unlock and free the vma lock before releasing i_mmap_rwsem.  When
-	 * the vma_lock is freed, this makes the vma ineligible for pmd
-	 * sharing.  And, i_mmap_rwsem is required to set up pmd sharing.
-	 * This is important as page tables for this unmapped range will
-	 * be asynchrously deleted.  If the page tables are shared, there
-	 * will be issues when accessed by someone else.
-	 */
-	__hugetlb_vma_unlock_write_free(vma);
+	if (final) {
+		/*
+		 * Unlock and free the vma lock before releasing i_mmap_rwsem.
+		 * When the vma_lock is freed, this makes the vma ineligible
+		 * for pmd sharing.  And, i_mmap_rwsem is required to set up
+		 * pmd sharing.  This is important as page tables for this
+		 * unmapped range will be asynchrously deleted.  If the page
+		 * tables are shared, there will be issues when accessed by
+		 * someone else.
+		 */
+		__hugetlb_vma_unlock_write_free(vma);
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+	} else {
+		i_mmap_unlock_write(vma->vm_file->f_mapping);
+		hugetlb_vma_unlock_write(vma);
+	}
+}
 
-	i_mmap_unlock_write(vma->vm_file->f_mapping);
+void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+			  struct vm_area_struct *vma, unsigned long start,
+			  unsigned long end, struct page *ref_page,
+			  zap_flags_t zap_flags)
+{
+	__unmap_hugepage_range_locking(tlb, vma, start, end, ref_page,
+					zap_flags, true);
+}
+
+/*
+ * Similar setup as in zap_page_range().  madvise(MADV_DONTNEED) can not call
+ * zap_page_range for hugetlb vmas as __unmap_hugepage_range_final will delete
+ * the associated vma_lock.
+ */
+void clear_hugetlb_page_range(struct vm_area_struct *vma, unsigned long start,
+				unsigned long end)
+{
+	struct mmu_notifier_range range;
+	struct mmu_gather tlb;
+
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+				start, end);
+	tlb_gather_mmu(&tlb, vma->vm_mm);
+	update_hiwater_rss(vma->vm_mm);
+
+	__unmap_hugepage_range_locking(&tlb, vma, start, end, NULL, 0, false);
+
+	mmu_notifier_invalidate_range_end(&range);
+	tlb_finish_mmu(&tlb);
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
diff --git a/mm/madvise.c b/mm/madvise.c
index 2baa93ca2310..90577a669635 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -790,7 +790,10 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
 static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
 					unsigned long start, unsigned long end)
 {
-	zap_page_range(vma, start, end - start);
+	if (!is_vm_hugetlb_page(vma))
+		zap_page_range(vma, start, end - start);
+	else
+		clear_hugetlb_page_range(vma, start, end);
 	return 0;
 }
 
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear the page
tables associated with the address range.  For hugetlb vmas, zap_page_range
will call __unmap_hugepage_range_final.  However,
__unmap_hugepage_range_final assumes the passed vma is about to be removed
and deletes the vma_lock to prevent pmd sharing as the vma is on the way out.
In the case of madvise(MADV_DONTNEED) the vma remains, but the missing
vma_lock prevents pmd sharing and could potentially lead to issues with
truncation/fault races.

This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap.  Prior to the introduction of the hugetlb vma_lock,
__unmap_hugepage_range_final cleared the VM_MAYSHARE flag to prevent pmd
sharing.  Subsequent faults on this vma were confused: VM_MAYSHARE indicates
a sharable vma, but it was no longer set, so page_mapping was not set in new
pages added to the page table.  This resulted in pages that appeared
anonymous in a VM_SHARED vma and triggered the BUG.

Create a new routine clear_hugetlb_page_range() that can be called from
madvise(MADV_DONTNEED) for hugetlb vmas.  It has the same setup as
zap_page_range, but does not delete the vma_lock.

[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6JudD0g@mail.gmail.com/

Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
---
Note: backports to stable will be somewhat different as hugetlb vma_lock
was added in 6.1.

 include/linux/hugetlb.h |  7 +++++
 mm/hugetlb.c            | 60 ++++++++++++++++++++++++++++++++---------
 mm/madvise.c            |  5 +++-
 3 files changed, 59 insertions(+), 13 deletions(-)
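To make the changed path concrete, here is a hypothetical userspace sequence (not from the patch or from the reproducer in [1]) that would now go through clear_hugetlb_page_range() instead of zap_page_range(): the vma survives the madvise() call, so it must keep its vma_lock for the later faults.

/* Hypothetical illustration only; assumes a 2MB default hugepage size and
 * that hugetlb pages have been reserved (e.g. via /proc/sys/vm/nr_hugepages).
 */
#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL * 1024 * 1024;		/* one huge page */
	char *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap(MAP_HUGETLB)");
		return 1;
	}

	memset(p, 1, len);		/* fault the huge page in */

	/* Zap the range; with this patch the hugetlb vma keeps its vma_lock. */
	madvise(p, len, MADV_DONTNEED);

	memset(p, 2, len);		/* fault again on the same, still-live vma */

	munmap(p, len);
	return 0;
}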