
[2/2] mm: pvmw: check possible huge PMD map by transhuge_vma_suitable()

Message ID 20220512174551.81279-2-shy828301@gmail.com (mailing list archive)
State New
Series [1/2] mm: rmap: use the correct parameter name for DEFINE_PAGE_VMA_WALK

Commit Message

Yang Shi May 12, 2022, 5:45 p.m. UTC
IIUC PVMW checks whether the VMA is possibly huge-PMD mapped by
transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".

Actually pvmw->nr_pages is returned by compound_nr() or
folio_nr_pages(), so the page must be a THP as long as "pvmw->nr_pages
>= HPAGE_PMD_NR", and it is guaranteed that a THP is only allocated for
a suitable VMA in the first place.  But it may not be PMD mapped if the
VMA is a file VMA that is not properly aligned.  transhuge_vma_suitable()
does exactly this check, so replace transparent_hugepage_active() with
it; the former is heavier and overkill here.

Fixes: 2aff7a4755be ("mm: Convert page_vma_mapped_walk to work on PFNs")
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
 mm/page_vma_mapped.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
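
For context, transhuge_vma_suitable() is a cheap per-VMA alignment check rather
than a full THP-eligibility check.  A rough paraphrase of its definition in
include/linux/huge_mm.h around the time of this series (the tree has the
authoritative version):

    static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
                                              unsigned long haddr)
    {
            /* File VMAs must have a PMD-aligned offset relative to the file */
            if (!vma_is_anonymous(vma)) {
                    if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
                                    HPAGE_PMD_NR))
                            return false;
            }

            /* The whole aligned huge-page range must fall inside the VMA */
            if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
                    return false;
            return true;
    }

By contrast, transparent_hugepage_active() also consults the THP sysfs settings
and per-VMA flags, which is unnecessary here: if nr_pages >= HPAGE_PMD_NR a THP
has already been allocated, and the only remaining question is whether this
particular VMA can map it with a PMD.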

Comments

Andrew Morton May 12, 2022, 11:39 p.m. UTC | #1
On Thu, 12 May 2022 10:45:51 -0700 Yang Shi <shy828301@gmail.com> wrote:

> IIUC PVMW checks whether the VMA is possibly huge-PMD mapped by
> transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
> 
> Actually pvmw->nr_pages is returned by compound_nr() or
> folio_nr_pages(), so the page must be a THP as long as "pvmw->nr_pages
> >= HPAGE_PMD_NR", and it is guaranteed that a THP is only allocated for
> a suitable VMA in the first place.  But it may not be PMD mapped if the
> VMA is a file VMA that is not properly aligned.  transhuge_vma_suitable()
> does exactly this check, so replace transparent_hugepage_active() with
> it; the former is heavier and overkill here.
> 
> ...
>
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -237,13 +237,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>  			spin_unlock(pvmw->ptl);
>  			pvmw->ptl = NULL;
>  		} else if (!pmd_present(pmde)) {
> +			unsigned long haddr = pvmw->address & HPAGE_PMD_MASK;

This hits

	#define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })

when CONFIG_TRANSPARENT_HUGEPAGE=n (x86_64 allnoconfig).
Yang Shi May 13, 2022, 12:39 a.m. UTC | #2
Andrew Morton <akpm@linux-foundation.org> wrote on Thursday, May 12, 2022 at 4:39 PM:

> On Thu, 12 May 2022 10:45:51 -0700 Yang Shi <shy828301@gmail.com> wrote:
>
> > IIUC PVMW checks whether the VMA is possibly huge-PMD mapped by
> > transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
> >
> > Actually pvmw->nr_pages is returned by compound_nr() or
> > folio_nr_pages(), so the page must be a THP as long as "pvmw->nr_pages
> > >= HPAGE_PMD_NR", and it is guaranteed that a THP is only allocated for
> > a suitable VMA in the first place.  But it may not be PMD mapped if the
> > VMA is a file VMA that is not properly aligned.  transhuge_vma_suitable()
> > does exactly this check, so replace transparent_hugepage_active() with
> > it; the former is heavier and overkill here.
> >
> > ...
> >
> > --- a/mm/page_vma_mapped.c
> > +++ b/mm/page_vma_mapped.c
> > @@ -237,13 +237,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> >                       spin_unlock(pvmw->ptl);
> >                       pvmw->ptl = NULL;
> >               } else if (!pmd_present(pmde)) {
> > +                     unsigned long haddr = pvmw->address & HPAGE_PMD_MASK;
>
> This hits
>
>         #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
>
> when CONFIG_TRANSPARENT_HUGEPAGE=n (x86_64 allnoconfig).


Thanks for catching this. I think the best way is to round the address inside
transhuge_vma_suitable(), which is guarded by the config.

Will prepare v2 soon.


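A minimal sketch of that direction (hypothetical, not the posted v2): let the
helper accept an unrounded address and do the masking itself, so HPAGE_PMD_MASK
is only expanded under CONFIG_TRANSPARENT_HUGEPAGE and the PVMW_SYNC branch can
pass pvmw->address directly:

    /*
     * Hypothetical sketch only: round inside the helper, which is defined
     * under CONFIG_TRANSPARENT_HUGEPAGE, so common code never expands
     * HPAGE_PMD_MASK.  The pgoff alignment check for file VMAs stays as-is.
     */
    static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
                                              unsigned long addr)
    {
            unsigned long haddr = addr & HPAGE_PMD_MASK;

            /* ... existing vm_pgoff alignment check for file VMAs ... */

            return haddr >= vma->vm_start &&
                   haddr + HPAGE_PMD_SIZE <= vma->vm_end;
    }

The caller in page_vma_mapped_walk() would then drop the local haddr variable
and call transhuge_vma_suitable(vma, pvmw->address).
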
kernel test robot May 13, 2022, 1:41 a.m. UTC | #3
Hi Yang,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on akpm-mm/mm-everything]

url:    https://github.com/intel-lab-lkp/linux/commits/Yang-Shi/mm-rmap-use-the-correct-parameter-name-for-DEFINE_PAGE_VMA_WALK/20220513-015301
base:   https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
config: mips-maltaup_defconfig (https://download.01.org/0day-ci/archive/20220513/202205130928.ThdARQ6I-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 9519dacab7b8afd537811fc2abaceb4d14f4e16a)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install mips cross compiling tool for clang build
        # apt-get install binutils-mips-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/66065a2fef73252d47341b46d9ea946b325da354
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Yang-Shi/mm-rmap-use-the-correct-parameter-name-for-DEFINE_PAGE_VMA_WALK/20220513-015301
        git checkout 66065a2fef73252d47341b46d9ea946b325da354
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=mips SHELL=/bin/bash

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/page_vma_mapped.c:240:42: error: call to __compiletime_assert_345 declared with 'error' attribute: BUILD_BUG failed
                           unsigned long haddr = pvmw->address & HPAGE_PMD_MASK;
                                                                 ^
   include/linux/huge_mm.h:322:27: note: expanded from macro 'HPAGE_PMD_MASK'
   #define HPAGE_PMD_MASK ({ BUILD_BUG(); 0; })
                             ^
   include/linux/build_bug.h:59:21: note: expanded from macro 'BUILD_BUG'
   #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
                       ^
   include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
   #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
                                       ^
   note: (skipping 1 expansions in backtrace; use -fmacro-backtrace-limit=0 to see all)
   include/linux/compiler_types.h:340:2: note: expanded from macro '_compiletime_assert'
           __compiletime_assert(condition, msg, prefix, suffix)
           ^
   include/linux/compiler_types.h:333:4: note: expanded from macro '__compiletime_assert'
                           prefix ## suffix();                             \
                           ^
   <scratch space>:19:1: note: expanded from here
   __compiletime_assert_345
   ^
   1 error generated.


vim +/error +240 mm/page_vma_mapped.c

   126	
   127	/**
   128	 * page_vma_mapped_walk - check if @pvmw->pfn is mapped in @pvmw->vma at
   129	 * @pvmw->address
   130	 * @pvmw: pointer to struct page_vma_mapped_walk. page, vma, address and flags
   131	 * must be set. pmd, pte and ptl must be NULL.
   132	 *
   133	 * Returns true if the page is mapped in the vma. @pvmw->pmd and @pvmw->pte point
   134	 * to relevant page table entries. @pvmw->ptl is locked. @pvmw->address is
   135	 * adjusted if needed (for PTE-mapped THPs).
   136	 *
   137	 * If @pvmw->pmd is set but @pvmw->pte is not, you have found PMD-mapped page
   138	 * (usually THP). For PTE-mapped THP, you should run page_vma_mapped_walk() in
   139	 * a loop to find all PTEs that map the THP.
   140	 *
   141	 * For HugeTLB pages, @pvmw->pte is set to the relevant page table entry
   142	 * regardless of which page table level the page is mapped at. @pvmw->pmd is
   143	 * NULL.
   144	 *
   145	 * Returns false if there are no more page table entries for the page in
   146	 * the vma. @pvmw->ptl is unlocked and @pvmw->pte is unmapped.
   147	 *
   148	 * If you need to stop the walk before page_vma_mapped_walk() returned false,
   149	 * use page_vma_mapped_walk_done(). It will do the housekeeping.
   150	 */
   151	bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
   152	{
   153		struct vm_area_struct *vma = pvmw->vma;
   154		struct mm_struct *mm = vma->vm_mm;
   155		unsigned long end;
   156		pgd_t *pgd;
   157		p4d_t *p4d;
   158		pud_t *pud;
   159		pmd_t pmde;
   160	
   161		/* The only possible pmd mapping has been handled on last iteration */
   162		if (pvmw->pmd && !pvmw->pte)
   163			return not_found(pvmw);
   164	
   165		if (unlikely(is_vm_hugetlb_page(vma))) {
   166			struct hstate *hstate = hstate_vma(vma);
   167			unsigned long size = huge_page_size(hstate);
   168			/* The only possible mapping was handled on last iteration */
   169			if (pvmw->pte)
   170				return not_found(pvmw);
   171	
   172			/* when pud is not present, pte will be NULL */
   173			pvmw->pte = huge_pte_offset(mm, pvmw->address, size);
   174			if (!pvmw->pte)
   175				return false;
   176	
   177			pvmw->ptl = huge_pte_lockptr(hstate, mm, pvmw->pte);
   178			spin_lock(pvmw->ptl);
   179			if (!check_pte(pvmw))
   180				return not_found(pvmw);
   181			return true;
   182		}
   183	
   184		end = vma_address_end(pvmw);
   185		if (pvmw->pte)
   186			goto next_pte;
   187	restart:
   188		do {
   189			pgd = pgd_offset(mm, pvmw->address);
   190			if (!pgd_present(*pgd)) {
   191				step_forward(pvmw, PGDIR_SIZE);
   192				continue;
   193			}
   194			p4d = p4d_offset(pgd, pvmw->address);
   195			if (!p4d_present(*p4d)) {
   196				step_forward(pvmw, P4D_SIZE);
   197				continue;
   198			}
   199			pud = pud_offset(p4d, pvmw->address);
   200			if (!pud_present(*pud)) {
   201				step_forward(pvmw, PUD_SIZE);
   202				continue;
   203			}
   204	
   205			pvmw->pmd = pmd_offset(pud, pvmw->address);
   206			/*
   207			 * Make sure the pmd value isn't cached in a register by the
   208			 * compiler and used as a stale value after we've observed a
   209			 * subsequent update.
   210			 */
   211			pmde = READ_ONCE(*pvmw->pmd);
   212	
   213			if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) ||
   214			    (pmd_present(pmde) && pmd_devmap(pmde))) {
   215				pvmw->ptl = pmd_lock(mm, pvmw->pmd);
   216				pmde = *pvmw->pmd;
   217				if (!pmd_present(pmde)) {
   218					swp_entry_t entry;
   219	
   220					if (!thp_migration_supported() ||
   221					    !(pvmw->flags & PVMW_MIGRATION))
   222						return not_found(pvmw);
   223					entry = pmd_to_swp_entry(pmde);
   224					if (!is_migration_entry(entry) ||
   225					    !check_pmd(swp_offset(entry), pvmw))
   226						return not_found(pvmw);
   227					return true;
   228				}
   229				if (likely(pmd_trans_huge(pmde) || pmd_devmap(pmde))) {
   230					if (pvmw->flags & PVMW_MIGRATION)
   231						return not_found(pvmw);
   232					if (!check_pmd(pmd_pfn(pmde), pvmw))
   233						return not_found(pvmw);
   234					return true;
   235				}
   236				/* THP pmd was split under us: handle on pte level */
   237				spin_unlock(pvmw->ptl);
   238				pvmw->ptl = NULL;
   239			} else if (!pmd_present(pmde)) {
 > 240				unsigned long haddr = pvmw->address & HPAGE_PMD_MASK;
   241				/*
   242				 * If PVMW_SYNC, take and drop THP pmd lock so that we
   243				 * cannot return prematurely, while zap_huge_pmd() has
   244				 * cleared *pmd but not decremented compound_mapcount().
   245				 */
   246				if ((pvmw->flags & PVMW_SYNC) &&
   247				    transhuge_vma_suitable(vma, haddr) &&
   248				    (pvmw->nr_pages >= HPAGE_PMD_NR)) {
   249					spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
   250	
   251					spin_unlock(ptl);
   252				}
   253				step_forward(pvmw, PMD_SIZE);
   254				continue;
   255			}
   256			if (!map_pte(pvmw))
   257				goto next_pte;
   258	this_pte:
   259			if (check_pte(pvmw))
   260				return true;
   261	next_pte:
   262			do {
   263				pvmw->address += PAGE_SIZE;
   264				if (pvmw->address >= end)
   265					return not_found(pvmw);
   266				/* Did we cross page table boundary? */
   267				if ((pvmw->address & (PMD_SIZE - PAGE_SIZE)) == 0) {
   268					if (pvmw->ptl) {
   269						spin_unlock(pvmw->ptl);
   270						pvmw->ptl = NULL;
   271					}
   272					pte_unmap(pvmw->pte);
   273					pvmw->pte = NULL;
   274					goto restart;
   275				}
   276				pvmw->pte++;
   277				if ((pvmw->flags & PVMW_SYNC) && !pvmw->ptl) {
   278					pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
   279					spin_lock(pvmw->ptl);
   280				}
   281			} while (pte_none(*pvmw->pte));
   282	
   283			if (!pvmw->ptl) {
   284				pvmw->ptl = pte_lockptr(mm, pvmw->pmd);
   285				spin_lock(pvmw->ptl);
   286			}
   287			goto this_pte;
   288		} while (pvmw->address < end);
   289	
   290		return false;
   291	}
   292
Matthew Wilcox May 13, 2022, 3:30 a.m. UTC | #4
On Thu, May 12, 2022 at 10:45:51AM -0700, Yang Shi wrote:
> IIUC PVMW checks whether the VMA is possibly huge-PMD mapped by
> transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
> 
> Actually pvmw->nr_pages is returned by compound_nr() or
> folio_nr_pages(), so the page must be a THP as long as "pvmw->nr_pages
> >= HPAGE_PMD_NR", and it is guaranteed that a THP is only allocated for
> a suitable VMA in the first place.  But it may not be PMD mapped if the
> VMA is a file VMA that is not properly aligned.  transhuge_vma_suitable()
> does exactly this check, so replace transparent_hugepage_active() with
> it; the former is heavier and overkill here.
> 
> Fixes: 2aff7a4755be ("mm: Convert page_vma_mapped_walk to work on PFNs")

I think Fixes is a bit much.  There's no bug being fixed here.  This is
just an optimisation.  Is it an important optimisation?  We could put a
bool into page_vma_mapped_walk() so we only have to ask the page whether
it's PMD-mappable once per walk rather than for each VMA.
Yang Shi May 13, 2022, 3:51 a.m. UTC | #5
On Thu, May 12, 2022 at 8:30 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, May 12, 2022 at 10:45:51AM -0700, Yang Shi wrote:
> > IIUC PVMW checks if the vma is possibly huge PMD mapped by
> > transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
> >
> > Actually pvmw->nr_pages is returned by compound_nr() or
> > folio_nr_pages(), so the page should be THP as long as "pvmw->nr_pages
> > >= HPAGE_PMD_NR".  And it is guaranteed THP is allocated for valid VMA
> > in the first place.  But it may be not PMD mapped if the VMA is file
> > VMA and it is not properly aligned.  The transhuge_vma_suitable()
> > is used to do such check, so replace transparent_hugepage_active() to
> > it, which is too heavy and overkilling.
> >
> > Fixes: 2aff7a4755be ("mm: Convert page_vma_mapped_walk to work on PFNs")
>
> I think Fixes is a bit much.  There's no bug being fixed here.  This is
> just an optimisation.  Is it an important optimisation?  We could put a

Yeah, it is just an optimization; I will remove the Fixes tag.

I'm trying to do some cleanup of all the transhuge_page_* checks, as
suggested by Vlastimil. I should be able to kill
transparent_hugepage_active() by replacing it with
transhuge_vma_suitable() here.

> bool into page_vma_mapped_walk() so we only have to ask the page whether
> it's PMD-mappable once per walk rather than for each VMA.

The page may be PMD-mappable for one VMA, but not for another VMA.

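To make that concrete with a hypothetical userspace example (not from the
thread): the same file-backed THP can sit behind two mappings with different
alignment, so PMD suitability has to be judged per VMA:

    #include <sys/mman.h>

    /*
     * Hypothetical illustration: map the same file twice.  The first mapping
     * can be PMD-mapped when the kernel picks a 2MB-aligned address, because
     * its file offset is 2MB-aligned.  The second starts at file offset 4096,
     * so its VMA can only ever be PTE-mapped, even though the folio behind
     * both mappings may be the same THP.
     */
    static void map_twice(int fd)
    {
            void *a = mmap(NULL, 4UL << 20, PROT_READ, MAP_SHARED, fd, 0);
            void *b = mmap(NULL, 4UL << 20, PROT_READ, MAP_SHARED, fd, 4096);

            (void)a;
            (void)b;
    }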

Patch

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c10f839fc410..2634565be175 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -237,13 +237,14 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			spin_unlock(pvmw->ptl);
 			pvmw->ptl = NULL;
 		} else if (!pmd_present(pmde)) {
+			unsigned long haddr = pvmw->address & HPAGE_PMD_MASK;
 			/*
 			 * If PVMW_SYNC, take and drop THP pmd lock so that we
 			 * cannot return prematurely, while zap_huge_pmd() has
 			 * cleared *pmd but not decremented compound_mapcount().
 			 */
 			if ((pvmw->flags & PVMW_SYNC) &&
-			    transparent_hugepage_active(vma) &&
+			    transhuge_vma_suitable(vma, haddr) &&
 			    (pvmw->nr_pages >= HPAGE_PMD_NR)) {
 				spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);