diff mbox series

[1/4] hugetlb: skip to end of PT page mapping when pte not present

Message ID 20220616210518.125287-2-mike.kravetz@oracle.com (mailing list archive)
State Superseded
Headers show
Series hugetlb: speed up linear address scanning | expand

Commit Message

Mike Kravetz June 16, 2022, 9:05 p.m. UTC
HugeTLB address ranges are linearly scanned during fork, unmap and
remap operations.  If a non-present entry is encountered, the code
currently continues to the next huge page aligned address.  However,
a non-present entry implies that the page table page for that entry
is not present.  Therefore, the linear scan can skip to the end of
range mapped by the page table page.  This can speed operations on
large sparsely populated hugetlb mappings.

Create a new routine hugetlb_mask_last_page() that will return an
address mask.  When the mask is ORed with an address, the result
will be the address of the last huge page mapped by the associated
page table page.  Use this mask to update addresses in routines which
linearly scan hugetlb address ranges when a non-present pte is
encountered.

hugetlb_mask_last_page is related to the implementation of
huge_pte_offset as hugetlb_mask_last_page is called when huge_pte_offset
returns NULL.  This patch only provides a complete hugetlb_mask_last_page
implementation when CONFIG_ARCH_WANT_GENERAL_HUGETLB is defined.
Architectures which provide their own versions of huge_pte_offset can also
provide their own version of hugetlb_mask_last_page.

Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            | 62 +++++++++++++++++++++++++++++++++++++----
 2 files changed, 58 insertions(+), 5 deletions(-)

Comments

Muchun Song June 17, 2022, 8:13 a.m. UTC | #1
On Thu, Jun 16, 2022 at 02:05:15PM -0700, Mike Kravetz wrote:
> HugeTLB address ranges are linearly scanned during fork, unmap and
> remap operations.  If a non-present entry is encountered, the code
> currently continues to the next huge page aligned address.  However,
> a non-present entry implies that the page table page for that entry
> is not present.  Therefore, the linear scan can skip to the end of
> range mapped by the page table page.  This can speed operations on
> large sparsely populated hugetlb mappings.
> 
> Create a new routine hugetlb_mask_last_page() that will return an
> address mask.  When the mask is ORed with an address, the result
> will be the address of the last huge page mapped by the associated
> page table page.  Use this mask to update addresses in routines which
> linearly scan hugetlb address ranges when a non-present pte is
> encountered.
> 
> hugetlb_mask_last_page is related to the implementation of
> huge_pte_offset as hugetlb_mask_last_page is called when huge_pte_offset
> returns NULL.  This patch only provides a complete hugetlb_mask_last_page
> implementation when CONFIG_ARCH_WANT_GENERAL_HUGETLB is defined.
> Architectures which provide their own versions of huge_pte_offset can also
> provide their own version of hugetlb_mask_last_page.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> Tested-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

It'll be more efficient, Thanks.

Acked-by: Muchun Song <songmuchun@bytedance.com>
kernel test robot June 17, 2022, 11:26 a.m. UTC | #2
Hi Mike,

I love your patch! Yet something to improve:

[auto build test ERROR on soc/for-next]
[also build test ERROR on linus/master v5.19-rc2 next-20220617]
[cannot apply to arm64/for-next/core arm/for-next kvmarm/next xilinx-xlnx/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Mike-Kravetz/hugetlb-speed-up-linear-address-scanning/20220617-050726
base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
config: i386-randconfig-a002 (https://download.01.org/0day-ci/archive/20220617/202206171929.ZIUrNg6p-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project f0e608de27b3d568000046eebf3712ab542979d6)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/intel-lab-lkp/linux/commit/4c647687607f10fece04967b8180c0dadaf765e6
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Mike-Kravetz/hugetlb-speed-up-linear-address-scanning/20220617-050726
        git checkout 4c647687607f10fece04967b8180c0dadaf765e6
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

>> mm/hugetlb.c:6901:7: error: duplicate case value '4194304'
           case PUD_SIZE:
                ^
   include/asm-generic/pgtable-nopud.h:20:20: note: expanded from macro 'PUD_SIZE'
   #define PUD_SIZE        (1UL << PUD_SHIFT)
                           ^
   mm/hugetlb.c:6899:7: note: previous case defined here
           case P4D_SIZE:
                ^
   include/asm-generic/pgtable-nop4d.h:13:19: note: expanded from macro 'P4D_SIZE'
   #define P4D_SIZE                (1UL << P4D_SHIFT)
                                   ^
   mm/hugetlb.c:6903:7: error: duplicate case value '4194304'
           case PMD_SIZE:
                ^
   include/asm-generic/pgtable-nopmd.h:22:20: note: expanded from macro 'PMD_SIZE'
   #define PMD_SIZE        (1UL << PMD_SHIFT)
                           ^
   mm/hugetlb.c:6901:7: note: previous case defined here
           case PUD_SIZE:
                ^
   include/asm-generic/pgtable-nopud.h:20:20: note: expanded from macro 'PUD_SIZE'
   #define PUD_SIZE        (1UL << PUD_SHIFT)
                           ^
   2 errors generated.


vim +/4194304 +6901 mm/hugetlb.c

  6886	
  6887	/*
  6888	 * Return a mask that can be used to update an address to the last huge
  6889	 * page in a page table page mapping size.  Used to skip non-present
  6890	 * page table entries when linearly scanning address ranges.  Architectures
  6891	 * with unique huge page to page table relationships can define their own
  6892	 * version of this routine.
  6893	 */
  6894	unsigned long hugetlb_mask_last_page(struct hstate *h)
  6895	{
  6896		unsigned long hp_size = huge_page_size(h);
  6897	
  6898		switch (hp_size) {
  6899		case P4D_SIZE:
  6900			return PGDIR_SIZE - P4D_SIZE;
> 6901		case PUD_SIZE:
  6902			return P4D_SIZE - PUD_SIZE;
  6903		case PMD_SIZE:
  6904			return PUD_SIZE - PMD_SIZE;
  6905		default:
  6906			break; /* Should never happen */
  6907		}
  6908	
  6909		return ~(0UL);
  6910	}
  6911
Peter Xu June 17, 2022, 2:15 p.m. UTC | #3
Hi, Mike,

On Thu, Jun 16, 2022 at 02:05:15PM -0700, Mike Kravetz wrote:
> @@ -6877,6 +6896,39 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>  	return (pte_t *)pmd;
>  }
>  
> +/*
> + * Return a mask that can be used to update an address to the last huge
> + * page in a page table page mapping size.  Used to skip non-present
> + * page table entries when linearly scanning address ranges.  Architectures
> + * with unique huge page to page table relationships can define their own
> + * version of this routine.
> + */
> +unsigned long hugetlb_mask_last_page(struct hstate *h)
> +{
> +	unsigned long hp_size = huge_page_size(h);
> +
> +	switch (hp_size) {
> +	case P4D_SIZE:
> +		return PGDIR_SIZE - P4D_SIZE;
> +	case PUD_SIZE:
> +		return P4D_SIZE - PUD_SIZE;
> +	case PMD_SIZE:
> +		return PUD_SIZE - PMD_SIZE;
> +	default:

Should we add a WARN_ON_ONCE() if it should never trigger?

> +		break; /* Should never happen */
> +	}
> +
> +	return ~(0UL);
> +}
> +
> +#else
> +
> +/* See description above.  Architectures can provide their own version. */
> +__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
> +{
> +	return ~(0UL);

I'm wondering whether it's better to return 0 rather than ~0 by default.
Could an arch with !CONFIG_ARCH_WANT_GENERAL_HUGETLB wrongly skip some
valid address ranges with ~0, or perhaps I misread?

Thanks,
Geert Uytterhoeven June 17, 2022, 3:26 p.m. UTC | #4
Hi Peter,

On Fri, Jun 17, 2022 at 4:22 PM Peter Xu <peterx@redhat.com> wrote:
> On Thu, Jun 16, 2022 at 02:05:15PM -0700, Mike Kravetz wrote:
> > @@ -6877,6 +6896,39 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> >       return (pte_t *)pmd;
> >  }
> >
> > +/*
> > + * Return a mask that can be used to update an address to the last huge
> > + * page in a page table page mapping size.  Used to skip non-present
> > + * page table entries when linearly scanning address ranges.  Architectures
> > + * with unique huge page to page table relationships can define their own
> > + * version of this routine.
> > + */
> > +unsigned long hugetlb_mask_last_page(struct hstate *h)
> > +{
> > +     unsigned long hp_size = huge_page_size(h);
> > +
> > +     switch (hp_size) {
> > +     case P4D_SIZE:
> > +             return PGDIR_SIZE - P4D_SIZE;
> > +     case PUD_SIZE:
> > +             return P4D_SIZE - PUD_SIZE;
> > +     case PMD_SIZE:
> > +             return PUD_SIZE - PMD_SIZE;
> > +     default:
>
> Should we add a WARN_ON_ONCE() if it should never trigger?

And with panic_on_warn, it'll panic only once ;-)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
kernel test robot June 17, 2022, 5:06 p.m. UTC | #5
Hi Mike,

I love your patch! Yet something to improve:

[auto build test ERROR on soc/for-next]
[also build test ERROR on linus/master v5.19-rc2 next-20220617]
[cannot apply to arm64/for-next/core arm/for-next kvmarm/next xilinx-xlnx/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Mike-Kravetz/hugetlb-speed-up-linear-address-scanning/20220617-050726
base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
config: i386-debian-10.3 (https://download.01.org/0day-ci/archive/20220618/202206180021.rcc4B1by-lkp@intel.com/config)
compiler: gcc-11 (Debian 11.3.0-3) 11.3.0
reproduce (this is a W=1 build):
        # https://github.com/intel-lab-lkp/linux/commit/4c647687607f10fece04967b8180c0dadaf765e6
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Mike-Kravetz/hugetlb-speed-up-linear-address-scanning/20220617-050726
        git checkout 4c647687607f10fece04967b8180c0dadaf765e6
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        make W=1 O=build_dir ARCH=i386 SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   mm/hugetlb.c: In function 'hugetlb_mask_last_page':
>> mm/hugetlb.c:6901:9: error: duplicate case value
    6901 |         case PUD_SIZE:
         |         ^~~~
   mm/hugetlb.c:6899:9: note: previously used here
    6899 |         case P4D_SIZE:
         |         ^~~~


vim +6901 mm/hugetlb.c

  6886	
  6887	/*
  6888	 * Return a mask that can be used to update an address to the last huge
  6889	 * page in a page table page mapping size.  Used to skip non-present
  6890	 * page table entries when linearly scanning address ranges.  Architectures
  6891	 * with unique huge page to page table relationships can define their own
  6892	 * version of this routine.
  6893	 */
  6894	unsigned long hugetlb_mask_last_page(struct hstate *h)
  6895	{
  6896		unsigned long hp_size = huge_page_size(h);
  6897	
  6898		switch (hp_size) {
  6899		case P4D_SIZE:
  6900			return PGDIR_SIZE - P4D_SIZE;
> 6901		case PUD_SIZE:
  6902			return P4D_SIZE - PUD_SIZE;
  6903		case PMD_SIZE:
  6904			return PUD_SIZE - PMD_SIZE;
  6905		default:
  6906			break; /* Should never happen */
  6907		}
  6908	
  6909		return ~(0UL);
  6910	}
  6911
Mike Kravetz June 17, 2022, 5:17 p.m. UTC | #6
On 06/17/22 10:15, Peter Xu wrote:
> Hi, Mike,
> 
> On Thu, Jun 16, 2022 at 02:05:15PM -0700, Mike Kravetz wrote:
> > @@ -6877,6 +6896,39 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> >  	return (pte_t *)pmd;
> >  }
> >  
> > +/*
> > + * Return a mask that can be used to update an address to the last huge
> > + * page in a page table page mapping size.  Used to skip non-present
> > + * page table entries when linearly scanning address ranges.  Architectures
> > + * with unique huge page to page table relationships can define their own
> > + * version of this routine.
> > + */
> > +unsigned long hugetlb_mask_last_page(struct hstate *h)
> > +{
> > +	unsigned long hp_size = huge_page_size(h);
> > +
> > +	switch (hp_size) {
> > +	case P4D_SIZE:
> > +		return PGDIR_SIZE - P4D_SIZE;
> > +	case PUD_SIZE:
> > +		return P4D_SIZE - PUD_SIZE;
> > +	case PMD_SIZE:
> > +		return PUD_SIZE - PMD_SIZE;
> > +	default:
> 
> Should we add a WARN_ON_ONCE() if it should never trigger?
> 

Sure.  I will add this.

> > +		break; /* Should never happen */
> > +	}
> > +
> > +	return ~(0UL);
> > +}
> > +
> > +#else
> > +
> > +/* See description above.  Architectures can provide their own version. */
> > +__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
> > +{
> > +	return ~(0UL);
> 
> I'm wondering whether it's better to return 0 rather than ~0 by default.
> Could an arch with !CONFIG_ARCH_WANT_GENERAL_HUGETLB wrongly skip some
> valid address ranges with ~0, or perhaps I misread?

Thank you, thank you, thank you Peter!

Yes, the 'default' return for hugetlb_mask_last_page() should be 0.  If
there is no 'optimization', we do not want to modify the address so we
want to OR with 0 not ~0.  My bad, I must have been thinking AND instead
of OR.

I will change here as well as in Baolin's patch.
Mike Kravetz June 17, 2022, 9:09 p.m. UTC | #7
On 06/17/22 19:26, kernel test robot wrote:
> Hi Mike,
> 
> I love your patch! Yet something to improve:
> 
> [auto build test ERROR on soc/for-next]
> [also build test ERROR on linus/master v5.19-rc2 next-20220617]
> [cannot apply to arm64/for-next/core arm/for-next kvmarm/next xilinx-xlnx/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Mike-Kravetz/hugetlb-speed-up-linear-address-scanning/20220617-050726
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/soc/soc.git for-next
> config: i386-randconfig-a002 (https://download.01.org/0day-ci/archive/20220617/202206171929.ZIUrNg6p-lkp@intel.com/config)
> compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project f0e608de27b3d568000046eebf3712ab542979d6)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # https://github.com/intel-lab-lkp/linux/commit/4c647687607f10fece04967b8180c0dadaf765e6
>         git remote add linux-review https://github.com/intel-lab-lkp/linux
>         git fetch --no-tags linux-review Mike-Kravetz/hugetlb-speed-up-linear-address-scanning/20220617-050726
>         git checkout 4c647687607f10fece04967b8180c0dadaf765e6
>         # save the config file
>         mkdir build_dir && cp config build_dir/.config
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=i386 SHELL=/bin/bash
> 
> If you fix the issue, kindly add following tag where applicable
> Reported-by: kernel test robot <lkp@intel.com>
> 
> All errors (new ones prefixed by >>):

A couple of things here,

> 
> >> mm/hugetlb.c:6901:7: error: duplicate case value '4194304'
>            case PUD_SIZE:
>                 ^
>    include/asm-generic/pgtable-nopud.h:20:20: note: expanded from macro 'PUD_SIZE'
>    #define PUD_SIZE        (1UL << PUD_SHIFT)
>                            ^
>    mm/hugetlb.c:6899:7: note: previous case defined here
>            case P4D_SIZE:
>                 ^
>    include/asm-generic/pgtable-nop4d.h:13:19: note: expanded from macro 'P4D_SIZE'
>    #define P4D_SIZE                (1UL << P4D_SHIFT)

In the CONFIG_ARCH_WANT_GENERAL_HUGETLB case covered by this version of
hugetlb_mask_last_page, huge pages can only be PMD_SIZE or PUD_SIZE.
So, the 'case P4D_SIZE:' should not exist and can be removed.

>                                    ^
>    mm/hugetlb.c:6903:7: error: duplicate case value '4194304'
>            case PMD_SIZE:
>                 ^
>    include/asm-generic/pgtable-nopmd.h:22:20: note: expanded from macro 'PMD_SIZE'
>    #define PMD_SIZE        (1UL << PMD_SHIFT)
>                            ^

On i386 with CONFIG_PGTABLE_LEVELS=2, PUD_SIZE == PMD_SIZE.
Originally, I coded this as a if .. else if ... statement instead of a
switch.  If coded this way, the compiler does not complain about the
duplicate values.  The only other alternative I can think of is
something like '#if CONFIG_PGTABLE_LEVELS > 2' around the PUD_SIZE case.

I would prefer the if else if, unless someone can suggest something else?
Baolin Wang June 18, 2022, 3:27 a.m. UTC | #8
On 6/18/2022 1:17 AM, Mike Kravetz wrote:
> On 06/17/22 10:15, Peter Xu wrote:
>> Hi, Mike,
>>
>> On Thu, Jun 16, 2022 at 02:05:15PM -0700, Mike Kravetz wrote:
>>> @@ -6877,6 +6896,39 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>>>   	return (pte_t *)pmd;
>>>   }
>>>   
>>> +/*
>>> + * Return a mask that can be used to update an address to the last huge
>>> + * page in a page table page mapping size.  Used to skip non-present
>>> + * page table entries when linearly scanning address ranges.  Architectures
>>> + * with unique huge page to page table relationships can define their own
>>> + * version of this routine.
>>> + */
>>> +unsigned long hugetlb_mask_last_page(struct hstate *h)
>>> +{
>>> +	unsigned long hp_size = huge_page_size(h);
>>> +
>>> +	switch (hp_size) {
>>> +	case P4D_SIZE:
>>> +		return PGDIR_SIZE - P4D_SIZE;
>>> +	case PUD_SIZE:
>>> +		return P4D_SIZE - PUD_SIZE;
>>> +	case PMD_SIZE:
>>> +		return PUD_SIZE - PMD_SIZE;
>>> +	default:
>>
>> Should we add a WARN_ON_ONCE() if it should never trigger?
>>
> 
> Sure.  I will add this.
> 
>>> +		break; /* Should never happen */
>>> +	}
>>> +
>>> +	return ~(0UL);
>>> +}
>>> +
>>> +#else
>>> +
>>> +/* See description above.  Architectures can provide their own version. */
>>> +__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
>>> +{
>>> +	return ~(0UL);
>>
>> I'm wondering whether it's better to return 0 rather than ~0 by default.
>> Could an arch with !CONFIG_ARCH_WANT_GENERAL_HUGETLB wrongly skip some
>> valid address ranges with ~0, or perhaps I misread?
> 
> Thank you, thank you, thank you Peter!
> 
> Yes, the 'default' return for hugetlb_mask_last_page() should be 0.  If
> there is no 'optimization', we do not want to modify the address so we
> want to OR with 0 not ~0.  My bad, I must have been thinking AND instead
> of OR.
> 
> I will change here as well as in Baolin's patch.

Ah, I also overlooked this. Thanks Peter, and thanks Mike for updating.
diff mbox series

Patch

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 642a39016f9a..e37465e830fe 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -197,6 +197,7 @@  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
+unsigned long hugetlb_mask_last_page(struct hstate *h);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 				unsigned long *addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 259b9c41892f..7c4a82848603 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4740,6 +4740,7 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	unsigned long npages = pages_per_huge_page(h);
 	struct address_space *mapping = src_vma->vm_file->f_mapping;
 	struct mmu_notifier_range range;
+	unsigned long last_addr_mask;
 	int ret = 0;
 
 	if (cow) {
@@ -4759,11 +4760,14 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		i_mmap_lock_read(mapping);
 	}
 
+	last_addr_mask = hugetlb_mask_last_page(h);
 	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
 		spinlock_t *src_ptl, *dst_ptl;
 		src_pte = huge_pte_offset(src, addr, sz);
-		if (!src_pte)
+		if (!src_pte) {
+			addr |= last_addr_mask;
 			continue;
+		}
 		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
@@ -4780,8 +4784,10 @@  int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * after taking the lock below.
 		 */
 		dst_entry = huge_ptep_get(dst_pte);
-		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
+		if ((dst_pte == src_pte) || !huge_pte_none(dst_entry)) {
+			addr |= last_addr_mask;
 			continue;
+		}
 
 		dst_ptl = huge_pte_lock(h, dst, dst_pte);
 		src_ptl = huge_pte_lockptr(h, src, src_pte);
@@ -4942,6 +4948,7 @@  int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	unsigned long sz = huge_page_size(h);
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long old_end = old_addr + len;
+	unsigned long last_addr_mask;
 	unsigned long old_addr_copy;
 	pte_t *src_pte, *dst_pte;
 	struct mmu_notifier_range range;
@@ -4957,12 +4964,16 @@  int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	flush_cache_range(vma, range.start, range.end);
 
 	mmu_notifier_invalidate_range_start(&range);
+	last_addr_mask = hugetlb_mask_last_page(h);
 	/* Prevent race with file truncation */
 	i_mmap_lock_write(mapping);
 	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
 		src_pte = huge_pte_offset(mm, old_addr, sz);
-		if (!src_pte)
+		if (!src_pte) {
+			old_addr |= last_addr_mask;
+			new_addr |= last_addr_mask;
 			continue;
+		}
 		if (huge_pte_none(huge_ptep_get(src_pte)))
 			continue;
 
@@ -5007,6 +5018,7 @@  static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	struct mmu_notifier_range range;
+	unsigned long last_addr_mask;
 	bool force_flush = false;
 
 	WARN_ON(!is_vm_hugetlb_page(vma));
@@ -5027,11 +5039,14 @@  static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 				end);
 	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
 	mmu_notifier_invalidate_range_start(&range);
+	last_addr_mask = hugetlb_mask_last_page(h);
 	address = start;
 	for (; address < end; address += sz) {
 		ptep = huge_pte_offset(mm, address, sz);
-		if (!ptep)
+		if (!ptep) {
+			address |= last_addr_mask;
 			continue;
+		}
 
 		ptl = huge_pte_lock(h, mm, ptep);
 		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
@@ -6305,6 +6320,7 @@  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	unsigned long pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
+	unsigned long last_addr_mask;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
 
@@ -6321,12 +6337,15 @@  unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	flush_cache_range(vma, range.start, range.end);
 
 	mmu_notifier_invalidate_range_start(&range);
+	last_addr_mask = hugetlb_mask_last_page(h);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	for (; address < end; address += psize) {
 		spinlock_t *ptl;
 		ptep = huge_pte_offset(mm, address, psize);
-		if (!ptep)
+		if (!ptep) {
+			address |= last_addr_mask;
 			continue;
+		}
 		ptl = huge_pte_lock(h, mm, ptep);
 		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
 			/*
@@ -6877,6 +6896,39 @@  pte_t *huge_pte_offset(struct mm_struct *mm,
 	return (pte_t *)pmd;
 }
 
+/*
+ * Return a mask that can be used to update an address to the last huge
+ * page in a page table page mapping size.  Used to skip non-present
+ * page table entries when linearly scanning address ranges.  Architectures
+ * with unique huge page to page table relationships can define their own
+ * version of this routine.
+ */
+unsigned long hugetlb_mask_last_page(struct hstate *h)
+{
+	unsigned long hp_size = huge_page_size(h);
+
+	switch (hp_size) {
+	case P4D_SIZE:
+		return PGDIR_SIZE - P4D_SIZE;
+	case PUD_SIZE:
+		return P4D_SIZE - PUD_SIZE;
+	case PMD_SIZE:
+		return PUD_SIZE - PMD_SIZE;
+	default:
+		break; /* Should never happen */
+	}
+
+	return ~(0UL);
+}
+
+#else
+
+/* See description above.  Architectures can provide their own version. */
+__weak unsigned long hugetlb_mask_last_page(struct hstate *h)
+{
+	return ~(0UL);
+}
+
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
 /*