diff mbox series

[4/5] mm: Provide page mask in struct follow_pfnmap_args

Message ID 20250205231728.2527186-5-alex.williamson@redhat.com (mailing list archive)
State New
Headers show
Series vfio: Improve DMA mapping performance for huge pfnmaps | expand

Commit Message

Alex Williamson Feb. 5, 2025, 11:17 p.m. UTC
follow_pfnmap_start() walks the page table for a given address and
fills out the struct follow_pfnmap_args in pfnmap_args_setup().
The page mask of the page table level is already provided to this
latter function for calculating the pfn.  This page mask can also be
useful for the caller to determine the extent of the contiguous
mapping.

For example, vfio-pci now supports huge_fault for pfnmaps and is able
to insert pud and pmd mappings.  When we DMA map these pfnmaps, ex.
PCI MMIO BARs, we iterate follow_pfnmap_start() to get each pfn to test
for a contiguous pfn range.  Providing the mapping page mask allows us
to skip the extent of the mapping level.  Assuming a 1GB pud level and
4KB page size, iterations are reduced by a factor of 256K.  In wall
clock time, mapping a 32GB PCI BAR is reduced from ~1s to <1ms.

Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---
 include/linux/mm.h | 2 ++
 mm/memory.c        | 1 +
 2 files changed, 3 insertions(+)

Comments

Mitchell Augustin Feb. 7, 2025, 1:38 a.m. UTC | #1
Reviewed-by: "Mitchell Augustin" <mitchell.augustin@canonical.com>
Tested-by: "Mitchell Augustin" <mitchell.augustin@canonical.com>


On Wed, Feb 5, 2025 at 5:18 PM Alex Williamson
<alex.williamson@redhat.com> wrote:
>
> follow_pfnmap_start() walks the page table for a given address and
> fills out the struct follow_pfnmap_args in pfnmap_args_setup().
> The page mask of the page table level is already provided to this
> latter function for calculating the pfn.  This page mask can also be
> useful for the caller to determine the extent of the contiguous
> mapping.
>
> For example, vfio-pci now supports huge_fault for pfnmaps and is able
> to insert pud and pmd mappings.  When we DMA map these pfnmaps, ex.
> PCI MMIO BARs, we iterate follow_pfnmap_start() to get each pfn to test
> for a contiguous pfn range.  Providing the mapping page mask allows us
> to skip the extent of the mapping level.  Assuming a 1GB pud level and
> 4KB page size, iterations are reduced by a factor of 256K.  In wall
> clock time, mapping a 32GB PCI BAR is reduced from ~1s to <1ms.
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  include/linux/mm.h | 2 ++
>  mm/memory.c        | 1 +
>  2 files changed, 3 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b1c3db9cf355..0ef7e7a0b4eb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2416,11 +2416,13 @@ struct follow_pfnmap_args {
>          * Outputs:
>          *
>          * @pfn: the PFN of the address
> +        * @pgmask: page mask covering pfn
>          * @pgprot: the pgprot_t of the mapping
>          * @writable: whether the mapping is writable
>          * @special: whether the mapping is a special mapping (real PFN maps)
>          */
>         unsigned long pfn;
> +       unsigned long pgmask;
>         pgprot_t pgprot;
>         bool writable;
>         bool special;
> diff --git a/mm/memory.c b/mm/memory.c
> index 398c031be9ba..97ccd43761b2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6388,6 +6388,7 @@ static inline void pfnmap_args_setup(struct follow_pfnmap_args *args,
>         args->lock = lock;
>         args->ptep = ptep;
>         args->pfn = pfn_base + ((args->address & ~addr_mask) >> PAGE_SHIFT);
> +       args->pgmask = addr_mask;
>         args->pgprot = pgprot;
>         args->writable = writable;
>         args->special = special;
> --
> 2.47.1
>


--
Mitchell Augustin
Software Engineer - Ubuntu Partner Engineering
Alex Williamson Feb. 14, 2025, 5:17 p.m. UTC | #2
Nudge.  Peter Xu provided an R-b for the series.  Would any other mm
folks like to chime in here to provide objection or approval for this
change and merging it through the vfio tree?  Series[1].  Thanks!

Alex

[1]https://lore.kernel.org/all/20250205231728.2527186-1-alex.williamson@redhat.com/

On Wed,  5 Feb 2025 16:17:20 -0700
Alex Williamson <alex.williamson@redhat.com> wrote:

> follow_pfnmap_start() walks the page table for a given address and
> fills out the struct follow_pfnmap_args in pfnmap_args_setup().
> The page mask of the page table level is already provided to this
> latter function for calculating the pfn.  This page mask can also be
> useful for the caller to determine the extent of the contiguous
> mapping.
> 
> For example, vfio-pci now supports huge_fault for pfnmaps and is able
> to insert pud and pmd mappings.  When we DMA map these pfnmaps, ex.
> PCI MMIO BARs, we iterate follow_pfnmap_start() to get each pfn to test
> for a contiguous pfn range.  Providing the mapping page mask allows us
> to skip the extent of the mapping level.  Assuming a 1GB pud level and
> 4KB page size, iterations are reduced by a factor of 256K.  In wall
> clock time, mapping a 32GB PCI BAR is reduced from ~1s to <1ms.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  include/linux/mm.h | 2 ++
>  mm/memory.c        | 1 +
>  2 files changed, 3 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index b1c3db9cf355..0ef7e7a0b4eb 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2416,11 +2416,13 @@ struct follow_pfnmap_args {
>  	 * Outputs:
>  	 *
>  	 * @pfn: the PFN of the address
> +	 * @pgmask: page mask covering pfn
>  	 * @pgprot: the pgprot_t of the mapping
>  	 * @writable: whether the mapping is writable
>  	 * @special: whether the mapping is a special mapping (real PFN maps)
>  	 */
>  	unsigned long pfn;
> +	unsigned long pgmask;
>  	pgprot_t pgprot;
>  	bool writable;
>  	bool special;
> diff --git a/mm/memory.c b/mm/memory.c
> index 398c031be9ba..97ccd43761b2 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -6388,6 +6388,7 @@ static inline void pfnmap_args_setup(struct follow_pfnmap_args *args,
>  	args->lock = lock;
>  	args->ptep = ptep;
>  	args->pfn = pfn_base + ((args->address & ~addr_mask) >> PAGE_SHIFT);
> +	args->pgmask = addr_mask;
>  	args->pgprot = pgprot;
>  	args->writable = writable;
>  	args->special = special;
Jason Gunthorpe Feb. 14, 2025, 7:14 p.m. UTC | #3
On Wed, Feb 05, 2025 at 04:17:20PM -0700, Alex Williamson wrote:
> follow_pfnmap_start() walks the page table for a given address and
> fills out the struct follow_pfnmap_args in pfnmap_args_setup().
> The page mask of the page table level is already provided to this
> latter function for calculating the pfn.  This page mask can also be
> useful for the caller to determine the extent of the contiguous
> mapping.
> 
> For example, vfio-pci now supports huge_fault for pfnmaps and is able
> to insert pud and pmd mappings.  When we DMA map these pfnmaps, ex.
> PCI MMIO BARs, we iterate follow_pfnmap_start() to get each pfn to test
> for a contiguous pfn range.  Providing the mapping page mask allows us
> to skip the extent of the mapping level.  Assuming a 1GB pud level and
> 4KB page size, iterations are reduced by a factor of 256K.  In wall
> clock time, mapping a 32GB PCI BAR is reduced from ~1s to <1ms.
> 
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
>  include/linux/mm.h | 2 ++
>  mm/memory.c        | 1 +
>  2 files changed, 3 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
David Hildenbrand Feb. 14, 2025, 9:39 p.m. UTC | #4
On 14.02.25 18:17, Alex Williamson wrote:
> 
> Nudge.  Peter Xu provided an R-b for the series.  Would any other mm
> folks like to chime in here to provide objection or approval for this
> change and merging it through the vfio tree?  Series[1].  Thanks!
> 

Only skimmed over it, nothing jumped at me except ...

Nitpicking:

I was wondering if "page mask" really the right term here. I know that 
we use it in some context (gup, hugetlb, zeropage) to express "mask this 
off and you get the start of the aligned huge page".

For something that walks PFNMAPs (page frames without any real "huge 
page" logical metadata etc. grouping) it was uintuitive for me at first.

addr_mask or pfn_mask (shifted addr_mask) would have been clearer for me.

No strong opinion, just what came to mind while reading this ...
Alex Williamson Feb. 17, 2025, 9:56 p.m. UTC | #5
On Fri, 14 Feb 2025 22:39:30 +0100
David Hildenbrand <david@redhat.com> wrote:

> On 14.02.25 18:17, Alex Williamson wrote:
> > 
> > Nudge.  Peter Xu provided an R-b for the series.  Would any other mm
> > folks like to chime in here to provide objection or approval for this
> > change and merging it through the vfio tree?  Series[1].  Thanks!
> >   
> 
> Only skimmed over it, nothing jumped at me except ...
> 
> Nitpicking:
> 
> I was wondering if "page mask" really the right term here. I know that 
> we use it in some context (gup, hugetlb, zeropage) to express "mask this 
> off and you get the start of the aligned huge page".
> 
> For something that walks PFNMAPs (page frames without any real "huge 
> page" logical metadata etc. grouping) it was uintuitive for me at first.
> 
> addr_mask or pfn_mask (shifted addr_mask) would have been clearer for me.
> 
> No strong opinion, just what came to mind while reading this ...

It's called addr_mask in pfnmap_args_setup() so I'm happy to keep that
naming if pgmask is less intuitive.  Thanks,

Alex
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b1c3db9cf355..0ef7e7a0b4eb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2416,11 +2416,13 @@  struct follow_pfnmap_args {
 	 * Outputs:
 	 *
 	 * @pfn: the PFN of the address
+	 * @pgmask: page mask covering pfn
 	 * @pgprot: the pgprot_t of the mapping
 	 * @writable: whether the mapping is writable
 	 * @special: whether the mapping is a special mapping (real PFN maps)
 	 */
 	unsigned long pfn;
+	unsigned long pgmask;
 	pgprot_t pgprot;
 	bool writable;
 	bool special;
diff --git a/mm/memory.c b/mm/memory.c
index 398c031be9ba..97ccd43761b2 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -6388,6 +6388,7 @@  static inline void pfnmap_args_setup(struct follow_pfnmap_args *args,
 	args->lock = lock;
 	args->ptep = ptep;
 	args->pfn = pfn_base + ((args->address & ~addr_mask) >> PAGE_SHIFT);
+	args->pgmask = addr_mask;
 	args->pgprot = pgprot;
 	args->writable = writable;
 	args->special = special;