diff mbox series

[03/19] mm: Mark special bits for huge pfn mappings when inject

Message ID 20240809160909.1023470-4-peterx@redhat.com (mailing list archive)
State New
Series mm: Support huge pfnmaps

Commit Message

Peter Xu Aug. 9, 2024, 4:08 p.m. UTC
We need these special bits to be around to enable gup-fast on pfnmaps.
Mark properly for !devmap case, reflecting that there's no page struct
backing the entry.

Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/huge_memory.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Jason Gunthorpe Aug. 14, 2024, 12:40 p.m. UTC | #1
On Fri, Aug 09, 2024 at 12:08:53PM -0400, Peter Xu wrote:
> We need these special bits to be around to enable gup-fast on pfnmaps.

It is not gup-fast you are after but follow_pfn/etc for KVM usage
right?

GUP family of functions should all fail on pfnmaps.

> Mark properly for !devmap case, reflecting that there's no page struct
> backing the entry.
> 
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  mm/huge_memory.c | 4 ++++
>  1 file changed, 4 insertions(+)

Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

Jason
Peter Xu Aug. 14, 2024, 3:23 p.m. UTC | #2
On Wed, Aug 14, 2024 at 09:40:00AM -0300, Jason Gunthorpe wrote:
> On Fri, Aug 09, 2024 at 12:08:53PM -0400, Peter Xu wrote:
> > We need these special bits to be around to enable gup-fast on pfnmaps.
> 
> It is not gup-fast you are after but follow_pfn/etc for KVM usage
> right?

Gup-fast needs it to make sure we don't pmd_page() it, and instead fail
early.  So it is still needed in some form.

But yeah, this comment is ambiguous and does not describe the whole picture,
as multiple places will rely on this bit, e.g. fork() to identify a
private page or pfnmap. Similarly we'll do that in folio_walk_start() and
follow_pfnmap.  I plan to simplify that to:

  We need these special bits to be around on pfnmaps.  Mark properly for
  !devmap case, reflecting that there's no page struct backing the entry.

> 
> GUP family of functions should all fail on pfnmaps.
> 
> > Mark properly for !devmap case, reflecting that there's no page struct
> > backing the entry.
> > 
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  mm/huge_memory.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> 
> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>

So I'll tentatively take this with the amended commit message, unless
there's objection.

Thanks,
Jason Gunthorpe Aug. 14, 2024, 3:53 p.m. UTC | #3
On Wed, Aug 14, 2024 at 11:23:41AM -0400, Peter Xu wrote:
> On Wed, Aug 14, 2024 at 09:40:00AM -0300, Jason Gunthorpe wrote:
> > On Fri, Aug 09, 2024 at 12:08:53PM -0400, Peter Xu wrote:
> > > We need these special bits to be around to enable gup-fast on pfnmaps.
> > 
> > It is not gup-fast you are after but follow_pfn/etc for KVM usage
> > right?
> 
> Gup-fast needs it to make sure we don't pmd_page() it, and instead fail
> early.  So it is still needed in some form.

Yes, but making gup-fast fail is not "enabling" it :)

> But yeah, this comment is ambiguous and does not describe the whole picture,
> as multiple places will rely on this bit, e.g. fork() to identify a
> private page or pfnmap. Similarly we'll do that in folio_walk_start() and
> follow_pfnmap.  I plan to simplify that to:
> 
>   We need these special bits to be around on pfnmaps.  Mark properly for
>   !devmap case, reflecting that there's no page struct backing the entry.

Yes

Jason
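
The point settled in the thread above (a gup-fast style walker must bail out
on a special entry rather than derive a struct page from it, while a pfn
lookup can still use the entry) can be sketched as a user-space toy model.
This is not kernel code: the toy_* names, the bit position and the pfn shift
are all assumptions made up for illustration, and the real special bit is
architecture-specific.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout for this toy model only. */
#define TOY_SPECIAL   (1ull << 2)
#define TOY_PFN_SHIFT 8

typedef uint64_t toy_pmd_t;

/*
 * gup-fast-like walker: it wants a struct page, so it must refuse a
 * special entry before doing anything that stands in for pmd_page().
 * Returns 0 on success, -1 when it bails out.
 */
static int toy_gup_fast_pmd(toy_pmd_t entry, uint64_t *page_out)
{
	if (entry & TOY_SPECIAL)
		return -1;	/* no struct page behind this entry */
	*page_out = entry >> TOY_PFN_SHIFT;	/* stand-in for pmd_page() */
	return 0;
}

/* pfn-lookup-like walker: only needs the pfn, so special entries are fine. */
static uint64_t toy_follow_pfn(toy_pmd_t entry)
{
	return entry >> TOY_PFN_SHIFT;
}
```

With a special entry built as (0x42ull << TOY_PFN_SHIFT) | TOY_SPECIAL,
toy_gup_fast_pmd() returns -1 and leaves page_out untouched, while
toy_follow_pfn() still returns 0x42.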

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 39c401a62e87..e95b3a468aee 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1162,6 +1162,8 @@  static void insert_pfn_pmd(struct vm_area_struct *vma, unsigned long addr,
 	entry = pmd_mkhuge(pfn_t_pmd(pfn, prot));
 	if (pfn_t_devmap(pfn))
 		entry = pmd_mkdevmap(entry);
+	else
+		entry = pmd_mkspecial(entry);
 	if (write) {
 		entry = pmd_mkyoung(pmd_mkdirty(entry));
 		entry = maybe_pmd_mkwrite(entry, vma);
@@ -1258,6 +1260,8 @@  static void insert_pfn_pud(struct vm_area_struct *vma, unsigned long addr,
 	entry = pud_mkhuge(pfn_t_pud(pfn, prot));
 	if (pfn_t_devmap(pfn))
 		entry = pud_mkdevmap(entry);
+	else
+		entry = pud_mkspecial(entry);
 	if (write) {
 		entry = pud_mkyoung(pud_mkdirty(entry));
 		entry = maybe_pud_mkwrite(entry, vma);
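
As a rough illustration of the branch both hunks add, the following
user-space toy model marks an entry as devmap or special, never both. It is
only a sketch: the toy_* names and the bit layout are hypothetical, and the
real helpers (pmd_mkdevmap(), pmd_mkspecial() and their pud counterparts)
are defined per architecture.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for this toy model only. */
#define TOY_HUGE      (1ull << 0)
#define TOY_DEVMAP    (1ull << 1)
#define TOY_SPECIAL   (1ull << 2)
#define TOY_PFN_SHIFT 8

typedef uint64_t toy_pmd_t;

/* Build a huge entry for a pfn, mirroring the if/else added by the patch. */
static toy_pmd_t toy_insert_pfn_pmd(uint64_t pfn, bool is_devmap)
{
	toy_pmd_t entry = (pfn << TOY_PFN_SHIFT) | TOY_HUGE;

	if (is_devmap)
		entry |= TOY_DEVMAP;	/* existing devmap path, unchanged */
	else
		entry |= TOY_SPECIAL;	/* new: no struct page backs the entry */
	return entry;
}

static bool toy_pmd_special(toy_pmd_t entry)
{
	return (entry & TOY_SPECIAL) != 0;
}

static bool toy_pmd_devmap(toy_pmd_t entry)
{
	return (entry & TOY_DEVMAP) != 0;
}
```

In this model toy_insert_pfn_pmd(pfn, false) yields an entry for which
toy_pmd_special() is true and toy_pmd_devmap() is false, and the reverse
holds for the devmap case.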