Message ID | 20230727212845.135673-3-david@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | smaps / mm/gup: fix gup_can_follow_protnone fallout |
On 7/27/23 14:28, David Hildenbrand wrote:
> We accidentally enforced PROT_NONE PTE/PMD permission checks for
> follow_page() like we do for get_user_pages() and friends. That was
> undesired, because follow_page() is usually only used to look up a currently
> mapped page, not to actually access it. Further, follow_page() does not
> actually trigger fault handling, but instead simply fails.

I see that follow_page() is also completely undocumented. And that
reduces us to deducing how it should be used...these things that
change follow_page()'s behavior maybe should have a go at documenting
it too, perhaps.

>
> Let's restore that behavior by conditionally setting FOLL_FORCE if
> FOLL_WRITE is not set. This way, for example KSM and migration code will
> no longer fail on PROT_NONE mapped PTEs/PMDs.
>
> Handling this internally doesn't require us to add any new FOLL_FORCE
> usage outside of GUP code.
>
> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
> permission checks like in check_vma_flags(), so especially
> FOLL_FORCE|FOLL_WRITE would be dodgy.
>
> This issue was identified by code inspection. We'll add some
> documentation regarding FOLL_FORCE next.
>
> Reported-by: Peter Xu <peterx@redhat.com>
> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
> Cc: <stable@vger.kernel.org>
> Signed-off-by: David Hildenbrand <david@redhat.com>
> ---
>  mm/gup.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index 2493ffa10f4b..da9a5cc096ac 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>  	if (vma_is_secretmem(vma))
>  		return NULL;
>
> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>  		return NULL;

This is not a super happy situation: follow_page() is now prohibited
(see above: we should document that interface) from passing in
FOLL_FORCE...

>
> +	/*
> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
> +	 */
> +	if (!(foll_flags & FOLL_WRITE))
> +		foll_flags |= FOLL_FORCE;
> +

...but then we set it anyway, for special cases. It's awkward because
FOLL_FORCE is not an "internal to gup" flag (yet?).

I don't yet have suggestions, other than:

1) Yes, the FOLL_NUMA made things bad.

2) And they are still very confusing, especially the new use of
   FOLL_FORCE.

...I'll try to let this soak in and maybe recommend something
in a more productive way. :)

thanks,
On 28.07.23 04:30, John Hubbard wrote:
> On 7/27/23 14:28, David Hildenbrand wrote:
>> We accidentally enforced PROT_NONE PTE/PMD permission checks for
>> follow_page() like we do for get_user_pages() and friends. That was
>> undesired, because follow_page() is usually only used to look up a currently
>> mapped page, not to actually access it. Further, follow_page() does not
>> actually trigger fault handling, but instead simply fails.
>
> I see that follow_page() is also completely undocumented. And that
> reduces us to deducing how it should be used...these things that
> change follow_page()'s behavior maybe should have a go at documenting
> it too, perhaps.

I can certainly be motivated to do that. :)

>
>>
>> Let's restore that behavior by conditionally setting FOLL_FORCE if
>> FOLL_WRITE is not set. This way, for example KSM and migration code will
>> no longer fail on PROT_NONE mapped PTEs/PMDs.
>>
>> Handling this internally doesn't require us to add any new FOLL_FORCE
>> usage outside of GUP code.
>>
>> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
>> permission checks like in check_vma_flags(), so especially
>> FOLL_FORCE|FOLL_WRITE would be dodgy.
>>
>> This issue was identified by code inspection. We'll add some
>> documentation regarding FOLL_FORCE next.
>>
>> Reported-by: Peter Xu <peterx@redhat.com>
>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>> Cc: <stable@vger.kernel.org>
>> Signed-off-by: David Hildenbrand <david@redhat.com>
>> ---
>>  mm/gup.c | 10 +++++++++-
>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index 2493ffa10f4b..da9a5cc096ac 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>>  	if (vma_is_secretmem(vma))
>>  		return NULL;
>>
>> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
>> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>>  		return NULL;
>
> This is not a super happy situation: follow_page() is now prohibited
> (see above: we should document that interface) from passing in
> FOLL_FORCE...

I guess you saw my patch #4.

If you take a look at the existing callers (that are fortunately very
limited), you'll see that nobody cares.

Most of the FOLL flags don't make any sense for follow_page(), and
limiting further (ab)use is at least to me very appealing.

>
>>
>> +	/*
>> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
>> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
>> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
>> +	 */
>> +	if (!(foll_flags & FOLL_WRITE))
>> +		foll_flags |= FOLL_FORCE;
>> +
>
> ...but then we set it anyway, for special cases. It's awkward because
> FOLL_FORCE is not an "internal to gup" flag (yet?).
>
> I don't yet have suggestions, other than:
>
> 1) Yes, the FOLL_NUMA made things bad.
>
> 2) And they are still very confusing, especially the new use of
>    FOLL_FORCE.
>
> ...I'll try to let this soak in and maybe recommend something
> in a more productive way. :)

What I can offer that might be very appealing is the following:

Get rid of the flags parameter for follow_page() *completely*. Yes, then
we can even rename FOLL_ to something reasonable in the context where it
is nowadays used ;)


Internally, we'll then set

	FOLL_GET | FOLL_DUMP | FOLL_FORCE

and document exactly what this function does. Any user that needs
something different should just look into using get_user_pages() instead.

I can prototype that on top of this work easily.
On 28.07.23 11:08, David Hildenbrand wrote:
> On 28.07.23 04:30, John Hubbard wrote:
>> On 7/27/23 14:28, David Hildenbrand wrote:
>>> We accidentally enforced PROT_NONE PTE/PMD permission checks for
>>> follow_page() like we do for get_user_pages() and friends. That was
>>> undesired, because follow_page() is usually only used to look up a currently
>>> mapped page, not to actually access it. Further, follow_page() does not
>>> actually trigger fault handling, but instead simply fails.
>>
>> I see that follow_page() is also completely undocumented. And that
>> reduces us to deducing how it should be used...these things that
>> change follow_page()'s behavior maybe should have a go at documenting
>> it too, perhaps.
>
> I can certainly be motivated to do that. :)
>
>>
>>>
>>> Let's restore that behavior by conditionally setting FOLL_FORCE if
>>> FOLL_WRITE is not set. This way, for example KSM and migration code will
>>> no longer fail on PROT_NONE mapped PTEs/PMDs.
>>>
>>> Handling this internally doesn't require us to add any new FOLL_FORCE
>>> usage outside of GUP code.
>>>
>>> While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
>>> permission checks like in check_vma_flags(), so especially
>>> FOLL_FORCE|FOLL_WRITE would be dodgy.
>>>
>>> This issue was identified by code inspection. We'll add some
>>> documentation regarding FOLL_FORCE next.
>>>
>>> Reported-by: Peter Xu <peterx@redhat.com>
>>> Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
>>> Cc: <stable@vger.kernel.org>
>>> Signed-off-by: David Hildenbrand <david@redhat.com>
>>> ---
>>>  mm/gup.c | 10 +++++++++-
>>>  1 file changed, 9 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index 2493ffa10f4b..da9a5cc096ac 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
>>>  	if (vma_is_secretmem(vma))
>>>  		return NULL;
>>>
>>> -	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
>>> +	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
>>>  		return NULL;
>>
>> This is not a super happy situation: follow_page() is now prohibited
>> (see above: we should document that interface) from passing in
>> FOLL_FORCE...
>
> I guess you saw my patch #4.
>
> If you take a look at the existing callers (that are fortunately very
> limited), you'll see that nobody cares.
>
> Most of the FOLL flags don't make any sense for follow_page(), and
> limiting further (ab)use is at least to me very appealing.
>
>>
>>>
>>> +	/*
>>> +	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
>>> +	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
>>> +	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
>>> +	 */
>>> +	if (!(foll_flags & FOLL_WRITE))
>>> +		foll_flags |= FOLL_FORCE;
>>> +
>>
>> ...but then we set it anyway, for special cases. It's awkward because
>> FOLL_FORCE is not an "internal to gup" flag (yet?).
>>
>> I don't yet have suggestions, other than:
>>
>> 1) Yes, the FOLL_NUMA made things bad.
>>
>> 2) And they are still very confusing, especially the new use of
>>    FOLL_FORCE.
>>
>> ...I'll try to let this soak in and maybe recommend something
>> in a more productive way. :)
>
> What I can offer that might be very appealing is the following:
>
> Get rid of the flags parameter for follow_page() *completely*. Yes, then
> we can even rename FOLL_ to something reasonable in the context where it
> is nowadays used ;)
>
>
> Internally, we'll then set
>
> 	FOLL_GET | FOLL_DUMP | FOLL_FORCE
>
> and document exactly what this function does. Any user that needs
> something different should just look into using get_user_pages() instead.
>
> I can prototype that on top of this work easily.

The end result looks something like:

/**
 * follow_page - look up and reference a page descriptor from a user-virtual
 *		 address
 * @vma: vm_area_struct mapping @address
 * @address: virtual address to look up
 *
 * follow_page() will look up the page mapped at the given address and
 * take a reference on the page. The returned page has to be released using
 * put_page().
 *
 * follow_page() will not return special (like zero) pages and does not check
 * PTE protection: the returned page might be mapped PROT_NONE, R/O or R/W.
 * Consequently, follow_page() will not trigger NUMA hinting faults.
 *
 * follow_page() does not trigger page faults. If no page is mapped, or
 * a special (like zero) page is mapped, it returns %NULL or an error pointer.
 *
 * Note: new users with different requirements are probably better off using
 * one of the get_user_pages() variants or one of the walk_page_range()
 * variants.
 *
 * Return: the mapped (struct page *), %NULL if no mapping exists, or
 * an error pointer if there is a mapping to something not represented
 * by a page descriptor (see also vm_normal_page()) or the zero page.
 */
struct page *follow_page(struct vm_area_struct *vma, unsigned long address)
{
	struct follow_page_context ctx = { NULL };
	unsigned long gup_flags;
	struct page *page;

	if (vma_is_secretmem(vma))
		return NULL;

	/*
	 * FOLL_GET: We always want a reference on the returned page.
	 * FOLL_DUMP: Ignore special (like zero) pages.
	 * FOLL_FORCE: Succeed on PROT_NONE-mapped pages.
	 */
	gup_flags = FOLL_GET | FOLL_DUMP | FOLL_FORCE;

	page = follow_page_mask(vma, address, gup_flags, &ctx);
	if (ctx.pgmap)
		put_dev_pagemap(ctx.pgmap);
	return page;
}
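The proposed kerneldoc describes three outcomes for a flagless follow_page(): a referenced page, NULL, or an error pointer. A caller could tell them apart roughly as sketched below. This is a user-space illustration only, not kernel code: `struct page` is a stand-in, and `is_err_ptr()`, `enum lookup_result` and `classify_lookup()` are hypothetical names. The error-pointer test assumes the kernel convention that the top 4095 address values encode errnos.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095	/* assumed bound of the kernel's error-pointer range */

struct page { int refcount; };	/* stand-in, not the kernel's struct page */

/* User-space stand-in for the kernel's IS_ERR() check (hypothetical name). */
static int is_err_ptr(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

enum lookup_result { LOOKUP_NONE, LOOKUP_SPECIAL, LOOKUP_PAGE };

/*
 * Hypothetical caller-side handling of the three documented outcomes.
 * On LOOKUP_PAGE the caller would hold a reference and have to drop it
 * later (put_page() in the kernel).
 */
static enum lookup_result classify_lookup(struct page *page)
{
	if (!page)
		return LOOKUP_NONE;	/* nothing mapped at the address */
	if (is_err_ptr(page))
		return LOOKUP_SPECIAL;	/* e.g. zero page, or no page descriptor */
	return LOOKUP_PAGE;
}
```

Checking for NULL before the error-pointer test keeps the common "nothing mapped" case cheap and avoids treating NULL as an error value.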
diff --git a/mm/gup.c b/mm/gup.c
index 2493ffa10f4b..da9a5cc096ac 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
 	if (vma_is_secretmem(vma))
 		return NULL;

-	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
+	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
 		return NULL;

+	/*
+	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
+	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
+	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
+	 */
+	if (!(foll_flags & FOLL_WRITE))
+		foll_flags |= FOLL_FORCE;
+
 	page = follow_page_mask(vma, address, foll_flags, &ctx);
 	if (ctx.pgmap)
 		put_dev_pagemap(ctx.pgmap);
We accidentally enforced PROT_NONE PTE/PMD permission checks for
follow_page() like we do for get_user_pages() and friends. That was
undesired, because follow_page() is usually only used to look up a currently
mapped page, not to actually access it. Further, follow_page() does not
actually trigger fault handling, but instead simply fails.

Let's restore that behavior by conditionally setting FOLL_FORCE if
FOLL_WRITE is not set. This way, for example KSM and migration code will
no longer fail on PROT_NONE mapped PTEs/PMDs.

Handling this internally doesn't require us to add any new FOLL_FORCE
usage outside of GUP code.

While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
permission checks like in check_vma_flags(), so especially
FOLL_FORCE|FOLL_WRITE would be dodgy.

This issue was identified by code inspection. We'll add some
documentation regarding FOLL_FORCE next.

Reported-by: Peter Xu <peterx@redhat.com>
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Cc: <stable@vger.kernel.org>
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/gup.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)
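The flag handling described above can be modeled in user space as a rough sketch. The FOLL_* bit values here are made up for illustration (the kernel's real values differ), and `follow_page_effective_flags()` is a hypothetical helper, not a kernel function. It only mirrors the two checks the patch adds: reject FOLL_PIN or FOLL_FORCE from the caller, then set FOLL_FORCE internally when FOLL_WRITE is absent.

```c
#include <stdbool.h>

/* Hypothetical bit values for illustration; not the kernel's definitions. */
enum {
	FOLL_WRITE = 1 << 0,
	FOLL_GET   = 1 << 1,
	FOLL_FORCE = 1 << 2,
	FOLL_PIN   = 1 << 3,
};

/*
 * Hypothetical helper mirroring the flag handling the patch adds to
 * follow_page(). Returns false if the caller's flags would be rejected;
 * otherwise stores the flags that would be passed on in *effective.
 */
static bool follow_page_effective_flags(unsigned int foll_flags,
					unsigned int *effective)
{
	/* Callers may pass neither FOLL_PIN nor FOLL_FORCE. */
	if (foll_flags & (FOLL_PIN | FOLL_FORCE))
		return false;

	/* Without FOLL_WRITE, FOLL_FORCE is set internally. */
	if (!(foll_flags & FOLL_WRITE))
		foll_flags |= FOLL_FORCE;

	*effective = foll_flags;
	return true;
}
```

Under this model, a FOLL_GET lookup ends up with FOLL_GET | FOLL_FORCE, while FOLL_GET | FOLL_WRITE is passed through unchanged, so FOLL_FORCE | FOLL_WRITE can never be produced.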