
[RFC,1/4] uprobes: use set_pte_at() not set_pte_at_notify()

Message ID 20190131183706.20980-2-jglisse@redhat.com (mailing list archive)
State New, archived
Series: Restore change_pte optimization to its former glory

Commit Message

Jerome Glisse Jan. 31, 2019, 6:37 p.m. UTC
From: Jérôme Glisse <jglisse@redhat.com>

Using set_pte_at_notify() triggers useless calls to change_pte(), so just
use set_pte_at() instead. The reason is that set_pte_at_notify() should
only be used when going either from a read/write pte to a read-only pte
with the same pfn, or from a read-only pte to a read/write pte with a
different pfn.
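
To make the rule above concrete, here is a small user-space sketch in C. It is a hypothetical model, not kernel code: struct toy_pte and wants_change_pte() are invented for illustration and track only the pfn and the write bit. It returns true exactly for the two transitions listed above. In the common uprobes case of a read-only text mapping, the pfn changes while the pte stays read-only, so the sketch returns false, which is why plain set_pte_at() is enough there.

```c
#include <stdbool.h>

/* Hypothetical, simplified pte: only the fields the rule depends on. */
struct toy_pte {
	unsigned long pfn;
	bool writable;
};

/*
 * Returns true when, per the rule in the commit message, set_pte_at_notify()
 * (and so the change_pte notifier) is the appropriate call:
 *   - read/write -> read-only with the same pfn, or
 *   - read-only -> read/write with a different pfn.
 * For any other transition, plain set_pte_at() is enough.
 */
static bool wants_change_pte(struct toy_pte old_pte, struct toy_pte new_pte)
{
	bool wrprotect_same_pfn = old_pte.writable && !new_pte.writable &&
				  old_pte.pfn == new_pte.pfn;
	bool ro_to_rw_new_pfn = !old_pte.writable && new_pte.writable &&
				old_pte.pfn != new_pte.pfn;

	return wrprotect_same_pfn || ro_to_rw_new_pfn;
}
```

For example, wants_change_pte() returns true for a read/write pte at pfn 1 becoming read-only at pfn 1, and false for a read-only pte at pfn 1 becoming read-only at pfn 2.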

set_pte_at_notify() was used because the __replace_page() code came
from mm/ksm.c, where the above rules hold.

Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: kvm@vger.kernel.org
---
 kernel/events/uprobes.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Andrea Arcangeli Feb. 2, 2019, 12:50 a.m. UTC | #1
On Thu, Jan 31, 2019 at 01:37:03PM -0500, Jerome Glisse wrote:
> @@ -207,8 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  
>  	flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
>  	ptep_clear_flush_notify(vma, addr, pvmw.pte);
> -	set_pte_at_notify(mm, addr, pvmw.pte,
> -			mk_pte(new_page, vma->vm_page_prot));
> +	set_pte_at(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot));
>  
>  	page_remove_rmap(old_page, false);
>  	if (!page_mapped(old_page))

This seems racy by design in the way it copies the page, if the vma
mapping isn't read-only to begin with (if it is read-only, it would be
fine to change the pfn with change_pte too, since that would be a
read-only to read-only change, which is OK).

If the code copies a writable page, it's not much of an issue if
coherency is lost by other means too.

That said, this isn't a worthwhile optimization for uprobes, so,
because of the lack of explicit read-only enforcement, I agree it's
simpler to skip change_pte above.

It's orthogonal, but in this function
mmu_notifier_invalidate_range_end(&range); can be optimized to
mmu_notifier_invalidate_range_only_end(&range); otherwise there's no
point in retaining the _notify in ptep_clear_flush_notify.
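
The redundant callback pointed out above can be illustrated with a toy counter model in C. This is a sketch under stated assumptions, not kernel code: all toy_* names are hypothetical. It assumes the mm/mmu_notifier.c behavior of that period, where ptep_clear_flush_notify() invokes the ->invalidate_range() callback, mmu_notifier_invalidate_range_end() invokes ->invalidate_range() again before ->invalidate_range_end(), and mmu_notifier_invalidate_range_only_end() skips that second ->invalidate_range() call.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for a secondary-MMU (e.g. IOMMU) driver. */
struct toy_notifier {
	int invalidate_range_calls;     /* models ->invalidate_range() */
	int invalidate_range_end_calls; /* models ->invalidate_range_end() */
};

/* Models ptep_clear_flush_notify(): clear + flush, then ->invalidate_range(). */
static void toy_ptep_clear_flush_notify(struct toy_notifier *mn)
{
	mn->invalidate_range_calls++;
}

/*
 * only_end == false models mmu_notifier_invalidate_range_end(), which calls
 * ->invalidate_range() again before ->invalidate_range_end().
 * only_end == true models mmu_notifier_invalidate_range_only_end(), which
 * skips the extra ->invalidate_range() call.
 */
static void toy_invalidate_range_end(struct toy_notifier *mn, bool only_end)
{
	if (!only_end)
		mn->invalidate_range_calls++;
	mn->invalidate_range_end_calls++;
}

/* A __replace_page()-like sequence; returns the ->invalidate_range() count. */
static int toy_replace_page(bool only_end)
{
	struct toy_notifier mn = { 0, 0 };

	/* mmu_notifier_invalidate_range_start() would run here (not counted). */
	toy_ptep_clear_flush_notify(&mn);
	/* set_pte_at() installs the new pte: no notifier callback. */
	toy_invalidate_range_end(&mn, only_end);
	return mn.invalidate_range_calls;
}
```

With only_end false, toy_replace_page() reports two ->invalidate_range() calls; with only_end true it reports one, which is the saving the _only_end() variant is meant to provide.
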
Jerome Glisse Feb. 11, 2019, 7:27 p.m. UTC | #2
Background: we are discussing __replace_page() in:
    kernel/events/uprobes.c

and whether it can be called on a page that can be written to through
its virtual address mapping.

On Fri, Feb 01, 2019 at 07:50:22PM -0500, Andrea Arcangeli wrote:
> On Thu, Jan 31, 2019 at 01:37:03PM -0500, Jerome Glisse wrote:
> > @@ -207,8 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> >  
> >  	flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
> >  	ptep_clear_flush_notify(vma, addr, pvmw.pte);
> > -	set_pte_at_notify(mm, addr, pvmw.pte,
> > -			mk_pte(new_page, vma->vm_page_prot));
> > +	set_pte_at(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot));
> >  
> >  	page_remove_rmap(old_page, false);
> >  	if (!page_mapped(old_page))
> 
> This seems racy by design in the way it copies the page, if the vma
> mapping isn't read-only to begin with (if it is read-only, it would be
> fine to change the pfn with change_pte too, since that would be a
> read-only to read-only change, which is OK).
> 
> If the code copies a writable page, it's not much of an issue if
> coherency is lost by other means too.

I am not sure the race exists, but I am not familiar with the uprobe
code. Maybe the page is already write-protected and thus the copy is
fine; in fact that is likely the case, but there is no check for it.
Maybe there should be a check?

Maybe someone familiar with this code can chime in.

> 
> That said, this isn't a worthwhile optimization for uprobes, so,
> because of the lack of explicit read-only enforcement, I agree it's
> simpler to skip change_pte above.
> 
> It's orthogonal, but in this function
> mmu_notifier_invalidate_range_end(&range); can be optimized to
> mmu_notifier_invalidate_range_only_end(&range); otherwise there's no
> point in retaining the _notify in ptep_clear_flush_notify.

We need to keep the _notify for the IOMMU; otherwise it would break.
The IOMMU can walk the page table at any time, so we must first clear
the page table entry, then notify the IOMMU to flush the TLB on all
devices that might have a TLB entry. Only then can we set the new pte.
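
The ordering constraint can be checked with a single-threaded toy model in C. It is only an illustration: struct toy_state, the toy_* functions and the one-entry "device TLB" are hypothetical, and a real IOMMU walks page tables concurrently rather than at fixed points. The model lets the device access memory after every step and reports whether it still uses the old pfn once the new pte has been installed.

```c
#include <stdbool.h>

/* In this toy model pfn 0 means "no mapping" (pte cleared). */
#define TOY_NO_PFN 0UL

/* Hypothetical state: one CPU page-table entry and one device TLB entry. */
struct toy_state {
	unsigned long pte_pfn;
	bool tlb_valid;
	unsigned long tlb_pfn;
};

/* Device access: hit in its TLB, or walk the page table and cache the result. */
static unsigned long toy_device_access(struct toy_state *s)
{
	if (s->tlb_valid)
		return s->tlb_pfn;
	if (s->pte_pfn == TOY_NO_PFN)
		return TOY_NO_PFN; /* a real device would fault and retry */
	s->tlb_valid = true;
	s->tlb_pfn = s->pte_pfn;
	return s->tlb_pfn;
}

enum toy_step { CLEAR_PTE, FLUSH_DEVICE_TLB, SET_NEW_PTE };

static void toy_apply(struct toy_state *s, enum toy_step step,
		      unsigned long new_pfn)
{
	switch (step) {
	case CLEAR_PTE:        s->pte_pfn = TOY_NO_PFN; break;
	case FLUSH_DEVICE_TLB: s->tlb_valid = false; break;
	case SET_NEW_PTE:      s->pte_pfn = new_pfn; break;
	}
}

/*
 * Runs the three steps in the given order, with a device access after each
 * one. Returns true if the device used old_pfn after the new pte was set.
 */
static bool toy_device_sees_old_after_set(const enum toy_step order[3],
					  unsigned long old_pfn,
					  unsigned long new_pfn)
{
	struct toy_state s = { old_pfn, true, old_pfn };
	bool new_pte_installed = false;
	bool saw_old = false;
	int i;

	for (i = 0; i < 3; i++) {
		toy_apply(&s, order[i], new_pfn);
		if (order[i] == SET_NEW_PTE)
			new_pte_installed = true;
		/* The device may access memory between any two steps. */
		if (toy_device_access(&s) == old_pfn && new_pte_installed)
			saw_old = true;
	}
	return saw_old;
}
```

Clear, flush, then set reports no stale access. Flushing before clearing lets the device re-walk and re-cache the old pfn, and setting the new pte before flushing leaves a window where the CPU uses the new page while the device still uses the old one; both of those orders report a stale access.
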

But yes, mmu_notifier_invalidate_range_end() can be optimized to the
_only_end() variant. I will do a separate patch for this, as it is
orthogonal, as you pointed out :)

Cheers,
Jérôme

Patch

diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 87e76a1dc758..a4807b1edd7f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -207,8 +207,7 @@  static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
 	ptep_clear_flush_notify(vma, addr, pvmw.pte);
-	set_pte_at_notify(mm, addr, pvmw.pte,
-			mk_pte(new_page, vma->vm_page_prot));
+	set_pte_at(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot));
 
 	page_remove_rmap(old_page, false);
 	if (!page_mapped(old_page))