Message ID | 20190131183706.20980-2-jglisse@redhat.com |
---|---|
State | New, archived |
Series | Restore change_pte optimization to its former glory |
On Thu, Jan 31, 2019 at 01:37:03PM -0500, Jerome Glisse wrote:
> @@ -207,8 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>
> 	flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
> 	ptep_clear_flush_notify(vma, addr, pvmw.pte);
> -	set_pte_at_notify(mm, addr, pvmw.pte,
> -			  mk_pte(new_page, vma->vm_page_prot));
> +	set_pte_at(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot));
>
> 	page_remove_rmap(old_page, false);
> 	if (!page_mapped(old_page))

This seems racy by design in the way it copies the page, if the vma mapping isn't read-only to begin with (in which case it would be fine to change the pfn with change_pte too: it would be a read-only to read-only change, which is ok).

If the code copies a writable page, there is not much of an issue if coherency is lost by other means too.

That said, this isn't a worthwhile optimization for uprobes, so given the lack of explicit read-only enforcement I agree it's simpler to skip change_pte above.

It's orthogonal, but in this function the mmu_notifier_invalidate_range_end(&range) can be optimized to mmu_notifier_invalidate_range_only_end(&range); otherwise there is no point in retaining the _notify in ptep_clear_flush_notify.
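For reference, the orthogonal cleanup suggested here would amount to roughly the hunk below. It is an untested sketch, not part of the posted series, and the context lines around the unlock: label are assumed from the __replace_page() of that era rather than copied from it:

--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	err = 0;
  unlock:
-	mmu_notifier_invalidate_range_end(&range);
+	/*
+	 * ptep_clear_flush_notify() above already flushed the secondary
+	 * TLBs, so only run the ->invalidate_range_end() callbacks and
+	 * skip the redundant ->invalidate_range() device flush.
+	 */
+	mmu_notifier_invalidate_range_only_end(&range);
 	unlock_page(old_page);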
Background: we are discussing __replace_page() in kernel/events/uprobes.c, and whether it can be called on a page that can be written to through its virtual address mapping.

On Fri, Feb 01, 2019 at 07:50:22PM -0500, Andrea Arcangeli wrote:
> On Thu, Jan 31, 2019 at 01:37:03PM -0500, Jerome Glisse wrote:
> > @@ -207,8 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
> >
> > 	flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
> > 	ptep_clear_flush_notify(vma, addr, pvmw.pte);
> > -	set_pte_at_notify(mm, addr, pvmw.pte,
> > -			  mk_pte(new_page, vma->vm_page_prot));
> > +	set_pte_at(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot));
> >
> > 	page_remove_rmap(old_page, false);
> > 	if (!page_mapped(old_page))
>
> This seems racy by design in the way it copies the page, if the vma mapping isn't read-only to begin with (in which case it would be fine to change the pfn with change_pte too: it would be a read-only to read-only change, which is ok).
>
> If the code copies a writable page, there is not much of an issue if coherency is lost by other means too.

I am not sure the race exists, but I am not familiar with the uprobe code, so maybe the page is already write protected and thus the copy is fine; in fact that is likely the case, but there is no check for it. Maybe there should be a check? Maybe someone familiar with this code can chime in.

> That said, this isn't a worthwhile optimization for uprobes, so given the lack of explicit read-only enforcement I agree it's simpler to skip change_pte above.
>
> It's orthogonal, but in this function the mmu_notifier_invalidate_range_end(&range) can be optimized to mmu_notifier_invalidate_range_only_end(&range); otherwise there is no point in retaining the _notify in ptep_clear_flush_notify.

We need to keep the _notify for the IOMMU, otherwise it would break the IOMMU. The IOMMU can walk the page table at any time, so we must first clear the page table entry and then notify the IOMMU so it flushes the TLB on all the devices that might hold an entry for it. Only then can we set the new pte.

But yes, the mmu_notifier_invalidate_range_end() can be optimized to the only-end variant. I will do a separate patch for that, as it is orthogonal, as you pointed out :)

Cheers,
Jérôme
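To make the ordering Jérôme describes concrete, here is an annotated sketch of the replacement sequence, simplified from the hunk below; the comments are added for this write-up and it is not the literal kernel code:

	/* 1) Flush the CPU cache for the old mapping of this page. */
	flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));

	/*
	 * 2) Clear the pte and flush the secondary TLBs (IOMMU/device TLBs)
	 *    through the ->invalidate_range() notifier, so no device keeps
	 *    using a stale translation of the old page.
	 */
	ptep_clear_flush_notify(vma, addr, pvmw.pte);

	/*
	 * 3) Only after every secondary TLB has been flushed is it safe to
	 *    install the pte pointing at the new page.  A plain set_pte_at()
	 *    is enough here; the set_pte_at_notify()/change_pte shortcut
	 *    would only be valid for a read-only to read-only pfn change.
	 */
	set_pte_at(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot));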
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 87e76a1dc758..a4807b1edd7f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -207,8 +207,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 
 	flush_cache_page(vma, addr, pte_pfn(*pvmw.pte));
 	ptep_clear_flush_notify(vma, addr, pvmw.pte);
-	set_pte_at_notify(mm, addr, pvmw.pte,
-			  mk_pte(new_page, vma->vm_page_prot));
+	set_pte_at(mm, addr, pvmw.pte, mk_pte(new_page, vma->vm_page_prot));
 
 	page_remove_rmap(old_page, false);
 	if (!page_mapped(old_page))