[2/2] mm/hmm: make full use of walk_page_range()
diff mbox series

Message ID 20190723233016.26403-3-rcampbell@nvidia.com
State New
Headers show
Series
  • mm/hmm: more HMM clean up
Related show

Commit Message

Ralph Campbell July 23, 2019, 11:30 p.m. UTC
hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
walk_page_range() in a loop. This is unnecessary duplication since
walk_page_range() calls find_vma() in a loop already.
Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
walk_test() callback function to filter unhandled vmas.

Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>
Cc: Jason Gunthorpe <jgg@mellanox.com>
Cc: Christoph Hellwig <hch@lst.de>
---
 mm/hmm.c | 197 ++++++++++++++++++++-----------------------------------
 1 file changed, 70 insertions(+), 127 deletions(-)

Comments

Christoph Hellwig July 24, 2019, 6:51 a.m. UTC | #1
On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> walk_page_range() in a loop. This is unnecessary duplication since
> walk_page_range() calls find_vma() in a loop already.
> Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> walk_test() callback function to filter unhandled vmas.

I like the approach a lot!

But we really need to sort out the duplication between hmm_range_fault
and hmm_range_snapshot first, as they are basically the same code.  I
have patches here:

http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74

http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e

That being said we don't really have any users for the snapshot mode
or non-blocking faults, and I don't see any in the immediate pipeline
either.  It might actually be a better idea to just kill that stuff off
for now until we have a user, as code without users is per definition
untested and will just bitrot and break.

> +	const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
> +	struct hmm_vma_walk *hmm_vma_walk = walk->private;
> +	struct hmm_range *range = hmm_vma_walk->range;
> +	struct vm_area_struct *vma = walk->vma;
> +
> +	/* If range is no longer valid, force retry. */
> +	if (!range->valid)
> +		return -EBUSY;
> +
> +	if (vma->vm_flags & device_vma)

Can we just kill off this odd device_vma variable?

	if (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP))

and maybe add a comment on why we are skipping them (because they
don't have struct page backing I guess..)
Jason Gunthorpe July 24, 2019, 11:53 a.m. UTC | #2
On Wed, Jul 24, 2019 at 08:51:46AM +0200, Christoph Hellwig wrote:
> On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
> > hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
> > walk_page_range() in a loop. This is unnecessary duplication since
> > walk_page_range() calls find_vma() in a loop already.
> > Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
> > walk_test() callback function to filter unhandled vmas.
> 
> I like the approach a lot!
> 
> But we really need to sort out the duplication between hmm_range_fault
> and hmm_range_snapshot first, as they are basically the same code.  I
> have patches here:
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
> 
> http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e

Yeah, that is a straightforward improvement, maybe Ralph should grab
these two as part of his series?

> That being said we don't really have any users for the snapshot mode
> or non-blocking faults, and I don't see any in the immediate pipeline
> either.

If this code was production ready I'd use it in ODP right away.

When we first create a ODP MR we'd want to snapshot to pre-load the
NIC tables with something other than page fault, but not fault
anything.

This would be a big performance improvement for ODP.

Jason
Ralph Campbell July 24, 2019, 7:50 p.m. UTC | #3
On 7/24/19 4:53 AM, Jason Gunthorpe wrote:
> On Wed, Jul 24, 2019 at 08:51:46AM +0200, Christoph Hellwig wrote:
>> On Tue, Jul 23, 2019 at 04:30:16PM -0700, Ralph Campbell wrote:
>>> hmm_range_snapshot() and hmm_range_fault() both call find_vma() and
>>> walk_page_range() in a loop. This is unnecessary duplication since
>>> walk_page_range() calls find_vma() in a loop already.
>>> Simplify hmm_range_snapshot() and hmm_range_fault() by defining a
>>> walk_test() callback function to filter unhandled vmas.
>>
>> I like the approach a lot!
>>
>> But we really need to sort out the duplication between hmm_range_fault
>> and hmm_range_snapshot first, as they are basically the same code.  I
>> have patches here:
>>
>> http://git.infradead.org/users/hch/misc.git/commitdiff/a34ccd30ee8a8a3111d9e91711c12901ed7dea74
>>
>> http://git.infradead.org/users/hch/misc.git/commitdiff/81f442ebac7170815af7770a1efa9c4ab662137e
> 
> Yeah, that is a straightforward improvement, maybe Ralph should grab
> these two as part of his series?

Sure, no problem.
I'll add them in v2 when I fix the other issues in the series.

>> That being said we don't really have any users for the snapshot mode
>> or non-blocking faults, and I don't see any in the immediate pipeline
>> either.
> 
> If this code was production ready I'd use it in ODP right away.
> 
> When we first create a ODP MR we'd want to snapshot to pre-load the
> NIC tables with something other than page fault, but not fault
> anything.
> 
> This would be a big performance improvement for ODP.
> 
> Jason
>

Patch
diff mbox series

diff --git a/mm/hmm.c b/mm/hmm.c
index 8271f110c243..218557451070 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -839,13 +839,40 @@  static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
 #endif
 }
 
-static void hmm_pfns_clear(struct hmm_range *range,
-			   uint64_t *pfns,
-			   unsigned long addr,
-			   unsigned long end)
+static int hmm_vma_walk_test(unsigned long start,
+			     unsigned long end,
+			     struct mm_walk *walk)
 {
-	for (; addr < end; addr += PAGE_SIZE, pfns++)
-		*pfns = range->values[HMM_PFN_NONE];
+	const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
+	struct hmm_vma_walk *hmm_vma_walk = walk->private;
+	struct hmm_range *range = hmm_vma_walk->range;
+	struct vm_area_struct *vma = walk->vma;
+
+	/* If range is no longer valid, force retry. */
+	if (!range->valid)
+		return -EBUSY;
+
+	if (vma->vm_flags & device_vma)
+		return -EFAULT;
+
+	if (is_vm_hugetlb_page(vma)) {
+		if (huge_page_shift(hstate_vma(vma)) != range->page_shift &&
+		    range->page_shift != PAGE_SHIFT)
+			return -EINVAL;
+	} else {
+		if (range->page_shift != PAGE_SHIFT)
+			return -EINVAL;
+	}
+
+	/*
+	 * If vma does not allow read access, then assume that it does not
+	 * allow write access, either. HMM does not support architectures
+	 * that allow write without read.
+	 */
+	if (!(vma->vm_flags & VM_READ))
+		return -EPERM;
+
+	return 0;
 }
 
 /*
@@ -949,65 +976,28 @@  EXPORT_SYMBOL(hmm_range_unregister);
  */
 long hmm_range_snapshot(struct hmm_range *range)
 {
-	const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
-	unsigned long start = range->start, end;
-	struct hmm_vma_walk hmm_vma_walk;
+	unsigned long start = range->start;
+	struct hmm_vma_walk hmm_vma_walk = {};
 	struct hmm *hmm = range->hmm;
-	struct vm_area_struct *vma;
-	struct mm_walk mm_walk;
+	struct mm_walk mm_walk = {};
+	int ret;
 
 	lockdep_assert_held(&hmm->mm->mmap_sem);
-	do {
-		/* If range is no longer valid force retry. */
-		if (!range->valid)
-			return -EBUSY;
 
-		vma = find_vma(hmm->mm, start);
-		if (vma == NULL || (vma->vm_flags & device_vma))
-			return -EFAULT;
-
-		if (is_vm_hugetlb_page(vma)) {
-			if (huge_page_shift(hstate_vma(vma)) !=
-				    range->page_shift &&
-			    range->page_shift != PAGE_SHIFT)
-				return -EINVAL;
-		} else {
-			if (range->page_shift != PAGE_SHIFT)
-				return -EINVAL;
-		}
+	hmm_vma_walk.range = range;
+	hmm_vma_walk.last = start;
+	mm_walk.private = &hmm_vma_walk;
 
-		if (!(vma->vm_flags & VM_READ)) {
-			/*
-			 * If vma do not allow read access, then assume that it
-			 * does not allow write access, either. HMM does not
-			 * support architecture that allow write without read.
-			 */
-			hmm_pfns_clear(range, range->pfns,
-				range->start, range->end);
-			return -EPERM;
-		}
+	mm_walk.mm = hmm->mm;
+	mm_walk.pud_entry = hmm_vma_walk_pud;
+	mm_walk.pmd_entry = hmm_vma_walk_pmd;
+	mm_walk.pte_hole = hmm_vma_walk_hole;
+	mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
+	mm_walk.test_walk = hmm_vma_walk_test;
 
-		range->vma = vma;
-		hmm_vma_walk.pgmap = NULL;
-		hmm_vma_walk.last = start;
-		hmm_vma_walk.fault = false;
-		hmm_vma_walk.range = range;
-		mm_walk.private = &hmm_vma_walk;
-		end = min(range->end, vma->vm_end);
-
-		mm_walk.vma = vma;
-		mm_walk.mm = vma->vm_mm;
-		mm_walk.pte_entry = NULL;
-		mm_walk.test_walk = NULL;
-		mm_walk.hugetlb_entry = NULL;
-		mm_walk.pud_entry = hmm_vma_walk_pud;
-		mm_walk.pmd_entry = hmm_vma_walk_pmd;
-		mm_walk.pte_hole = hmm_vma_walk_hole;
-		mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
-
-		walk_page_range(start, end, &mm_walk);
-		start = end;
-	} while (start < range->end);
+	ret = walk_page_range(start, range->end, &mm_walk);
+	if (ret)
+		return ret;
 
 	return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
 }
@@ -1043,83 +1033,36 @@  EXPORT_SYMBOL(hmm_range_snapshot);
  */
 long hmm_range_fault(struct hmm_range *range, bool block)
 {
-	const unsigned long device_vma = VM_IO | VM_PFNMAP | VM_MIXEDMAP;
-	unsigned long start = range->start, end;
-	struct hmm_vma_walk hmm_vma_walk;
+	unsigned long start = range->start;
+	struct hmm_vma_walk hmm_vma_walk = {};
 	struct hmm *hmm = range->hmm;
-	struct vm_area_struct *vma;
-	struct mm_walk mm_walk;
+	struct mm_walk mm_walk = {};
 	int ret;
 
 	lockdep_assert_held(&hmm->mm->mmap_sem);
 
-	do {
-		/* If range is no longer valid force retry. */
-		if (!range->valid)
-			return -EBUSY;
+	hmm_vma_walk.range = range;
+	hmm_vma_walk.last = start;
+	hmm_vma_walk.fault = true;
+	hmm_vma_walk.block = block;
+	mm_walk.private = &hmm_vma_walk;
 
-		vma = find_vma(hmm->mm, start);
-		if (vma == NULL || (vma->vm_flags & device_vma))
-			return -EFAULT;
-
-		if (is_vm_hugetlb_page(vma)) {
-			if (huge_page_shift(hstate_vma(vma)) !=
-			    range->page_shift &&
-			    range->page_shift != PAGE_SHIFT)
-				return -EINVAL;
-		} else {
-			if (range->page_shift != PAGE_SHIFT)
-				return -EINVAL;
-		}
+	mm_walk.mm = hmm->mm;
+	mm_walk.pud_entry = hmm_vma_walk_pud;
+	mm_walk.pmd_entry = hmm_vma_walk_pmd;
+	mm_walk.pte_hole = hmm_vma_walk_hole;
+	mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
+	mm_walk.test_walk = hmm_vma_walk_test;
 
-		if (!(vma->vm_flags & VM_READ)) {
-			/*
-			 * If vma do not allow read access, then assume that it
-			 * does not allow write access, either. HMM does not
-			 * support architecture that allow write without read.
-			 */
-			hmm_pfns_clear(range, range->pfns,
-				range->start, range->end);
-			return -EPERM;
-		}
+	do {
+		ret = walk_page_range(start, range->end, &mm_walk);
+		start = hmm_vma_walk.last;
 
-		range->vma = vma;
-		hmm_vma_walk.pgmap = NULL;
-		hmm_vma_walk.last = start;
-		hmm_vma_walk.fault = true;
-		hmm_vma_walk.block = block;
-		hmm_vma_walk.range = range;
-		mm_walk.private = &hmm_vma_walk;
-		end = min(range->end, vma->vm_end);
-
-		mm_walk.vma = vma;
-		mm_walk.mm = vma->vm_mm;
-		mm_walk.pte_entry = NULL;
-		mm_walk.test_walk = NULL;
-		mm_walk.hugetlb_entry = NULL;
-		mm_walk.pud_entry = hmm_vma_walk_pud;
-		mm_walk.pmd_entry = hmm_vma_walk_pmd;
-		mm_walk.pte_hole = hmm_vma_walk_hole;
-		mm_walk.hugetlb_entry = hmm_vma_walk_hugetlb_entry;
-
-		do {
-			ret = walk_page_range(start, end, &mm_walk);
-			start = hmm_vma_walk.last;
-
-			/* Keep trying while the range is valid. */
-		} while (ret == -EBUSY && range->valid);
-
-		if (ret) {
-			unsigned long i;
-
-			i = (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
-			hmm_pfns_clear(range, &range->pfns[i],
-				hmm_vma_walk.last, range->end);
-			return ret;
-		}
-		start = end;
+		/* Keep trying while the range is valid. */
+	} while (ret == -EBUSY && range->valid);
 
-	} while (start < range->end);
+	if (ret)
+		return ret;
 
 	return (hmm_vma_walk.last - range->start) >> PAGE_SHIFT;
 }