vfio/pci: Insert full vma on mmap'd MMIO fault

Message ID 20240607035213.2054226-1-alex.williamson@redhat.com (mailing list archive)
State New

Commit Message

Alex Williamson June 7, 2024, 3:52 a.m. UTC
In order to improve performance of typical scenarios we can try to insert
the entire vma on fault.  This accelerates typical cases, such as when
the MMIO region is DMA mapped by QEMU.  The vfio_iommu_type1 driver will
fault in the entire DMA mapped range through fixup_user_fault().

In synthetic testing, this improves the time required to walk a PCI BAR
mapping from userspace by roughly 1/3rd.

This is likely an interim solution until vmf_insert_pfn_{pmd,pud}() gain
support for pfnmaps.

Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/all/Zl6XdUkt%2FzMMGOLF@yzhao56-desk.sh.intel.com/
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
---

I'm sending this as a follow-on patch to the v2 series[1] because this
is largely a performance optimization, and one that we may want to
revert when we can introduce huge_fault support.  In the meantime, I
can't argue with the 1/3rd performance improvement this provides to
reduce the overall impact of the series below.  Without objection I'd
therefore target this for v6.10 as well.  Thanks,

Alex

[1]https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com/

 drivers/vfio/pci/vfio_pci_core.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Alex Williamson June 11, 2024, 3:23 p.m. UTC | #1
Any support for this or should we just go with the v2 series[1] by
itself for v6.10?  Thanks,

Alex

[1]https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com/

On Thu,  6 Jun 2024 21:52:07 -0600
Alex Williamson <alex.williamson@redhat.com> wrote:

> In order to improve performance of typical scenarios we can try to insert
> the entire vma on fault.  This accelerates typical cases, such as when
> the MMIO region is DMA mapped by QEMU.  The vfio_iommu_type1 driver will
> fault in the entire DMA mapped range through fixup_user_fault().
> 
> In synthetic testing, this improves the time required to walk a PCI BAR
> mapping from userspace by roughly 1/3rd.
> 
> This is likely an interim solution until vmf_insert_pfn_{pmd,pud}() gain
> support for pfnmaps.
> 
> Suggested-by: Yan Zhao <yan.y.zhao@intel.com>
> Link: https://lore.kernel.org/all/Zl6XdUkt%2FzMMGOLF@yzhao56-desk.sh.intel.com/
> Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> ---
> 
> I'm sending this as a follow-on patch to the v2 series[1] because this
> is largely a performance optimization, and one that we may want to
> revert when we can introduce huge_fault support.  In the meantime, I
> can't argue with the 1/3rd performance improvement this provides to
> reduce the overall impact of the series below.  Without objection I'd
> therefore target this for v6.10 as well.  Thanks,
> 
> Alex
> 
> [1]https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com/
> 
>  drivers/vfio/pci/vfio_pci_core.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
> index db31c27bf78b..987c7921affa 100644
> --- a/drivers/vfio/pci/vfio_pci_core.c
> +++ b/drivers/vfio/pci/vfio_pci_core.c
> @@ -1662,6 +1662,7 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>  	struct vm_area_struct *vma = vmf->vma;
>  	struct vfio_pci_core_device *vdev = vma->vm_private_data;
>  	unsigned long pfn, pgoff = vmf->pgoff - vma->vm_pgoff;
> +	unsigned long addr = vma->vm_start;
>  	vm_fault_t ret = VM_FAULT_SIGBUS;
>  
>  	pfn = vma_to_pfn(vma);
> @@ -1669,11 +1670,25 @@ static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
>  	down_read(&vdev->memory_lock);
>  
>  	if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev))
> -		goto out_disabled;
> +		goto out_unlock;
>  
>  	ret = vmf_insert_pfn(vma, vmf->address, pfn + pgoff);
> +	if (ret & VM_FAULT_ERROR)
> +		goto out_unlock;
>  
> -out_disabled:
> +	/*
> +	 * Pre-fault the remainder of the vma, abort further insertions and
> +	 * suppress error if fault is encountered during pre-fault.
> +	 */
> +	for (; addr < vma->vm_end; addr += PAGE_SIZE, pfn++) {
> +		if (addr == vmf->address)
> +			continue;
> +
> +		if (vmf_insert_pfn(vma, addr, pfn) & VM_FAULT_ERROR)
> +			break;
> +	}
> +
> +out_unlock:
>  	up_read(&vdev->memory_lock);
>  
>  	return ret;
Yan Zhao June 12, 2024, 10:08 a.m. UTC | #2
On Tue, Jun 11, 2024 at 09:23:33AM -0600, Alex Williamson wrote:
> 
> Any support for this or should we just go with the v2 series[1] by
> itself for v6.10?  Thanks,
Tested with GPU passthrough and a 1G MMIO BAR.

The number of vfio_pci_mmap_fault() calls is reduced from 2M to 18, and
the cycles spent in vfio_pci_mmap_fault() are reduced from 3400M to 2700M.

Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
> 
> [1]https://lore.kernel.org/all/20240530045236.1005864-1-alex.williamson@redhat.com/
>
Jason Gunthorpe June 12, 2024, 12:17 p.m. UTC | #3
On Tue, Jun 11, 2024 at 09:23:33AM -0600, Alex Williamson wrote:
> 
> Any support for this or should we just go with the v2 series[1] by
> itself for v6.10?  Thanks,

I didn't think of a reason not to do this, but I don't know the fault
path especially well.

It sure would be nice to fault all the memory in one shot.

Jason
Patch

diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index db31c27bf78b..987c7921affa 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1662,6 +1662,7 @@  static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 	struct vm_area_struct *vma = vmf->vma;
 	struct vfio_pci_core_device *vdev = vma->vm_private_data;
 	unsigned long pfn, pgoff = vmf->pgoff - vma->vm_pgoff;
+	unsigned long addr = vma->vm_start;
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 
 	pfn = vma_to_pfn(vma);
@@ -1669,11 +1670,25 @@  static vm_fault_t vfio_pci_mmap_fault(struct vm_fault *vmf)
 	down_read(&vdev->memory_lock);
 
 	if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev))
-		goto out_disabled;
+		goto out_unlock;
 
 	ret = vmf_insert_pfn(vma, vmf->address, pfn + pgoff);
+	if (ret & VM_FAULT_ERROR)
+		goto out_unlock;
 
-out_disabled:
+	/*
+	 * Pre-fault the remainder of the vma, abort further insertions and
+	 * suppress error if fault is encountered during pre-fault.
+	 */
+	for (; addr < vma->vm_end; addr += PAGE_SIZE, pfn++) {
+		if (addr == vmf->address)
+			continue;
+
+		if (vmf_insert_pfn(vma, addr, pfn) & VM_FAULT_ERROR)
+			break;
+	}
+
+out_unlock:
 	up_read(&vdev->memory_lock);
 
 	return ret;
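
For readers who want to experiment with the control flow outside the kernel,
the pre-fault logic in the hunk above can be approximated by a small userspace
model. This is only a sketch under stated assumptions, not the driver code:
struct model_vma, model_insert_pfn(), model_fault(), model_walk(),
MODEL_PAGE_SIZE and MODEL_NO_FAIL are hypothetical names invented here. An
array slot stands in for a PTE (0 means "not present", so base_pfn must be
non-zero), and a fail_at index stands in for vmf_insert_pfn() returning a
VM_FAULT_ERROR bit. Locking and the memory-enable check are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_NO_FAIL   (~0UL)

/* Hypothetical stand-in for a vma backed by a linear PFN range. */
struct model_vma {
	unsigned long vm_start;   /* first mapped address (page aligned) */
	unsigned long vm_end;     /* one past the last mapped address */
	unsigned long base_pfn;   /* PFN backing vm_start, must be non-zero */
	unsigned long *pte;       /* one slot per page, 0 = not present */
	unsigned long fail_at;    /* page index whose insertion fails, or MODEL_NO_FAIL */
};

/* Models vmf_insert_pfn(): returns false where the kernel would report an error. */
static bool model_insert_pfn(struct model_vma *vma, unsigned long addr,
			     unsigned long pfn)
{
	unsigned long idx = (addr - vma->vm_start) / MODEL_PAGE_SIZE;

	if (idx == vma->fail_at)
		return false;

	vma->pte[idx] = pfn;
	return true;
}

/*
 * Models the patched fault path: insert the faulting page first, then
 * pre-fault the rest of the vma.  A failure while pre-faulting stops the
 * loop but is not reported, because the faulting page is already mapped.
 */
static bool model_fault(struct model_vma *vma, unsigned long fault_addr)
{
	unsigned long addr = vma->vm_start;
	unsigned long pfn = vma->base_pfn;
	unsigned long pgoff;

	fault_addr &= ~(MODEL_PAGE_SIZE - 1);
	pgoff = (fault_addr - vma->vm_start) / MODEL_PAGE_SIZE;

	if (!model_insert_pfn(vma, fault_addr, pfn + pgoff))
		return false;

	/* "continue" still runs the increment, so pfn stays in step with addr. */
	for (; addr < vma->vm_end; addr += MODEL_PAGE_SIZE, pfn++) {
		if (addr == fault_addr)
			continue;

		if (!model_insert_pfn(vma, addr, pfn))
			break;
	}

	return true;
}

/* Touch every page in order and count how many accesses needed a fault. */
static unsigned long model_walk(struct model_vma *vma)
{
	unsigned long addr, faults = 0;

	for (addr = vma->vm_start; addr < vma->vm_end; addr += MODEL_PAGE_SIZE) {
		size_t idx = (addr - vma->vm_start) / MODEL_PAGE_SIZE;

		if (vma->pte[idx] == 0) {
			faults++;
			model_fault(vma, addr);
		}
	}

	return faults;
}
```

In this model, walking a mapping with no failing page takes a single fault
regardless of the mapping size, which is the effect the patch is after. With a
failing page configured, pre-faulting stops at that page while the original
fault still succeeds, mirroring the "suppress error" behaviour described in the
comment.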