[RFC,0/3] vfio/type1: Reduce vfio_iommu.lock contention

Message ID 157919849533.21002.4782774695733669879.stgit@gimli.home

Message

Alex Williamson Jan. 16, 2020, 6:17 p.m. UTC
Hi Yan,

I wonder if this might reduce the lock contention you're seeing in the
vfio_dma_rw series.  These are only compile tested on my end, so I hope
they're not too broken to test.  Thanks,

Alex

---

Alex Williamson (3):
      vfio/type1: Convert vfio_iommu.lock from mutex to rwsem
      vfio/type1: Replace obvious read lock instances
      vfio/type1: Introduce pfn_list mutex


 drivers/vfio/vfio_iommu_type1.c |   67 ++++++++++++++++++++++++---------------
 1 file changed, 41 insertions(+), 26 deletions(-)
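
For orientation, here is a minimal sketch of the locking scheme these three
patches move toward. It is illustrative only, not the actual diff: the
structure is simplified, and placing the pfn_list mutex directly in
vfio_iommu is an assumption made for brevity.

/*
 * Illustrative sketch only; see the series itself for the real changes.
 * The pfn_list mutex placement below is a simplifying assumption.
 */
#include <linux/rwsem.h>
#include <linux/mutex.h>

struct vfio_iommu {
	struct rw_semaphore	lock;		/* patch 1: was struct mutex */
	struct mutex		pfn_list_lock;	/* patch 3: guards pinned-pfn lists */
	/* ... dma_list, domain_list, etc. ... */
};

/* Patch 2: paths that only read iommu state take the lock shared, so
 * concurrent readers no longer serialize against each other. */
static void example_read_path(struct vfio_iommu *iommu)
{
	down_read(&iommu->lock);
	/* ... look up a vfio_dma without modifying anything ... */
	up_read(&iommu->lock);
}

/* Map/unmap and other mutating paths still take the lock exclusively. */
static void example_write_path(struct vfio_iommu *iommu)
{
	down_write(&iommu->lock);
	/* ... insert or remove vfio_dma entries ... */
	up_write(&iommu->lock);
}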

Comments

Yan Zhao Jan. 17, 2020, 1:10 a.m. UTC | #1
Thank you, Alex!
I'll try it and let you know the result soon. :)

On Fri, Jan 17, 2020 at 02:17:49AM +0800, Alex Williamson wrote:
> Hi Yan,
> 
> I wonder if this might reduce the lock contention you're seeing in the
> vfio_dma_rw series.  These are only compile tested on my end, so I hope
> they're not too broken to test.  Thanks,
> 
> Alex
> 
> ---
> 
> Alex Williamson (3):
>       vfio/type1: Convert vfio_iommu.lock from mutex to rwsem
>       vfio/type1: Replace obvious read lock instances
>       vfio/type1: Introduce pfn_list mutex
> 
> 
>  drivers/vfio/vfio_iommu_type1.c |   67 ++++++++++++++++++++++++---------------
>  1 file changed, 41 insertions(+), 26 deletions(-)
>
Yan Zhao Feb. 19, 2020, 9:04 a.m. UTC | #2
On Fri, Jan 17, 2020 at 09:10:51AM +0800, Yan Zhao wrote:
> Thank you, Alex!
> I'll try it and let you know the result soon. :)
> 
> On Fri, Jan 17, 2020 at 02:17:49AM +0800, Alex Williamson wrote:
> > Hi Yan,
> > 
> > I wonder if this might reduce the lock contention you're seeing in the
> > vfio_dma_rw series.  These are only compile tested on my end, so I hope
> > they're not too broken to test.  Thanks,
> > 
> > Alex
> > 
> > ---
> > 
> > Alex Williamson (3):
> >       vfio/type1: Convert vfio_iommu.lock from mutex to rwsem
> >       vfio/type1: Replace obvious read lock instances
> >       vfio/type1: Introduce pfn_list mutex
> > 
> > 
> >  drivers/vfio/vfio_iommu_type1.c |   67 ++++++++++++++++++++++++---------------
> >  1 file changed, 41 insertions(+), 26 deletions(-)
> >

Hi Alex,
I have finished testing this series.
It's quite stable and passed our MTBF testing. :)

However, after comparing the performance data from several benchmarks
run in guests (see below), it seems that this series does not bring an
obvious benefit, at least in the cases we have tested, though I cannot
fully explain why yet.
So, would it be all right for me not to include this series in my next
version of "use vfio_dma_rw to read/write IOVAs from CPU side"?


B: baseline code, where a mutex is used for vfio_iommu.lock
B+S: baseline code plus the rwsem patches, converting vfio_iommu.lock
from a mutex to an rwsem

==== comparison: benchmark scores ====
(1) with 1 VM:

 score  |     glmark2    |   lightsmark    |   openarena
-----------------------------------------------------------
      B | 1248 (100%)    | 219.70 (100%)   | 114.9 (100%)
    B+S | 1252 (100.3%)  | 222.76 (101.2%) | 114.8 ( 99.9%)


(2) with 2 VMs:

 score  |     glmark2    |   lightsmark    |   openarena
-----------------------------------------------------------
      B | 812   (100%)   | 211.46 (100%)   | 115.3 (100%)
    B+S | 812.8 (100.1%) | 212.96 (100.7%) | 114.9 ( 99.6%)


==== comparison: average cycles spent on vfio_iommu.lock =====
(1) with 1 VM:

 cycles | glmark2   | lightsmark | openarena | VM boot up
---------------------------------------------------------
      B | 107       | 113        | 110       | 107
    B+S | 112 (+5)  | 111  (-2)  | 108 (-2)  | 104 (-3)

Note:
a. during VM boot-up, the rwsem saw 24921 read acquisitions vs. 67
write acquisitions (372:1)
b. for the 3 measured benchmarks, no write acquisitions of the rwsem
were observed (see the instrumentation sketch after the 2-VM table
below)


(2) with 2 VMs:

 cycles | glmark2   | lightsmark | openarena | VM boot up
----------------------------------------------------------
      B | 113       | 119        | 112       | 119
    B+S | 118 (+5)  | 138  (+19) | 110 (-2)  | 114 (-5)
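
For reference, the read/write counts and cycle numbers above could be
gathered with wrappers along these lines. This is a hypothetical
instrumentation sketch, not code from the series; the wrapper and
counter names are invented.

#include <linux/atomic.h>
#include <linux/rwsem.h>
#include <linux/timex.h>	/* get_cycles() */

static atomic64_t lock_read_acqs  = ATOMIC64_INIT(0);
static atomic64_t lock_write_acqs = ATOMIC64_INIT(0);
static atomic64_t lock_cycles     = ATOMIC64_INIT(0);

/* Count read acquisitions and cycles spent acquiring the rwsem. */
static inline void instr_down_read(struct rw_semaphore *sem)
{
	cycles_t t0 = get_cycles();

	down_read(sem);
	atomic64_add(get_cycles() - t0, &lock_cycles);
	atomic64_inc(&lock_read_acqs);
}

/* Same for write acquisitions (the 67 boot-up writes noted above). */
static inline void instr_down_write(struct rw_semaphore *sem)
{
	cycles_t t0 = get_cycles();

	down_write(sem);
	atomic64_add(get_cycles() - t0, &lock_cycles);
	atomic64_inc(&lock_write_acqs);
}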


Similar results were obtained after applying the vfio_dma_rw patches.

B: baseline code, where a mutex is used for vfio_iommu.lock
B+V: baseline code + patches converting from kvm_read/write_guest to
vfio_dma_rw
B+V+S: baseline code + the vfio_dma_rw patches + the rwsem patches
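
To make the B and B+V paths concrete, here is a minimal sketch of the
two accessors side by side. The vfio_dma_rw() signature follows the RFC
under discussion; the demo helper and its arguments are hypothetical.

#include <linux/kvm_host.h>
#include <linux/vfio.h>

/* Hypothetical helper contrasting the two access paths. */
static int demo_read_guest(struct kvm *kvm, struct vfio_group *group,
			   gpa_t gpa, dma_addr_t iova, void *buf,
			   size_t len)
{
	int ret;

	/* B: GPA-based access through KVM memslots. */
	ret = kvm_read_guest(kvm, gpa, buf, len);
	if (ret)
		return ret;

	/* B+V: IOVA-based access through VFIO; the last argument
	 * selects write (true) vs. read (false). */
	return vfio_dma_rw(group, iova, buf, len, false);
}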

==== comparison: benchmark scores =====
(1) with 1 VM:

 score  |     glmark2    |   lightsmark    |   openarena
----------------------------------------------------------
    B+V | 1244 (100%)    | 222.18 (100%)   | 114.4 (100%)
  B+V+S | 1241 ( 99.8%)  | 223.90 (100.8%) | 114.6 (100.2%)

(2) with 2 VMs:

 score  |     glmark2    |   lightsmark    |   openarena
----------------------------------------------------------
    B+V | 811.2 (100%)   | 211.20 (100%)   | 115.4 (100%)
  B+V+S | 811   (99.98%) | 211.81 (100.3%) | 115.5 (100.1%)


==== comparison: average cycles spent on vfio_dma_rw =====
(1) with 1 VM:

cycles  |    glmark2  | lightsmark | openarena
--------------------------------------------------
    B+V | 1396        | 1592       | 1351 
  B+V+S | 1415 (+19 ) | 1650 (+58) | 1357 (+6)

(2) with 2 VMs:

cycles  |    glmark2  | lightsmark | openarena
--------------------------------------------------
    B+V | 1974        | 2024       | 1636
  B+V+S | 1979 (+5)   | 2051 (+27) | 1644 (+8)


==== comparison: average cycles spent on vfio_iommu.lock =====
(1) with 1 VM:

 cycles | glmark2   | lightsmark | openarena | VM boot up
---------------------------------------------------------
    B+V | 137       | 139        | 156       | 124
  B+V+S | 142 (+5)  | 143 (+4)   | 149 (-7)  | 114 (-10)

(2) with 2 VMs:

 cycles | glmark2   | lightsmark | openarena | VM boot up
---------------------------------------------------------
    B+V | 153       | 148        | 146       | 111
  B+V+S | 155 (+2)  | 157 (+9)   | 156 (+10) | 118 (+7)


P.S.
You may find some inconsistency when comparing with the test results I
sent at https://lkml.org/lkml/2020/1/14/1486. That is because I had to
change my test machine for personal reasons, and also because I
configured lightsmark not to sync on vblank events.


Thanks
Yan
Alex Williamson Feb. 19, 2020, 8:52 p.m. UTC | #3
On Wed, 19 Feb 2020 04:04:17 -0500
Yan Zhao <yan.y.zhao@intel.com> wrote:

> On Fri, Jan 17, 2020 at 09:10:51AM +0800, Yan Zhao wrote:
> > Thank you, Alex!
> > I'll try it and let you know the result soon. :)
> > 
> > On Fri, Jan 17, 2020 at 02:17:49AM +0800, Alex Williamson wrote:  
> > > Hi Yan,
> > > 
> > > I wonder if this might reduce the lock contention you're seeing in the
> > > vfio_dma_rw series.  These are only compile tested on my end, so I hope
> > > they're not too broken to test.  Thanks,
> > > 
> > > Alex
> > > 
> > > ---
> > > 
> > > Alex Williamson (3):
> > >       vfio/type1: Convert vfio_iommu.lock from mutex to rwsem
> > >       vfio/type1: Replace obvious read lock instances
> > >       vfio/type1: Introduce pfn_list mutex
> > > 
> > > 
> > >  drivers/vfio/vfio_iommu_type1.c |   67 ++++++++++++++++++++++++---------------
> > >  1 file changed, 41 insertions(+), 26 deletions(-)
> > >  
> 
> Hi Alex,
> I have finished testing this series.
> It's quite stable and passed our MTBF testing. :)
> 
> However, after comparing the performance data from several benchmarks
> run in guests (see below), it seems that this series does not bring an
> obvious benefit, at least in the cases we have tested, though I cannot
> fully explain why yet.
> So, would it be all right for me not to include this series in my next
> version of "use vfio_dma_rw to read/write IOVAs from CPU side"?

Yes, I would not include it in your series.  No reason to bloat your
series with a feature that doesn't clearly show an improvement.  Thanks
for the additional testing; we can revive these patches if contention
on this lock ever resurfaces.  I was actually more hopeful that holding
an external group reference might provide a better performance
improvement, since the lookup on every vfio_dma_rw is not very
efficient.  Thanks,

Alex
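
A minimal sketch of that alternative, assuming the
vfio_group_get_external_user_from_dev() and
vfio_group_put_external_user() helpers from the vfio_dma_rw series; the
demo structure and function names are hypothetical.

#include <linux/err.h>
#include <linux/vfio.h>

struct demo_vgpu {
	struct vfio_group *group;	/* cached external reference */
};

/* Take one external group reference at open time. */
static int demo_open(struct demo_vgpu *vgpu, struct device *dev)
{
	vgpu->group = vfio_group_get_external_user_from_dev(dev);
	return PTR_ERR_OR_ZERO(vgpu->group);
}

/* Hot path: pass the cached reference, avoiding a lookup per call. */
static int demo_access(struct demo_vgpu *vgpu, dma_addr_t iova,
		       void *buf, size_t len, bool write)
{
	return vfio_dma_rw(vgpu->group, iova, buf, len, write);
}

/* Drop the reference at release time. */
static void demo_release(struct demo_vgpu *vgpu)
{
	vfio_group_put_external_user(vgpu->group);
}
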
Yan Zhao Feb. 20, 2020, 4:38 a.m. UTC | #4
On Thu, Feb 20, 2020 at 04:52:47AM +0800, Alex Williamson wrote:
> On Wed, 19 Feb 2020 04:04:17 -0500
> Yan Zhao <yan.y.zhao@intel.com> wrote:
> 
> > On Fri, Jan 17, 2020 at 09:10:51AM +0800, Yan Zhao wrote:
> > > Thank you, Alex!
> > > I'll try it and let you know the result soon. :)
> > > 
> > > On Fri, Jan 17, 2020 at 02:17:49AM +0800, Alex Williamson wrote:  
> > > > Hi Yan,
> > > > 
> > > > I wonder if this might reduce the lock contention you're seeing in the
> > > > vfio_dma_rw series.  These are only compile tested on my end, so I hope
> > > > they're not too broken to test.  Thanks,
> > > > 
> > > > Alex
> > > > 
> > > > ---
> > > > 
> > > > Alex Williamson (3):
> > > >       vfio/type1: Convert vfio_iommu.lock from mutex to rwsem
> > > >       vfio/type1: Replace obvious read lock instances
> > > >       vfio/type1: Introduce pfn_list mutex
> > > > 
> > > > 
> > > >  drivers/vfio/vfio_iommu_type1.c |   67 ++++++++++++++++++++++++---------------
> > > >  1 file changed, 41 insertions(+), 26 deletions(-)
> > > >  
> > 
> > Hi Alex,
> > I have finished testing this series.
> > It's quite stable and passed our MTBF testing. :)
> > 
> > However, after comparing the performance data from several benchmarks
> > run in guests (see below), it seems that this series does not bring an
> > obvious benefit, at least in the cases we have tested, though I cannot
> > fully explain why yet.
> > So, would it be all right for me not to include this series in my next
> > version of "use vfio_dma_rw to read/write IOVAs from CPU side"?
> 
> Yes, I would not include it in your series.  No reason to bloat your
> series with a feature that doesn't clearly show an improvement.  Thanks
> for the additional testing; we can revive these patches if contention
> on this lock ever resurfaces.  I was actually more hopeful that holding
> an external group reference might provide a better performance
> improvement, since the lookup on every vfio_dma_rw is not very
> efficient.  Thanks,
> 
Got it, thanks~

Yan