
[net-next,V4,0/5] vhost: accelerate metadata access through vmap()

Message ID 20190123095557.30168-1-jasowang@redhat.com (mailing list archive)

Message

Jason Wang Jan. 23, 2019, 9:55 a.m. UTC
This series tries to access virtqueue metadata through a kernel virtual
address instead of copy_user() and friends, since those have too much
overhead: access checks, speculation barriers, or even hardware feature
toggling.

Test shows about 24% improvement on TX PPS. It should benefit other
cases as well.

Changes from V3:
- don't try to use vmap for file backed pages
- rebase to master
Changes from V2:
- fix buggy range overlapping check
- tear down MMU notifier during vhost ioctl to make sure the invalidation
  request can read the metadata userspace address and vq size without
  holding the vq mutex.
Changes from V1:
- instead of pinning pages, use MMU notifier to invalidate vmaps and
  remap during metadata prefetch
- fix build warning on MIPS

Jason Wang (5):
  vhost: generalize adding used elem
  vhost: fine grain userspace memory accessors
  vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
  vhost: introduce helpers to get the size of metadata area
  vhost: access vq metadata through kernel virtual address

 drivers/vhost/net.c   |   4 +-
 drivers/vhost/vhost.c | 441 +++++++++++++++++++++++++++++++++++++-----
 drivers/vhost/vhost.h |  15 +-
 mm/shmem.c            |   1 +
 4 files changed, 410 insertions(+), 51 deletions(-)

Comments

Michael S. Tsirkin Jan. 23, 2019, 1:58 p.m. UTC | #1
On Wed, Jan 23, 2019 at 05:55:52PM +0800, Jason Wang wrote:
> This series tries to access virtqueue metadata through kernel virtual
> address instead of copy_user() friends since they had too much
> overheads like checks, spec barriers or even hardware feature
> toggling.
> 
> Test shows about 24% improvement on TX PPS. It should benefit other
> cases as well.

ok I think this addresses most comments but it's a big change and we
just started 1.1 review so pls give me a week to review this, ok?

> Changes from V3:
> - don't try to use vmap for file backed pages
> - rebase to master
> Changes from V2:
> - fix buggy range overlapping check
> - tear down MMU notifier during vhost ioctl to make sure invalidation
>   request can read metadata userspace address and vq size without
>   holding vq mutex.
> Changes from V1:
> - instead of pinning pages, use MMU notifier to invalidate vmaps and
>   remap duing metadata prefetch
> - fix build warning on MIPS
> 
> Jason Wang (5):
>   vhost: generalize adding used elem
>   vhost: fine grain userspace memory accessors
>   vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
>   vhost: introduce helpers to get the size of metadata area
>   vhost: access vq metadata through kernel virtual address
> 
>  drivers/vhost/net.c   |   4 +-
>  drivers/vhost/vhost.c | 441 +++++++++++++++++++++++++++++++++++++-----
>  drivers/vhost/vhost.h |  15 +-
>  mm/shmem.c            |   1 +
>  4 files changed, 410 insertions(+), 51 deletions(-)
> 
> -- 
> 2.17.1
David Miller Jan. 23, 2019, 5:24 p.m. UTC | #2
From: "Michael S. Tsirkin" <mst@redhat.com>
Date: Wed, 23 Jan 2019 08:58:07 -0500

> On Wed, Jan 23, 2019 at 05:55:52PM +0800, Jason Wang wrote:
>> This series tries to access virtqueue metadata through kernel virtual
>> address instead of copy_user() friends since they had too much
>> overheads like checks, spec barriers or even hardware feature
>> toggling.
>> 
>> Test shows about 24% improvement on TX PPS. It should benefit other
>> cases as well.
> 
> ok I think this addresses most comments but it's a big change and we
> just started 1.1 review so to pls give me a week to review this ok?

Ok. :)
David Miller Jan. 26, 2019, 10:37 p.m. UTC | #3
From: Jason Wang <jasowang@redhat.com>
Date: Wed, 23 Jan 2019 17:55:52 +0800

> This series tries to access virtqueue metadata through kernel virtual
> address instead of copy_user() friends since they had too much
> overheads like checks, spec barriers or even hardware feature
> toggling.
> 
> Test shows about 24% improvement on TX PPS. It should benefit other
> cases as well.

I've read over the discussion of patch #5 a few times.

And it seems to me that, at a minimum, a few things still need to
be resolved:

1) More perf data added to commit message.

2) Whether invalidate_range_start() and invalidate_range_end() must
   be paired.

Etc.  So I am marking this series "Changes Requested".
Michael S. Tsirkin Jan. 27, 2019, 12:31 a.m. UTC | #4
On Sat, Jan 26, 2019 at 02:37:08PM -0800, David Miller wrote:
> From: Jason Wang <jasowang@redhat.com>
> Date: Wed, 23 Jan 2019 17:55:52 +0800
> 
> > This series tries to access virtqueue metadata through kernel virtual
> > address instead of copy_user() friends since they had too much
> > overheads like checks, spec barriers or even hardware feature
> > toggling.
> > 
> > Test shows about 24% improvement on TX PPS. It should benefit other
> > cases as well.
> 
> I've read over the discussion of patch #5 a few times.
> 
> And it seems to me that, at a minimum, a few things still need to
> be resolved:
> 
> 1) More perf data added to commit message.
> 
> 2) Whether invalidate_range_start() and invalidate_range_end() must
>    be paired.


Add dirty tracking.

> Etc.  So I am marking this series "Changes Requested".
Jason Wang Jan. 29, 2019, 2:34 a.m. UTC | #5
On 2019/1/27 8:31 a.m., Michael S. Tsirkin wrote:
> On Sat, Jan 26, 2019 at 02:37:08PM -0800, David Miller wrote:
>> From: Jason Wang <jasowang@redhat.com>
>> Date: Wed, 23 Jan 2019 17:55:52 +0800
>>
>>> This series tries to access virtqueue metadata through kernel virtual
>>> address instead of copy_user() friends since they had too much
>>> overheads like checks, spec barriers or even hardware feature
>>> toggling.
>>>
>>> Test shows about 24% improvement on TX PPS. It should benefit other
>>> cases as well.
>> I've read over the discussion of patch #5 a few times.
>>
>> And it seems to me that, at a minimum, a few things still need to
>> be resolved:
>>
>> 1) More perf data added to commit message.


Ok.


>>
>> 2) Whether invalidate_range_start() and invalidate_range_end() must
>>     be paired.


The reason vhost doesn't need an invalidate_range_end() is that we
have a fallback to copy_to_user() and friends. So there's no requirement
to set up the mapping in range_end() or to lock the vq between
range_start() and range_end(). We delay the setup of the vmap until it
is really used in vhost_meta_prefetch(), and we hold mmap_sem while
setting up the vmap; this guarantees there's no intermediate state at
that time.
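To illustrate, here is a minimal user-space sketch of the pattern described above: the notifier's invalidate_range_start() analogue drops the cached mapping, the prefetch step re-establishes it, and the accessor falls back to a copy_to_user()-style path whenever the mapping is absent. All names here (vq_map, vq_prefetch, vq_put_used_idx) are illustrative, not the actual vhost API, and the "page" is simulated by an ordinary variable:

```c
#include <stdint.h>
#include <string.h>

struct vq_map {
	uint16_t *vaddr;   /* cached mapping; NULL when invalidated */
	uint16_t backing;  /* stands in for the guest-visible page */
};

/* invalidate_range_start() analogue: tear down the cached mapping. */
static void vq_invalidate(struct vq_map *m)
{
	m->vaddr = NULL;
}

/* vhost_vq_meta_prefetch() analogue: re-establish the mapping
 * (the real code would do this under mmap_sem before the datapath runs). */
static void vq_prefetch(struct vq_map *m)
{
	m->vaddr = &m->backing;
}

/* Accessor with fallback: the fast path is a plain store through the
 * mapping; the slow path mimics copy_to_user() into the backing page,
 * so correctness never depends on range_end() restoring the vmap. */
static void vq_put_used_idx(struct vq_map *m, uint16_t idx)
{
	if (m->vaddr)
		*m->vaddr = idx;                        /* fast path */
	else
		memcpy(&m->backing, &idx, sizeof(idx)); /* fallback */
}
```

The point of the sketch is that an unpaired invalidate only disables the fast path; writes remain correct through the fallback until the next prefetch.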


>
> Add dirty tracking.


I think this could be solved by introducing e.g. a
vhost_meta_prefetch_done() at the end of handle_tx()/handle_rx() and
calling set_page_dirty() for the used pages, instead of the trick of
classifying VMAs. (As I saw, hugetlbfs has its own set-dirty method.)
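A minimal user-space sketch of that idea, under the stated assumptions: the datapath only records which metadata pages it touched through the mapping, and a proposed vhost_meta_prefetch_done() (which does not exist in the posted series) flushes dirtiness for each of them at the end of the round. struct meta_page and the set_page_dirty() stub are illustrative stand-ins for the kernel structures:

```c
#include <stdbool.h>
#include <stddef.h>

#define META_PAGES 4

struct meta_page {
	bool written; /* touched via the vmap fast path this round */
	bool dirty;   /* what set_page_dirty() would record */
};

/* Stub standing in for the kernel's set_page_dirty(). */
static void set_page_dirty(struct meta_page *p)
{
	p->dirty = true;
}

/* Datapath write through the mapping: just note the page was touched. */
static void meta_write(struct meta_page *pages, size_t i)
{
	pages[i].written = true;
}

/* Proposed vhost_meta_prefetch_done(): called at the end of
 * handle_tx()/handle_rx(), marks every touched page dirty and
 * resets the per-round bookkeeping. */
static void vhost_meta_prefetch_done(struct meta_page *pages, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (pages[i].written) {
			set_page_dirty(&pages[i]);
			pages[i].written = false;
		}
	}
}
```

Batching the set_page_dirty() calls at the end of the round keeps the hot path free of per-write dirty accounting, and sidesteps having to classify VMAs (shmem vs hugetlbfs) up front.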

Thanks


>
>> Etc.  So I am marking this series "Changes Requested".