Message ID: 20181229124656.3900-1-jasowang@redhat.com (mailing list archive)
On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> This series tries to access virtqueue metadata through kernel virtual
> address instead of copy_user() friends since they have too much
> overhead like checks, spec barriers or even hardware feature
> toggling.

Will review, thanks!
One question that comes to mind is whether it's all about bypassing
stac/clac. Could you please include a performance comparison with
nosmap?

> Test shows about 24% improvement on TX PPS. It should benefit other
> cases as well.
>
> Changes from V2:
> - fix buggy range overlapping check
> - tear down MMU notifier during vhost ioctl to make sure invalidation
>   request can read metadata userspace address and vq size without
>   holding vq mutex.
>
> Changes from V1:
> - instead of pinning pages, use MMU notifier to invalidate vmaps and
>   remap during metadata prefetch
> - fix build warning on MIPS
>
> Jason Wang (5):
>   vhost: generalize adding used elem
>   vhost: fine grain userspace memory accessors
>   vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
>   vhost: introduce helpers to get the size of metadata area
>   vhost: access vq metadata through kernel virtual address
>
>  drivers/vhost/net.c   |   4 +-
>  drivers/vhost/vhost.c | 416 +++++++++++++++++++++++++++++++++++++-----
>  drivers/vhost/vhost.h |  15 +-
>  3 files changed, 384 insertions(+), 51 deletions(-)
>
> --
> 2.17.1
On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
> This series tries to access virtqueue metadata through kernel virtual
> address instead of copy_user() friends since they have too much
> overhead like checks, spec barriers or even hardware feature
> toggling.

I think it's a reasonable approach.
However I need to look at whether and which mmu notifiers are invoked
before writeback. Do you know?

> Test shows about 24% improvement on TX PPS. It should benefit other
> cases as well.
>
> Changes from V2:
> - fix buggy range overlapping check
> - tear down MMU notifier during vhost ioctl to make sure invalidation
>   request can read metadata userspace address and vq size without
>   holding vq mutex.
>
> Changes from V1:
> - instead of pinning pages, use MMU notifier to invalidate vmaps and
>   remap during metadata prefetch
> - fix build warning on MIPS
>
> Jason Wang (5):
>   vhost: generalize adding used elem
>   vhost: fine grain userspace memory accessors
>   vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
>   vhost: introduce helpers to get the size of metadata area
>   vhost: access vq metadata through kernel virtual address
>
>  drivers/vhost/net.c   |   4 +-
>  drivers/vhost/vhost.c | 416 +++++++++++++++++++++++++++++++++++++-----
>  drivers/vhost/vhost.h |  15 +-
>  3 files changed, 384 insertions(+), 51 deletions(-)
>
> --
> 2.17.1
On 2019/1/3 4:47 AM, Michael S. Tsirkin wrote:
> On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
>> This series tries to access virtqueue metadata through kernel virtual
>> address instead of copy_user() friends since they have too much
>> overhead like checks, spec barriers or even hardware feature
>> toggling.
>
> Will review, thanks!
> One question that comes to mind is whether it's all about bypassing
> stac/clac. Could you please include a performance comparison with
> nosmap?

On a machine without SMAP (Sandy Bridge):

Before: 4.8Mpps
After:  5.2Mpps

On a machine with SMAP (Broadwell):

Before:  5.0Mpps
After:   6.1Mpps
No SMAP: 7.5Mpps

Thanks
On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > This series tries to access virtqueue metadata through kernel virtual > > > address instead of copy_user() friends since they had too much > > > overheads like checks, spec barriers or even hardware feature > > > toggling. > > Will review, thanks! > > One questions that comes to mind is whether it's all about bypassing > > stac/clac. Could you please include a performance comparison with > > nosmap? > > > > On machine without SMAP (Sandy Bridge): > > Before: 4.8Mpps > > After: 5.2Mpps OK so would you say it's really unsafe versus safe accesses? Or would you say it's just a better written code? > On machine with SMAP (Broadwell): > > Before: 5.0Mpps > > After: 6.1Mpps > > No smap: 7.5Mpps > > > Thanks no smap being before or after?
On 2019/1/7 11:28 AM, Michael S. Tsirkin wrote:
> On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote:
>> On 2019/1/3 4:47 AM, Michael S. Tsirkin wrote:
>>> On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
>>>> This series tries to access virtqueue metadata through kernel virtual
>>>> address instead of copy_user() friends since they have too much
>>>> overhead like checks, spec barriers or even hardware feature
>>>> toggling.
>>> Will review, thanks!
>>> One question that comes to mind is whether it's all about bypassing
>>> stac/clac. Could you please include a performance comparison with
>>> nosmap?
>>
>> On machine without SMAP (Sandy Bridge):
>>
>> Before: 4.8Mpps
>>
>> After: 5.2Mpps
>
> OK so would you say it's really unsafe versus safe accesses?
> Or would you say it's just a better written code?

It's the effect of removing the speculation barrier.

>> On machine with SMAP (Broadwell):
>>
>> Before: 5.0Mpps
>>
>> After: 6.1Mpps
>>
>> No smap: 7.5Mpps
>>
>> Thanks
>
> no smap being before or after?

Let me clarify:

Before (SMAP on):  5.0Mpps
Before (SMAP off): 7.5Mpps
After (SMAP on):   6.1Mpps

Thanks
On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: > > On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > > > This series tries to access virtqueue metadata through kernel virtual > > > > > address instead of copy_user() friends since they had too much > > > > > overheads like checks, spec barriers or even hardware feature > > > > > toggling. > > > > Will review, thanks! > > > > One questions that comes to mind is whether it's all about bypassing > > > > stac/clac. Could you please include a performance comparison with > > > > nosmap? > > > > > > > On machine without SMAP (Sandy Bridge): > > > > > > Before: 4.8Mpps > > > > > > After: 5.2Mpps > > OK so would you say it's really unsafe versus safe accesses? > > Or would you say it's just a better written code? > > > It's the effect of removing speculation barrier. You mean __uaccess_begin_nospec introduced by commit 304ec1b050310548db33063e567123fae8fd0301 ? So fundamentally we do access_ok checks when supplying the memory table to the kernel thread, and we should do the spec barrier there. Then we can just create and use a variant of uaccess macros that does not include the barrier? Or, how about moving the barrier into access_ok? This way repeated accesses with a single access_ok get a bit faster. CC Dan Williams on this idea. > > > > > > On machine with SMAP (Broadwell): > > > > > > Before: 5.0Mpps > > > > > > After: 6.1Mpps > > > > > > No smap: 7.5Mpps > > > > > > > > > Thanks > > > > no smap being before or after? > > > > Let me clarify: > > > Before (SMAP on): 5.0Mpps > > Before (SMAP off): 7.5Mpps > > After (SMAP on): 6.1Mpps > > > Thanks How about after + smap off? 
And maybe we want a module option just for the vhost thread to keep SMAP
off generally, since almost all it does is copy stuff from userspace into
the kernel anyway. Because what the above numbers show is that we really,
really want a solution that isn't limited to just metadata access, and I
really do not see how any such solution could not also be used to make
metadata access fast.
On 2019/1/7 下午12:17, Michael S. Tsirkin wrote: > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: >> On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: >>> On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: >>>> On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: >>>>> On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: >>>>>> This series tries to access virtqueue metadata through kernel virtual >>>>>> address instead of copy_user() friends since they had too much >>>>>> overheads like checks, spec barriers or even hardware feature >>>>>> toggling. >>>>> Will review, thanks! >>>>> One questions that comes to mind is whether it's all about bypassing >>>>> stac/clac. Could you please include a performance comparison with >>>>> nosmap? >>>>> >>>> On machine without SMAP (Sandy Bridge): >>>> >>>> Before: 4.8Mpps >>>> >>>> After: 5.2Mpps >>> OK so would you say it's really unsafe versus safe accesses? >>> Or would you say it's just a better written code? >> >> It's the effect of removing speculation barrier. > > You mean __uaccess_begin_nospec introduced by > commit 304ec1b050310548db33063e567123fae8fd0301 > ? Yes. > > So fundamentally we do access_ok checks when supplying > the memory table to the kernel thread, and we should > do the spec barrier there. > > Then we can just create and use a variant of uaccess macros that does > not include the barrier? The unsafe ones? > > Or, how about moving the barrier into access_ok? > This way repeated accesses with a single access_ok get a bit faster. > CC Dan Williams on this idea. The problem is, e.g for vhost control path. During mem table validation, we don't even want to access them there. So the spec barrier is not needed. > > >>>> On machine with SMAP (Broadwell): >>>> >>>> Before: 5.0Mpps >>>> >>>> After: 6.1Mpps >>>> >>>> No smap: 7.5Mpps >>>> >>>> >>>> Thanks >>> no smap being before or after? 
>>>
>> Let me clarify:
>>
>> Before (SMAP on): 5.0Mpps
>>
>> Before (SMAP off): 7.5Mpps
>>
>> After (SMAP on): 6.1Mpps
>>
>> Thanks
>
> How about after + smap off?

After (SMAP off): 8.0Mpps

> And maybe we want a module option just for the vhost thread to keep smap
> off generally since almost all it does is copy stuff from userspace into
> kernel anyway. Because what the above numbers show is that we really
> really want a solution that isn't limited to just meta-data access,
> and I really do not see how any such solution can not also be
> used to make meta-data access fast.

As we discussed in another thread on the previous version, this requires
lots of changes; the main issue is that SMAP state is not saved/restored
across an explicit schedule(). Even if it were, since vhost calls lots of
net/block code, any kind of uaccess in that code would need to understand
this special request from vhost, e.g. you would probably need to invent a
new kind of iov iterator that does not touch SMAP at all. And I'm not
sure this is the only thing we would need to deal with.

So I still prefer to:

1) speed up the metadata access through vmap + MMU notifier

2) speed up the data copy with batched copies (unsafe ones or other new
interfaces)

Thanks
On 2019/1/5 5:41 AM, Michael S. Tsirkin wrote:
> On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote:
>> This series tries to access virtqueue metadata through kernel virtual
>> address instead of copy_user() friends since they have too much
>> overhead like checks, spec barriers or even hardware feature
>> toggling.
>
> I think it's a reasonable approach.
> However I need to look at whether and which mmu notifiers are invoked
> before writeback. Do you know?

I don't know, but just looking at the MMU notifier ops definition,
there's no such callback if my understanding is correct.

Thanks

>> Test shows about 24% improvement on TX PPS. It should benefit other
>> cases as well.
>>
>> Changes from V2:
>> - fix buggy range overlapping check
>> - tear down MMU notifier during vhost ioctl to make sure invalidation
>>   request can read metadata userspace address and vq size without
>>   holding vq mutex.
>>
>> Changes from V1:
>> - instead of pinning pages, use MMU notifier to invalidate vmaps and
>>   remap during metadata prefetch
>> - fix build warning on MIPS
>>
>> Jason Wang (5):
>>   vhost: generalize adding used elem
>>   vhost: fine grain userspace memory accessors
>>   vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
>>   vhost: introduce helpers to get the size of metadata area
>>   vhost: access vq metadata through kernel virtual address
>>
>>  drivers/vhost/net.c   |   4 +-
>>  drivers/vhost/vhost.c | 416 +++++++++++++++++++++++++++++++++++++-----
>>  drivers/vhost/vhost.h |  15 +-
>>  3 files changed, 384 insertions(+), 51 deletions(-)
>>
>> --
>> 2.17.1
On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: > > > > On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > > > > This series tries to access virtqueue metadata through kernel virtual > > > > > > address instead of copy_user() friends since they had too much > > > > > > overheads like checks, spec barriers or even hardware feature > > > > > > toggling. > > > > > Will review, thanks! > > > > > One questions that comes to mind is whether it's all about bypassing > > > > > stac/clac. Could you please include a performance comparison with > > > > > nosmap? > > > > > > > > > On machine without SMAP (Sandy Bridge): > > > > > > > > Before: 4.8Mpps > > > > > > > > After: 5.2Mpps > > > OK so would you say it's really unsafe versus safe accesses? > > > Or would you say it's just a better written code? > > > > > > It's the effect of removing speculation barrier. > > > You mean __uaccess_begin_nospec introduced by > commit 304ec1b050310548db33063e567123fae8fd0301 > ? > > So fundamentally we do access_ok checks when supplying > the memory table to the kernel thread, and we should > do the spec barrier there. > > Then we can just create and use a variant of uaccess macros that does > not include the barrier? > > Or, how about moving the barrier into access_ok? > This way repeated accesses with a single access_ok get a bit faster. > CC Dan Williams on this idea. It would be interesting to see how expensive re-doing the address limit check is compared to the speculation barrier. I.e. just switch vhost_get_user() to use get_user() rather than __get_user(). That will sanitize the pointer in the speculative path without a barrier. 
I recall we had a convert access_ok() discussion with this result here:

https://lkml.org/lkml/2018/1/17/929

...but it sounds like you are proposing a smaller-scope fixup for the
vhost use case? Something like barrier_nospec() in the success path for
all vhost access_ok() checks, and then a get_user() variant that
disables the barrier.
On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote: > On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: > > > > > > On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > > > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > > > > > This series tries to access virtqueue metadata through kernel virtual > > > > > > > address instead of copy_user() friends since they had too much > > > > > > > overheads like checks, spec barriers or even hardware feature > > > > > > > toggling. > > > > > > Will review, thanks! > > > > > > One questions that comes to mind is whether it's all about bypassing > > > > > > stac/clac. Could you please include a performance comparison with > > > > > > nosmap? > > > > > > > > > > > On machine without SMAP (Sandy Bridge): > > > > > > > > > > Before: 4.8Mpps > > > > > > > > > > After: 5.2Mpps > > > > OK so would you say it's really unsafe versus safe accesses? > > > > Or would you say it's just a better written code? > > > > > > > > > It's the effect of removing speculation barrier. > > > > > > You mean __uaccess_begin_nospec introduced by > > commit 304ec1b050310548db33063e567123fae8fd0301 > > ? > > > > So fundamentally we do access_ok checks when supplying > > the memory table to the kernel thread, and we should > > do the spec barrier there. > > > > Then we can just create and use a variant of uaccess macros that does > > not include the barrier? > > > > Or, how about moving the barrier into access_ok? > > This way repeated accesses with a single access_ok get a bit faster. > > CC Dan Williams on this idea. > > It would be interesting to see how expensive re-doing the address > limit check is compared to the speculation barrier. I.e. 
> just switch
> vhost_get_user() to use get_user() rather than __get_user(). That will
> sanitize the pointer in the speculative path without a barrier.

Hmm it's way cheaper even though IIRC it's measurable.
Jason, would you like to try?
Although frankly __get_user being slower than get_user feels very wrong.
Not yet sure what to do exactly but would you agree?

> I recall we had a convert access_ok() discussion with this result here:
>
> https://lkml.org/lkml/2018/1/17/929

Sorry, let me try to clarify. IIUC speculating access_ok once is
harmless. As Linus said, the problem is with "_subsequent_ accesses that
can then be used to perturb the cache".

Thus:

1. if (!access_ok)
2.     return
3. get_user
4. if (!access_ok)
5.     return
6. get_user

Your proposal that Linus nacked was to effectively add a barrier after
lines 2 and 5 (also using the array_index_nospec trick for speed),
right? Unfortunately that needs a big API change.

I am asking about adding barrier_nospec within access_ok, thus
effectively before lines 1 and 4. access_ok will be slower, but after
all, the point of access_ok is to then access the same memory multiple
times. So we should be making __get_user faster and access_ok slower...

> ...but it sounds like you are proposing a smaller scope fixup for the
> vhost use case? Something like barrier_nospec() in the success path
> for all vhost access_ok() checks and then a get_user() variant that
> disables the barrier.

Maybe we'll have to. Except I hope vhost won't end up being the only
user, otherwise it will be hard to maintain.
On Mon, Jan 07, 2019 at 02:50:17PM +0800, Jason Wang wrote: > > On 2019/1/7 下午12:17, Michael S. Tsirkin wrote: > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: > > > On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > > > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > > > > > This series tries to access virtqueue metadata through kernel virtual > > > > > > > address instead of copy_user() friends since they had too much > > > > > > > overheads like checks, spec barriers or even hardware feature > > > > > > > toggling. > > > > > > Will review, thanks! > > > > > > One questions that comes to mind is whether it's all about bypassing > > > > > > stac/clac. Could you please include a performance comparison with > > > > > > nosmap? > > > > > > > > > > > On machine without SMAP (Sandy Bridge): > > > > > > > > > > Before: 4.8Mpps > > > > > > > > > > After: 5.2Mpps > > > > OK so would you say it's really unsafe versus safe accesses? > > > > Or would you say it's just a better written code? > > > > > > It's the effect of removing speculation barrier. > > > > You mean __uaccess_begin_nospec introduced by > > commit 304ec1b050310548db33063e567123fae8fd0301 > > ? > > Yes. > > > > > > So fundamentally we do access_ok checks when supplying > > the memory table to the kernel thread, and we should > > do the spec barrier there. > > > > Then we can just create and use a variant of uaccess macros that does > > not include the barrier? > > > The unsafe ones? Fundamentally yes. > > > > > Or, how about moving the barrier into access_ok? > > This way repeated accesses with a single access_ok get a bit faster. > > CC Dan Williams on this idea. > > > The problem is, e.g for vhost control path. During mem table validation, we > don't even want to access them there. So the spec barrier is not needed. 
Again, the spec barrier is not needed as such at all; it's defence in
depth. And mem table init is a slow path, so we can stick a barrier
there and it won't be a problem for anyone.

>>>>> On machine with SMAP (Broadwell):
>>>>>
>>>>> Before: 5.0Mpps
>>>>> After: 6.1Mpps
>>>>> No smap: 7.5Mpps
>>>>>
>>>>> Thanks
>>>> no smap being before or after?
>>>
>>> Let me clarify:
>>>
>>> Before (SMAP on): 5.0Mpps
>>> Before (SMAP off): 7.5Mpps
>>> After (SMAP on): 6.1Mpps
>>>
>>> Thanks
>> How about after + smap off?
>
> After (SMAP off): 8.0Mpps
>
>> And maybe we want a module option just for the vhost thread to keep smap
>> off generally since almost all it does is copy stuff from userspace into
>> kernel anyway. Because what the above numbers show is that we really
>> really want a solution that isn't limited to just meta-data access,
>> and I really do not see how any such solution can not also be
>> used to make meta-data access fast.
>
> As we discussed in another thread on the previous version, this requires
> lots of changes; the main issue is that SMAP state is not saved/restored
> on explicit schedule().

I wonder how expensive reading eflags can be. If it's cheap we can just
check EFLAGS.AC and re-run stac if needed.

> Even if it did, since vhost will call lots of net/block code, any kind
> of uaccess in that code needs to understand this special request from
> vhost, e.g. you probably need to invent a new kind of iov iterator that
> does not touch SMAP at all. And I'm not sure this is the only thing we
> need to deal with.

Well, we wanted to move packet processing from tun into vhost anyway,
right?

> So I still prefer to:
>
> 1) speed up the metadata access through vmap + MMU notifier
>
> 2) speed up the data copy with batched copies (unsafe ones or other new
> interfaces)
>
> Thanks

I just guess once you do (2) you will want to rework (1) to use the new
interfaces. So all the effort you are now investing in (1) will be
wasted. Just my $.02.
On Mon, Jan 07, 2019 at 02:58:08PM +0800, Jason Wang wrote: > > On 2019/1/5 上午5:41, Michael S. Tsirkin wrote: > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > This series tries to access virtqueue metadata through kernel virtual > > > address instead of copy_user() friends since they had too much > > > overheads like checks, spec barriers or even hardware feature > > > toggling. > > > > I think it's a reasonable approach. > > However I need to look at whether and which mmu notifiers are invoked before > > writeback. Do you know? > > > I don't know but just looking at the MMU notifier ops definition, there's no > such callback if my understanding is correct. > > Thanks In that case how are you making sure used ring updates are written back? If they aren't guest will crash ... > > > > > > Test shows about 24% improvement on TX PPS. It should benefit other > > > cases as well. > > > > > > Changes from V2: > > > - fix buggy range overlapping check > > > - tear down MMU notifier during vhost ioctl to make sure invalidation > > > request can read metadata userspace address and vq size without > > > holding vq mutex. > > > Changes from V1: > > > - instead of pinning pages, use MMU notifier to invalidate vmaps and > > > remap duing metadata prefetch > > > - fix build warning on MIPS > > > > > > Jason Wang (5): > > > vhost: generalize adding used elem > > > vhost: fine grain userspace memory accessors > > > vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch() > > > vhost: introduce helpers to get the size of metadata area > > > vhost: access vq metadata through kernel virtual address > > > > > > drivers/vhost/net.c | 4 +- > > > drivers/vhost/vhost.c | 416 +++++++++++++++++++++++++++++++++++++----- > > > drivers/vhost/vhost.h | 15 +- > > > 3 files changed, 384 insertions(+), 51 deletions(-) > > > > > > -- > > > 2.17.1
On Mon, Jan 7, 2019 at 6:11 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote: > > On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: > > > > > > > > On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: > > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > > > > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > > > > > > This series tries to access virtqueue metadata through kernel virtual > > > > > > > > address instead of copy_user() friends since they had too much > > > > > > > > overheads like checks, spec barriers or even hardware feature > > > > > > > > toggling. > > > > > > > Will review, thanks! > > > > > > > One questions that comes to mind is whether it's all about bypassing > > > > > > > stac/clac. Could you please include a performance comparison with > > > > > > > nosmap? > > > > > > > > > > > > > On machine without SMAP (Sandy Bridge): > > > > > > > > > > > > Before: 4.8Mpps > > > > > > > > > > > > After: 5.2Mpps > > > > > OK so would you say it's really unsafe versus safe accesses? > > > > > Or would you say it's just a better written code? > > > > > > > > > > > > It's the effect of removing speculation barrier. > > > > > > > > > You mean __uaccess_begin_nospec introduced by > > > commit 304ec1b050310548db33063e567123fae8fd0301 > > > ? > > > > > > So fundamentally we do access_ok checks when supplying > > > the memory table to the kernel thread, and we should > > > do the spec barrier there. > > > > > > Then we can just create and use a variant of uaccess macros that does > > > not include the barrier? > > > > > > Or, how about moving the barrier into access_ok? > > > This way repeated accesses with a single access_ok get a bit faster. > > > CC Dan Williams on this idea. 
> > > > It would be interesting to see how expensive re-doing the address > > limit check is compared to the speculation barrier. I.e. just switch > > vhost_get_user() to use get_user() rather than __get_user(). That will > > sanitize the pointer in the speculative path without a barrier. > > Hmm it's way cheaper even though IIRC it's measureable. > Jason, would you like to try? > Although frankly __get_user being slower than get_user feels very wrong. > Not yet sure what to do exactly but would you agree? Agree. __get_user() being faster than get_user() defeats the whole point of converting code paths to the access_ok() + __get_user() pattern. > > > > I recall we had a convert access_ok() discussion with this result here: > > > > https://lkml.org/lkml/2018/1/17/929 > > Sorry let me try to clarify. IIUC speculating access_ok once > is harmless. As Linus said the problem is with "_subsequent_ > accesses that can then be used to perturb the cache". > > Thus: > > 1. if (!access_ok) > 2. return > 3. get_user > 4. if (!access_ok) > 5. return > 6. get_user > > Your proposal that Linus nacked was to effectively add a barrier after > lines 2 and 5 (also using the array_index_nospec trick for speed), > right? Unfortunately that needs a big API change. > > I am asking about adding barrier_nospec within access_ok. > Thus effectively before lines 1 and 4. > access_ok will be slower but after all the point of access_ok is > to then access the same memory multiple times. If the barrier is before lines 1 and 4 then it offers no real protection as far I can see. It will start speculating again on the user controlled pointer value after the barrier resolves. > So we should be making __get_user faster and access_ok slower ... I agree, but then the barrier always needs to be after the access_ok() check unconditionally called in the return path from access_ok(). At that point it's back to the implementation that Linus nak'd, or I'm missing some other detail. 
...but maybe if it is limited to a new access_ok_nospec() then the concern is addressed? Then rename current __get_user() to __get_user_nospec() and make a new __get_user() that is back to being optimal.
On Mon, Jan 07, 2019 at 01:39:15PM -0800, Dan Williams wrote: > On Mon, Jan 7, 2019 at 6:11 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote: > > > On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: > > > > > > > > > > On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: > > > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > > > > > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > > > > > > > This series tries to access virtqueue metadata through kernel virtual > > > > > > > > > address instead of copy_user() friends since they had too much > > > > > > > > > overheads like checks, spec barriers or even hardware feature > > > > > > > > > toggling. > > > > > > > > Will review, thanks! > > > > > > > > One questions that comes to mind is whether it's all about bypassing > > > > > > > > stac/clac. Could you please include a performance comparison with > > > > > > > > nosmap? > > > > > > > > > > > > > > > On machine without SMAP (Sandy Bridge): > > > > > > > > > > > > > > Before: 4.8Mpps > > > > > > > > > > > > > > After: 5.2Mpps > > > > > > OK so would you say it's really unsafe versus safe accesses? > > > > > > Or would you say it's just a better written code? > > > > > > > > > > > > > > > It's the effect of removing speculation barrier. > > > > > > > > > > > > You mean __uaccess_begin_nospec introduced by > > > > commit 304ec1b050310548db33063e567123fae8fd0301 > > > > ? > > > > > > > > So fundamentally we do access_ok checks when supplying > > > > the memory table to the kernel thread, and we should > > > > do the spec barrier there. > > > > > > > > Then we can just create and use a variant of uaccess macros that does > > > > not include the barrier? 
> > > > > > > > Or, how about moving the barrier into access_ok? > > > > This way repeated accesses with a single access_ok get a bit faster. > > > > CC Dan Williams on this idea. > > > > > > It would be interesting to see how expensive re-doing the address > > > limit check is compared to the speculation barrier. I.e. just switch > > > vhost_get_user() to use get_user() rather than __get_user(). That will > > > sanitize the pointer in the speculative path without a barrier. > > > > Hmm it's way cheaper even though IIRC it's measureable. > > Jason, would you like to try? > > Although frankly __get_user being slower than get_user feels very wrong. > > Not yet sure what to do exactly but would you agree? > > Agree. __get_user() being faster than get_user() defeats the whole > point of converting code paths to the access_ok() + __get_user() > pattern. Did you mean the reverse? > > > > > > > I recall we had a convert access_ok() discussion with this result here: > > > > > > https://lkml.org/lkml/2018/1/17/929 > > > > Sorry let me try to clarify. IIUC speculating access_ok once > > is harmless. As Linus said the problem is with "_subsequent_ > > accesses that can then be used to perturb the cache". > > > > Thus: > > > > 1. if (!access_ok) > > 2. return > > 3. get_user > > 4. if (!access_ok) > > 5. return > > 6. get_user > > > > Your proposal that Linus nacked was to effectively add a barrier after > > lines 2 and 5 (also using the array_index_nospec trick for speed), > > right? Unfortunately that needs a big API change. > > > > I am asking about adding barrier_nospec within access_ok. > > Thus effectively before lines 1 and 4. > > access_ok will be slower but after all the point of access_ok is > > to then access the same memory multiple times. > > If the barrier is before lines 1 and 4 then it offers no real > protection as far I can see. It will start speculating again on the > user controlled pointer value after the barrier resolves. 
> > > So we should be making __get_user faster and access_ok slower ... > > I agree, but then the barrier always needs to be after the access_ok() > check unconditionally called in the return path from access_ok(). At > that point it's back to the implementation that Linus nak'd, or I'm > missing some other detail. > > ...but maybe if it is limited to a new access_ok_nospec() then the > concern is addressed? Then rename current __get_user() to > __get_user_nospec() and make a new __get_user() that is back to being > optimal.
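The "sanitize the pointer in the speculative path without a barrier" trick being discussed can be sketched in plain userspace C. The following is a hypothetical model mirroring the kernel's generic array_index_mask_nospec() fallback; the function names are invented for illustration and this is not the vhost code itself:

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * 8))

/* Branch-free mask, modeled on the kernel's generic
 * array_index_mask_nospec() fallback: all-ones when index < size,
 * zero otherwise, using only data dependencies rather than a branch,
 * so there is nothing for the CPU to mispredict and speculate past. */
static unsigned long mask_nospec(unsigned long index, unsigned long size)
{
    /* If index >= size, (size - 1 - index) underflows and sets the top
     * bit of the OR; the arithmetic right shift then smears the inverted
     * top bit across the whole word. Assumes size <= LONG_MAX. */
    return ~(long)(index | (size - 1UL - index)) >> (BITS_PER_LONG - 1);
}

/* Clamp a "user pointer": out of range collapses to 0, so a speculative
 * dereference cannot reach an attacker-chosen address. */
static uintptr_t sanitize_ptr(uintptr_t ptr, uintptr_t limit)
{
    return ptr & mask_nospec(ptr, limit);
}
```

On x86 the kernel replaces the C fallback with a two-instruction cmp/sbb sequence, which is why get_user() can stay cheap without an lfence; the open question in this thread is only where such sanitization should live (access_ok(), get_user(), or both).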
On Mon, Jan 7, 2019 at 2:25 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, Jan 07, 2019 at 01:39:15PM -0800, Dan Williams wrote: > > On Mon, Jan 7, 2019 at 6:11 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote: > > > > On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: > > > > > > > > > > > > On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: > > > > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > > > > > > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > > > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > > > > > > > > This series tries to access virtqueue metadata through kernel virtual > > > > > > > > > > address instead of copy_user() friends since they had too much > > > > > > > > > > overheads like checks, spec barriers or even hardware feature > > > > > > > > > > toggling. > > > > > > > > > Will review, thanks! > > > > > > > > > One questions that comes to mind is whether it's all about bypassing > > > > > > > > > stac/clac. Could you please include a performance comparison with > > > > > > > > > nosmap? > > > > > > > > > > > > > > > > > On machine without SMAP (Sandy Bridge): > > > > > > > > > > > > > > > > Before: 4.8Mpps > > > > > > > > > > > > > > > > After: 5.2Mpps > > > > > > > OK so would you say it's really unsafe versus safe accesses? > > > > > > > Or would you say it's just a better written code? > > > > > > > > > > > > > > > > > > It's the effect of removing speculation barrier. > > > > > > > > > > > > > > > You mean __uaccess_begin_nospec introduced by > > > > > commit 304ec1b050310548db33063e567123fae8fd0301 > > > > > ? > > > > > > > > > > So fundamentally we do access_ok checks when supplying > > > > > the memory table to the kernel thread, and we should > > > > > do the spec barrier there. 
> > > > > > > > > > Then we can just create and use a variant of uaccess macros that does > > > > > not include the barrier? > > > > > > > > > > Or, how about moving the barrier into access_ok? > > > > > This way repeated accesses with a single access_ok get a bit faster. > > > > > CC Dan Williams on this idea. > > > > > > > > It would be interesting to see how expensive re-doing the address > > > > limit check is compared to the speculation barrier. I.e. just switch > > > > vhost_get_user() to use get_user() rather than __get_user(). That will > > > > sanitize the pointer in the speculative path without a barrier. > > > > > > Hmm it's way cheaper even though IIRC it's measureable. > > > Jason, would you like to try? > > > Although frankly __get_user being slower than get_user feels very wrong. > > > Not yet sure what to do exactly but would you agree? > > > > Agree. __get_user() being faster than get_user() defeats the whole > > point of converting code paths to the access_ok() + __get_user() > > pattern. > > Did you mean the reverse? Hmm, no... I'll rephrase: __get_user() should have lower overhead than get_user().
On 2019/1/7 下午10:37, Michael S. Tsirkin wrote: > On Mon, Jan 07, 2019 at 02:50:17PM +0800, Jason Wang wrote: >> On 2019/1/7 下午12:17, Michael S. Tsirkin wrote: >>> On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: >>>> On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: >>>>> On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: >>>>>> On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: >>>>>>> On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: >>>>>>>> This series tries to access virtqueue metadata through kernel virtual >>>>>>>> address instead of copy_user() friends since they had too much >>>>>>>> overheads like checks, spec barriers or even hardware feature >>>>>>>> toggling. >>>>>>> Will review, thanks! >>>>>>> One questions that comes to mind is whether it's all about bypassing >>>>>>> stac/clac. Could you please include a performance comparison with >>>>>>> nosmap? >>>>>>> >>>>>> On machine without SMAP (Sandy Bridge): >>>>>> >>>>>> Before: 4.8Mpps >>>>>> >>>>>> After: 5.2Mpps >>>>> OK so would you say it's really unsafe versus safe accesses? >>>>> Or would you say it's just a better written code? >>>> It's the effect of removing speculation barrier. >>> You mean __uaccess_begin_nospec introduced by >>> commit 304ec1b050310548db33063e567123fae8fd0301 >>> ? >> Yes. >> >> >>> So fundamentally we do access_ok checks when supplying >>> the memory table to the kernel thread, and we should >>> do the spec barrier there. >>> >>> Then we can just create and use a variant of uaccess macros that does >>> not include the barrier? >> >> The unsafe ones? > Fundamentally yes. > > >>> Or, how about moving the barrier into access_ok? >>> This way repeated accesses with a single access_ok get a bit faster. >>> CC Dan Williams on this idea. >> >> The problem is, e.g for vhost control path. During mem table validation, we >> don't even want to access them there. So the spec barrier is not needed. > Again spec barrier is not needed as such at all. 
It's defence in depth. > And mem table init is slow path. So we can stick a barrier there and it > won't be a problem for anyone. Considering it's a generic helper, for defence in depth we should keep it around the places where we do the real userspace memory accesses. > >>> >>>>>> On machine with SMAP (Broadwell): >>>>>> >>>>>> Before: 5.0Mpps >>>>>> >>>>>> After: 6.1Mpps >>>>>> >>>>>> No smap: 7.5Mpps >>>>>> >>>>>> >>>>>> Thanks >>>>> no smap being before or after? >>>>> >>>> Let me clarify: >>>> >>>> >>>> Before (SMAP on): 5.0Mpps >>>> >>>> Before (SMAP off): 7.5Mpps >>>> >>>> After (SMAP on): 6.1Mpps >>>> >>>> >>>> Thanks >>> How about after + smap off? >> >> After (SMAP off): 8.0Mpps >> >>> And maybe we want a module option just for the vhost thread to keep smap >>> off generally since almost all it does is copy stuff from userspace into >>> kernel anyway. Because what above numbers should is that we really >>> really want a solution that isn't limited to just meta-data access, >>> and I really do not see how any such solution can not also be >>> used to make meta-data access fast. >> >> As we've discussed in another thread of previous version. This requires lots >> of changes, the main issues is SMAP state was not saved/restored on explicit >> schedule(). > I wonder how expensive can reading eflags be? > If it's cheap we can just check EFLAGS.AC and rerun stac if needed. Probably not expensive, but considering vhost is probably the only user, is it really worth doing? If we do vmap + batched copy, most of the code is still under SMAP protection but the performance is almost the same. Isn't this a much better solution? > >> Even if it did, since vhost will call lots of net/block codes, >> any kind of uaccess in those codes needs understand this special request >> from vhost e.g you provably need to invent a new kinds of iov iterator that >> does not touch SMAP at all. And I'm not sure this is the only thing we need >> to deal with.
> > Well we wanted to move packet processing from tun into vhost anyway right? Yes, but what about other devices? And we still need to deal with the zerocopy path. It's not a small amount of refactoring and work. > >> So I still prefer to: >> >> 1) speedup the metadata access through vmap + MMU notifier >> >> 2) speedup the datacopy with batched copy (unsafe ones or other new >> interfaces) >> >> Thanks > I just guess once you do (2) you will want to rework (1) to use > the new interfaces. Do you mean batching? Batched copy is much easier: just a few lines of code once the unsafe variants are ready, or we can invent new safe variants. But it would still be slower than vmap. And what's more, vmap does not conflict with batching. > So all the effort you are now investing in (1) > will be wasted. Just my $.02. > Speeding up metadata access is much easier, and vmap was the fastest method, so we can benefit from it soon. Speeding up data copy requires much more work. And in the future, if the kernel or vhost is ready for some new API and the perf numbers prove its advantage, it does no harm to switch. Thanks
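To make the preferred trade-off concrete, here is a hypothetical userspace model of plan (1), the vmap + MMU notifier scheme (all names here are invented for illustration; the real series operates on vmap'd guest pages inside vhost): a prefetch establishes a cached direct mapping, an MMU-notifier-style invalidation tears it down, and the access path falls back to a copy_user()-style access until the mapping is re-established.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of per-virtqueue metadata state. */
struct vq_meta {
    unsigned char user_mem[64]; /* stands in for the guest's metadata pages */
    unsigned char *vmap;        /* cached kernel mapping, NULL when torn down */
};

/* Prefetch path: establish the direct mapping (vmap in the real series). */
static void meta_prefetch(struct vq_meta *m)
{
    m->vmap = m->user_mem;
}

/* MMU-notifier-style invalidate callback: tear the mapping down so the
 * fast path cannot touch pages the MM is about to reclaim or remap. */
static void meta_invalidate(struct vq_meta *m)
{
    m->vmap = NULL;
}

/* Access path: direct load while the mapping is valid, otherwise fall
 * back to a copy_user()-style access (modeled here with memcpy). */
static unsigned char meta_read(struct vq_meta *m, size_t off)
{
    unsigned char v;

    if (m->vmap)
        return m->vmap[off];           /* fast path: plain kernel load */
    memcpy(&v, &m->user_mem[off], 1);  /* slow path: copy-based access */
    return v;
}
```

This is only the control-flow skeleton; the real series must also answer the write-back/dirty-page question raised elsewhere in the thread.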
On 2019/1/7 下午10:47, Michael S. Tsirkin wrote: > On Mon, Jan 07, 2019 at 02:58:08PM +0800, Jason Wang wrote: >> On 2019/1/5 上午5:41, Michael S. Tsirkin wrote: >>> On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: >>>> This series tries to access virtqueue metadata through kernel virtual >>>> address instead of copy_user() friends since they had too much >>>> overheads like checks, spec barriers or even hardware feature >>>> toggling. >>> I think it's a reasonable approach. >>> However I need to look at whether and which mmu notifiers are invoked before >>> writeback. Do you know? >> >> I don't know but just looking at the MMU notifier ops definition, there's no >> such callback if my understanding is correct. >> >> Thanks > In that case how are you making sure used ring updates are written back? > If they aren't guest will crash ... I think this is the writeback issue you mentioned earlier. I didn't follow up on the pointer, but it looks to me like some work is ongoing to fix the issue. I can investigate it more, but it's not something new; consider the case of VFIO. Thanks
On 2019/1/7 下午10:11, Michael S. Tsirkin wrote: > On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote: >> On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: >>> On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: >>>> On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: >>>>> On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: >>>>>> On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: >>>>>>> On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: >>>>>>>> This series tries to access virtqueue metadata through kernel virtual >>>>>>>> address instead of copy_user() friends since they had too much >>>>>>>> overheads like checks, spec barriers or even hardware feature >>>>>>>> toggling. >>>>>>> Will review, thanks! >>>>>>> One questions that comes to mind is whether it's all about bypassing >>>>>>> stac/clac. Could you please include a performance comparison with >>>>>>> nosmap? >>>>>>> >>>>>> On machine without SMAP (Sandy Bridge): >>>>>> >>>>>> Before: 4.8Mpps >>>>>> >>>>>> After: 5.2Mpps >>>>> OK so would you say it's really unsafe versus safe accesses? >>>>> Or would you say it's just a better written code? >>>> >>>> It's the effect of removing speculation barrier. >>> >>> You mean __uaccess_begin_nospec introduced by >>> commit 304ec1b050310548db33063e567123fae8fd0301 >>> ? >>> >>> So fundamentally we do access_ok checks when supplying >>> the memory table to the kernel thread, and we should >>> do the spec barrier there. >>> >>> Then we can just create and use a variant of uaccess macros that does >>> not include the barrier? >>> >>> Or, how about moving the barrier into access_ok? >>> This way repeated accesses with a single access_ok get a bit faster. >>> CC Dan Williams on this idea. >> It would be interesting to see how expensive re-doing the address >> limit check is compared to the speculation barrier. I.e. just switch >> vhost_get_user() to use get_user() rather than __get_user(). 
That will >> sanitize the pointer in the speculative path without a barrier. > Hmm it's way cheaper even though IIRC it's measureable. > Jason, would you like to try? 0.5% regression after using get_user()/put_user()/... > Although frankly __get_user being slower than get_user feels very wrong. > Not yet sure what to do exactly but would you agree? > > >> I recall we had a convert access_ok() discussion with this result here: >> >> https://lkml.org/lkml/2018/1/17/929 > Sorry let me try to clarify. IIUC speculating access_ok once > is harmless. As Linus said the problem is with "_subsequent_ > accesses that can then be used to perturb the cache". > > Thus: > > 1. if (!access_ok) > 2. return > 3. get_user > 4. if (!access_ok) > 5. return > 6. get_user > > Your proposal that Linus nacked was to effectively add a barrier after > lines 2 and 5 (also using the array_index_nospec trick for speed), > right? Unfortunately that needs a big API change. > > I am asking about adding barrier_nospec within access_ok. > Thus effectively before lines 1 and 4. > access_ok will be slower but after all the point of access_ok is > to then access the same memory multiple times. > So we should be making __get_user faster and access_ok slower ... And barrier_nospec() is still completely necessary if you want to do a write instead of a read. Thanks > >> ...but it sounds like you are proposing a smaller scope fixup for the >> vhost use case? Something like barrier_nospec() in the success path >> for all vhost access_ok() checks and then a get_user() variant that >> disables the barrier. > Maybe we'll have to. Except I hope vhost won't end up being the > only user otherwise it will be hard to maintain. > >
On Mon, Jan 07, 2019 at 02:44:24PM -0800, Dan Williams wrote: > On Mon, Jan 7, 2019 at 2:25 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > On Mon, Jan 07, 2019 at 01:39:15PM -0800, Dan Williams wrote: > > > On Mon, Jan 7, 2019 at 6:11 AM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > On Sun, Jan 06, 2019 at 11:15:20PM -0800, Dan Williams wrote: > > > > > On Sun, Jan 6, 2019 at 8:17 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > > > > > > > > > > > On Mon, Jan 07, 2019 at 11:53:41AM +0800, Jason Wang wrote: > > > > > > > > > > > > > > On 2019/1/7 上午11:28, Michael S. Tsirkin wrote: > > > > > > > > On Mon, Jan 07, 2019 at 10:19:03AM +0800, Jason Wang wrote: > > > > > > > > > On 2019/1/3 上午4:47, Michael S. Tsirkin wrote: > > > > > > > > > > On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: > > > > > > > > > > > This series tries to access virtqueue metadata through kernel virtual > > > > > > > > > > > address instead of copy_user() friends since they had too much > > > > > > > > > > > overheads like checks, spec barriers or even hardware feature > > > > > > > > > > > toggling. > > > > > > > > > > Will review, thanks! > > > > > > > > > > One questions that comes to mind is whether it's all about bypassing > > > > > > > > > > stac/clac. Could you please include a performance comparison with > > > > > > > > > > nosmap? > > > > > > > > > > > > > > > > > > > On machine without SMAP (Sandy Bridge): > > > > > > > > > > > > > > > > > > Before: 4.8Mpps > > > > > > > > > > > > > > > > > > After: 5.2Mpps > > > > > > > > OK so would you say it's really unsafe versus safe accesses? > > > > > > > > Or would you say it's just a better written code? > > > > > > > > > > > > > > > > > > > > > It's the effect of removing speculation barrier. > > > > > > > > > > > > > > > > > > You mean __uaccess_begin_nospec introduced by > > > > > > commit 304ec1b050310548db33063e567123fae8fd0301 > > > > > > ? 
> > > > > > > > > > > > So fundamentally we do access_ok checks when supplying > > > > > > the memory table to the kernel thread, and we should > > > > > > do the spec barrier there. > > > > > > > > > > > > Then we can just create and use a variant of uaccess macros that does > > > > > > not include the barrier? > > > > > > > > > > > > Or, how about moving the barrier into access_ok? > > > > > > This way repeated accesses with a single access_ok get a bit faster. > > > > > > CC Dan Williams on this idea. > > > > > > > > > > It would be interesting to see how expensive re-doing the address > > > > > limit check is compared to the speculation barrier. I.e. just switch > > > > > vhost_get_user() to use get_user() rather than __get_user(). That will > > > > > sanitize the pointer in the speculative path without a barrier. > > > > > > > > Hmm it's way cheaper even though IIRC it's measureable. > > > > Jason, would you like to try? > > > > Although frankly __get_user being slower than get_user feels very wrong. > > > > Not yet sure what to do exactly but would you agree? > > > > > > Agree. __get_user() being faster than get_user() defeats the whole > > > point of converting code paths to the access_ok() + __get_user() > > > pattern. > > > > Did you mean the reverse? > > Hmm, no... I'll rephrase: __get_user() should have lower overhead than > get_user(). Right ... Linus, given that you just changed all users of access_ok anyway, do you still think that the access_ok() conversion to return a speculation-sanitized pointer or NULL is too big a conversion? It was previously discarded here: https://lkml.org/lkml/2018/1/17/929 but at that point we didn't have numbers and there was an understandable rush to ship something safe. At this point I think that vhost can show very measurable gains from this conversion. Thanks,
On Tue, Jan 8, 2019 at 8:31 PM Michael S. Tsirkin <mst@redhat.com> wrote: > > Linus, given that you just changed all users of access_ok anyway, do > you still think that the access_ok() conversion to return a speculation > sanitized pointer or NULL is too big a conversion? I didn't actually change a single access_ok(). I changed the (very few) users of "user_access_begin()" to do an access_ok() in them. There were 8 of them total. It turns out that two of those cases (the strn*_user() ones) found bugs in the implementation of access_ok() of two architectures, and then looking at the others found that six more architectures also had problems, but those weren't actually because of any access_ok() changes, they were pre-existing issues. So we definitely had unfortunate bugs in access_ok(), but they were mostly the benign kind (i.e. the "use arguments twice" issue - a real potential bug, but not one that actually likely makes any difference to existing users). Changing all 600+ users of access_ok() would be painful. That said, one thing I *would* like to do is to just get rid of __get_user() and __put_user() entirely. Or rather, just make them do exactly the same thing that the normal "get_user()"/"put_user()" functions do. And then, _within_ the case of get_user()/put_user(), doing the access_ok() as a data dependency rather than an lfence should be easy enough. Linus
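A hedged sketch of Linus's closing point (doing the access_ok() range check inside get_user() as a data dependency rather than an lfence), modeled in userspace C; the helper name, the error convention, and the flat user_limit parameter are all invented for illustration, not the eventual kernel implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BITS_PER_LONG ((int)(sizeof(unsigned long) * 8))

/* Hypothetical model of get_user() folding the range check in as a data
 * dependency: the load address is masked to 0 when the pointer is out of
 * range, so even a mispredicted check cannot steer a speculative load to
 * an attacker-chosen address. No lfence needed. */
static int get_user_model(unsigned long *dst, const unsigned long *uptr,
                          uintptr_t user_limit)
{
    uintptr_t p = (uintptr_t)uptr;
    /* all-ones when p < user_limit, zero otherwise; x86 can compute
     * this with a cmp/sbb pair. Assumes user_limit <= INTPTR_MAX. */
    uintptr_t mask = ~(intptr_t)(p | (user_limit - 1 - p)) >> (BITS_PER_LONG - 1);

    if (!mask)
        return -1;  /* -EFAULT in the kernel */
    /* the address computation depends on mask, not on the branch above */
    memcpy(dst, (const void *)(p & mask), sizeof(*dst));
    return 0;
}
```

The sketch only shows that the check can be a data dependency; whether get_user() actually ends up built this way is exactly what is being proposed above.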
On 2019/1/8 下午6:12, Jason Wang wrote: > > On 2019/1/7 下午10:47, Michael S. Tsirkin wrote: >> On Mon, Jan 07, 2019 at 02:58:08PM +0800, Jason Wang wrote: >>> On 2019/1/5 上午5:41, Michael S. Tsirkin wrote: >>>> On Sat, Dec 29, 2018 at 08:46:51PM +0800, Jason Wang wrote: >>>>> This series tries to access virtqueue metadata through kernel virtual >>>>> address instead of copy_user() friends since they had too much >>>>> overheads like checks, spec barriers or even hardware feature >>>>> toggling. >>>> I think it's a reasonable approach. >>>> However I need to look at whether and which mmu notifiers are >>>> invoked before >>>> writeback. Do you know? >>> >>> I don't know but just looking at the MMU notifier ops definition, >>> there's no >>> such callback if my understanding is correct. >>> >>> Thanks >> In that case how are you making sure used ring updates are written back? >> If they aren't guest will crash ... > > > I think this is the writeback issue you mentioned early. I don't do a > followup on the pointer but it looks to me some work is ongoing to fix > the issue. > > I can investigate it more, but it's not something new, consider the > case of VFIO. > > Thanks Ok, after some investigation: the GUP + dirty pages issue is not easy to fix, so it may still take a while. An idea is to switch back to copy_user() friends if we find the metadata page is not anonymous. Does this sound good to you? Thanks