diff mbox series

[net] virtio-net: add cond_resched() to the command waiting loop

Message ID 20220905045341.66191-1-jasowang@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] virtio-net: add cond_resched() to the command waiting loop | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers success CCed 9 of 9 maintainers
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jason Wang Sept. 5, 2022, 4:53 a.m. UTC
Adding cond_resched() to the command waiting loop for a better
co-operation with the scheduler. This allows to give CPU a breath to
run other task(workqueue) instead of busy looping when preemption is
not allowed.

What's more important. This is a must for some vDPA parent to work
since control virtqueue is emulated via a workqueue for those parents.

Fixes: bda324fd037a ("vdpasim: control virtqueue support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/virtio_net.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Comments

Michael S. Tsirkin Sept. 5, 2022, 7:15 a.m. UTC | #1
On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> Adding cond_resched() to the command waiting loop for a better
> co-operation with the scheduler. This allows to give CPU a breath to
> run other task(workqueue) instead of busy looping when preemption is
> not allowed.
> 
> What's more important. This is a must for some vDPA parent to work
> since control virtqueue is emulated via a workqueue for those parents.
> 
> Fixes: bda324fd037a ("vdpasim: control virtqueue support")

That's a weird commit to fix. so it fixes the simulator?

> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/virtio_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ece00b84e3a7..169368365d6a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2000,8 +2000,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  	 * into the hypervisor, so the request should be handled immediately.
>  	 */
>  	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -	       !virtqueue_is_broken(vi->cvq))
> +	       !virtqueue_is_broken(vi->cvq)) {
> +		cond_resched();
>  		cpu_relax();
> +	}

with cond_resched do we still need cpu_relax?

>  	return vi->ctrl->status == VIRTIO_NET_OK;
>  }
> -- 
> 2.25.1
Jason Wang Sept. 5, 2022, 7:49 a.m. UTC | #2
On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > Adding cond_resched() to the command waiting loop for a better
> > co-operation with the scheduler. This allows to give CPU a breath to
> > run other task(workqueue) instead of busy looping when preemption is
> > not allowed.
> >
> > What's more important. This is a must for some vDPA parent to work
> > since control virtqueue is emulated via a workqueue for those parents.
> >
> > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
>
> That's a weird commit to fix. so it fixes the simulator?

Yes, since the simulator is using a workqueue to handle control virtueue.

>
> > Signed-off-by: Jason Wang <jasowang@redhat.com>
> > ---
> >  drivers/net/virtio_net.c | 4 +++-
> >  1 file changed, 3 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index ece00b84e3a7..169368365d6a 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -2000,8 +2000,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
> >        * into the hypervisor, so the request should be handled immediately.
> >        */
> >       while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> > -            !virtqueue_is_broken(vi->cvq))
> > +            !virtqueue_is_broken(vi->cvq)) {
> > +             cond_resched();
> >               cpu_relax();
> > +     }
>
> with cond_resched do we still need cpu_relax?

I think so, it's possible that cond_sched() just doesn't trigger scheduling.

Thanks

>
> >       return vi->ctrl->status == VIRTIO_NET_OK;
> >  }
> > --
> > 2.25.1
>
Paolo Abeni Sept. 6, 2022, 10:55 a.m. UTC | #3
On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > Adding cond_resched() to the command waiting loop for a better
> > > co-operation with the scheduler. This allows to give CPU a breath to
> > > run other task(workqueue) instead of busy looping when preemption is
> > > not allowed.
> > > 
> > > What's more important. This is a must for some vDPA parent to work
> > > since control virtqueue is emulated via a workqueue for those parents.
> > > 
> > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > 
> > That's a weird commit to fix. so it fixes the simulator?
> 
> Yes, since the simulator is using a workqueue to handle control virtueue.

Uhmm... touching a driver for a simulator's sake looks a little weird. 

Additionally, if the bug is vdpasim, I think it's better to try to
solve it there, if possible.

Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
neither needs a process context, so perhaps you could rework it to run
the work_fn() directly from vdpasim_kick_vq(), at least for the control
virtqueue?

Thanks!

Paolo
Jason Wang Sept. 7, 2022, 2:09 a.m. UTC | #4
On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> > On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > > Adding cond_resched() to the command waiting loop for a better
> > > > co-operation with the scheduler. This allows to give CPU a breath to
> > > > run other task(workqueue) instead of busy looping when preemption is
> > > > not allowed.
> > > >
> > > > What's more important. This is a must for some vDPA parent to work
> > > > since control virtqueue is emulated via a workqueue for those parents.
> > > >
> > > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > >
> > > That's a weird commit to fix. so it fixes the simulator?
> >
> > Yes, since the simulator is using a workqueue to handle control virtueue.
>
> Uhmm... touching a driver for a simulator's sake looks a little weird.

Simulator is not the only one that is using a workqueue (but should be
the first).

I can see  that the mlx5 vDPA driver is using a workqueue as well (see
mlx5_vdpa_kick_vq()).

And in the case of VDUSE, it needs to wait for the response from the
userspace, this means cond_resched() is probably a must for the case
like UP.

>
> Additionally, if the bug is vdpasim, I think it's better to try to
> solve it there, if possible.
>
> Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
> neither needs a process context, so perhaps you could rework it to run
> the work_fn() directly from vdpasim_kick_vq(), at least for the control
> virtqueue?

It's possible (but require some rework on the simulator core). But
considering we have other similar use cases, it looks better to solve
it in the virtio-net driver.

Additionally, this may have better behaviour when using for the buggy
hardware (e.g the control virtqueue takes too long to respond). We may
consider switching to use interrupt/sleep in the future (but not
suitable for -net).

Thanks

>
> Thanks!
>
> Paolo
>
Paolo Abeni Sept. 7, 2022, 7:07 a.m. UTC | #5
On Wed, 2022-09-07 at 10:09 +0800, Jason Wang wrote:
> On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> > > On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > 
> > > > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > > > Adding cond_resched() to the command waiting loop for a better
> > > > > co-operation with the scheduler. This allows to give CPU a breath to
> > > > > run other task(workqueue) instead of busy looping when preemption is
> > > > > not allowed.
> > > > > 
> > > > > What's more important. This is a must for some vDPA parent to work
> > > > > since control virtqueue is emulated via a workqueue for those parents.
> > > > > 
> > > > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > > > 
> > > > That's a weird commit to fix. so it fixes the simulator?
> > > 
> > > Yes, since the simulator is using a workqueue to handle control virtueue.
> > 
> > Uhmm... touching a driver for a simulator's sake looks a little weird.
> 
> Simulator is not the only one that is using a workqueue (but should be
> the first).
> 
> I can see  that the mlx5 vDPA driver is using a workqueue as well (see
> mlx5_vdpa_kick_vq()).
> 
> And in the case of VDUSE, it needs to wait for the response from the
> userspace, this means cond_resched() is probably a must for the case
> like UP.
> 
> > 
> > Additionally, if the bug is vdpasim, I think it's better to try to
> > solve it there, if possible.
> > 
> > Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
> > neither needs a process context, so perhaps you could rework it to run
> > the work_fn() directly from vdpasim_kick_vq(), at least for the control
> > virtqueue?
> 
> It's possible (but require some rework on the simulator core). But
> considering we have other similar use cases, it looks better to solve
> it in the virtio-net driver.

I see.

> Additionally, this may have better behaviour when using for the buggy
> hardware (e.g the control virtqueue takes too long to respond). We may
> consider switching to use interrupt/sleep in the future (but not
> suitable for -net).

Agreed. Possibly a timeout could be useful, too.

Cheers,

Paolo
Michael S. Tsirkin Sept. 7, 2022, 7:46 a.m. UTC | #6
On Wed, Sep 07, 2022 at 09:07:20AM +0200, Paolo Abeni wrote:
> On Wed, 2022-09-07 at 10:09 +0800, Jason Wang wrote:
> > On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > 
> > > On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> > > > On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > 
> > > > > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > > > > Adding cond_resched() to the command waiting loop for a better
> > > > > > co-operation with the scheduler. This allows to give CPU a breath to
> > > > > > run other task(workqueue) instead of busy looping when preemption is
> > > > > > not allowed.
> > > > > > 
> > > > > > What's more important. This is a must for some vDPA parent to work
> > > > > > since control virtqueue is emulated via a workqueue for those parents.
> > > > > > 
> > > > > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > > > > 
> > > > > That's a weird commit to fix. so it fixes the simulator?
> > > > 
> > > > Yes, since the simulator is using a workqueue to handle control virtueue.
> > > 
> > > Uhmm... touching a driver for a simulator's sake looks a little weird.
> > 
> > Simulator is not the only one that is using a workqueue (but should be
> > the first).
> > 
> > I can see  that the mlx5 vDPA driver is using a workqueue as well (see
> > mlx5_vdpa_kick_vq()).
> > 
> > And in the case of VDUSE, it needs to wait for the response from the
> > userspace, this means cond_resched() is probably a must for the case
> > like UP.
> > 
> > > 
> > > Additionally, if the bug is vdpasim, I think it's better to try to
> > > solve it there, if possible.
> > > 
> > > Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
> > > neither needs a process context, so perhaps you could rework it to run
> > > the work_fn() directly from vdpasim_kick_vq(), at least for the control
> > > virtqueue?
> > 
> > It's possible (but require some rework on the simulator core). But
> > considering we have other similar use cases, it looks better to solve
> > it in the virtio-net driver.
> 
> I see.
> 
> > Additionally, this may have better behaviour when using for the buggy
> > hardware (e.g the control virtqueue takes too long to respond). We may
> > consider switching to use interrupt/sleep in the future (but not
> > suitable for -net).
> 
> Agreed. Possibly a timeout could be useful, too.
> 
> Cheers,
> 
> Paolo


Hmm timeouts are kind of arbitrary.
regular drivers basically derive them from hardware
behaviour but with a generic driver like virtio it's harder.
I guess we could add timeout as a config field, have
device make a promise to the driver.

Making the wait interruptible seems more reasonable.
Jason Wang Sept. 8, 2022, 2:21 a.m. UTC | #7
在 2022/9/7 15:46, Michael S. Tsirkin 写道:
> On Wed, Sep 07, 2022 at 09:07:20AM +0200, Paolo Abeni wrote:
>> On Wed, 2022-09-07 at 10:09 +0800, Jason Wang wrote:
>>> On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>> On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
>>>>> On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>> On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
>>>>>>> Adding cond_resched() to the command waiting loop for a better
>>>>>>> co-operation with the scheduler. This allows to give CPU a breath to
>>>>>>> run other task(workqueue) instead of busy looping when preemption is
>>>>>>> not allowed.
>>>>>>>
>>>>>>> What's more important. This is a must for some vDPA parent to work
>>>>>>> since control virtqueue is emulated via a workqueue for those parents.
>>>>>>>
>>>>>>> Fixes: bda324fd037a ("vdpasim: control virtqueue support")
>>>>>> That's a weird commit to fix. so it fixes the simulator?
>>>>> Yes, since the simulator is using a workqueue to handle control virtueue.
>>>> Uhmm... touching a driver for a simulator's sake looks a little weird.
>>> Simulator is not the only one that is using a workqueue (but should be
>>> the first).
>>>
>>> I can see  that the mlx5 vDPA driver is using a workqueue as well (see
>>> mlx5_vdpa_kick_vq()).
>>>
>>> And in the case of VDUSE, it needs to wait for the response from the
>>> userspace, this means cond_resched() is probably a must for the case
>>> like UP.
>>>
>>>> Additionally, if the bug is vdpasim, I think it's better to try to
>>>> solve it there, if possible.
>>>>
>>>> Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
>>>> neither needs a process context, so perhaps you could rework it to run
>>>> the work_fn() directly from vdpasim_kick_vq(), at least for the control
>>>> virtqueue?
>>> It's possible (but require some rework on the simulator core). But
>>> considering we have other similar use cases, it looks better to solve
>>> it in the virtio-net driver.
>> I see.
>>
>>> Additionally, this may have better behaviour when using for the buggy
>>> hardware (e.g the control virtqueue takes too long to respond). We may
>>> consider switching to use interrupt/sleep in the future (but not
>>> suitable for -net).
>> Agreed. Possibly a timeout could be useful, too.
>>
>> Cheers,
>>
>> Paolo
>
> Hmm timeouts are kind of arbitrary.
> regular drivers basically derive them from hardware
> behaviour but with a generic driver like virtio it's harder.
> I guess we could add timeout as a config field, have
> device make a promise to the driver.
>
> Making the wait interruptible seems more reasonable.


Yes, but I think we still need this patch for -net and -stable.

Thanks


>
Michael S. Tsirkin Sept. 8, 2022, 5:19 a.m. UTC | #8
On Thu, Sep 08, 2022 at 10:21:45AM +0800, Jason Wang wrote:
> 
> 在 2022/9/7 15:46, Michael S. Tsirkin 写道:
> > On Wed, Sep 07, 2022 at 09:07:20AM +0200, Paolo Abeni wrote:
> > > On Wed, 2022-09-07 at 10:09 +0800, Jason Wang wrote:
> > > > On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> > > > > > On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > > > > > > Adding cond_resched() to the command waiting loop for a better
> > > > > > > > co-operation with the scheduler. This allows to give CPU a breath to
> > > > > > > > run other task(workqueue) instead of busy looping when preemption is
> > > > > > > > not allowed.
> > > > > > > > 
> > > > > > > > What's more important. This is a must for some vDPA parent to work
> > > > > > > > since control virtqueue is emulated via a workqueue for those parents.
> > > > > > > > 
> > > > > > > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > > > > > > That's a weird commit to fix. so it fixes the simulator?
> > > > > > Yes, since the simulator is using a workqueue to handle control virtueue.
> > > > > Uhmm... touching a driver for a simulator's sake looks a little weird.
> > > > Simulator is not the only one that is using a workqueue (but should be
> > > > the first).
> > > > 
> > > > I can see  that the mlx5 vDPA driver is using a workqueue as well (see
> > > > mlx5_vdpa_kick_vq()).
> > > > 
> > > > And in the case of VDUSE, it needs to wait for the response from the
> > > > userspace, this means cond_resched() is probably a must for the case
> > > > like UP.
> > > > 
> > > > > Additionally, if the bug is vdpasim, I think it's better to try to
> > > > > solve it there, if possible.
> > > > > 
> > > > > Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
> > > > > neither needs a process context, so perhaps you could rework it to run
> > > > > the work_fn() directly from vdpasim_kick_vq(), at least for the control
> > > > > virtqueue?
> > > > It's possible (but require some rework on the simulator core). But
> > > > considering we have other similar use cases, it looks better to solve
> > > > it in the virtio-net driver.
> > > I see.
> > > 
> > > > Additionally, this may have better behaviour when using for the buggy
> > > > hardware (e.g the control virtqueue takes too long to respond). We may
> > > > consider switching to use interrupt/sleep in the future (but not
> > > > suitable for -net).
> > > Agreed. Possibly a timeout could be useful, too.
> > > 
> > > Cheers,
> > > 
> > > Paolo
> > 
> > Hmm timeouts are kind of arbitrary.
> > regular drivers basically derive them from hardware
> > behaviour but with a generic driver like virtio it's harder.
> > I guess we could add timeout as a config field, have
> > device make a promise to the driver.
> > 
> > Making the wait interruptible seems more reasonable.
> 
> 
> Yes, but I think we still need this patch for -net and -stable.
> 
> Thanks

I was referring to Paolo's idea of having a timeout.
Jason Wang Oct. 9, 2022, 5:58 a.m. UTC | #9
在 2022/9/8 13:19, Michael S. Tsirkin 写道:
> On Thu, Sep 08, 2022 at 10:21:45AM +0800, Jason Wang wrote:
>> 在 2022/9/7 15:46, Michael S. Tsirkin 写道:
>>> On Wed, Sep 07, 2022 at 09:07:20AM +0200, Paolo Abeni wrote:
>>>> On Wed, 2022-09-07 at 10:09 +0800, Jason Wang wrote:
>>>>> On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
>>>>>> On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
>>>>>>> On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>>>>>>>> On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
>>>>>>>>> Adding cond_resched() to the command waiting loop for a better
>>>>>>>>> co-operation with the scheduler. This allows to give CPU a breath to
>>>>>>>>> run other task(workqueue) instead of busy looping when preemption is
>>>>>>>>> not allowed.
>>>>>>>>>
>>>>>>>>> What's more important. This is a must for some vDPA parent to work
>>>>>>>>> since control virtqueue is emulated via a workqueue for those parents.
>>>>>>>>>
>>>>>>>>> Fixes: bda324fd037a ("vdpasim: control virtqueue support")
>>>>>>>> That's a weird commit to fix. so it fixes the simulator?
>>>>>>> Yes, since the simulator is using a workqueue to handle control virtueue.
>>>>>> Uhmm... touching a driver for a simulator's sake looks a little weird.
>>>>> Simulator is not the only one that is using a workqueue (but should be
>>>>> the first).
>>>>>
>>>>> I can see  that the mlx5 vDPA driver is using a workqueue as well (see
>>>>> mlx5_vdpa_kick_vq()).
>>>>>
>>>>> And in the case of VDUSE, it needs to wait for the response from the
>>>>> userspace, this means cond_resched() is probably a must for the case
>>>>> like UP.
>>>>>
>>>>>> Additionally, if the bug is vdpasim, I think it's better to try to
>>>>>> solve it there, if possible.
>>>>>>
>>>>>> Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
>>>>>> neither needs a process context, so perhaps you could rework it to run
>>>>>> the work_fn() directly from vdpasim_kick_vq(), at least for the control
>>>>>> virtqueue?
>>>>> It's possible (but require some rework on the simulator core). But
>>>>> considering we have other similar use cases, it looks better to solve
>>>>> it in the virtio-net driver.
>>>> I see.
>>>>
>>>>> Additionally, this may have better behaviour when using for the buggy
>>>>> hardware (e.g the control virtqueue takes too long to respond). We may
>>>>> consider switching to use interrupt/sleep in the future (but not
>>>>> suitable for -net).
>>>> Agreed. Possibly a timeout could be useful, too.
>>>>
>>>> Cheers,
>>>>
>>>> Paolo
>>> Hmm timeouts are kind of arbitrary.
>>> regular drivers basically derive them from hardware
>>> behaviour but with a generic driver like virtio it's harder.
>>> I guess we could add timeout as a config field, have
>>> device make a promise to the driver.
>>>
>>> Making the wait interruptible seems more reasonable.
>>
>> Yes, but I think we still need this patch for -net and -stable.
>>
>> Thanks
> I was referring to Paolo's idea of having a timeout.


Ok, I think we're fine with this patch. Any chance to merge this or do I 
need to resend?

Thanks


>
Michael S. Tsirkin Oct. 10, 2022, 5:11 p.m. UTC | #10
On Sun, Oct 09, 2022 at 01:58:53PM +0800, Jason Wang wrote:
> 
> 在 2022/9/8 13:19, Michael S. Tsirkin 写道:
> > On Thu, Sep 08, 2022 at 10:21:45AM +0800, Jason Wang wrote:
> > > 在 2022/9/7 15:46, Michael S. Tsirkin 写道:
> > > > On Wed, Sep 07, 2022 at 09:07:20AM +0200, Paolo Abeni wrote:
> > > > > On Wed, 2022-09-07 at 10:09 +0800, Jason Wang wrote:
> > > > > > On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > > > On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> > > > > > > > On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > > > > > > > > Adding cond_resched() to the command waiting loop for a better
> > > > > > > > > > co-operation with the scheduler. This allows to give CPU a breath to
> > > > > > > > > > run other task(workqueue) instead of busy looping when preemption is
> > > > > > > > > > not allowed.
> > > > > > > > > > 
> > > > > > > > > > What's more important. This is a must for some vDPA parent to work
> > > > > > > > > > since control virtqueue is emulated via a workqueue for those parents.
> > > > > > > > > > 
> > > > > > > > > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > > > > > > > > That's a weird commit to fix. so it fixes the simulator?
> > > > > > > > Yes, since the simulator is using a workqueue to handle control virtueue.
> > > > > > > Uhmm... touching a driver for a simulator's sake looks a little weird.
> > > > > > Simulator is not the only one that is using a workqueue (but should be
> > > > > > the first).
> > > > > > 
> > > > > > I can see  that the mlx5 vDPA driver is using a workqueue as well (see
> > > > > > mlx5_vdpa_kick_vq()).
> > > > > > 
> > > > > > And in the case of VDUSE, it needs to wait for the response from the
> > > > > > userspace, this means cond_resched() is probably a must for the case
> > > > > > like UP.
> > > > > > 
> > > > > > > Additionally, if the bug is vdpasim, I think it's better to try to
> > > > > > > solve it there, if possible.
> > > > > > > 
> > > > > > > Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
> > > > > > > neither needs a process context, so perhaps you could rework it to run
> > > > > > > the work_fn() directly from vdpasim_kick_vq(), at least for the control
> > > > > > > virtqueue?
> > > > > > It's possible (but require some rework on the simulator core). But
> > > > > > considering we have other similar use cases, it looks better to solve
> > > > > > it in the virtio-net driver.
> > > > > I see.
> > > > > 
> > > > > > Additionally, this may have better behaviour when using for the buggy
> > > > > > hardware (e.g the control virtqueue takes too long to respond). We may
> > > > > > consider switching to use interrupt/sleep in the future (but not
> > > > > > suitable for -net).
> > > > > Agreed. Possibly a timeout could be useful, too.
> > > > > 
> > > > > Cheers,
> > > > > 
> > > > > Paolo
> > > > Hmm timeouts are kind of arbitrary.
> > > > regular drivers basically derive them from hardware
> > > > behaviour but with a generic driver like virtio it's harder.
> > > > I guess we could add timeout as a config field, have
> > > > device make a promise to the driver.
> > > > 
> > > > Making the wait interruptible seems more reasonable.
> > > 
> > > Yes, but I think we still need this patch for -net and -stable.
> > > 
> > > Thanks
> > I was referring to Paolo's idea of having a timeout.
> 
> 
> Ok, I think we're fine with this patch. Any chance to merge this or do I
> need to resend?
> 
> Thanks

Last question: do we want cpu_relax here now? Or is cond_resched
sufficient?

> 
> >
Jason Wang Oct. 12, 2022, 3:19 a.m. UTC | #11
On Tue, Oct 11, 2022 at 1:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Sun, Oct 09, 2022 at 01:58:53PM +0800, Jason Wang wrote:
> >
> > 在 2022/9/8 13:19, Michael S. Tsirkin 写道:
> > > On Thu, Sep 08, 2022 at 10:21:45AM +0800, Jason Wang wrote:
> > > > 在 2022/9/7 15:46, Michael S. Tsirkin 写道:
> > > > > On Wed, Sep 07, 2022 at 09:07:20AM +0200, Paolo Abeni wrote:
> > > > > > On Wed, 2022-09-07 at 10:09 +0800, Jason Wang wrote:
> > > > > > > On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > > > > On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> > > > > > > > > On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > > > > > > > > > Adding cond_resched() to the command waiting loop for a better
> > > > > > > > > > > co-operation with the scheduler. This allows to give CPU a breath to
> > > > > > > > > > > run other task(workqueue) instead of busy looping when preemption is
> > > > > > > > > > > not allowed.
> > > > > > > > > > >
> > > > > > > > > > > What's more important. This is a must for some vDPA parent to work
> > > > > > > > > > > since control virtqueue is emulated via a workqueue for those parents.
> > > > > > > > > > >
> > > > > > > > > > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > > > > > > > > > That's a weird commit to fix. so it fixes the simulator?
> > > > > > > > > Yes, since the simulator is using a workqueue to handle control virtueue.
> > > > > > > > Uhmm... touching a driver for a simulator's sake looks a little weird.
> > > > > > > Simulator is not the only one that is using a workqueue (but should be
> > > > > > > the first).
> > > > > > >
> > > > > > > I can see  that the mlx5 vDPA driver is using a workqueue as well (see
> > > > > > > mlx5_vdpa_kick_vq()).
> > > > > > >
> > > > > > > And in the case of VDUSE, it needs to wait for the response from the
> > > > > > > userspace, this means cond_resched() is probably a must for the case
> > > > > > > like UP.
> > > > > > >
> > > > > > > > Additionally, if the bug is vdpasim, I think it's better to try to
> > > > > > > > solve it there, if possible.
> > > > > > > >
> > > > > > > > Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
> > > > > > > > neither needs a process context, so perhaps you could rework it to run
> > > > > > > > the work_fn() directly from vdpasim_kick_vq(), at least for the control
> > > > > > > > virtqueue?
> > > > > > > It's possible (but require some rework on the simulator core). But
> > > > > > > considering we have other similar use cases, it looks better to solve
> > > > > > > it in the virtio-net driver.
> > > > > > I see.
> > > > > >
> > > > > > > Additionally, this may have better behaviour when using for the buggy
> > > > > > > hardware (e.g the control virtqueue takes too long to respond). We may
> > > > > > > consider switching to use interrupt/sleep in the future (but not
> > > > > > > suitable for -net).
> > > > > > Agreed. Possibly a timeout could be useful, too.
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Paolo
> > > > > Hmm timeouts are kind of arbitrary.
> > > > > regular drivers basically derive them from hardware
> > > > > behaviour but with a generic driver like virtio it's harder.
> > > > > I guess we could add timeout as a config field, have
> > > > > device make a promise to the driver.
> > > > >
> > > > > Making the wait interruptible seems more reasonable.
> > > >
> > > > Yes, but I think we still need this patch for -net and -stable.
> > > >
> > > > Thanks
> > > I was referring to Paolo's idea of having a timeout.
> >
> >
> > Ok, I think we're fine with this patch. Any chance to merge this or do I
> > need to resend?
> >
> > Thanks
>
> Last question: do we want cpu_relax here now? Or is cond_resched
> sufficient?

(Have answered in another thread)

I think we need cpu_relax() since there could be no high priority task
in the current cpu so we still need to relax.

Thanks

>
> >
> > >
>
Jason Wang Oct. 17, 2022, 7:15 a.m. UTC | #12
On Wed, Oct 12, 2022 at 11:19 AM Jason Wang <jasowang@redhat.com> wrote:
>
> On Tue, Oct 11, 2022 at 1:11 AM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Sun, Oct 09, 2022 at 01:58:53PM +0800, Jason Wang wrote:
> > >
> > > 在 2022/9/8 13:19, Michael S. Tsirkin 写道:
> > > > On Thu, Sep 08, 2022 at 10:21:45AM +0800, Jason Wang wrote:
> > > > > 在 2022/9/7 15:46, Michael S. Tsirkin 写道:
> > > > > > On Wed, Sep 07, 2022 at 09:07:20AM +0200, Paolo Abeni wrote:
> > > > > > > On Wed, 2022-09-07 at 10:09 +0800, Jason Wang wrote:
> > > > > > > > On Tue, Sep 6, 2022 at 6:56 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > > > > > On Mon, 2022-09-05 at 15:49 +0800, Jason Wang wrote:
> > > > > > > > > > On Mon, Sep 5, 2022 at 3:15 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > > > > > > > > On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> > > > > > > > > > > > Adding cond_resched() to the command waiting loop for a better
> > > > > > > > > > > > co-operation with the scheduler. This allows to give CPU a breath to
> > > > > > > > > > > > run other task(workqueue) instead of busy looping when preemption is
> > > > > > > > > > > > not allowed.
> > > > > > > > > > > >
> > > > > > > > > > > > What's more important. This is a must for some vDPA parent to work
> > > > > > > > > > > > since control virtqueue is emulated via a workqueue for those parents.
> > > > > > > > > > > >
> > > > > > > > > > > > Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> > > > > > > > > > > That's a weird commit to fix. so it fixes the simulator?
> > > > > > > > > > Yes, since the simulator is using a workqueue to handle control virtueue.
> > > > > > > > > Uhmm... touching a driver for a simulator's sake looks a little weird.
> > > > > > > > Simulator is not the only one that is using a workqueue (but should be
> > > > > > > > the first).
> > > > > > > >
> > > > > > > > I can see  that the mlx5 vDPA driver is using a workqueue as well (see
> > > > > > > > mlx5_vdpa_kick_vq()).
> > > > > > > >
> > > > > > > > And in the case of VDUSE, it needs to wait for the response from the
> > > > > > > > userspace, this means cond_resched() is probably a must for the case
> > > > > > > > like UP.
> > > > > > > >
> > > > > > > > > Additionally, if the bug is vdpasim, I think it's better to try to
> > > > > > > > > solve it there, if possible.
> > > > > > > > >
> > > > > > > > > Looking at vdpasim_net_work() and vdpasim_blk_work() it looks like
> > > > > > > > > neither needs a process context, so perhaps you could rework it to run
> > > > > > > > > the work_fn() directly from vdpasim_kick_vq(), at least for the control
> > > > > > > > > virtqueue?
> > > > > > > > It's possible (but require some rework on the simulator core). But
> > > > > > > > considering we have other similar use cases, it looks better to solve
> > > > > > > > it in the virtio-net driver.
> > > > > > > I see.
> > > > > > >
> > > > > > > > Additionally, this may have better behaviour when using for the buggy
> > > > > > > > hardware (e.g the control virtqueue takes too long to respond). We may
> > > > > > > > consider switching to use interrupt/sleep in the future (but not
> > > > > > > > suitable for -net).
> > > > > > > Agreed. Possibly a timeout could be useful, too.
> > > > > > >
> > > > > > > Cheers,
> > > > > > >
> > > > > > > Paolo
> > > > > > Hmm timeouts are kind of arbitrary.
> > > > > > regular drivers basically derive them from hardware
> > > > > > behaviour but with a generic driver like virtio it's harder.
> > > > > > I guess we could add timeout as a config field, have
> > > > > > device make a promise to the driver.
> > > > > >
> > > > > > Making the wait interruptible seems more reasonable.
> > > > >
> > > > > Yes, but I think we still need this patch for -net and -stable.
> > > > >
> > > > > Thanks
> > > > I was referring to Paolo's idea of having a timeout.
> > >
> > >
> > > Ok, I think we're fine with this patch. Any chance to merge this or do I
> > > need to resend?
> > >
> > > Thanks
> >
> > Last question: do we want cpu_relax here now? Or is cond_resched
> > sufficient?
>
> (Have answered in another thread)
>
> I think we need cpu_relax() since there could be no high priority task
> in the current cpu so we still need to relax.
>
> Thanks

Michael, does this answer make sense? If yes, would you like to ack the patch?

Thanks

>
> >
> > >
> > > >
> >
Michael S. Tsirkin Oct. 17, 2022, 8:09 p.m. UTC | #13
On Mon, Sep 05, 2022 at 12:53:41PM +0800, Jason Wang wrote:
> Adding cond_resched() to the command waiting loop for a better
> co-operation with the scheduler. This allows to give CPU a breath to
> run other task(workqueue) instead of busy looping when preemption is
> not allowed.
> 
> What's more important. This is a must for some vDPA parent to work
> since control virtqueue is emulated via a workqueue for those parents.
> 
> Fixes: bda324fd037a ("vdpasim: control virtqueue support")
> Signed-off-by: Jason Wang <jasowang@redhat.com>

Acked-by: Michael S. Tsirkin <mst@redhat.com>

> ---
>  drivers/net/virtio_net.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index ece00b84e3a7..169368365d6a 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -2000,8 +2000,10 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>  	 * into the hypervisor, so the request should be handled immediately.
>  	 */
>  	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
> -	       !virtqueue_is_broken(vi->cvq))
> +	       !virtqueue_is_broken(vi->cvq)) {
> +		cond_resched();
>  		cpu_relax();
> +	}
>  
>  	return vi->ctrl->status == VIRTIO_NET_OK;
>  }
> -- 
> 2.25.1
diff mbox series

Patch

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index ece00b84e3a7..169368365d6a 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -2000,8 +2000,10 @@  static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	 * into the hypervisor, so the request should be handled immediately.
 	 */
 	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
-	       !virtqueue_is_broken(vi->cvq))
+	       !virtqueue_is_broken(vi->cvq)) {
+		cond_resched();
 		cpu_relax();
+	}
 
 	return vi->ctrl->status == VIRTIO_NET_OK;
 }