Message ID | 20240416193039.272997-4-danielj@nvidia.com (mailing list archive)
---|---
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | Remove RTNL lock protection of CVQ
On Wed, Apr 17, 2024 at 3:31 AM Daniel Jurgens <danielj@nvidia.com> wrote:
>
> The command VQ will no longer be protected by the RTNL lock. Use a
> spinlock to protect the control buffer header and the VQ.
>
> Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> ---
>  drivers/net/virtio_net.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 0ee192b45e1e..d02f83a919a7 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -282,6 +282,7 @@ struct virtnet_info {
>
>         /* Has control virtqueue */
>         bool has_cvq;
> +       spinlock_t cvq_lock;

Using a spinlock instead of a mutex is problematic, as there's no
guarantee on when the driver will get a reply. And it became even more
serious after 0d197a147164 ("virtio-net: add cond_resched() to the
command waiting loop").

Any reason we can't use a mutex?

Thanks

>
>         /* Host can handle any s/g split between our header and packet data */
>         bool any_header_sg;
> @@ -2529,6 +2530,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>         /* Caller should know better */
>         BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
>
> +       guard(spinlock)(&vi->cvq_lock);
>         vi->ctrl->status = ~0;
>         vi->ctrl->hdr.class = class;
>         vi->ctrl->hdr.cmd = cmd;
> @@ -4818,8 +4820,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>             virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>                 vi->any_header_sg = true;
>
> -       if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
> +       if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>                 vi->has_cvq = true;
> +               spin_lock_init(&vi->cvq_lock);
> +       }
>
>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>                 mtu = virtio_cread16(vdev,
> --
> 2.34.1
>
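For context, here is a minimal sketch of the mutex variant Jason is asking
about, reduced to the locking skeleton. It is hypothetical, not part of the
posted patch: struct virtnet_info_sketch and send_command_sketch() are
stand-ins for the real struct virtnet_info and virtnet_send_command(). The
point it illustrates is that a mutex keeps the command path sleepable.

#include <linux/mutex.h>

/* Hypothetical, trimmed stand-in for struct virtnet_info. */
struct virtnet_info_sketch {
	bool has_cvq;
	struct mutex cvq_lock;	/* would replace spinlock_t cvq_lock */
};

/* Hypothetical stand-in for virtnet_send_command(); mutex_init() would
 * replace spin_lock_init() in virtnet_probe().
 */
static bool send_command_sketch(struct virtnet_info_sketch *vi)
{
	/* mutex_lock() may sleep, so the reply-waiting loop inside the
	 * critical section can keep calling cond_resched() without
	 * scheduling while atomic.
	 */
	mutex_lock(&vi->cvq_lock);
	/* ... fill the ctrl header, add bufs, kick, wait for the reply ... */
	mutex_unlock(&vi->cvq_lock);
	return true;
}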
On 2024/4/18 2:42 PM, Jason Wang wrote:
> On Wed, Apr 17, 2024 at 3:31 AM Daniel Jurgens <danielj@nvidia.com> wrote:
>> The command VQ will no longer be protected by the RTNL lock. Use a
>> spinlock to protect the control buffer header and the VQ.
>>
>> Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>> ---
>>  drivers/net/virtio_net.c | 6 +++++-
>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>> index 0ee192b45e1e..d02f83a919a7 100644
>> --- a/drivers/net/virtio_net.c
>> +++ b/drivers/net/virtio_net.c
>> @@ -282,6 +282,7 @@ struct virtnet_info {
>>
>>         /* Has control virtqueue */
>>         bool has_cvq;
>> +       spinlock_t cvq_lock;
> Using a spinlock instead of a mutex is problematic, as there's no
> guarantee on when the driver will get a reply. And it became even more
> serious after 0d197a147164 ("virtio-net: add cond_resched() to the
> command waiting loop").
>
> Any reason we can't use a mutex?

Hi Jason,

I made a patch set that enables ctrlq's irq on top of this patch set,
which removes cond_resched().

But I need a little time to test it; it is almost ready. So could the
discussion about cond_resched() plus spin lock versus mutex wait?

Thank you very much!

>
> Thanks
>
>>         /* Host can handle any s/g split between our header and packet data */
>>         bool any_header_sg;
>> @@ -2529,6 +2530,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
>>         /* Caller should know better */
>>         BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
>>
>> +       guard(spinlock)(&vi->cvq_lock);
>>         vi->ctrl->status = ~0;
>>         vi->ctrl->hdr.class = class;
>>         vi->ctrl->hdr.cmd = cmd;
>> @@ -4818,8 +4820,10 @@ static int virtnet_probe(struct virtio_device *vdev)
>>             virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
>>                 vi->any_header_sg = true;
>>
>> -       if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
>> +       if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
>>                 vi->has_cvq = true;
>> +               spin_lock_init(&vi->cvq_lock);
>> +       }
>>
>>         if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
>>                 mtu = virtio_cread16(vdev,
>> --
>> 2.34.1
>>
On Thu, 2024-04-18 at 15:36 +0800, Heng Qi wrote:
>
> On 2024/4/18 2:42 PM, Jason Wang wrote:
> > On Wed, Apr 17, 2024 at 3:31 AM Daniel Jurgens <danielj@nvidia.com> wrote:
> > > The command VQ will no longer be protected by the RTNL lock. Use a
> > > spinlock to protect the control buffer header and the VQ.
> > >
> > > Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
> > > Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> > > ---
> > >  drivers/net/virtio_net.c | 6 +++++-
> > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > index 0ee192b45e1e..d02f83a919a7 100644
> > > --- a/drivers/net/virtio_net.c
> > > +++ b/drivers/net/virtio_net.c
> > > @@ -282,6 +282,7 @@ struct virtnet_info {
> > >
> > >         /* Has control virtqueue */
> > >         bool has_cvq;
> > > +       spinlock_t cvq_lock;
> > Using a spinlock instead of a mutex is problematic, as there's no
> > guarantee on when the driver will get a reply. And it became even
> > more serious after 0d197a147164 ("virtio-net: add cond_resched() to
> > the command waiting loop").
> >
> > Any reason we can't use a mutex?
>
> Hi Jason,
>
> I made a patch set that enables ctrlq's irq on top of this patch set,
> which removes cond_resched().
>
> But I need a little time to test it; it is almost ready. So could the
> discussion about cond_resched() plus spin lock versus mutex wait?

The big problem is that as long as the cond_resched() is there,
replacing the mutex with a spinlock can/will lead to
scheduling-while-atomic splats. We can't intentionally introduce such a
scenario.

Side note: the compiler apparently does not like the guard() construct,
leading to new warnings, here and in later patches. I'm unsure the code
simplification is worth it.

Cheers,

Paolo
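To make the scheduling-while-atomic concern concrete, this is roughly what
the command path looks like with the patch applied on top of 0d197a147164.
It is a simplified sketch of virtnet_send_command(), not a verbatim copy.

/* Simplified sketch: the guard() from this patch combined with the
 * reply-waiting loop added by 0d197a147164.
 */
static bool send_command_locked_sketch(struct virtnet_info *vi)
{
	unsigned int tmp;

	guard(spinlock)(&vi->cvq_lock);	/* disables preemption */

	/* ... build the command, virtqueue_add_sgs(), virtqueue_kick() ... */

	while (!virtqueue_get_buf(vi->cvq, &tmp) &&
	       !virtqueue_is_broken(vi->cvq)) {
		cond_resched();	/* may sleep: splats while the lock is held */
		cpu_relax();
	}
	return vi->ctrl->status == VIRTIO_NET_OK;
}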
> From: Paolo Abeni <pabeni@redhat.com>
> Sent: Thursday, April 18, 2024 5:57 AM
>
> On Thu, 2024-04-18 at 15:36 +0800, Heng Qi wrote:
> >
> > On 2024/4/18 2:42 PM, Jason Wang wrote:
> > > On Wed, Apr 17, 2024 at 3:31 AM Daniel Jurgens <danielj@nvidia.com> wrote:
> > > > The command VQ will no longer be protected by the RTNL lock. Use a
> > > > spinlock to protect the control buffer header and the VQ.
> > > >
> > > > Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
> > > > Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> > > > ---
> > > >  drivers/net/virtio_net.c | 6 +++++-
> > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > index 0ee192b45e1e..d02f83a919a7 100644
> > > > --- a/drivers/net/virtio_net.c
> > > > +++ b/drivers/net/virtio_net.c
> > > > @@ -282,6 +282,7 @@ struct virtnet_info {
> > > >
> > > >         /* Has control virtqueue */
> > > >         bool has_cvq;
> > > > +       spinlock_t cvq_lock;
> > > Using a spinlock instead of a mutex is problematic, as there's no
> > > guarantee on when the driver will get a reply. And it became even
> > > more serious after 0d197a147164 ("virtio-net: add cond_resched() to
> > > the command waiting loop").
> > >
> > > Any reason we can't use a mutex?
> >
> > Hi Jason,
> >
> > I made a patch set that enables ctrlq's irq on top of this patch set,
> > which removes cond_resched().
> >
> > But I need a little time to test it; it is almost ready. So could the
> > discussion about cond_resched() plus spin lock versus mutex wait?
>
> The big problem is that as long as the cond_resched() is there, replacing
> the mutex with a spinlock can/will lead to scheduling-while-atomic
> splats. We can't intentionally introduce such a scenario.

When I created the series, set_rx_mode hadn't been moved to a work
queue, and the cond_resched() wasn't there. A mutex wasn't possible
then. If the CVQ is made event driven, the lock can be released right
after posting the work to the VQ.

> Side note: the compiler apparently does not like the guard() construct,
> leading to new warnings, here and in later patches. I'm unsure the code
> simplification is worth it.

I didn't see any warnings with GCC or clang. This is used in other
places in the kernel as well.
gcc version 13.2.1 20230918 (Red Hat 13.2.1-3) (GCC)
clang version 17.0.6 (Fedora 17.0.6-2.fc39)

> Cheers,
>
> Paolo
On Thu, 2024-04-18 at 15:38 +0000, Dan Jurgens wrote:
> > From: Paolo Abeni <pabeni@redhat.com>
> > Sent: Thursday, April 18, 2024 5:57 AM
> >
> > On Thu, 2024-04-18 at 15:36 +0800, Heng Qi wrote:
> > >
> > > On 2024/4/18 2:42 PM, Jason Wang wrote:
> > > > On Wed, Apr 17, 2024 at 3:31 AM Daniel Jurgens <danielj@nvidia.com> wrote:
> > > > > The command VQ will no longer be protected by the RTNL lock. Use a
> > > > > spinlock to protect the control buffer header and the VQ.
> > > > >
> > > > > Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
> > > > > Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> > > > > ---
> > > > >  drivers/net/virtio_net.c | 6 +++++-
> > > > >  1 file changed, 5 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > > > > index 0ee192b45e1e..d02f83a919a7 100644
> > > > > --- a/drivers/net/virtio_net.c
> > > > > +++ b/drivers/net/virtio_net.c
> > > > > @@ -282,6 +282,7 @@ struct virtnet_info {
> > > > >
> > > > >         /* Has control virtqueue */
> > > > >         bool has_cvq;
> > > > > +       spinlock_t cvq_lock;
> > > > Using a spinlock instead of a mutex is problematic, as there's no
> > > > guarantee on when the driver will get a reply. And it became even
> > > > more serious after 0d197a147164 ("virtio-net: add cond_resched() to
> > > > the command waiting loop").
> > > >
> > > > Any reason we can't use a mutex?
> > >
> > > Hi Jason,
> > >
> > > I made a patch set that enables ctrlq's irq on top of this patch set,
> > > which removes cond_resched().
> > >
> > > But I need a little time to test it; it is almost ready. So could the
> > > discussion about cond_resched() plus spin lock versus mutex wait?
> >
> > The big problem is that as long as the cond_resched() is there, replacing
> > the mutex with a spinlock can/will lead to scheduling-while-atomic
> > splats. We can't intentionally introduce such a scenario.
>
> When I created the series, set_rx_mode hadn't been moved to a work
> queue, and the cond_resched() wasn't there.

Unfortunately cond_resched() is there right now.

> A mutex wasn't possible then. If the CVQ is made event driven, the lock
> can be released right after posting the work to the VQ.

That should work.

> > Side note: the compiler apparently does not like the guard() construct,
> > leading to new warnings, here and in later patches. I'm unsure the code
> > simplification is worth it.
>
> I didn't see any warnings with GCC or clang. This is used in other
> places in the kernel as well.
> gcc version 13.2.1 20230918 (Red Hat 13.2.1-3) (GCC)
> clang version 17.0.6 (Fedora 17.0.6-2.fc39)
>

See:

https://patchwork.kernel.org/project/netdevbpf/patch/20240416193039.272997-4-danielj@nvidia.com/
https://netdev.bots.linux.dev/static/nipa/845178/13632442/build_32bit/stderr
https://netdev.bots.linux.dev/static/nipa/845178/13632442/build_allmodconfig_warn/stderr

Cheers,

Paolo
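A sketch of the event-driven scheme described above, assuming the CVQ gets
an interrupt callback that signals a completion; cvq_done is a hypothetical
field, not something in the driver today. It shows only the lock scope:
serializing reuse of vi->ctrl between callers would still need separate
handling.

/* Hypothetical event-driven submission: hold the lock only to post. */
static bool send_command_event_sketch(struct virtnet_info *vi,
				      struct scatterlist *sgs[],
				      unsigned int out_num,
				      unsigned int in_num)
{
	spin_lock(&vi->cvq_lock);
	virtqueue_add_sgs(vi->cvq, sgs, out_num, in_num, vi, GFP_ATOMIC);
	virtqueue_kick(vi->cvq);
	spin_unlock(&vi->cvq_lock);	/* nothing sleeps under the lock */

	/* Sleep outside the lock; the CVQ interrupt callback would call
	 * complete(&vi->cvq_done) once the device posts the reply.
	 */
	wait_for_completion(&vi->cvq_done);
	return vi->ctrl->status == VIRTIO_NET_OK;
}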
On Thu, 18 Apr 2024 17:48:57 +0200 Paolo Abeni wrote:
> > > Side note: the compiler apparently does not like the guard() construct,
> > > leading to new warnings, here and in later patches. I'm unsure the
> > > code simplification is worth it.
> >
> > I didn't see any warnings with GCC or clang. This is used in other
> > places in the kernel as well.
> > gcc version 13.2.1 20230918 (Red Hat 13.2.1-3) (GCC)
> > clang version 17.0.6 (Fedora 17.0.6-2.fc39)
> >
> See:
>
> https://patchwork.kernel.org/project/netdevbpf/patch/20240416193039.272997-4-danielj@nvidia.com/
> https://netdev.bots.linux.dev/static/nipa/845178/13632442/build_32bit/stderr
> https://netdev.bots.linux.dev/static/nipa/845178/13632442/build_allmodconfig_warn/stderr

These are sparse errors, I think, but I agree that there's little gain
here and clearly a cost of wasted time, since the standard kernel
tooling has not caught up with this ugly invention.
> I didn't see any warnings with GCC or clang. This is used in other
> places in the kernel as well.
> gcc version 13.2.1 20230918 (Red Hat 13.2.1-3) (GCC)
> clang version 17.0.6 (Fedora 17.0.6-2.fc39)
>

I think Paolo is referring to these[1][2]: guard() confuses the sparse
check and causes a warning:

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20240416193039.272997-4-danielj@nvidia.com/
[2] https://patchwork.kernel.org/project/netdevbpf/patch/20240416193039.272997-6-danielj@nvidia.com/
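For reference, guard(spinlock)(...) is the scope-based locking helper built
on <linux/cleanup.h>; functionally it is equivalent to an explicit
lock/unlock pair, which sparse's context tracking understands. A rough
comparison, on the assumption that the reported warnings come from sparse
not following the cleanup attribute:

#include <linux/spinlock.h>
#include <linux/cleanup.h>

/* Scope-based form used in the patch: the lock is released
 * automatically when the enclosing scope ends.
 */
static void with_guard(spinlock_t *lock)
{
	guard(spinlock)(lock);
	/* critical section */
}	/* unlocked here */

/* Equivalent explicit form, friendlier to sparse's lock checking. */
static void with_explicit(spinlock_t *lock)
{
	spin_lock(lock);
	/* critical section */
	spin_unlock(lock);
}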
On 2024/4/18 11:48 PM, Paolo Abeni wrote:
> On Thu, 2024-04-18 at 15:38 +0000, Dan Jurgens wrote:
>>> From: Paolo Abeni <pabeni@redhat.com>
>>> Sent: Thursday, April 18, 2024 5:57 AM
>>>
>>> On Thu, 2024-04-18 at 15:36 +0800, Heng Qi wrote:
>>>> On 2024/4/18 2:42 PM, Jason Wang wrote:
>>>>> On Wed, Apr 17, 2024 at 3:31 AM Daniel Jurgens <danielj@nvidia.com> wrote:
>>>>>> The command VQ will no longer be protected by the RTNL lock. Use a
>>>>>> spinlock to protect the control buffer header and the VQ.
>>>>>>
>>>>>> Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
>>>>>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
>>>>>> ---
>>>>>>  drivers/net/virtio_net.c | 6 +++++-
>>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
>>>>>> index 0ee192b45e1e..d02f83a919a7 100644
>>>>>> --- a/drivers/net/virtio_net.c
>>>>>> +++ b/drivers/net/virtio_net.c
>>>>>> @@ -282,6 +282,7 @@ struct virtnet_info {
>>>>>>
>>>>>>         /* Has control virtqueue */
>>>>>>         bool has_cvq;
>>>>>> +       spinlock_t cvq_lock;
>>>>> Using a spinlock instead of a mutex is problematic, as there's no
>>>>> guarantee on when the driver will get a reply. And it became even
>>>>> more serious after 0d197a147164 ("virtio-net: add cond_resched() to
>>>>> the command waiting loop").
>>>>>
>>>>> Any reason we can't use a mutex?
>>>> Hi Jason,
>>>>
>>>> I made a patch set that enables ctrlq's irq on top of this patch set,
>>>> which removes cond_resched().
>>>>
>>>> But I need a little time to test it; it is almost ready. So could the
>>>> discussion about cond_resched() plus spin lock versus mutex wait?
>>> The big problem is that as long as the cond_resched() is there, replacing
>>> the mutex with a spinlock can/will lead to scheduling-while-atomic
>>> splats. We can't intentionally introduce such a scenario.
>> When I created the series, set_rx_mode hadn't been moved to a work
>> queue, and the cond_resched() wasn't there.
> Unfortunately cond_resched() is there right now.

YES.

>
>> A mutex wasn't possible then. If the CVQ is made event driven, the lock
>> can be released right after posting the work to the VQ.
> That should work.

Yes, I will test my new patches (ctrlq with irq enabled) soon; then the
combination of this set and mine MAY make deciding between a mutex and
a spin lock easier.

Thanks.

>
>>> Side note: the compiler apparently does not like the guard() construct,
>>> leading to new warnings, here and in later patches. I'm unsure the
>>> code simplification is worth it.
>> I didn't see any warnings with GCC or clang. This is used in other
>> places in the kernel as well.
>> gcc version 13.2.1 20230918 (Red Hat 13.2.1-3) (GCC)
>> clang version 17.0.6 (Fedora 17.0.6-2.fc39)
>>
> See:
>
> https://patchwork.kernel.org/project/netdevbpf/patch/20240416193039.272997-4-danielj@nvidia.com/
> https://netdev.bots.linux.dev/static/nipa/845178/13632442/build_32bit/stderr
> https://netdev.bots.linux.dev/static/nipa/845178/13632442/build_allmodconfig_warn/stderr
>
> Cheers,
>
> Paolo
On Fri, Apr 19, 2024 at 12:12 AM Heng Qi <hengqi@linux.alibaba.com> wrote:
>
>
> On 2024/4/18 11:48 PM, Paolo Abeni wrote:
> > On Thu, 2024-04-18 at 15:38 +0000, Dan Jurgens wrote:
> >>> From: Paolo Abeni <pabeni@redhat.com>
> >>> Sent: Thursday, April 18, 2024 5:57 AM
> >>>
> >>> On Thu, 2024-04-18 at 15:36 +0800, Heng Qi wrote:
> >>>> On 2024/4/18 2:42 PM, Jason Wang wrote:
> >>>>> On Wed, Apr 17, 2024 at 3:31 AM Daniel Jurgens <danielj@nvidia.com> wrote:
> >>>>>> The command VQ will no longer be protected by the RTNL lock. Use a
> >>>>>> spinlock to protect the control buffer header and the VQ.
> >>>>>>
> >>>>>> Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
> >>>>>> Reviewed-by: Jiri Pirko <jiri@nvidia.com>
> >>>>>> ---
> >>>>>>  drivers/net/virtio_net.c | 6 +++++-
> >>>>>>  1 file changed, 5 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>>>> index 0ee192b45e1e..d02f83a919a7 100644
> >>>>>> --- a/drivers/net/virtio_net.c
> >>>>>> +++ b/drivers/net/virtio_net.c
> >>>>>> @@ -282,6 +282,7 @@ struct virtnet_info {
> >>>>>>
> >>>>>>         /* Has control virtqueue */
> >>>>>>         bool has_cvq;
> >>>>>> +       spinlock_t cvq_lock;
> >>>>> Using a spinlock instead of a mutex is problematic, as there's no
> >>>>> guarantee on when the driver will get a reply. And it became even
> >>>>> more serious after 0d197a147164 ("virtio-net: add cond_resched() to
> >>>>> the command waiting loop").
> >>>>>
> >>>>> Any reason we can't use a mutex?
> >>>> Hi Jason,
> >>>>
> >>>> I made a patch set that enables ctrlq's irq on top of this patch set,
> >>>> which removes cond_resched().
> >>>>
> >>>> But I need a little time to test it; it is almost ready. So could the
> >>>> discussion about cond_resched() plus spin lock versus mutex wait?
> >>> The big problem is that as long as the cond_resched() is there, replacing
> >>> the mutex with a spinlock can/will lead to scheduling-while-atomic
> >>> splats. We can't intentionally introduce such a scenario.
> >> When I created the series, set_rx_mode hadn't been moved to a work
> >> queue, and the cond_resched() wasn't there.
> > Unfortunately cond_resched() is there right now.
>
> YES.
>
> >
> >> A mutex wasn't possible then. If the CVQ is made event driven, the lock
> >> can be released right after posting the work to the VQ.
> > That should work.
>
> Yes, I will test my new patches (ctrlq with irq enabled) soon; then the
> combination of this set and mine MAY make deciding between a mutex and
> a spin lock easier.
>
> Thanks.

So I guess the plan is to let your series come first?

Thanks

>
> >
> >>> Side note: the compiler apparently does not like the guard() construct,
> >>> leading to new warnings, here and in later patches. I'm unsure the
> >>> code simplification is worth it.
> >> I didn't see any warnings with GCC or clang. This is used in other
> >> places in the kernel as well.
> >> gcc version 13.2.1 20230918 (Red Hat 13.2.1-3) (GCC)
> >> clang version 17.0.6 (Fedora 17.0.6-2.fc39)
> >>
> > See:
> >
> > https://patchwork.kernel.org/project/netdevbpf/patch/20240416193039.272997-4-danielj@nvidia.com/
> > https://netdev.bots.linux.dev/static/nipa/845178/13632442/build_32bit/stderr
> > https://netdev.bots.linux.dev/static/nipa/845178/13632442/build_allmodconfig_warn/stderr
> >
> > Cheers,
> >
> > Paolo
>
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 0ee192b45e1e..d02f83a919a7 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -282,6 +282,7 @@ struct virtnet_info {
 
 	/* Has control virtqueue */
 	bool has_cvq;
+	spinlock_t cvq_lock;
 
 	/* Host can handle any s/g split between our header and packet data */
 	bool any_header_sg;
@@ -2529,6 +2530,7 @@ static bool virtnet_send_command(struct virtnet_info *vi, u8 class, u8 cmd,
 	/* Caller should know better */
 	BUG_ON(!virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ));
 
+	guard(spinlock)(&vi->cvq_lock);
 	vi->ctrl->status = ~0;
 	vi->ctrl->hdr.class = class;
 	vi->ctrl->hdr.cmd = cmd;
@@ -4818,8 +4820,10 @@ static int virtnet_probe(struct virtio_device *vdev)
 	    virtio_has_feature(vdev, VIRTIO_F_VERSION_1))
 		vi->any_header_sg = true;
 
-	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ))
+	if (virtio_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ)) {
 		vi->has_cvq = true;
+		spin_lock_init(&vi->cvq_lock);
+	}
 
 	if (virtio_has_feature(vdev, VIRTIO_NET_F_MTU)) {
 		mtu = virtio_cread16(vdev,