diff mbox series

[3/3] caif_virtio: fix the race between reset and netdev unregister

Message ID: 20220620051115.3142-4-jasowang@redhat.com
State: Awaiting Upstream
Delegated to: Netdev Maintainers
Series: Fixing races in probe/remove

Checks

Context                         Check    Description
netdev/tree_selection           success  Guessed tree name to be net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           warning  Target tree name not specified in the subject
netdev/cover_letter             success  Series has a cover letter
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers           fail     1 blamed authors not CCed: rusty@rustcorp.com.au; 3 maintainers not CCed: pabeni@redhat.com rusty@rustcorp.com.au edumazet@google.com
netdev/build_clang              success  Errors and warnings before: 0 this patch: 0
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 0 this patch: 0
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 18 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0

Commit Message

Jason Wang June 20, 2022, 5:11 a.m. UTC
We use to do the following steps during .remove():

static void cfv_remove(struct virtio_device *vdev)
{
	struct cfv_info *cfv = vdev->priv;

	rtnl_lock();
	dev_close(cfv->ndev);
	rtnl_unlock();

	tasklet_kill(&cfv->tx_release_tasklet);
	debugfs_remove_recursive(cfv->debugfs);

	vringh_kiov_cleanup(&cfv->ctx.riov);
	virtio_reset_device(vdev);
	vdev->vringh_config->del_vrhs(cfv->vdev);
	cfv->vr_rx = NULL;
	vdev->config->del_vqs(cfv->vdev);
	unregister_netdev(cfv->ndev);
}

This is racy since the device could be re-opened after dev_close() but
before unregister_netdev():

1) The RX vringh is cleaned up before the device is reset, so RX callbacks
   that run after vringh_kiov_cleanup() will result in a UAF
2) The network stack can still try to use the TX virtqueue even after it
   has been deleted by del_vqs()

Fix this by unregistering the network device first to make sure there is
no device access from either the TX or the RX side.

Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 drivers/net/caif/caif_virtio.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

Comments

Michael S. Tsirkin June 20, 2022, 9:09 a.m. UTC | #1
On Mon, Jun 20, 2022 at 01:11:15PM +0800, Jason Wang wrote:
> We use to do the following steps during .remove():

We currently do


> static void cfv_remove(struct virtio_device *vdev)
> {
> 	struct cfv_info *cfv = vdev->priv;
> 
> 	rtnl_lock();
> 	dev_close(cfv->ndev);
> 	rtnl_unlock();
> 
> 	tasklet_kill(&cfv->tx_release_tasklet);
> 	debugfs_remove_recursive(cfv->debugfs);
> 
> 	vringh_kiov_cleanup(&cfv->ctx.riov);
> 	virtio_reset_device(vdev);
> 	vdev->vringh_config->del_vrhs(cfv->vdev);
> 	cfv->vr_rx = NULL;
> 	vdev->config->del_vqs(cfv->vdev);
> 	unregister_netdev(cfv->ndev);
> }
> This is racy since device could be re-opened after dev_close() but
> before unregister_netdevice():
> 
> 1) RX vringh is cleaned before resetting the device, rx callbacks that
>    is called after the vringh_kiov_cleanup() will result a UAF
> 2) Network stack can still try to use TX virtqueue even if it has been
>    deleted after dev_vqs()
> 
> Fixing this by unregistering the network device first to make sure not
> device access from both TX and RX side.
> 
> Fixes: 0d2e1a2926b18 ("caif_virtio: Introduce caif over virtio")
> Signed-off-by: Jason Wang <jasowang@redhat.com>
> ---
>  drivers/net/caif/caif_virtio.c | 6 ++----
>  1 file changed, 2 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
> index 66375bea2fcd..a29f9b2df5b1 100644
> --- a/drivers/net/caif/caif_virtio.c
> +++ b/drivers/net/caif/caif_virtio.c
> @@ -752,9 +752,8 @@ static void cfv_remove(struct virtio_device *vdev)
>  {
>  	struct cfv_info *cfv = vdev->priv;
>  
> -	rtnl_lock();
> -	dev_close(cfv->ndev);
> -	rtnl_unlock();
> +	/* Make sure NAPI/TX won't try to access the device */
> +	unregister_netdev(cfv->ndev);
>  
>  	tasklet_kill(&cfv->tx_release_tasklet);
>  	debugfs_remove_recursive(cfv->debugfs);
> @@ -764,7 +763,6 @@ static void cfv_remove(struct virtio_device *vdev)
>  	vdev->vringh_config->del_vrhs(cfv->vdev);
>  	cfv->vr_rx = NULL;
>  	vdev->config->del_vqs(cfv->vdev);
> -	unregister_netdev(cfv->ndev);
>  }


This gives me pause: callbacks can now trigger after the device
has been unregistered. Are we sure this is safe?
Wouldn't it be safer to just keep the rtnl_lock around
the whole process?

>  static struct virtio_device_id id_table[] = {
> -- 
> 2.25.1
Jason Wang June 20, 2022, 9:18 a.m. UTC | #2
On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> This gives me pause, callbacks can now trigger after device
> has been unregistered. Are we sure this is safe?

It looks safe: for RX, NAPI is disabled. For TX, the tasklet is disabled
after tasklet_kill(). I can add a comment to explain this.

> Won't it be safer to just keep the rtnl_lock around
> the whole process?

It looks to me that rtnl_lock can't help in synchronizing with the
callbacks. Is there anything I'm missing?

Thanks

Michael S. Tsirkin June 20, 2022, 10:18 a.m. UTC | #3
On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > This gives me pause, callbacks can now trigger after device
> > has been unregistered. Are we sure this is safe?
> 
> It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
> after tasklet_kill(). I can add a comment to explain this.

That waits for outstanding tasklets, but does it really prevent
future ones?

> > Won't it be safer to just keep the rtnl_lock around
> > the whole process?
> 
> It looks to me we rtnl_lock can't help in synchronizing with the
> callbacks, anything I miss?
> 
> Thanks

good point.


Jason Wang June 21, 2022, 3:09 a.m. UTC | #4
On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > This gives me pause, callbacks can now trigger after device
> > > has been unregistered. Are we sure this is safe?
> >
> > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
> > after tasklet_kill(). I can add a comment to explain this.
>
> that waits for outstanding tasklets but does it really prevent
> future ones?

I think so, it tries to test and set TASKLET_STATE_SCHED which blocks
the future scheduling of a tasklet.

Thanks

>
> > > Won't it be safer to just keep the rtnl_lock around
> > > the whole process?
> >
> > It looks to me we rtnl_lock can't help in synchronizing with the
> > callbacks, anything I miss?
> >
> > Thanks
>
> good point.
Michael S. Tsirkin June 21, 2022, 6 a.m. UTC | #5
On Tue, Jun 21, 2022 at 11:09:45AM +0800, Jason Wang wrote:
> On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> > > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > This gives me pause, callbacks can now trigger after device
> > > > has been unregistered. Are we sure this is safe?
> > >
> > > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
> > > after tasklet_kill(). I can add a comment to explain this.
> >
> > that waits for outstanding tasklets but does it really prevent
> > future ones?
> 
> I think so, it tries to test and set TASKLET_STATE_SCHED which blocks
> the future scheduling of a tasklet.
> 
> Thanks

But then in the end it clears it, does it not?

> >
> > > > Won't it be safer to just keep the rtnl_lock around
> > > > the whole process?
> > >
> > > It looks to me we rtnl_lock can't help in synchronizing with the
> > > callbacks, anything I miss?
> > >
> > > Thanks
> >
> > good point.
Jason Wang June 21, 2022, 6:25 a.m. UTC | #6
On Tue, Jun 21, 2022 at 2:00 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jun 21, 2022 at 11:09:45AM +0800, Jason Wang wrote:
> > On Mon, Jun 20, 2022 at 6:18 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > >
> > > On Mon, Jun 20, 2022 at 05:18:29PM +0800, Jason Wang wrote:
> > > > On Mon, Jun 20, 2022 at 5:09 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > >
> > > > > This gives me pause, callbacks can now trigger after device
> > > > > has been unregistered. Are we sure this is safe?
> > > >
> > > > It looks safe, for RX NAPI is disabled. For TX, tasklet is disabled
> > > > after tasklet_kill(). I can add a comment to explain this.
> > >
> > > that waits for outstanding tasklets but does it really prevent
> > > future ones?
> >
> > I think so, it tries to test and set TASKLET_STATE_SCHED which blocks
> > the future scheduling of a tasklet.
> >
> > Thanks
>
> But then in the end it clears it, does it not?

Right, so we need to reset before tasklet_kill().

Thanks

>
> > >
> > > > > Won't it be safer to just keep the rtnl_lock around
> > > > > the whole process?
> > > >
> > > > It looks to me we rtnl_lock can't help in synchronizing with the
> > > > callbacks, anything I miss?
> > > >
> > > > Thanks
> > >
> > > good point.
> > >
> > >
> > > > >
> > > > > >  static struct virtio_device_id id_table[] = {
> > > > > > --
> > > > > > 2.25.1
> > > > >
> > >
>

Patch

diff --git a/drivers/net/caif/caif_virtio.c b/drivers/net/caif/caif_virtio.c
index 66375bea2fcd..a29f9b2df5b1 100644
--- a/drivers/net/caif/caif_virtio.c
+++ b/drivers/net/caif/caif_virtio.c
@@ -752,9 +752,8 @@  static void cfv_remove(struct virtio_device *vdev)
 {
 	struct cfv_info *cfv = vdev->priv;
 
-	rtnl_lock();
-	dev_close(cfv->ndev);
-	rtnl_unlock();
+	/* Make sure NAPI/TX won't try to access the device */
+	unregister_netdev(cfv->ndev);
 
 	tasklet_kill(&cfv->tx_release_tasklet);
 	debugfs_remove_recursive(cfv->debugfs);
@@ -764,7 +763,6 @@  static void cfv_remove(struct virtio_device *vdev)
 	vdev->vringh_config->del_vrhs(cfv->vdev);
 	cfv->vr_rx = NULL;
 	vdev->config->del_vqs(cfv->vdev);
-	unregister_netdev(cfv->ndev);
 }
 
 static struct virtio_device_id id_table[] = {
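
Note on the tasklet_kill() exchange in comments #3 to #6: the point can be
illustrated with a small user-space model. This is only a sketch under
stated assumptions, not kernel code and not the driver's actual fix. Every
model_* name below is hypothetical. It assumes that the TX virtqueue
callback schedules tx_release_tasklet, that tasklet_schedule() queues the
tasklet only when TASKLET_STATE_SCHED is clear, and that tasklet_kill()
clears that bit again before returning, as discussed in the thread.

```c
#include <stdbool.h>

/* Toy stand-in for struct tasklet_struct (hypothetical, user space only). */
struct model_tasklet {
	bool sched;	/* models TASKLET_STATE_SCHED */
	int runs;	/* number of times the handler ran */
};

/* Toy stand-in for the virtio device: callbacks fire only until reset. */
struct model_device {
	bool live;
	struct model_tasklet tx_release;
};

/* Like tasklet_schedule(): queue only if SCHED was not already set. */
static bool model_tasklet_schedule(struct model_tasklet *t)
{
	if (t->sched)
		return false;
	t->sched = true;
	return true;
}

/* Softirq stand-in: run the handler if queued, clearing SCHED first. */
static void model_softirq(struct model_tasklet *t)
{
	if (t->sched) {
		t->sched = false;
		t->runs++;
	}
}

/* Like tasklet_kill(): wait out a pending run, take SCHED, then clear it. */
static void model_tasklet_kill(struct model_tasklet *t)
{
	while (t->sched)
		model_softirq(t);	/* single-threaded: waiting means running it */
	t->sched = true;		/* the test-and-set succeeds here */
	t->sched = false;		/* cleared on exit: scheduling works again */
}

/* Assumed effect of virtio_reset_device(): no further callbacks. */
static void model_reset_device(struct model_device *d)
{
	d->live = false;
}

/* Assumed TX callback: reaches the tasklet only while the device is live. */
static bool model_tx_callback(struct model_device *d)
{
	if (!d->live)
		return false;
	return model_tasklet_schedule(&d->tx_release);
}

/* Order in the posted patch: kill first, reset later. */
static bool model_kill_then_reset(void)
{
	struct model_device d = { .live = true };
	bool requeued;

	model_tasklet_kill(&d.tx_release);
	requeued = model_tx_callback(&d);	/* callback lands in the window */
	model_reset_device(&d);
	return requeued;
}

/* Order suggested in comment #6: reset first, then kill. */
static bool model_reset_then_kill(void)
{
	struct model_device d = { .live = true };

	model_reset_device(&d);
	model_tasklet_kill(&d.tx_release);
	return model_tx_callback(&d);		/* nothing can re-queue now */
}
```

In this model, model_kill_then_reset() returns true because a callback that
arrives between the kill and the reset re-queues the tasklet, while
model_reset_then_kill() returns false. That matches the conclusion in
comment #6 that the reset should come before tasklet_kill().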