[v2,0/3] vsock/virtio: several fixes in the .probe() and .remove()

Message ID 20190628123659.139576-1-sgarzare@redhat.com (mailing list archive)

Message

Stefano Garzarella June 28, 2019, 12:36 p.m. UTC
During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
before registering the driver", Stefan pointed out some possible issues
in the .probe() and .remove() callbacks of the virtio-vsock driver.

This series tries to solve these issues:
- Patch 1 adds RCU critical sections to avoid use-after-free of
  'the_virtio_vsock' pointer.
- Patch 2 stops workers before calling vdev->config->reset(vdev) to
  make sure that no one is accessing the device.
- Patch 3 moves the flushing of works to the end of .remove() to avoid
  use-after-free of the 'vsock' object.

v2:
- Patch 1: use RCU to protect 'the_virtio_vsock' pointer
- Patch 2: no changes
- Patch 3: flush works only at the end of .remove()
- Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers
  allocated.

v1: https://patchwork.kernel.org/cover/10964733/

Stefano Garzarella (3):
  vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
  vsock/virtio: stop workers during the .remove()
  vsock/virtio: fix flush of works during the .remove()

 net/vmw_vsock/virtio_transport.c | 131 ++++++++++++++++++++++++-------
 1 file changed, 102 insertions(+), 29 deletions(-)
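
For reference, here is a minimal sketch of the pattern the series relies
on. It is not the code from the patches: 'struct virtio_vsock' members,
the example_send()/example_remove() names and the commented-out steps are
placeholders, while the_virtio_vsock, vdev->config->reset(vdev) and the
RCU/workqueue primitives are the ones named above.

  /* Minimal sketch, not the actual net/vmw_vsock/virtio_transport.c. */
  #include <linux/mutex.h>
  #include <linux/rcupdate.h>
  #include <linux/slab.h>
  #include <linux/virtio.h>
  #include <linux/virtio_config.h>

  struct virtio_vsock;    /* real layout lives in virtio_transport.c */

  static struct virtio_vsock __rcu *the_virtio_vsock;
  static DEFINE_MUTEX(the_virtio_vsock_mutex);

  /* Data path (patch 1): dereference the global pointer only inside an
   * RCU read-side critical section and handle the NULL it takes while
   * .remove() is running.
   */
  static int example_send(void)
  {
          struct virtio_vsock *vsock;

          rcu_read_lock();
          vsock = rcu_dereference(the_virtio_vsock);
          if (!vsock) {
                  rcu_read_unlock();
                  return -ENODEV;
          }
          /* ... queue the packet on the device's work ... */
          rcu_read_unlock();
          return 0;
  }

  static void example_remove(struct virtio_device *vdev)
  {
          struct virtio_vsock *vsock = vdev->priv;

          mutex_lock(&the_virtio_vsock_mutex);

          /* Patch 1: hide the device from new readers, then wait for
           * the readers that already saw it.
           */
          rcu_assign_pointer(the_virtio_vsock, NULL);
          synchronize_rcu();

          /* Patch 2: stop the workers (e.g. via a flag checked by the
           * work functions) before resetting the device, so nothing
           * touches it afterwards.
           */
          vdev->config->reset(vdev);

          /* ... detach unused buffers, delete the virtqueues ... */

          mutex_unlock(&the_virtio_vsock_mutex);

          /* Patch 3: flush the works only at the very end, so a work
           * that was still queued cannot run after 'vsock' is freed.
           */
          /* flush_work(&vsock->rx_work); ... */

          kfree(vsock);
  }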

Comments

Stefan Hajnoczi July 1, 2019, 3:11 p.m. UTC | #1
On Fri, Jun 28, 2019 at 02:36:56PM +0200, Stefano Garzarella wrote:
> During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
> before registering the driver", Stefan pointed out some possible issues
> in the .probe() and .remove() callbacks of the virtio-vsock driver.
> 
> This series tries to solve these issues:
> - Patch 1 adds RCU critical sections to avoid use-after-free of
>   'the_virtio_vsock' pointer.
> - Patch 2 stops workers before calling vdev->config->reset(vdev) to
>   make sure that no one is accessing the device.
> - Patch 3 moves the flushing of works to the end of .remove() to avoid
>   use-after-free of the 'vsock' object.
> 
> v2:
> - Patch 1: use RCU to protect 'the_virtio_vsock' pointer
> - Patch 2: no changes
> - Patch 3: flush works only at the end of .remove()
> - Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers
>   allocated.
> 
> v1: https://patchwork.kernel.org/cover/10964733/

This looks good to me.

Did you run any stress tests?  For example, an SMP guest constantly
connecting and sending packets, together with a script that
hotplugs/unplugs vhost-vsock-pci from the host side.

Stefan
Stefano Garzarella July 1, 2019, 5:03 p.m. UTC | #2
On Mon, Jul 01, 2019 at 04:11:13PM +0100, Stefan Hajnoczi wrote:
> On Fri, Jun 28, 2019 at 02:36:56PM +0200, Stefano Garzarella wrote:
> > During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
> > before registering the driver", Stefan pointed out some possible issues
> > in the .probe() and .remove() callbacks of the virtio-vsock driver.
> > 
> > This series tries to solve these issues:
> > - Patch 1 adds RCU critical sections to avoid use-after-free of
> >   'the_virtio_vsock' pointer.
> > - Patch 2 stops workers before calling vdev->config->reset(vdev) to
> >   make sure that no one is accessing the device.
> > - Patch 3 moves the flushing of works to the end of .remove() to avoid
> >   use-after-free of the 'vsock' object.
> > 
> > v2:
> > - Patch 1: use RCU to protect 'the_virtio_vsock' pointer
> > - Patch 2: no changes
> > - Patch 3: flush works only at the end of .remove()
> > - Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers
> >   allocated.
> > 
> > v1: https://patchwork.kernel.org/cover/10964733/
> 
> This looks good to me.

Thanks for the review!

> 
> Did you run any stress tests?  For example, an SMP guest constantly
> connecting and sending packets, together with a script that
> hotplugs/unplugs vhost-vsock-pci from the host side.

Yes, I started an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait)
and I ran these scripts to stress the .probe()/.remove() path:

- guest
  while true; do
      cat /dev/urandom | nc-vsock -l 4321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 5321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 6321 > /dev/null &
      cat /dev/urandom | nc-vsock -l 7321 > /dev/null &
      wait
  done

- host
  while true; do
      cat /dev/urandom | nc-vsock 3 4321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 5321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 6321 > /dev/null &
      cat /dev/urandom | nc-vsock 3 7321 > /dev/null &
      sleep 2
      echo "device_del v1" | nc 127.0.0.1 1234
      sleep 1
      echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 1234
      sleep 1
  done

Do you think this is enough, or is a more thorough test needed?

Thanks,
Stefano
Stefan Hajnoczi July 3, 2019, 9:14 a.m. UTC | #3
On Mon, Jul 01, 2019 at 07:03:57PM +0200, Stefano Garzarella wrote:
> On Mon, Jul 01, 2019 at 04:11:13PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Jun 28, 2019 at 02:36:56PM +0200, Stefano Garzarella wrote:
> > > During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
> > > before registering the driver", Stefan pointed out some possible issues
> > > in the .probe() and .remove() callbacks of the virtio-vsock driver.
> > > 
> > > This series tries to solve these issues:
> > > - Patch 1 adds RCU critical sections to avoid use-after-free of
> > >   'the_virtio_vsock' pointer.
> > > - Patch 2 stops workers before calling vdev->config->reset(vdev) to
> > >   make sure that no one is accessing the device.
> > > - Patch 3 moves the flushing of works to the end of .remove() to avoid
> > >   use-after-free of the 'vsock' object.
> > > 
> > > v2:
> > > - Patch 1: use RCU to protect 'the_virtio_vsock' pointer
> > > - Patch 2: no changes
> > > - Patch 3: flush works only at the end of .remove()
> > > - Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers
> > >   allocated.
> > > 
> > > v1: https://patchwork.kernel.org/cover/10964733/
> > 
> > This looks good to me.
> 
> Thanks for the review!
> 
> > 
> > Did you run any stress tests?  For example, an SMP guest constantly
> > connecting and sending packets, together with a script that
> > hotplugs/unplugs vhost-vsock-pci from the host side.
> 
> Yes, I started an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait)
> and I ran these scripts to stress the .probe()/.remove() path:
> 
> - guest
>   while true; do
>       cat /dev/urandom | nc-vsock -l 4321 > /dev/null &
>       cat /dev/urandom | nc-vsock -l 5321 > /dev/null &
>       cat /dev/urandom | nc-vsock -l 6321 > /dev/null &
>       cat /dev/urandom | nc-vsock -l 7321 > /dev/null &
>       wait
>   done
> 
> - host
>   while true; do
>       cat /dev/urandom | nc-vsock 3 4321 > /dev/null &
>       cat /dev/urandom | nc-vsock 3 5321 > /dev/null &
>       cat /dev/urandom | nc-vsock 3 6321 > /dev/null &
>       cat /dev/urandom | nc-vsock 3 7321 > /dev/null &
>       sleep 2
>       echo "device_del v1" | nc 127.0.0.1 1234
>       sleep 1
>       echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 1234
>       sleep 1
>   done
> 
> Do you think this is enough, or is a more thorough test needed?

That's good when left running overnight so that thousands of hotplug
events are tested.

Stefan
Stefano Garzarella July 3, 2019, 10:07 a.m. UTC | #4
On Wed, Jul 03, 2019 at 10:14:53AM +0100, Stefan Hajnoczi wrote:
> On Mon, Jul 01, 2019 at 07:03:57PM +0200, Stefano Garzarella wrote:
> > On Mon, Jul 01, 2019 at 04:11:13PM +0100, Stefan Hajnoczi wrote:
> > > On Fri, Jun 28, 2019 at 02:36:56PM +0200, Stefano Garzarella wrote:
> > > > During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
> > > > before registering the driver", Stefan pointed out some possible issues
> > > > in the .probe() and .remove() callbacks of the virtio-vsock driver.
> > > > 
> > > > This series tries to solve these issues:
> > > > - Patch 1 adds RCU critical sections to avoid use-after-free of
> > > >   'the_virtio_vsock' pointer.
> > > > - Patch 2 stops workers before calling vdev->config->reset(vdev) to
> > > >   make sure that no one is accessing the device.
> > > > - Patch 3 moves the flushing of works to the end of .remove() to avoid
> > > >   use-after-free of the 'vsock' object.
> > > > 
> > > > v2:
> > > > - Patch 1: use RCU to protect 'the_virtio_vsock' pointer
> > > > - Patch 2: no changes
> > > > - Patch 3: flush works only at the end of .remove()
> > > > - Removed patch 4 because virtqueue_detach_unused_buf() returns all the buffers
> > > >   allocated.
> > > > 
> > > > v1: https://patchwork.kernel.org/cover/10964733/
> > > 
> > > This looks good to me.
> > 
> > Thanks for the review!
> > 
> > > 
> > > Did you run any stress tests?  For example, an SMP guest constantly
> > > connecting and sending packets, together with a script that
> > > hotplugs/unplugs vhost-vsock-pci from the host side.
> > 
> > Yes, I started an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait)
> > and I ran these scripts to stress the .probe()/.remove() path:
> > 
> > - guest
> >   while true; do
> >       cat /dev/urandom | nc-vsock -l 4321 > /dev/null &
> >       cat /dev/urandom | nc-vsock -l 5321 > /dev/null &
> >       cat /dev/urandom | nc-vsock -l 6321 > /dev/null &
> >       cat /dev/urandom | nc-vsock -l 7321 > /dev/null &
> >       wait
> >   done
> > 
> > - host
> >   while true; do
> >       cat /dev/urandom | nc-vsock 3 4321 > /dev/null &
> >       cat /dev/urandom | nc-vsock 3 5321 > /dev/null &
> >       cat /dev/urandom | nc-vsock 3 6321 > /dev/null &
> >       cat /dev/urandom | nc-vsock 3 7321 > /dev/null &
> >       sleep 2
> >       echo "device_del v1" | nc 127.0.0.1 1234
> >       sleep 1
> >       echo "device_add vhost-vsock-pci,id=v1,guest-cid=3" | nc 127.0.0.1 1234
> >       sleep 1
> >   done
> > 
> > Do you think this is enough, or is a more thorough test needed?
> 
> That's good when left running overnight so that thousands of hotplug
> events are tested.

Honestly, I ran the test for ~30 mins (because without the patch the
crash happens within a few seconds), but of course, I'll run it
overnight :)

Thanks,
Stefano