vhost: fix crash on virtio_error while device stop

Message ID 1512565578-12078-1-git-send-email-i.maximets@samsung.com (mailing list archive)
State New, archived

Commit Message

Ilya Maximets Dec. 6, 2017, 1:06 p.m. UTC
If a virtio error occurs after vhost_dev_close(), qemu will crash in
the nested cleanup while checking the IOMMU flag, because dev->vdev is
already set to NULL and its resources are already freed.
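
For reference, the check that faults looks roughly like this (a
paraphrase of 'vhost_dev_has_iommu()' from hw/virtio/vhost.c of this
period, not an exact copy):

    static bool vhost_dev_has_iommu(struct vhost_dev *dev)
    {
        VirtIODevice *vdev = dev->vdev;

        /* After the first vhost_dev_stop(), dev->vdev is NULL, so
         * reading the feature bits below dereferences a NULL pointer. */
        return virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
    }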

Example:

Program received signal SIGSEGV, Segmentation fault.
vhost_virtqueue_stop at hw/virtio/vhost.c:1155

    #0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
    #1  vhost_dev_stop at hw/virtio/vhost.c:1594
    #2  vhost_net_stop_one at hw/net/vhost_net.c:289
    #3  vhost_net_stop at hw/net/vhost_net.c:368

            Nested call to vhost_net_stop(). First time was at #14.

    #4  virtio_net_vhost_status at hw/net/virtio-net.c:180
    #5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
    #6  virtio_set_status at hw/virtio/virtio.c:1146
    #7  virtio_error at hw/virtio/virtio.c:2455

        virtqueue_get_head() failed here.

    #8  virtqueue_get_head at hw/virtio/virtio.c:543
    #9  virtqueue_drop_all at hw/virtio/virtio.c:984
    #10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
    #11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
    #12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431

        vhost_dev_stop() was executed here. dev->vdev == NULL now.

    #13 vhost_net_stop_one at hw/net/vhost_net.c:290
    #14 vhost_net_stop at hw/net/vhost_net.c:368
    #15 virtio_net_vhost_status at hw/net/virtio-net.c:180
    #16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
    #17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
    #18 chr_closed_bh at net/vhost-user.c:214
    #19 aio_bh_call at util/async.c:90
    #20 aio_bh_poll at util/async.c:118
    #21 aio_dispatch at util/aio-posix.c:429
    #22 aio_ctx_dispatch at util/async.c:261
    #23 g_main_context_dispatch
    #24 glib_pollfds_poll at util/main-loop.c:213
    #25 os_host_main_loop_wait at util/main-loop.c:261
    #26 main_loop_wait at util/main-loop.c:515
    #27 main_loop () at vl.c:1917
    #28 main at vl.c:4795

The backtrace above was captured from a qemu crash on vhost disconnect
while the virtio driver in the guest was in a failed state.

We could simply add a check for 'vdev' in 'vhost_dev_has_iommu()', but
qemu would then assert further on while trying to free already freed
ioeventfds. The real problem is that we allow nested calls to
'vhost_net_stop()'.

This patch forbids nested calls to 'vhost_net_stop()' by setting the
'vhost_started' flag to zero before the 'vhost_net_stop()' call, which
avoids any possible double frees and segmentation faults due to use of
already freed resources.
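
The reordering works because of the early-return guard at the top of
'virtio_net_vhost_status()'. In simplified form (paraphrased, with
'should_be_started' standing in for the actual condition computed from
'status' and the peer link state):

    static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
    {
        ...
        /* Nothing to do if the desired state already matches
         * vhost_started.  With vhost_started cleared *before*
         * vhost_net_stop(), a nested stop request takes this early
         * return instead of stopping the device a second time. */
        if (should_be_started == !!n->vhost_started) {
            return;
        }
        ...
    }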

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---

This issue was already addressed more than a year ago by the following
patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
but it was dropped without review because vhost-user re-connection was
not yet implemented. The re-connection implementation later fixed most
of the nested calls, but a few of them still exist. For example, the
backtrace above was captured after a 'virtqueue_get_head()' failure on
vhost-user disconnection.

 hw/net/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Michael S. Tsirkin Dec. 6, 2017, 4:45 p.m. UTC | #1
On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
> [...]
> 
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 38674b0..4d95a18 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>              n->vhost_started = 0;
>          }
>      } else {
> -        vhost_net_stop(vdev, n->nic->ncs, queues);
>          n->vhost_started = 0;
> +        vhost_net_stop(vdev, n->nic->ncs, queues);
>      }
>  }

Well the wider context is


        n->vhost_started = 1;
        r = vhost_net_start(vdev, n->nic->ncs, queues);
        if (r < 0) {
            error_report("unable to start vhost net: %d: "
                         "falling back on userspace virtio", -r);
            n->vhost_started = 0;
        }
    } else {
        vhost_net_stop(vdev, n->nic->ncs, queues);
        n->vhost_started = 0;

So we set it to 1 before start, we should clear after stop.


> -- 
> 2.7.4
Ilya Maximets Dec. 7, 2017, 6:39 a.m. UTC | #2
On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
>> [...]
> 
> Well the wider context is
> 
> 
>         n->vhost_started = 1;
>         r = vhost_net_start(vdev, n->nic->ncs, queues);
>         if (r < 0) {
>             error_report("unable to start vhost net: %d: "
>                          "falling back on userspace virtio", -r);
>             n->vhost_started = 0;
>         }
>     } else {
>         vhost_net_stop(vdev, n->nic->ncs, queues);
>         n->vhost_started = 0;
> 
> So we set it to 1 before start, we should clear after stop.

OK. I agree that clearing after is a bit safer. But in that case we
need a separate flag or some other way to detect that we're already
inside 'vhost_net_stop()'.

What do you think about that old patch:
https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html ?

It implements the same thing but introduces an additional flag. It
might even still be applicable.
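
Roughly, the idea is something like this (sketch only; the
'vhost_stopping' field is invented here, the old patch may differ):

    } else {
        /* Guard against re-entry: virtio_error() on the stop path can
         * re-trigger virtio_net_vhost_status() via set_status. */
        if (n->vhost_stopping) {
            return;
        }
        n->vhost_stopping = 1;
        vhost_net_stop(vdev, n->nic->ncs, queues);
        n->vhost_started = 0;
        n->vhost_stopping = 0;
    }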
Michael S. Tsirkin Dec. 7, 2017, 5:06 p.m. UTC | #3
On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote:
> On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> > On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
> [...]
> 
> OK. I agree that clearing after is a bit safer. But in that case we
> need a separate flag or some other way to detect that we're already
> inside 'vhost_net_stop()'.
> 
> What do you think about that old patch:
> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html ?
> 
> It implements the same thing but introduces an additional flag. It
> might even still be applicable.

I see. I think the issue is with how the set status callback works.
It only sets the status afterwards.

So here's the plan:
- have set_status set the new status first, and
  pass the old status as a parameter
- fix all callbacks
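
Roughly (sketch only, paraphrasing virtio_set_status(); the real change
would also have to update every set_status callback):

    int virtio_set_status(VirtIODevice *vdev, uint8_t val)
    {
        VirtioDeviceClass *k = VIRTIO_DEVICE_GET_CLASS(vdev);
        uint8_t old_status = vdev->status;

        /* Set the new status first, so any nested status check sees
         * the up-to-date state ... */
        vdev->status = val;
        if (k->set_status) {
            /* ... and pass the old status to the callback instead. */
            k->set_status(vdev, old_status);
        }
        return 0;
    }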

Patch

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 38674b0..4d95a18 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
             n->vhost_started = 0;
         }
     } else {
-        vhost_net_stop(vdev, n->nic->ncs, queues);
         n->vhost_started = 0;
+        vhost_net_stop(vdev, n->nic->ncs, queues);
     }
 }