Message ID | 1512565578-12078-1-git-send-email-i.maximets@samsung.com (mailing list archive)
---|---
State | New, archived
On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
> In case a virtio error occurred after vhost_dev_close(), qemu will crash
> in nested cleanup while checking the IOMMU flag, because dev->vdev is
> already set to zero and resources are already freed.
>
> [...]
>
> @@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
>              n->vhost_started = 0;
>          }
>      } else {
> -        vhost_net_stop(vdev, n->nic->ncs, queues);
>          n->vhost_started = 0;
> +        vhost_net_stop(vdev, n->nic->ncs, queues);
>      }
>  }

Well, the wider context is:

            n->vhost_started = 1;
            r = vhost_net_start(vdev, n->nic->ncs, queues);
            if (r < 0) {
                error_report("unable to start vhost net: %d: "
                             "falling back on userspace virtio", -r);
                n->vhost_started = 0;
            }
        } else {
            vhost_net_stop(vdev, n->nic->ncs, queues);
            n->vhost_started = 0;

So we set it to 1 before start; we should clear it after stop.
On 06.12.2017 19:45, Michael S. Tsirkin wrote:
> On Wed, Dec 06, 2017 at 04:06:18PM +0300, Ilya Maximets wrote:
>> In case a virtio error occurred after vhost_dev_close(), qemu will crash
>> in nested cleanup while checking the IOMMU flag, because dev->vdev is
>> already set to zero and resources are already freed.
>>
>> [...]
>
> Well, the wider context is:
>
> [...]
>
> So we set it to 1 before start; we should clear it after stop.

OK. I agree that clearing after is a bit safer. But in that case we need
a separate flag or some other way to detect that we're already inside
'vhost_net_stop()'.

What do you think about that old patch:
https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html ?

It implements the same thing but introduces an additional flag. It could
even still be applicable.
On Thu, Dec 07, 2017 at 09:39:36AM +0300, Ilya Maximets wrote:
> On 06.12.2017 19:45, Michael S. Tsirkin wrote:
>> [...]
>>
>> So we set it to 1 before start; we should clear it after stop.
>
> OK. I agree that clearing after is a bit safer. But in that case we need
> a separate flag or some other way to detect that we're already inside
> 'vhost_net_stop()'.
>
> What do you think about that old patch:
> https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html ?
>
> It implements the same thing but introduces an additional flag. It could
> even still be applicable.

I see. I think the issue is with how the set-status path works: it only
stores the new status after the callback has run.

So here's the plan:

- make set status store the new status first, and pass the old status
  to the callback as a parameter
- fix all the callbacks
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 38674b0..4d95a18 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -177,8 +177,8 @@ static void virtio_net_vhost_status(VirtIONet *n, uint8_t status)
             n->vhost_started = 0;
         }
     } else {
-        vhost_net_stop(vdev, n->nic->ncs, queues);
         n->vhost_started = 0;
+        vhost_net_stop(vdev, n->nic->ncs, queues);
     }
 }
In case a virtio error occurred after vhost_dev_close(), qemu will crash
in nested cleanup while checking the IOMMU flag, because dev->vdev is
already set to zero and resources are already freed.

Example:

Program received signal SIGSEGV, Segmentation fault.
vhost_virtqueue_stop at hw/virtio/vhost.c:1155

#0  vhost_virtqueue_stop at hw/virtio/vhost.c:1155
#1  vhost_dev_stop at hw/virtio/vhost.c:1594
#2  vhost_net_stop_one at hw/net/vhost_net.c:289
#3  vhost_net_stop at hw/net/vhost_net.c:368

Nested call to vhost_net_stop(). First time was at #14.

#4  virtio_net_vhost_status at hw/net/virtio-net.c:180
#5  virtio_net_set_status (status=79) at hw/net/virtio-net.c:254
#6  virtio_set_status at hw/virtio/virtio.c:1146
#7  virtio_error at hw/virtio/virtio.c:2455

virtqueue_get_head() failed here.

#8  virtqueue_get_head at hw/virtio/virtio.c:543
#9  virtqueue_drop_all at hw/virtio/virtio.c:984
#10 virtio_net_drop_tx_queue_data at hw/net/virtio-net.c:240
#11 virtio_bus_set_host_notifier at hw/virtio/virtio-bus.c:297
#12 vhost_dev_disable_notifiers at hw/virtio/vhost.c:1431

vhost_dev_stop() was executed here. dev->vdev == NULL now.

#13 vhost_net_stop_one at hw/net/vhost_net.c:290
#14 vhost_net_stop at hw/net/vhost_net.c:368
#15 virtio_net_vhost_status at hw/net/virtio-net.c:180
#16 virtio_net_set_status (status=15) at hw/net/virtio-net.c:254
#17 qmp_set_link ("hostnet0", up=false) at net/net.c:1430
#18 chr_closed_bh at net/vhost-user.c:214
#19 aio_bh_call at util/async.c:90
#20 aio_bh_poll at util/async.c:118
#21 aio_dispatch at util/aio-posix.c:429
#22 aio_ctx_dispatch at util/async.c:261
#23 g_main_context_dispatch
#24 glib_pollfds_poll at util/main-loop.c:213
#25 os_host_main_loop_wait at util/main-loop.c:261
#26 main_loop_wait at util/main-loop.c:515
#27 main_loop () at vl.c:1917
#28 main at vl.c:4795

The above backtrace was captured from a qemu crash on vhost disconnect
while the virtio driver in the guest was in a failed state.

We could just add a check for 'vdev' in 'vhost_dev_has_iommu()', but
then it would assert further on, trying to free already freed
ioeventfds. The real problem is that we're allowing nested calls to
'vhost_net_stop'.

This patch forbids nested calls to 'vhost_net_stop', to avoid any
possible double frees and segmentation faults due to use of already
freed resources, by setting the 'vhost_started' flag to zero prior to
the 'vhost_net_stop' call.

Signed-off-by: Ilya Maximets <i.maximets@samsung.com>
---

This issue was already addressed more than a year ago by the following
patch: https://lists.gnu.org/archive/html/qemu-devel/2016-03/msg06732.html
but it was dropped without review because re-connection of vhost-user
was not yet implemented. The re-connection implementation later fixed
most of the nested calls, but a few of them still exist. For example,
the above backtrace was captured after a 'virtqueue_get_head()' failure
on vhost-user disconnection.

 hw/net/virtio-net.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)