Message ID: 1459509388-6185-12-git-send-email-marcandre.lureau@redhat.com (mailing list archive)
State: New, archived
Hi Marc,

First of all, sorry again for the late response!

Last time I tried with your first version, I found a few issues related
to reconnect, mainly the acked_features being lost. While checking your
new code, I found that you've already solved that, which is great.

So, I tried harder this time; your patches work great, except that I
found a few nits.

On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
...
> +Slave message types
> +-------------------
> +
> + * VHOST_USER_SLAVE_SHUTDOWN:
> +      Id: 1
> +      Master payload: N/A
> +      Slave payload: u64
> +
> +      Request the master to shutdown the slave. A 0 reply is for
> +      success, in which case the slave may close all connections
> +      immediately and quit.

Assume we are using ovs + dpdk here, so that we could have two vhost-user
connections. While ovs tries to initiate a restart, it might unregister the
two connections one by one. In such a case, two VHOST_USER_SLAVE_SHUTDOWN
requests will be sent, and two replies will be received. Therefore, I don't
think it's proper to ask the backend implementation to quit here.

>
>      switch (msg.request) {
> +    case VHOST_USER_SLAVE_SHUTDOWN: {
> +        uint64_t success = 1; /* 0 is for success */
> +        if (dev->stop) {
> +            dev->stop(dev);
> +            success = 0;
> +        }
> +        msg.payload.u64 = success;
> +        msg.size = sizeof(msg.payload.u64);
> +        size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0);
> +        if (size != VHOST_USER_HDR_SIZE + msg.size) {
> +            error_report("Failed to write reply.");
> +        }
> +        break;

You might want to remove the slave_fd from the watch list? We might also
need to close slave_fd here, assuming that we will no longer use it once
VHOST_USER_SLAVE_SHUTDOWN is received?

I'm asking because I found a seg fault issue sometimes, due to opaque
being NULL.

	--yliu
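For illustration, the multi-connection concern in rough code form. Everything below is hypothetical (it is neither the posted patch nor DPDK/OVS code): the point is that a SHUTDOWN request would only tear down the connection it arrived on, and exiting the process would be a separate policy decision.

    #include <stdint.h>

    /* Hypothetical backend types and helpers, for illustration only. */
    struct backend;
    struct connection;

    void flush_pending_requests(struct connection *conn);   /* finish in-flight ring work */
    void reply_u64(struct connection *conn, uint64_t val);  /* send the u64 reply */
    void backend_remove_connection(struct backend *be, struct connection *conn);
    unsigned backend_num_connections(struct backend *be);
    void backend_exit(struct backend *be);

    static void handle_slave_shutdown(struct backend *be, struct connection *conn)
    {
        flush_pending_requests(conn);        /* this connection only */
        reply_u64(conn, 0);                  /* 0 means success */
        backend_remove_connection(be, conn);

        /* With several ports (e.g. ovs + dpdk) the other connections keep
         * running; quitting is up to the backend, not implied by the message. */
        if (backend_num_connections(be) == 0) {
            backend_exit(be);
        }
    }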
Hi ----- Original Message ----- > Hi Marc, > > First of all, sorry again for late response! > > Last time I tried with your first version, I found few issues related > with reconnect, mainly on the acked_feautres lost. While checking your > new code, I found that you've already solved that, which is great. > > So, I tried harder this time, your patches work great, except that I > found few nits. > > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote: > > From: Marc-André Lureau <marcandre.lureau@redhat.com> > ... > > +Slave message types > > +------------------- > > + > > + * VHOST_USER_SLAVE_SHUTDOWN: > > + Id: 1 > > + Master payload: N/A > > + Slave payload: u64 > > + > > + Request the master to shutdown the slave. A 0 reply is for > > + success, in which case the slave may close all connections > > + immediately and quit. > > Assume we are using ovs + dpdk here, that we could have two > vhost-user connections. While ovs tries to initiate a restart, > it might unregister the two connections one by one. In such > case, two VHOST_USER_SLAVE_SHUTDOWN request will be sent, > and two replies will get. Therefore, I don't think it's a > proper ask here to let the backend implementation to do quit > here. > On success reply, the master sent all the commands to finish the connection. So the slave must flush/finish all pending requests first. I think this should be enough, otherwise we may need a new explicit message? > > > > > switch (msg.request) { > > + case VHOST_USER_SLAVE_SHUTDOWN: { > > + uint64_t success = 1; /* 0 is for success */ > > + if (dev->stop) { > > + dev->stop(dev); > > + success = 0; > > + } > > + msg.payload.u64 = success; > > + msg.size = sizeof(msg.payload.u64); > > + size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0); > > + if (size != VHOST_USER_HDR_SIZE + msg.size) { > > + error_report("Failed to write reply."); > > + } > > + break; > > You might want to remove the slave_fd from watch list? We > might also need to close slave_fd here, assuming that we > will no longer use it when VHOST_USER_SLAVE_SHUTDOWN is > received? Makes sense, I will change that in next update. > I'm asking because I found a seg fault issue sometimes, > due to opaque is NULL. > I would be interested to see the backtrace or have a reproducer. thanks
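A minimal sketch of the change agreed above, against the VHOST_USER_SLAVE_SHUTDOWN case quoted earlier (a guess at what the next revision could do, not the actual follow-up patch): after sending the reply, unregister the fd handler and close the slave channel so slave_read() can no longer fire on it. qemu_set_fd_handler() is the existing QEMU call used to register slave_read in the first place.

        case VHOST_USER_SLAVE_SHUTDOWN: {
            uint64_t success = 1; /* 0 is for success */
            if (dev->stop) {
                dev->stop(dev);
                success = 0;
            }
            msg.payload.u64 = success;
            msg.size = sizeof(msg.payload.u64);
            size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0);
            if (size != VHOST_USER_HDR_SIZE + msg.size) {
                error_report("Failed to write reply.");
            }

            /* The slave channel is finished: stop watching the fd and
             * release it so no further events are dispatched on it. */
            qemu_set_fd_handler(u->slave_fd, NULL, NULL, NULL);
            close(u->slave_fd);
            u->slave_fd = -1;
            break;
        }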
On Wed, Apr 13, 2016 at 05:51:15AM -0400, Marc-André Lureau wrote: > Hi > > ----- Original Message ----- > > Hi Marc, > > > > First of all, sorry again for late response! > > > > Last time I tried with your first version, I found few issues related > > with reconnect, mainly on the acked_feautres lost. While checking your > > new code, I found that you've already solved that, which is great. > > > > So, I tried harder this time, your patches work great, except that I > > found few nits. > > > > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote: > > > From: Marc-André Lureau <marcandre.lureau@redhat.com> > > ... > > > +Slave message types > > > +------------------- > > > + > > > + * VHOST_USER_SLAVE_SHUTDOWN: > > > + Id: 1 > > > + Master payload: N/A > > > + Slave payload: u64 > > > + > > > + Request the master to shutdown the slave. A 0 reply is for > > > + success, in which case the slave may close all connections > > > + immediately and quit. > > > > Assume we are using ovs + dpdk here, that we could have two > > vhost-user connections. While ovs tries to initiate a restart, > > it might unregister the two connections one by one. In such > > case, two VHOST_USER_SLAVE_SHUTDOWN request will be sent, > > and two replies will get. Therefore, I don't think it's a > > proper ask here to let the backend implementation to do quit > > here. > > > > On success reply, the master sent all the commands to finish the connection. So the slave must flush/finish all pending requests first. Yes, that's okay. I here just mean the "close __all__ connections" and "quit" part. Firstly, we should do cleanup/flush/finish to it's own connection. But not all, right? Second, as stated, doing quit might not make sense, as we may have more connections. > I think this should be enough, otherwise we may need a new explicit message? > > > > > > > > > switch (msg.request) { > > > + case VHOST_USER_SLAVE_SHUTDOWN: { > > > + uint64_t success = 1; /* 0 is for success */ > > > + if (dev->stop) { > > > + dev->stop(dev); > > > + success = 0; > > > + } > > > + msg.payload.u64 = success; > > > + msg.size = sizeof(msg.payload.u64); > > > + size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0); > > > + if (size != VHOST_USER_HDR_SIZE + msg.size) { > > > + error_report("Failed to write reply."); > > > + } > > > + break; > > > > You might want to remove the slave_fd from watch list? We > > might also need to close slave_fd here, assuming that we > > will no longer use it when VHOST_USER_SLAVE_SHUTDOWN is > > received? > > Makes sense, I will change that in next update. > > > I'm asking because I found a seg fault issue sometimes, > > due to opaque is NULL. Oh, I was wrong, it's u being NULL, but not opaque. > > > > I would be interested to see the backtrace or have a reproducer. 
It's a normal test sequence: start a vhost-user switch (I'm using the DPDK
vhost-switch example), kill it, and wait for a while (something like more
than 10s or even longer), then I saw a seg fault:

    (gdb) p dev
    $4 = (struct vhost_dev *) 0x555556571bf0
    (gdb) p u
    $5 = (struct vhost_user *) 0x0
    (gdb) where
    #0  0x0000555555798612 in slave_read (opaque=0x555556571bf0)
        at /home/yliu/qemu/hw/virtio/vhost-user.c:539
    #1  0x0000555555a343a4 in aio_dispatch (ctx=0x55555655f560) at /home/yliu/qemu/aio-posix.c:327
    #2  0x0000555555a2738b in aio_ctx_dispatch (source=0x55555655f560, callback=0x0, user_data=0x0)
        at /home/yliu/qemu/async.c:233
    #3  0x00007ffff51032a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
    #4  0x0000555555a3239e in glib_pollfds_poll () at /home/yliu/qemu/main-loop.c:213
    #5  0x0000555555a3247b in os_host_main_loop_wait (timeout=29875848) at /home/yliu/qemu/main-loop.c:258
    #6  0x0000555555a3252b in main_loop_wait (nonblocking=0) at /home/yliu/qemu/main-loop.c:506
    #7  0x0000555555846e35 in main_loop () at /home/yliu/qemu/vl.c:1934
    #8  0x000055555584e6bf in main (argc=31, argv=0x7fffffffe078, envp=0x7fffffffe178)
        at /home/yliu/qemu/vl.c:4658

	--yliu
On Wed, Apr 13, 2016 at 7:32 PM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: >> >> > I'm asking because I found a seg fault issue sometimes, >> > due to opaque is NULL. > > Oh, I was wrong, it's u being NULL, but not opaque. >> > >> >> I would be interested to see the backtrace or have a reproducer. > > It's a normal test steps: start a vhost-user switch (I'm using DPDK > vhost-switch example), kill it, and wait for a while (something like > more than 10s or even longer), then I saw a seg fault: > > (gdb) p dev > $4 = (struct vhost_dev *) 0x555556571bf0 > (gdb) p u > $5 = (struct vhost_user *) 0x0 > (gdb) where > #0 0x0000555555798612 in slave_read (opaque=0x555556571bf0) > at /home/yliu/qemu/hw/virtio/vhost-user.c:539 > #1 0x0000555555a343a4 in aio_dispatch (ctx=0x55555655f560) at /home/yliu/qemu/aio-posix.c:327 > #2 0x0000555555a2738b in aio_ctx_dispatch (source=0x55555655f560, callback=0x0, user_data=0x0) > at /home/yliu/qemu/async.c:233 > #3 0x00007ffff51032a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 > #4 0x0000555555a3239e in glib_pollfds_poll () at /home/yliu/qemu/main-loop.c:213 > #5 0x0000555555a3247b in os_host_main_loop_wait (timeout=29875848) at /home/yliu/qemu/main-loop.c:258 > #6 0x0000555555a3252b in main_loop_wait (nonblocking=0) at /home/yliu/qemu/main-loop.c:506 > #7 0x0000555555846e35 in main_loop () at /home/yliu/qemu/vl.c:1934 > #8 0x000055555584e6bf in main (argc=31, argv=0x7fffffffe078, envp=0x7fffffffe178) > at /home/yliu/qemu/vl.c:4658 > This patch set doesn't try to handle crashes from backend. This would require a much more detailed study of the existing code path. A lot of places assume the backend is fully working as expected. I think handling backend crashes should be a different, later, patch set.
On Wed, Apr 13, 2016 at 11:43:56PM +0200, Marc-André Lureau wrote: > On Wed, Apr 13, 2016 at 7:32 PM, Yuanhan Liu > <yuanhan.liu@linux.intel.com> wrote: > >> > >> > I'm asking because I found a seg fault issue sometimes, > >> > due to opaque is NULL. > > > > Oh, I was wrong, it's u being NULL, but not opaque. > >> > > >> > >> I would be interested to see the backtrace or have a reproducer. > > > > It's a normal test steps: start a vhost-user switch (I'm using DPDK > > vhost-switch example), kill it, and wait for a while (something like > > more than 10s or even longer), then I saw a seg fault: > > > > (gdb) p dev > > $4 = (struct vhost_dev *) 0x555556571bf0 > > (gdb) p u > > $5 = (struct vhost_user *) 0x0 > > (gdb) where > > #0 0x0000555555798612 in slave_read (opaque=0x555556571bf0) > > at /home/yliu/qemu/hw/virtio/vhost-user.c:539 > > #1 0x0000555555a343a4 in aio_dispatch (ctx=0x55555655f560) at /home/yliu/qemu/aio-posix.c:327 > > #2 0x0000555555a2738b in aio_ctx_dispatch (source=0x55555655f560, callback=0x0, user_data=0x0) > > at /home/yliu/qemu/async.c:233 > > #3 0x00007ffff51032a6 in g_main_context_dispatch () from /lib64/libglib-2.0.so.0 > > #4 0x0000555555a3239e in glib_pollfds_poll () at /home/yliu/qemu/main-loop.c:213 > > #5 0x0000555555a3247b in os_host_main_loop_wait (timeout=29875848) at /home/yliu/qemu/main-loop.c:258 > > #6 0x0000555555a3252b in main_loop_wait (nonblocking=0) at /home/yliu/qemu/main-loop.c:506 > > #7 0x0000555555846e35 in main_loop () at /home/yliu/qemu/vl.c:1934 > > #8 0x000055555584e6bf in main (argc=31, argv=0x7fffffffe078, envp=0x7fffffffe178) > > at /home/yliu/qemu/vl.c:4658 > > > > This patch set doesn't try to handle crashes from backend. This would > require a much more detailed study of the existing code path. A lot of > places assume the backend is fully working as expected. I think > handling backend crashes should be a different, later, patch set. Oh, sorry for not making it clear. I actually did the kill by "ctrl-c". It then is captured to send a SLAVE_SHUTDOWN request. So, I would say it's a normal quit. --yliu
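Independently of where slave_fd gets unregistered, a defensive guard of roughly this shape would avoid the crash in the backtrace above. This is only a sketch, assuming slave_read() can still be dispatched after dev->opaque has been cleared during cleanup:

    static void slave_read(void *opaque)
    {
        struct vhost_dev *dev = opaque;
        struct vhost_user *u = dev->opaque;

        /* The device may already have been cleaned up while the fd handler
         * was still registered; bail out instead of dereferencing NULL. */
        if (!u || u->slave_fd < 0) {
            return;
        }

        /* ... existing message handling ... */
    }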
On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote:
> From: Marc-André Lureau <marcandre.lureau@redhat.com>
>
> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
> ---
>  docs/specs/vhost-user.txt | 15 +++++++++++++++
>  hw/virtio/vhost-user.c    | 16 ++++++++++++++++
>  2 files changed, 31 insertions(+)
>
> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> index 8a635fa..60d6d13 100644
> --- a/docs/specs/vhost-user.txt
> +++ b/docs/specs/vhost-user.txt
> @@ -487,3 +487,18 @@ Message types
>        request to the master. It is passed in the ancillary data.
>        This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
>        feature is available.
> +
> +Slave message types
> +-------------------
> +
> + * VHOST_USER_SLAVE_SHUTDOWN:
> +      Id: 1
> +      Master payload: N/A
> +      Slave payload: u64
> +
> +      Request the master to shutdown the slave. A 0 reply is for
> +      success, in which case the slave may close all connections
> +      immediately and quit. A non-zero reply cancels the request.
> +
> +      Before a reply comes, the master may make other requests in
> +      order to flush or sync state.

Hi all,

I threw this proposal as well as DPDK's implementation to our customers
(OVS, Openstack and some other teams) who made such requests before. I'm
sorry to say that none of them really liked that we can't handle a crash.
Making reconnect work after a vhost-user backend crash is exactly what
they are after.

And to handle the crash, I was thinking of the proposal from Michael,
that is, to do a reset from the guest OS. This would fix the issue
ultimately. However, old kernels will not benefit from it, nor will
guests other than Linux, making it not that useful for current usage.

Given that VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance to get the
vring base (last used idx) from the backend, Huawei suggests that we
could still bring it to a consistent state after the crash, if we get the
vring base from vring->used->idx. That worked as expected in my test.

The only tricky thing might be how to detect a crash, and we could do a
simple comparison of the vring base from QEMU with the vring->used->idx
at the initiation stage. If a mismatch is found, get it from
vring->used->idx instead.

Comments/thoughts? Is it a solid enough solution to you? If so, we could
make things much simpler, and, what's most important, we would be able to
handle crashes.

	--yliu
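A rough backend-side sketch of the recovery idea (struct and field names are made up for illustration; this is not the DPDK implementation):

    #include <stdint.h>

    /* Minimal view of the used ring header as mapped from guest memory. */
    struct vring_used_hdr { uint16_t flags; uint16_t idx; };

    /* Hypothetical backend virtqueue state. */
    struct backend_vq {
        struct vring_used_hdr *used;   /* mapped at SET_VRING_ADDR time */
        uint16_t last_used_idx;        /* restored via VHOST_USER_SET_VRING_BASE */
        uint16_t last_avail_idx;
    };

    /* The base QEMU replays after a backend crash predates the crash, while
     * used->idx in guest memory is the last value the guest actually saw, so
     * prefer the latter when they disagree. Assumes in-order processing. */
    static void vq_repair_after_crash(struct backend_vq *vq)
    {
        uint16_t restored = vq->last_used_idx;
        uint16_t in_ring  = vq->used->idx;

        if (restored != in_ring) {
            vq->last_used_idx  = in_ring;
            vq->last_avail_idx = in_ring;
        }
    }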
Hi On Thu, Apr 28, 2016 at 7:23 AM, Yuanhan Liu <yuanhan.liu@linux.intel.com> wrote: > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote: >> From: Marc-André Lureau <marcandre.lureau@redhat.com> >> >> Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> >> --- >> docs/specs/vhost-user.txt | 15 +++++++++++++++ >> hw/virtio/vhost-user.c | 16 ++++++++++++++++ >> 2 files changed, 31 insertions(+) >> >> diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt >> index 8a635fa..60d6d13 100644 >> --- a/docs/specs/vhost-user.txt >> +++ b/docs/specs/vhost-user.txt >> @@ -487,3 +487,18 @@ Message types >> request to the master. It is passed in the ancillary data. >> This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL >> feature is available. >> + >> +Slave message types >> +------------------- >> + >> + * VHOST_USER_SLAVE_SHUTDOWN: >> + Id: 1 >> + Master payload: N/A >> + Slave payload: u64 >> + >> + Request the master to shutdown the slave. A 0 reply is for >> + success, in which case the slave may close all connections >> + immediately and quit. A non-zero reply cancels the request. >> + >> + Before a reply comes, the master may make other requests in >> + order to flush or sync state. > > Hi all, > > I threw this proposal as well as DPDK's implementation to our customer > (OVS, Openstack and some other teams) who made such request before. I'm > sorry to say that none of them really liked that we can't handle crash. > Making reconnect work from a vhost-user backend crash is exactly something > they are after. Handling crashes is not quite the same as what I propose here. I see it as a different goal. But I doubt about usefulness and reliability of a backend that crashes. In many case, vhost-user was designed after kernel vhost, and qemu code has the same expectation about the kernel or the vhost-user backend: many calls are sync and will simply assert() on unexpected results. > And to handle the crash, I was thinking of the proposal from Michael. > That is to do reset from the guest OS. This would fix this issue > ultimately. However, old kernel will not benefit from this, as well > as other guest other than Linux, making it not that useful for current > usage. Yes, I hope Michael can help with that, I am not very familiar with the kernel code. > Thinking of that the VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance > to get the vring base (last used idx) from the backend, Huawei suggests Right, but after this message, the backend should have flushed all pending ring packets and stop processing them. So it's also a clean sync point. > that we could still make it in a consistent state after the crash, if > we get the vring base from vring->used->idx. That worked as expected You can have a backend that would have already processed packets and not updated used idx. You could also have out-of-order packets in flights (ex: desc 1-2-3 avail, 1-3 used, 2 pending..). I can't see a clean way to restore this, but to reset the queues and start over, with either packet loss or packet duplication. If the backend guarantees to process packets in order, it may simplify things, but it would be a special case. > from my test. The only tricky thing might be how to detect a crash, > and we could do a simple compare of the vring base from QEMU with > the vring->used->idx at the initiation stage. If mismatch found, get > it from vring->used->idx instead. I don't follow, would the backend restore its last vring->used->idx after a crash? > Comments/thoughts? 
Is it a solid enough solution to you? If so, we > could make things much simpler, and what's most important, we could > be able to handle crash. That's not so easy, many of the vhost_ops->vhost*() are followed by assert(r>0), which will be hard to change to handle failure. But, I would worry first about a backend that crashes that it may corrupt the VM memory too...
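The call sites being referred to have roughly the following shape in hw/virtio/vhost.c (paraphrased to illustrate the point, not an exact quote of the file):

    /* Many vhost_ops calls in QEMU's generic vhost code assume the backend
     * answers, and answers successfully; a vanished vhost-user backend then
     * becomes an assertion failure instead of a recoverable error. */
    static void example_mask_vq(struct vhost_dev *hdev, int n, int fd)
    {
        struct vhost_vring_file file = { .index = n, .fd = fd };
        int r;

        r = hdev->vhost_ops->vhost_set_vring_call(hdev, &file);
        assert(r >= 0);   /* no path to report "backend gone" to the caller */
    }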
On Fri, Apr 29, 2016 at 12:40:09PM +0200, Marc-André Lureau wrote: > Hi > > On Thu, Apr 28, 2016 at 7:23 AM, Yuanhan Liu > <yuanhan.liu@linux.intel.com> wrote: > > On Fri, Apr 01, 2016 at 01:16:21PM +0200, marcandre.lureau@redhat.com wrote: > >> From: Marc-André Lureau <marcandre.lureau@redhat.com> > >> +Slave message types > >> +------------------- > >> + > >> + * VHOST_USER_SLAVE_SHUTDOWN: > >> + Id: 1 > >> + Master payload: N/A > >> + Slave payload: u64 > >> + > >> + Request the master to shutdown the slave. A 0 reply is for > >> + success, in which case the slave may close all connections > >> + immediately and quit. A non-zero reply cancels the request. > >> + > >> + Before a reply comes, the master may make other requests in > >> + order to flush or sync state. > > > > Hi all, > > > > I threw this proposal as well as DPDK's implementation to our customer > > (OVS, Openstack and some other teams) who made such request before. I'm > > sorry to say that none of them really liked that we can't handle crash. > > Making reconnect work from a vhost-user backend crash is exactly something > > they are after. > > Handling crashes is not quite the same as what I propose here. Yes, I know. However, handling crashes is exactly what our customers want. And I just want to let you know that, say, I don't ask you to do that :) > I see > it as a different goal. But I doubt about usefulness and reliability > of a backend that crashes. Agreed with you on that. However, I guess you have to admit that crashes just happen. Kernel sometimes crashes, too. So, it would be great if we could make whole stuff work again after an unexpected crash (say, from OVS), without restarting all guests. > In many case, vhost-user was designed after > kernel vhost, and qemu code has the same expectation about the kernel > or the vhost-user backend: many calls are sync and will simply > assert() on unexpected results. I guess we could at aleast try to dimish it, if we can't avoid it completely. > > And to handle the crash, I was thinking of the proposal from Michael. > > That is to do reset from the guest OS. This would fix this issue > > ultimately. However, old kernel will not benefit from this, as well > > as other guest other than Linux, making it not that useful for current > > usage. > > Yes, I hope Michael can help with that, I am not very familiar with > the kernel code. > > > Thinking of that the VHOST_USER_SLAVE_SHUTDOWN just gives QEMU a chance > > to get the vring base (last used idx) from the backend, Huawei suggests > > Right, but after this message, the backend should have flushed all > pending ring packets and stop processing them. So it's also a clean > sync point. > > > that we could still make it in a consistent state after the crash, if > > we get the vring base from vring->used->idx. That worked as expected > > You can have a backend that would have already processed packets and > not updated used idx. You could also have out-of-order packets in > flights (ex: desc 1-2-3 avail, 1-3 used, 2 pending..). I can't see a > clean way to restore this, but to reset the queues and start over, > with either packet loss or packet duplication. Judging that it (crash or restart) happens so rare, I don't think it matters. Moreoever, doesn't that happen in real world :) > If the backend > guarantees to process packets in order, it may simplify things, but it > would be a special case. Well, it's more like a backend thing: it's the backend to try to set a saner vring base as stated in above proposal. 
Therefore, I will not say it's a special case. > > > from my test. The only tricky thing might be how to detect a crash, > > and we could do a simple compare of the vring base from QEMU with > > the vring->used->idx at the initiation stage. If mismatch found, get > > it from vring->used->idx instead. > > I don't follow, would the backend restore its last vring->used->idx > after a crash? Yes, restore from the SET_VRING_BASE from QEMU. But it's a stale value, normally 0 if no start/stop happens before. Therefore, we can't use it after the crash, instead, we could try to detect the mismatch and try to fix it at SET_VRING_ADDR request. > > > Comments/thoughts? Is it a solid enough solution to you? If so, we > > could make things much simpler, and what's most important, we could > > be able to handle crash. > > That's not so easy, many of the vhost_ops->vhost*() are followed by > assert(r>0), which will be hard to change to handle failure. But, I > would worry first about a backend that crashes that it may corrupt the > VM memory too... Not quite sure I follow this. But, backend just touches the virtio related memory, so it will do no harm to the VM? --yliu
On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote:
> > But, I
> > would worry first about a backend that crashes that it may corrupt the
> > VM memory too...
>
> Not quite sure I follow this. But, backend just touches the virtio
> related memory, so it will do no harm to the VM?

It crashed so apparently it's not behaving as designed -
how can we be sure it does not harm the VM?
On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote:
> On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote:
> > > But, I
> > > would worry first about a backend that crashes that it may corrupt the
> > > VM memory too...
> >
> > Not quite sure I follow this. But, backend just touches the virtio
> > related memory, so it will do no harm to the VM?
>
> It crashed so apparently it's not behaving as designed -
> how can we be sure it does not harm the VM?

Hi Michael,

What I know so far, a crash could corrupt the virtio memory in two ways:

- vring_used_elem might be in a stale state when we are in the middle of
  updating the used vring. Say, we might have updated the "id" field, but
  left "len" untouched.

- vring desc buffers might also be in a stale state, when we are copying
  data into them.

However, the two issues will not be real issues, as used->idx is not
updated in either case. Therefore, that corrupted memory will not be
processed by the guest. So, no harm I'd say. Or, am I missing anything?

BTW, Michael, what are your thoughts on the whole reconnect stuff?

	--yliu
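The ordering this argument relies on, as a generic backend-side sketch (hypothetical names; the barrier macro stands in for whatever the backend really uses, e.g. rte_smp_wmb() in DPDK):

    #include <stdint.h>

    /* Stand-in for the backend's store barrier. */
    #define publish_barrier()  __atomic_thread_fence(__ATOMIC_RELEASE)

    struct vring_used_elem { uint32_t id; uint32_t len; };
    struct vring_used     { uint16_t flags; uint16_t idx; struct vring_used_elem ring[]; };

    /* Hypothetical backend virtqueue state. */
    struct backend_vq {
        struct vring_used *used;
        uint16_t last_used_idx;
        uint16_t size;             /* ring size, a power of two */
    };

    /* The entry is written first and the index published last; a crash in
     * between leaves at worst an entry the guest never reads, because the
     * guest only consumes entries below used->idx. */
    static void vq_push_used(struct backend_vq *vq, uint32_t desc_id, uint32_t len)
    {
        uint16_t slot = vq->last_used_idx & (vq->size - 1);

        vq->used->ring[slot].id  = desc_id;
        vq->used->ring[slot].len = len;

        publish_barrier();

        vq->used->idx = ++vq->last_used_idx;
    }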
On Sun, May 01, 2016 at 02:04:42PM -0700, Yuanhan Liu wrote:
> On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote:
> > On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote:
> > > > But, I
> > > > would worry first about a backend that crashes that it may corrupt the
> > > > VM memory too...
> > >
> > > Not quite sure I follow this. But, backend just touches the virtio
> > > related memory, so it will do no harm to the VM?
> >
> > It crashed so apparently it's not behaving as designed -
> > how can we be sure it does not harm the VM?
>
> Hi Michael,
>
> What I know so far, a crash could corrupt the virtio memory in two ways:
>
> - vring_used_elem might be in a stale state when we are in the middle of
>   updating the used vring. Say, we might have updated the "id" field, but
>   left "len" untouched.
>
> - vring desc buffers might also be in a stale state, when we are copying
>   data into them.

- a random write into VM memory due to a backend bug corrupts it.

> However, the two issues will not be real issues, as used->idx is not
> updated in either case. Therefore, that corrupted memory will not be
> processed by the guest. So, no harm I'd say. Or, am I missing anything?

Why is the backend crashing? It shouldn't, so maybe this means its
memory is corrupt. That is the claim.

> BTW, Michael, what are your thoughts on the whole reconnect stuff?
>
> 	--yliu

My thoughts are that we need to split these patchsets up.

There are several issues here:

1. Graceful disconnect
   - One should be able to do vmstop, disconnect, connect then vm start.
     This looks like a nice intermediate step.
   - Why would we always assume it's always the remote initiating the
     disconnect? Monitor commands for disconnecting seem called for.

2. Unexpected disconnect
   - If buffers are processed in order or flushed before socket close,
     then on disconnect the status can be recovered from the ring alone.
     If that is of interest, we might want to add a protocol flag for
     that. F_DISCONNECT_FLUSH? Without this, only graceful disconnect or
     reset with the guest's help can be supported.
   - As Marc-André said, without graceful shutdown it is not enough to
     handle ring state though. We must handle the socket getting
     disconnected in the middle of send/receive. While it's more work,
     it does seem like a nice interface to support.
   - I understand the concern that existing guests do not handle the
     NEED_RESET status. One way to fix this would be a new feature bit.
     If NEED_RESET is not handled, request hot-unplug instead.

3. Running while disconnected
   - At the moment, we wait on vm start for the remote to connect; if we
     support disconnecting the backend without stopping, we probably
     should also support starting without waiting for a connection.
   - Guests expect tx buffers to be used in a timely manner, thus:
     - If the vm is running we must use them in qemu, maybe discarding
       packets in the process. There already are commands for link down;
       I'm not sure there's value in doing this automatically in qemu.
     - Alternatively, we should also have an option to stop the vm
       automatically (like we do on disk errors) to reduce the number of
       dropped packets.

4. Reconnecting
   - When acting as a server, we might want an option to go back to the
     listening state, but it should not be the only option; there should
     be a monitor command for moving it back to the listening state.
   - When acting as a client, auto-reconnect might be handy sometimes,
     but should not be the only option; there should be a way to
     manually request a connect, possibly to a different target, so a
     monitor command for re-connecting seems called for.
   - We'll also need monitor events for disconnects so management knows
     it must re-connect/restart listening.
   - If we stopped the VM, there could be an option to restart on
     reconnect.
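F_DISCONNECT_FLUSH is only being floated here; if such a flag were adopted, it would presumably become one more protocol feature bit next to the existing ones, roughly like this (the bit number is purely illustrative and the flag does not exist in the spec):

    enum VhostUserProtocolFeature {
        VHOST_USER_PROTOCOL_F_MQ        = 0,
        VHOST_USER_PROTOCOL_F_LOG_SHMFD = 1,
        VHOST_USER_PROTOCOL_F_RARP      = 2,
        /* ... bits added by this series ... */
        VHOST_USER_PROTOCOL_F_DISCONNECT_FLUSH = 4,  /* proposed only */

        VHOST_USER_PROTOCOL_F_MAX
    };

    #define VHOST_USER_PROTOCOL_FEATURE_MASK ((1 << VHOST_USER_PROTOCOL_F_MAX) - 1)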
Hi On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > 1. Graceful disconnect > - One should be able to do vmstop, disconnect, connect then vm start > This looks like a nice intermediate step. > - Why would we always assume it's always remote initiating the disconnect? > Monitor commands for disconnecting seem called for. Those two solutions involve VM management. This looks more complex to communicate/synchronize vhost-user backend & vm management & qemu. The use case I cover is request from the backend to shutdown, because that's what the users wanted (and it is already possible to stop the backend and disconnect it from qemu, we would only need to know when, with a new command..) > > 2. Unexpected disconnect > - If buffers are processed in order or flushed before socket close, > then on disconnect status can be recovered > from ring alone. If that is of interest, we might want to add > a protocol flag for that. F_DISCONNECT_FLUSH ? Without this, > only graceful disconnect or reset with guest's help can be supported. (doing all this, at this point, I don't see much difference with having an explicit shutdown) > - As Marc-André said, without graceful shutdown it is not enough to > handle ring state though. We must handle socket getting disconnected > in the middle of send/receive. While it's more work, > it does seem like a nice interface to support. > - I understand the concern that existing guests do not handle NEED_RESET > status. One way to fix this would be a new feature bit. If NEED_RESET not > handled, request hot-unplug instead. > > 3. Running while disconnected > - At the moment, we wait on vm start for remote to connect, > if we support disconnecting backend without stopping > we probably should also support starting it without waiting > for connection That's what Tetsuya proposed in its initial series, but handling the flags was quite tedious. I think this can be considered easily a seperate enhancement. What I proposed is to keep waiting for the initial connect, and check the flags remains compatible on reconnect. > - Guests expect tx buffers to be used in a timely manner, thus: > - If vm is running we must use them in qemu, maybe discarding packets > in the process. > There already are commands for link down, I'm not sure there's value > in doing this automatically in qemu. Testing doesn't show such buffer issues when the link is down (this can be tested with vubr example above) > - Alternatively, we should also have an option to stop vm automatically (like we do on > disk errors) to reduce number of dropped packets. Why not, we would need to know if this is actually useful. > > 4. Reconnecting > - When acting as a server, we might want an option to go back to > listening state, but it should not be the only option, > there should be a monitor command for moving it back to > listening state. > - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only > option, there should be a way to manually request connect, possibly to > a different target, so a monitor command for re-connecting seems called for. > - We'll also need monitor events for disconnects so management knows it > must re-connect/restart listening. > - If we stopped VM, there could be an option to restart on reconnect. That's all involving a third party, adding complexity but the benefit isn't so clear.
On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote: > Hi > > On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > 1. Graceful disconnect > > - One should be able to do vmstop, disconnect, connect then vm start > > This looks like a nice intermediate step. > > - Why would we always assume it's always remote initiating the disconnect? > > Monitor commands for disconnecting seem called for. > > Those two solutions involve VM management. This looks more complex to > communicate/synchronize vhost-user backend & vm management & qemu. The > use case I cover is request from the backend to shutdown, Right, but flushing buffers + closing the socket looks like a cleaner interface than a ton of messages going back and forth. > because > that's what the users wanted (and it is already possible to stop the > backend and disconnect it from qemu, we would only need to know when, > with a new command..) You assume the backend has a monitor interface to disconnect though. That's not a given. > > > > 2. Unexpected disconnect > > - If buffers are processed in order or flushed before socket close, > > then on disconnect status can be recovered > > from ring alone. If that is of interest, we might want to add > > a protocol flag for that. F_DISCONNECT_FLUSH ? Without this, > > only graceful disconnect or reset with guest's help can be supported. > > (doing all this, at this point, I don't see much difference with > having an explicit shutdown) With graceful shutdown we implicitly request flush when we ask backend to stop. > > - As Marc-André said, without graceful shutdown it is not enough to > > handle ring state though. We must handle socket getting disconnected > > in the middle of send/receive. While it's more work, > > it does seem like a nice interface to support. > > - I understand the concern that existing guests do not handle NEED_RESET > > status. One way to fix this would be a new feature bit. If NEED_RESET not > > handled, request hot-unplug instead. > > > > 3. Running while disconnected > > - At the moment, we wait on vm start for remote to connect, > > if we support disconnecting backend without stopping > > we probably should also support starting it without waiting > > for connection > > That's what Tetsuya proposed in its initial series, but handling the > flags was quite tedious. That would be up to management. E.g. let backend export the list of flags it supports in some file, and apply to QEMU. > I think this can be considered easily a > seperate enhancement. What I proposed is to keep waiting for the > initial connect, and check the flags remains compatible on reconnect. Seems asymmetrical unless we stop the vm. > > - Guests expect tx buffers to be used in a timely manner, thus: > > - If vm is running we must use them in qemu, maybe discarding packets > > in the process. > > There already are commands for link down, I'm not sure there's value > > in doing this automatically in qemu. > > Testing doesn't show such buffer issues when the link is down (this > can be tested with vubr example above) Not enough testing then - it's a race condition: buffers can be sent before link down. > > - Alternatively, we should also have an option to stop vm automatically (like we do on > > disk errors) to reduce number of dropped packets. > > Why not, we would need to know if this is actually useful. > > > > > 4. 
Reconnecting > > - When acting as a server, we might want an option to go back to > > listening state, but it should not be the only option, > > there should be a monitor command for moving it back to > > listening state. > > - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only > > option, there should be a way to manually request connect, possibly to > > a different target, so a monitor command for re-connecting seems called for. > > - We'll also need monitor events for disconnects so management knows it > > must re-connect/restart listening. > > - If we stopped VM, there could be an option to restart on reconnect. > > That's all involving a third party, adding complexity but the benefit > isn't so clear. It's rather clear to me. Let's assume you want to restart bridge, so you trigger disconnect. If qemu auto-reconnects there's a race as it might re-connect to the old bridge. Management is required to make this robust, auto-reconnect is handy for people bypassing management. > -- > Marc-André Lureau
On Mon, May 02, 2016 at 01:45:31PM +0300, Michael S. Tsirkin wrote: > On Sun, May 01, 2016 at 02:04:42PM -0700, Yuanhan Liu wrote: > > On Sun, May 01, 2016 at 02:37:19PM +0300, Michael S. Tsirkin wrote: > > > On Fri, Apr 29, 2016 at 10:48:35AM -0700, Yuanhan Liu wrote: > > > > > But, I > > > > > would worry first about a backend that crashes that it may corrupt the > > > > > VM memory too... > > > > > > > > Not quite sure I follow this. But, backend just touches the virtio > > > > related memory, so it will do no harm to the VM? > > > > > > It crashed so apparently it's not behaving as designed - > > > how can we be sure it does not harm the VM? > > > > Hi Michael, > > > > What I know so far, a crash might could corrupt the virtio memory in two > > ways: > > > > - vring_used_elem might be in stale state when we are in the middle of > > updating used vring. Say, we might have updated the "id" field, but > > leaving "len" untouched. > > > > - vring desc buff might also be in stale state, when we are copying > > data into it. > > > - a random write into VM memory due to backend bug corrupts it. > > > However, the two issues will not be real issue, as used->idx is not > > updated in both case. Thefore, those corrupted memory will not be > > processed by guest. So, no harm I'd say. Or, am I missing anything? > > Why is backend crashing? It shouldn't so maybe this means it's > memory is corrupt. That is the claim. As far as virtio memory is not corrupted (or even corrupt in above ways), there would be no issue. But, yes, you made a good point: we make no guarantees that it's not the virtio memory corruption caused the crash. > > BTW, Michael, what's your thoughts on the whole reconnect stuff? > > > > --yliu > > My thoughts are that we need to split these patchsets up. > > There are several issues here: > > > 1. Graceful disconnect > - One should be able to do vmstop, disconnect, connect then vm start > This looks like a nice intermediate step. > - Why would we always assume it's always remote initiating the disconnect? > Monitor commands for disconnecting seem called for. A monitor command is a good suggestion. But I'm thinking why vmstop is necessary. Basically, a disconnect is as to a cable unplug to physical NIC; we don't need pause it before unplugging. > > 2. Unexpected disconnect > - If buffers are processed in order or flushed before socket close, > then on disconnect status can be recovered > from ring alone. If that is of interest, we might want to add > a protocol flag for that. F_DISCONNECT_FLUSH ? Sorry, what does this flag supposed to work here? > Without this, > only graceful disconnect or reset with guest's help can be supported. > - As Marc-André said, without graceful shutdown it is not enough to > handle ring state though. We must handle socket getting disconnected > in the middle of send/receive. While it's more work, > it does seem like a nice interface to support. Same as above, what the backend (or QEMU) can do for this case without the guest's (reset) help? > - I understand the concern that existing guests do not handle NEED_RESET > status. One way to fix this would be a new feature bit. If NEED_RESET not > handled, I could check the code by myself, but I'm thinking it might be trivial for you to answer my question: how do we know that NEED_RESET is not handled? > request hot-unplug instead. Same as above, may I know how to request a hot-unplug? > > 3. 
Running while disconnected > - At the moment, we wait on vm start for remote to connect, > if we support disconnecting backend without stopping > we probably should also support starting it without waiting > for connection Agreed. I guess that would anaswer my first question: we don't actually need to do vmstop before disconnect. --yliu > - Guests expect tx buffers to be used in a timely manner, thus: > - If vm is running we must use them in qemu, maybe discarding packets > in the process. > There already are commands for link down, I'm not sure there's value > in doing this automatically in qemu. > - Alternatively, we should also have an option to stop vm automatically (like we do on > disk errors) to reduce number of dropped packets. > > 4. Reconnecting > - When acting as a server, we might want an option to go back to > listening state, but it should not be the only option, > there should be a monitor command for moving it back to > listening state. > - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only > option, there should be a way to manually request connect, possibly to > a different target, so a monitor command for re-connecting seems called for. > - We'll also need monitor events for disconnects so management knows it > must re-connect/restart listening. > - If we stopped VM, there could be an option to restart on reconnect. > > > -- > MST
On Mon, May 02, 2016 at 03:04:30PM +0300, Michael S. Tsirkin wrote: > On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote: > > Hi > > > > On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > > 1. Graceful disconnect > > > - One should be able to do vmstop, disconnect, connect then vm start > > > This looks like a nice intermediate step. > > > - Why would we always assume it's always remote initiating the disconnect? > > > Monitor commands for disconnecting seem called for. > > > > Those two solutions involve VM management. This looks more complex to > > communicate/synchronize vhost-user backend & vm management & qemu. The > > use case I cover is request from the backend to shutdown, > > Right, but flushing buffers + closing the socket looks like > a cleaner interface than a ton of messages going back and forth. I'd agree with Michael on that. It needs more cares when dealing with two stream buffers: you can't quit backend soon after the shutdown request, instead, you have to wait for the ACK from QEMU. Making request from QEMU avoids that. --yliu
Hi On Mon, May 2, 2016 at 2:04 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote: >> Hi >> >> On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote: >> > 1. Graceful disconnect >> > - One should be able to do vmstop, disconnect, connect then vm start >> > This looks like a nice intermediate step. >> > - Why would we always assume it's always remote initiating the disconnect? >> > Monitor commands for disconnecting seem called for. >> >> Those two solutions involve VM management. This looks more complex to >> communicate/synchronize vhost-user backend & vm management & qemu. The >> use case I cover is request from the backend to shutdown, > > Right, but flushing buffers + closing the socket looks like > a cleaner interface than a ton of messages going back and forth. What do you mean by "a ton of messages"? It adds VHOST_USER_SET_SLAVE_FD (generic), and VHOST_USER_SLAVE_SHUTDOWN. The amount of work to flush and close is the same regardless. I figured later that if we refactor vhost-user socket handling, we may be able to accept request from the main channel socket, without extra "slave channel". > >> because >> that's what the users wanted (and it is already possible to stop the >> backend and disconnect it from qemu, we would only need to know when, >> with a new command..) > > You assume the backend has a monitor interface to disconnect though. > That's not a given. What do you mean? The backend must have a way to request to close/quit indeed. That's outside of scope how the backend get this information (via signals or other means). It's external, having this information from VM management layer is the same (someone has to trigger this somehow). >> > 3. Running while disconnected >> > - At the moment, we wait on vm start for remote to connect, >> > if we support disconnecting backend without stopping >> > we probably should also support starting it without waiting >> > for connection >> >> That's what Tetsuya proposed in its initial series, but handling the >> flags was quite tedious. > > That would be up to management. E.g. let backend export the list > of flags it supports in some file, and apply to QEMU. That makes me worry that such unfriendly connections details have to spread outside of vhost-user to VM management layer, making usage & maintenance harder for no clear benefit. It's a similar concern you have with "the backend has a monitor interface", here "the backend must have an introspection interface" or at least export vhost-user details somehow. > >> I think this can be considered easily a >> seperate enhancement. What I proposed is to keep waiting for the >> initial connect, and check the flags remains compatible on reconnect. > > Seems asymmetrical unless we stop the vm. That's the point, there will be time with and without the backend if we keep the VM running. >> > - Guests expect tx buffers to be used in a timely manner, thus: >> > - If vm is running we must use them in qemu, maybe discarding packets >> > in the process. >> > There already are commands for link down, I'm not sure there's value >> > in doing this automatically in qemu. >> >> Testing doesn't show such buffer issues when the link is down (this >> can be tested with vubr example above) > > Not enough testing then - it's a race condition: buffers can be sent > before link down. Ok, I'll do more testing. In all cases, looks reasonable to discard. 
> >> > - Alternatively, we should also have an option to stop vm automatically (like we do on >> > disk errors) to reduce number of dropped packets. >> >> Why not, we would need to know if this is actually useful. >> >> > >> > 4. Reconnecting >> > - When acting as a server, we might want an option to go back to >> > listening state, but it should not be the only option, >> > there should be a monitor command for moving it back to >> > listening state. >> > - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only >> > option, there should be a way to manually request connect, possibly to >> > a different target, so a monitor command for re-connecting seems called for. >> > - We'll also need monitor events for disconnects so management knows it >> > must re-connect/restart listening. >> > - If we stopped VM, there could be an option to restart on reconnect. >> >> That's all involving a third party, adding complexity but the benefit >> isn't so clear. > > It's rather clear to me. Let's assume you want to restart bridge, > so you trigger disconnect. > If qemu auto-reconnects there's a race as it might re-connect > to the old bridge. I would say that race can trivially be avoided, so it's a backend bug. > Management is required to make this robust, auto-reconnect > is handy for people bypassing management. tbh, I don't like autoreconnect. My previous series didn't include this and assumed the feature would be supported only when qemu is configured to be the server. I added reconnect upon request by users.
On Wed, May 04, 2016 at 03:16:44PM +0200, Marc-André Lureau wrote: > Hi > > On Mon, May 2, 2016 at 2:04 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > > On Mon, May 02, 2016 at 01:29:18PM +0200, Marc-André Lureau wrote: > >> Hi > >> > >> On Mon, May 2, 2016 at 12:45 PM, Michael S. Tsirkin <mst@redhat.com> wrote: > >> > 1. Graceful disconnect > >> > - One should be able to do vmstop, disconnect, connect then vm start > >> > This looks like a nice intermediate step. > >> > - Why would we always assume it's always remote initiating the disconnect? > >> > Monitor commands for disconnecting seem called for. > >> > >> Those two solutions involve VM management. This looks more complex to > >> communicate/synchronize vhost-user backend & vm management & qemu. The > >> use case I cover is request from the backend to shutdown, > > > > Right, but flushing buffers + closing the socket looks like > > a cleaner interface than a ton of messages going back and forth. > > What do you mean by "a ton of messages"? It adds > VHOST_USER_SET_SLAVE_FD (generic), and VHOST_USER_SLAVE_SHUTDOWN. The > amount of work to flush and close is the same regardless. I figured > later that if we refactor vhost-user socket handling, we may be able > to accept request from the main channel socket, without extra "slave > channel". > > > > >> because > >> that's what the users wanted (and it is already possible to stop the > >> backend and disconnect it from qemu, we would only need to know when, > >> with a new command..) > > > > You assume the backend has a monitor interface to disconnect though. > > That's not a given. > > What do you mean? The backend must have a way to request to close/quit > indeed. That's outside of scope how the backend get this information > (via signals or other means). It's external, having this information > from VM management layer is the same (someone has to trigger this > somehow). Correct. So for symmetry if nothing else, besides handling slave close request, we should be able to initiate close from qemu with a new command, and get event when not connected. Afterwards client can be killed with -9 as it's no longer connected to qemu. > >> > 3. Running while disconnected > >> > - At the moment, we wait on vm start for remote to connect, > >> > if we support disconnecting backend without stopping > >> > we probably should also support starting it without waiting > >> > for connection > >> > >> That's what Tetsuya proposed in its initial series, but handling the > >> flags was quite tedious. > > > > That would be up to management. E.g. let backend export the list > > of flags it supports in some file, and apply to QEMU. > > That makes me worry that such unfriendly connections details have to > spread outside of vhost-user to VM management layer, making usage & > maintenance harder for no clear benefit. It's a similar concern you > have with "the backend has a monitor interface", here "the backend > must have an introspection interface" or at least export vhost-user > details somehow. How can we start VM before backend connects otherwise? Have better ideas? > > > >> I think this can be considered easily a > >> seperate enhancement. What I proposed is to keep waiting for the > >> initial connect, and check the flags remains compatible on reconnect. > > > > Seems asymmetrical unless we stop the vm. > > That's the point, there will be time with and without the backend if > we keep the VM running. 
> > >> > - Guests expect tx buffers to be used in a timely manner, thus: > >> > - If vm is running we must use them in qemu, maybe discarding packets > >> > in the process. > >> > There already are commands for link down, I'm not sure there's value > >> > in doing this automatically in qemu. > >> > >> Testing doesn't show such buffer issues when the link is down (this > >> can be tested with vubr example above) > > > > Not enough testing then - it's a race condition: buffers can be sent > > before link down. > > Ok, I'll do more testing. In all cases, looks reasonable to discard. Discard has some issues - for example processing ring in qemu sometimes exposes us to more security risks. > > > >> > - Alternatively, we should also have an option to stop vm automatically (like we do on > >> > disk errors) to reduce number of dropped packets. > >> > >> Why not, we would need to know if this is actually useful. > >> > >> > > >> > 4. Reconnecting > >> > - When acting as a server, we might want an option to go back to > >> > listening state, but it should not be the only option, > >> > there should be a monitor command for moving it back to > >> > listening state. > >> > - When acting as a client, auto-reconnect might be handy sometimes, but should not be the only > >> > option, there should be a way to manually request connect, possibly to > >> > a different target, so a monitor command for re-connecting seems called for. > >> > - We'll also need monitor events for disconnects so management knows it > >> > must re-connect/restart listening. > >> > - If we stopped VM, there could be an option to restart on reconnect. > >> > >> That's all involving a third party, adding complexity but the benefit > >> isn't so clear. > > > > It's rather clear to me. Let's assume you want to restart bridge, > > so you trigger disconnect. > > If qemu auto-reconnects there's a race as it might re-connect > > to the old bridge. > > I would say that race can trivially be avoided, so it's a backend bug. How do you avoid it? > > Management is required to make this robust, auto-reconnect > > is handy for people bypassing management. > > tbh, I don't like autoreconnect. My previous series didn't include > this and assumed the feature would be supported only when qemu is > configured to be the server. I added reconnect upon request by users. I don't have better solutions so OK I guess. > -- > Marc-André Lureau
Hello,

On Wed, May 04, 2016 at 10:13:49PM +0300, Michael S. Tsirkin wrote:
> How do you avoid it?
>
> > > Management is required to make this robust, auto-reconnect
> > > is handy for people bypassing management.
> >
> > tbh, I don't like autoreconnect. My previous series didn't include
> > this and assumed the feature would be supported only when qemu is
> > configured to be the server. I added reconnect upon request by users.
>
> I don't have better solutions so OK I guess.

Yes, it's a request from me :)
Well, a few others also requested it.

The reason I ask is that, so far, we have only one vhost-user frontend,
and that is QEMU. But we may have many backends; I'm aware of 4 at the
time of writing, including the vubr example from QEMU. While we could do
the vhost-user client and reconnect implementation in all backends, it's
clear that implementing it in the only frontend (QEMU) brings more
benefits.

OTOH, I could implement the DPDK vhost-user backend as a client and try
reconnect there, if that would make the whole thing a bit easier.

	--yliu
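For what the "DPDK as client" option would involve on the connection side, a bare-bones connect-with-retry loop (a generic POSIX sketch, not DPDK code; the 1s retry interval is arbitrary):

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    /* Keep retrying the master's unix socket until it is back, then return
     * the connected fd so the caller can (re)start vhost-user processing. */
    static int vhost_user_connect_retry(const char *path)
    {
        for (;;) {
            int fd = socket(AF_UNIX, SOCK_STREAM, 0);
            struct sockaddr_un un = { .sun_family = AF_UNIX };

            if (fd < 0) {
                return -1;
            }
            snprintf(un.sun_path, sizeof(un.sun_path), "%s", path);

            if (connect(fd, (struct sockaddr *)&un, sizeof(un)) == 0) {
                return fd;
            }

            close(fd);
            sleep(1);   /* master not there yet; try again */
        }
    }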
On Wed, May 04, 2016 at 08:44:37PM -0700, Yuanhan Liu wrote: > Hello, > > On Wed, May 04, 2016 at 10:13:49PM +0300, Michael S. Tsirkin wrote: > > How do you avoid it? > > > > > > Management is required to make this robust, auto-reconnect > > > > is handy for people bypassing management. > > > > > > tbh, I don't like autoreconnect. My previous series didn't include > > > this and assumed the feature would be supported only when qemu is > > > configured to be the server. I added reconnect upon request by users. > > > > I don't have better solutions so OK I guess. > > Yes, it's a request from me :) > Well, there may be few others also requested that. > > The reason I had this asking is that, so far, we just have only one > vhost-user frontend, and that is QEMU. But we may have many backends, > I'm aware of 4 at the time writing, including the vubr from QEMU. > While we could do vhost-user client and reconnect implementation > on all backends, it's clear that implementing it in the only backend > (QEMU) introduces more benefits. > > OTOH, I could implement DPDK vhost-user as client and try reconnect > there, if that could makes all stuff a bit easier. > > --yliu In my opinion, if slave initiates disconnecting then slave should also initiate connecting. In a sense, autoretry of connections is not a vhost-user feature. It's a general chardev feature. It also does not have to be related to disconnect: retrying on first connect failure makes exactly as much sense.
diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
index 8a635fa..60d6d13 100644
--- a/docs/specs/vhost-user.txt
+++ b/docs/specs/vhost-user.txt
@@ -487,3 +487,18 @@ Message types
       request to the master. It is passed in the ancillary data.
       This message is only sent if VHOST_USER_PROTOCOL_F_SLAVE_CHANNEL
       feature is available.
+
+Slave message types
+-------------------
+
+ * VHOST_USER_SLAVE_SHUTDOWN:
+      Id: 1
+      Master payload: N/A
+      Slave payload: u64
+
+      Request the master to shutdown the slave. A 0 reply is for
+      success, in which case the slave may close all connections
+      immediately and quit. A non-zero reply cancels the request.
+
+      Before a reply comes, the master may make other requests in
+      order to flush or sync state.
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 890c1bd..f91baee 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -67,6 +67,8 @@ typedef enum VhostUserRequest {
 
 typedef enum VhostUserSlaveRequest {
     VHOST_USER_SLAVE_NONE = 0,
+    VHOST_USER_SLAVE_SHUTDOWN = 1,
+
     VHOST_USER_SLAVE_MAX
 } VhostUserSlaveRequest;
 
@@ -539,6 +541,20 @@ static void slave_read(void *opaque)
     }
 
     switch (msg.request) {
+    case VHOST_USER_SLAVE_SHUTDOWN: {
+        uint64_t success = 1; /* 0 is for success */
+        if (dev->stop) {
+            dev->stop(dev);
+            success = 0;
+        }
+        msg.payload.u64 = success;
+        msg.size = sizeof(msg.payload.u64);
+        size = send(u->slave_fd, &msg, VHOST_USER_HDR_SIZE + msg.size, 0);
+        if (size != VHOST_USER_HDR_SIZE + msg.size) {
+            error_report("Failed to write reply.");
+        }
+        break;
+    }
     default:
         error_report("Received unexpected msg type.");
     }